Another Me Planet—An Alter Ego That Shows Potential Future Self

Keigo Fushio, Atsushi Ohtsuka, Narichika Nomoto,
Shin-ichiro Eitoku, Takashi Nakamura, and Yuhei Oga

Abstract

At Expo 2025 Osaka, Kansai, Japan, a Virtual Expo was held where visitors could experience the event 24 hours a day. The Virtual NTT Pavilion provided content linked to a real venue in Yumeshima, Osaka. The pavilion also provided Another Me Planet, which enables people to communicate with themselves in future occupations and realize their potential so that they can experience the world view of Another Me™, a theme of the real venue. This article introduces Another Me Planet and the technology to achieve it.

Keywords: Another Me, speech synthesis, large language model

1. Virtual NTT Pavilion linking real and virtual

In addition to a real venue in Yumeshima, Osaka, NTT also exhibited a virtual pavilion at the Virtual Expo of Expo 2025 Osaka, Kansai, Japan [1], which consisted of three different venues (Rooms 1, 2, and 3) in conjunction with the real venue (Fig. 1).

Fig. 1. Configuration of Virtual NTT Pavilion.

The theme of Room 3 was Another Me™, one of the Grand Challenges [2] of the Digital Twin Computing Initiative proposed by NTT in 2020. Another Me is a digital twin. With Another Me of oneself that can act autonomously, we aim to expand various life opportunities.

Another Me in the Virtual NTT Pavilion used the virtual space’s ability to provide content tailored to each user with a high degree of freedom and provided Another Me Planet as a unique application that makes visitors feel closer to the world view of Another Me. The purpose of this application is to expand the possibilities of one’s future by using oneself as a use case and by communicating with Another Me, who has a job in the future that one does not currently have. This article introduces various initiatives for the actualization of Another Me and Another Me Planet.

2. Technologies that make up Another Me

In Another Me, what elements are needed to be able to express oneself? NTT laboratories have been conducting research and development by dividing the elements of “self” into two major layers: deep and surface (Fig. 2). The deep layer is innate or acquired through various experiences and serves as a standard of behavior. We are investigating the structuring of memories on the basis of various experiences, extraction of the axis of values, and extraction of sensibility using brain waves. The surface layer is a way of expressing behavior. In addition to appearance, NTT laboratories are investigating the generation of voice, speech content, and body movement.

Fig. 2. Composition of Another Me.

By combining the technologies that make up Another Me, we are demonstrating how to use it in various fields. As a new expression by this alter ego, we have been demonstrating a new expression on the stage by Another Me. We have also been demonstrating support for building relationships in virtual space as support for building human relationships with Another Me.

3. Another Me Planet—an application where one realizes their potential by interacting with their future alter ego

Another Me Planet is an application of expanding one’s own potential by visualizing various possible selves through Another Me. Another Me Planet was exhibited at the Virtual NTT Pavilion so that users could experience the concept of “Through Another Me, people achieve their potential and expand various life opportunities” (Fig. 3).

Fig. 3. Another Me Planet.

To use the “working self” in one’s life as a use case and to determine one’s potential, Another Me with a job that the user does not currently have but could have in the future will be created. Through interaction with Another Me, people will be able to visualize and realize their potential, which is difficult to do themselves.

Figure 4 shows the experience flow of Another Me Planet. The user first takes a photo of their face to generate the appearance of Another Me. To generate the voice of Another Me to sound like the voice of the user, the user is then asked to read a very short sentence that forms the basis of the generated voice. Finally, to give characteristics to the utterances of Another Me, the user answers a questionnaire. On the basis of the above, Another Me with a future job is generated. The user can also ask a variety of text-based questions to this Another Me.

Fig. 4. Experience flow in Another Me Planet.

We are adding the necessary elements to Another Me so that users can feel like it is more like themselves. We take advantage of the surface layer to make the voice that Another Me speaks more like the user’s, i.e., not simply focusing on appearance. The content that Another Me talks about is determined on the basis of the characteristics of the user. We are using the following technologies under research and development at NTT laboratories to achieve this.

4. Technologies that make up Another Me in Another Me Planet

4.1 Zero-shot text-to-speech synthesis technology

In Another Me Planet, users record a few seconds of their voice then listen to messages from their “future self” in a career of their choice, with the voice reflecting the user’s unique vocal characteristics. This audio is generated using one of the latest features of NTT laboratories’ text-to-speech synthesis technology [3]: zero-shot speech synthesis (see Fig. 5).

Fig. 5. Zero-shot speech synthesis and use in Another Me Planet.

Zero-shot speech synthesis enables the generation of synthetic speech closely resembling the voice one wants to synthesize (target speaker’s) by simply preparing a few seconds of reference speech from the target speaker. This technology accurately extracts the characteristics of the reference speech and reflects them in the synthetic speech without requiring any additional training of the speech-synthesis model for the specific speaker. Thus, the burden on users and service providers is limited to preparing just a few seconds of reference speech, making the technology highly accessible. It will be useful for a wide range of target speakers and user groups including extremely busy public figures and corporate executives and the elderly or young children who may find it difficult to record long samples. By extracting reference speech from previously recorded data, this technology can be used in various scenarios such as recreating the voice of someone who has lost their ability to speak or generating audio using a person’s voice before it changed (e.g., before puberty).

4.2 Speech generation and ethical-inappropriateness-estimation technique using NTT’s large language model “tsuzumi”

In Another Me Planet, users can interact with an Another Me who has a future occupation. The engine that generates utterances of this Another Me in Japanese uses tsuzumi [4], a large language model that NTT is developing. On the basis of the future occupation of Another Me determined by the user and the personality and work characteristics of Another Me set on the basis of the answers to the questionnaire, a prompt is created to automatically generate utterances of this Another Me to resemble the user (Fig. 6).

Fig. 6. Example utterances of Another Me generated using tsuzumi.

For the candidate utterances of Another Me generated with tsuzumi, a technology [5] under research and development at NTT laboratories is also used to further improve the quality of the final utterance content. It automatically determines the level of ethical inappropriateness of the generated utterances, confirms that the likelihood of problematic words and sentences being included is low enough, then determines the final utterance of Another Me. When a sentence to be judged is input, it is possible to respond with the degree of ethical inappropriateness of the input sentence and the reasoning.

5. Future prospects

With the remarkable technological progress of artificial intelligence (AI), it is becoming more closely involved in many aspects of our lives. Through Another Me Planet, we will continue to collect information obtained through the construction of Another Me Planet application, as well as impressions and system-usage data obtained through social networking services throughout the session, to explore better relationships between humans and AI that directly interacts with humans, including Another Me Planet. We plan to deepen our knowledge of the acceptability of communication by such AI from various user perspectives and the effectiveness of various technologies including speech generation.

References

[1]	Virtual Expo, https://www.expo2025.or.jp/en/future-index/virtual/
[2]	Press release issued by NTT, “NTT announces new R&D projects of Digital Twin Computing,” Nov. 13, 2020. https://group.ntt/en/newsrelease/2020/11/13/201113c.html
[3]	K. Fujita, T. Ashihara, H. Kanagawa, T. Moriya, and Y. Ijima, “Zero-shot Text-to-speech Synthesis Conditioned Using Self-supervised Speech Representation Model,” Proc. of the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Rhodes Island, Greece, June 2023.
[4]	NTT’s Large Language Models ‘tsuzumi’, https://www.rd.ntt/e/research/LLM_tsuzumi.html
[5]	Y. Oga, T. Hasegawa, K. Nishida, and K. Saito, “A Study on Automatic Generation of Undesirable Conversation Data Using LLM,” Proc. of the 30th Annual Conference of the Association for Natural Language Processing (NLP2024), pp. 2159–2164, Kobe, Japan, Mar. 2024 (in Japanese).

	Keigo Fushio Researcher, NTT Human Informatics Laboratories. He received an M.E. in mathematical physics from Osaka Prefecture University in 2018. From 2021 to 2025, he was with NTT Human Informatics Laboratories, where he contributed to several studies in human-computer interaction. He joined NTT WEST in 2025 and is currently engaged in automating business processes using generative AI. He is currently manager at NTT WEST.
	Atsushi Ohtsuka Senior Research Engineer, NTT Human Informatics Laboratories. He received an M.S. in informatics from the University of Tsukuba, Ibaraki, and joined NTT in 2013. He received a Ph.D. in informatics from the University of Tsukuba in 2020. His research interests are natural language processing and information retrieval. He has been in charge of research for Digital Twin Computing at NTT Human Informatics Laboratories.
	Narichika Nomoto Senior Research Engineer, NTT Human Informatics Laboratories. He received a B.A. and M.M.G. in information science and technology from Keio University, Tokyo, in 2005 and 2007. He joined NTT in 2007 and studied speech and emotion recognition. He worked as a data scientist at NTT Communications from 2012 to 2016. He is currently researching and developing natural language processing at NTT Human Informatics Laboratories.
	Shin-ichiro Eitoku Senior Research Engineer, NTT Human Informatics Laboratories. He received an M.E. and Ph.D. in information science and technology from the University of Tokyo in 2006 and 2013 and joined NTT in 2006. His research interests include information systems and multimedia systems for communications.
	Takashi Nakamura Senior Research Engineer, NTT Human Informatics Laboratories. He received an M.E. in information and communication engineering from the University of Electro-Communications, Tokyo, in 2005 and joined NTT Cyber Space Laboratories the same year. His interests include the practical development and business development of speech synthesis. He is a member of the Acoustical Society of Japan. He is currently senior manager at Value Design Department, NTT WEST.
	Yuhei Oga Researcher, NTT Human Informatics Laboratories. He received an M.E. in engineering from the University of Tsukuba, Ibaraki, in 2021. He joined NTT Human Informatics Laboratories in 2022 and investigated natural language processing. He is currently investigating large language models. He is a member of the Association for Natural Language Processing and the Japanese Society for Artificial Intelligence.

↑ TOP