Feature Articles: Exploring Humans and Information through the Harmony of Knowledge and Envisioning the Future

Vol. 23, No. 10, pp. 51–55, Oct. 2025. https://doi.org/10.53829/ntr202510fa6

Exploring the Nature of Humans and Information, and Connecting Them—Communication Science for a Sustainable Future through the Discovery of Hidden Truths and Interdisciplinary Research

Futoshi Naya

Abstract

NTT Communication Science Laboratories (CS Labs) conducts fundamental research to deeply understand both information and humans, and on the basis of this understanding, develops core technologies that connect information and humans. Our research and development efforts focus on achieving “heart-to-heart communication” between humans, between humans and artificial intelligence, and between humans and society. This article highlights some of the latest research activities at CS Labs.

Keywords: communication science, artificial intelligence, brain science


1. Introduction

The term “VUCA,” an acronym for “volatility,” “uncertainty,” “complexity,” and “ambiguity,” has become widely used to describe situations in which everything changes rapidly and the future is difficult to predict. Originally coined in the late 1990s to describe the complexity of military strategy, the concept of a “VUCA World” gained global recognition after it was introduced at the World Economic Forum in Davos in 2016. Today, the world is experiencing unprecedented challenges: global-scale extreme weather events, large-scale natural disasters, pandemics such as COVID-19, drastic shifts in the economic policies of various countries due to changing global conditions, and the rapid advancement of science and technology, particularly generative artificial intelligence (AI). These factors are intricately intertwined, making the future even more uncertain and unpredictable.

In this age of VUCA, there is a growing need to not only understand the intrinsic value hidden within increasingly complex and diverse information but also to deeply understand our own sensory, cognitive, behavioral, and emotional mechanisms—and the diversity within them—that influence how we perceive information, make decisions, and take action. Since its establishment, NTT Communication Science Laboratories (CS Labs) has been committed to making new discoveries through a deep understanding of the nature of both information and humanity and creating innovative technologies on the basis of these discoveries. This issue highlights some of CS Labs’ latest initiatives, including fundamental research aimed at uncovering the nature of information and humans, discovering previously unknown knowledge, and developing new forms of communication that connect people to information, to each other, and to society in an increasingly diverse world.

2. Understanding the nature of information

At CS Labs, our research focuses on information processing technologies across all types of media that convey information in both human-to-human and human-to-computer communication. Advances in AI technology have been remarkable. For example, the performance of AI in image and speech recognition has rapidly improved, now surpassing human capabilities in some areas. In speech recognition, portable devices such as AI voice recorders capable of distinguishing between speakers and transcribing multi-party conversations are now available [1]. Beyond high-performance speech recognition, these devices significantly enhance work efficiency by generating summaries in conjunction with generative AI. These products are the result of cutting-edge technologies specialized in processing human speech. However, research has expanded to enable AI to recognize not only human speech but also a wide range of surrounding sounds, such as those produced by nature, animals, and vehicles.

In conventional sound recognition processing, feature extraction is manually designed to convert basic elements of raw sound signals, such as frequency components, intensity, and pitch, into numerical values called “features” that help distinguish different sounds. These feature values are then paired with correct sound labels (e.g., dog bark) and used to train a system to classify sounds. However, as the number of sound categories increases, it becomes extremely difficult to manually determine which features are useful for classification. For example, distinguishing between a “threatening bark” and a “happy bark” in a dog’s “woof” requires capturing subtle differences that are difficult to define by hand. To address this challenge, a method called representation learning has gained attention. With this method, the feature extraction process is executed automatically by machine learning, rather than being manually crafted, enabling a system to learn the most useful features directly from sound data. Among representation-learning methods, self-supervised learning, which uses only a large amount of diverse sound data and does not require correct labels, can learn essential and general-purpose feature representations of individual sounds without human intervention, simply by having a computer listen to the sounds. An article in this issue introduces a new self-supervised learning method called Masked Modeling Duo (M2D) that enables the automatic acquisition of essential feature representations constituting all types of sounds by learning representations through a fill-in-the-blank problem, in which the original sound is partially masked and the hidden parts are predicted [2]. M2D has demonstrated high accuracy in benchmark tasks such as environmental sound and speaker identification, music-genre recognition, and musical instrument classification.
By combining M2D with large language models, it has become possible to enhance performance in a variety of applications, including language tasks that generate detailed textual descriptions of the sounds produced by different metals when struck.
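The fill-in-the-blank setup described above can be sketched in a few lines of code. The snippet below is an illustrative toy, not the actual M2D implementation: it splits a (here randomly generated) spectrogram into patches and masks a random subset, producing the visible input and the hidden targets that a masked-prediction model would be trained to reconstruct. The patch size and 60% mask ratio are arbitrary choices for the sketch.

```python
import numpy as np

def mask_spectrogram_patches(spec, patch=(16, 16), mask_ratio=0.6, rng=None):
    """Split a spectrogram into non-overlapping patches and mask a random
    subset, mimicking the fill-in-the-blank setup of masked audio modeling.
    Returns the visible patches, the masked (target) patches, and the
    indices of the masked patches."""
    rng = np.random.default_rng(rng)
    F, T = spec.shape
    pf, pt = patch
    # Trim so the spectrogram divides evenly into patches.
    spec = spec[: F - F % pf, : T - T % pt]
    nf, nt = spec.shape[0] // pf, spec.shape[1] // pt
    patches = (spec.reshape(nf, pf, nt, pt)
                   .transpose(0, 2, 1, 3)
                   .reshape(nf * nt, pf * pt))
    n_masked = int(round(mask_ratio * len(patches)))
    order = rng.permutation(len(patches))
    masked_idx, visible_idx = order[:n_masked], order[n_masked:]
    return patches[visible_idx], patches[masked_idx], masked_idx

# Example: a fake 80-bin x 160-frame log-mel spectrogram (50 patches total).
spec = np.random.randn(80, 160)
visible, targets, idx = mask_spectrogram_patches(spec, mask_ratio=0.6, rng=0)
print(visible.shape, targets.shape)  # visible vs. masked patch counts
```

A learner would encode only the visible patches and be trained to predict the masked targets; what matters for representation learning is that the supervision signal comes from the sound itself, not from human labels.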

3. Understanding the nature of humans

CS Labs is actively advancing research in the human sciences, aiming to elucidate the mechanisms of human sensation, emotion, and movement from the three perspectives of information science, psychology, and neuroscience. CS Labs also promotes research in the developmental sciences, focusing on how children acquire language and social skills. In human science research, “mind-reading” studies are being conducted to decode latent mental states that are difficult to verbalize or even consciously recognize [3]. These studies use biological signals—such as eye movements and changes in pupil diameter—to infer personal preferences for music and faces, as well as to identify the direction of a person’s auditory attention [4]. Beyond merely reading psychological states, research has found that stimulating intrinsically photosensitive retinal ganglion cells (ipRGCs)—the third type of photoreceptor cell in the retina, alongside cones and rods—with special light can unconsciously improve concentration, enhance short-term-memory performance, and reduce drowsiness and fatigue [5]. This research aims to create future environments in which lighting is automatically adjusted on the basis of a person’s cognitive state, enabling people to regulate their autonomic nervous system and work more efficiently without experiencing fatigue.

In our studies aimed at elucidating the exceptional latent brain functions of top athletes, we found that professional baseball batters use distinct gaze-movement strategies depending on their type [6]. In baseball, it is crucial for batters to keep the ball centered in their visual field to accurately track its movement. Immediately after the ball is released from the pitcher’s hand, its angular velocity within the batter’s visual field is relatively low, but this velocity increases rapidly as the ball approaches. To cope with this, batters must quickly shift their gaze using rapid eye movements known as “saccades” to jump ahead to the predicted point of arrival. However, during saccades, visual information processing is temporarily suppressed, making it difficult for batters to accurately gather visual information in the moment. Previous studies suggest that batters who delay their saccade timing may be able to track the ball longer and use visual information more effectively. Using this indicator, we measured and analyzed the gaze behavior of 39 players from a professional baseball team in Japan during batting. The results revealed distinct player types: for example, Player A has an average swing speed—indicating average physical ability—but demonstrates delayed saccade timing and excellent visual tracking ability, earning titles such as most home runs, batting champion, and highest on-base percentage. In contrast, Player B exhibits earlier saccade timing and lower visual tracking ability but compensates with a faster swing speed and has achieved titles such as batting champion and highest on-base percentage. Further analysis including head movement uncovered a key difference in strategy: Player A adjusts their head movement according to pitch speed, continuously keeping the ball centered in their vision, while Player B maintains a consistent head movement regardless of pitch speed, relying on past experience to predict the ball’s trajectory. 
These findings highlight the diversity of gaze-movement strategies tailored to each batter’s unique characteristics, providing valuable insights for developing individualized training and coaching methods that align with differences in visual information processing. Looking ahead, we aim to apply these insights not only to enhancing athletic performance but also to fields such as rehabilitation and skill acquisition.
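The claim that the ball's angular velocity is low at release and rises sharply as it approaches follows from simple geometry, which the sketch below illustrates with round, illustrative numbers (not measured data). For a ball traveling in a straight line at constant speed v past an observer at a fixed lateral offset b, the gaze angle is θ = atan(b/x) at remaining distance x, so differentiating gives an angular velocity of v·b/(x² + b²), which grows rapidly as x shrinks.

```python
import math

# Illustrative values only: ~144 km/h pitch, 0.5 m lateral offset of the
# trajectory from the batter's eye. Not measured data.
v = 40.0   # ball speed in m/s
b = 0.5    # lateral offset from the eye in m

def angular_velocity(x):
    """Angular velocity (rad/s) of the ball in the visual field when the
    ball is x meters away: d/dt atan(b/x) = v*b / (x**2 + b**2)."""
    return v * b / (x ** 2 + b ** 2)

for x in (18.0, 9.0, 3.0, 1.0):
    print(f"x = {x:5.1f} m  ->  {math.degrees(angular_velocity(x)):7.1f} deg/s")
```

The angular velocity grows by more than two orders of magnitude between release distance and the final meter, which is why smooth pursuit alone cannot keep up and a predictive saccade becomes necessary.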

4. Connecting people and information

With advancements in sensing technologies and information and communication technologies (ICT), it has become possible to capture and track phenomena that were previously difficult for humans to observe directly. However, there are still numerous cases in which the phenomena remain undiscovered or their underlying causes are unknown. For example, accurately understanding how pandemics of infectious diseases such as COVID-19 emerge, identifying precise transmission routes, elucidating mechanisms of viral mutation, and determining the predispositions of individuals prone to severe illness remain significant challenges. Addressing these issues through the analysis of diverse observational data is essential for preventing the spread of future infectious diseases and for advancing the research and development of more effective treatments and preventive measures. Research has extended beyond using conventional health check-up data to include genetic information for forecasting disease susceptibility and facilitating early detection. However, as the volume and complexity of observed data continue to grow, it becomes increasingly difficult to identify clusters of patient groups exhibiting characteristic symptoms from among countless possible data combinations. Since the early 2000s, CS Labs has been engaged in research on relational data analysis, a machine-learning technique designed to extract hidden, previously unknown information from the complex intersections of diverse datasets. A key feature of this technology is that it does not rely on human-defined assumptions about data patterns or groupings. Instead, it uses a data-driven methodology that considers an “infinite” range of potential patterns. An article in this issue introduces our efforts at CS Labs to uncover previously unknown knowledge in the field of healthcare, leveraging state-of-the-art data-analysis technologies [7].
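One standard ingredient of nonparametric, data-driven models of this kind is a prior over partitions that does not fix the number of clusters in advance, such as the Chinese restaurant process. The following minimal simulation is an illustrative sketch of that idea only, not CS Labs' actual relational-data-analysis method: each new item joins an existing cluster with probability proportional to the cluster's size, or opens a new cluster with probability proportional to a concentration parameter, so the number of clusters emerges from the data rather than being specified by hand.

```python
import numpy as np

def chinese_restaurant_process(n, alpha, rng=None):
    """Sample a random partition of n items under a Chinese restaurant
    process with concentration alpha. Item i joins an existing cluster with
    probability proportional to its current size, or founds a new cluster
    with probability proportional to alpha, so the number of clusters is
    unbounded ("infinite" hypothesis space)."""
    rng = np.random.default_rng(rng)
    assignments = [0]          # the first item founds cluster 0
    sizes = [1]
    for _ in range(1, n):
        probs = np.array(sizes + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(sizes):    # open a new cluster
            sizes.append(1)
        else:
            sizes[k] += 1
        assignments.append(int(k))
    return assignments

labels = chinese_restaurant_process(1000, alpha=2.0, rng=0)
print("clusters found:", len(set(labels)))  # grows slowly, roughly alpha*log(n)
```

In a full relational-data model, such a partition prior is combined with a likelihood over the observed relations, and inference then searches over partitions of any size for the grouping that best explains the data.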

While opportunities to use generative AI, such as ChatGPT, in business settings are rapidly increasing, efficiently obtaining the desired results still requires users to provide various types of context—such as roles, prior information, knowledge, and examples—as part of text prompts and to engage in multiple rounds of trial and error. Newer models such as GPT-4o support multimodal input and output, enabling voice and image interactions and facilitating real-time conversations. However, current dialogue systems still require users to manually configure these settings for each session. Unlike in natural conversations between humans, it remains challenging for AI to understand the dynamic conversational context and the interpersonal relationships among the people involved in a dialogue, and to engage in appropriate conversations that consider the feelings and positions of others on the basis of such understanding. An article in this issue introduces initiatives aimed at developing conversational AI that can, like humans, capture the nuances of conversational dynamics and subtle emotional cues and provide users with the desired information in a more natural and seamless manner [8].

The application of such conversational AI is gradually expanding as it is integrated into agents and communication robots, with growing use in areas such as amusement, customer service, and caregiving. In particular, the use of communication robots in children’s educational settings is expected to increase with the advancement of interactive AI functions. These robots may eventually be able to provide children with information and knowledge tailored to their individual interests and preferences—including topics they want to explore and things they have not yet encountered. In anticipation of this future, CS Labs is conducting developmental science research to investigate how children perceive robots as social entities and how their perceptions evolve with age, using experimental psychology methods. An article in this issue discusses the impact of socially interactive robots—capable of engaging through conversation and gestures—on the altruistic behavior of five-year-old children, as well as future perspectives on using robots as partners that can support and accompany children’s autonomous learning [9].

5. Connecting people with each other and with society

The advent and widespread adoption of ICT, such as smartphones and social media, have profoundly transformed how individuals communicate with each other and with broader society. Triggered by the COVID-19 pandemic, the adoption of video conferencing and chat tools has increased, and opportunities for face-to-face communication have decreased significantly. At the same time, cultivating empathy and strengthening emotional bonds between individuals remain indispensable for sustaining our vitality and overall well-being. At CS Labs, we are conducting research focusing on the pivotal role the body plays in the mechanisms through which empathy emerges between individuals. On the basis of these insights, we are also developing and demonstrating remote communication technologies that enable embodied interaction. For example, we have developed a device that transmits and reproduces the sensation of a heartbeat to strengthen the bonds between family members separated by circumstances, such as premature infants who must remain hospitalized and their parents or other relatives. These efforts are described in detail in an article in this issue [10].

The widespread use of social networking services (SNSs) has made the ways in which individuals engage with society increasingly diverse and complex. Objectively capturing changes in personal relationships across various contexts—such as within families, child-rearing, and local communities—is essential for providing meaningful support for social engagement. However, achieving this remains a significant challenge. At CS Labs, we have developed a tool called Social Orbit, which visualizes changes in a person’s social relationships—such as distancing or increasing closeness—as trajectories in a two-dimensional space without needing to access the content of SNS message logs. Instead, the tool analyzes variations in features such as the intervals and frequency of messages, enabling even non-experts to intuitively grasp shifts in social dynamics [11]. Through demonstration experiments, we have found that Social Orbit is effective in helping individuals recognize subtle patterns in their social interactions that are often difficult to notice independently and in encouraging self-reflection on their own behavior. This initiative is part of our broader effort to foster a society where, especially in today’s VUCA era, individuals can autonomously recognize, select, and promote positive behavioral changes across diverse social environments.
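Although the actual features and algorithm behind Social Orbit are not detailed here, the general idea of summarizing a message log by timing metadata alone (never message content) and projecting each contact into a two-dimensional space can be sketched as follows. All function names and feature choices below are hypothetical illustrations.

```python
import numpy as np

def timing_features(timestamps):
    """Summarize one contact's message log using timing metadata only:
    message rate, mean inter-message interval, and interval variability.
    Message content is never touched."""
    t = np.sort(np.asarray(timestamps, dtype=float))  # times in hours
    gaps = np.diff(t)
    span = t[-1] - t[0] if len(t) > 1 else 1.0
    return np.array([len(t) / span, gaps.mean(), gaps.std()])

def project_2d(X):
    """PCA via SVD: project the rows of X onto their first two principal
    axes, giving one 2D point per contact (or per contact-and-period)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

rng = np.random.default_rng(0)
# Three synthetic contacts with frequent, occasional, and sparse patterns:
# message times as the cumulative sum of exponential inter-message gaps.
logs = [np.cumsum(rng.exponential(scale=s, size=60)) for s in (2.0, 12.0, 48.0)]
points = project_2d(np.array([timing_features(t) for t in logs]))
print(points.shape)  # one 2D point per contact
```

Recomputing the features over successive time windows and projecting each window would trace a trajectory per contact, which is the kind of "distancing or increasing closeness" picture the tool visualizes.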

6. Conclusion

This article introduced representative examples of the latest research at CS Labs. The perspective of connecting people to information and people to each other is extremely important, not only for advancing our current research but also for pioneering entirely new fields of study. A press release issued in May 2025, “Uncovering the Connection Between Independently Studied Models in Pure Mathematics and Quantum Optics” [12], presents a groundbreaking achievement: it proved that a model studied in fundamental mathematics is equivalent to the two-photon quantum Rabi model, a physical model that describes the interaction between light and matter. The proof is based on the mathematical concept of spatial symmetry. This result is expected to further deepen collaboration between mathematicians and physicists and lead to new discoveries in quantum optics. CS Labs will continue to accelerate the pursuit of unknown truths and promote interdisciplinary research, contributing to a sustainable future through mutual cooperation between humans and AI.

References

[1] Plaud.ai,
https://www.plaud.ai/
[2] D. Niizumi, “AI that Learns to Listen on Its Own—Advancing Self-supervised Audio Representation toward Cutting-edge Sound Understanding with Large Language Models,” NTT Technical Review, Vol. 23, No. 10, pp. 67–72, Oct. 2025.
https://ntt-review.jp/archive/ntttechnical.php?contents=ntr202510fa9.html
[3] M. Kashino, M. Yoneya, H. Liao, and S. Furukawa, “Reading the Implicit Mind from the Body,” NTT Technical Review, Vol. 12, No. 11, pp. 31–36, 2014.
https://doi.org/10.53829/ntr201411fa6
[4] H. Liao, H. Fujihira, S. Yamagishi, Y. Yang, and S. Furukawa, “Seeing an Auditory Object: Pupillary Light Response Reflects Covert Attention to Auditory Space and Object,” Journal of Cognitive Neuroscience, Vol. 35, No. 2, pp. 276–290, 2023.
https://doi.org/10.1162/jocn_a_01935
[5] NTT CS Labs Open House 2025, “Unseen Light That Enhances Cognitive Task Performance,”
https://www.kecl.ntt.co.jp/openhouse/2025/exhibition_19_en.html
[6] NTT CS Labs Open House 2025, “Seeing the Essence of Baseball Batting,”
https://www.kecl.ntt.co.jp/openhouse/2025/exhibition_17_en.html
[7] M. Nakano, “Discovery of Hidden Knowledge in Data Relationships—Prospects for Reliable Healthcare through Infinite-hypothesis AI Models That Interpret Biological Phenomena,” NTT Technical Review, Vol. 23, No. 10, pp. 73–77, Oct. 2025.
https://ntt-review.jp/archive/ntttechnical.php?contents=ntr202510fa10.html
[8] Y. Chiba, “Techniques for ‘Reading the Room’ in Attentive Conversational AI—Understanding Dialogue Context through Multimodal Information and Incremental Response Generation,” NTT Technical Review, Vol. 23, No. 10, pp. 61–66, Oct. 2025.
https://ntt-review.jp/archive/ntttechnical.php?contents=ntr202510fa8.html
[9] Y. Okumura, “Children Perceive Minds in Robots—Learning Companion Robots for the Future of Early Childhood Education,” NTT Technical Review, Vol. 23, No. 10, pp. 78–82, Oct. 2025.
https://ntt-review.jp/archive/ntttechnical.php?contents=ntr202510fa11.html
[10] A. Murata, “From the Study of Embodied Empathy to Supporting Family Wellbeing—Understanding Embodied Empathy and Connecting Distant Families via Bodily Information Transfer,” NTT Technical Review, Vol. 23, No. 10, pp. 56–60, Oct. 2025.
https://ntt-review.jp/archive/ntttechnical.php?contents=ntr202510fa7.html
[11] NTT CS Labs Open House 2025, “Capturing Temporal Relationship Changes on SNS,”
https://www.kecl.ntt.co.jp/openhouse/2025/exhibition_14_en.html
[12] Press release issued by NTT, “Uncovering the Connection Between Independently Studied Models in Pure Mathematics and Quantum Optics,” May 13, 2025.
https://group.ntt/en/newsrelease/2025/05/13/250513a.html
Futoshi Naya
Vice President, Head of NTT Communication Science Laboratories.
He received a B.E. in electrical engineering, M.S. in computer science, and Ph.D. in engineering from Keio University, Kanagawa, in 1992, 1994, and 2010. He joined NTT Communication Science Laboratories in 1994. From 2003 to 2009, he was with Intelligent Robotics and Communication Laboratories, Advanced Telecommunications Research Institute International (ATR). His research interests include communication robots, sensor networks, pattern recognition, data mining in cyber-physical systems, and AI-based tailor-made education support. He is a member of the Institute of Electrical and Electronics Engineers (IEEE), the Society of Instrument and Control Engineers, and the Institute of Electronics, Information and Communication Engineers (IEICE).
