Feature Articles: NTT R&D at Expo 2025 Osaka, Kansai, Japan

Vol. 23, No. 10, pp. 25–32, Oct. 2025. https://doi.org/10.53829/ntr202510fa2

IOWN × Spatial Transmission—A Communication Experience in Which Distant Spaces Become as One

Takahiro Matsumoto, Hiroshi Chigira,
Kazuhiko Murasaki, Takahiro Kusabuka,
Masakatsu Aoki, Masafumi Suzuki, Kakagu Komazaki, Shunsuke Konagai, Mariko Yamaguchi, Taiga Yoshida,
Ryuichi Tanida, Takayoshi Mochizuki,
and Jun Shimamura

Abstract

We introduce our technological efforts involved in transmitting the performance of the Japanese female group Perfume in real time over NTT’s Innovative Optical and Wireless Network (IOWN), as exhibited at the NTT Pavilion at Expo 2025 Osaka, Kansai, Japan. In this production, we connected Expo’70 Commemorative Park in Suita, Osaka and Yumeshima, Osaka (the site of Expo 2025) over IOWN and, by using dynamic three-dimensional spatial-data transmission and reproduction technology together with vibrotactile sound field presentation technology, created an experience in which Perfume, though performing at a distant location, seemed to appear right before one’s eyes.

Keywords: IOWN, 3D space transmission, Expo 2025 Osaka, Kansai, Japan


1. Background and goal

Expo 2025 Osaka, Kansai, Japan (hereafter referred to as Expo 2025) was held exactly 55 years after the announcement of wireless telephones as “telephones of the future” at Japan World Exposition Osaka 1970 (Expo ’70) held in Suita, Osaka. At NTT laboratories, we aim to surpass two-dimensional (2D) communication based on video and audio by sharing entire 3D spaces at distant locations via NTT’s Innovative Optical and Wireless Network (IOWN) with low latency and to make communication that connects all of the five human senses, not just sight and hearing, feel natural.

In an attempt to embody this form of future communication at Expo 2025, we conducted a trial for the first time in the world by connecting the NTT Pavilion at Yumeshima, Osaka (the site of Expo 2025) with Expo’70 Commemorative Park in Suita, Osaka over IOWN and transmitting a live performance of the popular Japanese female group Perfume. We achieved an experience in which the performers at a distant location seemed to be truly present in the same space by transmitting haptic information in addition to 3D space with latency low enough to enable bidirectional communication. In this article, we describe dynamic 3D spatial-data transmission and reproduction and vibrotactile sound field presentation—the two key technologies used in this production—and introduce our behind-the-scenes activities to show how this performance was transmitted live using these technologies.

2. Dynamic 3D spatial-data transmission and reproduction technology

Dynamic 3D spatial-data transmission and reproduction technology enables the transmission of not only the geometry of a space but also its dynamic changes over time, that is, spatial motion. However, several problems arise when attempting to transmit live performances using conventional technologies for measuring and transmitting dynamic 3D spatial data.

The first problem is the difficulty of performing high-accuracy measurements over a wide area. Methods for measuring 3D data include RGB-D (Red Green Blue - Depth) cameras such as Microsoft’s Kinect devices. However, an RGB-D camera usually has a narrow measurement range of several meters, and significant distortions may occur in measurement values obtained at longer distances, especially near measurement boundaries.

The second problem is the difficulty of achieving low latency from measurement to display. Apart from RGB-D cameras, volumetric capture is a rapidly commercializing method for making dynamic 3D measurements of a fixed area. This method arranges multiple cameras around an object to measure 3D data, but because the 3D shape must be estimated and reconstructed from multiple camera images, even relatively fast systems require several seconds of processing. However, using a bidirectional communication system such as a telephone or web conference without discomfort requires that one-way communication from measurement to presentation be completed in less than 200 ms. Volumetric capture also requires a stable environment with fixed lighting conditions such as a studio, so it would be difficult to apply to the live performance presented in this article, which includes intense lighting effects in tune with the song and performer movements.

Instead of using only cameras as in volumetric capture, dynamic 3D spatial-data transmission and reproduction technology developed by NTT laboratories measures 3D space with an original configuration of measuring equipment (Fig. 1) that combines one camera with three light detection and ranging (LiDAR)*1 sensors. This technology achieves wide-area spatial measurements by using multiple sets of the measurement equipment shown in Fig. 1 and executing time synchronization control of all camera and LiDAR devices over the network. The LiDAR sensors used for measurement are of a type used mainly in on-vehicle applications for self-driving, which makes them capable of measuring positional information of an object situated up to 200 m away. Dynamic 3D spatial-data transmission and reproduction technology also reduces processing by combining LiDAR data that can directly obtain depth values with camera image data, achieving short times from measurement to display.


Fig. 1. Dynamic point-cloud space measurement unit.

With this technology, time synchronization control of the camera/LiDAR devices uses the Precision Time Protocol (PTP), which synchronizes devices over the network with nanosecond-to-submicrosecond accuracy. While each LiDAR sensor can only measure space 10 times per second, PTP can be used to stagger the measurement timing of the three LiDAR sensors by 1/30 of a second each, thus achieving spatial measurement 30 times per second. The ability to accurately align and collect multiple sets of sensor data on the basis of position and time is the foundation of this technology. It plays an important role in sensor fusion, which correctly integrates data measured with separate LiDAR sensors into one set of 3D data and colorizes LiDAR 3D point-cloud*2 data, which has no color information, using images obtained with another sensor, i.e., the camera.
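
To illustrate the staggered-trigger idea described above, the following sketch (a simplified model, not NTT’s implementation) schedules three 10-Hz LiDAR sensors whose scan starts are offset by 1/30 s each, yielding an aggregate measurement rate of 30 times per second under a shared PTP-synchronized clock.

```python
# Minimal sketch (not NTT's implementation): three 10-Hz LiDAR sensors whose
# scan starts are offset by 1/30 s each yield an aggregate 30-Hz point-cloud
# stream, provided all devices share a PTP-synchronized clock.

LIDAR_PERIOD_S = 0.1          # each LiDAR scans 10 times per second
NUM_LIDARS = 3
PHASE_STEP_S = 1.0 / 30.0     # stagger between consecutive sensors

def scan_start_times(duration_s: float):
    """Yield (lidar_id, scheduled start time) pairs in chronological order."""
    events = []
    for lidar_id in range(NUM_LIDARS):
        t = lidar_id * PHASE_STEP_S
        while t < duration_s:
            events.append((t, lidar_id))
            t += LIDAR_PERIOD_S
    for t, lidar_id in sorted(events):
        yield lidar_id, t

if __name__ == "__main__":
    for lidar_id, t in scan_start_times(0.35):
        print(f"t = {t:.4f} s  ->  LiDAR {lidar_id} starts a scan")
    # Consecutive scan starts are 1/30 s apart, i.e. 30 measurements per second overall.
```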

As described above, the key feature of this technology is its combination of camera and LiDAR data. However, many technical issues must nevertheless be addressed to make the technology practical.

One issue is that the point-cloud data obtained from LiDAR measurements are “sparse” compared with camera images. In general, a LiDAR sensor emits laser light as scanning lines and calculates the distance to an object by the time-of-flight method, which uses the time taken for the light to reflect off the object’s surface and return. The spatial density that can be measured is therefore dependent on the number of scanning lines, which at present is only 128 at maximum. Considering that a full-high-definition camera captures 1080 lines of pixels in the vertical direction, it can be seen that this number of scanning lines is indeed sparse.
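
The time-of-flight relation behind LiDAR ranging is the textbook formula distance = c × t / 2, where t is the measured round-trip time of the laser pulse. The short example below (not NTT-specific) shows that a return detected after roughly 1.33 µs corresponds to about 200 m, the on-vehicle range mentioned above.

```python
# Time-of-flight range calculation (textbook relation, not NTT-specific):
# the laser pulse travels to the object and back, so distance = c * t / 2.

SPEED_OF_LIGHT_M_S = 299_792_458.0

def tof_distance_m(round_trip_time_s: float) -> float:
    """Distance to the reflecting surface from the measured round-trip time."""
    return SPEED_OF_LIGHT_M_S * round_trip_time_s / 2.0

# Example: a return detected after ~1.33 microseconds corresponds to ~200 m.
print(tof_distance_m(1.334e-6))   # ~200.0 m
```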

The second issue is the noise and inter-device misalignment generated when making measurements using multiple camera/LiDAR sensors. Correctly integrating data measured by multiple camera/LiDAR devices into a single dataset requires that the positional relationship among those devices be accurately obtained and that calibration be performed to precisely compensate for measurement distortion in each device. However, completely error-free calibration is difficult, so countermeasures are needed against the misalignment and distortion such errors generate and against errors that occur when colorizing point-cloud data from camera images.
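
For reference, the sketch below shows the general form of this fusion/colorization step under common pinhole-camera assumptions (the extrinsic and intrinsic matrices are hypothetical placeholders, not NTT’s calibration): LiDAR points are mapped into the camera frame with an extrinsic transform and projected through the camera intrinsics to look up a color per point. Any error in these matrices appears as exactly the kind of colorization misalignment discussed above.

```python
# Hedged sketch of a camera/LiDAR colorization step (not NTT's actual pipeline).
import numpy as np

def colorize_points(points_lidar, image, T_cam_from_lidar, K):
    """points_lidar: (N, 3) XYZ in the LiDAR frame; image: (H, W, 3) RGB.
    T_cam_from_lidar: (4, 4) extrinsic matrix; K: (3, 3) camera intrinsics.
    Returns an (M, 6) array of [x, y, z, r, g, b] for points visible in the image."""
    n = points_lidar.shape[0]
    homo = np.hstack([points_lidar, np.ones((n, 1))])        # homogeneous coordinates
    pts_cam = (T_cam_from_lidar @ homo.T).T[:, :3]           # into the camera frame
    in_front = pts_cam[:, 2] > 0                             # keep points ahead of the camera
    pts_cam = pts_cam[in_front]
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                              # perspective divide
    h, w = image.shape[:2]
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)          # inside the image
    colors = image[v[valid], u[valid]]
    return np.hstack([pts_cam[valid], colors])
```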

In dynamic 3D spatial-data transmission and reproduction technology, the problems of density and misalignment/distortion are solved using our original densification technology and noise-removal technology. The densification technology predicts, with low latency, depth values from camera images in areas that cannot be measured with LiDAR; this was achieved by developing an algorithm that is 20 times faster than the existing technique. The noise-removal technology comprises a technique for removing afterimage noise generated by fast-moving objects, a technique for removing point-cloud data erroneously colored from camera images, and filter processing that prevents the accumulation of noise and distortion from each measurement device, which would otherwise degrade the image when 3D data measured from many angles are integrated into one dataset.
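
As a rough illustration only (a deliberately naive stand-in, far simpler than the camera-guided, low-latency algorithm described above), the following sketch fills pixels that received no LiDAR return with the nearest valid depth in a small window; NTT’s actual densification predicts depth from camera images.

```python
# Very simplified densification sketch (an assumption for illustration only):
# fill pixels with no LiDAR return using the closest valid depth in a small window.
import numpy as np

def naive_densify(sparse_depth: np.ndarray, window: int = 3) -> np.ndarray:
    """sparse_depth: (H, W) array where 0 means 'no LiDAR measurement'."""
    dense = sparse_depth.copy()
    h, w = dense.shape
    for y in range(h):
        for x in range(w):
            if dense[y, x] == 0:
                y0, y1 = max(0, y - window), min(h, y + window + 1)
                x0, x1 = max(0, x - window), min(w, x + window + 1)
                patch = sparse_depth[y0:y1, x0:x1]
                vals = patch[patch > 0]
                if vals.size:
                    dense[y, x] = vals.min()   # take the nearest surface in the window
    return dense
```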

As a result of these research achievements, dynamic 3D spatial-data transmission and reproduction technology can measure data, apply densification and noise removal, send and receive the data, and render it viewable as 3D information all within 100 ms. The technology therefore operates well within the one-way delay of 200 ms deemed necessary for bidirectional communication without discomfort, demonstrating high real-time performance. With respect to the issues of 3D-space measurement density and misalignment/distortion, the technology has reached a level that enables viewing without discomfort on the large 3D light-emitting diode (LED) display at the NTT Pavilion.

*1 LiDAR: A technology that irradiates an object with laser light and uses information on that reflected light to measure the 3D shape of the object as point-cloud data.
*2 Point cloud: A data-representation format that describes positional information in 3D space in the form of many points. A point cloud measured using LiDAR includes no color information, which needs to be measured using another type of sensor such as a camera.

3. Vibrotactile sound field presentation technology

Vibrotactile sound field presentation technology measures the tactile vibrations of footsteps, which vary with the structure and shape of the ground, the shoes being worn, and the intensity of the steps, and reproduces them at a distant location together with a sense of localization. For example, in this live performance, in which the members of Perfume wore boots on a wooden stage, this technology made it possible to transmit the texture and feel of the firm footsteps made while dancing with sharp movements so that viewers at a distant location could feel as if they were on the same stage. Measurement and reproduction that includes localization means that, when the position of a vibration-generating subject changes, viewers in the other venue can sense that change in position. For example, when a performer moves from side to side, this technology transmits that movement so that the tactile vibration is felt shifting left and right.

In conventional technology, the technique commonly used to measure tactile vibration involves embedding an array of microphones under the floor. However, such a microphone-embedding technique requires a double layer of arena or stage flooring, the installation of many microphones, wiring to collect the signals, and so on, thus driving up installation and construction costs. In a live performance like the one described here, floor microphones also pick up sounds from the music being played in addition to tactile vibration, making it difficult to cleanly measure only the desired signals. At NTT laboratories, we addressed these problems by developing an original compact wireless sensor (Fig. 2) and a technique for attaching that sensor directly to a performer’s shoe. This sensor measures acceleration, in contrast to an ordinary microphone, which measures acoustic vibrations as sound. Using an accelerometer in this way enables tactile vibrations to be measured without being affected by environmental sounds such as the music accompanying a dance, unlike the microphone-based method. When measuring tactile vibration through acceleration, however, it is generally easy to lose “vibration texture,” such as the feeling of boots knocking on a hard floor, compared with microphones. This technology therefore emphasizes or restrains specific frequency bands when reproducing vibration, thus reproducing texture even with the use of acceleration data.
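
A minimal sketch of this frequency-dependent shaping is shown below, assuming a hypothetical sampling rate and band gains (the bands, gains, and filter design are illustrative, not NTT’s tuning): the acceleration signal is split into bands and re-weighted before driving the floor vibrators so that the “knock” texture lost by acceleration sensing is restored.

```python
# Hedged sketch (hypothetical parameters, not NTT's tuning): split the footstep
# acceleration signal into frequency bands and re-weight them before playback.
import numpy as np
from scipy.signal import butter, sosfilt

FS = 2000  # accelerometer sampling rate in Hz (assumed)

# (low Hz, high Hz, gain) per band -- illustrative values only
BANDS = [(20, 80, 1.0),     # body of the footstep
         (80, 250, 1.8),    # emphasized: the hard "knock" of a boot heel
         (250, 500, 0.6)]   # restrained: high-frequency rattle

def shape_vibration(accel: np.ndarray) -> np.ndarray:
    """Re-weight frequency bands of an acceleration signal for vibrator playback."""
    out = np.zeros_like(accel, dtype=float)
    for lo, hi, gain in BANDS:
        sos = butter(4, [lo, hi], btype="bandpass", fs=FS, output="sos")
        out += gain * sosfilt(sos, accel)
    return out
```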


Fig. 2. Footstep vibrotactile sensor.

At NTT laboratories, we have also developed a new tracking system that tracks the locations generating vibration so that vibration can be reproduced with localization. Specifically, we developed a technique of original design that combines camera-based markerless tracking with tracking by self-emitting markers. This technique achieves a level of robustness that does not lose track of the target subjects even in an environment, such as this live performance, in which the measurement targets move frequently and change positions.

NTT’s original technology was also used to efficiently reproduce measured tactile vibrations together with positional information in a large-scale environment. To enable a maximum of 70 visitors to simultaneously experience a performance at the NTT Pavilion, the experiential space was covered with a raised free-access floor consisting of as many as 352 tiles measuring 50 × 50 cm. Because vibrations cannot propagate across these tiles, enabling vibrations to be experienced anywhere in the experiential space would ordinarily require attaching a vibrator to each tile. We therefore developed an original floor design that adds a vibration-propagation layer, which enabled us to reduce the number of vibrators to approximately one-third of that number while maintaining a sense of localization in eight directions (forward and backward, left and right, and diagonal) and reproducing powerful vibrations. By using an originally developed vibration controller (Fig. 3) to select which tile vibrators to actuate depending on the positions of the measured vibration sources, it becomes possible to direct various types of vibration expressing, for example, uniform shaking over the entire floor or a change in vibration location according to the positional relationship of the performers.
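
The sketch below illustrates one simple way such position-dependent control could work (a hypothetical layout and a plain distance-based roll-off, not the actual controller logic): each vibrator is driven with a gain that falls off with distance from the measured vibration-source position, so the vibration appears to come from the matching direction and follows a performer moving across the stage.

```python
# Hedged sketch of position-dependent vibrator control (hypothetical layout,
# not the actual controller): gain falls off with distance from the source.
import math

# Hypothetical layout: vibrator positions in metres within the experiential space.
VIBRATORS = [(x * 1.0, y * 1.0) for x in range(12) for y in range(10)]

def vibrator_gains(source_xy, rolloff_m=2.0):
    """Return a per-vibrator gain in [0, 1] for one vibration-source position."""
    gains = []
    for vx, vy in VIBRATORS:
        d = math.hypot(vx - source_xy[0], vy - source_xy[1])
        gains.append(max(0.0, 1.0 - d / rolloff_m))   # linear roll-off, clipped at 0
    return gains

# Example: a footstep measured near (3.0, 4.5) drives only the surrounding tiles;
# a source tracked moving left to right shifts the active vibrators with it.
gains = vibrator_gains((3.0, 4.5))
```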


Fig. 3. Vibrotactile control of a floor’s surface using our original controller.

At NTT laboratories, we have also developed a device that lets vibrations be felt with the hands so that individuals in wheelchairs can experience vibration with ease (Fig. 4). This device has a simple structure that measures nearby floor vibrations and transmits them to a hand-held unit, providing flexible support for experiencing vibration regardless of the user’s position.


Fig. 4. Extension of vibration representation for all types of people.

4. Behind the scenes at the trial

We held a live-performance transmission trial on April 2, 2025, connecting the former site of the Telecommunications Pavilion at Expo’70 Commemorative Park in Suita, Osaka and the NTT Pavilion at Expo 2025. In conjunction with the NTT Pavilion experience theme of “PARALLEL TRAVEL,” these two locations were selected as sites symbolizing the 1970 and 2025 expositions, and data transmission between them was achieved using NTT WEST’s All-Photonics Connect powered by IOWN over a distance of approximately 20 km.

We constructed a stage in Expo’70 Commemorative Park to hold the performance within a 10 × 15-m tent (Fig. 5). Around this stage, we arranged seven sets of measuring equipment to capture the dynamic point-cloud space, 28 cameras to track vibration positions, and four 4K-resolution 3D cameras to capture stereo video. The voices of the three members of Perfume were picked up with pin microphones, and footstep vibrations were measured using three pairs of vibrotactile sensors, with one sensor attached to each boot of each member.


Fig. 5. Performance stage.

The total amount of dynamic point-cloud data, vibrotactile data, stereo-camera data, and voice data transmitted between these two locations over the IOWN All-Photonics Network (APN) came to 25 Gbit/s, and the amount of dynamic point-cloud data transmitted per second came to 45 million points using the data format developed by NTT laboratories. Stereo video was transmitted as 4K/60p video using SMPTE ST 2110, an international standard for Internet Protocol (IP)-based video transmission, and vibrotactile data and voice data were transmitted using the Dante standard, which enables low-latency IP transmission of audio signals.
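
As a back-of-envelope check (the per-point size is an assumption; the actual NTT data format is not described here), 45 million points per second at roughly 15 bytes per point would occupy about 5.4 Gbit/s, a fraction of the 25-Gbit/s aggregate, with the remainder carried by the stereo video, audio, and vibrotactile streams.

```python
# Back-of-envelope estimate (per-point size is an assumption, not the NTT format):
# 45 million points/s at 15 bytes per point (3 x float32 XYZ + 24-bit RGB).
POINTS_PER_SECOND = 45_000_000
BYTES_PER_POINT = 15            # assumed for illustration only

bits_per_second = POINTS_PER_SECOND * BYTES_PER_POINT * 8
print(f"{bits_per_second / 1e9:.1f} Gbit/s")   # ~5.4 Gbit/s
```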

The above data were then received at the NTT Pavilion and reproduced in a powerfully expressive manner by displaying the dynamic point-cloud data and stereo-camera data on the 13.2 × 4.8 m (W × H) 3D Immersive LED System*3 (Fig. 6) provided by Hibino Corporation while merging them with past Perfume 3D data previously obtained by volumetric capture. Camera work, light-particle effects, and other stage directions were added in real time to the transmitted dynamic point-cloud data to achieve a form of expression that makes good use of that point-cloud data. Tactile vibrations were conveyed by controlling 128 floor-embedded vibrators according to the positions of the performers displayed stereoscopically on the 3D LED display. This scheme achieved a live experience that combined the sense of touch with the senses of sight and hearing.


Fig. 6. NTT Pavilion and actual reproduction of 3D data.

Preparations for this trial were particularly intensive from mid-March when the tent was set up in Expo’70 Commemorative Park to the time of the live performance, a period of less than three weeks. During this period, work progressed with staff divided into those on the Expo’70 Commemorative Park side and those on the Yumeshima (NTT Pavilion) side. However, remote production achieved by controlling the equipment necessary for video, audio, and lighting directions over the IOWN APN made it possible to significantly decrease the number of round trips made by those in charge between the two venues, and the work was completed in an efficient manner.

This transmission trial was showcased in front of many members of the media invited to the NTT Pavilion as a one-time-only live performance. The live transmission was also delivered on YouTube via a special website [1, 2]. The live performance, which lasted about seven minutes, was followed by a demonstration of bidirectional communication consisting of a conversation between a host at the NTT Pavilion and the members of Perfume at Expo’70 Commemorative Park. Experiencing this conversation, one could not help but feel the potential of future IOWN-based communication to convey all five senses, not just sight and hearing, with low latency regardless of the distance between two locations.

The live performance of this trial was saved just as it was transmitted on April 2, 2025 so that visitors to Zone 2 of the NTT Pavilion could relive the experience of that day during Expo 2025.

*3 Immersive LED System: A 3D LED display system that enables stereoscopic vision by wearing specialized 3D glasses. This trial used a 3D LED display system that enables viewing with passive-type 3D glasses.

5. Future outlook

The dynamic 3D spatial-data transmission and reproduction technology and vibrotactile sound field presentation technology introduced in this article are promising for application to solutions for stadiums and arenas that aim to deliver a high-presence experience. By enabling measured data to be used in cyberspace, diverse applications will become possible beyond simply experiencing content in the metaverse. We can envision the use of such data in self-driving systems, in the maintenance and management of social infrastructures, and in prediction, analysis, and simulations conducted in cyberspace in a variety of industrial fields.

References

[1] “IOWN × Perfume” performance archived on a special website (in Japanese),
https://group.ntt/jp/expo2025/pavilion/perfume/
[2] “IOWN × Perfume” behind the scenes (in Japanese),
https://www.youtube.com/watch?v=mncwzmH_Gt4
Takahiro Matsumoto
Senior Research Engineer, Cyber-World Laboratory, NTT Human Informatics Laboratories.
He received a B.E., M.E., and Ph.D. in information engineering from Keio University, Tokyo, in 2009, 2011, and 2022. He joined NTT laboratories in 2012, where he has investigated multi-device integration, human–robot interaction, hyperspectral media processing, and 3D point-cloud media processing. He received an Honorable Mention Award at the International Conference on Human Agent Interaction in 2014. He is a member of the Information Processing Society of Japan and serves on the steering committee of its Special Interest Group on Consumer Devices and Systems.
Hiroshi Chigira
Senior Research Engineer, NTT Human Informatics Laboratories.
He received a B.E. and M.E. in engineering from Waseda University, Tokyo, in 2007 and 2009. He joined NTT in 2009 and has been engaged in diverse human-computer interaction research fields, such as vital sensor design, physiological monitoring, point-cloud processing, multi-modal interaction, creative live sports viewing, and artistic live performance production.
Kazuhiko Murasaki
Senior Research Engineer, NTT Human Informatics Laboratories.
He received an M.E. from the University of Tokyo in 2011. In the same year, he joined NTT, then received a Ph.D. from Tokyo University of Science in 2019. He has been engaged in research on computer vision and its application. He is a member of the Institute of Electronics, Information and Communication Engineers (IEICE).
Takahiro Kusabuka
Senior Research Engineer, NTT Human Informatics Laboratories.
He received an M.S. in engineering from Ritsumeikan University, Kyoto, in 2012 and joined NTT the same year. His research interests include mainly user interface and human-computer interaction. He has been working on projects involving multi-modal interfaces, with a focus on how haptic technologies can enhance the sense of presence, for example, in live entertainment experiences. He is a member of the Association for Computing Machinery.
Masakatsu Aoki
Senior Research Engineer, NTT Human Informatics Laboratories.
He received a B.E. and M.E. in engineering from Shinshu University, Nagano, in 1994 and 1996. He joined NTT Human Interface Laboratories in 1996 and engaged in research related to computer graphics. His research interests include human-computer interaction, life logs, and visual communication.
Masafumi Suzuki
Research Engineer, NTT Human Informatics Laboratories.
He received an M.S. in engineering from Ritsumeikan University, Kyoto, in 2011 and joined NTT the same year. He was engaged in research and development on mobile applications and carrier networks. He has begun to focus his research on human-computer interaction.
Kakagu Komazaki
Senior Research Scientist and Designer, NTT Social Informatics Laboratories/NTT Communication Science Laboratories/NTT Human Informatics Laboratories.
He received a B.A. from the Department of Industrial Design at Nihon University College of Art, where he developed a strong foundation in both design and technology. He specializes in designing experiences using vibrotactile technology, focusing on how vibration-based touch can enhance user interaction.
Shunsuke Konagai
Senior Research Engineer, NTT Human Informatics Laboratories.
He received a B.E. in computer science from the University of Electro-Communications, Tokyo, in 1991. Since joining NTT the same year, he has been involved in research and development of multimedia communication systems, web text search engines, and geographic information systems.
Mariko Yamaguchi
Researcher, NTT Human Informatics Laboratories.
She received an M.E. in engineering from Institute of Science Tokyo in 2015 and joined NTT the same year. She has been engaged in research and development on immersion and realism for freely moving between real space and cyberspace.
Taiga Yoshida
Senior Research Engineer, NTT Human Informatics Laboratories.
He received a B.E. and M.E. in informatics from Kyoto University in 2006 and 2009. He joined NTT Cyber Solutions Laboratories in 2009 and has been engaged in research on computer vision.
Ryuichi Tanida
Senior Research Engineer, Supervisor, NTT Human Informatics Laboratories.
He received a B.E. and M.E. in electrical engineering from Kyoto University in 2001 and 2003. Since joining NTT in 2003, he has been engaged in research on video coding and its application technology. He is a member of IEICE.
Takayoshi Mochizuki
Senior Research Engineer, Group Leader, NTT Human Informatics Laboratories.
He received an M.E. in human science from Waseda University, Tokyo in 2001 and joined NTT the same year. He is engaged in research and development aimed at creating a co-creation space (hyper-realistic cyberspace) where people can play an active role through free movement between real space and cyberspace.
Jun Shimamura
Executive Research Engineer, Cyber-World Laboratory, NTT Human Informatics Laboratories.
He received a B.E. in engineering science from Osaka University in 1998 and M.E. and Ph.D. from the Nara Institute of Science and Technology in 2000 and 2006. He joined NTT Cyber Space Laboratories in 2000. His research interests include computer vision and mixed reality. He is a member of IEICE.
