Feature Articles: Keynote Speeches at NTT R&D Forum 2018
Creating a Prosperous Future through the Fruits of Research and Development
This article introduces NTT’s latest research and development activities based on a lecture presented by Hiromichi Shinohara, NTT Senior Executive Vice President and Head of the Research and Development Planning Department, at NTT R&D Forum 2018, which took place in February 2018.
Keywords: artificial intelligence, IoT, security
1. Introduction: roles of NTT research and development (R&D)
The roles of NTT R&D are to create new technologies and to work with NTT Group operating companies to address pressing issues such as the explosive growth in the traffic handled by telecommunications carriers and the increasing sophistication of cyberattacks. In addition, we aim to enhance productivity and address security and disaster prevention issues so that we can help strengthen industrial competitiveness and solve social issues. Because information and communication technology is employed across a wide range of fields, we are tackling these issues through collaboration with NTT Group companies and in partnership with different industries.
We believe that if our technologies are to be deployed in various fields, they should be made to feel as natural as possible to users (Fig. 1). To achieve this, it is necessary to take into account several perspectives: “enhance” to understand people’s thoughts better and convey their intention correctly; “unconscious” to enable people to benefit from advanced technologies without making a conscious effort; and “barrier-free” to enable individuals to use technologies personally in ways that are adapted to their particular needs or preferences.
From a business standpoint, we believe that these technologies should evolve to enable enterprises to develop strong bonds with their customers. To achieve this, it is necessary to acknowledge other perspectives: “awareness” to rapidly understand changes in customer behavior or surrounding environments; “data-centric” to use various kinds of data processing to innovate business processes, create value, and support decision-making; and “servitization” to provide events (experiences) rather than things.
We believe that the key technologies to achieve these targets are artificial intelligence (AI), media, the Internet of Things (IoT), security, and the network. Our latest activities in these areas are introduced below.
2. AI technologies: corevo®
The NTT Group’s AI technologies are provided under the brand name “corevo,” which expresses our wish to bring about new, revolutionary development in collaboration with a variety of players (co-revolution) by integrating different types of AI technology. We are focusing on four categories of AI (Fig. 2). The first is Agent-AI, which supports people. The second is Ambient-AI, which creates value from the things around us. The next is Heart-Touching-AI, which sees things from a human perspective by taking the subconscious or unconscious into consideration. The last is Network-AI, which embraces two concepts: connecting different types of AI in order to create new value and applying AI to networks in order to enhance their operability.
Auditory and dialog technologies employed in Agent-AI are presented here.
(1) Auditory technologies
To make it possible for a robot agent to correctly understand what a speaker says, we need noise suppression technology, as this enables the agent to hear a human voice clearly even in noisy environments. We have so far developed technology that makes it possible to hear a human voice even with background noise exceeding 100 dB. Speaker-separated voice pick-up technology enables the user to pick up the voice of a particular speaker from among many people speaking simultaneously. Currently, this technology can identify, using a single microphone, the voice of an individual person from among up to six people talking at the same time. We are also working on speaker tracking technology, which can pick up the voice of a moving speaker, and remote sound collection technology, which can pick up sound at a distance.
To understand speech that has been picked up as a spoken language, it is necessary to identify the language the person is speaking. We are developing spoken language identification technology, which can determine the spoken language with an accuracy rate of 90% just one second after vocalization was initiated and at a rate of 99% five seconds after the start of vocalization. After the language has been determined, the speech is recognized and the result is put into a database. NTT’s spoken language recognition technology covers ten languages, including English and some Southeast Asian languages. In addition, we are improving the technology to support various Japanese dialects so that people can use this technology naturally (Fig. 3).
After a spoken language has been identified, it is necessary to understand what the speaker is saying. Two speech comprehension technologies are effective for this. First, people express the same thing in a variety of ways. We are developing speech intention comprehension technology that determines whether different expressions have the same meaning by evaluating speech data identified by spoken language recognition as high-level semantic vectors and calculating distances between these vectors. For example, “The golf ball does not fly very far” and “How can I extend the flying distance” are completely different character strings, but they essentially mean the same thing in the context of golf. This technology recognizes these phrases as having the same meaning.
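The idea of comparing utterances as vectors can be illustrated with a minimal sketch. It assumes that each utterance has already been converted into a semantic vector by some model; the four-dimensional vectors, the use of cosine similarity as the distance measure, and the threshold of 0.8 are all hypothetical choices made for illustration and are not taken from NTT's technology.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_intent(a, b, threshold=0.8):
    """Treat two utterances as having the same meaning when their vectors are close.
    The threshold is an assumed value, not one given in the article."""
    return cosine_similarity(a, b) >= threshold

# Hypothetical semantic vectors for three utterances (values invented for illustration).
ball_does_not_fly = [0.9, 0.1, 0.8, 0.0]   # "The golf ball does not fly very far"
extend_distance = [0.8, 0.2, 0.9, 0.1]     # "How can I extend the flying distance"
book_tee_time = [0.0, 0.9, 0.1, 0.8]       # an unrelated request

print(same_intent(ball_does_not_fly, extend_distance))  # True: vectors point the same way
print(same_intent(ball_does_not_fly, book_tee_time))    # False: vectors are nearly orthogonal
```

The two golf questions share no words, yet their vectors are close, which is the property the technology relies on.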
The second speech comprehension technology is one that understands what emotion is being expressed. It recognizes whether the speaker is angry, happy, or satisfied. There are two types of anger: hot anger and cold anger. A person with hot anger shouts, whereas a person with cold anger is very calm and sarcastic. This technology can understand the emotions of hot anger, cold anger, satisfaction, joy, and sorrow.
(2) Dialog technology
When you talk with others, you will notice that some people are good at taking, or recognizing, speech cues and some are not. Most robot agents today cannot recognize such cues. When a person stops talking, the agent cannot control the timing of its speech unless it can determine whether the person has only paused, or if his/her speech has come to an end. Dialog control technology achieves natural dialog between a person and an agent by enabling the agent to do just this and also to detect that the person is interrupting the agent’s speech so that the agent can respond appropriately.
Users may want the agent to speak by mimicking their own voices. Cross-lingual speech synthesis technology can synthesize any words with the user’s voice quality from an approximately 30-minute-long recorded sample of his/her Japanese speech. Moreover, by incorporating the user’s speech sample into other languages, this technology can synthesize speech in English or other languages so that it resembles the user’s own voice.
Dialogs can be classified into two categories: task dialog intended to achieve a specific purpose, such as questions and answers, and desultory conversation, which tends to be random and spontaneous rather than focused on a specific topic. It is said that about 60% of daily exchanges are desultory. It is important that an agent is able to conduct both a task dialog and a desultory conversation. We are developing desultory conversation technology that enables an agent to continue chatting away by generating wide-ranging topics based on a large volume of text data and rules developed through research on desultory chats.
However, when an agent makes small talk with someone, the topic can drift in a direction the user is not happy with. To solve this problem, we are studying how to control the scope of topics. A way to regain a natural flow when a conversation goes astray is to have two agents speak with someone and have one of them put in a word to help out.
(3) Application example of Agent-AI
One application of auditory technologies is an intelligent microphone system for cars (Fig. 4). It enables natural conversations inside cars by combining intelligent microphone technology and low-delay howling suppression technology. In a small car, a person in the front seat can hear what a person in the back seat is saying. However, if there are three or four rows of seats as in a van, the driver cannot hear clearly when a person in the farthest seat speaks to him/her because of road noise, music, or other sounds. This technology enables people in any seat to conduct a smooth conversation. In addition, a person in any seat can use speech to control various in-vehicle devices such as the air conditioner and interior lights. When an echo canceler is incorporated, the driver can use a small microphone speaker to make a hands-free call that satisfies the audio quality standards for eCall (emergency call), which is a mandatory emergency reporting system in Europe.
One application of Ambient-AI is prediction of the flows and distribution of people. We are working on predicting the distribution of people 30 minutes, one hour, or two hours from the present time, for example, by combining demographic data derived from NTT DOCOMO’s data on radio access acquired by smartphones and mobile phones with weather data and event data (Fig. 5). NTT DOCOMO has been using this technology in a service it offers called AI Taxi since February of 2018. This service is already in use by two taxi companies in Tokyo and Nagoya. In a field trial conducted in 2016 and 2017, it was confirmed that the service boosted drivers’ income while at the same time reducing waiting times for taxi customers. This service predicts where people will be distributed 30 minutes from the present time. We are using spatio-temporal variables online-prediction technology to extend the prediction to one hour or two hours and conducting a field trial to predict the number of people in various areas in the near future.
We are also working on human flow guidance, since controlling the flow of people is important for reducing congestion and accidents. For example, heavy congestion will occur if people head for a railway station en masse after the end of a sports event or concert. We are studying learning-based guidance technology that efficiently enables people to avoid congestion and select alternative routes.
A completely different type of Ambient-AI is pAUC (partial area under the ROC [receiver operating characteristic] curve) maximization learning. This technology learns, with a high degree of accuracy, to pick out the correct data from a data set in which only an extremely small fraction is correct. We are analyzing space photos in collaboration with the Japan Science and Technology Agency, the University of Tokyo, University of Tsukuba, and the Institute of Statistical Mathematics. We have used this technology to achieve automatic detection of Type Ia supernovae from a multitude of space photos. Although only about one in 1000 novae in the learning data is a supernova, this technology has cut the observation time required to zero in on supernovae by a factor of several hundred compared with a conventional method.
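To make the quantity being maximized more concrete, the following sketch computes a normalized partial AUC restricted to low false positive rates. It shows only the evaluation measure, not the learning method, which the article does not describe. The function name, the parameter beta (the upper limit of the false positive rate), and the scores are assumptions made for illustration.

```python
def partial_auc(scores_pos, scores_neg, beta):
    """Normalized partial AUC over the false positive rate range [0, beta].

    Only the highest-scoring negatives (a fraction beta of them) are compared
    against the positives; the result is the share of correctly ordered pairs.
    """
    k = max(1, int(round(beta * len(scores_neg))))
    hardest_negatives = sorted(scores_neg, reverse=True)[:k]
    correct = 0.0
    for p in scores_pos:
        for n in hardest_negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5  # ties count as half
    return correct / (len(scores_pos) * k)

# Invented classifier scores: few positives, many negatives.
pos = [0.9, 0.6]
neg = [0.8, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.0]

print(partial_auc(pos, neg, beta=1.0))  # 0.95 (ordinary AUC over all negatives)
print(partial_auc(pos, neg, beta=0.2))  # 0.75 (only the two hardest negatives count)
```

The ordinary AUC looks high because most negatives are easy, whereas the partial AUC focuses on the few negatives most likely to be mistaken for positives. That is the region that matters when positives are extremely rare.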
Application of AI technology to networks makes it possible to analyze log data of network devices for prediction or early detection of latent faults or changes in demand and to control the devices to eliminate these problems or recover from them quickly. Network anomaly detection technology learns log data of network devices. However, very little learning of faults or anomalous states can be achieved because log data rarely contain data related to these problems. This technology uses deep learning to solve this problem. It learns normal states in log data and detects anomalies such as silent faults by identifying deviations from the normal states. This makes it possible to improve network operability.
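The principle of learning only normal states and flagging deviations can be sketched as follows. Note that this sketch replaces the deep learning model mentioned above with a simple mean and standard deviation of a single log-derived value, purely to keep the example short. The class name, the feature (log messages per minute), and the threshold of three standard deviations are all assumptions.

```python
import statistics

class NormalStateModel:
    """Learns what 'normal' looks like from fault-free data only."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # assumed cut-off, in standard deviations
        self.mean = None
        self.stdev = None

    def fit(self, normal_values):
        # Training uses normal data only; no examples of faults are required.
        self.mean = statistics.mean(normal_values)
        self.stdev = statistics.stdev(normal_values)

    def score(self, value):
        """Distance from the learned normal state, in standard deviations."""
        return abs(value - self.mean) / self.stdev

    def is_anomalous(self, value):
        return self.score(value) > self.threshold

# Hypothetical feature: number of log messages a device emits per minute.
normal_log_rates = [100, 102, 98, 101, 99, 100, 103, 97]

model = NormalStateModel()
model.fit(normal_log_rates)

print(model.is_anomalous(101))  # False: within the normal range
print(model.is_anomalous(20))   # True: the device has gone unusually quiet
```

Because the model never needs labeled fault data, it can flag a device that simply goes quiet, which is how a silent fault might show up in logs.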
The telecommunications infrastructure includes telephone poles and cables. We are developing 2D-3D (two dimension-to-three dimension) matching technology that uses a vehicle-mounted laser to automatically measure these types of infrastructure devices and then analyzes the measured data to detect tilting poles or hanging cables, thereby making it possible to downsize the maintenance workforce.
We are carrying out a sports brain science project that is aimed at studying how the brains of top athletes function differently from those of amateurs. It was found that top athletes excel at the unconscious or subconscious level rather than at the conscious level. We believe that further research into the role the unconscious or subconscious plays in human behavior will help develop AI that is agreeable to humans.
3. Highly realistic sensation technologies
We are taking three approaches to developing highly realistic sensation technologies. The first is to create and transmit a space, as represented by our immersive telepresence technology called “Kirari!”. The second is to get into a certain space. An example is enabling the user to experience a ball thrown by a professional baseball player from the perspective of a batter. The third is to create a space using an illusion. These technologies will provide a higher level of value and a new sense of excitement in sports and the performing arts, particularly stage performances. From a technical standpoint it is of course important to increase the number of pixels, but we believe that new possibilities will open up if we add a new element other than performance and precision.
3.1 Creating and transmitting a space
The immersive telepresence technology called “Kirari!” has been exhibited at the NTT R&D Forum since 2015, when images of people were extracted from a recorded video and displayed in quasi-3D. In 2016, for the first time, images of people were extracted from a video in real time during my keynote address. In 2017, we extracted images of people from a wider area in a higher-definition video and explained that the future direction of Kirari! is to develop technology for projecting a 3D image in such a way that it can be viewed from an arena-shaped spectator area. In other words, the image will be viewed not just from one direction but from four different directions. In 2018, we implemented a real arena-type Kirari! system. We have thus developed technology that enables an image such as that of a karate contest or an ice skater to be viewed from four different directions (in a four-sided arena) (Fig. 6).
The advanced MMT (MPEG Media Transport) technology used in Kirari! can synchronize images sent from different sites. In autumn of 2017, we proved its feasibility by carrying out a live broadcast in which dance footage of the Japanese pop group called Perfume was shown. In that event, one group member was in Tokyo, one was in London, and one was in New York, and the three images of them were synchronized. The three cities are more than 10,000 km apart, so the image signals experienced delays and fluctuations. Although this would normally cause time misalignment among the images, this technology perfectly synchronized the broadcast images.
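One general way to align streams that arrive with different delays is to stamp each frame with its capture time on a shared clock and to present every frame after a common playout delay. The sketch below illustrates only that general idea. It is not the MMT specification or the Kirari! implementation, and the function names, delay figures, and margin are invented for illustration.

```python
def choose_playout_delay(observed_delays_ms, margin_ms=50):
    """Pick one shared delay that covers the slowest observed path plus a margin."""
    worst = max(max(delays) for delays in observed_delays_ms.values())
    return worst + margin_ms

def presentation_time(capture_ts_ms, playout_delay_ms):
    """All sites present a frame at capture time plus the same shared delay."""
    return capture_ts_ms + playout_delay_ms

# Hypothetical one-way network delays (ms) measured from each site to the venue.
observed = {
    "Tokyo": [5, 8, 6],
    "London": [120, 140, 131],
    "New York": [90, 95, 110],
}
delay = choose_playout_delay(observed)  # 140 + 50 = 190

# One frame per site, all captured at t = 1000 ms on the shared clock,
# but arriving at different times because of the different paths.
capture_ts = 1000
arrivals = {"Tokyo": 1006, "London": 1131, "New York": 1095}

show_at = presentation_time(capture_ts, delay)  # 1190 for every site
on_time = all(arrival <= show_at for arrival in arrivals.values())
print(delay, show_at, on_time)  # 190 1190 True
```

The trade-off in such a scheme is that a larger shared delay tolerates more network fluctuation but makes the broadcast less immediate.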
3.2 Creating illusory perceptions of space
We have technology capable of creating illusory perceptions of space. Existing 3D movie images are blurred unless the viewer in a movie theater wears a dedicated pair of 3D glasses. Suppose two people want to go to a movie together. If one wants to see a film in 3D and the other in 2D, they have to go to different cinemas. Hidden Stereo is a technology that uses illusion to solve this problem. It adds light patterns to a 2D image so that the 2D video can be viewed clearly without glasses, but a 3D video appears when the viewer wears 3D glasses (Fig. 7).
4. Activities involving IoT
The IoT is rapidly gaining momentum. Since IoT requirements vary depending on the application area, a single platform cannot support all types of IoT systems. In contrast, if IoT systems were to be developed individually without using a platform, it would be difficult to reuse technology. Therefore, we believe that a solution is to ensure that all IoT systems have a common architecture. NTT has defined a basic IoT architecture (Fig. 8) and is developing IoT systems that are built on this architecture and that will be used in several industries.
4.1 Optimization of manufacturing in factories
We intend to help factories continue to improve productivity by implementing advanced technology and operations. A way to achieve this is to link operations of various machine tools in a factory in real time and to have the machine tools download and use the latest applications just as smartphones do. To ensure real-time operation, it is important to use edge computing, in which necessary functions are allocated to edges near machine tools rather than to a cloud. We have combined this edge computing technology with IoT data exchange technology, which enables information about different types of machine tools to be shared as common data, and software component delivery technology, which makes it possible to select applications that are appropriate depending on the particular purpose. We began providing a FIELD system service in October 2017 in collaboration with FANUC CORPORATION.
4.2 Acceleration of development of next-generation IoT for ships
Ship IoT is designed to ensure the efficient and safe operation of ships by processing data collected by various sensors installed in a vessel. Ship IoT differs from general IoT in that, being at sea, vessels can use only a satellite link, which has a very low bit rate, to connect to an operation center on land. In collaboration with NYK Line and MTI Co., Ltd., we are conducting a field trial in which data from sensors in a vessel are pre-processed onboard, and the results are sent to an operation center on land. The center performs various operations for the vessel or delivers the latest software to the vessel. Since all sensors are concentrated onboard and only a satellite link is available, it is important to use processing technology that exploits both edge computing and the cloud in a hybrid configuration.
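The benefit of pre-processing onboard before using the satellite link can be illustrated with a small sketch. The article does not say what kind of pre-processing is used. The example assumes a simple statistical summary of each time window, and the sensor, sampling rate, and JSON encoding are hypothetical.

```python
import json

def summarize(samples):
    """Reduce a window of raw sensor readings to a few statistics at the edge."""
    return {
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": sum(samples) / len(samples),
    }

# Hypothetical engine temperature readings: one per second for ten minutes.
samples = [80.0 + (i % 10) * 0.1 for i in range(600)]

raw_payload = json.dumps(samples)                 # what would be sent without edge processing
summary_payload = json.dumps(summarize(samples))  # what is sent after onboard pre-processing

print(len(raw_payload), "bytes raw")
print(len(summary_payload), "bytes summarized")
```

In a hybrid configuration of this kind, the raw data could stay onboard for local analysis while only compact results cross the low bit-rate link to the operation center.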
4.3 Health management with “hitoe”
We aim to help recovering patients select the rehabilitation activities best suited to them by providing upper body garments that incorporate “hitoe,” a fabric that can collect biosignals from the body. The ability to select the optimal rehabilitation activities will help to reduce the duration of hospitalization (Fig. 9). Currently, only heart rate information is available, but we are collaborating with Fujita Health University and Toray Industries, Inc. to measure myoelectric signals so that information about muscles can also be used to select the most appropriate rehabilitation activities. Technologies that are important for achieving this are a high-speed distributed processing platform for processing biological data that are constantly being generated in real time and IoT gate technology for collecting data from many people.
4.4 Activities involving connected cars
In the field of connected cars, we are working in conjunction with Toyota Motor Corporation to step up efforts to define the requirements for these cars. It is becoming apparent that this is a very difficult proposition partly because there are numerous cars on the road.
In addition to addressing the purely technical aspects, it is necessary to address issues related to standardization. For example, connected cars that satisfy the requirements defined by NTT and Toyota may work with cars from Toyota but may not work with cars from other automakers. Since it is paramount to avoid accidents, it is necessary for the requirements to be internationally standardized and adopted by all related industries. There are areas where all companies must cooperate and other areas where companies must compete with each other. Together with other partners, we have launched the Automotive Edge Computing Consortium to promote joint technical development in the cooperative area and press ahead with standardization. We are hoping that many companies worldwide will participate in this endeavor and contribute to the development of next-generation cars.
5. Activities involving security
While it is important for us to study cybersecurity and cryptographic theory to protect communication, it is also necessary to address the security of control systems in order to protect companies and institutions as well as society as a whole. NTT is working on IoT security and information technology (IT) security to make sure that information can be fully protected.
5.1 Control system security
Generally, it can be said that with information systems, priority is given to protecting confidentiality, that is, preventing information from being divulged. In contrast, with control systems, priority is given to maintaining availability and uninterrupted operation. NTT and Mitsubishi Heavy Industries, Ltd. have jointly developed “InteRSePT®,” a security technology for automatically detecting cyberattacks on control systems and defending the systems against them. The beauty of this technology is that when it detects a cyberattack, it does not simply shut down communication. Instead, it selectively shuts down some parts and allows other parts to continue to operate in accordance with the specific conditions of the system, so that the control system as a whole can continue to operate. It also supports protocols used by control systems. The message exchange frequencies and cycles of these protocols differ from those of ordinary communication.
We are also working on security technology for connected cars (Fig. 10). Since many cars are connected through a network, it is important to detect any cyberattack on a car and deal with it quickly. We have now reached a level at which an in-vehicle detection engine can detect anomalous behavior caused by an attack and shut down the control signal of the attacked car so that the car will not cause an accident. The next level we aim at is for the engine to achieve high-precision detection and response by working with clouds.
5.2 IoT security
We are working on the FSU (Fujioka-Suzuki-Ustaoglu) protocol, a technology for achieving mutual authentication between a cloud and IoT devices using ID (identifier)-based encryption, which is a type of public key encryption. FSU has been adopted in an international standard (ISO/IEC 11770-3). Its main advantages are that there is no risk of authentication information being leaked because the server does not retain the authentication information of devices, and that authentication is extremely easy because devices do not have to have a password or certificate.
We are developing IoT device anomaly detection technology. It transfers device information and traffic data observed by the IoT gateway to the analysis server, which learns traffic data using machine learning and also monitors traffic. It can detect an anomaly with a high level of accuracy even when the anomaly was caused by an unknown attack.
5.3 IT security
Current methods of detecting malware to defend endpoints, which are typically personal computers, are mainly based on pattern matching of data in files. However, malware that cannot be detected by pattern matching is on the increase because many variations of malware are constantly being created. Taint analysis and symbolic execution can detect conventionally undetectable malware because they are able to find malware based on its behavioral characteristics.
We are studying blockchain technology that can be used for managing secret data (Fig. 11). Blockchains are beginning to be used in a variety of fields and are expected to be employed for business purposes. Some business activities involve handling data that should not be disclosed to third parties. A major feature of blockchains is that authority is distributed, and our technology can control who is and is not permitted to access data even under this distributed authority.
6. Activities involving networks
The environment surrounding networks has been changing dramatically. Both traffic and power consumption continue to rise. There is a growing need for users to be able to access networks seamlessly by selectively using various means of access, including not only fixed and mobile lines but also Wi-Fi and LoRa (long range). Telecommunications facilities are increasingly implemented using software-defined networking and network functions virtualization, and the amount invested in facilities by OTT (over-the-top) operators such as Amazon, Google, and Facebook now exceeds the amount invested by telecommunications carriers in North America. In addition, implementation of telecommunications facilities in software has given rise to a trend toward using open source software (OSS). As a telecommunications carrier, NTT is developing new networks by incorporating these trends. We would be left behind if we remained stuck following the conventional approach.
6.1 Virtualization and generalization of high-end routers
High-end routers have conventionally incorporated all the functions of transfer, control, and service. We are now working on MSF (multi-service fabric), which separates these functions and uses OSS. The transfer function will be implemented in a general-purpose switch or whitebox switch, the control function in a network controller, and the service function in a cloud. We have developed a prototype and confirmed that it is possible to increase switching capacity and reduce installation space and power consumption in comparison to conventional routers (Fig. 12). To carry out this type of activity globally, we are vigorously engaged in a variety of open source projects and open communities such as the Open Networking Foundation and Telecom Infra Project.
7. Activities in basic research
The time it takes to transition from basic research to commercialization has been shrinking dramatically in recent years. For example, rapid progress in AI has already led to basic research in the acoustic, speech, and linguistic fields progressing to the development of commercial products. We believe that we must further strengthen our basic research in order to ensure sustained development of the NTT Group and NTT R&D.
A novel computer has recently emerged from our basic research. One well-known combinatorial optimization problem is the traveling salesman problem, which is the problem of minimizing the total travel cost when a salesman is required to visit each city once and return to the starting point. This problem can easily be solved if the number of cities is small. However, as the number of cities increases, the number of possible routes grows explosively and an enormous amount of computation is required. Conventional computers have not been able to solve large instances of this type of combinatorial optimization problem efficiently.
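The growth in computation can be seen in a minimal brute-force sketch that tries every possible route. The distance table and function names are invented for illustration, and exhaustive search is shown only to make the scaling problem concrete.

```python
from itertools import permutations
import math

def tour_cost(tour, dist):
    """Total cost of visiting the cities in order and returning to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def shortest_tour(dist):
    """Try every route that starts at city 0 and keep the cheapest one."""
    n = len(dist)
    candidates = ((0,) + p for p in permutations(range(1, n)))
    best = min(candidates, key=lambda t: tour_cost(t, dist))
    return best, tour_cost(best, dist)

# Hypothetical symmetric travel costs between four cities.
dist = [
    [0, 1, 4, 2],
    [1, 0, 1, 5],
    [4, 1, 0, 1],
    [2, 5, 1, 0],
]

best, cost = shortest_tour(dist)
print(best, cost)              # (0, 1, 2, 3) 5
print(math.factorial(4 - 1))   # 6 routes to check for 4 cities
print(math.factorial(20 - 1))  # number of routes to check for 20 cities
```

With four cities there are only six routes to compare, but the count grows factorially, so exhaustive search quickly becomes impractical as cities are added.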
We have developed a new computer called LASOLV based on the Ising model, which is a computer model that uses a physical phenomenon to solve this problem. The model represents this problem as positions of magnets and decides that the state in which the magnets are the most stable is the answer. LASOLV uses light pulses instead of magnets (Fig. 13). The name was coined by combining the words laser and solve. The key device is a phase sensitive amplifier, which the NTT laboratories have been studying for many years. Other computers based on the Ising model operate at an extremely low temperature of about –200°C and thus require large air conditioning facilities. In contrast, LASOLV can operate at room temperature. It has been generally accessible as QNNcloud since November 2017.
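The notion of "the most stable state of the magnets" can be made concrete with a small sketch of the textbook Ising model, in which each magnet (spin) takes the value +1 or -1 and the energy of a configuration is the negative sum of coupling strength times the product of the two spins over all coupled pairs. The sketch finds the lowest-energy state by checking every configuration on a conventional computer. It says nothing about how LASOLV itself operates, and the four-spin ring and coupling values are invented for illustration.

```python
from itertools import product

def ising_energy(spins, couplings):
    """Energy = -(sum over coupled pairs of J_ab * s_a * s_b); lower is more stable."""
    return -sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())

def ground_state(n, couplings):
    """Exhaustively search all 2**n spin configurations for the lowest energy."""
    return min(product((-1, 1), repeat=n), key=lambda s: ising_energy(s, couplings))

# Hypothetical problem: four spins in a ring, each pair of neighbors
# coupled so that they prefer to point in opposite directions (J = -1).
couplings = {(0, 1): -1, (1, 2): -1, (2, 3): -1, (3, 0): -1}

best = ground_state(4, couplings)
print(best, ising_energy(best, couplings))  # (-1, 1, -1, 1) -4
```

Here the most stable state is the alternating pattern, which corresponds to splitting the ring into two groups so that every connection crosses between them. Exhaustive search doubles in cost with each added spin, which is why a machine that settles into a low-energy state physically is attractive for large problems.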
The current performance level of LASOLV is such that it can solve a problem of 2000 nodes with 2 million node connections. This is already unprecedented, but we are aiming to raise its performance further so that it can solve a problem of 100,000 nodes with 10 billion node connections by 2019 or 2020.
LASOLV can be useful, for example, for developing new drugs or solving traffic congestion problems. However, because users currently need to have special knowledge in mathematics and physics, it is not easy for drug developers to use LASOLV. Thus, we believe that it is important to build a computing environment in which LASOLV can be easily accessed by those without specialized programming knowledge.
7.2 Biodegradable battery
As IoT advances, we will be surrounded by various sensors. Each sensor contains a battery and a circuit. Ideally, all sensors should be recovered after use. However, some sensors may not be recovered. To prevent unrecovered sensors from having adverse impacts on the natural environment, we have developed a battery that decomposes into soil (Fig. 14). The battery is made only of fertilizer and biological materials. Currently, the battery can light up a lamp and operate a buzzer. However, the battery capacity is about one-tenth that of commercial batteries and thus can last for only a short time. Further technical development will be needed to extend battery life.
I believe that we need to have two skills if we are to create new value in collaboration with partners in various fields. One is the ability to master forward-looking technologies. The other is the ability to combine the strengths of all collaborators. We will work to refine these two skills and endeavor to contribute to the development of society.
All brand names, product names, and company names that appear in this article are trademarks or registered trademarks of their respective owners.