Feature Articles: Creating Immersive UX Services for Beyond 2020
A 120 fps High Frame Rate Real-time Video Encoder
This article describes a real-time HEVC (High Efficiency Video Coding) encoder operating at 120 frames per second (fps) that is designed to achieve higher frame rate video services. The encoder achieves 4K/120 fps video encoding in real time through the synchronized operation of multiple 2K/120 fps encoders working in parallel. This encoder also makes it possible to achieve temporal scalable coding and transmission with upward compatibility for existing 60 fps based systems. This temporal scalability is expected to contribute to rapid expansion of the high frame rate video service field. The proposed encoder systems will also open the door to next-generation high frame rate ultra-high definition television services.
Keywords: high frame rate, encoder, hardware
The latest video coding standard, H.265/HEVC (High Efficiency Video Coding), achieves double the coding efficiency of H.264/AVC (Advanced Video Coding), making it possible to provide higher definition video services economically. In recent years, 4K video broadcasting and distribution have become increasingly widespread, and realistic image representations are becoming increasingly popular. Such representations demand not only high spatial resolution but also many other factors (Fig. 1).
A high frame rate (HFR) improves moving picture quality and is essential for creating more realistic image representations. The next-generation television system specified in Recommendation ITU-R BT.2020, "Parameter values for ultra-high definition television (UHDTV) systems for production and international programme exchange," supports HFR formats up to 120 fps. For HFR video services to spread, encoders also need temporal scalable coding that is upward compatible with current 60 fps based video services. Several real-time HEVC encoders have been developed to enable HEVC encoding of over-4K images. However, they can encode images only up to 60 fps because of the increase in computational complexity and the need for temporal scalable functionality.
This article presents our new 4K/120 fps HEVC encoder, which achieves HFR 120 fps real-time encoding and 120/60 fps temporal scalable functionality by exploiting our 4K/60 fps HEVC codec large-scale integrated circuit (LSI) called NARA, which stands for Next-generation Encoder Architecture for Real-time HEVC Applications.
Because the NARA LSI has a flexible, customizable software architecture, modifying only the custom functions layer of that architecture is enough to implement the functions needed for temporal scalability, namely upward-compatible reference picture structures and dual-stream bit-rate control. An encoder equipped with one NARA LSI has 2K/120 fps encoding capability, and synchronized operation of multiple encoders working in parallel scales this up to larger 4K/120 fps input images. These scalable functionalities of the proposed encoder will contribute to the development of new broadcasting and distribution systems for the upcoming UHDTV services.
2. Encoder system architecture for HFR temporal scalability
The Association of Radio Industries and Businesses (ARIB), an incorporated association promoting the practical application and dissemination of radio systems in Japan, specifies 120/60 fps temporal scalable formats as an HFR scalable coding standard based on HEVC. The ARIB standard specifies that 120 fps HFR bit streams must have temporal scalability and that the encoded picture data must be distributed into a 60 fps base layer stream and an enhancement layer stream, as illustrated in Fig. 2. Dual-stream bit-rate control must be performed for both the base layer and enhancement layer streams to ensure constant bit-rate encoding and distribution for both 60 fps and 120 fps decoders. In addition, base layer images and enhancement layer images should be received and decoded alternately, one at a time, so that the decoding time does not drift for 60 fps decoders. This constraint on the decoding time requires changes to the reference structure of inter-frame prediction.
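The layer split described above can be illustrated with a minimal Python sketch. It is not the ARIB reference structure or the NARA implementation. It assumes that even-numbered 120 fps frames form the base layer and odd-numbered frames the enhancement layer, and that decoding order equals display order. The only rule it models is the general HEVC temporal sub-layer rule that a picture may not reference a picture in a higher temporal layer. The function names are hypothetical.

```python
from typing import List

# Temporal layer ids used in this sketch (assumption: two layers only).
BASE, ENHANCEMENT = 0, 1


def temporal_layer(frame_index: int) -> int:
    """Assumed mapping: even 120 fps frames are base, odd are enhancement."""
    return BASE if frame_index % 2 == 0 else ENHANCEMENT


def allowed_references(frame_index: int, candidates: List[int]) -> List[int]:
    """Keep earlier frames whose layer id is not higher than the current one.

    A base layer frame can therefore only reference base layer frames, so a
    60 fps decoder that drops the enhancement layer can still decode it.
    """
    current = temporal_layer(frame_index)
    return [
        c for c in candidates
        if c < frame_index and temporal_layer(c) <= current
    ]


def base_layer_frames(num_frames: int) -> List[int]:
    """Frame indices that a 60 fps decoder would receive."""
    return [i for i in range(num_frames) if temporal_layer(i) == BASE]
```

With this mapping, `allowed_references(4, [1, 2, 3])` keeps only frame 2, because frames 1 and 3 belong to the enhancement layer and frame 4 belongs to the base layer.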
The temporal scalable encoding function was added to the existing NARA LSI, which has a large motion search capability, by using its flexible, customizable software architecture. This architecture consists of three hierarchical layers, and the top function layer is the software that handles fundamental HEVC functions and user functions. The hierarchy hides the complexity of controlling the hardware through a low-level interface and the difficulty of handling the common basic functions of HEVC, and it exposes a simple programming interface in the form of custom functions in the top layer. Temporal scalability would normally require complicated modifications to the LSI’s encoding method. With this software architecture, however, the dual-stream bit-rate control and the reference structure modification are implemented in a few of the custom functions and can therefore be customized easily with high-level programming.
3. System configuration
The system configuration of the 4K/120 fps encoder we have developed is shown in Fig. 3. It consists of four 2K/120 fps encoders. Each 2K/120 fps encoder includes one NARA LSI and encodes one of the four 2K images obtained by dividing the original 4K input image into quadrants. Using the multi-channel input functionality of the NARA LSI, each 2K/120 fps encoder receives its 2K/120 fps input video as two 2K/60 fps video sequences, which correspond to the base layer and the enhancement layer. The encoder rearranges the two 2K/60 fps sequences into one 2K/120 fps sequence. All four 2K/120 fps encoders operate cooperatively, exchanging synchronization signals so that they share common clocking and time stamp values and together form one 4K/120 fps encoder.
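The data flow in this configuration can be sketched in Python as follows. This is an illustration under stated assumptions, not the encoder's actual input processing: frames are modeled as lists of pixel rows, the 4K picture is assumed to be cut into four equal quadrants, and the base layer is assumed to carry the even-numbered frames. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # a picture as rows of pixel values (toy model)


def split_quadrants(frame: Frame) -> Tuple[Frame, Frame, Frame, Frame]:
    """Cut one picture into top-left, top-right, bottom-left, bottom-right."""
    half_h = len(frame) // 2
    half_w = len(frame[0]) // 2
    top, bottom = frame[:half_h], frame[half_h:]
    return (
        [row[:half_w] for row in top],
        [row[half_w:] for row in top],
        [row[:half_w] for row in bottom],
        [row[half_w:] for row in bottom],
    )


def split_layers(frames: Sequence) -> Tuple[list, list]:
    """Split a 120 fps sequence into two 60 fps sequences (assumed parity)."""
    return list(frames[0::2]), list(frames[1::2])


def interleave(base: Sequence, enhancement: Sequence) -> list:
    """Rearrange two 60 fps sequences back into one 120 fps sequence."""
    merged = []
    for b, e in zip(base, enhancement):
        merged.append(b)
        merged.append(e)
    return merged
```

Four quadrants with two 60 fps sequences each give eight 2K/60 fps inputs in total, which matches the eight 3G-SDI inputs of the overall encoder.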
The output stream is constructed in the MPEG-2 Transport Stream (TS) format so that the base layer stream and the enhancement layer stream have different video packet identifiers. Here, the NARA LSI’s multi-channel video output function is effectively utilized. The four 2K/120 fps transport streams can be transmitted in parallel, or they can be transmitted as one multiplexed stream by using the encoder’s multi-channel multiplex and output functionality with cascaded transport stream input/output connections, as illustrated in Fig. 3.
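Carrying the two layers under different packet identifiers lets a receiver choose a layer by filtering on the identifier alone. The Python sketch below shows this for standard 188-byte MPEG-2 TS packets, in which the 13-bit packet identifier (PID) occupies the low 5 bits of the second header byte and all of the third. The PID values 0x100 and 0x101 and the helper names are hypothetical and are not taken from the encoder's specifications.

```python
from typing import Iterable, List

TS_PACKET_SIZE = 188   # bytes per MPEG-2 TS packet
SYNC_BYTE = 0x47       # first byte of every TS packet

BASE_PID = 0x100         # hypothetical PID for the base layer
ENHANCEMENT_PID = 0x101  # hypothetical PID for the enhancement layer


def packet_pid(packet: bytes) -> int:
    """Extract the 13-bit PID from one TS packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid 188-byte TS packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]


def select_pids(stream: bytes, wanted: Iterable[int]) -> List[bytes]:
    """Return, in order, the packets whose PID is in `wanted`."""
    wanted_set = set(wanted)
    selected = []
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if packet_pid(packet) in wanted_set:
            selected.append(packet)
    return selected


def make_packet(pid: int) -> bytes:
    """Build a dummy payload-only packet for testing the filter."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    return header + bytes(TS_PACKET_SIZE - len(header))
```

In this model, a 60 fps receiver would call `select_pids(stream, [BASE_PID])`, while a 120 fps receiver would pass both PIDs.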
A photograph of the overall 4K/120 fps encoder with four 2K/120 fps encoder devices is shown in Fig. 4, and the encoder specifications are listed in Table 1. This encoder receives 4K/120 fps video through eight 3G-SDIs (third-generation serial digital interfaces) and outputs an HEVC temporal scalable stream in the MPEG-2 TS format through a DVB-ASI (digital video broadcasting - asynchronous serial interface) or through Internet protocol (IP) connections. We confirmed that the 4K/120 fps bit streams encoded by our encoder were successfully decoded and played by other HFR systems.
A photograph taken during a demonstration of 4K/120 fps real-time transmission with our HFR encoder is shown in Fig. 5. Uncompressed 4K/120 fps images were encoded at a constant bit rate of 80 Mbit/s, of which 60 Mbit/s was allocated to the base layer and 20 Mbit/s to the enhancement layer, and the resulting stream was distributed over an IP connection. A separate real-time 4K/120 fps software decoder received the stream, demultiplexed it from the MPEG-2 TS, and decoded it in real time. The decoded 4K/120 fps images were then displayed on a screen by a 4K/120 fps-enabled projector.
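The bit-rate split used in the demonstration can be worked through with a few lines of Python. The sketch assumes that 1 Mbit/s means 10^6 bits per second and that each layer carries 60 pictures per second (the enhancement layer holding the 60 pictures per second that raise 60 fps to 120 fps). It computes long-term averages only and says nothing about how the encoder's rate control distributes bits between individual pictures.

```python
BASE_RATE_BPS = 60_000_000         # base layer, from the demonstration
ENHANCEMENT_RATE_BPS = 20_000_000  # enhancement layer, from the demonstration
PICTURES_PER_SECOND_PER_LAYER = 60  # assumption: 60 pictures/s in each layer


def average_bits_per_picture(bit_rate_bps: float, pictures_per_second: float) -> float:
    """Long-term average number of bits available per coded picture."""
    return bit_rate_bps / pictures_per_second


total_rate_bps = BASE_RATE_BPS + ENHANCEMENT_RATE_BPS
base_share = BASE_RATE_BPS / total_rate_bps

base_bits = average_bits_per_picture(BASE_RATE_BPS, PICTURES_PER_SECOND_PER_LAYER)
enhancement_bits = average_bits_per_picture(
    ENHANCEMENT_RATE_BPS, PICTURES_PER_SECOND_PER_LAYER
)

print(f"total bit rate: {total_rate_bps / 1e6:.0f} Mbit/s")
print(f"base layer share: {base_share:.0%}")
print(f"average bits per base picture: {base_bits:.0f}")
print(f"average bits per enhancement picture: {enhancement_bits:.0f}")
```

Under these assumptions, a base layer picture receives on average three times as many bits as an enhancement layer picture, and a 60 fps receiver that takes only the base layer needs 60 Mbit/s rather than the full 80 Mbit/s.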
No 120 fps video services have been available to date because the enormous video data size and the requirement for compatibility with legacy systems have made 120 fps video distribution difficult. Our encoder addresses this problem and, combined with existing 4K/120 fps-enabled devices such as a camera, decoder, and projector, provides advanced high-quality HFR video.
We presented a new 120 fps real-time HEVC encoder for higher frame rate video encoding and transmission exploiting the existing HEVC encoder LSI. This 120 fps encoder makes it possible to achieve temporal scalable HEVC coding and transmission with upward compatibility for 60 fps. We plan to continue developing video coding devices to contribute to the provision of new services.