Hardware Acceleration Technique for Radio Resource Scheduling in 5G Mobile Systems
This article presents a hardware acceleration technique for the scheduling process in ultra-high-density distributed antenna systems for fifth-generation (5G) mobile communications systems. In 5G systems, the scheduling process has to calculate the overall system throughput for a huge number of combinations of antennas and user equipment (UE). To speed up the calculation, this acceleration technique calculates the throughputs of all UEs simultaneously. Experimental results show that the acceleration technique calculates the system throughput approximately 60 times faster than processing without acceleration. As a result, the acceleration technique improved the system throughput by about 73% for a system with 32 antennas and 256 UEs. The hardware acceleration technique therefore enables a future practical 5G system.
Keywords: 5G mobile communications systems, resource scheduling, hardware acceleration
1. Introduction
In mobile communications systems, resource scheduling assigns user equipment (UE) to each antenna for downlink transmission. Scheduling that decides the optimal combination of antennas and UEs is needed for efficient communications and improved overall system throughput. In systems preceding fifth-generation (5G) mobile communications systems, this assignment has been executed by software-based processing because the number of antennas is small, as shown in Fig. 1(a), and the number of possible combinations of antennas and UEs is correspondingly small.
For 5G systems, researchers have been studying flexible antenna deployment such as localized massive multiple-input multiple-output (MIMO) and distributed massive MIMO. In this article, we focus on distributed massive MIMO (i.e., distributed antenna systems). In distributed antenna systems, as shown in Fig. 1(b), a huge number of antennas are deployed at ultra-high density in order to increase the overall system throughput [3, 4]. The number of possible combinations reaches approximately 10^76 in a system with 32 antennas and 256 UEs, which is based on a 5G-system model in the Mobile and wireless communications Enablers for the Twenty-twenty Information Society (METIS) project.
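The article does not state how the figure of about 10^76 is counted. The short check below is a sketch under one assumed counting model: each of the 32 antennas serves a different UE, so the count is the number of ordered selections of 32 UEs out of 256. The variable names are illustrative only.

```python
import math

ANTENNAS = 32
UES = 256

# Assumption (not stated in the article): every antenna serves a distinct UE,
# so a combination is an ordered choice of 32 UEs out of 256.
distinct = math.perm(UES, ANTENNAS)

# For comparison: independent choice of any UE per antenna (an upper bound
# under this model, since it also counts assignments that reuse a UE).
independent = UES ** ANTENNAS

print(f"distinct-UE assignments: about 10^{math.log10(distinct):.1f}")
print(f"independent choices:     about 10^{math.log10(independent):.1f}")
```

Under the distinct-UE assumption the count falls between 10^76 and 10^77, which matches the order of magnitude quoted above. Either way, exhaustive evaluation is out of reach.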
To obtain an appropriate combination despite this explosive increase in the number of possible combinations, the scheduler performs an approximate search. In general, an approximate search comes closer to the appropriate combination as the number of searched combinations increases. However, it is difficult to increase the number of searched combinations with software-based processing within the limited scheduling period of 1 ms because the number of CPU (central processing unit) cores is limited.
To overcome this issue, we devised a hardware acceleration technique that enables the scheduler to accelerate the scheduling process in ultra-high-density distributed antenna systems, and a search process for quickly obtaining the right combination. The details of the search process and the hardware acceleration technique are respectively described in sections 2 and 3.
2. Search process
The search process decides the combination approximately by iterating a step that improves the combination so that the system throughput increases. In general scheduling, the UEs with the highest throughput are chosen for all antennas simultaneously when a combination is searched for. In our search process, by contrast, the UE that gives the highest system throughput is chosen for only one antenna at a time. UEs are then chosen for the other antennas in the same way, one antenna after another. In this way, UEs are assigned to antennas so that each update increases the system throughput. Iterating this assignment brings the combination closer to a better one and improves the system throughput.
The procedure for deciding the combination in the search technique is shown in Fig. 2. First, all antennas are set to blank, which means the radio transmission is stopped. Then, one of the antennas is selected. In the case shown in Fig. 2(a), antenna A is selected. In order to select the UE to which antenna A should transmit data, the system throughputs under this condition are calculated. In this case, the system throughputs of (UE#1, Blank, Blank), (UE#2, Blank, Blank), and (UE#3, Blank, Blank) are calculated, and these three system throughputs are compared. In this example, (UE#1, Blank, Blank) has the highest system throughput. Consequently, UE#1 is provisionally selected for antenna A, and the combination is updated.
Next, antenna B is selected. In the case shown in Fig. 2(b), in order to select the UE for antenna B, the system throughputs are calculated taking the interference power from antennas A and B into account. The throughput of UE#1 may change because the interference power from antenna B changes. When UE#4 is chosen, the system throughput is calculated by summing the throughputs of UE#1 and UE#4. In the same way, when UE#5 is chosen, the system throughput is calculated by summing the throughputs of UE#1 and UE#5. These two system throughputs are compared, and the higher one is obtained when antenna B transmits data to UE#4 in this example. Thus, UE#4 is provisionally selected for antenna B, and the combination is updated. This technique enables the scheduler to take inter-cell interference power into account when it decides the combination for the antennas.
In this technique, the antennas are selected one by one, and the UE selection is carried out for each antenna in turn. In the case shown in Fig. 2(c), antenna A is selected again, and the throughputs of UE#1, UE#2, and UE#3 are calculated again because the interference powers from antenna B and antenna C have changed.
In this way, the combination is always updated so that the system throughput increases. This increases the system throughput monotonically as the number of iterations increases.
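The procedure described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the gain matrix, the noise power, and the mapping from SINR to throughput (log2(1 + SINR)) are assumptions made for the example, and the names `system_throughput` and `greedy_search` are hypothetical. The candidate UE sets are assumed to be disjoint per antenna, as in the example of Fig. 2.

```python
import math

BLANK = None  # the antenna transmits nothing


def system_throughput(assign, gain, noise=0.1):
    """Sum of per-UE rates for one combination.

    assign[a] is the UE served by antenna a, or BLANK.
    gain[a][u] is an assumed linear power gain from antenna a to UE u.
    The rate model log2(1 + SINR) is an assumption for illustration.
    """
    active = [a for a, u in enumerate(assign) if u is not BLANK]
    total = 0.0
    for a in active:
        u = assign[a]
        interference = sum(gain[b][u] for b in active if b != a)
        total += math.log2(1.0 + gain[a][u] / (interference + noise))
    return total


def greedy_search(candidates, gain, sweeps=2):
    """Visit the antennas one by one and keep a UE only if it raises the total."""
    assign = [BLANK] * len(candidates)      # all antennas start blank
    best = system_throughput(assign, gain)  # 0.0 for the all-blank state
    history = [best]
    for _ in range(sweeps):
        for a, ues in enumerate(candidates):
            for u in ues:
                trial = assign[:a] + [u] + assign[a + 1:]
                t = system_throughput(trial, gain)
                if t > best:                # update only on improvement
                    best, assign = t, trial
            history.append(best)
    return assign, best, history


# Toy data: 3 antennas (A, B, C) and 7 UEs; all values are made up.
candidates = [[0, 1, 2], [3, 4], [5, 6]]
gain = [
    [1.00, 0.60, 0.40, 0.05, 0.20, 0.02, 0.10],
    [0.10, 0.05, 0.20, 0.90, 0.70, 0.10, 0.05],
    [0.02, 0.10, 0.05, 0.10, 0.30, 0.80, 0.50],
]
assign, best, history = greedy_search(candidates, gain)
print(assign, round(best, 3))
```

Because a trial combination replaces the current one only when it gives a strictly higher system throughput, the values recorded in `history` never decrease, which is the monotonic behavior described above.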
3. Hardware acceleration technique
The hardware acceleration technique accelerates the calculation of the system throughput for each combination during the search procedure. The search procedure finds the optimal combination to achieve higher system throughput by iterating the processing for improving the combination. As shown in Fig. 3, the conventional scheme (without acceleration) requires a longer processing time to obtain the optimal combination. In contrast, the system throughput is quickly improved in the acceleration technique. In this way, the combination that achieves higher system throughput is obtained within the required period.
The hardware acceleration technique for the scheduling process is shown in Fig. 4. In this technique, the search process is executed by a dedicated circuit in order to increase the number of searched combinations, which improves the scheduling performance. The software sets possible UEs for each antenna in the hardware accelerator before starting the search process. The details of the hardware acceleration technique are described below.
The flow of the search process performed by the hardware accelerator is shown in Fig. 5. First, the combinations of antennas and UEs are generated. Next, the system throughput is calculated by summing the throughputs of all the UEs in the generated combinations. Then, the combination for which the scheduling achieves higher system throughput is decided by comparing the system throughput for each combination. These three steps are iterated until the scheduling period expires so that the optimal combination can be obtained.
We investigated the processing time for each step in order to clarify the steps that should be accelerated in the above search process. The investigation by software-based processing revealed that the system-throughput calculation accounts for more than 90% of the processing time to execute the search process. On the basis of the results, we devised a parallel and pipeline processing technique to accelerate the system-throughput calculation.
A block diagram of the circuit is depicted in Fig. 6. The circuit comprises three parts: a combination-generation part that outputs the combination of antennas and UEs, a system-throughput-calculation part, and a combination-decision part that decides the combination by comparing the system throughput for each combination. Our proposed technique consists of two kinds of processing: parallel processing to calculate the throughput for all UEs simultaneously and pipeline processing to obtain the system throughput for generated combinations at every clock cycle.
The system-throughput-calculation part consists of multiple throughput-calculation blocks, which output the throughputs of the UEs in parallel, and a throughput-summation block, which outputs the system throughput by summing the throughputs of the UEs. The number of throughput-calculation blocks equals the number of antennas. Because the throughputs of the UEs are calculated simultaneously in the throughput-calculation blocks, the circuit executes the search process at high speed. Furthermore, the circuit scale can be minimized by matching the degree of parallelism to the number of antennas.
The timing chart of the system-throughput-calculation block is shown in Fig. 7. In this block, the signal-to-interference-plus-noise ratio (SINR), that is, the ratio of the received signal power to the sum of the interference power and the noise power, is calculated first. Next, the calculated SINR is converted to the throughput. Then the throughputs of the UEs are summed. These steps are independent of the preceding and the following combinations. Therefore, these steps are executed in a pipeline. This enables the scheduler to obtain the system throughput for a generated combination at every clock cycle.
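The effect of pipelining the three steps can be illustrated with a small cycle-by-cycle software model. This is a sketch under simplifying assumptions, not a description of the actual circuit: each step is modeled as taking exactly one clock cycle, the noise power and the log2(1 + SINR) rate mapping are assumed, and each combination is given directly as a list of (signal power, interference power) pairs, one pair per UE. All function names are hypothetical.

```python
import math

NOISE = 0.1  # assumed noise power in arbitrary linear units


def sinr_stage(combo):
    """Stage 1: SINR of every UE (the list models the parallel blocks)."""
    return [s / (i + NOISE) for s, i in combo]


def rate_stage(sinrs):
    """Stage 2: convert each SINR to a throughput (assumed mapping)."""
    return [math.log2(1.0 + x) for x in sinrs]


def sum_stage(rates):
    """Stage 3: system throughput as the sum over the UEs."""
    return sum(rates)


STAGES = [sinr_stage, rate_stage, sum_stage]


def run_pipeline(combos):
    """Return (cycle, system_throughput) for each combination."""
    regs = [None] * len(STAGES)  # output register of each stage
    feed = iter(combos)
    results = []
    cycle = 0
    while len(results) < len(combos):
        cycle += 1
        # Update the last stage first so that every stage consumes the
        # value its predecessor produced in the previous cycle.
        for k in reversed(range(len(STAGES))):
            src = regs[k - 1] if k > 0 else next(feed, None)
            regs[k] = STAGES[k](src) if src is not None else None
        if regs[-1] is not None:
            results.append((cycle, regs[-1]))
    return results


# Four made-up combinations with two UEs each.
combos = [
    [(1.0, 0.10), (0.9, 0.05)],
    [(0.6, 0.20), (0.7, 0.30)],
    [(0.4, 0.00), (0.9, 0.10)],
    [(1.0, 0.50), (0.2, 0.10)],
]
results = run_pipeline(combos)
for cycle, throughput in results:
    print(cycle, round(throughput, 3))
```

In this model the first result appears once the pipeline has filled, and one further result appears in every following cycle, so N combinations take N plus the pipeline depth minus one cycles rather than N times the depth.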
4. Performance evaluation
We carried out experimental measurements and system-level simulations in order to evaluate the performance of the acceleration technique and the search process.
To verify the number of combinations that can be searched within the scheduling period of 1 ms, we measured the processing time spent on the search process. The proposed technique was implemented on a field-programmable gate array (FPGA) (Xilinx Zynq-7045) at a clock frequency of 100 MHz, and the processing time with acceleration was measured on this FPGA. The processing time without acceleration was measured on a general-purpose processor (Intel Core i5) at a clock frequency of 2.67 GHz.
We carried out system-level simulations to evaluate the performance of the proposed scheme. The performance was evaluated in practical conditions based on the small-cell scenario in LTE (Long-Term Evolution) specifications. The simulation conditions are listed in Table 1. The simulation conditions were based on an assumption of ultra-high-density distributed antenna systems, so 32 antennas were uniformly distributed in a circle with a radius of 155 m. The minimum distance between antennas was 20 m.
The results of the performance evaluation are given in Table 2. The processing time per searched combination was 10 ns with acceleration and 596 ns without acceleration. The circuit thus executed the search process about 60 times faster than processing without acceleration. These results indicate that the number of combinations searched within the scheduling period of 1 ms was 10^5 with the proposed technique and 1679 without acceleration. In other words, the number of searched combinations with the proposed technique is about 60 times larger than that without acceleration.
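The reported figures can be cross-checked with simple arithmetic. The snippet below is only a consistency check on the numbers quoted in this section; the constant names are illustrative.

```python
CLOCK_HZ = 100e6              # FPGA clock frequency from the text
SCHED_PERIOD_NS = 1_000_000   # scheduling period of 1 ms
T_ACCEL_NS = 10               # time per combination with acceleration
T_SOFT_NS = 596               # time per combination without acceleration

cycle_ns = 1e9 / CLOCK_HZ                       # one clock cycle in ns
combos_accel = SCHED_PERIOD_NS // T_ACCEL_NS    # combinations per period
combos_soft = SCHED_PERIOD_NS / T_SOFT_NS       # from the rounded 596 ns
speedup = T_SOFT_NS / T_ACCEL_NS

print(cycle_ns, combos_accel, round(combos_soft), round(speedup, 1))
```

One clock cycle at 100 MHz lasts 10 ns, which agrees with the measured 10 ns per combination and with one result per clock cycle. Dividing 1 ms by the rounded value of 596 ns gives roughly 1680 combinations, consistent with the reported 1679 to within rounding of the measured time.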
Furthermore, we carried out system-level simulations to evaluate the system throughput. The acceleration technique improved the system throughput by about 73% when there were 32 antennas and 256 UEs. Consequently, the proposed technique enables the scheduler to obtain the appropriate combination in ultra-high-density distributed antenna systems.
5. Conclusion
In this article, we proposed a hardware acceleration technique that accelerates the search process in the scheduling of ultra-high-density distributed antenna systems. Our technique consists of parallel processing to calculate the throughputs of all UEs simultaneously and pipeline processing to obtain the system throughput for a combination at every clock cycle. As a result, it performs the search process about 60 times faster than processing without acceleration, which enables the scheduler to substantially increase the number of searched combinations and thus to obtain an appropriate combination in ultra-high-density distributed antenna systems. With the acceleration technique, the scheduler improved the system throughput by about 73% when there were 32 antennas and 256 UEs. Scheduling with the proposed technique therefore enables a practical 5G system.
This article includes part of the results of “The research and development project for realization of the fifth-generation mobile communications system” commissioned by The Ministry of Internal Affairs and Communications, Japan.
Intel Core is a trademark of Intel Corporation or its subsidiaries in the United States and/or other countries.