

Feature Articles: Interdisciplinary R&D of Big Data Technology at Machine Learning and Data Science Center

From Multidimensional Mixture Data Analysis to Spatiotemporal Multidimensional Collective Data Analysis

Abstract
This article introduces a multidimensional mixture data analysis technique that can efficiently extract significant features that cut across different types of data with multiple attributes, such as application logs available on the Web and sensor data collected from IoT (Internet of Things) sensors. The basic algorithm and an example of an application to review site data analysis are described. Spatiotemporal data modeling and an extension to a spatiotemporal multidimensional collective data analysis technique for predicting the time and place of near-future events are also explained.

Keywords: data mining, machine learning, spatiotemporal analysis

1. Introduction
As the use of e-commerce sites, smartphone apps, Internet of Things (IoT) devices, and other such services and devices increases, various types of data related to the behavior of people and the movement of things are being generated. For example, the purchase log of an e-commerce site includes data such as the product item purchased, the time and place, and the customer age group and gender. Characteristic customer classes and product groups can be extracted by analyzing the huge amount of accumulated data (Fig. 1(a)). Another recent trend is application programs, mainly for smartphones, that obtain user location data from GPS (Global Positioning System), Wi-Fi, and beacons with user consent. Combining such location data with the purchase log information described above enables finer analysis that includes more details of the purchase circumstances, such as whether the purchase was made while the purchaser was out or at home.
Previous methods for extracting such customer classes and product groups have used cross tabulation, which, for example, aggregates the number of items sold for each combination of place and product and represents the results in table format or as a matrix (Fig. 1(b)). The place (type of shop) and the product item are referred to as attributes or axes. In the example shown in Fig. 1, there are two attributes (place and item), so the data are said to have two axes. Each attribute can take various values: the place may be a supermarket, convenience store, or other shop, and the item might be coffee, tea, or another product. From the aggregation results, it is possible to determine the trend for a single attribute value, such as which items have high sales numbers at a particular place, or the places where a particular item is sold. However, the results become more complex as the number of possible attribute values increases. Because attention is drawn to combinations that have high aggregate numbers, it is difficult to extract features such as “Sales of black tea are relatively high, even for vending machines,” as indicated by the red circle in Fig. 1(b).
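As a rough illustration of the cross tabulation described above, the following Python sketch counts sales for each combination of place and item. The purchase records and the function name cross_tabulate are hypothetical and serve only to show the shape of the resulting table.

```python
from collections import Counter

# Hypothetical purchase log: one (place, item) pair per sale.
purchases = [
    ("supermarket", "milk"), ("supermarket", "milk"), ("supermarket", "bread"),
    ("convenience store", "coffee"), ("convenience store", "bread"),
    ("vending machine", "coffee"), ("vending machine", "black tea"),
]

def cross_tabulate(records):
    """Count sales per (place, item) pair and return the table as nested lists."""
    counts = Counter(records)
    places = sorted({place for place, _ in records})
    items = sorted({item for _, item in records})
    # Rows follow the place axis, columns follow the item axis.
    table = [[counts[(place, item)] for item in items] for place in places]
    return places, items, table

places, items, table = cross_tabulate(purchases)
for place, row in zip(places, table):
    print(place, dict(zip(items, row)))
```

Each row of the table corresponds to one value of the place axis and each column to one value of the item axis, which is the matrix form that the following sections decompose.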
2. Multidimensional data analysis
One way to overcome the problem described above is multidimensional data analysis, in which a matrix that represents two-dimensional aggregate data is decomposed into the sum of K matrices (Fig. 2). The data represented by the K individual matrices T_{1} to T_{K} are called clusters. The aggregate values of individual clusters obtained by multidimensional data analysis are biased toward a particular place or item, so it is easier to identify trends in individual clusters. For example, the items that are often sold in vending machines can be extracted in cluster 1; cluster 2 reveals that most bread is sold in supermarkets, and only about half that amount is sold in convenience stores; and cluster K indicates that the top item in supermarket sales is milk.
Our concern here is the method used for the matrix decomposition. We take the original aggregate data as an I × J matrix T and decompose it so that it is approximated by the sum of K matrices T_{1} to T_{K} (T ≅ T_{1} + T_{2} + ... + T_{K}). In the example described above, I is the number of places and J is the number of items. The individual matrices that result from the decomposition, T_{k} (k = 1, ..., K), are then also I × J matrices. Although a matrix can be decomposed in arbitrarily many ways, multidimensional data analysis decomposes the original matrix so that each resulting matrix T_{k} is the product of an I × 1 column vector a_{k} and a 1 × J row vector b_{k} (T_{k} = a_{k} × b_{k}), under the constraint that the elements of each vector are nonnegative (0 or higher). The decomposition can then be written as

T_{1} + T_{2} + ... + T_{K} = a_{1} × b_{1} + a_{2} × b_{2} + ... + a_{K} × b_{K} = A × B,

where A = (a_{1} a_{2} ... a_{K}) is the I × K matrix whose columns are a_{1} to a_{K}, and B is the K × J matrix whose rows are b_{1} to b_{K}. This is nothing other than an approximation of the original I × J matrix T as the product of the I × K matrix A and the K × J matrix B (T ≅ A × B), where K is much smaller than I and J (K << I, J). Matrices A and B, which are called factor matrices, are obtained by initially setting their elements to random values, computing the reconstruction T’ = A × B, and updating A and B so as to reduce the error between T’ and the original matrix T. This process is repeated until the error value converges, which yields the final factor matrices A and B. In the analysis example described above, A is the factor that corresponds to place, and B is the factor that corresponds to item. This method is a machine learning technique referred to as nonnegative matrix factorization (NMF) and is commonly applied in data mining [1].
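The iterative procedure described above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes the squared error between T and A × B as the error measure and uses the widely known multiplicative update rules, neither of which is specified in this article. The matrix T, the function names, and the iteration count are hypothetical.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def squared_error(T, A, B):
    """Sum of squared differences between T and the reconstruction A x B."""
    R = matmul(A, B)
    return sum((t - r) ** 2 for tr, rr in zip(T, R) for t, r in zip(tr, rr))

def update(M, num, den, eps=1e-9):
    """Element-wise multiplicative update M * num / den (eps avoids division by zero)."""
    return [[m * n / (d + eps) for m, n, d in zip(*rows)] for rows in zip(M, num, den)]

def nmf(T, K, iterations=200, seed=0):
    """Approximate the I x J matrix T by A (I x K) times B (K x J), all nonnegative."""
    rng = random.Random(seed)
    I, J = len(T), len(T[0])
    # Start from random positive factor matrices.
    A = [[rng.random() + 0.1 for _ in range(K)] for _ in range(I)]
    B = [[rng.random() + 0.1 for _ in range(J)] for _ in range(K)]
    for _ in range(iterations):
        At = transpose(A)
        B = update(B, matmul(At, T), matmul(matmul(At, A), B))
        Bt = transpose(B)
        A = update(A, matmul(T, Bt), matmul(A, matmul(B, Bt)))
    return A, B

# Hypothetical place x item sales counts (rows: places, columns: items).
T = [[8, 6, 0, 0],
     [4, 3, 0, 0],
     [0, 0, 5, 10],
     [0, 0, 2, 4]]

A0, B0 = nmf(T, K=2, iterations=0)   # random starting point (same seed)
A, B = nmf(T, K=2)                   # factors after 200 updates
print("error before:", squared_error(T, A0, B0))
print("error after: ", squared_error(T, A, B))
```

Because the updates only multiply nonnegative quantities, the factors stay nonnegative throughout, which is what makes each column of A and each row of B readable as a cluster over places and items.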
This approach can also be applied to high-dimensional data (tensors) that involve three or more attribute values, in which case it is referred to as nonnegative tensor factorization (NTF).

3. Extension to multidimensional ‘mixture’ data analysis
NTF can be used to extract latent features from data by decomposing a matrix (tensor) of data that aggregates various attribute combinations into the product of factor matrices. However, as variation in the attribute values increases, bias in the aggregate data is more likely to develop. Analyzing aggregate data into which a new attribute (user age group) is incorporated in addition to the place and item attributes would make it possible to extract customer clusters that indicate trends based on user class, place, and purchased item. Nevertheless, the number of attribute combinations increases explosively as the number of attributes increases, and the data become sparser, meaning that the aggregate values for most of the combinations are zero. The data aggregation for the case in which the purchase log shown in Fig. 1 involves four attributes (axes) comprises 240 million value combinations (e.g., 24 time values, one for each hour of the day × 100 place values × 1000 users × 100 products). Although the exact figure depends on the type of data, 99.9% of the elements for combinations of four or more attributes are in practice zero. That is to say, less than 0.1% of all of the elements in the aggregate data have nonzero values, and it is difficult to obtain satisfactory results from factor analysis of such extremely sparse aggregate data. NTT laboratories have developed nonnegative multiple tensor factorization (NMTF) as a solution to mitigate the sparseness problem. Instead of aggregating the original data as a high-dimensional tensor (five dimensions in this case), this approach constructs multiple tensors or matrices that represent nonsparse aggregate data (having mostly nonzero elements).
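The combination count quoted above can be checked with a few lines of Python. The axis sizes are the ones given in the text; the user × product comparison at the end is an added illustration of why lower-order aggregates are easier to fill.

```python
from math import prod

# Axis sizes quoted in the text: hours of the day, places, users, products.
axis_sizes = {"time": 24, "place": 100, "user": 1000, "product": 100}

total_cells = prod(axis_sizes.values())   # cells in the four-axis tensor
max_nonzero = total_cells // 1000         # 0.1% of the cells

# A two-axis aggregate over the same log has far fewer cells to fill.
user_product_cells = axis_sizes["user"] * axis_sizes["product"]

print(f"{total_cells:,} cells in the four-axis tensor")
print(f"at most {max_nonzero:,} nonzero cells at 99.9% sparsity")
print(f"{user_product_cells:,} cells in a user x product matrix")
```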
The attributes (axes) of each tensor or matrix are reduced to three or fewer, and the analysis is performed assuming that the factor matrices for attributes common to multiple tensors or matrices are shared among them (e.g., the customer axis in Fig. 3). Because multiple items of aggregate data are combined for the multidimensional analysis, this method is referred to as a multidimensional mixture data analysis technique.
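The idea of sharing a factor matrix can be illustrated with a much simpler setting than NMTF itself. The sketch below is not the NMTF algorithm of [3]: it assumes only two matrices, X (user × item) and Y (user × place), a squared-error objective, and multiplicative updates, with the user factor A shared so that X ≅ A × B and Y ≅ A × C. All data values and function names are hypothetical.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y):
    return [[x + y for x, y in zip(xr, yr)] for xr, yr in zip(X, Y)]

def update(M, num, den, eps=1e-9):
    """Element-wise multiplicative update M * num / den (eps avoids division by zero)."""
    return [[m * n / (d + eps) for m, n, d in zip(*rows)] for rows in zip(M, num, den)]

def joint_error(X, Y, A, B, C):
    """Squared error of X against A x B plus squared error of Y against A x C."""
    def err(T, R):
        return sum((t - r) ** 2 for tr, rr in zip(T, R) for t, r in zip(tr, rr))
    return err(X, matmul(A, B)) + err(Y, matmul(A, C))

def coupled_nmf(X, Y, K, iterations=200, seed=0):
    rng = random.Random(seed)
    def init(rows, cols):
        return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]
    A, B, C = init(len(X), K), init(K, len(X[0])), init(K, len(Y[0]))
    for _ in range(iterations):
        At = transpose(A)
        AtA = matmul(At, A)
        B = update(B, matmul(At, X), matmul(AtA, B))
        C = update(C, matmul(At, Y), matmul(AtA, C))
        Bt, Ct = transpose(B), transpose(C)
        # The shared factor A is fitted to both matrices at once.
        num = add(matmul(X, Bt), matmul(Y, Ct))
        den = matmul(A, add(matmul(B, Bt), matmul(C, Ct)))
        A = update(A, num, den)
    return A, B, C

# Hypothetical aggregates that share the user axis (4 users).
X = [[5, 4, 0], [4, 5, 0], [0, 1, 6], [0, 0, 5]]   # user x item
Y = [[7, 0], [6, 1], [0, 8], [1, 7]]               # user x place

A0, B0, C0 = coupled_nmf(X, Y, K=2, iterations=0)  # random starting point
A, B, C = coupled_nmf(X, Y, K=2)
print("joint error before:", joint_error(X, Y, A0, B0, C0))
print("joint error after: ", joint_error(X, Y, A, B, C))
```

Because A appears in both reconstructions, each user cluster has to explain that user's items and places at the same time, which is the sense in which the aggregates are analyzed as a mixture rather than separately.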
4. Example using a Yelp data set
To evaluate the effectiveness of multidimensional mixture data analysis with NMTF, we applied it to an open data set collected by the Yelp review site in the United States (Fig. 4). Yelp data are available for research purposes [2] and contain a variety of review information on commercial facilities in the form of attributes, including the shop name, category, location, customers, rating, terms used in reviews, and day and time of check-ins, which provide customers’ self-reported location via social networking service applications. From such data, it is possible to aggregate the following three types of tensor data, for example.
(1) Review tensor: user × shop × day of the week. The value is the number of reviews.
(2) Check-in tensor: shop × time × location (longitude and latitude). The value is the number of check-ins.
(3) Term frequency tensor: user × category × term. The value is the term frequency of each word that appears in a review.

The tensors have the user and shop axes in common. By applying NMTF to the three tensors described above, we can obtain significant clusters that capture features with many attribute combinations such as “Weekend morning leisure activities” and “Meals at Japanese restaurants” (Fig. 5). From these clusters, it is possible to obtain results for areas where there are many facilities or restaurants, the day and time period when facilities are most often used, the geographic distribution, and the terms often used in reviews. The details of the analysis algorithm and results are presented in a published study [3]. In this way, NMTF can efficiently extract significant clusters from data with multidimensional attributes while mitigating the sparseness problem.
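As a rough sketch of how such tensors might be aggregated, the following Python code builds a review tensor and a check-in tensor as sparse dictionaries that store only nonzero cells. The record layout, identifiers, and location labels are hypothetical and do not follow the actual Yelp data schema.

```python
from collections import Counter
from datetime import date

# Hypothetical review records: (user, shop, review date).
reviews = [
    ("u1", "s1", date(2016, 1, 2)),   # a Saturday
    ("u1", "s1", date(2016, 1, 9)),   # a Saturday
    ("u2", "s2", date(2016, 1, 4)),   # a Monday
]

# Hypothetical check-in records: (shop, hour of day, location cell label).
checkins = [
    ("s1", 9, "cell_a"),
    ("s1", 9, "cell_a"),
    ("s3", 19, "cell_b"),
]

# Review tensor: user x shop x day of the week (Monday = 0), value = review count.
review_tensor = Counter((user, shop, day.weekday()) for user, shop, day in reviews)

# Check-in tensor: shop x time x location, value = check-in count.
checkin_tensor = Counter(checkins)

# Shops present in both tensors are the values along which the shop axis is shared.
shared_shops = {shop for _, shop, _ in review_tensor} & {shop for shop, _, _ in checkin_tensor}

print(dict(review_tensor))
print(dict(checkin_tensor))
print(shared_shops)
```

Keying each tensor by tuples of attribute values keeps memory proportional to the number of observed combinations, and the common shop identifiers show where a shared factor would tie the two tensors together.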
5. Future development
Although multidimensional mixture data analysis by NMTF can efficiently extract latent feature clusters from data, other factors such as the temporal cause-and-effect relationships among the elements of aggregated data or the spatial adjacency are not taken into account. In particular, the analysis of location data for users and vehicles can be based on an area divided into a spatial mesh and the number of persons and vehicles observed in each time period. However, even when NMTF is applied to such data, it is not possible to obtain analysis results that predict the times at which events such as congestion will occur in the future. For that reason, we plan to push forward with the development of innovative spatiotemporal multidimensional collective data analysis techniques that enable modeling of the temporal relationships of multidimensional data and prediction of future events. Spatiotemporal multidimensional collective data analysis considers four data aspects (time, space, multidimensional, and collective) for the purpose of gaining insight into near-future events (Fig. 6). The collective aspect is for estimating the spatiotemporal flow of people or traffic in cases where individuals cannot be recognized, such as when counting the number of persons or vehicles in a spatial mesh and only aggregate statistical data are available.
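The mesh-based aggregation mentioned above can be sketched as follows. This is an illustrative assumption about how such counts might be formed, not the method used in the planned research: the mesh is a simple grid in degrees, the cell size of 0.01 degrees is arbitrary, and the observations are hypothetical.

```python
from collections import Counter
from math import floor

def mesh_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to the integer (row, column) index of its mesh cell."""
    return (floor(lat / cell_deg), floor(lon / cell_deg))

# Hypothetical anonymous observations: (hour of day, latitude, longitude).
observations = [
    (9, 35.681, 139.767),
    (9, 35.685, 139.763),
    (9, 35.701, 139.771),
    (10, 35.681, 139.767),
]

# Number of observations per (time period, mesh cell); no individual is tracked.
counts = Counter((hour, mesh_cell(lat, lon)) for hour, lat, lon in observations)

for (hour, cell), n in sorted(counts.items()):
    print(f"hour {hour:2d}, cell {cell}: {n}")
```

Only the per-cell counts survive the aggregation, which is the situation the collective aspect addresses: flows between cells have to be estimated from these totals because no individual trajectories are available.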
Looking toward 2020, we will continue with research and development on the use of spatiotemporal multidimensional collective data analysis techniques and real-time observation data to predict near-future events such as congestion in order to implement proactive navigation to relieve congestion at large-scale event venues [4]. We will also investigate how this research can be applied to stabilize the communication infrastructure.

References
