CN109685101B

CN109685101B - Multi-dimensional data self-adaptive acquisition method and system

Info

Publication number: CN109685101B
Application number: CN201811345413.8A
Authority: CN
Inventors: 蔺华庆; 闫峥
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2018-11-13
Filing date: 2018-11-13
Publication date: 2021-09-28
Anticipated expiration: 2038-11-13
Also published as: CN109685101A

Abstract

The invention belongs to the technical field of big data acquisition and discloses an adaptive acquisition method and system for multidimensional data. The method comprises: reducing multidimensional data to one dimension with a dimension reduction technique to obtain its one-dimensional principal component; using the one-dimensional principal component of the original multidimensional data as reference data for judging data change and inputting it into a one-dimensional adaptive data acquisition algorithm; and adjusting the acquisition process of the multidimensional big data with that algorithm. The approach is feasible because PCA, the dimension reduction technique used, reduces dimension using the covariance of the multidimensional data, and one-dimensional adaptive acquisition likewise adjusts the acquisition frequency according to the magnitude of data change; experiments show the feasibility and effectiveness of the method. The invention applies to all service scenarios that involve multidimensional big data acquisition, and can improve acquisition performance while preserving acquisition precision, thereby improving the efficiency of application services.

Description

Multi-dimensional data self-adaptive acquisition method and system

Technical Field

The invention belongs to the technical field of big data acquisition, and particularly relates to a multi-dimensional data self-adaptive acquisition method and system.

Background

The current state of the art commonly used in the industry is as follows. In current internet application scenarios, data is becoming increasingly important. Data is the basis on which many services are implemented, and data collection is the performance bottleneck of most data-related service systems. For example, in the field of network security, network systems are protected by collecting communication data and analyzing its characteristics to detect attacks and intrusions. However, in the big data era, data has the 5V characteristics, and traditional data acquisition based on statistical sampling methods (periodic sampling, Poisson sampling and random sampling) cannot meet current requirements. Furthermore, with the development of artificial intelligence, intelligent services have permeated every aspect of people's lives, so data collection today usually targets multidimensional rather than one-dimensional data. In conclusion, adaptive acquisition of multidimensional big data is an urgent problem to be solved in the current big data era.

Besides the traditional data acquisition methods based on statistical sampling, adaptive acquisition methods for one-dimensional data have been proposed in prior work, such as predictive algorithms based on regression analysis and time series analysis. These methods adaptively adjust the data acquisition frequency, so that the amount of acquired data is reduced and acquisition performance is improved while acquisition precision is preserved. However, they cannot be applied to multidimensional data and cannot solve the problem of adaptive acquisition of multidimensional data. In a one-dimensional adaptive data acquisition algorithm, the adjustment is based on the change of the data: when the data changes greatly, the acquisition frequency is increased and more data is acquired to ensure acquisition precision; when the data changes little, the acquisition frequency is reduced, which lowers the burden that data acquisition places on the application system. For multidimensional data, however, which of the multiple dimensions should serve as reference data in the adaptive adjustment process is an unsolved problem. Current work gives no solution; that is, there is no adaptive acquisition scheme for multidimensional data in current research.

In summary, the problems of the prior art are as follows. Most enterprises currently collect data with traditional statistical sampling methods, such as periodic, random, stratified and Poisson sampling. These methods can acquire multidimensional data directly but cannot acquire it adaptively. In addition, data acquisition for data analysis is currently full acquisition. In the big data era, however, data volumes keep growing, and adaptive sampling is needed to reduce the amount of data acquired. No adaptive acquisition method for multidimensional big data has yet been proposed, which is why the present invention proposes one. Such a method will be highly valuable in the coming big data era, because it avoids the data acquisition performance bottleneck and thereby better supports the implementation of services.

The difficulty and significance of solving these technical problems are as follows. No solution to the problem of multidimensional big data acquisition has been proposed in current work. Data acquisition is typically done with conventional sampling algorithms (periodic, random and Poisson sampling). These cannot adjust acquisition adaptively based on context, so reducing the amount of acquired data also reduces acquisition accuracy. The adaptive acquisition schemes proposed so far, based on regression prediction or time series analysis, target one-dimensional data and cannot be applied to multidimensional data. Because the problem of finding reference data within multidimensional data is unsolved, that is, there is no reference data with which to adjust the acquisition process, adaptive acquisition of multidimensional big data cannot be realized.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a multidimensional data self-adaptive acquisition method and system.

The invention is realized as follows: a multidimensional data adaptive acquisition method uses a one-dimensional adaptive data acquisition algorithm, combined with a dimension reduction technique, to adaptively adjust and thereby realize the acquisition of multidimensional data. The multidimensional data adaptive acquisition method comprises the following steps:

Step one: use a dimension reduction technique to reduce the multidimensional data of the acquisition target to one dimension;

Step two: use the one-dimensional principal component obtained by reducing the original multidimensional data as reference data for judging data change, adjust the acquisition frequency of the multidimensional data, and realize adaptive acquisition of the multidimensional data.

Further, multi-dimensional data self-adaptive collection under the current big data scene is achieved.

The multidimensional data self-adaptive acquisition method specifically comprises the following steps:

(1) Suppose the target of data acquisition is $Y_i = (y_1, y_2, y_3, \ldots, y_n)$, which is multidimensional data; $y_j$ ($j = 1, 2, 3, \ldots, n$) is each dimension of the target data, and $i = t+1, t+2, t+3, \ldots, t+N_r$, where $t$ is a certain acquisition time point. Defining the predicted value of data as

$Y_i$ is the actual value. $N_r$ is the actual number of acquisitions, $N_p$ is the number of predicted acquisitions, and $N_p = N_r$;

(2) Reduce the multidimensional data to a one-dimensional principal component, which serves as reference data, mainly using the PCA algorithm:

$y_i = \mathrm{PCA}(Y_i)$

(3) Calculate the means of the actual and predicted values of the target data, based on the one-dimensional principal component, as follows:

(4) The ratio of the mean predicted value to the mean actual value is denoted $R_M$. When $R_M$ is equal or close to 1, the data has not changed substantially; when $R_M$ is significantly greater than or less than 1, the target data value has changed greatly.

In theory, the data change ratio can also be calculated from the variance, denoted $R_D$. The calculation process is as follows:

(5) Based on the change ratio of the data, the data acquisition process is adjusted as follows:

where $T_i$ represents the current sampling interval and $T_{i-1}$ the previous sampling interval. $T_{inc}$ is the amount by which the sampling interval is increased, and $T_{dec}$ the amount by which it is reduced. $Thr_u$ and $Thr_l$ are the thresholds for judging data change. When $R_M$ is greater than $Thr_l$ and less than $Thr_u$, the data change is small and the acquisition interval should be increased by $T_{inc}$; when $R_M$ is greater than $Thr_u$ or less than $Thr_l$, the data change is large and the acquisition interval should be reduced by $T_{dec}$. $T_{max}$ and $T_{min}$ are the maximum and minimum values of the data acquisition interval: the adjusted interval must not exceed $T_{max}$ and must not fall below $T_{min}$, which serves as the constraint on acquisition adjustment.
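The display formulas for steps (3) to (5) are not reproduced in this text. One reconstruction consistent with the definitions above is sketched below. The symbols $\hat{y}_i$ (predicted value of the one-dimensional principal component), $M_r$, $M_p$, $D_r$ and $D_p$ are assumed notation rather than the source's own, and reading $R_M$ as a ratio of means and $R_D$ as a ratio of variances is likewise an assumption.

```latex
\begin{aligned}
M_r &= \frac{1}{N_r}\sum_{i=t+1}^{t+N_r} y_i, &
M_p &= \frac{1}{N_p}\sum_{i=t+1}^{t+N_p} \hat{y}_i, &
R_M &= \frac{M_p}{M_r}, \\
D_r &= \frac{1}{N_r}\sum_{i=t+1}^{t+N_r} \left(y_i - M_r\right)^2, &
D_p &= \frac{1}{N_p}\sum_{i=t+1}^{t+N_p} \left(\hat{y}_i - M_p\right)^2, &
R_D &= \frac{D_p}{D_r},
\end{aligned}

T_i =
\begin{cases}
T_{i-1} + T_{inc}, & Thr_l < R_M < Thr_u, \\
T_{i-1} - T_{dec}, & R_M > Thr_u \ \text{or}\ R_M < Thr_l,
\end{cases}
\qquad \text{s.t.}\quad T_{min} \le T_i \le T_{max}.
```

The text does not say how the boundary cases $R_M = Thr_l$ and $R_M = Thr_u$ are handled, so they are left unassigned here.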

The invention also aims to provide a social network recommendation control system applying the multidimensional data adaptive acquisition method.

Another object of the present invention is to provide an intrusion detection system using the multidimensional data adaptive acquisition method.

The invention also aims to provide an asset portrait acquisition system applying the multidimensional data adaptive acquisition method.

The invention also aims to provide any business application system applying the multidimensional data self-adaptive acquisition method.

The invention provides a method for optimizing data acquisition in a big data scene. In the big data era, the data has the characteristics of large data volume, high flow rate and the like. Therefore, a more optimized method is needed to improve the performance of data collection to better support the implementation of services.

In summary, the advantages and positive effects of the invention are as follows. The original multidimensional data is reduced to a one-dimensional principal component by a dimension reduction technique, and this is combined with a one-dimensional adaptive data acquisition algorithm to realize adaptive acquisition of multidimensional big data. The invention is widely applicable to all scenarios involving multidimensional big data acquisition, such as network security, recommendation systems and social networks. The reference data is found automatically by the dimension reduction technique (such as PCA), and the one-dimensional principal component is then used to adjust the data acquisition process. The most important advantage is that the problem of collecting multidimensional data in big data scenarios is solved. The method is simple to implement and can be applied in many scenarios involving multidimensional big data acquisition, and dimension reduction techniques such as PCA (principal component analysis) have low computational complexity. As the big data era develops, the invention can be applied to any scenario that requires multidimensional data acquisition; it greatly reduces the amount of data acquired while preserving acquisition precision, thereby improving the performance of big data acquisition and reducing the burden that acquisition places on the application system.
Wherever multidimensional big data must be acquired, the required data can be acquired adaptively. Data collection has a wide range of applications, including recommendation systems, social networks and intrusion detection. Conventional data acquisition is generally based on statistical sampling methods. However, in the current big data era, data has the 5V characteristics (Volume, Variety, Velocity, Value, Veracity). Meanwhile, with the development of technologies such as artificial intelligence, the need to acquire data in multiple dimensions has grown greatly. An adaptive method is therefore needed that, without affecting the accuracy of the multidimensional data, greatly reduces the amount of data acquired, reduces the burden of data acquisition on the application system, and improves acquisition performance. The invention combines regression analysis with dimension reduction techniques from machine learning to design a general multidimensional data adaptive acquisition method that meets the acquisition requirements of the big data era and realizes adaptive acquisition of multidimensional data in specific service scenarios.

Drawings

Fig. 1 is a flowchart of a multidimensional data adaptive acquisition method according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of a multidimensional data adaptive acquisition system provided by an embodiment of the present invention.

Fig. 3 is a schematic diagram comparing the principles of one-dimensional data acquisition and multi-dimensional data acquisition provided by the present invention.

Fig. 4 is a schematic diagram of the steps of multi-dimensional data acquisition provided by the present invention.

Fig. 5 is a schematic diagram of an adaptive data acquisition result of a one-dimensional principal component according to an embodiment of the present invention.

Fig. 6 is a schematic diagram of a result of adaptive data acquisition of memory data according to an embodiment of the present invention.

Fig. 7 is a schematic diagram of an adaptive data acquisition result of CPU occupancy data according to an embodiment of the present invention.

Fig. 8 is a schematic diagram of a result of adaptive data acquisition of battery capacity data according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

On the basis of ensuring the multi-dimensional data acquisition precision, the invention greatly reduces the data acquisition amount, thereby preventing the data acquisition from influencing the normal operation of an application system and improving the multi-dimensional data acquisition performance.

The following detailed description of the principles of the invention is provided in connection with the accompanying drawings.

As shown in fig. 1, the multidimensional data adaptive acquisition method provided by the embodiment of the present invention includes the following steps:

S101: reduce the dimension of a plurality of data, namely multidimensional data, to one dimension, obtaining the one-dimensional principal component of the multidimensional data;

S102: use the one-dimensional principal component of the original multidimensional data as reference data for judging data change, and adjust the acquisition frequency of the multidimensional data.

As shown in fig. 2, the multidimensional data adaptive acquisition method provided by the embodiment of the present invention mainly expands one-dimensional adaptive data acquisition four times, so that it can be applied to multidimensional data acquisition scenarios. In other words, the multidimensional big data acquisition problem is reduced to a one-dimensional data acquisition problem by means of the dimension reduction technique.

The present invention mainly solves two problems: in big data scenarios, the large amount of data to be acquired burdens the application system; and when a one-dimensional data acquisition algorithm is used to adaptively adjust the acquisition of multidimensional data, the reference data must be determined. The main objective is to greatly reduce the amount of collected data while guaranteeing a certain acquisition precision for multidimensional data, thereby improving acquisition performance and preventing the acquisition operation from affecting the normal operation of the application system. For the adaptive acquisition of one-dimensional data, some research has already been proposed; most of it predicts the change of the data by means such as regression analysis and time series analysis (e.g., the Auto-Regressive Moving Average model) and adjusts the acquisition process according to the change of the one-dimensional data. When the data changes greatly, the acquisition frequency is raised so that more data is acquired; when the data changes little, the acquisition frequency is lowered and less data needs to be acquired. This approach ensures acquisition precision, reduces the amount of data acquired, improves acquisition performance, and suits big data scenarios. However, in current collection environments, especially when data is processed with machine learning and data mining techniques, multiple data items in the same time dimension, that is, multiple features or multidimensional data, must be collected at the same time.
When multiple data items are acquired simultaneously, however, it is difficult to decide which of them should serve as reference data, that is, according to which data's change the acquisition frequency of the multidimensional data should be adjusted.

Dimension reduction is performed on the plurality of data, that is, the multidimensional data, with a dimension reduction technique (for example, PCA (Principal Component Analysis)), reducing it to one dimension to obtain the one-dimensional principal component of the multidimensional data. The one-dimensional principal component of the original multidimensional data is then used as reference data for judging data change, and the acquisition frequency of the multidimensional data is adjusted accordingly. PCA reduces dimension using the covariance of the multidimensional data, and the acquisition frequency is adjusted based on the magnitude of the data's variation. As shown in fig. 3, by analogy with the adaptive acquisition of one-dimensional data and in combination with the dimension reduction technique, the invention reduces the multidimensional data to a one-dimensional principal component, takes that component as the reference data, and adjusts the acquisition of the multidimensional data according to its change.
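As a minimal sketch of this reduction step, assuming NumPy and samples arranged as rows, the first principal component can be obtained from the singular value decomposition of the mean-centered data. The function name `first_principal_component` and the synthetic four-column data are hypothetical and only illustrate the idea; the source does not specify an implementation.

```python
import numpy as np

def first_principal_component(samples):
    """Project each row of `samples` onto the first principal axis."""
    samples = np.asarray(samples, dtype=float)
    centered = samples - samples.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

# Hypothetical data: four correlated dimensions driven by one underlying signal.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0.0, 6.0, 60))
samples = np.column_stack([
    50 + 20 * signal,
    30 + 10 * signal,
    80 - 5 * signal,
    40 + 8 * signal,
]) + rng.normal(0.0, 0.1, size=(60, 4))

# One reference value per multidimensional sample.
reference = first_principal_component(samples)
```

Two properties of this sketch matter for what follows: the scores are mean-centered and their sign is arbitrary, so a criterion based on a ratio of means would need an uncentered projection or an added offset.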

When one-dimensional data is acquired, the adaptive data acquisition algorithm adjusts the acquisition frequency according to the change of that data. For multidimensional data, however, it is difficult to decide on which data the adjustment should be based. Two solutions are considered:

1. Among the collected multidimensional data, find one main data item to serve as reference data for adaptive adjustment; that is, adjust the frequency of multidimensional data collection according to the change of the main data, just as the frequency is adjusted in one-dimensional adaptive collection. This may be inaccurate. First, there is the question of how to determine the main data among multiple data items: no general selection method can be designed, because the main data differs between business scenarios, and in each scenario a large amount of statistical calculation must be performed on every kind of data in the multidimensional data to obtain it. Furthermore, the main data does not necessarily reflect the variation trend of the other data, in which case the accuracy of the acquired data is low and the requirements of multidimensional big data acquisition cannot be met. The second solution was therefore designed, and it is the core of the present invention.

2. Perform dimension reduction on the plurality of data, that is, the multidimensional data, with a dimension reduction technique (for example, PCA (Principal Component Analysis)), reducing it to one dimension to obtain the one-dimensional principal component of the multidimensional data. The one-dimensional principal component of the original multidimensional data is then used as the reference data for judging data change, and the acquisition frequency of the multidimensional data is adjusted accordingly. This is reasonable because PCA reduces dimension using the covariance of the multidimensional data, and the acquisition frequency is adjusted based on the magnitude of the change in the data.

The specific operation steps are shown in fig. 4: (1) reduce the original multidimensional data to a one-dimensional principal component by PCA; (2) input the one-dimensional principal component as reference data, and the original multidimensional data as original data, into a one-dimensional adaptive data acquisition algorithm; (3) acquire the multidimensional data using the one-dimensional adaptive data acquisition algorithm. Conventional sampling methods can reduce the amount of data acquired but cannot adjust acquisition adaptively; the one-dimensional algorithm used in the multidimensional scheme can therefore be any one-dimensional adaptive data acquisition algorithm, such as one based on regression analysis or time series analysis.

The application effect of the present invention will be described in detail with reference to the simulation.

The experiment is simulated on a mobile terminal and considers the simultaneous acquisition of four kinds of data: system memory usage, system CPU usage, total battery capacity and CPU temperature. First, PCA reduces the four-dimensional data to a one-dimensional principal component; then a one-dimensional adaptive data acquisition algorithm (regression analysis) adjusts the acquisition frequency of the multidimensional data, and the resulting experimental simulation results are analyzed.

As shown in figs. 4 to 8, the experimental results include the adaptive acquisition result of the extracted one-dimensional principal component and the adaptive acquisition results of the multidimensional data, which comprises four kinds of data (memory usage, CPU usage, battery capacity and CPU temperature) whose acquisition is adjusted based on changes in the principal component. The experimental results show that the trend of the one-dimensional principal component reflects the trend of each kind of data. The individual acquisition results of the four kinds of data are satisfactory, and their acquisition is adjusted based on the variation trend of the principal component, so PCA is feasible for the adaptive acquisition of multidimensional data/variables/attributes/features.

Before describing the present invention in detail, the one-dimensional adaptive data acquisition algorithm ACFAS_par (adaptive collecting frequency response adjusting length basis predicted acquired ratio) is first described. This algorithm comes from previous work. Its basic principle is to adjust the frequency of data acquisition based on data changes, thereby enabling adaptive sampling. The magnitude of data change can be represented by the difference between the actual value and the predicted value of the data. When the actual value is very close to the predicted value, the current data change is very small, so the acquisition frequency should be reduced and less data acquired; when the difference between the actual and predicted values is large, the current data is changing greatly, so the acquisition frequency should be increased, more data acquired, and acquisition precision thereby improved.
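The predicted-versus-actual comparison can be illustrated with a short sketch. This is not the ACFAS_par algorithm itself, whose details are not given in this text; it assumes a straight-line regression as the predictor and a ratio of means as the change measure. The function names `predict_linear` and `mean_ratio` and the sample values are hypothetical.

```python
import numpy as np

def predict_linear(history, n_ahead):
    """Fit a straight line to `history` and extrapolate `n_ahead` future values."""
    history = np.asarray(history, dtype=float)
    x = np.arange(len(history))
    slope, intercept = np.polyfit(x, history, deg=1)
    future_x = np.arange(len(history), len(history) + n_ahead)
    return slope * future_x + intercept

def mean_ratio(predicted, actual):
    """Ratio of the mean predicted value to the mean actual value.

    Assumes the mean of `actual` is non-zero.
    """
    return float(np.mean(predicted) / np.mean(actual))

# Hypothetical one-dimensional history with a steady upward trend.
history = [10.0, 10.5, 11.0, 11.5]
predicted = predict_linear(history, n_ahead=3)

steady = [12.0, 12.5, 13.0]   # the trend continues: ratio close to 1
jump = [20.0, 21.0, 22.0]     # abrupt change: ratio far from 1

ratio_steady = mean_ratio(predicted, steady)
ratio_jump = mean_ratio(predicted, jump)
```

A ratio near 1 suggests the acquisition frequency can be lowered, while a ratio far from 1 suggests it should be raised.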

The multidimensional data self-adaptive acquisition specifically comprises the following steps:

(1) The target of data acquisition is $Y_i = (y_1, y_2, y_3, y_4)$, which is four-dimensional data. Suppose $y_1$ is memory usage, $y_2$ is CPU usage, $y_3$ is battery capacity, and $y_4$ is CPU temperature, and $i = t+1, t+2, t+3, \ldots, t+N_r$, where $t$ is a certain acquisition time point. Defining the predicted value of data as

(2) Reduce the multidimensional data to a one-dimensional principal component, which serves as reference data; here the four-dimensional data is reduced to the one-dimensional principal component, mainly using the PCA algorithm:

$y_i = \mathrm{PCA}(Y_i)$

where $T_i$ represents the current sampling interval and $T_{i-1}$ the previous sampling interval. $T_{inc}$ is the amount by which the sampling interval is increased, and $T_{dec}$ the amount by which it is reduced. $Thr_u$ and $Thr_l$ are the thresholds for judging data change. When $R_M$ is greater than $Thr_l$ and less than $Thr_u$, the data change is small and the acquisition interval should be increased by $T_{inc}$; when $R_M$ is greater than $Thr_u$ or less than $Thr_l$, the data change is large and the acquisition interval should be reduced by $T_{dec}$. $T_{max}$ and $T_{min}$ are the maximum and minimum values of the data acquisition interval: the adjusted interval must not exceed $T_{max}$ and must not fall below $T_{min}$, which serves as the constraint on acquisition adjustment.
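The interval adjustment described above can be sketched as a small function. The function name `adjust_interval` and all default parameter values are hypothetical. Treating a ratio exactly equal to a threshold as a large change is an assumption, since the text does not cover that case.

```python
def adjust_interval(prev_interval, r_m, *, thr_l=0.9, thr_u=1.1,
                    t_inc=1, t_dec=2, t_min=1, t_max=10):
    """Return the next sampling interval given the change ratio `r_m`."""
    if thr_l < r_m < thr_u:
        # Small change: sample less often.
        candidate = prev_interval + t_inc
    else:
        # Large change (or a ratio exactly on a threshold): sample more often.
        candidate = prev_interval - t_dec
    # Constraint: keep the interval within [t_min, t_max].
    return min(max(candidate, t_min), t_max)
```

With these defaults, `adjust_interval(5, 1.0)` lengthens the interval to 6, `adjust_interval(5, 1.5)` shortens it to 3, and the clamp keeps `adjust_interval(10, 1.0)` at 10 and `adjust_interval(2, 0.5)` at 1. In practice the thresholds and step sizes would be chosen from the statistical distribution of the collected target data in the business scenario.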

The implementation pseudo code of the multidimensional data self-adaptive acquisition method provided by the invention is as follows:
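The pseudocode itself is not reproduced in this text. The following Python sketch shows one way the steps above could fit together; it is not the patent's own pseudocode. The assumptions are: the principal axis is fitted beforehand on historical data (here, for brevity, on the stream itself); samples are projected without centering so the reference stays positive; prediction is a straight-line fit over the last few reference values; the ratio is taken over a single sample ($N_p = N_r = 1$); and every name, parameter value and the synthetic data are hypothetical.

```python
import numpy as np

def fit_axis(history):
    """First principal axis of `history`, oriented so the mean sample projects positive."""
    history = np.asarray(history, dtype=float)
    mean = history.mean(axis=0)
    _, _, vt = np.linalg.svd(history - mean, full_matrices=False)
    axis = vt[0]
    return axis if mean @ axis >= 0 else -axis

def adaptive_collect(stream, axis, window=4, thr_l=0.95, thr_u=1.05,
                     t_inc=1, t_dec=2, t_min=1, t_max=8):
    """Return the time indices at which full multidimensional rows are collected."""
    stream = np.asarray(stream, dtype=float)
    # Warm-up: collect the first `window` rows at the minimum interval.
    times = list(range(0, window * t_min, t_min))
    refs = [float(stream[k] @ axis) for k in times]
    interval = t_min
    while times[-1] + interval < len(stream):
        now = times[-1] + interval
        # Predict the one-dimensional reference at `now` from recent samples.
        slope, intercept = np.polyfit(times[-window:], refs[-window:], deg=1)
        predicted = slope * now + intercept
        # Collect the full row, then reduce it to the reference value.
        actual = float(stream[now] @ axis)
        ratio = predicted / actual  # assumes actual != 0
        if thr_l < ratio < thr_u:
            interval = min(interval + t_inc, t_max)   # small change
        else:
            interval = max(interval - t_dec, t_min)   # large change
        times.append(now)
        refs.append(actual)
    return times

# Hypothetical four-dimensional stream: quiet, then a burst of activity, then quiet.
rng = np.random.default_rng(0)
t = np.arange(300)
burst = 25.0 * (np.abs(t - 150) < 30) * np.sin((t - 120) * np.pi / 15) ** 2
load = 40.0 + burst
stream = np.column_stack([
    load,               # memory usage
    0.8 * load,         # CPU usage
    100.0 - 0.02 * t,   # battery capacity
    30.0 + 0.3 * load,  # CPU temperature
]) + rng.normal(0.0, 0.2, size=(300, 4))

axis = fit_axis(stream)
collected = adaptive_collect(stream, axis)
gaps = np.diff(collected)
```

Only the one-dimensional reference drives the interval, but each collection step reads the whole row, so all four dimensions are sampled on the same adaptive schedule.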

The above description is only intended to illustrate the preferred embodiments of the present invention and is not to be construed as limiting the invention; any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within its scope of protection.

Claims

1. A social network recommendation control system for implementing a multidimensional data self-adaptive acquisition method is characterized in that the social network recommendation control system for implementing the multidimensional data self-adaptive acquisition method is used for implementing the multidimensional data self-adaptive acquisition method, and the multidimensional data self-adaptive acquisition method utilizes a one-dimensional data self-adaptive acquisition algorithm and combines a dimension reduction technology to realize the self-adaptive adjustment of multidimensional data acquisition and realize the acquisition of multidimensional data; the multidimensional data self-adaptive acquisition method mainly comprises the following processes:

step two, using one-dimensional principal components obtained by dimensionality reduction of original multidimensional data as reference data for judging data change, adjusting the acquisition frequency of the multidimensional data, and realizing self-adaptive acquisition of the multidimensional data;

realizing the self-adaptive acquisition of multi-dimensional data under the current big data scene;

(1) the target of data acquisition is $Y_i = (y_1, y_2, y_3, \ldots, y_n)$, which is multidimensional data, $y_j$ ($j = 1, 2, 3, \ldots, n$) is each dimension of the target data, where $i = t+1, t+2, t+3, \ldots, t+N_r$; wherein $t$ is a certain collection time point, and the predicted value of the data is defined as

$Y_i$ is the actual value, $N_r$ is the actual number of acquisitions, $N_p$ is the number of predicted acquisitions, and $N_p = N_r$;

$y_i = \mathrm{PCA}(Y_i)$

(4) the ratio of the mean predicted value to the mean actual value is denoted $R_M$: when $R_M$ is equal or close to 1, the data has not changed substantially; when $R_M$ is significantly greater than or less than 1, the target data value has changed greatly;

in theory, the data change ratio can also be calculated from the variance, denoted $R_D$; the calculation process is as follows:

s.t.

wherein $T_i$ represents the current sampling interval, $T_{i-1}$ represents the previous sampling interval, $T_{inc}$ represents the amount by which the sampling interval is increased, and $T_{dec}$ represents the amount by which it is reduced; $Thr_u$ and $Thr_l$ are thresholds for judging data change; when $R_M$ is greater than $Thr_l$ and less than $Thr_u$, the data change is small and the acquisition interval should be increased by $T_{inc}$; when $R_M$ is greater than $Thr_u$ or less than $Thr_l$, the data change is large and the acquisition interval should be reduced by $T_{dec}$; $T_{max}$ and $T_{min}$ represent the maximum and minimum values of the data acquisition interval, that is, the adjusted data acquisition interval does not exceed $T_{max}$ and is not less than $T_{min}$, as a constraint condition for data acquisition adjustment;

each parameter in the above formula specific calculation process should be determined based on the statistical distribution characteristics of the collected target data in the business.

2. An intrusion detection system using the multidimensional data adaptive collection method in the social network recommendation control system implementing the multidimensional data adaptive collection method according to claim 1.

3. A service quality evaluation system of the multidimensional data adaptive acquisition method in the social network recommendation control system for implementing the multidimensional data adaptive acquisition method according to claim 1.

4. A trust management system of the multidimensional data adaptive acquisition method in the social network recommendation control system for implementing the multidimensional data adaptive acquisition method as claimed in claim 1.

5. Any business application system of the multidimensional data adaptive acquisition method in the social network recommendation control system for implementing the multidimensional data adaptive acquisition method as claimed in claim 1 is applied.