WO2023179593A1 - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
WO2023179593A1
WO2023179593A1 (PCT/CN2023/082740)
Authority
WO
WIPO (PCT)
Prior art keywords
data
processed
classification
layer
category
Prior art date
Application number
PCT/CN2023/082740
Other languages
English (en)
French (fr)
Inventor
胡斌 (HU Bin)
王坚 (WANG Jian)
刘文亮 (LIU Wenliang)
李榕 (LI Rong)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023179593A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0499: Feedforward networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • the present application relates to the field of artificial intelligence technology, and in particular, to a data processing method and device.
  • the autoencoder can replace the traditional communication transceiver design, model the transmitter and receiver using neural networks, learn the distribution of data through a large number of training samples, and predict the results.
  • the training of neural networks can be achieved through the back propagation (BP) algorithm.
  • the learning process of the BP algorithm consists of a forward propagation process and a back propagation process. In the forward propagation process, the input information enters at the input layer, passes through the hidden layers, is processed layer by layer, and is transmitted to the output layer to obtain the stimulus response.
  • the back propagation process calculates the difference between the stimulus response and the corresponding expected target output as the objective function, and then calculates the partial derivative of the objective function with respect to the weights of each neuron layer by layer, forming the gradient of the objective function with respect to the weight vector, so that the weight values can be modified.
  • the learning of the neural network is completed during the weight modification process. When the error reaches the expected value, the learning of the neural network ends.
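  • as a concrete illustration of the two BP phases described above, the following is a minimal sketch of one training step for a two-layer network; the mean-squared-error objective and tanh activation are assumptions for illustration only:

```python
# Minimal sketch of one BP step: forward propagation, then layer-by-layer
# gradients of the objective with respect to the weights (assumed MSE loss).
import numpy as np

def bp_step(x, y, W1, W2, lr=0.1):
    # Forward propagation: input layer -> hidden layer -> output layer.
    h = np.tanh(W1 @ x)            # hidden-layer response
    y_hat = W2 @ h                 # output-layer "stimulus response"
    err = y_hat - y                # difference from the expected target output
    # Back propagation: partial derivatives layer by layer, forming the
    # gradient of the objective with respect to each weight matrix.
    dW2 = np.outer(err, h)
    dh = (W2.T @ err) * (1 - h ** 2)   # tanh'(a) = 1 - tanh(a)^2
    dW1 = np.outer(dh, x)
    # Weight modification step; learning ends when the error is small enough.
    return W1 - lr * dW1, W2 - lr * dW2, 0.5 * float(err @ err)
```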
  • the neural network implementation is regarded as a "black box", which cannot be widely recognized theoretically. At the same time, the vanishing-gradient or exploding-gradient problems that arise during execution of the BP algorithm have not yet been effectively solved.
  • the embodiments of the present application disclose a data processing method and device, which can reduce communication overhead, make the feedforward neural network architecture more flexible, and can explain the black box problem of neural networks.
  • the first aspect of the embodiment of the present application discloses a data processing method, which includes: determining a feedforward neural network model, where the input information of the l-th layer in the feedforward neural network model includes the classification distribution information of the training data and a first data feature;
  • the output information of the l-th layer includes a second data feature;
  • the first data feature is the output of the (l-1)-th layer, and both the first data feature and the second data feature are used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1; obtaining data to be processed whose classification or clustering information is unknown; and inputting the data to be processed into the feedforward neural network model to determine the data characteristics of the data to be processed;
  • the data characteristics of the data to be processed are used to represent the classification or clustering information of the data to be processed, and are used to determine the classification or clustering result of the data to be processed.
  • the method of the embodiment of the present application can reduce the communication overhead caused by training interaction and improve training efficiency: the receiving end only needs to additionally train a task-related readout-layer network, and the structure of the feedforward neural network is more flexible. Accuracy can be improved by increasing the number of network layers; that is, the larger the value of l, the more accurate the classification or clustering result of the data to be processed. This avoids the retraining problem caused by having to adapt different networks at the transmitting and receiving ends. Moreover, the feedforward neural network model is interpretable, so it can explain the black-box problem of neural networks, and the output data characteristics of the data to be processed can serve as data preprocessing for subsequent readout-layer operations.
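  • the layer-by-layer training just described can be sketched as follows; the closed-form parameter rule below (regularized inverses of per-class autocorrelation matrices) is an assumption standing in for the patent's elided formulas, not the exact method:

```python
# Minimal sketch of forward-only, layer-by-layer training: each layer's d x d
# parameters are computed in closed form from Z_{l-1} and pi_i; no backward pass.
import numpy as np

def train(Z, Pi, num_layers, eps=0.1, eta=0.5):
    """Z: d x m training data; Pi: list of K diagonal 0/1 class masks (m x m)."""
    d = Z.shape[0]
    params = []
    for _ in range(num_layers):                       # layer l = 1 .. L, no BP
        E = [np.linalg.inv(np.eye(d) + Z @ P @ Z.T / (eps * max(P.trace(), 1)))
             for P in Pi]                             # one d x d block per category
        params.append(E)                              # stored as fully-connected weights
        G = sum(Ei @ Z @ P for Ei, P in zip(E, Pi))   # gradient-like update term
        Z = Z + eta * G                               # second data feature Z_l
        Z /= np.linalg.norm(Z, axis=0, keepdims=True) # unit-sphere constraint
    return params, Z
```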
  • the dimensions of the data characteristics of the data to be processed are related to the data type of the data to be processed.
  • the input information of the first layer includes the classification distribution information of the training data and the training data.
  • the training data includes classification labels; the classification distribution information of the training data is determined based on the classification labels in the training data.
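  • for illustration, the derivation of the classification distribution information from the classification labels can be sketched as one diagonal 0/1 membership matrix per category; the concrete encoding is an assumption, since the patent leaves it open:

```python
# Sketch: classification distribution information as per-category diagonal masks.
import numpy as np

def class_masks(labels, K):
    """labels: length-m integer classification labels in {0, ..., K-1}."""
    return [np.diag((labels == i).astype(float)) for i in range(K)]

labels = np.array([0, 1, 0, 2])
Pi = class_masks(labels, K=3)
assert Pi[0].trace() == 2.0   # trace of mask i recovers m_i, the count of category i
```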
  • determining the feedforward neural network model includes: obtaining the first data feature Z_{l-1}; determining the network parameters of the l-th layer according to the first data feature Z_{l-1} and the classification distribution information π_i of the training data; and determining the second data feature based on the first data feature Z_{l-1} and the network parameters of the l-th layer.
  • determining the second data feature based on the first data feature Z_{l-1} and the network parameters of the l-th layer includes: determining the gradient expression of the objective function based on the network parameters of the l-th layer and the first data feature Z_{l-1}; and determining the second data feature Z_l based on the first data feature Z_{l-1}, the classification distribution information π_i of the training data, and the gradient expression of the objective function.
  • determining the network parameters of the l-th layer according to the first data feature Z_{l-1} and the classification distribution information π_i of the training data includes: determining, according to the first data feature Z_{l-1} and the classification distribution information π_i of the training data, the regularized autocorrelation matrix of each category corresponding to the classification labels in the training data; and determining the network parameters of the l-th layer according to the regularized autocorrelation matrix of each category corresponding to the classification labels in the training data.
  • m_i is the number of classification labels corresponding to the i-th category in the m training data
  • K is the number of all categories of classification labels in the m training data
  • Z_{l-1} is the first data feature
  • π_i is the classification distribution information of the training data
  • S_i is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data
  • is the regularization parameter
  • m_i is the number of classification labels corresponding to the i-th category in the m training data
  • K is the number of all categories of classification labels in the m training data
  • Z_{l-1} is the first data feature
  • π_i is the classification distribution information of the training data
  • I is the identity matrix
  • S_i is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data
  • is the regularization parameter
  • S is the autocorrelation matrix of all categories of the classification labels in the training data
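  • the formulas themselves are rendered as images in this text; to make the quantities named above concrete, the sketch below assumes a common ridge-style form, S_i = Z_{l-1} diag(π_i) Z_{l-1}^T / m_i with a regularized inverse:

```python
# Sketch of the autocorrelation matrices named above (regularized form assumed).
import numpy as np

def autocorrelation_matrices(Z, Pi, eps=0.1):
    """Z: d x m first data feature Z_{l-1}; Pi: diagonal class masks; eps: regularizer."""
    d, m = Z.shape
    S = Z @ Z.T / m                                       # all categories
    S_i = [Z @ P @ Z.T / max(P.trace(), 1) for P in Pi]   # category i over m_i samples
    S_i_reg = [np.linalg.inv(np.eye(d) + Si / eps) for Si in S_i]  # layer parameters
    return S, S_i, S_i_reg
```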
  • determining the network parameters of the l-th layer according to the first data feature Z_{l-1} and the classification distribution information π_i of the training data includes: determining gradient parameters according to the classification distribution information π_i of the training data; and determining the network parameters of the l-th layer according to the first data feature Z_{l-1} and the gradient parameters.
  • inputting the data to be processed into the feedforward neural network model to determine the data characteristics of the data to be processed includes: determining, based on the data to be processed and the network parameters of the l-th layer, the classification distribution information corresponding to the predicted classification label of the data to be processed; determining the gradient expression of the objective function based on the data to be processed and the classification distribution information corresponding to the predicted classification label of the data to be processed; and determining the data characteristics of the data to be processed according to the data to be processed and the gradient expression of the objective function.
  • determining the classification distribution information corresponding to the predicted classification label of the data to be processed based on the data to be processed and the network parameters of the l-th layer includes: determining, based on the data to be processed and the network parameters of the l-th layer, the projection of the predicted classification label of the data to be processed on a first category, the first category being any one of the multiple categories corresponding to the predicted classification label of the data to be processed; and determining the classification distribution information corresponding to the predicted classification label of the data to be processed according to the projection of the data to be processed on the first category.
  • Z is the data to be processed, is the network parameter of the i-th category of the l-th layer, is the projection of the predicted classification label of the data to be processed on the i-th category of the l-th layer, is the classification distribution information corresponding to the predicted classification label of the data to be processed, and η is a hyperparameter that controls the estimation confidence.
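  • this estimation step can be sketched as follows, assuming (hypothetically) that the projection is the per-category parameter block applied to Z and that a softmax over projection norms, sharpened by η, yields the classification distribution:

```python
# Sketch: estimate the classification distribution of unlabeled data from the
# per-category layer parameters E (list of K d x d blocks); eta is the
# estimation-confidence hyperparameter named above.
import numpy as np

def estimate_distribution(Z, E, eta=10.0):
    norms = np.stack([np.linalg.norm(Ei @ Z, axis=0) for Ei in E])  # K x m
    logits = -eta * norms                    # larger eta -> more confident estimate
    p = np.exp(logits - logits.max(axis=0))  # numerically stable softmax
    return p / p.sum(axis=0)                 # column j: distribution over K categories
```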
  • the objective function gradient expression includes:
  • m_i is the number of the m data to be processed whose predicted classification label is of the i-th category
  • K is the number of all categories of the predicted classification labels in the m data to be processed
  • Z is the data to be processed
  • S_i is the autocorrelation matrix of the i-th category corresponding to the predicted classification label of the data to be processed
  • is the regularized autocorrelation matrix of the i-th category corresponding to the predicted classification label of the data to be processed.
  • Z is the data to be processed, is the network parameter of the i-th category of the l-th layer, is the projection of the predicted classification label of the data to be processed on the i-th category of the l-th layer, is the classification distribution information corresponding to the predicted classification label of the data to be processed, and η is a hyperparameter that controls the estimation confidence.
  • the objective function gradient expression includes:
  • m_i is the number of the m data to be processed whose predicted classification label is of the i-th category
  • γ_i is a weight parameter used to balance the number of samples of each category in the data to be processed
  • Z is the data to be processed
  • S_i is the autocorrelation matrix of the i-th category corresponding to the predicted classification label of the data to be processed
  • S is the autocorrelation matrix of all categories corresponding to the predicted classification label of the data to be processed
  • the classification distribution information corresponding to the predicted classification label of the data to be processed includes one or more of the following: distance information, correlation information, difference information, or soft classification information.
  • determining the classification distribution information corresponding to the predicted classification label of the data to be processed based on the data to be processed and the network parameters of the l-th layer includes:
  • Z is the data to be processed, is the classification distribution information corresponding to the predicted classification label of the data to be processed, and is the network parameter of the i-th category of the l-th layer
  • Z_l is the data feature of the l-th layer of the data to be processed
  • Z_{l-1} is the data feature of the (l-1)-th layer of the data to be processed
  • <·,·> denotes the inner product.
  • determining the objective function gradient expression based on the data to be processed and the classification distribution information corresponding to the predicted classification label of the data to be processed includes: determining gradient parameters (G and H_i) based on the classification distribution information corresponding to the predicted classification label of the data to be processed; and determining the objective function gradient expression according to the data to be processed and the gradient parameters.
  • determining the gradient parameters according to the classification distribution information corresponding to the predicted classification label of the data to be processed includes:
  • G = [g_1, g_2, ..., g_i];
  • Tr(·) represents the trace operation
  • I is the identity matrix
  • m_i is the number of the i-th category in the m data to be processed
  • K is the number of all categories of the predicted classification labels in the m data to be processed
  • G and H_i represent the gradient parameters.
  • the gradient expression of the objective function includes:
  • Z is the data to be processed
  • is the Gaussian distribution variance
  • is the regularization parameter
  • I is the identity matrix
  • G and H_i represent the gradient parameters
  • represents the regularization parameter.
  • determining the data characteristics of the data to be processed based on the data to be processed and the gradient expression of the objective function includes:
  • Z_l is the data characteristic of the data to be processed, is the gradient expression of the objective function, Z_{l-1} is the data to be processed, and Z_{l-1} is constrained to the (d-1)-dimensional unit sphere space.
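  • the update itself can be sketched as follows, assuming the sphere constraint is imposed by column normalization (the gradient expression comes from the elided formulas above):

```python
# Sketch: move Z_{l-1} along the objective-function gradient, then re-impose
# the (d-1)-dimensional unit-sphere constraint (column normalization assumed).
import numpy as np

def next_feature(Z_prev, grad, eta=0.5):
    Z = Z_prev + eta * grad                               # gradient step
    return Z / np.linalg.norm(Z, axis=0, keepdims=True)   # back onto the sphere
```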
  • the method further includes: outputting the data characteristics of the data to be processed.
  • the data to be processed whose classification or clustering information is unknown is the data characteristic of third data
  • the data characteristic of the third data is determined through another feedforward neural network
  • the input information of the l-th layer in the other feedforward neural network includes the classification distribution information of the training data and the first data feature
  • the output information of the l-th layer includes the second data feature
  • the first data feature is the output of the (l-1)-th layer
  • the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1.
  • the second aspect of the embodiment of the present application discloses a data processing device, which includes: a first determination unit, an acquisition unit and a second determination unit.
  • the first determination unit is used to determine a feedforward neural network model, where the input information of the l-th layer in the feedforward neural network model includes the classification distribution information of the training data and a first data feature, the output information of the l-th layer includes a second data feature, the first data feature is the output of the (l-1)-th layer, and both the first data feature and the second data feature are used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1; the acquisition unit is used to obtain data to be processed whose classification or clustering information is unknown; the second determination unit is used to input the data to be processed into the feedforward neural network model to determine the data characteristics of the data to be processed; the data characteristics of the data to be processed are used to represent the classification or clustering information of the data to be processed and to determine the classification or clustering result of the data to be processed.
  • the dimensions of the data characteristics of the data to be processed are related to the data type of the data to be processed.
  • the input information of the first layer includes the classification distribution information of the training data and the training data.
  • the training data includes classification labels; the classification distribution information of the training data is determined based on the classification labels in the training data.
  • the first determination unit is specifically configured to: obtain the first data feature Z_{l-1}; determine the network parameters of the l-th layer according to the first data feature Z_{l-1} and the classification distribution information π_i of the training data; and determine the second data feature based on the first data feature Z_{l-1} and the network parameters of the l-th layer.
  • the first determination unit is specifically configured to determine the gradient expression of the objective function according to the network parameters of the l-th layer and the first data feature Z_{l-1}, and to determine the second data feature Z_l according to the first data feature Z_{l-1}, the classification distribution information π_i of the training data, and the gradient expression of the objective function.
  • the first determination unit is specifically configured to determine, based on the first data feature Z_{l-1} and the classification distribution information π_i of the training data, the regularized autocorrelation matrix of each category corresponding to the classification labels in the training data, and to determine the network parameters of the l-th layer according to the regularized autocorrelation matrix of each category corresponding to the classification labels in the training data.
  • m_i is the number of classification labels corresponding to the i-th category in the m training data
  • K is the number of all categories of classification labels in the m training data
  • Z_{l-1} is the first data feature
  • π_i is the classification distribution information of the training data
  • S_i is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data
  • is the regularization parameter
  • the regularized autocorrelation matrix of the i-th category corresponding to the classification label is the network parameter of the i-th category in the l-th layer.
  • m_i is the number of classification labels corresponding to the i-th category in the m training data
  • K is the number of all categories of classification labels in the m training data
  • Z_{l-1} is the first data feature
  • π_i is the classification distribution information of the training data
  • I is the identity matrix
  • S_i is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data
  • is the regularization parameter
  • S is the autocorrelation matrix of all categories of the classification labels in the training data
  • the first determination unit is specifically configured to determine the gradient parameters according to the classification distribution information π_i of the training data, and to determine the network parameters of the l-th layer according to the first data feature Z_{l-1} and the gradient parameters.
  • the second determination unit is specifically configured to: determine, according to the data to be processed and the network parameters of the l-th layer, the classification distribution information corresponding to the predicted classification label of the data to be processed; determine the gradient expression of the objective function according to the data to be processed and the classification distribution information corresponding to the predicted classification label of the data to be processed; and determine the data characteristics of the data to be processed according to the data to be processed and the gradient expression of the objective function.
  • the second determination unit is specifically configured to determine, according to the data to be processed and the network parameters of the l-th layer, the projection of the predicted classification label of the data to be processed on a first category, the first category being any one of the multiple categories corresponding to the predicted classification label of the data to be processed, and to determine the classification distribution information corresponding to the predicted classification label of the data to be processed according to the projection of the data to be processed on the first category.
  • Z is the data to be processed, is the network parameter of the i-th category of the l-th layer, is the projection of the predicted classification label of the data to be processed on the i-th category of the l-th layer, is the classification distribution information corresponding to the predicted classification label of the data to be processed, and η is a hyperparameter that controls the estimation confidence.
  • the objective function gradient expression includes:
  • m_i is the number of the m data to be processed whose predicted classification label is of the i-th category
  • K is the number of all categories of the predicted classification labels in the m data to be processed
  • Z is the data to be processed
  • S_i is the autocorrelation matrix of the i-th category corresponding to the predicted classification label of the data to be processed
  • is the regularized autocorrelation matrix of the i-th category corresponding to the predicted classification label of the data to be processed.
  • Z is the data to be processed, is the network parameter of the i-th category of the l-th layer, is the projection of the predicted classification label of the data to be processed on the i-th category of the l-th layer, is the classification distribution information corresponding to the predicted classification label of the data to be processed, and η is a hyperparameter that controls the estimation confidence.
  • the objective function gradient expression includes:
  • m_i is the number of the m data to be processed whose predicted classification label is of the i-th category
  • γ_i is a weight parameter used to balance the number of samples of each category in the data to be processed
  • Z is the data to be processed
  • S_i is the autocorrelation matrix of the i-th category corresponding to the predicted classification label of the data to be processed
  • S is the autocorrelation matrix of all categories corresponding to the predicted classification label of the data to be processed
  • the classification distribution information corresponding to the predicted classification label of the data to be processed includes one or more of the following: distance information, correlation information, difference information, or soft classification information.
  • Z is the data to be processed, is the classification distribution information corresponding to the predicted classification label of the data to be processed, and is the network parameter of the i-th category of the l-th layer
  • Z_l is the data feature of the l-th layer of the data to be processed
  • Z_{l-1} is the data feature of the (l-1)-th layer of the data to be processed
  • <·,·> denotes the inner product.
  • the second determination unit is specifically configured to determine the gradient parameters (G and H_i) based on the classification distribution information corresponding to the predicted classification label of the data to be processed, and to determine the objective function gradient expression according to the data to be processed and the gradient parameters, including:
  • G = [g_1, g_2, ..., g_i];
  • Tr(·) represents the trace operation
  • I is the identity matrix
  • m_i is the number of the i-th category in the m data to be processed
  • K is the number of all categories of the predicted classification labels in the m data to be processed
  • G and H_i represent the gradient parameters.
  • the gradient expression of the objective function includes:
  • Z is the data to be processed
  • is the Gaussian distribution variance
  • is the regularization parameter
  • I is the identity matrix
  • G and H_i represent the gradient parameters
  • represents the regularization parameter.
  • Z_l is the data characteristic of the data to be processed, is the gradient expression of the objective function, Z_{l-1} is the data to be processed, and Z_{l-1} is constrained to the (d-1)-dimensional unit sphere space.
  • the data processing device further includes an output unit configured to output the data characteristics of the data to be processed.
  • the data to be processed whose classification or clustering information is unknown is the data characteristic of third data
  • the data characteristic of the third data is determined through another feedforward neural network
  • the input information of the l-th layer in the other feedforward neural network includes the classification distribution information of the training data and the first data feature
  • the output information of the l-th layer includes the second data feature
  • the first data feature is the output of the (l-1)-th layer
  • the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1.
  • the third aspect of the embodiment of the present application discloses a data processing device.
  • the device includes at least one processor and a communication interface.
  • the at least one processor calls a computer program or instructions stored in a memory to implement the method described in any of the above aspects.
  • the fourth aspect of the embodiments of the present application discloses a chip system.
  • the chip system includes at least one processor and a communication interface.
  • the at least one processor is used to execute computer programs or instructions to implement the method described in any of the above aspects.
  • the fifth aspect of the embodiment of the present application discloses a computer-readable storage medium.
  • Computer instructions are stored in the computer-readable storage medium; when the computer instructions are run on a processor, the method described in any of the above aspects is implemented.
  • a sixth aspect of the embodiment of the present application discloses a computer program product.
  • the computer program product includes computer program code.
  • when the computer program code is run, the method described in any of the above aspects is implemented.
  • the seventh aspect of the embodiment of the present application discloses a data processing system, which includes: the device described in the second aspect.
  • Figure 1 is an end-to-end communication network architecture diagram provided by an embodiment of the present application.
  • Figure 2 is a schematic structural diagram of an artificial intelligence main body framework provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • Figure 4 is a schematic structural diagram of a feedforward neural network provided by an embodiment of the present application.
  • Figure 4A is a schematic diagram of the first layer calculation process in the feedforward neural network during the training process provided by the embodiment of the present application;
  • Figure 4B is a schematic diagram of the second layer calculation process in the feedforward neural network during the training process provided by the embodiment of the present application;
  • Figure 4C is a schematic diagram of the third layer calculation process in the feedforward neural network during the training process provided by the embodiment of the present application;
  • Figure 4D is a schematic diagram of the first layer calculation process in the feedforward neural network during the deduction process provided by the embodiment of the present application;
  • Figure 4E is a schematic diagram of the second layer calculation process in the feedforward neural network during the deduction process provided by the embodiment of the present application;
  • Figure 4F is a schematic diagram of the third layer calculation process in the feedforward neural network during the deduction process provided by the embodiment of the present application.
  • Figure 5 is a schematic diagram of a chip hardware structure provided by an embodiment of the present application.
  • Figure 6 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of the calculation process of the l-th layer in the feedforward neural network model during the training process provided by the embodiment of the present application;
  • Figure 8 is a schematic diagram of the calculation process of the l-th layer in the feedforward neural network model during the training process provided by the embodiment of the present application;
  • Figure 9 is a schematic diagram of the calculation process of the l-th layer in the feedforward neural network model during the training process provided by the embodiment of the present application.
  • Figure 10 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • Figure 11 is a schematic diagram of the calculation process of the l-th layer in the feedforward neural network model during the deduction process provided by the embodiment of the present application;
  • Figure 12 is a schematic diagram of the calculation process of the l-th layer in the feedforward neural network model during the deduction process provided by the embodiment of the present application;
  • Figure 13 is a schematic diagram of the calculation process of the l-th layer in the feedforward neural network model during the deduction process provided by the embodiment of the present application;
  • Figure 14 is a schematic diagram of a multi-view scene provided by an embodiment of the present application.
  • Figure 15 is a schematic diagram of a multi-node scenario provided by an embodiment of the present application.
  • Figure 16A is a schematic diagram of the results obtained from training using formula (1) as the objective function provided by the embodiment of the present application;
  • Figure 16B is a schematic diagram of the results obtained by training with MSE as the objective function provided by the embodiment of the present application.
  • Figure 17 is a schematic diagram of a data processing device provided by an embodiment of the present application.
  • Figure 18 is a schematic diagram of a data processing device provided by an embodiment of the present application.
  • any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application shall not be construed as preferable or advantageous over other embodiments or designs. Rather, the words "exemplary" or "for example" are intended to present the relevant concepts in a concrete manner.
  • "A and/or B" means A alone, B alone, or both A and B.
  • "A, and/or B, and/or C" means any one of A, B, and C, or any two of A, B, and C, or all of A, B, and C.
  • the autoencoder can replace the traditional communication transceiver design: the transmitter and receiver are modeled using neural networks, the distribution of the data is learned through a large number of training samples, and the results are predicted, as shown in Figure 1.
  • Figure 1 shows an end-to-end communication network architecture; this end-to-end learning method can achieve joint optimization.
  • the training of neural networks can be achieved through the back propagation (BP) algorithm.
  • the learning process of the BP algorithm consists of a forward propagation process and a back propagation process. In the forward propagation process, the input information enters at the input layer, passes through the hidden layers, is processed layer by layer, and is transmitted to the output layer.
  • the BP algorithm requires that the activation functions of artificial neurons (or "nodes") are differentiable. However, in the BP algorithm, there is no corresponding theoretical guidance for the selection of the number of network layers and neurons, and when the network structure is changed, retraining is required.
  • the neural network implementation is regarded as a "black box", which cannot be widely recognized theoretically.
  • the vanishing-gradient or exploding-gradient problems that arise during execution of the BP algorithm have not yet been effectively solved.
  • the channel is usually used as a hidden layer in the network, and the channel must then be differentiable; this condition may not be met by channels in actual scenarios.
  • a neural network based on random features can be an extreme learning machine (ELM).
  • ELM is a typical feedforward neural network learning algorithm.
  • the network usually has one or more hidden layers of nodes, in which the parameters of the hidden-layer nodes do not need to be adjusted, and the weights from the hidden layer to the output layer only need to be determined by solving a linear system of equations, so the calculation speed can be improved.
  • the algorithm has good generalization performance, and its learning speed is 1,000 times faster than training with the BP algorithm. However, in order to obtain a sufficient number of features to characterize the original data, a wider hidden layer is usually required.
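  • a minimal ELM sketch follows (illustrative only; the hidden width, activation, and least-squares solver are standard choices, not taken from this patent):

```python
# Extreme learning machine sketch: random, untrained hidden-layer weights;
# only the hidden-to-output weights are solved, as a linear system.
import numpy as np

def elm_fit(X, Y, n_hidden=256, seed=0):
    """X: m x d inputs, Y: m x K one-hot targets."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # fixed random projection
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                           # hidden-layer features
    beta = np.linalg.pinv(H) @ Y                     # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```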
  • the neural network based on metric representation can be a neural network training method based on the Hilbert-Schmidt independence criterion (HSIC).
  • this method is trained using an approximate information-bottleneck approach: it maximizes the mutual information between the hidden layer and the label while minimizing the dependence between the hidden-layer representation and the input.
  • the calculation of mutual information between random variables is difficult, so the HSIC criterion based on the non-parametric kernel method is used, which is more complex than BP-based algorithms.
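  • for reference, the biased empirical HSIC estimate with Gaussian kernels, HSIC(X, Y) = Tr(K H L H)/(m-1)^2, can be sketched as follows (kernel choice and bandwidth are illustrative assumptions):

```python
# Sketch of the biased empirical HSIC estimate, a kernel dependence measure.
import numpy as np

def gaussian_kernel(X, sigma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    return np.exp(-sq / (2 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    m = X.shape[0]
    K, L = gaussian_kernel(X, sigma), gaussian_kernel(Y, sigma)
    H = np.eye(m) - np.ones((m, m)) / m                   # centering matrix
    return np.trace(K @ H @ L @ H) / (m - 1) ** 2
```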
  • embodiments of the present application propose a data processing method and provide a feedforward neural network model, thereby reducing the transceiver communication overhead caused by BP-algorithm training interaction and improving training efficiency.
  • the number of network layers can be adjusted to improve training accuracy, avoiding the retraining problem caused by adapting different networks at different transceivers.
  • Figure 2 shows a schematic structural diagram of the main artificial intelligence framework.
  • the artificial intelligence framework is elaborated below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensation process of "data-information-knowledge-wisdom".
  • the "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (providing and processing technology implementations) to the systemic industrial ecological process.
  • Infrastructure provides computing power support for artificial intelligence systems, enables communication with the external world, and supports it through basic platforms.
  • computing power is provided by smart chips (hardware acceleration chips such as CPU, GPU, NPU, ASIC, FPGA, etc.);
  • the basic platform includes distributed computing frameworks, networks, and other related platform guarantees and support, and can include cloud storage and computing, interconnection networks, etc.
  • sensors communicate with the outside world to obtain data, which are provided to smart chips in the distributed computing system provided by the basic platform for calculation.
  • Data at the layer above the infrastructure indicates the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as IoT data from traditional equipment, including business data of existing systems and sensing data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can perform symbolic and formal intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formal information to perform machine thinking and problem solving based on reasoning control strategies. Typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • some general capabilities can be formed based on the results of further data processing, such as algorithms or a general system, for example translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
  • Intelligent industries and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. Application fields mainly include intelligent terminals, intelligent transportation, smart healthcare, autonomous driving, safe cities, etc.
  • the embodiments of this application are mainly used in driving assistance, autonomous driving, mobile phone terminals and other fields.
  • Application scenario 1 Advanced driver assistance system (ADAS)/autonomous driving solution (ADS)
  • in ADAS and ADS, multiple types of 2D target detection need to be performed in real time, including: dynamic obstacles (Pedestrian, Cyclist, Tricycle, Car, Truck, Bus), static obstacles (TrafficCone, TrafficStick, FireHydrant, Motocycle, Bicycle), traffic signs (TrafficSign, GuideSign, Billboard), traffic lights (TrafficLight_Red/TrafficLight_Yellow/TrafficLight_Green/TrafficLight_Black), and road signs (RoadSign).
  • Application scenario 2 Image classification scenario
  • after acquiring the image to be classified, the object recognition device processes the objects in the image through the classification model trained with the data processing method of the embodiment of the present application to obtain the category of the image, and the image can then be classified according to the category of the objects it contains.
  • for example, photographers take many photos every day, of animals, of people, and of plants; with the method of this application, photos can be quickly classified according to their content into photos containing animals, photos containing people, and photos containing plants.
  • Application scenario 3 Commodity classification
  • after the object recognition device obtains an image of a commodity, it processes the image using the classification model trained with the data processing method of the embodiment of the present application to obtain the category of the commodity in the image, and then classifies the commodity according to that category.
  • the method of this application can be used to quickly complete the classification of goods, reducing time overhead and labor costs.
  • Application scenario 4 Face verification
  • the camera captures a face image, features are extracted using the method in the embodiment of this application, and the similarity with the stored ID-document image features is calculated; if the similarity is high, the verification succeeds.
  • the method of this application can be used to quickly perform face verification.
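  • the comparison step can be sketched as a cosine similarity against a threshold (the threshold value is an assumption for illustration, not specified by the patent):

```python
# Sketch: compare the extracted face feature with the stored ID-document feature.
import numpy as np

def verify(feat_live, feat_id, threshold=0.8):
    cos = feat_live @ feat_id / (np.linalg.norm(feat_live) * np.linalg.norm(feat_id))
    return cos >= threshold   # True: verification succeeds
```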
  • Application scenario 5 Simultaneous interpretation
  • feedforward neural networks are also a common recognition model. In scenarios that require simultaneous interpretation, real-time speech recognition and translation must be achieved; an efficient feedforward neural network can bring a better experience to the translator.
  • the feedforward neural network model trained in the embodiment of this application can realize the above functions.
  • Figure 3 shows a system architecture 100 provided by an embodiment of the present application.
  • the data collection device 160 is used to collect or generate training data.
  • the training data includes: multiple labeled images or multiple voice clips, etc.; and the training data is stored in the database 130.
  • the training device 120 trains a feedforward neural network model based on the training data maintained in the database 130.
  • the input information of the l-th layer in the feedforward neural network model includes the classification distribution information of the training data and the first data feature.
  • the output information of the l-th layer includes the second data feature.
  • the first data feature is the output of the (l-1)-th layer, and both the first data feature and the second data feature are used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1.
  • the trained feedforward neural network model can be used to implement the data processing method provided by the embodiments of the present application.
  • the training data maintained in the database 130 may not all be collected by the data collection device 160, and may also be received from other devices.
  • the training device 120 does not necessarily perform feedforward neural network model training based solely on the training data maintained by the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be construed as a limitation on the embodiments of this application.
  • the target model/rules 101 trained according to the training device 120 can be applied to different systems or devices, such as to the execution device 110 shown in Figure 3.
  • the execution device 110 can be a terminal, such as a mobile phone terminal, a tablet computer, a laptop, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, etc., or a server or a cloud, etc.
  • the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices.
  • the user can input data to the I/O interface 112 through the client device 140.
  • the input data may include: images to be recognized, videos, or voice clips to be recognized.
  • when the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation and other related processing (such as implementing the function of the feedforward neural network in this application), the execution device 110 can call the data, code, etc. in the data storage system 170 for the corresponding processing, and the data, instructions, etc. obtained by that processing can also be stored in the data storage system 170.
  • the I/O interface 112 returns the processing results, such as recognition or classification results of images, videos, or voices, to the client device 140, so that the client device 140 can provide them to the user device 150.
  • the user device 150 may be a lightweight terminal that needs to use the target model/rule 101, such as a mobile phone terminal, a laptop, an AR/VR terminal, or a vehicle-mounted terminal, etc., to respond to the corresponding needs of end users, for example performing image recognition on an image input by an end user and outputting the recognition result to that end user, or performing text classification on text input by an end user and outputting the classification result to that end user.
  • the training device 120 can generate corresponding target models/rules 101 based on different training data for different goals or different tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results.
  • the user can manually enter the input data, and the manual input can be operated through the interface provided by the I/O interface 112 .
  • the client device 140 can automatically send input data to the I/O interface 112. If automatically sending input data requires the user's authorization, the user can set the corresponding permission in the client device 140.
  • the user can view the results output by the execution device 110 on the client device 140, and the specific presentation form may be display, sound, action, etc.
  • the client device 140 can also be used as a data collection end to collect the input data of the I/O interface 112 and the output results of the I/O interface 112 as new sample data and store them in the database 130.
  • alternatively, as shown in the figure, the I/O interface 112 can directly store the input data received at the I/O interface 112 and its output results as new sample data in the database 130.
  • the client device 140 can transmit the result to the user device 150.
  • the user device 150 can be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, AR/VR, a vehicle-mounted terminal, etc.
  • user device 150 can run target model/rules 101 to implement specific functionality.
  • Figure 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • for example, in Figure 3, the data storage system 170 is an external memory relative to the execution device 110; in other cases, the data storage system 170 can also be placed in the execution device 110.
  • the target model/rule 101 is obtained through training by the training device 120.
  • the target model/rule 101 can be the classification model in application scenario 2 and application scenario 3, the image recognition model in application scenario 4, and the speech recognition model in application scenario 5.
  • the target model/rule 101 provided by the embodiment of this application is, for example, an image recognition model; another example is a speech recognition model, etc.
  • both the image recognition model and the speech recognition model can be feedforward neural network models.
  • the following is a schematic structural diagram of a feedforward neural network provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a feedforward neural network 400 provided by an embodiment of the present application.
  • the feedforward neural network 400 may be called an interpretable feedforward neural network, and the feedforward neural network 400 may include an input layer 410 , an intermediate layer 420 , and an output layer 430 .
  • the input layer 410 can obtain the data to be processed and transfer it to the intermediate layer 420 for processing, so as to obtain the data characteristics of the data to be processed; the data characteristics of the data to be processed are used to determine the processing result of the data to be processed.
  • for example, the data characteristics of the data to be processed are used to determine the classification or clustering result of the data to be processed.
  • the number of layers included in the intermediate layer 420 is not limited here; the larger the number of layers, the more accurate the classification or clustering result of the data to be processed.
  • the output layer 430 can be used to output the data characteristics of the data to be processed obtained by the intermediate layer 420
  • as an example, the intermediate layer 420 may include layers 421 to 423, where layers 421 to 423 may be referred to as layer 1, layer 2, and layer 3, respectively.
  • the working principles of layer 1, layer 2, and layer 3 are described below from two aspects, the training process and the deduction process:
  • the training process is as follows:
  • the specific calculation process in the first layer is shown in Figure 4A.
  • the input information of layer 1 includes the training data and the classification distribution information of the training data.
  • the training data includes classification labels, and the classification distribution information of the training data is determined based on the classification labels of the training data; the network parameters of layer 1 are then determined based on the training data and the classification distribution information π_i of the training data.
  • the gradient expression of the objective function is determined based on the network parameters of layer 1 and the training data, and the output information of layer 1 is determined based on the training data, the classification distribution information π_i of the training data, and the gradient expression of the objective function.
  • the output information of layer 1 includes the data feature Z_1 of the training data.
  • the specific calculation process in the second layer is shown in Figure 4B.
  • the input information of layer 2 includes the classification distribution information of the training data and the output information of layer 1, that is, the data feature Z_1 of the training data.
  • the network parameters of layer 2 are determined based on the classification distribution information π_i and the data feature Z_1 of the training data.
  • the gradient expression of the objective function is determined based on the network parameters of layer 2 and the data feature Z_1 of the training data, and the output information of layer 2 is determined based on the data feature Z_1, the classification distribution information π_i of the training data, and the gradient expression of the objective function.
  • the output information of layer 2 includes the data feature Z_2 of layer 2.
  • the specific calculation process in layer 3 is shown in Figure 4C.
  • the input information of layer 3 includes the classification distribution information of the training data and the output information of layer 2, that is, the data feature Z_2 of layer 2.
  • the network parameters of layer 3 are determined based on the classification distribution information π_i and the data feature Z_2 of layer 2.
  • the gradient expression of the objective function is determined based on the network parameters of layer 3 and the data feature Z_2 of layer 2, and the output information of layer 3 is determined based on the data feature Z_2 of layer 2, the classification distribution information π_i of the training data, and the gradient expression of the objective function.
  • the output information of layer 3 includes the data feature Z_3 of layer 3.
  • the network parameters of each layer are stored as d × d fully-connected layer parameters to obtain the trained feedforward neural network model.
  • the deduction process is as follows:
  • the specific calculation process in layer 1 is shown in Figure 4D.
  • the input information of layer 1 includes the data to be processed. According to the data to be processed and the network parameters of layer 1, the classification distribution information corresponding to the predicted classification label of the data to be processed is determined; then, according to the data to be processed and that classification distribution information, the gradient expression of the objective function is determined, and the output information of layer 1 is determined based on the data to be processed and the gradient expression of the objective function.
  • the output information of layer 1 includes the data feature Z_1 of layer 1.
  • the specific calculation process in layer 2 is shown in Figure 4E.
  • the input information of layer 2 includes the output information of layer 1, that is, the data feature Z_1 of layer 1.
  • according to the data feature Z_1 of layer 1 and the network parameters of layer 2, the classification distribution information corresponding to the predicted classification label of the data to be processed is determined; then, based on the data feature Z_1 of layer 1 and that classification distribution information, the objective function gradient expression is determined, and the output information of layer 2 is determined based on the data feature Z_1 of layer 1 and the objective function gradient expression.
  • the output information of layer 2 includes the data feature Z_2 of layer 2.
  • The input information of the third layer includes the output information of the second layer, that is, the data feature Z_2 of the second layer.
  • The network parameters of the third layer determine the classification distribution information corresponding to the expected classification label of the data to be processed.
  • The gradient expression of the objective function is determined according to the data feature Z_2 of the second layer and the classification distribution information corresponding to the expected classification label of the data to be processed, and the output information of the third layer is determined based on the data feature Z_2 of the second layer and the gradient expression of the objective function.
  • The output information of the third layer includes the data feature Z_3 of the third layer.
  • The data feature Z_3 of the third layer can be called the data feature of the data to be processed.
  • Figure 5 is a chip hardware structure provided by an embodiment of the present application.
  • the chip includes an artificial intelligence processor 50.
  • the chip can be disposed in the execution device 110 as shown in FIG. 3 to complete the calculation work of the calculation module 111.
  • The chip can also be installed in the training device 120 as shown in Figure 3 to complete the training work of the training device 120 and output the target model/rule 101.
  • The algorithms of each layer in the feedforward neural network shown in Figure 4 can be implemented in the chip shown in Figure 5.
  • The artificial intelligence processor 50 can be a neural network processor (network processing unit, NPU), a tensor processing unit (TPU), a graphics processor (graphics processing unit, GPU), or any other processor suitable for large-scale exclusive-OR (XOR) operation processing.
  • NPU can be mounted to the main CPU (Host CPU) as a co-processor, and the main CPU assigns tasks to it.
  • the core part of the NPU is the arithmetic circuit 503.
  • the arithmetic circuit 503 is controlled by the controller 504 to extract data in the memory (weight memory or input memory) and perform operations.
  • the computing circuit 503 internally includes multiple processing engines (PEs).
  • arithmetic circuit 503 is a two-dimensional systolic array.
  • the arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuit capable of performing numerical operations such as multiplication and addition.
  • arithmetic circuit 503 is a general-purpose matrix processor.
  • the operation circuit 503 obtains the corresponding data of matrix B from the weight memory 502 and caches it on each PE in the operation circuit 503 .
  • The operation circuit 503 obtains the input data of matrix A from the input memory 501, performs a matrix operation based on the input data of matrix A and the weight data of matrix B, and stores the partial result or final result of the matrix in an accumulator 508, as sketched below.
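To make the described dataflow concrete, the following sketch models it numerically; it is an illustration of the accumulation scheme only, not the chip's microcode, and all names are ours.

```python
import numpy as np

def systolic_matmul(A, B, tile=4):
    """Model of the dataflow above: tiles of matrix B (from the weight memory) are
    held on the PEs while tiles of matrix A (from the input memory) stream through;
    partial products accumulate in an accumulator."""
    m, k = A.shape
    _, n = B.shape
    acc = np.zeros((m, n))                           # plays the role of accumulator 508
    for t in range(0, k, tile):
        acc += A[:, t:t + tile] @ B[t:t + tile, :]   # partial result accumulated
    return acc                                       # final result

A = np.random.randn(8, 16)
B = np.random.randn(16, 8)
assert np.allclose(systolic_matmul(A, B), A @ B)
```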
  • the unified memory 506 is used to store input data and output data.
  • the weight data is directly transferred to the weight memory 502 through the storage unit access controller (direct memory access controller, DMAC) 505.
  • Input data is also transferred to unified memory 506 via DMAC.
  • The bus interface unit (bus interface unit, BIU) 510 is used for the interaction between the DMAC and the instruction fetch buffer 509; the bus interface unit 510 is also used for the instruction fetch buffer 509 to obtain instructions from the external memory; the bus interface unit 510 is further used for the storage unit access controller 505 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • The DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 506, or to transfer the weight data to the weight memory 502, or to transfer the input data to the input memory 501.
  • the vector calculation unit 507 may include multiple arithmetic processing units, and if necessary, perform further processing on the output of the arithmetic circuit 503, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc.
  • the vector calculation unit 507 is mainly used for intermediate layer calculation in the feedforward neural network.
  • The vector calculation unit 507 stores the processed output vector to the unified memory 506.
  • the vector calculation unit 507 may apply a nonlinear function to the output of the operation circuit 503, such as a vector of accumulated values, to generate an activation value.
  • vector calculation unit 507 generates normalized values, merged values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 503, for example for use in a subsequent layer in a feedforward neural network.
  • An instruction fetch buffer 509 connected to the controller 504 is used to store instructions used by the controller 504.
  • The controller 504 is used to call instructions cached in the instruction fetch buffer 509 to control the working process of the computing accelerator.
  • The unified memory 506, the input memory 501, the weight memory 502 and the instruction fetch buffer 509 are all on-chip memories, and the external memory is a memory external to the NPU.
  • The external memory can be double data rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM), high bandwidth memory (high bandwidth memory, HBM) or other readable and writable memory.
  • The execution device 110 in Figure 3 introduced above can execute each step of the data processing method of the embodiment of the present application.
  • The feedforward neural network model in Figure 4 and the chip shown in Figure 5 can also be used to perform each step of the data processing method of the embodiment of the present application.
  • the embodiment of this application provides a system architecture.
  • the system architecture includes one or more local devices, execution devices, and data storage systems. Among them, the local device is connected to the execution device through the communication network.
  • Execution devices can be implemented by one or more servers.
  • the execution device can be used in conjunction with other computing devices, such as data storage, routers, load balancers and other devices.
  • Execution devices can be deployed on one physical site or distributed across multiple physical sites.
  • The execution device can use the data in the data storage system, or call the program code in the data storage system, to implement the data processing method of the embodiment of the present application.
  • Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, game console, etc.
  • Each user's local device can interact with the execution device through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • the local device obtains relevant parameters of the target neural network from the execution device, deploys the target neural network on the local device, and uses the target neural network to perform image classification or image processing, etc.
  • the target neural network is trained according to the data processing method of the embodiment of the present application.
  • The target neural network can also be directly deployed on the execution device; the execution device obtains the data to be processed from the local devices, and classifies or otherwise processes the data to be processed according to the target neural network.
  • the above execution device can also be called a cloud device, in which case the execution device is generally deployed in the cloud.
  • For sampled data Z = [X_1, X_2, …, X_m] ∈ R^{d×m}, the autocorrelation matrix S of the sampled data Z can be used as a representation: S = (1/m) Z Z^T, where S is the autocorrelation matrix of the sampled data, m is the number of sampled data, and Z is the sampled data.
  • The autocorrelation matrix S is an unbiased estimate and is positive definite.
  • The autocorrelation matrix for a certain category can be defined as: S_i = (1/m_i) Z diag(π_i) Z^T, where S_i is the autocorrelation matrix of the data whose classification label corresponds to the i-th category in the sampled data, m_i represents the number of data whose classification label corresponds to the i-th category in the sampled data, K is the number of all categories of classification labels in the m sampled data (i = 1, …, K), π_i is the classification distribution information of the data whose classification label corresponds to the i-th category in the sampled data, and Z is the sampled data.
  • The KL (Kullback-Leibler) divergence of two autocorrelation matrices can be defined as: KL(S_i ‖ S_j) = (1/2) [ Tr(S_j^{-1} S_i) − d + logdet(S_j) − logdet(S_i) ], where KL(S_i ‖ S_j) is the KL divergence between the autocorrelation matrix of the data whose classification label corresponds to the i-th category and the autocorrelation matrix of the data whose classification label corresponds to the j-th category in the sampled data, S_i and S_j are those two autocorrelation matrices, d is the dimension of the sampled data, Tr(·) represents the trace operation, and logdet(·) means taking the logarithm of the determinant of the matrix.
  • The JS divergence of the two matrices can be defined as the symmetrized KL divergence, in which the logdet terms cancel: JS(S_i ‖ S_j) = (1/2) [ KL(S_i ‖ S_j) + KL(S_j ‖ S_i) ] = (1/2) Tr(S_j^{-1} S_i) + (1/2) Tr(S_i^{-1} S_j) − d, where JS(S_i ‖ S_j) is the JS divergence between the autocorrelation matrix of the data whose classification label corresponds to the i-th category and the autocorrelation matrix of the data whose classification label corresponds to the j-th category in the sampled data, S_i and S_j are those autocorrelation matrices, d is the dimension of the sampled data, and Tr(·) represents the trace operation. A computational sketch of these quantities follows below.
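The quantities above can be computed directly; the sketch below is in our notation, with the JS divergence taken as the symmetrized KL form reconstructed above.

```python
import numpy as np

def class_autocorr(Z, pi):
    """S_i = (1/m_i) Z diag(pi_i) Z^T for a length-m membership vector pi."""
    return (Z * pi) @ Z.T / pi.sum()

def kl(Si, Sj):
    """KL(S_i || S_j) = 1/2 [Tr(S_j^{-1} S_i) - d + logdet S_j - logdet S_i]."""
    d = Si.shape[0]
    return 0.5 * (np.trace(np.linalg.solve(Sj, Si)) - d
                  + np.linalg.slogdet(Sj)[1] - np.linalg.slogdet(Si)[1])

def js(Si, Sj):
    """Symmetrized KL: 1/2 Tr(S_j^{-1} S_i) + 1/2 Tr(S_i^{-1} S_j) - d."""
    d = Si.shape[0]
    return 0.5 * np.trace(np.linalg.solve(Sj, Si)) \
         + 0.5 * np.trace(np.linalg.solve(Si, Sj)) - d

d, m = 8, 200
Z = np.random.randn(d, m)
pi_1 = (np.arange(m) < m // 2).astype(float)
pi_2 = 1.0 - pi_1
print(js(class_autocorr(Z, pi_1), class_autocorr(Z, pi_2)))  # close to 0 for i.i.d. halves
```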
  • By determining an objective function that expands the JS divergence between the autocorrelation matrices of different categories of sampled data, the sampled data of different categories can be distinguished, thereby achieving the classification/clustering effect.
  • The expression of the objective function is as follows: J(Z) = Σ_{i<j} γ_{i,j} · JS(S_i ‖ S_j), where:
  • γ_{i,j} is a weight parameter used to balance the numbers of sampled data of each category;
  • m_i represents the number of data whose classification label corresponds to the i-th category in the sampled data;
  • m_j represents the number of data whose classification label corresponds to the j-th category in the sampled data;
  • JS(S_i ‖ S_j) is the JS divergence between the autocorrelation matrix of the data whose classification label corresponds to the i-th category and that of the data whose classification label corresponds to the j-th category in the sampled data.
  • This objective function can be used for network update.
  • Gradient ascent can be used to update the data feature Z, as follows: Z_l = Z_{l-1} + α · ∂J/∂Z |_{Z = Z_{l-1}}, where Z_l represents the data feature of the l-th layer in the feedforward neural network, Z_{l-1} represents the data feature of the (l-1)-th layer in the feedforward neural network, and α represents the step size, or learning rate.
  • The gradient expression of the objective function can be determined (formula (1)), in which:
  • γ_{i,j} is a weight parameter used to balance the number of samples of each category of sampled data;
  • the regularized autocorrelation matrix of the data whose classification label corresponds to the i-th category in the sampled data is formed from S_i, the regularization parameter λ and the identity matrix I;
  • S_i is the autocorrelation matrix of the data whose classification label corresponds to the i-th category in the sampled data;
  • λ is the regularization parameter;
  • I is the identity matrix;
  • m_i represents the number of data whose classification label corresponds to the i-th category in the sampled data;
  • K is the number of all categories of classification labels in the m sampled data;
  • π_i is the classification distribution information of the data whose classification label corresponds to the i-th category in the sampled data;
  • S_j is the autocorrelation matrix of the data whose classification label corresponds to the j-th category in the sampled data;
  • m_j represents the number of data of the j-th category.
  • P(Z) is a mixture distribution generated by a set of conditional probability distributions {P(Z|C_k)}, where C_k is classification information.
  • The random vector Z obeys the distribution P(Z); given the classification information C_k, the random vector Z obeys the conditional distribution P(Z|C_k).
  • The objective function expression is as follows: J(Z) = Σ_{k=1}^{K} γ_k · KL(S_k ‖ S), where γ_k is a weight parameter used to balance the number of sampled data of each category, m_k represents the number of data whose classification label corresponds to the k-th category in the sampled data, S_k is the autocorrelation matrix of the feature Z_k that obeys the conditional probability distribution P(Z|C_k), S is the autocorrelation matrix of the feature Z that obeys the probability distribution P(Z), and KL(S_k ‖ S) is the KL divergence of S_k and S.
  • The gradient expression of this objective function can be determined (formula (2)), in which:
  • γ_k is a weight parameter used to balance the number of sampled data of each category;
  • m_k represents the number of data whose classification label corresponds to the k-th category in the sampled data;
  • K is the number of all categories of classification labels in the m sampled data;
  • λ is the regularization parameter;
  • S_k is the autocorrelation matrix of the data whose classification label corresponds to the k-th category in the sampled data;
  • I is the identity matrix;
  • π_k is the classification distribution information of the data whose classification label corresponds to the k-th category in the sampled data;
  • S is the autocorrelation matrix over all categories of classification labels in the sampled data;
  • Z is the sampled data.
  • Contrastive learning is a method of feature extraction. Its core idea is that for similar original data, the distance between their images mapped to the feature space should be as close as possible, and for dissimilar original data, the distance between their images in the feature space should be as far as possible. Therefore, the objective function can be designed using the idea of contrastive learning, specifically following the following two principles: 1) contrastivity: the distance between the central nodes of the data classifications/clusters should be as large as possible; 2) diversity: within the same classification/cluster, the data should be as diverse as possible.
  • The contrastive principle can be stated equivalently as: under the condition that the data energy is fixed, maximize the volume of the simplex formed by the central nodes.
  • The principle of diversity can be characterized by entropy: it is described as maximizing the entropy of the features under known classification/clustering information. It can be proved that, under a fixed feature energy, the features have maximum entropy if and only if their distribution is white Gaussian noise. Therefore, we hope that the distribution of the features is as close to a Gaussian distribution as possible, as the entropy bound below shows.
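For reference, the maximum-entropy claim is the standard Gaussian entropy bound (this derivation is ours, not quoted from the application): the differential entropy of a d-dimensional Gaussian with covariance Σ satisfies

$$ h(\mathcal{N}(0,\Sigma)) = \tfrac{1}{2}\log\det(2\pi e\,\Sigma) \;\le\; \tfrac{d}{2}\log\!\left(2\pi e\,\frac{\operatorname{Tr}(\Sigma)}{d}\right), $$

with equality if and only if Σ = (Tr(Σ)/d)·I; that is, under a fixed feature energy Tr(Σ), entropy is maximized exactly when the features are white Gaussian.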
  • KL divergence can be used to describe the similarity between the feature distribution and the Gaussian distribution, and the objective function (formula (3)) is defined accordingly, in which:
  • the contrastive term is the volume of the K-simplex spanned by the central nodes, and 1 is a column vector whose elements are all 1;
  • π_k is the classification distribution information of the data whose classification label corresponds to the k-th category in the sampled data;
  • Tr(·) represents the trace operation;
  • σ² is the variance of the Gaussian distribution;
  • m represents the number of sampled data;
  • d is the dimension of the sampled data.
  • The objective function satisfies convexity and unitary invariance, so its gradient expression can be determined under the energy constraint Tr(ZZ^T) = m(1 + σ²d), in which:
  • σ² is the Gaussian distribution variance;
  • m represents the number of sampled data;
  • m_k is the number of data whose classification label corresponds to the k-th category in the m sampled data;
  • K is the number of all categories of classification labels in the m sampled data;
  • I is the identity matrix;
  • λ represents the regularization parameter;
  • Z is the sampled data.
  • Figure 6 is a schematic flowchart of a data processing method provided by an embodiment of the present application. This method can be executed by a data processing device, which specifically can be the training device 120 in the system architecture 100 shown in Figure 3. The method includes but is not limited to the following steps:
  • Step S601 Obtain training data.
  • the training data includes classification labels.
  • Pictures 1-10 are the 1st category, that is, the number "0" category;
  • pictures 11-20 are the 2nd category, that is, the number "1" category;
  • pictures 21-30 are the 3rd category, that is, the number "2" category;
  • and so on, until pictures 91-100 are the 10th category, that is, the number "9" category.
  • the classification distribution information of the training data can be determined according to the classification labels of the training data.
  • A general classification or clustering task can have m d-dimensional data, expressed as a feature matrix Z ∈ R^{d×m}, with K classifications/clusters C_1, …, C_K.
  • For soft classification/clustering, it can be defined as follows: π_k represents the distribution information of each classification in the data, and it should be the same in the training set and the test set. Therefore, the classification distribution information of the original data can be obtained by estimating the parameter π_k, and the classification information of the original data can then be used to extract features from the data. A sketch of constructing π_k from classification labels follows below.
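A minimal sketch of constructing π_k from hard classification labels; the soft case would put fractional weights in each row, and the names here are ours.

```python
import numpy as np

def membership_from_labels(labels, K):
    """Row k is the classification distribution information pi_k of category k:
    a 0/1 membership vector over the m samples."""
    m = len(labels)
    Pi = np.zeros((K, m))
    Pi[labels, np.arange(m)] = 1.0
    return Pi

print(membership_from_labels(np.array([0, 0, 1, 2, 1]), K=3))
```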
  • Step S602 Determine the feedforward neural network model.
  • The input information of the l-th layer in the feedforward neural network model includes the classification distribution information of the training data and the first data feature.
  • The output information of the l-th layer includes the second data feature.
  • The first data feature is the output of the (l-1)-th layer; the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1.
  • Optionally, the input information of the first layer includes the input data set.
  • Determining the feedforward neural network model includes: obtaining the first data feature Z_{l-1}; determining the network parameters of the l-th layer according to the first data feature Z_{l-1} and the classification distribution information π_i of the training data; and determining the second data feature according to the first data feature Z_{l-1} and the network parameters of the l-th layer.
  • Determining the network parameters of the l-th layer according to the first data feature Z_{l-1} and the classification distribution information π_i of the training data specifically includes the following method: determining the regularized autocorrelation matrix of each category corresponding to the classification labels in the training data according to the first data feature Z_{l-1} and the classification distribution information π_i of the training data; and determining the network parameters of the l-th layer according to the regularized autocorrelation matrix of each category corresponding to the classification labels in the training data.
  • m_i is the number of data whose classification label corresponds to the i-th category in the m training data;
  • K is the number of all categories of classification labels in the m training data;
  • γ_i is the weight parameter used to balance the number of samples of each category in the training data;
  • Z_{l-1} is the first data feature;
  • π_i is the classification distribution information of the training data;
  • S_i is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data;
  • λ is the regularization parameter.
  • m_i is the number of data whose classification label corresponds to the i-th category in the m training data;
  • K is the number of all categories of classification labels in the m training data;
  • Z_{l-1} is the first data feature;
  • π_i is the classification distribution information of the training data;
  • I is the identity matrix;
  • S_i is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data;
  • λ is the regularization parameter;
  • S is the autocorrelation matrix over all categories of the classification labels in the training data.
  • Determining the network parameters of the l-th layer according to the first data feature Z_{l-1} and the classification distribution information π_i of the training data may also include the following method: determining the gradient parameters according to the classification distribution information π_i of the training data, and then determining the network parameters of the l-th layer according to the first data feature Z_{l-1} and the gradient parameters.
  • The vertices of the simplex can be used as the clustering centers to serve as references between categories. The smaller the corresponding divergence value, the closer the distribution of the i-th category in the training data is to the distribution of the j-th category in the training data, so this value can be used as a discriminant parameter.
  • The intermediate variable C_i of each layer can be stored in the network as a d×d fully connected layer parameter.
  • The second data feature can be determined according to the first data feature Z_{l-1} and the network parameters of the l-th layer, specifically including the following method: determining the objective function gradient expression according to the network parameters of the l-th layer and the first data feature Z_{l-1}, and then determining the second data feature Z_l according to the first data feature Z_{l-1}, the classification distribution information π_i of the training data, and the objective function gradient expression.
  • the gradient expression of the objective function can be as described in the above formula (1), formula (2) or formula (3), which will not be described again here.
  • The second data feature is then obtained by the update Z_l = Z_{l-1} + α · (objective function gradient), where Z_l is the second data feature, α represents the step size or learning rate, and Z_{l-1} is the first data feature.
  • The calculation process of the l-th layer in the feedforward neural network model is shown in Figure 7. From formula (1), it can be seen that calculating the feature Z_l of the l-th layer requires the feature Z_{l-1} of the (l-1)-th layer, the classification distribution information π_i of the training data, and the network parameters of the l-th layer, that is, the intermediate variable U_i, where U_i represents the network parameter of the i-th category in the l-th layer.
  • The classification distribution information π_i of the training data can be determined, thereby determining the regularized autocorrelation matrix of the i-th category corresponding to the classification labels of the training data, and then the intermediate variable U_i is obtained. The smaller the corresponding divergence value, the closer the distribution of the i-th category in the training data is to the distribution of the j-th category, so this value can be used as a discriminant parameter.
  • The intermediate variable U_i of each layer can be stored in the network as a d×d fully connected layer parameter.
  • The feature is then updated as Z_l = Z_{l-1} + α · (objective function gradient), where Z_{l-1} is the feature of the (l-1)-th layer, α represents the step size or learning rate, and Z_l is the feature of the l-th layer.
  • The calculation process of the l-th layer in the feedforward neural network model is shown in Figure 8. From formula (2), it can be seen that calculating the feature Z_l of the l-th layer requires the feature Z_{l-1} of the (l-1)-th layer, the classification distribution information π_i of the training data, and the network parameters of the l-th layer, that is, two intermediate variables per category, each representing a network parameter of the i-th category in the l-th layer.
  • The classification distribution information π_i of the training data can be determined, thereby determining the regularized autocorrelation matrix of the i-th category corresponding to the classification labels of the training data, and then the intermediate variables are obtained. The smaller the corresponding divergence value, the closer the distribution of the i-th category in the training data is to the distribution of the j-th category, so this value can be used as a discriminant parameter.
  • The intermediate variable A_i of each layer can be stored in the network as a d×d fully connected layer parameter.
  • The feature is then updated as Z_l = Z_{l-1} + α · (objective function gradient), where Z_{l-1} is the feature of the (l-1)-th layer, α represents the step size or learning rate, and Z_l is the feature of the l-th layer.
  • The calculation process of the l-th layer in the feedforward neural network model is shown in Figure 9. From formula (3), it can be seen that calculating the feature Z_l of the l-th layer requires the feature Z_{l-1} of the (l-1)-th layer, the classification distribution information π_i of the training data, and the network parameters of the l-th layer, that is, the intermediate variable C_i, where C_i represents the network parameter of the i-th category in the l-th layer.
  • The classification distribution information π_i of the training data can be determined, the gradient parameters G and H_i are determined, and the intermediate variable C_i is obtained.
  • C_i, the network parameter of the i-th category in the l-th layer, refers to the clustering center.
  • The vertices of the simplex can be used as the clustering centers to serve as references between categories. The smaller the corresponding divergence value, the closer the distribution of the i-th category in the training data is to the distribution of the j-th category in the training data, so this value can be used as a discriminant parameter.
  • The intermediate variable C_i of each layer can be stored in the network as a d×d fully connected layer parameter.
  • The feature is then updated as Z_l = Z_{l-1} + α · (objective function gradient), where Z_{l-1} is the feature of the (l-1)-th layer, α represents the step size or learning rate, and Z_l is the feature of the l-th layer.
  • Z_{l-1} is constrained to the (d-1)-dimensional unit sphere.
  • In this way, a feedforward neural network model is provided, thereby reducing the transceiver communication overhead caused by BP-algorithm training interaction and improving training efficiency.
  • In addition, the number of network layers can be adjusted to improve training accuracy, avoiding the problem of retraining caused by adapting different networks at different transceivers.
  • Figure 10 is a schematic flowchart of a data processing method provided by an embodiment of the present application. This method can be executed by a data processing device, which specifically can be the execution device 110, the client device 140 or the user device 150 in the system architecture 100 shown in Figure 3. The method includes but is not limited to the following steps:
  • Step S1001 Determine the feedforward neural network model.
  • Step S1002 Obtain data to be processed of unknown classification or clustering information.
  • the data to be processed of the unknown classification or clustering information does not include classification labels.
  • Step S1003 Input the data to be processed into a feedforward neural network model to determine the data characteristics of the data to be processed.
  • the data characteristics of the data to be processed are classification or clustering information used to represent the data to be processed.
  • the data characteristics of the data to be processed are used to determine the classification or clustering results of the data to be processed.
  • The dimension of the data feature of the data to be processed is related to the data type of the data to be processed. For the selection of this dimension, it is known from Vapnik-Chervonenkis (VC) dimension theory that the higher the VC dimension, the higher the complexity of the model and the easier it is to distinguish categories; however, too high a dimension easily leads to overfitting, so an appropriate dimension needs to be determined.
  • A common estimation method is to calculate the eigenvalues of the autocorrelation matrix of the original data, remove the dimensions whose eigenvalues are close to 0, and use the remaining dimensions as the dimension for feature extraction.
  • The dimension can also be refined for different data types. For example, if the data type of the data to be processed is a picture, the dimension of the data feature of the data to be processed can be 1000; if the data type of the data to be processed is text, the dimension of the data feature can be 768. A sketch of the eigenvalue-based estimate follows below.
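The eigenvalue-based estimate mentioned above can be sketched as follows; the threshold choice is illustrative.

```python
import numpy as np

def estimate_feature_dim(Z, rel_tol=1e-3):
    """Keep only directions whose autocorrelation eigenvalues are not close to 0."""
    S = Z @ Z.T / Z.shape[1]              # autocorrelation matrix of the raw data
    eigvals = np.linalg.eigvalsh(S)
    return int((eigvals > rel_tol * eigvals.max()).sum())

Z = np.random.randn(10, 3) @ np.random.randn(3, 500)   # raw data of intrinsic dimension 3
print(estimate_feature_dim(Z))                          # prints 3
```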
  • the process of inputting the data to be processed into the feedforward neural network model to determine the data characteristics of the data to be processed can be understood as a deduction process, specifically as follows:
  • Inputting the data to be processed into the feedforward neural network model to determine the data characteristics of the data to be processed specifically includes the following method: determining the classification distribution information corresponding to the expected classification label of the data to be processed according to the data to be processed and the network parameters of the first layer; determining the gradient expression of the objective function according to the data to be processed and the classification distribution information corresponding to the expected classification label of the data to be processed; and determining the data characteristics of the data to be processed based on the data to be processed and the gradient expression of the objective function.
  • The classification distribution information corresponding to the expected classification label of the data to be processed is determined based on the data to be processed and the network parameters of the first layer.
  • Specifically, the method may include the following: determining the projection of the data to be processed on the first category based on the data to be processed and the network parameters of the first layer, the first category being any one of the multiple categories corresponding to the expected classification label of the data to be processed; and determining the classification distribution information corresponding to the expected classification label of the data to be processed according to the projection of the data to be processed on the first category.
  • Here Z is the data to be processed; the projection of the data to be processed on the i-th category of the l-th layer is computed from Z and the network parameter of the i-th category of the l-th layer. The smaller this projection value, the closer the correlation with the i-th category of the l-th layer, so the softmax function, with a hyperparameter η that controls the estimation confidence, can be used to determine the classification distribution information corresponding to the expected classification label of the data to be processed.
  • After the classification distribution information corresponding to the expected classification label of the data to be processed is determined in this way, the gradient expression of the objective function is determined according to the data to be processed and this classification distribution information; the gradient expression of the objective function can be specifically as shown in formula (1). The data characteristics of the data to be processed are then determined according to the data to be processed and the gradient expression of the objective function.
  • Likewise, Z is the data to be processed; the projection of the data to be processed on the i-th category of the l-th layer is computed from Z and the network parameter of the i-th category of the l-th layer. The smaller this projection value, the closer the correlation with the i-th category of the l-th layer, so the softmax function, with the hyperparameter η controlling the estimation confidence, can be used to determine the classification distribution information corresponding to the expected classification label of the data to be processed.
  • After the classification distribution information corresponding to the expected classification label of the data to be processed is determined in this way, the gradient expression of the objective function is determined according to the data to be processed and this classification distribution information; the gradient expression of the objective function can be specifically as shown in formula (2). The data characteristics of the data to be processed are then determined according to the data to be processed and the gradient expression of the objective function. A sketch of this membership estimate follows below.
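A sketch of this soft-max membership estimate; the exact projection and normalization used by the application are not reproduced, and η plays the role of the estimation-confidence hyperparameter.

```python
import numpy as np

def estimate_membership(z, layer_params, eta=500.0):
    """The smaller the projection of z through the i-th category's stored d x d
    parameter, the more likely z belongs to category i."""
    p = np.array([np.linalg.norm(C_i @ z) for C_i in layer_params])
    logits = -eta * p / p.sum()           # small projection -> large score
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()                    # estimated classification distribution
```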
  • The classification distribution information corresponding to the expected classification label of the data to be processed includes one or more of the following: distance information, correlation information, differential information or soft classification information. The classification distribution information corresponding to the expected classification label of the data to be processed is determined according to the data to be processed and the network parameters of the first layer.
  • the specific formula is as follows:
  • Z is the data to be processed; the classification distribution information corresponding to the expected classification label of the data to be processed is computed from the network parameter of the i-th category of the l-th layer;
  • Z_l is the data feature of the l-th layer of the data to be processed;
  • Z_{l-1} is the data feature of the (l-1)-th layer of the data to be processed;
  • ⟨·,·⟩ denotes the inner product.
  • The gradient expression of the objective function can be determined according to the classification distribution information corresponding to the expected classification label of the data to be processed, specifically including: determining the gradient parameters (G and H_i) according to the classification distribution information corresponding to the expected classification label of the data to be processed; and determining the gradient expression of the objective function according to the data to be processed and the gradient parameters.
  • The specific formula is as follows: G = [g_1, g_2, …, g_K], where:
  • Tr(·) represents the trace operation;
  • I is the identity matrix;
  • m_i is the number of data of the i-th category in the m data to be processed;
  • K is the number of all categories of the expected classification labels in the m data to be processed;
  • G and H_i represent the gradient parameters.
  • The objective function gradient expression is shown in formula (3); the data characteristics of the data to be processed are then determined based on the data to be processed and the objective function gradient expression.
  • the method further includes: outputting data characteristics of the data to be processed.
  • Optionally, the data to be processed of unknown classification or clustering information may be the data features of third data, where the data features of the third data are determined through another feedforward neural network.
  • The input information of the l-th layer in the other feedforward neural network includes the classification distribution information of the training data and the first data feature; the output information of the l-th layer includes the second data feature; the first data feature is the output of the (l-1)-th layer; and the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1.
  • That is to say, the data features of the third data are determined through another feedforward neural network; these data features are themselves data to be processed with unknown classification or clustering information, and the data characteristics of the data to be processed can then be obtained by inputting them into the trained feedforward neural network model determined above.
  • the calculation process of the l-th layer in the derivation process of the feedforward neural network model is shown in Figure 11.
  • Based on the network parameters saved in the trained feedforward neural network, such as the network parameters of the l-th layer, and the data to be processed Z, the classification distribution information corresponding to the expected classification label of the data to be processed is determined.
  • Then, the gradient expression of the objective function is determined according to the data to be processed Z and the network parameters of the l-th layer, as shown in formula (1); finally, the data characteristics of the data to be processed are determined based on the gradient expression of the objective function and the data to be processed.
  • For the relevant formulas, refer to the above.
  • the calculation process of the l-th layer in the derivation process of the feedforward neural network model is shown in Figure 12.
  • Based on the network parameters saved in the trained feedforward neural network, such as the network parameters of the l-th layer, and the data to be processed Z, the classification distribution information corresponding to the expected classification label of the data to be processed is determined. Then, the gradient expression of the objective function is determined according to this classification distribution information, the data to be processed Z, and the network parameters of the l-th layer, as shown in formula (2). Finally, the data characteristics of the data to be processed are determined based on the gradient expression of the objective function and the data to be processed.
  • For the relevant formulas, refer to the above.
  • the calculation process of the l-th layer in the derivation process of the feedforward neural network model is shown in Figure 13.
  • Based on the network parameters saved in the trained feedforward neural network, such as the network parameters of the l-th layer, and the data to be processed Z, the classification distribution information corresponding to the expected classification label of the data to be processed is determined.
  • The gradient parameters G and H_i are determined from the classification distribution information corresponding to the expected classification label of the data to be processed; then the gradient expression of the objective function is determined using the data to be processed Z, the network parameters of the l-th layer, and the gradient parameters G and H_i, as shown in formula (3).
  • Finally, the data characteristics of the data to be processed are determined based on the gradient expression of the objective function and the data to be processed.
  • For the relevant formulas, refer to the above. A sketch of the overall deduction pass follows below.
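Putting the three variants together, the deduction (inference) pass has the same shape in each case. Below is a label-free sketch, with membership_fn and grad_fn standing in for the membership estimate and for whichever of formulas (1)-(3) is used; both are placeholders of our own.

```python
import numpy as np

def extract_features(Z, stored_params, membership_fn, grad_fn, alpha=0.5, eta=500.0):
    """Each layer re-estimates the expected classification distribution from its
    stored d x d parameters, evaluates the objective-function gradient, and
    updates the features; no classification labels are required."""
    for params in stored_params:                  # one parameter set per trained layer
        Pi_hat = membership_fn(Z, params, eta)    # expected classification labels
        Z = Z + alpha * grad_fn(Z, params, Pi_hat)
        Z = Z / np.linalg.norm(Z, axis=0, keepdims=True)
    return Z
```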
  • The data processing methods shown in Figure 6 and Figure 10 can be applied to multi-view (Multi-View) and multi-node (Multi-Node) scenarios, specifically as follows.
  • The multi-view scenario is shown in Figure 14. Since the feedforward neural network shown in Figures 6 and 10 above is mainly used to transmit data features, with task-related operations completed at the receiving end, there can be multiple sending ends with different data characteristics; the receiving end processes these different data characteristics and obtains the data processing results according to different classification tasks. As shown in Figure 14, in this multi-view scenario there can be multiple senders; two senders, the first sender and the second sender, are used as an example for description. Both the first sender and the second sender perform the same task, for example a classification task, but the classification distribution information of the training data received by the first sender and the second sender may be different.
  • The first sender extracts the data feature Z_1 related to the classification task through the feedforward neural network model obtained from the training data and the classification distribution information of the training data.
  • The second sender extracts the data feature Z_2 related to the classification task through the feedforward neural network model obtained from the training data and the classification distribution information of the training data.
  • The data feature Z_1 extracted by the first sender and the data feature Z_2 extracted by the second sender are sent to the receiving end through channel transmission.
  • The first network may be the feedforward neural network proposed in the embodiment of the present application, KNN, or a convolutional neural network (CNN).
  • Z_1 represents the data features before transmission through the channel;
  • n represents Gaussian noise with a given standard deviation;
  • Var(·) represents the variance. A sketch of this channel model follows below.
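The channel between sender and receiver can be simulated as additive white Gaussian noise; a minimal sketch at a given SNR (the experiments later in this section use 25 dB):

```python
import numpy as np

def awgn(Z, snr_db):
    """Add white Gaussian noise n to the transmitted features at the given SNR."""
    sig_power = np.mean(Z ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return Z + np.sqrt(noise_power) * np.random.randn(*Z.shape)
```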
  • the multi-node scenario is shown in Figure 15.
  • The transmission data can be deduced through the feedforward neural networks deployed on different network nodes. This reduces the impact of the channel and enables the feature results to have better accuracy and to be usable by multiple receivers.
  • The first node extracts the data feature Z_1 related to the classification task through the feedforward neural network model obtained from the training data and the classification distribution information of the training data, and the data feature Z_1 extracted by the first node is sent to the second node through channel transmission.
  • The second node extracts the classification-task-related data feature Z_2 through the feedforward neural network model obtained based on the data feature Z_1 and the classification distribution information of the training data, and so on, until the last node is reached.
  • The last node extracts the data feature Z_n related to the classification task through the feedforward neural network model obtained based on the data feature Z_{n-1} extracted by the previous node and the classification distribution information of the training data.
  • Here n represents the number of nodes.
  • The data feature Z_n can be input into the first network to train the readout layer and obtain the final output result.
  • The first network may be the feedforward neural network proposed in the embodiment of this application, KNN, or CNN, etc. In this scenario, the input and output data feature dimensions of different communication nodes need to remain the same. A sketch of the readout training follows below.
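As an illustration of training only the readout at the receiving end, the snippet below uses placeholder data; in the pipeline above, the features would come from the feedforward extractor and the channel.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

d, m = 768, 1000
Z_rx = np.random.randn(d, m)              # stand-in for received features
y = np.random.randint(0, 10, size=m)      # stand-in labels

# Only the readout layer is trained at the receiving end.
readout = KNeighborsClassifier(n_neighbors=5)
readout.fit(Z_rx[:, :800].T, y[:800])     # KNN expects (samples, features)
print(readout.score(Z_rx[:, 800:].T, y[800:]))
```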
  • After the training device trains the feedforward neural network model according to the data processing method shown in Figure 6, it also needs to evaluate the trained model with verification data to ensure that the trained feedforward neural network model has good generalization.
  • The training device trains with the designed objective function using gradient backpropagation and the feedforward construction respectively, as follows.
  • Taking the above formula (1) as the objective function and the MNIST handwritten digit set as an example, the training device adopts a feature dimension of 128, uses ResNet18 network training to obtain the results before the readout layer, and uses the t-distributed stochastic neighbor embedding (t-SNE) algorithm to reduce the results before the readout layer into 2D visualization data, as shown in Figure 16A.
  • The training device also uses the mean square error (MSE) as the objective function, trains to obtain the results before the readout layer, and uses the t-SNE algorithm to reduce the dimension into 2D visualization data, as shown in Figure 16B.
  • A multi-layer network structure is designed, and the k-nearest neighbor (KNN) classification results of the final output features transmitted through the AWGN channel are tested, as shown in Table 1.
  • The feature dimension adopted is the same as the input dimension, which is 768;
  • the signal-to-noise ratio SNR = 25 dB;
  • η = 500, where η is a hyperparameter used to control the estimation confidence when predicting classification labels.
  • As the number of intermediate layers of the feedforward neural network increases, the accuracy of the extracted data features becomes higher. For example, when the number of intermediate layers of the feedforward neural network is 2, the training set accuracy is 0.5247; when the number of intermediate layers is 6, the training set accuracy is 0.7135, which is 0.1888 higher than the accuracy with 2 layers.
  • the method of the embodiment of the present application can reduce the communication overhead caused by training interaction and improve the training efficiency.
  • Only the receiving end needs to train the readout layer network.
  • The structure of the feedforward neural network is more flexible, and the accuracy can be improved by increasing the number of network layers; that is, the larger the value of l, the higher the accuracy of the classification or clustering results of the data to be processed.
  • This avoids the problem that adapting different networks at the transmitting and receiving ends requires retraining.
  • The feedforward neural network model is interpretable and can explain the black-box problem of neural networks, and the output data characteristics of the data to be processed can serve as data preprocessing for subsequent readout layer operations.
  • Figure 17 is a schematic structural diagram of a data processing device 1700 provided by an embodiment of the present application.
  • The data processing device 1700 may include a first determination unit 1701, an acquisition unit 1702, and a second determination unit 1703; a detailed description of each unit follows.
  • the first determination unit 1701 is used to determine the feedforward neural network model.
  • The input information of the l-th layer in the feedforward neural network model includes the classification distribution information of the training data and the first data feature.
  • The output information of the l-th layer includes the second data feature; the first data feature is the output of the (l-1)-th layer, and both the first data feature and the second data feature are used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1.
  • The acquisition unit 1702 is used to obtain data to be processed of unknown classification or clustering information.
  • the second determination unit 1703 is used to input the data to be processed into the feedforward neural network model to determine the data characteristics of the data to be processed; the data characteristics of the data to be processed are used to represent the data to be processed. classification or clustering information; the data characteristics of the data to be processed are used to determine the classification or clustering results of the data to be processed.
  • the dimensions of the data characteristics of the data to be processed are related to the data type of the data to be processed.
  • the input information of the first layer includes the classification distribution information of the training data and the training data.
  • the training data includes classification labels; the classification distribution information of the training data is determined based on the classification labels in the training data.
  • The first determination unit 1701 is specifically configured to: obtain the first data feature Z_{l-1};
  • determine the network parameters of the l-th layer according to the first data feature Z_{l-1} and the classification distribution information π_i of the training data;
  • and determine the second data feature according to the first data feature Z_{l-1} and the network parameters of the l-th layer.
  • The first determination unit 1701 is specifically configured to determine the objective function gradient expression according to the network parameters of the l-th layer and the first data feature Z_{l-1}, and to determine the second data feature Z_l according to the first data feature Z_{l-1}, the classification distribution information π_i of the training data, and the gradient expression of the objective function.
  • The first determination unit 1701 is specifically configured to determine the regularized autocorrelation matrix of each category corresponding to the classification labels in the training data according to the first data feature Z_{l-1} and the classification distribution information π_i of the training data, and to determine the network parameters of the l-th layer according to the regularized autocorrelation matrix of each category corresponding to the classification labels in the training data.
  • m_i is the number of data whose classification label corresponds to the i-th category in the m training data;
  • K is the number of all categories of classification labels in the m training data;
  • Z_{l-1} is the first data feature;
  • π_i is the classification distribution information of the training data;
  • S_i is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data;
  • λ is the regularization parameter.
  • m_i is the number of data whose classification label corresponds to the i-th category in the m training data;
  • K is the number of all categories of classification labels in the m training data;
  • Z_{l-1} is the first data feature;
  • π_i is the classification distribution information of the training data;
  • I is the identity matrix;
  • S_i is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data;
  • λ is the regularization parameter;
  • S is the autocorrelation matrix over all categories of the classification labels in the training data.
  • The first determination unit 1701 is specifically configured to determine the gradient parameters according to the classification distribution information π_i of the training data, and to determine the network parameters of the l-th layer according to the first data feature Z_{l-1} and the gradient parameters.
  • The second determination unit 1703 is specifically configured to determine the classification distribution information corresponding to the expected classification label of the data to be processed according to the data to be processed and the network parameters of the first layer, to determine the gradient expression of the objective function according to the data to be processed and the classification distribution information corresponding to the expected classification label of the data to be processed, and to determine the data characteristics of the data to be processed according to the data to be processed and the gradient expression of the objective function.
  • The second determination unit 1703 is specifically configured to determine the projection of the data to be processed on the first category according to the data to be processed and the network parameters of the first layer, the first category being any one of the multiple categories corresponding to the expected classification label of the data to be processed, and to determine the classification distribution information corresponding to the expected classification label of the data to be processed according to the projection of the data to be processed on the first category.
  • Here Z is the data to be processed; the projection of the data to be processed on the i-th category of the l-th layer is computed from Z and the network parameter of the i-th category of the l-th layer; the result is the classification distribution information corresponding to the expected classification label of the data to be processed; and η is a hyperparameter that controls the estimation confidence.
  • The objective function gradient expression includes the following quantities:
  • m_i is the number of data whose expected classification label is the i-th category among the m data to be processed;
  • K is the number of all categories of the expected classification labels in the m data to be processed;
  • Z is the data to be processed;
  • S_i is the autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed;
  • the regularized autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed is formed from S_i, the regularization parameter and the identity matrix.
  • Here Z is the data to be processed; the projection of the data to be processed on the i-th category of the l-th layer is computed from Z and the network parameter of the i-th category of the l-th layer; the result is the classification distribution information corresponding to the expected classification label of the data to be processed; and η is a hyperparameter that controls the estimation confidence.
  • The objective function gradient expression includes the following quantities:
  • m_i is the number of data whose expected classification label is the i-th category among the m data to be processed;
  • γ_i is a weight parameter used to balance the number of samples of each category in the data to be processed;
  • Z is the data to be processed;
  • S_i is the autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed;
  • S is the autocorrelation matrix over all categories corresponding to the expected classification label of the data to be processed.
  • the classification distribution information corresponding to the expected classification label of the data to be processed includes one or more of the following: distance information, correlation information, differential information or soft classification information.
  • Z is the data to be processed; the classification distribution information corresponding to the expected classification label of the data to be processed is computed from the network parameter of the i-th category of the l-th layer;
  • Z_l is the data feature of the l-th layer of the data to be processed;
  • Z_{l-1} is the data feature of the (l-1)-th layer of the data to be processed;
  • ⟨·,·⟩ denotes the inner product.
  • The second determination unit 1703 is specifically configured to determine the gradient parameters (G and H_i) according to the classification distribution information corresponding to the expected classification label of the data to be processed, and to determine the objective function gradient expression according to the data to be processed and the gradient parameters.
  • G = [g_1, g_2, …, g_K];
  • Tr(·) represents the trace operation;
  • I is the identity matrix;
  • m_i is the number of data of the i-th category in the m data to be processed;
  • K is the number of all categories of the expected classification labels in the m data to be processed;
  • G and H_i represent the gradient parameters.
  • the objective function gradient expression ∂E/∂Z involves the following quantities: Z is the data to be processed, σ is the Gaussian distribution variance, ∈ is the regularization parameter, I is the identity matrix, G and Hi denote the gradient parameters, and β denotes a regularization parameter.
  • the data feature Zl of the data to be processed is determined from Zl-1 and the objective function gradient expression ∂E/∂Z, where Zl-1 is the data to be processed and Zl-1 is constrained in the (d-1)-dimensional unit sphere space.
  • the data processing device further includes an output unit configured to output the data characteristics of the data to be processed.
  • the data to be processed with unknown classification or clustering information is the data feature of third data, and the data feature of the third data is determined through another feedforward neural network; the input information of the l-th layer in the another feedforward neural network includes the classification distribution information of the training data and the first data feature, the output information of the l-th layer includes the second data feature, the first data feature is the output of the (l-1)-th layer, and the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1.
  • for the implementation and beneficial effects of each unit, reference may also be made to the corresponding description of the method embodiments shown in FIG. 6 or FIG. 10.
  • FIG. 18 shows a data processing device 1800 provided by an embodiment of the present application.
  • the data processing device 1800 includes at least one processor 1801 and a communication interface 1803, and optionally further includes a memory 1802.
  • the processor 1801, the memory 1802 and the communication interface 1803 are connected to each other through a bus 1804.
  • the memory 1802 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a portable compact disc read-only memory (CD-ROM), and the memory 1802 stores related computer programs and data.
  • Communication interface 1803 is used to receive and send data.
  • the processor 1801 may be one or more central processing units (CPUs); when the processor 1801 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
  • the processor 1801 in the data processing device 1800 is used to read the computer program code stored in the memory 1802 and perform the following operations:
  • determine a feedforward neural network model, where the input information of the l-th layer in the feedforward neural network model includes the classification distribution information of the training data and the first data feature, and the output information of the l-th layer includes the second data feature;
  • the first data feature is the output of the (l-1)-th layer, and the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1;
  • obtain data to be processed with unknown classification or clustering information; input the data to be processed into the feedforward neural network model to determine the data feature of the data to be processed, where the data feature of the data to be processed is used to represent the classification or clustering information of the data to be processed, and the data feature of the data to be processed is used to determine the classification or clustering result of the data to be processed.
  • the dimensions of the data characteristics of the data to be processed are related to the data type of the data to be processed.
  • when l=2 and the first data feature is the output of the first layer, the input information of the first layer includes the classification distribution information of the training data and the training data.
  • the training data includes classification labels; the classification distribution information of the training data is determined according to the classification labels in the training data.
  • the processor 1801 is configured to obtain the first data feature Zl-1, and determine the network parameters of the l-th layer according to the first data feature Zl-1 and the classification distribution information ∏i of the training data; the second data feature is determined according to the first data feature Zl-1 and the network parameters of the l-th layer.
  • the processor 1801 is configured to determine the objective function gradient expression according to the network parameters of the l-th layer and the first data feature Zl-1, and determine the second data feature Zl according to the first data feature Zl-1, the classification distribution information ∏i of the training data and the objective function gradient expression.
  • the processor 1801 is configured to determine, according to the first data feature Zl-1 and the classification distribution information ∏i of the training data, the regularized autocorrelation matrix of each category corresponding to the classification labels in the training data; the network parameters of the l-th layer are determined according to the regularized autocorrelation matrices of the categories corresponding to the classification labels in the training data.
  • in the expressions for the network parameters, mi is the number of classification labels corresponding to the i-th category in the m training data, K is the number of all categories of the classification labels in the m training data, αi is a weight parameter used to balance the number of samples of each category in the training data, Zl-1 is the first data feature, ∏i is the classification distribution information of the training data, Si is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, ∈ is the regularization parameter, Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, and Ui is the network parameter of the i-th category at the l-th layer.
  • in the expressions for the network parameters, mi is the number of classification labels corresponding to the i-th category in the m training data, K is the number of all categories of the classification labels in the m training data, Zl-1 is the first data feature, ∏i is the classification distribution information of the training data, I is the identity matrix, Si is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, ∈ is the regularization parameter, Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, S is the autocorrelation matrix of all categories of the classification labels in the training data, Ŝ is the regularized autocorrelation matrix of all categories of the classification labels in the training data, and Ai is the network parameter of the i-th category at the l-th layer.
  • the processor 1801 is configured to determine the gradient parameters according to the classification distribution information ∏i of the training data, and determine the network parameters of the l-th layer according to the first data feature Zl-1 and the gradient parameters.
  • the processor 1801 is configured to determine, according to the data to be processed and the network parameters of the l-th layer, the classification distribution information ∏̂i corresponding to the expected classification label of the data to be processed; determine the objective function gradient expression according to the data to be processed and the classification distribution information corresponding to the expected classification label of the data to be processed; and determine the data feature of the data to be processed according to the data to be processed and the objective function gradient expression.
  • the processor 1801 is configured to determine, according to the data to be processed and the network parameters of the l-th layer, the projection of the expected classification label of the data to be processed on a first category, where the first category is any one of the multiple categories corresponding to the expected classification labels of the data to be processed; and determine, according to the projection of the data to be processed on the first category, the classification distribution information corresponding to the expected classification label of the data to be processed.
  • Z is the data to be processed, Ui is the network parameter of the i-th category at the l-th layer, UiZ is the projection of the expected classification label of the data to be processed on the i-th category at the l-th layer, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, and η is a hyperparameter that controls the estimation confidence.
  • the objective function gradient expression ∂E/∂Z involves the following quantities: mi is the number of the m data to be processed whose expected classification label is the i-th category, K is the number of all categories of the expected classification labels in the m data to be processed, αi is a weight parameter used to balance the expected number of samples of each category in the data to be processed, Z is the data to be processed, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Si is the autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed, and Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed.
  • Z is the data to be processed, Ui is the network parameter of the i-th category at the l-th layer, UiZ is the projection of the expected classification label of the data to be processed on the i-th category at the l-th layer, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, and η is a hyperparameter that controls the estimation confidence.
  • the objective function gradient expression ∂E/∂Z involves the following quantities: mi is the number of the m data to be processed whose expected classification label is the i-th category, αi is a weight parameter used to balance the expected number of samples of each category in the data to be processed, Z is the data to be processed, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Si is the autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed, Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed, S is the autocorrelation matrix of all categories corresponding to the expected classification label of the data to be processed, and Ŝ is the regularized autocorrelation matrix of all categories corresponding to the expected classification label of the data to be processed.
  • the classification distribution information corresponding to the expected classification label of the data to be processed includes one or more of the following: distance information, correlation information, differential information or soft classification information.
  • Z is the data to be processed, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Ui is the network parameter of the i-th category at the l-th layer, Zl is the data feature of the l-th layer of the data to be processed, Zl-1 is the data feature of the (l-1)-th layer of the data to be processed, and <> denotes the inner product.
  • the processor 1801 is configured to determine the gradient parameters (G and Hi) according to the classification distribution information corresponding to the expected classification label of the data to be processed, and determine the objective function gradient expression according to the data to be processed and the gradient parameters.
  • G = [g1, g2, …, gi], where ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Tr() denotes the trace operation, I is the identity matrix, mi is the number of the i-th category in the m data to be processed, K is the number of all categories of the expected classification labels in the m data to be processed, and G and Hi denote the gradient parameters.
  • the objective function gradient expression ∂E/∂Z involves the following quantities: Z is the data to be processed, σ is the Gaussian distribution variance, ∈ is the regularization parameter, I is the identity matrix, G and Hi denote the gradient parameters, and β denotes a regularization parameter.
  • the data feature Zl of the data to be processed is determined from Zl-1 and the objective function gradient expression ∂E/∂Z, where Zl-1 is the data to be processed and Zl-1 is constrained in the (d-1)-dimensional unit sphere space.
  • the processor 1801 is configured to output the data characteristics of the data to be processed.
  • the data to be processed with unknown classification or clustering information is the data feature of third data, and the data feature of the third data is determined through another feedforward neural network; the input information of the l-th layer in the another feedforward neural network includes the classification distribution information of the training data and the first data feature, the output information of the l-th layer includes the second data feature, the first data feature is the output of the (l-1)-th layer, and the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1.
  • the processor in the embodiments of the present application may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • a general-purpose processor can be a microprocessor or any conventional processor.
  • the method steps in the embodiments of the present application can be implemented by hardware or by a processor executing software instructions.
  • software instructions may consist of corresponding software modules, and the software modules may be stored in a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an erasable programmable read-only memory, an electrically erasable programmable read-only memory, a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from the storage medium and write information to the storage medium.
  • the storage medium can also be an integral part of the processor.
  • the processor and storage media may be located in an ASIC. Additionally, the ASIC can be located in the base station or terminal. Of course, the processor and the storage medium may also exist as discrete components in the base station or terminal.
  • the computer program product includes one or more computer programs or instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, a network device, a user equipment, or other programmable device.
  • the computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another.
  • for example, the computer program or instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired or wireless manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center that integrates one or more available media.
  • the available media may be magnetic media, such as floppy disks, hard disks, and tapes; optical media, such as digital video optical disks; or semiconductor media, such as solid-state hard drives.
  • the computer-readable storage medium may be volatile or nonvolatile storage media, or may include both volatile and nonvolatile types of storage media.
  • transmission may include the following three situations: sending of data, receiving of data, or sending of data and receiving of data.
  • data may include service data and/or signaling data.
  • unless otherwise specified, the number of a noun means "a singular or plural noun", that is, "one or more". "At least one" means one or more. "Including at least one of the following: A, B, C" means that A, or B, or C, or A and B, or A and C, or B and C, or A, B and C may be included, where A, B and C each may be singular or plural.


Abstract

An embodiment of the present application provides a data processing method and device. The method includes: determining a feedforward neural network model, where the input information of the l-th layer in the feedforward neural network model includes classification distribution information of training data and a first data feature, the output information of the l-th layer includes a second data feature, the first data feature is the output of the (l-1)-th layer, and the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1; obtaining data to be processed with unknown classification or clustering information; and inputting the data to be processed into the feedforward neural network model to determine the data feature of the data to be processed, where the data feature of the data to be processed is used to represent the classification or clustering information of the data to be processed and is used to determine the classification or clustering result of the data to be processed. The method can reduce communication overhead; moreover, the feedforward neural network architecture is more flexible, and the black-box problem of neural networks can be explained.

Description

Data processing method and device
This application claims priority to Chinese Patent Application No. 202210290759.2, filed with the China Patent Office on March 23, 2022 and entitled "Data processing method and device", which is incorporated herein by reference in its entirety.
Technical field
This application relates to the field of artificial intelligence technology, and in particular, to a data processing method and device.
Background
In recent years, the development of deep learning has attracted research from academia and industry on wireless communication technologies based on deep learning. Research results have confirmed that deep learning can improve the performance of wireless communication systems and has the potential to be applied at the physical layer for interference alignment, channel estimation and signal detection, signal processing, and the like.
An autoencoder can replace the traditional design of communication transceivers: the transmitter and the receiver are modeled by neural networks, the distribution of data is learned from a large number of training samples, and results are predicted. For example, a neural network can be trained by the back propagation (BP) algorithm, whose learning process consists of a forward propagation process and a backward propagation process. In the forward propagation process, input information passes from the input layer through the hidden layers, is processed layer by layer, and is transmitted to the output layer to obtain an excitation response. In the backward propagation process, the difference between the excitation response and the corresponding expected target output is taken as the objective function, and the partial derivatives of the objective function with respect to the weights of each neuron are obtained layer by layer to form the gradient of the objective function with respect to the weight vector, so that the weights can be modified. The learning of the neural network is completed in the weight modification process; when the error reaches the expected value, the learning of the neural network ends. However, in the BP algorithm there is no corresponding theoretical guidance for the choice of the number of network layers and the number of neurons, and retraining is required when the network structure is changed. The network output results have no reliable mathematical interpretability, the neural network implementation is regarded as a "black box", and its theory is not widely accepted; meanwhile, the gradient vanishing or gradient explosion caused during the execution of the BP algorithm has not yet been effectively resolved.
Summary
Embodiments of the present application disclose a data processing method and device, which can reduce communication overhead; moreover, the feedforward neural network architecture is more flexible and can explain the black-box problem of neural networks.
A first aspect of the embodiments of the present application discloses a data processing method, including: determining a feedforward neural network model, where the input information of the l-th layer in the feedforward neural network model includes classification distribution information of training data and a first data feature, the output information of the l-th layer includes a second data feature, the first data feature is the output of the (l-1)-th layer, and the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1; obtaining data to be processed with unknown classification or clustering information; and inputting the data to be processed into the feedforward neural network model to determine the data feature of the data to be processed, where the data feature of the data to be processed is used to represent the classification or clustering information of the data to be processed and is used to determine the classification or clustering result of the data to be processed.
In the above method, compared with the BP algorithm, which requires gradient back-propagation to update the transmitter network, the method of the embodiments of the present application can reduce the communication overhead caused by training interactions and improve training efficiency; the receiver only needs to additionally train a task-related readout layer network. Moreover, the structure of the feedforward neural network is more flexible, and an accuracy improvement can be obtained by increasing the number of network layers; that is, the larger the value of l, the higher the accuracy of the classification or clustering result of the data to be processed, which avoids the problem of retraining caused by different network adaptations at different transmitters and receivers. Furthermore, the feedforward neural network model is interpretable and can explain the black-box problem of neural networks, and the output data feature of the data to be processed can serve as data preprocessing and can be used for subsequent readout layer operations.
In a possible implementation, the dimension of the data feature of the data to be processed is related to the data type of the data to be processed.
In yet another possible implementation, when l=2 and the first data feature is the output of the first layer, the input information of the first layer includes the classification distribution information of the training data and the training data, and the training data includes classification labels; the classification distribution information of the training data is determined according to the classification labels in the training data.
In yet another possible implementation, the determining a feedforward neural network model includes: obtaining the first data feature Zl-1; and determining the network parameters of the l-th layer according to the first data feature Zl-1 and the classification distribution information ∏i of the training data, where the second data feature is determined according to the first data feature Zl-1 and the network parameters of the l-th layer.
In yet another possible implementation, that the second data feature is determined according to the first data feature Zl-1 and the network parameters of the l-th layer includes: determining an objective function gradient expression according to the network parameters of the l-th layer and the first data feature Zl-1; and determining the second data feature Zl according to the first data feature Zl-1, the classification distribution information ∏i of the training data and the objective function gradient expression.
In yet another possible implementation, the determining the network parameters of the l-th layer according to the first data feature Zl-1 and the classification distribution information ∏i of the training data includes: determining, according to the first data feature Zl-1 and the classification distribution information ∏i of the training data, the regularized autocorrelation matrix of each category corresponding to the classification labels in the training data; and determining the network parameters of the l-th layer according to the regularized autocorrelation matrices of the categories corresponding to the classification labels in the training data.
In yet another possible implementation, in the expressions for the network parameters, mi is the number of classification labels corresponding to the i-th category in the m training data, K is the number of all categories of the classification labels in the m training data, αi is a weight parameter used to balance the number of samples of each category in the training data, Zl-1 is the first data feature, ∏i is the classification distribution information of the training data, Si is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, ∈ is the regularization parameter, Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, and Ui is the network parameter of the i-th category at the l-th layer.
In yet another possible implementation, in the expressions for the network parameters, mi is the number of classification labels corresponding to the i-th category in the m training data, K is the number of all categories of the classification labels in the m training data, Zl-1 is the first data feature, ∏i is the classification distribution information of the training data, I is the identity matrix, Si is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, ∈ is the regularization parameter, Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, S is the autocorrelation matrix of all categories of the classification labels in the training data, Ŝ is the regularized autocorrelation matrix of all categories of the classification labels in the training data, and Ai is the network parameter of the i-th category at the l-th layer.
In yet another possible implementation, the determining the network parameters of the l-th layer according to the first data feature Zl-1 and the classification distribution information ∏i of the training data includes: determining gradient parameters according to the classification distribution information ∏i of the training data; and determining the network parameters of the l-th layer according to the first data feature Zl-1 and the gradient parameters.
In yet another possible implementation, Zl-1 satisfies the energy constraint Tr(Zl-1(Zl-1)T) = m(1+σ²d), where σ is the Gaussian distribution variance, m is the number of samples of the training data, d is the dimension of the training data, Zl-1 is the first data feature, e∈Rm×1 is a column vector whose elements are all 1, ∏i is the classification distribution information of the training data, Tr() denotes the trace operation, I is the identity matrix, mi is the number of classification labels corresponding to the i-th category in the m training data, K is the number of all categories of the classification labels in the m training data, Ci is the network parameter of the i-th category at the l-th layer, and G and Hi are the gradient parameters.
In yet another possible implementation, the inputting the data to be processed into the feedforward neural network model to determine the data feature of the data to be processed includes: determining, according to the data to be processed and the network parameters of the l-th layer, the classification distribution information ∏̂i corresponding to the expected classification label of the data to be processed; determining an objective function gradient expression according to the data to be processed and the classification distribution information corresponding to the expected classification label of the data to be processed; and determining the data feature of the data to be processed according to the data to be processed and the objective function gradient expression.
In yet another possible implementation, the determining, according to the data to be processed and the network parameters of the l-th layer, the classification distribution information corresponding to the expected classification label of the data to be processed includes: determining, according to the data to be processed and the network parameters of the l-th layer, the projection of the expected classification label of the data to be processed on a first category, where the first category is any one of the multiple categories corresponding to the expected classification labels of the data to be processed; and determining, according to the projection of the data to be processed on the first category, the classification distribution information corresponding to the expected classification label of the data to be processed.
In yet another possible implementation, Z is the data to be processed, Ui is the network parameter of the i-th category at the l-th layer, UiZ is the projection of the expected classification label of the data to be processed on the i-th category at the l-th layer, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, and η is a hyperparameter that controls the estimation confidence.
In yet another possible implementation, the objective function gradient expression involves the following quantities: mi is the number of the m data to be processed whose expected classification label is the i-th category, K is the number of all categories of the expected classification labels in the m data to be processed, αi is a weight parameter used to balance the expected number of samples of each category in the data to be processed, Z is the data to be processed, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Si is the autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed, and Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed.
In yet another possible implementation, Z is the data to be processed, Ui is the network parameter of the i-th category at the l-th layer, UiZ is the projection of the expected classification label of the data to be processed on the i-th category at the l-th layer, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, and η is a hyperparameter that controls the estimation confidence.
In yet another possible implementation, the objective function gradient expression involves the following quantities: mi is the number of the m data to be processed whose expected classification label is the i-th category, αi is a weight parameter used to balance the expected number of samples of each category in the data to be processed, Z is the data to be processed, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Si is the autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed, Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed, S is the autocorrelation matrix of all categories corresponding to the expected classification label of the data to be processed, and Ŝ is the regularized autocorrelation matrix of all categories corresponding to the expected classification label of the data to be processed.
In yet another possible implementation, the classification distribution information corresponding to the expected classification label of the data to be processed includes one or more of the following: distance information, correlation information, differential information or soft classification information.
In yet another possible implementation, in the determining, according to the data to be processed and the network parameters of the l-th layer, the classification distribution information corresponding to the expected classification label of the data to be processed: Z is the data to be processed, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Ui is the network parameter of the i-th category at the l-th layer, Zl is the data feature of the l-th layer of the data to be processed, Zl-1 is the data feature of the (l-1)-th layer of the data to be processed, and <> denotes the inner product.
In yet another possible implementation, the determining an objective function gradient expression according to the data to be processed and the classification distribution information corresponding to the expected classification label of the data to be processed includes: determining the gradient parameters (G and Hi) according to the classification distribution information corresponding to the expected classification label of the data to be processed; and determining the objective function gradient expression according to the data to be processed and the gradient parameters.
In yet another possible implementation, G = [g1, g2, …, gi], where ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Tr() denotes the trace operation, I is the identity matrix, mi is the number of the i-th category in the m data to be processed, K is the number of all categories of the expected classification labels in the m data to be processed, and G and Hi denote the gradient parameters.
In yet another possible implementation, the objective function gradient expression involves the following quantities: Z is the data to be processed, σ is the Gaussian distribution variance, ∈ is the regularization parameter, I is the identity matrix, G and Hi denote the gradient parameters, and β denotes a regularization parameter.
In yet another possible implementation, in the determining the data feature of the data to be processed according to the data to be processed and the objective function gradient expression: Zl is the data feature of the data to be processed, ∂E/∂Z is the objective function gradient expression, Zl-1 is the data to be processed, and Zl-1 is constrained in the (d-1)-dimensional unit sphere space.
In yet another possible implementation, the method further includes: outputting the data feature of the data to be processed.
In yet another possible implementation, the data to be processed with unknown classification or clustering information is the data feature of third data, and the data feature of the third data is determined through another feedforward neural network; the input information of the l-th layer in the another feedforward neural network includes the classification distribution information of the training data and the first data feature, the output information of the l-th layer includes the second data feature, the first data feature is the output of the (l-1)-th layer, and the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1.
A second aspect of the embodiments of the present application discloses a data processing device, including a first determination unit, an obtaining unit and a second determination unit. The first determination unit is configured to determine a feedforward neural network model, where the input information of the l-th layer in the feedforward neural network model includes the classification distribution information of the training data and the first data feature, the output information of the l-th layer includes the second data feature, the first data feature is the output of the (l-1)-th layer, and the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1. The obtaining unit is configured to obtain data to be processed with unknown classification or clustering information. The second determination unit is configured to input the data to be processed into the feedforward neural network model to determine the data feature of the data to be processed, where the data feature of the data to be processed is used to represent the classification or clustering information of the data to be processed and is used to determine the classification or clustering result of the data to be processed.
In a possible implementation, the dimension of the data feature of the data to be processed is related to the data type of the data to be processed.
In yet another possible implementation, when l=2 and the first data feature is the output of the first layer, the input information of the first layer includes the classification distribution information of the training data and the training data, and the training data includes classification labels; the classification distribution information of the training data is determined according to the classification labels in the training data.
In yet another possible implementation, the first determination unit is specifically configured to obtain the first data feature Zl-1, and determine the network parameters of the l-th layer according to the first data feature Zl-1 and the classification distribution information ∏i of the training data; the second data feature is determined according to the first data feature Zl-1 and the network parameters of the l-th layer.
In yet another possible implementation, the first determination unit is specifically configured to determine an objective function gradient expression according to the network parameters of the l-th layer and the first data feature Zl-1, and determine the second data feature Zl according to the first data feature Zl-1, the classification distribution information ∏i of the training data and the objective function gradient expression.
In yet another possible implementation, the first determination unit is specifically configured to determine, according to the first data feature Zl-1 and the classification distribution information ∏i of the training data, the regularized autocorrelation matrix of each category corresponding to the classification labels in the training data, and determine the network parameters of the l-th layer according to the regularized autocorrelation matrices of the categories corresponding to the classification labels in the training data.
In yet another possible implementation, in the expressions for the network parameters, mi is the number of classification labels corresponding to the i-th category in the m training data, K is the number of all categories of the classification labels in the m training data, αi is a weight parameter used to balance the number of samples of each category in the training data, Zl-1 is the first data feature, ∏i is the classification distribution information of the training data, Si is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, ∈ is the regularization parameter, Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, and Ui is the network parameter of the i-th category at the l-th layer.
In yet another possible implementation, in the expressions for the network parameters, mi is the number of classification labels corresponding to the i-th category in the m training data, K is the number of all categories of the classification labels in the m training data, Zl-1 is the first data feature, ∏i is the classification distribution information of the training data, I is the identity matrix, Si is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, ∈ is the regularization parameter, Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, S is the autocorrelation matrix of all categories of the classification labels in the training data, Ŝ is the regularized autocorrelation matrix of all categories of the classification labels in the training data, and Ai is the network parameter of the i-th category at the l-th layer.
In yet another possible implementation, the first determination unit is specifically configured to determine gradient parameters according to the classification distribution information ∏i of the training data, and determine the network parameters of the l-th layer according to the first data feature Zl-1 and the gradient parameters.
In yet another possible implementation, Zl-1 satisfies the energy constraint Tr(Zl-1(Zl-1)T) = m(1+σ²d), where σ is the Gaussian distribution variance, m is the number of samples of the training data, d is the dimension of the training data, Zl-1 is the first data feature, e∈Rm×1 is a column vector whose elements are all 1, ∏i is the classification distribution information of the training data, Tr() denotes the trace operation, I is the identity matrix, mi is the number of classification labels corresponding to the i-th category in the m training data, K is the number of all categories of the classification labels in the m training data, Ci is the network parameter of the i-th category at the l-th layer, and G and Hi are the gradient parameters.
In yet another possible implementation, the second determination unit is specifically configured to determine, according to the data to be processed and the network parameters of the l-th layer, the classification distribution information ∏̂i corresponding to the expected classification label of the data to be processed; determine the objective function gradient expression according to the data to be processed and the classification distribution information corresponding to the expected classification label of the data to be processed; and determine the data feature of the data to be processed according to the data to be processed and the objective function gradient expression.
In yet another possible implementation, the second determination unit is specifically configured to determine, according to the data to be processed and the network parameters of the l-th layer, the projection of the expected classification label of the data to be processed on a first category, where the first category is any one of the multiple categories corresponding to the expected classification labels of the data to be processed; and determine, according to the projection of the data to be processed on the first category, the classification distribution information corresponding to the expected classification label of the data to be processed.
In yet another possible implementation, Z is the data to be processed, Ui is the network parameter of the i-th category at the l-th layer, UiZ is the projection of the expected classification label of the data to be processed on the i-th category at the l-th layer, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, and η is a hyperparameter that controls the estimation confidence.
In yet another possible implementation, the objective function gradient expression involves the following quantities: mi is the number of the m data to be processed whose expected classification label is the i-th category, K is the number of all categories of the expected classification labels in the m data to be processed, αi is a weight parameter used to balance the expected number of samples of each category in the data to be processed, Z is the data to be processed, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Si is the autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed, and Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed.
In yet another possible implementation, Z is the data to be processed, Ui is the network parameter of the i-th category at the l-th layer, UiZ is the projection of the expected classification label of the data to be processed on the i-th category at the l-th layer, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, and η is a hyperparameter that controls the estimation confidence.
In yet another possible implementation, the objective function gradient expression involves the following quantities: mi is the number of the m data to be processed whose expected classification label is the i-th category, αi is a weight parameter used to balance the expected number of samples of each category in the data to be processed, Z is the data to be processed, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Si is the autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed, Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed, S is the autocorrelation matrix of all categories corresponding to the expected classification label of the data to be processed, and Ŝ is the regularized autocorrelation matrix of all categories corresponding to the expected classification label of the data to be processed.
In yet another possible implementation, the classification distribution information corresponding to the expected classification label of the data to be processed includes one or more of the following: distance information, correlation information, differential information or soft classification information.
In yet another possible implementation, Z is the data to be processed, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Ui is the network parameter of the i-th category at the l-th layer, Zl is the data feature of the l-th layer of the data to be processed, Zl-1 is the data feature of the (l-1)-th layer of the data to be processed, and <> denotes the inner product.
In yet another possible implementation, the second determination unit is specifically configured to determine the gradient parameters (G and Hi) according to the classification distribution information corresponding to the expected classification label of the data to be processed, and determine the objective function gradient expression according to the data to be processed and the gradient parameters.
In yet another possible implementation, G = [g1, g2, …, gi], where ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Tr() denotes the trace operation, I is the identity matrix, mi is the number of the i-th category in the m data to be processed, K is the number of all categories of the expected classification labels in the m data to be processed, and G and Hi denote the gradient parameters.
In yet another possible implementation, the objective function gradient expression involves the following quantities: Z is the data to be processed, σ is the Gaussian distribution variance, ∈ is the regularization parameter, I is the identity matrix, G and Hi denote the gradient parameters, and β denotes a regularization parameter.
In yet another possible implementation, the data feature Zl of the data to be processed is determined from Zl-1 and the objective function gradient expression ∂E/∂Z, where Zl-1 is the data to be processed and Zl-1 is constrained in the (d-1)-dimensional unit sphere space.
In yet another possible implementation, the data processing device further includes an output unit configured to output the data feature of the data to be processed.
In yet another possible implementation, the data to be processed with unknown classification or clustering information is the data feature of third data, and the data feature of the third data is determined through another feedforward neural network; the input information of the l-th layer in the another feedforward neural network includes the classification distribution information of the training data and the first data feature, the output information of the l-th layer includes the second data feature, the first data feature is the output of the (l-1)-th layer, and the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1.
For the technical effects brought by the second aspect or the possible implementations, reference may be made to the description of the technical effects of the first aspect or the corresponding implementations.
A third aspect of the embodiments of the present application discloses a data processing device, including at least one processor and a communication interface, where the at least one processor invokes a computer program or instructions stored in a memory to implement the method according to any one of the foregoing aspects.
A fourth aspect of the embodiments of the present application discloses a chip system, including at least one processor and a communication interface, where the at least one processor is configured to execute a computer program or instructions to implement the method according to any one of the foregoing aspects.
A fifth aspect of the embodiments of the present application discloses a computer-readable storage medium storing computer instructions that, when run on a processor, implement the method according to any one of the foregoing aspects.
A sixth aspect of the embodiments of the present application discloses a computer program product including computer program code that, when run on a computer, implements the method according to any one of the foregoing aspects.
A seventh aspect of the embodiments of the present application discloses a data processing system, including the device according to the second aspect.
Brief description of drawings
The accompanying drawings used in the embodiments of the present application are introduced below.
FIG. 1 is a diagram of an end-to-end communication network architecture according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a structure of an artificial intelligence main framework according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a structure of a feedforward neural network according to an embodiment of the present application;
FIG. 4A is a schematic diagram of the computation process of layer 1 of the feedforward neural network in a training process according to an embodiment of the present application;
FIG. 4B is a schematic diagram of the computation process of layer 2 of the feedforward neural network in a training process according to an embodiment of the present application;
FIG. 4C is a schematic diagram of the computation process of layer 3 of the feedforward neural network in a training process according to an embodiment of the present application;
FIG. 4D is a schematic diagram of the computation process of layer 1 of the feedforward neural network in an inference process according to an embodiment of the present application;
FIG. 4E is a schematic diagram of the computation process of layer 2 of the feedforward neural network in an inference process according to an embodiment of the present application;
FIG. 4F is a schematic diagram of the computation process of layer 3 of the feedforward neural network in an inference process according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a chip hardware structure according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the computation process of the l-th layer of the feedforward neural network model in a training process according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the computation process of the l-th layer of the feedforward neural network model in a training process according to an embodiment of the present application;
FIG. 9 is a schematic diagram of the computation process of the l-th layer of the feedforward neural network model in a training process according to an embodiment of the present application;
FIG. 10 is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 11 is a schematic diagram of the computation process of the l-th layer of the feedforward neural network model in an inference process according to an embodiment of the present application;
FIG. 12 is a schematic diagram of the computation process of the l-th layer of the feedforward neural network model in an inference process according to an embodiment of the present application;
FIG. 13 is a schematic diagram of the computation process of the l-th layer of the feedforward neural network model in an inference process according to an embodiment of the present application;
FIG. 14 is a schematic diagram of a multi-view scenario according to an embodiment of the present application;
FIG. 15 is a schematic diagram of a multi-node scenario according to an embodiment of the present application;
FIG. 16A is a schematic diagram of results obtained by training with formula (1) as the objective function according to an embodiment of the present application;
FIG. 16B is a schematic diagram of results obtained by training with MSE as the objective function according to an embodiment of the present application;
FIG. 17 is a schematic diagram of a data processing device according to an embodiment of the present application;
FIG. 18 is a schematic diagram of a data processing device according to an embodiment of the present application.
Detailed description
The embodiments of the present application are described below with reference to the accompanying drawings in the embodiments of the present application.
The terms "first" and "second" in the specification and the accompanying drawings of this application are used to distinguish different objects, or to distinguish different processing of the same object, rather than to describe a specific order of objects. In addition, the terms "including" and "having" and any variants thereof mentioned in the description of this application are intended to cover a non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes other steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product or device. It should be noted that, in the embodiments of this application, words such as "exemplarily" or "for example" are used to indicate an example, illustration or explanation. Any embodiment or design described as "exemplarily" or "for example" in the embodiments of this application should not be construed as being preferable to or more advantageous than other embodiments or designs; rather, the use of such words is intended to present related concepts in a specific manner. In the embodiments of this application, "A and/or B" means A and B, or A or B. "A, and/or B, and/or C" means any one of A, B and C, or any two of A, B and C, or A and B and C. The technical solutions in this application are described below with reference to the accompanying drawings.
In recent years, the development of deep learning has attracted research from academia and industry on wireless communication technologies based on deep learning. Research results have confirmed that deep learning can improve the performance of wireless communication systems and has the potential to be applied at the physical layer for interference alignment, channel estimation and signal detection, signal processing, and the like.
An autoencoder can replace the traditional design of communication transceivers: the transmitter and the receiver are modeled by neural networks, the distribution of data is learned from a large number of training samples, and results are predicted, as shown in FIG. 1, which shows an end-to-end communication network architecture; this end-to-end learning approach enables joint optimization. For example, a neural network can be trained by the back propagation (BP) algorithm, whose learning process consists of a forward propagation process and a backward propagation process. In the forward propagation process, input information passes from the input layer through the hidden layers, is processed layer by layer, and is transmitted to the output layer. If the expected output value is not obtained at the output layer, the sum of squares of the error between the output and the expectation is taken as the objective function, and backward propagation is performed: the partial derivatives of the objective function with respect to the weights of each neuron are obtained layer by layer to form the gradient of the objective function with respect to the weight vector, which serves as the basis for modifying the weights. The learning of the neural network is completed in the weight modification process; when the error reaches the expected value, the learning ends. The BP algorithm requires that the activation functions of the artificial neurons (or "nodes") be differentiable. However, in the BP algorithm there is no corresponding theoretical guidance for the choice of the number of network layers and the number of neurons, and retraining is required when the network structure is changed. The network output results have no reliable mathematical interpretability, the neural network implementation is regarded as a "black box", and its theory is not widely accepted; meanwhile, the gradient vanishing or gradient explosion caused during the execution of the BP algorithm has not yet been effectively resolved. Moreover, when the BP algorithm is applied in communication scenarios, the channel is usually treated as a hidden layer in the network, which requires the channel to be differentiable; channels in practical scenarios do not necessarily satisfy this condition.
To solve the problems of the BP algorithm, the embodiments of the present application also consider neural networks based on random features and neural networks based on metric representations. A neural network based on random features may be an extreme learning machine (ELM), a typical learning algorithm for feedforward neural networks. Such a network usually has one or more hidden layers of nodes, where the parameters of the hidden-layer nodes do not need to be tuned, and the weights from the hidden layer to the output layer are determined simply by solving a system of linear equations, so the computation speed can be improved. The algorithm has good generalization performance, and its learning speed is up to 1000 times faster than training with the BP algorithm; however, to obtain a sufficient number of features to represent the raw data, a rather wide hidden layer is usually required. A neural network based on metric representations may be a neural network training method based on the Hilbert-Schmidt independence criterion (HSIC), which is trained with an approximate information bottleneck approach: it requires maximizing the mutual information between the hidden layer and the labels while minimizing the mutual dependence between the hidden-layer representation and the input. Since mutual information is difficult to compute for random variables, the HSIC criterion based on non-parametric kernel methods is adopted, which has higher complexity than BP-based algorithms.
Therefore, to solve the above problems, the embodiments of the present application propose a data processing method and provide a feedforward neural network model, thereby reducing the transceiver communication overhead caused by BP training interactions and improving training efficiency. In scenarios with different transceiver network structures, training accuracy is improved by adjusting the number of network layers, which avoids the problem of retraining caused by different network adaptations at different transmitters and receivers.
The overall workflow of an artificial intelligence system is first described. Referring to FIG. 2, FIG. 2 shows a schematic diagram of a structure of an artificial intelligence main framework, which is described below from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from data acquisition to processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output; in this process, data undergoes a refinement process of "data, information, knowledge, wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of intelligence and information (providing and processing technical implementations) to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing capability support for the artificial intelligence system, enables communication with the outside world, and is supported by a base platform. Communication with the outside is performed through sensors; computing capability is provided by intelligent chips (hardware acceleration chips such as CPUs, GPUs, NPUs, ASICs and FPGAs); and the base platform includes related platform assurance and support such as distributed computing frameworks and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to obtain data, and the data is provided to the intelligent chips in the distributed computing system provided by the base platform for computation.
(2) Data
The data at the level above the infrastructure is used to represent data sources in the field of artificial intelligence. The data involves graphics, images, speech and text, as well as Internet-of-Things data of traditional devices, including service data of existing systems and sensed data such as force, displacement, liquid level, temperature and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and the like.
Machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to perform machine thinking and solve problems according to a reasoning control strategy, with searching and matching as typical functions.
Decision-making refers to the process of making decisions after intelligent information is reasoned, and usually provides functions such as classification, ranking and prediction.
(4) General capabilities
After the data processing mentioned above, some general capabilities can further be formed based on the results of the data processing, for example, an algorithm or a general system, such as translation, text analysis, computer vision processing, speech recognition, image recognition, and the like.
(5) Intelligent products and industry applications
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productize intelligent information decision-making and realize practical applications. The application fields mainly include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, safe cities, and the like.
The embodiments of the present application are mainly applied in fields such as driving assistance, autonomous driving and mobile phone terminals.
Several application scenarios are introduced below:
Application scenario 1: advanced driver assistance system (ADAS) / autonomous driving solution (ADS)
In ADAS and ADS, multiple types of 2D object detection need to be performed in real time, including: dynamic obstacles (Pedestrian, Cyclist, Tricycle, Car, Truck, Bus), static obstacles (TrafficCone, TrafficStick, FireHydrant, Motocycle, Bicycle), traffic signs (TrafficSign, GuideSign, Billboard), traffic lights (TrafficLight_Red/TrafficLight_Yellow/TrafficLight_Green/TrafficLight_Black) and road signs (RoadSign). In addition, to accurately obtain the region occupied by a dynamic obstacle in 3D space, 3D estimation also needs to be performed on dynamic obstacles to output 3D boxes. To fuse with lidar data, the Mask of dynamic obstacles needs to be obtained so as to filter out the laser point cloud hitting the dynamic obstacles; for precise parking, the 4 key points of the parking space need to be detected simultaneously; for mapping and positioning, the key points of static targets need to be detected. This is a semantic segmentation problem: the camera of the autonomous vehicle captures road images, which need to be segmented into different objects such as road surface, roadbed, vehicles and pedestrians, so as to keep the vehicle driving in the correct region. For autonomous driving, which has extremely high safety requirements, the images need to be understood in real time, and a feedforward neural network that can run semantic segmentation in real time is crucial.
Application scenario 2: image classification scenario
After obtaining an image to be classified, an object recognition device processes the objects in the image to be classified through the classification model trained with the data processing method of the embodiments of the present application to obtain the category of the image to be classified, and the image can then be classified according to the object categories of the objects in it. Photographers take many photos every day, of animals, of people, of plants. With the method of this application, photos can be quickly classified by content into photos containing animals, photos containing people and photos containing plants.
When the number of images is large, manual classification is inefficient, and a person easily becomes fatigued when handling the same task for a long time, in which case the classification results will have large errors.
Application scenario 3: commodity classification
After obtaining an image of a commodity, the object recognition device processes the image of the commodity through the classification model trained with the data processing method of the embodiments of the present application to obtain the category of the commodity in the image, and then classifies the commodity according to its category. For the wide variety of commodities in large shopping malls or supermarkets, the method of this application can quickly complete commodity classification, reducing time overhead and labor costs.
Application scenario 4: face verification at entrance gates
This is an image similarity comparison problem. At the gates of entrances such as high-speed railway stations and airports, when a passenger performs face authentication, a camera captures a face image, features are extracted using the method of the embodiments of this application, and the similarity against the image features of the identity document stored in the system is computed; if the similarity is high, the verification succeeds. The method of this application enables fast face verification.
Application scenario 5: simultaneous interpretation by translator devices
This is a speech recognition and machine translation problem. For speech recognition and machine translation, the feedforward neural network is also a common recognition model. In scenarios requiring simultaneous interpretation, real-time speech recognition and translation must be achieved, and an efficient feedforward neural network can bring a better experience to the translator device.
The feedforward neural network model trained in the embodiments of the present application can implement the above functions.
The system architecture provided by the embodiments of the present application is introduced below.
Referring to FIG. 3, FIG. 3 shows a system architecture 100 according to an embodiment of the present application. As shown in the system architecture 100, a data collection device 160 is configured to collect or generate training data; in this embodiment of the application, the training data includes multiple labeled images or multiple speech segments, etc., and the training data is stored into a database 130. A training device 120 trains a feedforward neural network model based on the training data maintained in the database 130, where the input information of the l-th layer in the feedforward neural network model includes the classification distribution information of the training data and the first data feature, the output information of the l-th layer includes the second data feature, the first data feature is the output of the (l-1)-th layer, and the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1.
Specifically, the trained feedforward neural network model can be used to implement the data processing method provided by the embodiments of the present application.
It should be noted that, in practical applications, the training data maintained in the database 130 does not necessarily all come from the collection of the data collection device 160 and may also be received from other devices. It should also be noted that the training device 120 does not necessarily train the feedforward neural network model entirely based on the training data maintained in the database 130 and may also obtain training data from the cloud or elsewhere for model training; the above description should not be taken as a limitation on the embodiments of this application.
The target model/rule 101 obtained through training by the training device 120 can be applied to different systems or devices, for example, to the execution device 110 shown in FIG. 3. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, augmented reality (AR)/virtual reality (VR) or a vehicle-mounted terminal, or may be a server or the cloud, etc. In FIG. 3, the execution device 110 is provided with an input/output (I/O) interface 112 for data interaction with external devices, and a user may input data to the I/O interface 112 through a client device 140; in this embodiment of the application, the input data may include an image or video to be recognized, or a speech segment to be recognized.
When the execution device 110 preprocesses the input data, or when the computation module 111 of the execution device 110 performs computation or other related processing (for example, implementing the functions of the feedforward neural network in this application), the execution device 110 may invoke the data, code and the like in a data storage system 170 for the corresponding processing, and may also store the data, instructions and the like obtained by the corresponding processing into the data storage system 170.
Finally, the I/O interface 112 returns the processing result, such as the recognition result or classification result of the image, video or speech, to the client device 140, which can then provide it to a user device 150. The user device 150 may be a lightweight terminal that needs to use the target model/rule 101, such as a mobile phone terminal, a notebook computer, an AR/VR terminal or a vehicle-mounted terminal, so as to respond to the corresponding needs of the end user, for example, performing image recognition on an image input by the end user and outputting the recognition result to the end user, or performing text classification on text input by the end user and outputting the classification result to the end user.
It is worth noting that the training device 120 may generate, for different targets or different tasks, corresponding target models/rules 101 based on different training data, and the corresponding target models/rules 101 can be used to achieve the above targets or complete the above tasks, thereby providing the user with the desired results.
In the case shown in FIG. 3, the user may manually specify the input data, and the manual specification may be operated through the interface provided by the I/O interface 112. In another case, the client device 140 may automatically send input data to the I/O interface 112; if requiring the client device 140 to automatically send input data needs the user's authorization, the user may set the corresponding permission in the client device 140. The user can view, at the client device 140, the results output by the execution device 110, and the specific presentation form may be display, sound, action or other specific manners. The client device 140 may also serve as a data collection end, collecting the input data input to the I/O interface 112 and the output results output from the I/O interface 112, as shown in the figure, as new sample data and storing them in the database 130. Of course, the collection may also be performed without the client device 140; instead, the I/O interface 112 directly stores the input data input to the I/O interface 112 and the output results output from the I/O interface 112, as shown in the figure, into the database 130 as new sample data.
After receiving the output result, the client device 140 may transmit the result to the user device 150. The user device 150 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, AR/VR or a vehicle-mounted terminal. In one example, the user device 150 may run the target model/rule 101 to implement a specific function.
It is worth noting that FIG. 3 is merely a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, modules and the like shown in the figure do not constitute any limitation; for example, in FIG. 3 the data storage system 170 is an external memory relative to the execution device 110, while in other cases the data storage system 170 may also be placed in the execution device 110.
As shown in FIG. 3, the target model/rule 101 is obtained through training by the training device 120. The target model/rule 101 may be the classification model in application scenarios 2 and 3, the image recognition model in application scenario 4, or the speech recognition model in application scenario 5. Specifically, for the target model/rule 101 provided by the embodiments of the present application, for example an image recognition model or a speech recognition model, in practical applications both the image recognition model and the speech recognition model may be feedforward neural network models.
A schematic diagram of a structure of a feedforward neural network provided by the embodiments of the present application is introduced below.
FIG. 4 is a schematic diagram of a structure of a feedforward neural network 400 according to an embodiment of the present application. The feedforward neural network 400 may be called an interpretable feedforward neural network and may include an input layer 410, intermediate layers 420 and an output layer 430. The input layer 410 may obtain data to be processed and hand the obtained data to the intermediate layers 420 for processing, so that the data feature of the data to be processed can be obtained; the data feature of the data to be processed is used to determine the processing result of the data to be processed, for example, its classification or clustering result. How many layers the intermediate layers 420 specifically include is not limited here; the more layers the intermediate layers include, the more accurate the classification or clustering result of the data to be processed. The output layer 430 may be configured to output the data feature of the data to be processed obtained by the intermediate layers 420.
As shown in FIG. 4, the intermediate layers 420 may include, as an example, layers 421-423, which may be called layer 1, layer 2 and layer 3. The working principles of layer 1, layer 2 and layer 3 are introduced below from two aspects, the training process and the inference process, as follows:
The training process is specifically as follows:
The specific computation process in layer 1 is shown in FIG. 4A. The input information of layer 1 includes the classification distribution information of the training data and the training data, where the training data includes classification labels and the classification distribution information of the training data is determined according to the classification labels in the training data. Then, the network parameters of layer 1 are determined according to the classification distribution information ∏i of the training data and the training data; the objective function gradient expression is determined according to the network parameters of layer 1 and the training data; and the output information of layer 1 is determined according to the training data, the classification distribution information ∏i of the training data and the objective function gradient expression, where the output information of layer 1 includes the data feature Z1 of the training data.
The specific computation process in layer 2 is shown in FIG. 4B. The input information of layer 2 includes the classification distribution information of the training data and the output information of layer 1, i.e., the data feature Z1 of the training data. Then, the network parameters of layer 2 are determined according to the classification distribution information ∏i of the training data and the data feature Z1; the objective function gradient expression is determined according to the network parameters of layer 2 and the data feature Z1; and the output information of layer 2 is determined according to the data feature Z1, the classification distribution information ∏i of the training data and the objective function gradient expression, where the output information of layer 2 includes the data feature Z2 of layer 2.
The specific computation process in layer 3 is shown in FIG. 4C. The input information of layer 3 includes the classification distribution information of the training data and the output information of layer 2, i.e., the data feature Z2 of layer 2. Then, the network parameters of layer 3 are determined according to the classification distribution information ∏i of the training data and the data feature Z2; the objective function gradient expression is determined according to the network parameters of layer 3 and the data feature Z2; and the output information of layer 3 is determined according to the data feature Z2, the classification distribution information ∏i of the training data and the objective function gradient expression, where the output information of layer 3 includes the data feature Z3 of layer 3.
Then, the network parameters of each layer are each stored as a d×d fully connected layer parameter, so that the trained feedforward neural network model is obtained.
The inference process is specifically as follows:
Data to be processed with unknown classification or clustering information is obtained, and the specific process of inputting the data to be processed into the trained feedforward neural network model to obtain the data feature of the data to be processed is as follows:
The specific computation process in layer 1 is shown in FIG. 4D. The input information of layer 1 includes the data to be processed. The classification distribution information corresponding to the expected classification label of the data to be processed is determined according to the data to be processed and the network parameters of layer 1; the objective function gradient expression is then determined according to the data to be processed and the classification distribution information corresponding to the expected classification label of the data to be processed; and the output information of layer 1, which includes the data feature Z1 of layer 1, is determined according to the data to be processed and the objective function gradient expression.
The specific computation process in layer 2 is shown in FIG. 4E. The input information of layer 2 includes the output information of layer 1, i.e., the data feature Z1 of layer 1. The classification distribution information corresponding to the expected classification label of the data to be processed is determined according to the data feature Z1 and the network parameters of layer 2; the objective function gradient expression is then determined according to the data feature Z1 and the classification distribution information corresponding to the expected classification label of the data to be processed; and the output information of layer 2, which includes the data feature Z2 of layer 2, is determined according to the data feature Z1 and the objective function gradient expression.
The specific computation process in layer 3 is shown in FIG. 4F. The input information of layer 3 includes the output information of layer 2, i.e., the data feature Z2 of layer 2. The classification distribution information corresponding to the expected classification label of the data to be processed is determined according to the data feature Z2 and the network parameters of layer 3; the objective function gradient expression is then determined according to the data feature Z2 and the classification distribution information corresponding to the expected classification label of the data to be processed; and the output information of layer 3, which includes the data feature Z3 of layer 3, is determined according to the data feature Z2 and the objective function gradient expression. The data feature of layer 3 may be called the data feature of the data to be processed.
A chip hardware structure provided by the embodiments of the present application is introduced below.
FIG. 5 shows a chip hardware structure according to an embodiment of the present application; the chip includes an artificial intelligence processor 50. The chip may be provided in the execution device 110 shown in FIG. 3 to complete the computation work of the computation module 111. The chip may also be provided in the training device 120 shown in FIG. 3 to complete the training work of the training device 120 and output the target model/rule 101. The algorithms of each layer of the feedforward neural network shown in FIG. 4 can all be implemented in the chip shown in FIG. 5.
The artificial intelligence processor 50 may be any processor suitable for large-scale exclusive-OR operation processing, such as a neural network processing unit (NPU), a tensor processing unit (TPU) or a graphics processing unit (GPU). Taking the NPU as an example: the NPU may be mounted, as a co-processor, onto a host CPU, and the host CPU assigns tasks to it. The core part of the NPU is an operation circuit 503, and a controller 504 controls the operation circuit 503 to extract the data in a memory (a weight memory or an input memory) and perform operations.
In some implementations, the operation circuit 503 internally includes multiple processing engines (PEs). In some implementations, the operation circuit 503 is a two-dimensional systolic array; the operation circuit 503 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 503 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B and an output matrix C. The operation circuit 503 fetches the data corresponding to matrix B from the weight memory 502 and caches it on each PE in the operation circuit 503. The operation circuit 503 fetches the input data of matrix A from the input memory 501, performs matrix operations on the input data of matrix A and the weight data of matrix B, and stores the partial or final result of the obtained matrix in an accumulator 508.
A unified memory 506 is used to store input data and output data. Weight data is moved into the weight memory 502 directly through a direct memory access controller (DMAC) 505, and input data is also moved into the unified memory 506 through the DMAC.
A bus interface unit (BIU) 510 is used for the interaction between the DMAC and an instruction fetch buffer 509; the bus interface unit 510 is also used for the instruction fetch buffer 509 to obtain instructions from an external memory, and for the memory access controller 505 to obtain the raw data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to move the input data in the external memory DDR to the unified memory 506, to move the weight data to the weight memory 502, or to move the input data to the input memory 501.
A vector computation unit 507 may include multiple operation processing units and, if necessary, performs further processing on the output of the operation circuit 503, such as vector multiplication, vector addition, exponential operations, logarithmic operations and magnitude comparison. The vector computation unit 507 is mainly used for the intermediate-layer computation in the feedforward neural network.
In some implementations, the vector computation unit 507 stores the processed output vector into the unified memory 506. For example, the vector computation unit 507 may apply a nonlinear function to the output of the operation circuit 503, for example a vector of accumulated values, to generate activation values. In some implementations, the vector computation unit 507 generates normalized values, merged values, or both. In some implementations, the processed output vector can be used as an activation input to the operation circuit 503, for example for use in subsequent layers of the feedforward neural network.
The instruction fetch buffer 509 connected to the controller 504 is used to store the instructions used by the controller 504.
The controller 504 is used to invoke the instructions cached in the instruction fetch buffer 509 to control the working process of the operation accelerator.
Generally, the unified memory 506, the input memory 501, the weight memory 502 and the instruction fetch buffer 509 are all on-chip memories, and the external memory is a memory outside the NPU; the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM) or another readable and writable memory.
The execution device 110 in FIG. 3 introduced above can execute the data processing method of the embodiments of the present application or the steps of the data processing method; the feedforward neural network model in FIG. 4 and the chip shown in FIG. 5 can also execute the data processing method of the embodiments of the present application or the steps of the data processing method.
An embodiment of the present application provides a system architecture. The system architecture includes one or more local devices, an execution device and a data storage system, where the local devices are connected to the execution device through a communication network.
The execution device may be implemented by one or more servers. Optionally, the execution device may be used in cooperation with other computing devices, for example devices such as data storage, routers and load balancers. The execution device may be arranged on one physical site or distributed across multiple physical sites. The execution device may use the data in the data storage system, or invoke the program code in the data storage system, to implement the neural network quantization method of the embodiments of the present application.
Users may operate their respective user devices (for example, one or more local devices) to interact with the execution device. Each local device may represent any computing device, for example a personal computer, a computer workstation, a smartphone, a tablet computer, a smart camera, a smart car or another type of cellular phone, a media consumption device, a wearable device, a set-top box or a game console.
The local device of each user may interact with the execution device through a communication network of any communication mechanism/communication standard; the communication network may be a wide area network, a local area network, a point-to-point connection or any combination thereof.
In one implementation, the local device obtains the relevant parameters of the target neural network from the execution device, deploys the target neural network on the local device, and uses the target neural network to perform image classification or image processing, etc., where the target neural network is trained according to the data processing method of the embodiments of the present application.
In another implementation, the target neural network may be directly deployed on the execution device; the execution device obtains the data to be processed from the local devices and classifies the data to be processed or performs other types of processing according to the target neural network.
The above execution device may also be called a cloud device, in which case the execution device is generally deployed in the cloud.
Some terms in this application are explained below to facilitate understanding.
(1) JS (Jensen-Shannon) divergence metric based on the data autocorrelation matrix
Suppose there are m sampled data, each of dimension d, so that the sampled data is Z=[X1,X2,…,Xm]∈Rd×m. The autocorrelation matrix S of the sampled data Z can serve as an important parameter characterizing the distribution of the sampled data, and is computed as S = ZZT/m, where S is the autocorrelation matrix of the sampled data, m is the number of sampled data, and Z is the sampled data.
For Z, the autocorrelation matrix S is an unbiased estimate and is positive definite. Similarly, the autocorrelation matrix of a certain category can be defined as Si = Z∏iZT/mi, where Si is the autocorrelation matrix of the data whose classification labels correspond to the i-th category in the sampled data, mi denotes the number of classification labels corresponding to the i-th category in the sampled data, K is the number of all categories of the classification labels in the m sampled data, ∏i is the classification distribution information of the data whose classification labels correspond to the i-th category in the sampled data, and Z is the sampled data.
If the autocorrelation matrices of the two random variables are taken as obeying some high-dimensional normal distribution, the KL (Kullback-Leibler) divergence of the two matrices can be defined accordingly, where DKL(Si|Sj) is the KL divergence between the autocorrelation matrix of the data of the i-th category and the autocorrelation matrix of the data of the j-th category in the sampled data, Si is the autocorrelation matrix of the data of the i-th category, Sj is the autocorrelation matrix of the data of the j-th category, d is the dimension of the sampled data, Tr() denotes the trace operation, and logdet() denotes taking the logarithm of the determinant of a matrix.
Since the KL divergence is asymmetric, to satisfy the symmetry of a distance metric, the JS (Jensen-Shannon) divergence can also be used, and the JS divergence of the two matrices can be defined accordingly, where DJS(Si|Sj) is the JS divergence between the autocorrelation matrix of the data of the i-th category and the autocorrelation matrix of the data of the j-th category in the sampled data, Si is the autocorrelation matrix of the data of the i-th category, Sj is the autocorrelation matrix of the data of the j-th category, d is the dimension of the sampled data, and Tr() denotes the trace operation.
Therefore, an objective function can be determined and computed so as to enlarge the JS divergence between the autocorrelation matrices of sampled data of different categories, thereby distinguishing sampled data of different categories and achieving the classification/clustering effect. In the expression of this objective function, αi,j is a weight parameter used to balance the number of sampled data of each category, mi denotes the number of classification labels corresponding to the i-th category in the sampled data, mj denotes the number of classification labels corresponding to the j-th category in the sampled data, and DJS(Si|Sj) is the JS divergence between the autocorrelation matrix of the data of the i-th category and the autocorrelation matrix of the data of the j-th category in the sampled data.
The objective function can be used for network updating. To implement the feedforward neural network, gradient ascent can be used to update the data feature Z: Zl = Zl-1 + λ·∂E/∂Z, where Zl denotes the data feature of the l-th layer in the feedforward neural network, Zl-1 denotes the data feature of the (l-1)-th layer in the feedforward neural network, ∂E/∂Z denotes the gradient expression of the objective function, and λ denotes the step size, or learning rate.
The objective function gradient expression can be determined from the objective function, where αi,j is a weight parameter used to balance the number of samples of each category of the sampled data, Ŝi is the regularized autocorrelation matrix of the data of the i-th category corresponding to the classification labels in the sampled data, Si is the autocorrelation matrix of the data of the i-th category, ∈ is the regularization parameter, I is the identity matrix, mi denotes the number of classification labels corresponding to the i-th category in the sampled data, K is the number of all categories of the classification labels in the m sampled data, ∏i is the classification distribution information of the data of the i-th category, Ŝj is the regularized autocorrelation matrix of the data of the j-th category, Sj is the autocorrelation matrix of the data of the j-th category, mj denotes the number of classification labels corresponding to the j-th category in the sampled data, ∏j is the classification distribution information of the data of the j-th category, and Z is the sampled data.
(2) KL divergence metric based on the data autocorrelation matrix
When the raw data obeys some probability distribution, its corresponding feature Z also obeys some probability distribution, expressed as a mixture distribution P(Z) generated by a set of conditional probabilities {P(Z|Z∈Ck)}, where Ck is the classification information. When no classification information is given, the random vector Z obeys the distribution P(Z); when the classification information Ck is given, the random vector Z obeys the distribution P(Z|Z∈Ck). It is therefore desirable that the addition of classification information brings a large change in the feature distribution, and the difference between the distribution P(Z|Z∈Ck) and the distribution P(Z) is taken as a metric of the feature. In the expression of this objective function, αk is a weight parameter used to balance the number of sampled data of each category, mk denotes the number of classification labels corresponding to the k-th category in the sampled data, Sk is the autocorrelation matrix of the feature Zk obeying the conditional probability distribution P(Z|Z∈Ck), S is the autocorrelation matrix of the feature Z obeying the probability distribution P(Z), and DKL(Sk||S) is the KL divergence between Sk and S.
The objective function gradient expression can be determined from the objective function, where αk is a weight parameter used to balance the number of sampled data of each category, mk denotes the number of classification labels corresponding to the k-th category in the sampled data, K is the number of all categories of the classification labels in the m sampled data, Ŝk is the regularized autocorrelation matrix of the data of the k-th category corresponding to the classification labels in the sampled data, ∈ is the regularization parameter, Sk is the autocorrelation matrix of the data of the k-th category, I is the identity matrix, ∏k is the classification distribution information of the data of the k-th category, Ŝ is the regularized autocorrelation matrix of all categories of the classification labels in the sampled data, S is the autocorrelation matrix of all categories of the classification labels in the sampled data, and Z is the sampled data.
(3) Linear discriminative representation (LDR) criterion based on contrastive learning:
Feature extraction can be viewed as the process of finding a mapping from the raw data space to the feature space. Contrastive learning, as a feature extraction method, has the core idea that, for similar raw data, the distance between their images in the feature space should be as close as possible, while for raw data with large differences, the distance between their images in the feature space should be as far as possible. Therefore, the objective function can be designed based on the idea of contrastive learning, following two principles: 1) contrastiveness: the distance between the central nodes of the classifications/clusters of the data should be as large as possible; 2) diversity: within the same classification/cluster, the data should remain as diverse as possible.
Specifically: for data with n classifications/clusters, according to the contrastiveness principle, directly computing the pairwise distances between nodes requires O(n2) computation and constitutes a multi-objective optimization problem, which is difficult to handle. The contrastiveness principle is therefore equivalently described as: under a given data energy, maximizing the volume of the n-dimensional simplex spanned by the nodes. The diversity principle can be characterized by entropy and described as maximizing the entropy of the features conditioned on known classification/clustering information. It can be shown that, under a given feature energy, the features have maximum entropy if and only if the feature distribution is white Gaussian noise; therefore, the feature distribution should be as close as possible to a Gaussian distribution. Similarly to the above description, the KL divergence can be used to describe the similarity between the feature distribution and the Gaussian distribution, and the objective function is defined accordingly, where the volume of the K-simplex spanned by the central nodes appears in the objective, e∈Rm×1 is a column vector whose elements are all 1, ∏k is the classification distribution information of the data of the k-th category in the sampled data, Tr() denotes the trace operation, and Z is required to satisfy the energy constraint Tr(ZZT) = m(1+σ²d), where σ is the Gaussian distribution variance, m denotes the number of sampled data, and d is the dimension of the sampled data.
This objective function satisfies convexity and unitary invariance, so its gradient expression can be given accordingly, where e∈Rm×1 is a column vector whose elements are all 1, ∏k is the classification distribution information of the data of the k-th category in the sampled data, Tr() denotes the trace operation, Z is required to satisfy the energy constraint Tr(ZZT) = m(1+σ²d), σ is the Gaussian distribution variance, m denotes the number of sampled data, mk is the number of classification labels corresponding to the k-th category in the m sampled data, K is the number of all categories of the classification labels in the m sampled data, I is the identity matrix, β denotes a regularization parameter, and Z is the sampled data.
The methods involved in the embodiments of the present application are described in detail below. FIG. 6 is a schematic flowchart of a data processing method according to an embodiment of the present application. The method may be performed by a data processing device; specifically, the data processing device may be the training device 120 in the system architecture 100 shown in FIG. 3. The method includes, but is not limited to, the following steps:
Step S601: Obtain training data.
Specifically, the training data includes classification labels. In an example, suppose the training data consists of m=100 pictures: pictures 1-10 belong to the 1st category, i.e., the digit "0" category; pictures 11-20 belong to the 2nd category, i.e., the digit "1" category; pictures 21-30 belong to the 3rd category, i.e., the digit "2" category; and so on, until pictures 91-100 belong to the 10th category, i.e., the digit "9" category.
In a possible implementation, the classification distribution information of the training data can be determined according to the classification labels of the training data.
Specifically, for a general classification or clustering task, there may be m pieces of d-dimensional data, expressed as a feature matrix Z∈Rd×m, with K classifications/clusters C1,…,CK. When soft classification/clustering is considered, the following definition can be given: ∏k is a diagonal matrix with values only on the diagonal, and Σk∏k = Im×m. ∏k represents the distribution information of each classification present in the data, which should be the same in the training set and the test set; therefore, the classification distribution information of the raw data can be obtained by estimating the parameter ∏k, and this classification information of the raw data can then be used to perform feature extraction on the data.
In an example, taking the MNIST dataset as an example, suppose m=100 picture data are sampled from the dataset, each picture consisting of d=28*28-dimensional pixels with values in [0,1]; the 100 pictures are the training data, and Z denotes the feature matrix composed of such a group of training data. There are K=10 classifications. Suppose pictures 1-10 are the digit "0" category and pictures 11-20 are the digit "1" category; then it can be determined that, for the digit "0" category, the first ten diagonal entries are 1 and the rest are 0, so the distribution information of the digit "0" category in the training data is ∏0=diag(1,1,1,1,1,1,1,1,1,1,0,…,0). The other classifications are similar.
Step S602: Determine a feedforward neural network model.
Specifically, the input information of the l-th layer in the feedforward neural network model includes the classification distribution information of the training data and the first data feature, the output information of the l-th layer includes the second data feature, the first data feature is the output of the (l-1)-th layer, and the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1.
In a possible implementation, when l=2 and the first data feature is the output of the first layer, the input information of the first layer includes the classification distribution information of the training data and the training data. That is, when l=2, the input information of the first layer includes the classification distribution information of the training data and the training data, the output of the first layer is the first data feature, the input information of the 2nd layer includes the classification distribution information of the training data and the first data feature, and the output information of the 2nd layer includes the second data feature. The input dimension of the input dataset X can be reduced to d dimensions by feature engineering to obtain the training data as input; in this embodiment of the application, the input dimension of the dataset X and the dimension of the training data are the same.
In a possible implementation, the determining a feedforward neural network model includes: obtaining the first data feature Zl-1; and determining the network parameters of the l-th layer according to the first data feature Zl-1 and the classification distribution information ∏i of the training data, where the second data feature is determined according to the first data feature Zl-1 and the network parameters of the l-th layer.
Specifically, the determining the network parameters of the l-th layer according to the first data feature Zl-1 and the classification distribution information ∏i of the training data specifically includes the following manner: determining, according to the first data feature Zl-1 and the classification distribution information ∏i of the training data, the regularized autocorrelation matrix of each category corresponding to the classification labels in the training data; and determining the network parameters of the l-th layer according to the regularized autocorrelation matrices of the categories corresponding to the classification labels in the training data.
In the specific formulas, mi is the number of classification labels corresponding to the i-th category in the m training data, K is the number of all categories of the classification labels in the m training data, αi is a weight parameter used to balance the number of samples of each category in the training data, Zl-1 is the first data feature, ∏i is the classification distribution information of the training data, Si is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, ∈ is the regularization parameter, Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, and Ui is the network parameter of the i-th category at the l-th layer.
Here, Ui is determined according to the above formulas; the smaller the value of Ui, the closer the distribution of the i-th category in the training data is to the other categories, so Ui can serve as a discriminative parameter, and the Ui of each layer can be stored by the network as a d×d fully connected layer parameter. Finally, by computing the gradient expression of the objective function, for example formula (1), and performing the projection into the unit ball through an L2 regularization constraint, the feature Zl-1 is constrained in the unit ball space, and the feature Zl of the next layer is obtained.
Alternatively, in the specific formulas, mi is the number of classification labels corresponding to the i-th category in the m training data, K is the number of all categories of the classification labels in the m training data, Zl-1 is the first data feature, ∏i is the classification distribution information of the training data, I is the identity matrix, Si is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, ∈ is the regularization parameter, Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, S is the autocorrelation matrix of all categories of the classification labels in the training data, Ŝ is the regularized autocorrelation matrix of all categories of the classification labels in the training data, and Ai is the network parameter of the i-th category at the l-th layer.
Here, Ai is determined according to the above formulas; the smaller the value of Ai, the closer the distribution of the i-th category in the training data is to the other categories, so Ai can serve as a discriminative parameter, and the Ai of each layer can be stored by the network as a d×d fully connected layer parameter. Finally, by computing the gradient expression of the objective function, for example formula (2), and performing the projection into the unit ball through an L2 regularization constraint, the feature Zl-1 is constrained in the unit ball space, and the feature Zl of the next layer is obtained.
Specifically, the determining the network parameters of the l-th layer according to the first data feature Zl-1 and the classification distribution information ∏i of the training data may further include the following manner: determining gradient parameters according to the classification distribution information ∏i of the training data; and then determining the network parameters of the l-th layer according to the first data feature Zl-1 and the gradient parameters.
In the specific formulas, Zl-1 satisfies the energy constraint Tr(Zl-1(Zl-1)T) = m(1+σ²d), where σ is the Gaussian distribution variance, m is the number of samples of the training data, d is the dimension of the training data, Zl-1 is the first data feature, e∈Rm×1 is a column vector whose elements are all 1, ∏i is the classification distribution information of the training data, Tr() denotes the trace operation, I is the identity matrix, mi is the number of classification labels corresponding to the i-th category in the m training data, K is the number of all categories of the classification labels in the m training data, Ci denotes the network parameter of the i-th category at the l-th layer, and G and Hi are the gradient parameters.
Here, the cluster centers are involved: the vertices of a simplex can serve as the cluster centers, used to mark references between the categories. The smaller the value of Ci, the closer the distribution of the i-th category in the training data is to the distribution of the j-th category, so Ci can serve as a discriminative parameter, and the intermediate variable Ci of each layer can be stored by the network as a d×d fully connected layer parameter. Finally, by computing the gradient expression of the objective function, for example formula (3), and performing the projection into the unit ball through an L2 regularization constraint, the feature Zl-1 is constrained in the unit ball space, and the feature Zl of the next layer is obtained. The details are as follows:
Specifically, after the network parameters of the l-th layer are determined, the second data feature can be determined according to the first data feature Zl-1 and the network parameters of the l-th layer, specifically including the following manner: determining the objective function gradient expression according to the network parameters of the l-th layer and the first data feature Zl-1; and then determining the second data feature Zl according to the first data feature Zl-1, the classification distribution information ∏i of the training data and the objective function gradient expression. The objective function gradient expression may be as in the foregoing formula (1), formula (2) or formula (3), and details are not repeated here.
Zl = Zl-1 + λ·∂E/∂Z, where Zl is the second data feature, λ denotes the step size or learning rate, ∂E/∂Z is the objective function gradient expression, and Zl-1 is the first data feature.
To better describe the training process in the feedforward neural network model, the foregoing formula (1), formula (2) and formula (3) are respectively taken as the objective function gradient expression for illustration, as follows:
In an example, taking the objective function gradient expression as the foregoing formula (1), the computation process of the l-th layer in the feedforward neural network model is shown in FIG. 7. From formula (1), computing the feature Zl of the l-th layer requires the feature Zl-1 of the (l-1)-th layer, the classification distribution information ∏i of the training data and the network parameters of the l-th layer, i.e., the intermediate variable Ui, where Ui denotes the network parameter of the i-th category at the l-th layer. By sampling the known classification labels in the training data, the classification distribution information ∏i of the training data can be determined, so that the regularized autocorrelation matrix Ŝi of the i-th category corresponding to the classification labels of the training data is determined, and the intermediate variable Ui is then obtained. The smaller the value of Ui, the closer the distribution of the i-th category in the training data is to the distribution of the j-th category, so Ui can serve as a discriminative parameter, and the intermediate variable Ui of each layer can be stored by the network as a d×d fully connected layer parameter. Finally, by computing the objective function gradient expression, for example formula (1), and performing the projection into the unit ball through the L2 regularization constraint, the feature Zl-1 of the (l-1)-th layer is constrained in the unit ball space, and the feature Zl of the l-th layer is obtained as Zl = Zl-1 + λ·∂E/∂Z, where Zl-1 is the feature of the (l-1)-th layer, λ denotes the step size or learning rate, ∂E/∂Z is formula (1), and Zl is the feature of the l-th layer.
In an example, taking the objective function gradient expression as the foregoing formula (2), the computation process of the l-th layer in the feedforward neural network model is shown in FIG. 8. From formula (2), computing the feature Zl of the l-th layer requires the feature Zl-1 of the (l-1)-th layer, the classification distribution information ∏i of the training data and the network parameters of the l-th layer, i.e., the intermediate variable Ai, where Ai denotes the network parameter of the i-th category at the l-th layer. By sampling the known classification labels in the training data, the classification distribution information ∏i of the training data can be determined, so that the regularized autocorrelation matrix Ŝi of the i-th category corresponding to the classification labels of the training data is determined, and the intermediate variable Ai is then obtained. The smaller the value of Ai, the closer the distribution of the i-th category in the training data is to the distribution of the j-th category, so Ai can serve as a discriminative parameter, and the intermediate variable Ai of each layer can be stored by the network as a d×d fully connected layer parameter. Finally, by computing the objective function gradient expression, for example formula (2), and performing the projection into the unit ball through the L2 regularization constraint, the feature Zl-1 of the (l-1)-th layer is constrained in the unit ball space, and the feature of the l-th layer is obtained as Zl = Zl-1 + λ·∂E/∂Z, where Zl-1 is the feature of the (l-1)-th layer, λ denotes the step size or learning rate, ∂E/∂Z is formula (2), and Zl is the feature of the l-th layer.
In an example, taking the objective function gradient expression as the foregoing formula (3), the computation process of the l-th layer in the feedforward neural network model is shown in FIG. 9. From formula (3), computing the feature Zl of the l-th layer requires the feature Zl-1 of the (l-1)-th layer, the classification distribution information ∏i of the training data and the network parameters of the l-th layer, i.e., the intermediate variable Ci, where Ci denotes the network parameter of the i-th category at the l-th layer. By sampling the known classification labels in the training data, the classification distribution information ∏i of the training data can be determined, so that the gradient parameters G and Hi are determined, and the intermediate variable Ci is then obtained. The cluster centers are involved here: the vertices of a simplex can serve as the cluster centers, used to mark references between the categories. The smaller the value of Ci, the closer the distribution of the i-th category in the training data is to the distribution of the j-th category, so Ci can serve as a discriminative parameter, and the intermediate variable Ci of each layer can be stored by the network as a d×d fully connected layer parameter. Finally, by computing the objective function gradient expression, for example formula (3), and performing the projection into the unit ball through the L2 regularization constraint, the feature Zl-1 of the (l-1)-th layer is constrained in the unit ball space, and the feature of the l-th layer is obtained as Zl = Zl-1 + λ·∂E/∂Z, where Zl-1 is the feature of the (l-1)-th layer, λ denotes the step size or learning rate, ∂E/∂Z is formula (3), and Zl is the feature of the l-th layer; Zl-1 is constrained in the (d-1)-dimensional unit sphere space.
In the above method, a feedforward neural network model is provided, thereby reducing the transceiver communication overhead caused by BP training interactions and improving training efficiency. In scenarios with different transceiver network structures, training accuracy is improved by adjusting the number of network layers, which avoids the problem of retraining caused by different network adaptations at different transmitters and receivers.
The methods involved in the embodiments of the present application are described in detail below. FIG. 10 is a schematic flowchart of a data processing method according to an embodiment of the present application. The method may be performed by a data processing device; specifically, the data processing device may be the execution device 110, the client device 140 or the user device 150 in the system architecture 100 shown in FIG. 3. The method includes, but is not limited to, the following steps:
Step S1001: Determine a feedforward neural network model.
Specifically, the process of determining the feedforward neural network model may be as shown in FIG. 6, and details are not repeated here.
Step S1002: Obtain data to be processed with unknown classification or clustering information.
Optionally, the data to be processed with unknown classification or clustering information does not include classification labels.
Step S1003: Input the data to be processed into the feedforward neural network model to determine the data feature of the data to be processed.
Specifically, the data feature of the data to be processed is used to represent the classification or clustering information of the data to be processed and is used to determine the classification or clustering result of the data to be processed. The dimension of the data feature of the data to be processed is related to the data type of the data to be processed. For example, for the choice of the dimension of the data feature, it is known from VC (Vapnik-Chervonenkis) dimension theory that the higher the VC dimension, the higher the model complexity and the easier the discrimination; but if the dimension is too high, overfitting easily occurs, so a suitable dimension needs to be determined. For determining the lower bound of the dimension, a general estimation approach is to compute the eigenvalues of the autocorrelation matrix of the raw data, remove the dimensions whose eigenvalues are close to 0, and take the remaining ones as the dimension of the extracted features. In addition, this can be refined for different data types; for example, if the data type of the data to be processed is pictures, the dimension of the data feature of the data to be processed may be 1000; if the data type of the data to be processed is text, the dimension of the data feature of the data to be processed may be 768.
The process of inputting the data to be processed into the feedforward neural network model to determine the data feature of the data to be processed can be understood as the inference process, specifically as follows:
In a possible implementation, the inputting the data to be processed into the feedforward neural network model to determine the data feature of the data to be processed specifically includes the following manner: determining, according to the data to be processed and the network parameters of the l-th layer, the classification distribution information ∏̂i corresponding to the expected classification label of the data to be processed; determining the objective function gradient expression according to the data to be processed and the classification distribution information corresponding to the expected classification label of the data to be processed; and determining the data feature of the data to be processed according to the data to be processed and the objective function gradient expression.
The determining, according to the data to be processed and the network parameters of the l-th layer, the classification distribution information corresponding to the expected classification label of the data to be processed may specifically include the following manner: determining, according to the data to be processed and the network parameters of the l-th layer, the projection of the expected classification label of the data to be processed on a first category, where the first category is any one of the multiple categories corresponding to the expected classification labels of the data to be processed; and determining, according to the projection of the data to be processed on the first category, the classification distribution information corresponding to the expected classification label of the data to be processed.
In the quantities for determining the classification distribution information corresponding to the expected classification label of the data to be processed, Z is the data to be processed, Ui is the network parameter of the i-th category at the l-th layer, UiZ is the projection of the expected classification label of the data to be processed on the i-th category at the l-th layer, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, and η is a hyperparameter that controls the estimation confidence. UiZ can be understood as the projection of the data to be processed on the i-th category at the l-th layer; the smaller its value, the closer the correlation with the i-th category at the l-th layer; therefore, the softmax function can be used to determine the classification distribution information ∏̂i corresponding to the expected classification label of the data to be processed.
After the classification distribution information corresponding to the expected classification label of the data to be processed is determined according to the above, the objective function gradient expression is determined according to the data to be processed and the classification distribution information corresponding to the expected classification label of the data to be processed; the objective function gradient expression may specifically be as shown in formula (1), and the data feature of the data to be processed is then determined according to the data to be processed and the objective function gradient expression.
Alternatively, in the quantities for determining the classification distribution information corresponding to the expected classification label of the data to be processed, Z is the data to be processed, Ui is the network parameter of the i-th category at the l-th layer, UiZ is the projection of the expected classification label of the data to be processed on the i-th category at the l-th layer, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, and η is a hyperparameter that controls the estimation confidence. UiZ can be understood as the projection of the data to be processed on the i-th category at the l-th layer; the smaller its value, the closer the correlation with the i-th category at the l-th layer; therefore, the softmax function can be used to determine the classification distribution information ∏̂i corresponding to the expected classification label of the data to be processed.
After the classification distribution information corresponding to the expected classification label of the data to be processed is determined according to the above, the objective function gradient expression is determined according to the data to be processed and the classification distribution information corresponding to the expected classification label of the data to be processed; the objective function gradient expression may specifically be as shown in formula (2), and the data feature of the data to be processed is then determined according to the data to be processed and the objective function gradient expression.
Alternatively, when the classification distribution information corresponding to the expected classification label of the data to be processed includes one or more of the following: distance information, correlation information, differential information or soft classification information, in the quantities for determining, according to the data to be processed and the network parameters of the l-th layer, the classification distribution information corresponding to the expected classification label of the data to be processed, Z is the data to be processed, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Ui is the network parameter of the i-th category at the l-th layer, Zl is the data feature of the l-th layer of the data to be processed, Zl-1 is the data feature of the (l-1)-th layer of the data to be processed, and <> denotes the inner product.
After the classification distribution information corresponding to the expected classification label of the data to be processed is determined, the objective function gradient expression can be determined according to the data to be processed and the classification distribution information corresponding to the expected classification label of the data to be processed, specifically including: determining the gradient parameters (G and Hi) according to the classification distribution information corresponding to the expected classification label of the data to be processed; and determining the objective function gradient expression according to the data to be processed and the gradient parameters, where G = [g1, g2, …, gi], ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Tr() denotes the trace operation, I is the identity matrix, mi is the number of the i-th category in the m data to be processed, K is the number of all categories of the expected classification labels in the m data to be processed, and G and Hi denote the gradient parameters. The objective function gradient expression is as shown in formula (3), and the data feature of the data to be processed is then determined according to the data to be processed and the objective function gradient expression.
In a possible implementation, the method further includes: outputting the data feature of the data to be processed.
In yet another possible implementation, the data to be processed with unknown classification or clustering information is the data feature of third data, and the data feature of the third data is determined through another feedforward neural network; the input information of the l-th layer in the another feedforward neural network includes the classification distribution information of the training data and the first data feature, the output information of the l-th layer includes the second data feature, the first data feature is the output of the (l-1)-th layer, and the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1. That is to say, the data feature of the third data is determined through another feedforward neural network; this data feature of the third data is the data to be processed with unknown classification or clustering information, and the data to be processed is then input into the determined feedforward neural network model to obtain the data feature of the data to be processed.
To better describe the inference process in the feedforward neural network model, the foregoing formula (1), formula (2) and formula (3) are respectively taken as the objective function gradient expression for illustration, as follows:
In an example, taking the objective function gradient expression as the foregoing formula (1), the computation process of the l-th layer in the inference process of the feedforward neural network model is shown in FIG. 11. For the data to be processed Z with unknown classification or clustering information, the classification distribution information ∏̂i corresponding to the expected classification label of the data to be processed can be determined according to the network parameters saved in the trained feedforward neural network, for example the network parameters Ui of the l-th layer, and the data to be processed Z; then the objective function gradient expression, specifically as shown in formula (1), is determined according to the classification distribution information ∏̂i corresponding to the expected classification label of the data to be processed, the data to be processed Z and the network parameters of the l-th layer; and finally the data feature of the data to be processed is determined according to the objective function gradient expression and the data to be processed. For the related formulas, refer to the foregoing description.
In an example, taking the objective function gradient expression as the foregoing formula (2), the computation process of the l-th layer in the inference process of the feedforward neural network model is shown in FIG. 12. For the data to be processed Z with unknown classification or clustering information, the classification distribution information ∏̂i corresponding to the expected classification label of the data to be processed can be determined according to the network parameters saved in the trained feedforward neural network, for example the network parameters Ai of the l-th layer, and the data to be processed Z; then the objective function gradient expression, specifically as shown in formula (2), is determined according to the classification distribution information ∏̂i corresponding to the expected classification label of the data to be processed, the data to be processed Z and the network parameters of the l-th layer; and finally the data feature of the data to be processed is determined according to the objective function gradient expression and the data to be processed. For the related formulas, refer to the foregoing description.
In an example, taking the objective function gradient expression as the foregoing formula (3), the computation process of the l-th layer in the inference process of the feedforward neural network model is shown in FIG. 13. For the data to be processed Z with unknown classification or clustering information, the classification distribution information ∏̂i corresponding to the expected classification label of the data to be processed can be determined according to the network parameters saved in the trained feedforward neural network, for example the network parameters Ci of the l-th layer, and the data to be processed Z; then the gradient parameters G and Hi are determined according to the classification distribution information ∏̂i corresponding to the expected classification label of the data to be processed; then the objective function gradient expression, specifically as shown in formula (3), is determined according to the data to be processed Z, the network parameters of the l-th layer and the gradient parameters G and Hi; and finally the data feature of the data to be processed is determined according to the objective function gradient expression and the data to be processed. For the related formulas, refer to the foregoing description.
In the embodiments of the present application, the data processing methods shown in FIG. 6 and FIG. 10 are applicable to multi-view scenarios and multi-node scenarios, specifically as follows:
(1) The multi-view scenario is shown in FIG. 14. Since the feedforward neural networks shown in FIG. 6 and FIG. 10 are mainly used to transmit data features and the task-related computation is completed at the receiver, multiple transmitters may send different data features, so that the receiver processes the different data features and obtains the processing results of the data according to different classification tasks. As shown in FIG. 14, in the multi-view scenario there may be multiple transmitters; two transmitters, namely a first transmitter and a second transmitter, are taken as an example for description, where the first transmitter and the second transmitter both perform the same task, for example a classification task, but the classification distribution information of the training data received by the first transmitter and the second transmitter may be different. The first transmitter extracts the classification-task-related data feature Z1 through the feedforward neural network model obtained from the training data and the classification distribution information of the training data, the second transmitter extracts the classification-task-related data feature Z2 through the feedforward neural network model obtained from the training data and the classification distribution information of the training data, and the data feature Z1 extracted by the first transmitter and the data feature Z2 extracted by the second transmitter are sent to the receiver through channel transmission. The receiver aggregates the received data features Z1 and Z2 to obtain the feature Z=[Z1,Z2] with feature dimension D=Σidi; for example, if the feature dimension of Z1 is 128 and the feature dimension of Z2 is 128, the feature Z obtained by aggregation at the receiver has feature dimension 128+128=256. The receiver then inputs the obtained feature Z into a first network to train the readout layer and obtains the final output result. The first network may be the feedforward neural network proposed in the embodiments of the present application, KNN, or a convolutional neural network (CNN), etc.
When the data feature Z1 extracted by the first transmitter and the data feature Z2 extracted by the second transmitter are sent to the receiver through channel transmission, the following condition is satisfied: Z̃1 = Z1 + n, where Z1 denotes the data feature before channel transmission, Z̃1 denotes the feature matrix after channel transmission, n denotes Gaussian noise with standard deviation σ, and Var(·) denotes the variance.
(2) The multi-node scenario is shown in FIG. 15. Based on the structural flexibility of the feedforward neural network and its clustering effect on features, the transmitted data may be inferred through feedforward neural networks deployed on different network nodes, so that the influence on the channel is reduced and the feature results at multiple receivers all have good accuracy for use by multiple receivers. As shown in FIG. 15, a first node extracts the classification-task-related data feature Z1 through the feedforward neural network model obtained from the training data and the classification distribution information of the training data, and sends the data feature Z1 extracted by the first node to a second node through channel transmission; correspondingly, after receiving the data feature Z1, the second node extracts the classification-task-related data feature Z2 through the feedforward neural network model obtained from the data feature Z1 and the classification distribution information of the training data, and so on, until the last node extracts the classification-task-related data feature Zn through the feedforward neural network model obtained from the data feature Zn-1 extracted by the previous node and the classification distribution information of the training data, where n denotes the number of nodes. Optionally, the data feature Zn may be input into a first network to train the readout layer and obtain the final output result, where the first network may be the feedforward neural network proposed in the embodiments of the present application, KNN, or CNN, etc. In this scenario, the dimensions of the data features input and output by the different communication nodes need to remain the same.
In the embodiments of the present application, after the training device trains the feedforward neural network model according to the data processing method shown in FIG. 6, the trained model further needs to be evaluated with validation data to ensure that the trained feedforward neural network model has good generalization.
In an implementation, the training device trains the designed objective function in a gradient back-propagation manner and in a feedforward manner respectively, specifically as follows:
(1) Back propagation:
The training device takes the foregoing formula (1) as the objective function. Taking the MNIST handwritten digit set as an example, with a feature dimension of 128, a Resnet18 network is trained to obtain the result before the readout layer, and the t-distributed stochastic neighbor embedding (t-SNE) algorithm is used to reduce the result before the readout layer to 2D visualization data, as shown in FIG. 16A. The training device also takes the mean square error (MSE) as the objective function, trains to obtain the result of the readout layer, and reduces it to 2D visualization data with the t-SNE algorithm, as shown in FIG. 16B. According to FIG. 16A and FIG. 16B, taking formula (1) as the objective function imposes a stronger constraint on the classification of the raw data, which is more conducive to reducing the influence of noise in communication.
(2) Feedforward propagation:
According to the feedforward neural network scheme described above, a multi-layer network structure is designed, and the k-nearest neighbor (KNN) classification results of the final output features are tested through an AWGN channel, as shown in Table 1, where, taking the MNIST handwritten digit set as an example, the feature dimension is the same as the input dimension, 768; the learning rate of the feedforward neural network model is λ=0.001; the signal-to-noise ratio is SNR=25 dB; η=500, where η is the hyperparameter used to control the estimation confidence when estimating the classification labels; and the number of training samples is m=1000. According to Table 1, as the number of layers of the feedforward neural network increases, the accuracy of the extracted data features becomes higher; for example, when the number of intermediate layers of the feedforward neural network is 2, the training set accuracy is 0.5247, and when the number of intermediate layers is 6, the training set accuracy is 0.7135, which is 0.1888 higher than with 2 intermediate layers.
Table 1
In the above method, compared with the BP algorithm, which requires gradient back-propagation to update the transmitter network, the method of the embodiments of the present application can reduce the communication overhead caused by training interactions and improve training efficiency; the receiver only needs to train the readout layer network. Moreover, the structure of the feedforward neural network is more flexible, and an accuracy improvement can be obtained by increasing the number of network layers; that is, the larger the value of l, the higher the accuracy of the classification or clustering result of the data to be processed, which avoids the problem of retraining caused by different network adaptations at different transmitters and receivers. Furthermore, the feedforward neural network model is interpretable and can explain the black-box problem of neural networks, and the output data feature of the data to be processed can serve as data preprocessing and can be used for subsequent readout layer operations.
The foregoing describes the methods of the embodiments of the present application in detail; the devices of the embodiments of the present application are provided below.
Referring to FIG. 17, FIG. 17 is a schematic diagram of a structure of a data processing device 1700 according to an embodiment of the present application. The data processing device 1700 may include a first determination unit 1701, an obtaining unit 1702 and a second determination unit 1703, where the detailed description of each unit is as follows.
The first determination unit 1701 is configured to determine a feedforward neural network model, where the input information of the l-th layer in the feedforward neural network model includes the classification distribution information of the training data and the first data feature, the output information of the l-th layer includes the second data feature, the first data feature is the output of the (l-1)-th layer, and the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1;
the obtaining unit 1702 is configured to obtain data to be processed with unknown classification or clustering information;
the second determination unit 1703 is configured to input the data to be processed into the feedforward neural network model to determine the data feature of the data to be processed, where the data feature of the data to be processed is used to represent the classification or clustering information of the data to be processed, and the data feature of the data to be processed is used to determine the classification or clustering result of the data to be processed.
In a possible implementation, the dimension of the data feature of the data to be processed is related to the data type of the data to be processed.
In yet another possible implementation, when l=2 and the first data feature is the output of the first layer, the input information of the first layer includes the classification distribution information of the training data and the training data, and the training data includes classification labels; the classification distribution information of the training data is determined according to the classification labels in the training data.
In yet another possible implementation, the first determination unit 1701 is specifically configured to obtain the first data feature Zl-1, and determine the network parameters of the l-th layer according to the first data feature Zl-1 and the classification distribution information ∏i of the training data; the second data feature is determined according to the first data feature Zl-1 and the network parameters of the l-th layer.
In yet another possible implementation, the first determination unit 1701 is specifically configured to determine the objective function gradient expression according to the network parameters of the l-th layer and the first data feature Zl-1, and determine the second data feature Zl according to the first data feature Zl-1, the classification distribution information ∏i of the training data and the objective function gradient expression.
In yet another possible implementation, the first determination unit 1701 is specifically configured to determine, according to the first data feature Zl-1 and the classification distribution information ∏i of the training data, the regularized autocorrelation matrix of each category corresponding to the classification labels in the training data, and determine the network parameters of the l-th layer according to the regularized autocorrelation matrices of the categories corresponding to the classification labels in the training data.
In yet another possible implementation, in the expressions for the network parameters, mi is the number of classification labels corresponding to the i-th category in the m training data, K is the number of all categories of the classification labels in the m training data, αi is a weight parameter used to balance the number of samples of each category in the training data, Zl-1 is the first data feature, ∏i is the classification distribution information of the training data, Si is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, ∈ is the regularization parameter, Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, and Ui is the network parameter of the i-th category at the l-th layer.
In yet another possible implementation, in the expressions for the network parameters, mi is the number of classification labels corresponding to the i-th category in the m training data, K is the number of all categories of the classification labels in the m training data, Zl-1 is the first data feature, ∏i is the classification distribution information of the training data, I is the identity matrix, Si is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, ∈ is the regularization parameter, Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, S is the autocorrelation matrix of all categories of the classification labels in the training data, Ŝ is the regularized autocorrelation matrix of all categories of the classification labels in the training data, and Ai is the network parameter of the i-th category at the l-th layer.
In yet another possible implementation, the first determination unit 1701 is specifically configured to determine gradient parameters according to the classification distribution information ∏i of the training data, and determine the network parameters of the l-th layer according to the first data feature Zl-1 and the gradient parameters.
In yet another possible implementation, Zl-1 satisfies the energy constraint Tr(Zl-1(Zl-1)T) = m(1+σ²d), where σ is the Gaussian distribution variance, m is the number of samples of the training data, d is the dimension of the training data, Zl-1 is the first data feature, e∈Rm×1 is a column vector whose elements are all 1, ∏i is the classification distribution information of the training data, Tr() denotes the trace operation, I is the identity matrix, mi is the number of classification labels corresponding to the i-th category in the m training data, K is the number of all categories of the classification labels in the m training data, Ci is the network parameter of the i-th category at the l-th layer, and G and Hi are the gradient parameters.
In yet another possible implementation, the second determination unit 1703 is specifically configured to determine, according to the data to be processed and the network parameters of the l-th layer, the classification distribution information ∏̂i corresponding to the expected classification label of the data to be processed; determine the objective function gradient expression according to the data to be processed and the classification distribution information corresponding to the expected classification label of the data to be processed; and determine the data feature of the data to be processed according to the data to be processed and the objective function gradient expression.
In yet another possible implementation, the second determination unit 1703 is specifically configured to determine, according to the data to be processed and the network parameters of the l-th layer, the projection of the expected classification label of the data to be processed on a first category, where the first category is any one of the multiple categories corresponding to the expected classification labels of the data to be processed; and determine, according to the projection of the data to be processed on the first category, the classification distribution information corresponding to the expected classification label of the data to be processed.
In yet another possible implementation, Z is the data to be processed, Ui is the network parameter of the i-th category at the l-th layer, UiZ is the projection of the expected classification label of the data to be processed on the i-th category at the l-th layer, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, and η is a hyperparameter that controls the estimation confidence.
In yet another possible implementation, the objective function gradient expression involves the following quantities: mi is the number of the m data to be processed whose expected classification label is the i-th category, K is the number of all categories of the expected classification labels in the m data to be processed, αi is a weight parameter used to balance the expected number of samples of each category in the data to be processed, Z is the data to be processed, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Si is the autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed, and Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed.
In yet another possible implementation, Z is the data to be processed, Ui is the network parameter of the i-th category at the l-th layer, UiZ is the projection of the expected classification label of the data to be processed on the i-th category at the l-th layer, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, and η is a hyperparameter that controls the estimation confidence.
In yet another possible implementation, the objective function gradient expression involves the following quantities: mi is the number of the m data to be processed whose expected classification label is the i-th category, αi is a weight parameter used to balance the expected number of samples of each category in the data to be processed, Z is the data to be processed, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Si is the autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed, Ŝi is the regularized autocorrelation matrix of the i-th category corresponding to the expected classification label of the data to be processed, S is the autocorrelation matrix of all categories corresponding to the expected classification label of the data to be processed, and Ŝ is the regularized autocorrelation matrix of all categories corresponding to the expected classification label of the data to be processed.
In yet another possible implementation, the classification distribution information corresponding to the expected classification label of the data to be processed includes one or more of the following: distance information, correlation information, differential information or soft classification information.
In yet another possible implementation, Z is the data to be processed, ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Ui is the network parameter of the i-th category at the l-th layer, Zl is the data feature of the l-th layer of the data to be processed, Zl-1 is the data feature of the (l-1)-th layer of the data to be processed, and <> denotes the inner product.
In yet another possible implementation, the second determination unit 1703 is specifically configured to determine the gradient parameters (G and Hi) according to the classification distribution information corresponding to the expected classification label of the data to be processed, and determine the objective function gradient expression according to the data to be processed and the gradient parameters.
In yet another possible implementation, G = [g1, g2, …, gi], where ∏̂i is the classification distribution information corresponding to the expected classification label of the data to be processed, Tr() denotes the trace operation, I is the identity matrix, mi is the number of the i-th category in the m data to be processed, K is the number of all categories of the expected classification labels in the m data to be processed, and G and Hi denote the gradient parameters.
In yet another possible implementation, the objective function gradient expression involves the following quantities: Z is the data to be processed, σ is the Gaussian distribution variance, ∈ is the regularization parameter, I is the identity matrix, G and Hi denote the gradient parameters, and β denotes a regularization parameter.
In yet another possible implementation, the data feature Zl of the data to be processed is determined from Zl-1 and the objective function gradient expression ∂E/∂Z, where Zl-1 is the data to be processed and Zl-1 is constrained in the (d-1)-dimensional unit sphere space.
In yet another possible implementation, the data processing device further includes an output unit configured to output the data feature of the data to be processed.
In yet another possible implementation, the data to be processed with unknown classification or clustering information is the data feature of third data, and the data feature of the third data is determined through another feedforward neural network; the input information of the l-th layer in the another feedforward neural network includes the classification distribution information of the training data and the first data feature, the output information of the l-th layer includes the second data feature, the first data feature is the output of the (l-1)-th layer, and the first data feature and the second data feature are both used to represent the classification or clustering information of the training data, where l is a positive integer greater than 1.
It should be noted that, for the implementation and beneficial effects of each unit, reference may also be made to the corresponding description of the method embodiments shown in FIG. 6 or FIG. 10.
请参见图18,图18是本申请实施例提供的一种数据处理装置1800,该数据处理装置1800包括至少一个处理器1801和通信接口1803,可选的,还包括存储器1802,所述处理器1801、存储器1802和通信接口1803通过总线1804相互连接。
存储器1802包括但不限于是随机存储记忆体(random access memory,RAM)、只读存储器(read-only memory,ROM)、可擦除可编程只读存储器(erasable programmable read only memory,EPROM)、或便携式只读存储器(compact disc read-only memory,CD-ROM),该存储器1802用于相关计算机程序及数据。通信接口1803用于接收和发送数据。
处理器1801可以是一个或多个中央处理器(central processing unit,CPU),在处理器1801是一个CPU的情况下,该CPU可以是单核CPU,也可以是多核CPU。
该数据处理装置1800中的处理器1801用于读取所述存储器1802中存储的计算机程序代码,执行以下操作:
确定前馈神经网络模型,所述前馈神经网络模型中的第l层的输入信息包括训练数据的分类分布信息和第一数据特征,所述第l层的输出信息包括第二数据特征,所述第一数据特征为第(l-1)层的输出,所述第一数据特征和所述第二数据特征均是用于表示所述训练数据的分类或聚类信息;其中,l为大于1的正整数;
获取未知分类或聚类信息的待处理数据;
将所述待处理数据输入所述前馈神经网络模型中确定所述待处理数据的数据特征;所述待处理数据的数据特征是用于表示所述待处理数据的分类或聚类信息;所述待处理数据的数据特征用于确定所述待处理数据的分类或聚类结果。
在一种可能的实现方式中,所述待处理数据的数据特征的维度与所述待处理数据的数据类型相关。
在又一种可能的实现方式中,当l=2,所述第一数据特征为第一层的输出时,所述第一层的输入信息包括所述训练数据的分类分布信息和所述训练数据,所述训练数据包括分类标签;所述训练数据的分类分布信息是根据所述训练数据中的分类标签确定的。
在又一种可能的实现方式中,所述处理器1801,用于获取所述第一数据特征Zl-1;根据所述第一数据特征Zl-1和所述训练数据的分类分布信息∏i确定所述第l层的网络参数;所述第二数据特征是根据所述第一数据特征Zl-1和所述第l层的网络参数确定的。
在又一种可能的实现方式中,所述处理器1801,用于根据第l层的网络参数和所述第一数据特征Zl-1确定目标函数梯度表达式;根据所述第一数据特征Zl-1、所述训练数据的分类分布信息∏i和所述目标函数梯度表达式确定所述第二数据特征Zl
在又一种可能的实现方式中,所述处理器1801,用于根据所述第一数据特征Zl-1和所述训练数据的分类分布信息∏i确定所述训练数据中的分类标签对应的各个类别的正则化自相关矩阵;根据所述训练数据中的分类标签对应的各个类别的正则化自相关矩阵确定所述第l层的网络参数。
在又一种可能的实现方式中,


其中,mi是m个训练数据中的分类标签对应第i类的个数,K是m个训练数据中的分类标签的所有类别的个数,是用于平衡训练数据中各类的样本数量的权重参数,Zl-1是所述第一数据特征,∏i是所述训练数据的分类分布信息,Si是所述训练数据中的分类标签对应的第i类的自相关矩阵,∈是正则化参数,是所述训练数据中的分类标签对应的第i类的正则化自相关矩阵,是所述第l层第i类的网络参数。
在又一种可能的实现方式中,

其中,mi是m个训练数据中的分类标签对应第i类的个数,K是m个训练 数据中的分类标签的所有类别的个数,Zl-1是第一数据特征,∏i是所述训练数据的分类分布信息,I是单位矩阵,Si是所述训练数据中的分类标签对应的第i类的自相关矩阵,∈是正则化参数,是所述训练数据中的分类标签对应的第i类的正则化自相关矩阵,S是所述训练数据中的分类标签的所有类别的自相关矩阵,是所述训练数据中的分类标签的所有类别的正则化自相关矩阵,是所述第l层第i类的网络参数。
在又一种可能的实现方式中,所述处理器1801,用于根据所述训练数据的分类分布信息∏i确定梯度参数;根据所述第一数据特征Zl-1和所述梯度参数确定所述第l层的网络参数。
在又一种可能的实现方式中,


其中,Zl-1满足能量约束:Tr(Zl-1(Zl-1)T)=m(1+σ2d),σ为高斯分布方差,m是训练数据的采样个数,d是训练数据的维度,Zl-1是第一数据特征,e∈Rm×1为元素全为1的列向量,∏i是所述训练数据的分类分布信息,Tr()表示迹运算,I是单位矩阵,mi是m个训练数据中的分类标签对应第i类的个数,K是m个训练数据中的分类标签的所有类别的个数,是所述第l层第i类的网络参数,G和Hi为梯度参数。
在又一种可能的实现方式中,所述处理器1801,用于根据所述待处理数据以及所述第l层的网络参数确定所述待处理数据的预计分类标签对应的分类分布信息根据所述待处理数据和所述待处理数据的预计分类标签对应的分类分布信息确定目标函数梯度表达式;根据所述待处理数据和所述目标函数梯度表达式确定所述待处理数据的数据特征。
在又一种可能的实现方式中,所述处理器1801,用于根据所述待处理数据和所述第l层的网络参数确定所述待处理数据的预计分类标签在第一类别上的投影;所述第一类别为所述待处理数据预计的分类标签对应的多个类别中的任意一个类别;根据所述待处理数据在第一类别上的投影确定所述待处理数据的预计分类标签对应的分类分布信息。
在又一种可能的实现方式中,

其中,Z为所述待处理数据,是所述第l层第i类的网络参数,是所述待处理数据的预计分类标签在第l层第i类上的投影;是所述待处理数据的预计分类标签对应的分类分布信息,η是控制估计置信度的超参。
在又一种可能的实现方式中,所述目标函数梯度表达式包括:
其中,mi是m个待处理数据中预计分类标签为第i类的个数,K是m个待处理数据中预计分类标签的所有类别的个数,是用于平衡待处理数据中预计的各类的样本数量的权重参数,Z是所述待处理数据,是所述待处理数据的预计分类标签对应的分类分布信息,Si是所述待处理数据的预计分类标签对应的第i类的自相关矩阵,所述待处理数据的预计分类标签对应的第i类的正则化自相关矩阵。
在又一种可能的实现方式中,

其中,Z为所述待处理数据,是所述第l层第i类的网络参数,是所述待处理数据的预计分类标签在第l层第i类上的投影;是所述待处理数据的预计分类标签对应的分类分布信息,η是控制估计置信度的超参。
在又一种可能的实现方式中,所述目标函数梯度表达式包括:
其中,mi是m个待处理数据中预计分类标签为第i类的个数,αi用于平衡待处理数据中预计的各类的样本数量的权重参数,Z是所述待处理数据,是所述待处理数据的预计分类标签对应的分类分布信息,Si是所述待处理数据的预计分类标签对应的第i类的自相关矩阵,所述待处理数据的预计分类标签对应的第i类的正则化自相关矩阵,S是所述待处理数据的预计分类标签对应的所有类别的自相关矩阵,是所述待处理数据的预计分类标签对应的所有类别的正则化自相关矩阵。
在又一种可能的实现方式中,所述待处理数据的预计分类标签对应的分类分布信息,包括以下一项或多项:距离信息,相关性信息,差分信息或软分类信息。
在又一种可能的实现方式中,

其中,Z是所述待处理数据,是所述待处理数据的预计分类标签对应的分类分布信息,是所述第l层第i类的网络参数,Zl为所述待处理数据的第l层的数据特征,Zl-1为所述待处理数据的第(l-1)层的数据特征,<>表示内积。
在又一种可能的实现方式中,所述处理器1801,用于根据所述待处理数据的预计分类标签对应的分类分布信息确定梯度参数(G和Hi);根据所述待处理数据和所述梯度参数确定所述目标函数梯度表达式。
在又一种可能的实现方式中,
G=[g1,g2,…,gi];
其中,是所述待处理数据的预计分类标签对应的分类分布信息,Tr()表示迹运算,I是单位矩阵,mi是m个待处理数据中第i类的个数,K是m个待处理数据中预计分类标签的所有类别的个数,G和Hi表示梯度参数。
在又一种可能的实现方式中,目标函数梯度表达式包括:
其中,Z是所述待处理数据,σ为高斯分布方差,∈是正则化参数,I是单位矩阵,G和Hi表示梯度参数,β表示正则化参数。
In yet another possible implementation,

where Z_l is the data feature of the to-be-processed data, the objective function gradient expression is as defined above, Z_{l-1} is the to-be-processed data, and Z_{l-1} is constrained to the (d-1)-dimensional unit sphere.
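The final update is fully specified by the text: take a step along the objective function gradient and keep the features on the (d-1)-dimensional unit sphere. Only the step size η is assumed below.

```python
import numpy as np

def layer_update(Z_prev, grad, eta):
    # Z_l = normalize(Z_{l-1} + eta * gradient), column-wise, so every
    # feature stays on the (d-1)-dimensional unit sphere.
    Z_next = Z_prev + eta * grad
    return Z_next / np.linalg.norm(Z_next, axis=0, keepdims=True)
```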
In yet another possible implementation, the processor 1801 is configured to output the data feature of the to-be-processed data.
In yet another possible implementation, the to-be-processed data whose classification or clustering information is unknown is a data feature of third data, and the data feature of the third data is determined by another feedforward neural network. Input information of an l-th layer of the other feedforward neural network includes classification distribution information of training data and a first data feature, output information of the l-th layer includes a second data feature, the first data feature is an output of the (l-1)-th layer, and both the first data feature and the second data feature represent the classification or clustering information of the training data, where l is a positive integer greater than 1.
It should be noted that, for the implementation and beneficial effects of each operation, reference may also be made to the corresponding descriptions of the method embodiments shown in FIG. 6 or FIG. 10.
It can be understood that the processor in the embodiments of this application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The general-purpose processor may be a microprocessor or any conventional processor.
The method steps in the embodiments of this application may be implemented by hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an erasable programmable read-only memory, an electrically erasable programmable read-only memory, a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor, so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an ASIC. In addition, the ASIC may reside in a base station or a terminal. Of course, the processor and the storage medium may also exist in a base station or terminal as discrete components.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are executed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium such as a floppy disk, a hard disk, or a magnetic tape; an optical medium such as a digital video disc; or a semiconductor medium such as a solid-state drive. The computer-readable storage medium may be a volatile or non-volatile storage medium, or may include both volatile and non-volatile types of storage media.
In the embodiments of this application, unless otherwise specified or in case of logical conflict, the terms and/or descriptions in different embodiments are consistent and may be cross-referenced, and the technical features in different embodiments may be combined according to their inherent logical relationships to form new embodiments.
In the description of this application, terms such as "first", "second", "S601", or "S602" are used only for distinguishing descriptions and for convenience of presentation. The ordinal numbers themselves carry no specific technical meaning and shall not be understood as indicating or implying relative importance or the execution order of operations; the execution order of each process shall be determined by its function and internal logic.
In this application, the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate the following three cases: only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. In addition, the character "/" herein indicates an "or" relationship between the associated objects before and after it.
In this application, "transmission" may include the following three cases: sending of data, receiving of data, or both sending and receiving of data. In this application, "data" may include service data and/or signaling data.
In this application, the terms "include" or "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process/method that includes a series of steps, or a system/product/device that includes a series of units, is not necessarily limited to those steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process/method/product/device.
In the description of this application, unless otherwise specified, the number of a noun means "a singular or plural noun", that is, "one or more". "At least one" means one or more. "Including at least one of the following: A, B, C" means that A, B, or C may be included, or A and B, A and C, B and C, or A, B, and C may be included, where A, B, and C may each be one or more.
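Before turning to the claims, the fragments above can be combined into one self-contained toy run. Every numeric choice and functional form below (dimensions, step size, the softmax membership estimate, the gradient form) is illustrative rather than a statement of the claimed method.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, K, num_layers, eta, eps = 8, 60, 3, 10, 0.5, 0.1

# Stand-in training features (columns on the unit sphere) and labels;
# a real system would start from measured data or upstream features.
Z = rng.standard_normal((d, m))
Z /= np.linalg.norm(Z, axis=0)
labels = rng.integers(0, K, m)
memberships = np.stack([(labels == i).astype(float) for i in range(K)])  # K x m

def build_params(Z, P):
    # Assumed closed-form layer parameters from features and memberships.
    alpha = d / (m * eps ** 2)
    E = alpha * np.linalg.inv(np.eye(d) + alpha * Z @ Z.T)
    C = []
    for i in range(K):
        a_i = d / (max(P[i].sum(), 1.0) * eps ** 2)
        C.append(a_i * np.linalg.inv(np.eye(d) + a_i * (Z * P[i]) @ Z.T))
    return E, C

def step(Z, E, C, P):
    # Assumed gradient step followed by projection onto the unit sphere.
    grad = E @ Z
    for i in range(K):
        grad -= (P[i].sum() / Z.shape[1]) * (C[i] @ Z) * P[i]
    Z = Z + eta * grad
    return Z / np.linalg.norm(Z, axis=0, keepdims=True)

layers = []
for _ in range(num_layers):        # forward construction, no backpropagation
    E, C = build_params(Z, memberships)
    layers.append((E, C))
    Z = step(Z, E, C, memberships)

# Inference: run to-be-processed data through the stored layers, estimating
# the membership distribution at each layer from projection norms.
Z_new = rng.standard_normal((d, 20))
Z_new /= np.linalg.norm(Z_new, axis=0)
for E, C in layers:
    norms = np.stack([np.linalg.norm(C_i @ Z_new, axis=0) for C_i in C])
    logits = -5.0 * norms          # 5.0 plays the role of the confidence eta
    logits -= logits.max(axis=0, keepdims=True)
    P_hat = np.exp(logits)
    P_hat /= P_hat.sum(axis=0, keepdims=True)
    Z_new = step(Z_new, E, C, P_hat)
print(Z_new.shape)                 # (8, 20): refined features of the new data
```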

Claims (52)

  1. A data processing method, comprising:
    determining a feedforward neural network model, wherein input information of an l-th layer of the feedforward neural network model comprises classification distribution information of training data and a first data feature, output information of the l-th layer comprises a second data feature, the first data feature is an output of an (l-1)-th layer, and both the first data feature and the second data feature represent classification or clustering information of the training data, wherein l is a positive integer greater than 1;
    obtaining to-be-processed data whose classification or clustering information is unknown; and
    inputting the to-be-processed data into the feedforward neural network model to determine a data feature of the to-be-processed data, wherein the data feature of the to-be-processed data represents classification or clustering information of the to-be-processed data, and the data feature of the to-be-processed data is used to determine a classification or clustering result of the to-be-processed data.
  2. The method according to claim 1, wherein
    a dimension of the data feature of the to-be-processed data is related to a data type of the to-be-processed data.
  3. The method according to claim 1 or 2, wherein
    when l=2 and the first data feature is an output of a first layer, input information of the first layer comprises the classification distribution information of the training data and the training data, the training data comprises classification labels, and the classification distribution information of the training data is determined according to the classification labels in the training data.
  4. The method according to any one of claims 1 to 3, wherein the determining a feedforward neural network model comprises:
    obtaining the first data feature Z_{l-1}; and
    determining network parameters of the l-th layer according to the first data feature Z_{l-1} and classification distribution information Π_i of the training data, wherein the second data feature is determined according to the first data feature Z_{l-1} and the network parameters of the l-th layer.
  5. The method according to claim 4, wherein that the second data feature is determined according to the first data feature Z_{l-1} and the network parameters of the l-th layer comprises:
    determining an objective function gradient expression according to the network parameters of the l-th layer and the first data feature Z_{l-1}; and
    determining the second data feature Z_l according to the first data feature Z_{l-1}, the classification distribution information Π_i of the training data, and the objective function gradient expression.
  6. The method according to claim 4 or 5, wherein the determining network parameters of the l-th layer according to the first data feature Z_{l-1} and the classification distribution information Π_i of the training data comprises:
    determining, according to the first data feature Z_{l-1} and the classification distribution information Π_i of the training data, a regularized autocorrelation matrix of each category corresponding to the classification labels in the training data; and
    determining the network parameters of the l-th layer according to the regularized autocorrelation matrices of the categories corresponding to the classification labels in the training data.
  7. The method according to claim 6, wherein

    where m_i is the number of classification labels belonging to the i-th category among the m training data, K is the number of all categories of the classification labels in the m training data, a weight parameter is used to balance the number of samples of each category in the training data, Z_{l-1} is the first data feature, Π_i is the classification distribution information of the training data, S_i is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, ∈ is a regularization parameter, and the remaining quantities are the regularized autocorrelation matrix of the i-th category corresponding to the classification labels in the training data and the network parameter of the i-th category at the l-th layer.
  8. The method according to claim 6, wherein

    where m_i is the number of classification labels belonging to the i-th category among the m training data, K is the number of all categories of the classification labels in the m training data, Z_{l-1} is the first data feature, Π_i is the classification distribution information of the training data, I is an identity matrix, S_i is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, ∈ is a regularization parameter, S is the autocorrelation matrix of all categories of the classification labels in the training data, and the remaining quantities are the regularized autocorrelation matrix of the i-th category, the regularized autocorrelation matrix of all categories, and the network parameter of the i-th category at the l-th layer.
  9. The method according to claim 4 or 5, wherein the determining network parameters of the l-th layer according to the first data feature Z_{l-1} and the classification distribution information Π_i of the training data comprises:
    determining gradient parameters according to the classification distribution information Π_i of the training data; and
    determining the network parameters of the l-th layer according to the first data feature Z_{l-1} and the gradient parameters.
  10. The method according to claim 9, wherein

    where Z_{l-1} satisfies the energy constraint Tr(Z_{l-1}(Z_{l-1})^T) = m(1+σ²d), σ is the variance of the Gaussian distribution, m is the number of training data samples, d is the dimension of the training data, Z_{l-1} is the first data feature, e ∈ R^{m×1} is a column vector whose elements are all 1, Π_i is the classification distribution information of the training data, Tr() denotes the trace operation, I is an identity matrix, m_i is the number of classification labels belonging to the i-th category among the m training data, K is the number of all categories of the classification labels in the m training data, the network parameter of the i-th category at the l-th layer is as defined above, and G and H_i are the gradient parameters.
  11. The method according to claim 4, wherein the inputting the to-be-processed data into the feedforward neural network model to determine a data feature of the to-be-processed data comprises:
    determining, according to the to-be-processed data and the network parameters of the l-th layer, classification distribution information corresponding to predicted classification labels of the to-be-processed data;
    determining an objective function gradient expression according to the to-be-processed data and the classification distribution information corresponding to the predicted classification labels of the to-be-processed data; and
    determining the data feature of the to-be-processed data according to the to-be-processed data and the objective function gradient expression.
  12. The method according to claim 11, wherein the determining, according to the to-be-processed data and the network parameters of the l-th layer, classification distribution information corresponding to predicted classification labels of the to-be-processed data comprises:
    determining, according to the to-be-processed data and the network parameters of the l-th layer, a projection of the predicted classification labels of the to-be-processed data onto a first category, wherein the first category is any one of a plurality of categories corresponding to the predicted classification labels of the to-be-processed data; and
    determining, according to the projection of the to-be-processed data onto the first category, the classification distribution information corresponding to the predicted classification labels of the to-be-processed data.
  13. The method according to claim 12, wherein

    where Z is the to-be-processed data; the network parameter of the i-th category at the l-th layer and the projection of the predicted classification label of the to-be-processed data onto the i-th category at the l-th layer are as defined above; the classification distribution information corresponding to the predicted classification labels of the to-be-processed data is obtained accordingly; and η is a hyperparameter that controls the confidence of the estimate.
  14. The method according to any one of claims 11 to 13, wherein the objective function gradient expression comprises:
    where m_i is the number of to-be-processed data whose predicted classification label is the i-th category among the m to-be-processed data, K is the number of all categories of the predicted classification labels in the m to-be-processed data, a weight parameter is used to balance the predicted number of samples of each category in the to-be-processed data, Z is the to-be-processed data, the classification distribution information corresponding to the predicted classification labels of the to-be-processed data is as defined above, S_i is the autocorrelation matrix of the i-th category corresponding to the predicted classification labels of the to-be-processed data, and the regularized autocorrelation matrix of the i-th category corresponding to the predicted classification labels of the to-be-processed data is as defined above.
  15. The method according to claim 12, wherein

    where Z is the to-be-processed data; the network parameter of the i-th category at the l-th layer and the projection of the predicted classification label of the to-be-processed data onto the i-th category at the l-th layer are as defined above; the classification distribution information corresponding to the predicted classification labels of the to-be-processed data is obtained accordingly; and η is a hyperparameter that controls the confidence of the estimate.
  16. The method according to claim 11 or 12, wherein the objective function gradient expression comprises:
    where m_i is the number of to-be-processed data whose predicted classification label is the i-th category among the m to-be-processed data, α_i is a weight parameter used to balance the predicted number of samples of each category in the to-be-processed data, K is the number of all categories of the predicted classification labels in the m to-be-processed data, Z is the to-be-processed data, the classification distribution information corresponding to the predicted classification labels of the to-be-processed data is as defined above, S_i is the autocorrelation matrix of the i-th category corresponding to the predicted classification labels of the to-be-processed data, the regularized autocorrelation matrix of the i-th category corresponding to the predicted classification labels of the to-be-processed data is as defined above, S is the autocorrelation matrix of all categories corresponding to the predicted classification labels of the to-be-processed data, and the regularized autocorrelation matrix of all categories corresponding to the predicted classification labels of the to-be-processed data is as defined above.
  17. The method according to claim 11, wherein the classification distribution information corresponding to the predicted classification labels of the to-be-processed data comprises one or more of the following: distance information, correlation information, difference information, or soft classification information.
  18. The method according to claim 17, wherein the determining, according to the to-be-processed data and the network parameters of the l-th layer, the classification distribution information corresponding to the predicted classification labels of the to-be-processed data comprises:

    where Z is the to-be-processed data; the classification distribution information corresponding to the predicted classification labels of the to-be-processed data and the network parameter of the i-th category at the l-th layer are as defined above; Z_l is the data feature of the to-be-processed data at the l-th layer; Z_{l-1} is the data feature of the to-be-processed data at the (l-1)-th layer; and <> denotes the inner product.
  19. The method according to claim 11, wherein the determining an objective function gradient expression according to the to-be-processed data and the classification distribution information corresponding to the predicted classification labels of the to-be-processed data comprises:
    determining gradient parameters (G and H_i) according to the classification distribution information corresponding to the predicted classification labels of the to-be-processed data; and
    determining the objective function gradient expression according to the to-be-processed data and the gradient parameters.
  20. The method according to claim 19, wherein the determining gradient parameters according to the classification distribution information corresponding to the predicted classification labels of the to-be-processed data comprises:

    G = [g_1, g_2, …, g_i];
    where the classification distribution information corresponding to the predicted classification labels of the to-be-processed data is as defined above, Tr() denotes the trace operation, I is an identity matrix, m_i is the number of to-be-processed data belonging to the i-th category among the m to-be-processed data, K is the number of all categories of the predicted classification labels in the m to-be-processed data, and G and H_i denote the gradient parameters.
  21. The method according to claim 20, wherein the objective function gradient expression comprises:
    where Z is the to-be-processed data, σ is the variance of the Gaussian distribution, ∈ is a regularization parameter, I is an identity matrix, G and H_i denote the gradient parameters, and β denotes a regularization parameter.
  22. The method according to any one of claims 1 to 21, wherein the determining the data feature of the to-be-processed data according to the to-be-processed data and the objective function gradient expression comprises:

    where Z_l is the data feature of the to-be-processed data, the objective function gradient expression is as defined above, Z_{l-1} is the to-be-processed data, and Z_{l-1} is constrained to the (d-1)-dimensional unit sphere.
  23. The method according to any one of claims 1 to 22, further comprising:
    outputting the data feature of the to-be-processed data.
  24. The method according to any one of claims 1 to 23, wherein the to-be-processed data whose classification or clustering information is unknown is a data feature of third data, the data feature of the third data is determined by another feedforward neural network, input information of an l-th layer of the other feedforward neural network comprises classification distribution information of training data and a first data feature, output information of the l-th layer comprises a second data feature, the first data feature is an output of an (l-1)-th layer, and both the first data feature and the second data feature represent the classification or clustering information of the training data, wherein l is a positive integer greater than 1.
  25. A data processing apparatus, comprising:
    a first determining unit, configured to determine a feedforward neural network model, wherein input information of an l-th layer of the feedforward neural network model comprises classification distribution information of training data and a first data feature, output information of the l-th layer comprises a second data feature, the first data feature is an output of an (l-1)-th layer, and both the first data feature and the second data feature represent classification or clustering information of the training data, wherein l is a positive integer greater than 1;
    an obtaining unit, configured to obtain to-be-processed data whose classification or clustering information is unknown; and
    a second determining unit, configured to input the to-be-processed data into the feedforward neural network model to determine a data feature of the to-be-processed data, wherein the data feature of the to-be-processed data represents classification or clustering information of the to-be-processed data, and the data feature of the to-be-processed data is used to determine a classification or clustering result of the to-be-processed data.
  26. The apparatus according to claim 25, wherein a dimension of the data feature of the to-be-processed data is related to a data type of the to-be-processed data.
  27. The apparatus according to claim 25 or 26, wherein
    when l=2 and the first data feature is an output of a first layer, input information of the first layer comprises the classification distribution information of the training data and the training data, the training data comprises classification labels, and the classification distribution information of the training data is determined according to the classification labels in the training data.
  28. The apparatus according to any one of claims 25 to 27, wherein the first determining unit is specifically configured to:
    obtain the first data feature Z_{l-1}; and
    determine network parameters of the l-th layer according to the first data feature Z_{l-1} and classification distribution information Π_i of the training data, wherein the second data feature is determined according to the first data feature Z_{l-1} and the network parameters of the l-th layer.
  29. The apparatus according to claim 28, wherein the first determining unit is specifically configured to:
    determine an objective function gradient expression according to the network parameters of the l-th layer and the first data feature Z_{l-1}; and
    determine the second data feature Z_l according to the first data feature Z_{l-1}, the classification distribution information Π_i of the training data, and the objective function gradient expression.
  30. The apparatus according to claim 28 or 29, wherein the first determining unit is specifically configured to:
    determine, according to the first data feature Z_{l-1} and the classification distribution information Π_i of the training data, a regularized autocorrelation matrix of each category corresponding to the classification labels in the training data; and
    determine the network parameters of the l-th layer according to the regularized autocorrelation matrices of the categories corresponding to the classification labels in the training data.
  31. The apparatus according to claim 30, wherein

    where m_i is the number of classification labels belonging to the i-th category among the m training data, K is the number of all categories of the classification labels in the m training data, a weight parameter is used to balance the number of samples of each category in the training data, Z_{l-1} is the first data feature, Π_i is the classification distribution information of the training data, S_i is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, ∈ is a regularization parameter, and the remaining quantities are the regularized autocorrelation matrix of the i-th category corresponding to the classification labels in the training data and the network parameter of the i-th category at the l-th layer.
  32. The apparatus according to claim 30, wherein

    where m_i is the number of classification labels belonging to the i-th category among the m training data, K is the number of all categories of the classification labels in the m training data, Z_{l-1} is the first data feature, Π_i is the classification distribution information of the training data, I is an identity matrix, S_i is the autocorrelation matrix of the i-th category corresponding to the classification labels in the training data, ∈ is a regularization parameter, S is the autocorrelation matrix of all categories of the classification labels in the training data, and the remaining quantities are the regularized autocorrelation matrix of the i-th category, the regularized autocorrelation matrix of all categories, and the network parameter of the i-th category at the l-th layer.
  33. The apparatus according to claim 28 or 29, wherein the first determining unit is specifically configured to:
    determine gradient parameters according to the classification distribution information Π_i of the training data; and
    determine the network parameters of the l-th layer according to the first data feature Z_{l-1} and the gradient parameters.
  34. The apparatus according to claim 33, wherein

    where Z_{l-1} satisfies the energy constraint Tr(Z_{l-1}(Z_{l-1})^T) = m(1+σ²d), σ is the variance of the Gaussian distribution, m is the number of training data samples, d is the dimension of the training data, Z_{l-1} is the first data feature, e ∈ R^{m×1} is a column vector whose elements are all 1, Π_i is the classification distribution information of the training data, Tr() denotes the trace operation, I is an identity matrix, m_i is the number of classification labels belonging to the i-th category among the m training data, K is the number of all categories of the classification labels in the m training data, the network parameter of the i-th category at the l-th layer is as defined above, and G and H_i are the gradient parameters.
  35. The apparatus according to claim 28, wherein the second determining unit is specifically configured to:
    determine, according to the to-be-processed data and the network parameters of the l-th layer, classification distribution information corresponding to predicted classification labels of the to-be-processed data;
    determine an objective function gradient expression according to the to-be-processed data and the classification distribution information corresponding to the predicted classification labels of the to-be-processed data; and
    determine the data feature of the to-be-processed data according to the to-be-processed data and the objective function gradient expression.
  36. The apparatus according to claim 35, wherein the second determining unit is specifically configured to:
    determine, according to the to-be-processed data and the network parameters of the l-th layer, a projection of the predicted classification labels of the to-be-processed data onto a first category, wherein the first category is any one of a plurality of categories corresponding to the predicted classification labels of the to-be-processed data; and
    determine, according to the projection of the to-be-processed data onto the first category, the classification distribution information corresponding to the predicted classification labels of the to-be-processed data.
  37. The apparatus according to claim 36, wherein

    where Z is the to-be-processed data; the network parameter of the i-th category at the l-th layer and the projection of the predicted classification label of the to-be-processed data onto the i-th category at the l-th layer are as defined above; the classification distribution information corresponding to the predicted classification labels of the to-be-processed data is obtained accordingly; and η is a hyperparameter that controls the confidence of the estimate.
  38. The apparatus according to any one of claims 35 to 37, wherein the objective function gradient expression comprises:
    where m_i is the number of to-be-processed data whose predicted classification label is the i-th category among the m to-be-processed data, K is the number of all categories of the predicted classification labels in the m to-be-processed data, a weight parameter is used to balance the predicted number of samples of each category in the to-be-processed data, Z is the to-be-processed data, the classification distribution information corresponding to the predicted classification labels of the to-be-processed data is as defined above, S_i is the autocorrelation matrix of the i-th category corresponding to the predicted classification labels of the to-be-processed data, and the regularized autocorrelation matrix of the i-th category corresponding to the predicted classification labels of the to-be-processed data is as defined above.
  39. The apparatus according to claim 35, wherein

    where Z is the to-be-processed data; the network parameter of the i-th category at the l-th layer and the projection of the predicted classification label of the to-be-processed data onto the i-th category at the l-th layer are as defined above; the classification distribution information corresponding to the predicted classification labels of the to-be-processed data is obtained accordingly; and η is a hyperparameter that controls the confidence of the estimate.
  40. The apparatus according to claim 35 or 36, wherein the objective function gradient expression comprises:
    where m_i is the number of to-be-processed data whose predicted classification label is the i-th category among the m to-be-processed data, α_i is a weight parameter used to balance the predicted number of samples of each category in the to-be-processed data, K is the number of all categories of the predicted classification labels in the m to-be-processed data, Z is the to-be-processed data, the classification distribution information corresponding to the predicted classification labels of the to-be-processed data is as defined above, S_i is the autocorrelation matrix of the i-th category corresponding to the predicted classification labels of the to-be-processed data, the regularized autocorrelation matrix of the i-th category corresponding to the predicted classification labels of the to-be-processed data is as defined above, S is the autocorrelation matrix of all categories corresponding to the predicted classification labels of the to-be-processed data, and the regularized autocorrelation matrix of all categories corresponding to the predicted classification labels of the to-be-processed data is as defined above.
  41. The apparatus according to claim 35, wherein the classification distribution information corresponding to the predicted classification labels of the to-be-processed data comprises one or more of the following: distance information, correlation information, difference information, or soft classification information.
  42. The apparatus according to claim 41, wherein

    where Z is the to-be-processed data; the classification distribution information corresponding to the predicted classification labels of the to-be-processed data and the network parameter of the i-th category at the l-th layer are as defined above; Z_l is the data feature of the to-be-processed data at the l-th layer; Z_{l-1} is the data feature of the to-be-processed data at the (l-1)-th layer; and <> denotes the inner product.
  43. The apparatus according to claim 35, wherein the second determining unit is specifically configured to:
    determine gradient parameters (G and H_i) according to the classification distribution information corresponding to the predicted classification labels of the to-be-processed data; and
    determine the objective function gradient expression according to the to-be-processed data and the gradient parameters.
  44. The apparatus according to claim 43, wherein

    G = [g_1, g_2, …, g_i];
    where the classification distribution information corresponding to the predicted classification labels of the to-be-processed data is as defined above, Tr() denotes the trace operation, I is an identity matrix, m_i is the number of to-be-processed data belonging to the i-th category among the m to-be-processed data, K is the number of all categories of the predicted classification labels in the m to-be-processed data, and G and H_i denote the gradient parameters.
  45. The apparatus according to claim 44, wherein the objective function gradient expression comprises:
    where Z is the to-be-processed data, σ is the variance of the Gaussian distribution, ∈ is a regularization parameter, I is an identity matrix, G and H_i denote the gradient parameters, and β denotes a regularization parameter.
  46. The apparatus according to any one of claims 25 to 45, wherein
    Z_l is the data feature of the to-be-processed data, the objective function gradient expression is as defined above, Z_{l-1} is the to-be-processed data, and Z_{l-1} is constrained to the (d-1)-dimensional unit sphere.
  47. The apparatus according to any one of claims 25 to 46, wherein the data processing apparatus further comprises an output unit; and
    the output unit is configured to output the data feature of the to-be-processed data.
  48. The apparatus according to any one of claims 25 to 47, wherein the to-be-processed data whose classification or clustering information is unknown is a data feature of third data, the data feature of the third data is determined by another feedforward neural network, input information of an l-th layer of the other feedforward neural network comprises classification distribution information of training data and a first data feature, output information of the l-th layer comprises a second data feature, the first data feature is an output of an (l-1)-th layer, and both the first data feature and the second data feature represent the classification or clustering information of the training data, wherein l is a positive integer greater than 1.
  49. A data processing apparatus, comprising at least one processor and a communication interface, wherein the at least one processor invokes a computer program or instructions stored in a memory to perform the method according to any one of claims 1 to 24.
  50. A data processing system, comprising the apparatus according to any one of claims 25 to 48.
  51. A computer-readable storage medium, storing computer instructions which, when run on a processor, implement the method according to any one of claims 1 to 24.
  52. A computer program product, comprising computer program code which, when run on a computer, implements the method according to any one of claims 1 to 24.
PCT/CN2023/082740 2022-03-23 2023-03-21 Data processing method and apparatus WO2023179593A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210290759.2 2022-03-23
CN202210290759.2A 2022-03-23 2022-03-23 Data processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2023179593A1 (zh)

Family

ID=88099975

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/082740 2022-03-23 2023-03-21 Data processing method and apparatus

Country Status (2)

Country Link
CN (1) CN116863260A (zh)
WO (1) WO2023179593A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080065574A1 (en) * 2006-09-08 2008-03-13 Morgan Stanley Adaptive database management and monitoring
US20200034366A1 (en) * 2018-07-27 2020-01-30 drchrono inc. Identifying Missing Questions by Clustering and Outlier Detection
CN110751230A (zh) * 2019-10-30 2020-02-04 深圳市太赫兹科技创新研究院有限公司 Substance classification method and apparatus, terminal device, and storage medium
CN112069313A (zh) * 2020-08-12 2020-12-11 北京工业大学 Disaster information blog post classification method based on the fusion of BERT, bidirectional LSTM, and an attention mechanism
CN113627471A (zh) * 2021-07-03 2021-11-09 西安电子科技大学 Data classification method, system, and device, and information data processing terminal

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117057257A (zh) * 2023-10-11 2023-11-14 云南电投绿能科技有限公司 Interpolation calculation method, apparatus and device for wind measurement tower data, and storage medium
CN117057257B (zh) * 2023-10-11 2024-01-26 云南电投绿能科技有限公司 Interpolation calculation method, apparatus and device for wind measurement tower data, and storage medium

Also Published As

Publication number Publication date
CN116863260A (zh) 2023-10-10

Similar Documents

Publication Publication Date Title
WO2021190451A1 (zh) Method and apparatus for training an image processing model
WO2021238281A1 (zh) Neural network training method, image classification system, and related device
WO2020221200A1 (zh) Neural network construction method, image processing method, and apparatus
WO2021043112A1 (zh) Image classification method and apparatus
WO2021155792A1 (zh) Processing apparatus and method, and storage medium
CN112990211B (zh) Neural network training method, image processing method, and apparatus
CN112150821B (zh) Lightweight vehicle detection model construction method, system, and apparatus
WO2021147325A1 (zh) Object detection method and apparatus, and storage medium
CN112069868A (zh) Real-time vehicle detection method for unmanned aerial vehicles based on a convolutional neural network
US11816149B2 (en) Electronic device and control method thereof
WO2021164750A1 (zh) Convolutional layer quantization method and apparatus
JP2016062610A (ja) Feature model generation method and feature model generation apparatus
Ayachi et al. Pedestrian detection based on light-weighted separable convolution for advanced driver assistance systems
Shen et al. A convolutional neural-network-based pedestrian counting model for various crowded scenes
CN110222718B (zh) Image processing method and apparatus
WO2022007867A1 (zh) Neural network construction method and apparatus
WO2021129668A1 (zh) Method and apparatus for training a neural network
CN113807399A (zh) Neural network training method, detection method, and apparatus
WO2022217434A1 (zh) Perception network, training method for a perception network, object recognition method, and apparatus
WO2021190433A1 (zh) Method and apparatus for updating an object recognition model
US20230154157A1 (en) Saliency-based input resampling for efficient object detection
WO2023179593A1 (zh) Data processing method and apparatus
WO2023125628A1 (zh) Neural network model optimization method and apparatus, and computing device
Gupta et al. A novel finetuned YOLOv6 transfer learning model for real-time object detection
CN113011568A (zh) Model training method, data processing method, and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23773840

Country of ref document: EP

Kind code of ref document: A1