WO2021022521A1 - Method for processing data, and method and device for training neural network model

Method for processing data, and method and device for training neural network model

Info

Publication number
WO2021022521A1
Authority
WO
WIPO (PCT)
Prior art keywords: data, neural network, association relationship, network model, trained
Prior art date
Application number
PCT/CN2019/099653
Other languages: French (fr), Chinese (zh)
Inventor
李成 (LI Cheng)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to CN201980010339.0A (published as CN112639828A)
Priority to PCT/CN2019/099653
Publication of WO2021022521A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks

Definitions

  • This application relates to the field of neural networks, and in particular to methods for data processing in neural network systems and to methods and devices for training neural network models.
  • Abbreviations: AI, artificial intelligence; DL, deep learning.
  • A trained neural network model often depends on its training data and cannot solve problems in fields other than the one the training data comes from. For example, when data from the training field is input into a deep neural network model, the processing results usually match the characteristics of the input data well; but when the deep neural network model is actually used on data from another field, the output matches the characteristics of the input data poorly. Therefore, to weaken the dependence of the neural network model on the training data, a new way of constructing the neural network model is needed.
  • This application provides a data processing method, and a method and device for training a neural network model, with the purpose of reducing the dependence of the neural network model on training data.
  • In a first aspect, a data processing method is provided, which includes: obtaining a plurality of data to be processed; processing the plurality of data to be processed with a first neural network model to obtain a plurality of first vectors in one-to-one correspondence with the plurality of data to be processed, where the first neural network model is obtained by training on general data; obtaining first association relationship information, which indicates at least one first vector group, each first vector group including two first vectors that satisfy an a priori hypothesis; and inputting the plurality of first vectors and the first association relationship information into a second neural network model to obtain a processing result for first data to be processed, the first data to be processed being any one of the plurality of data to be processed.
  • the first neural network model is a convolutional neural network model or a graph neural network model.
  • the first neural network model may be a deep convolutional neural network model, a graph convolutional neural network model, or a graph attention neural network model.
  • the second neural network model is a graph network model; accordingly, the plurality of first vectors serve as the nodes of the graph network model, and the first association relationship information serves as its edges. A minimal sketch of this pipeline is given below.
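  • The following is an illustrative sketch of that pipeline, not the patent's implementation: a stand-in feature extractor plays the role of the first neural network model, and a single degree-normalized graph-aggregation layer plays the role of the second. All names, shapes, and the random weights are assumptions made for illustration.

```python
import numpy as np

def first_model(x):
    # Stand-in for the first neural network model: any encoder trained on
    # general data could be used; here a fixed random projection plus tanh.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((x.shape[-1], 16))
    return np.tanh(x @ W)  # one "first vector" per data item

def second_model(H, A):
    # Stand-in for the second (graph network) model: each node aggregates
    # its neighbors' vectors along the edges given by the association matrix A.
    A_hat = A + np.eye(A.shape[0])            # keep each node's own vector
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # normalize by node degree
    rng = np.random.default_rng(1)
    W = rng.standard_normal((H.shape[1], 4))
    return D_inv @ A_hat @ H @ W              # a processing result per node

X = np.random.default_rng(2).standard_normal((5, 8))  # 5 data items to process
A = np.zeros((5, 5))
A[0, 1] = A[1, 0] = 1.0  # pairs assumed to satisfy the a priori hypothesis
A[1, 2] = A[2, 1] = 1.0
H = first_model(X)        # the plurality of "first vectors"
out = second_model(H, A)  # result for each item, including the first one
```

  • In this reading, the rows of H are the nodes of the graph network model and the nonzero entries of A are its edges, matching the node and edge roles described above.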
  • the first neural network model and the second neural network model may be two sub-models of a certain neural network model.
  • the first neural network model and the second neural network model can be stored on two different devices; that is, the steps of the data processing method provided in this application can be executed by multiple devices.
  • For example, the first neural network model is stored on a first device, which can perform the steps of obtaining the multiple data to be processed and processing them with the first neural network model to obtain the multiple first vectors; the second neural network model is stored on a second device, which can perform the steps of obtaining the first association relationship information (which indicates at least one first vector group, each including two first vectors that satisfy the a priori hypothesis) and inputting the multiple first vectors and the first association relationship information into the second neural network model to obtain the processing result for the first data to be processed, the first data to be processed being any one of the multiple data to be processed.
  • Because the first neural network model is trained on general data, a general model that is unaffected, or only slightly affected, by the scenario can be obtained, so the first neural network model can be applied in a variety of scenarios.
  • Although the application of the first neural network model is not limited to a particular scenario, it is difficult to achieve high-accuracy recognition in an arbitrary scenario using the first neural network model alone. Therefore, the multiple feature vectors output by the first neural network model can be input into the second neural network model, so that the models can be applied in relatively special scenarios and the second neural network model can learn the differences and associations between general scenarios and special scenarios.
  • Existing neural network models usually recognize only a particular scenario; once applied in another field, most of the parameters of the neural network model can no longer be used.
  • Because the second neural network model can learn the differences and associations between general scenarios and special scenarios, and because the data input to the first neural network model can be general data, the method provided in this application weakens the restrictions that the scenario of the data to be processed places on the architecture and parameters of the neural network model.
  • While the first data to be processed is being identified, data associated with it is also considered; as the amount of data considered increases, the recognition accuracy of the second neural network model tends to improve. Moreover, since the correlations between data items are considered, the second neural network model's learning of the relationships among data is enhanced.
  • Optionally, the first association relationship information indicates N first vector groups, where N is an integer greater than 1. Before the plurality of first vectors and the first association relationship information are input into the second neural network model to obtain the processing result for the first data to be processed, the method further includes: obtaining second association relationship information, which indicates n second vector groups, where the n second vector groups belong to the N first vector groups, n is less than N, and n is a positive integer. Inputting the plurality of first vectors and the first association relationship information into the second neural network model then includes: inputting the plurality of first vectors, the first association relationship information, and the second association relationship information into the second neural network model to obtain the processing result for the first data to be processed.
  • When the first association relationship information only indicates that an association exists between two first vectors, it cannot reflect the strength of that association. The second association relationship information can indicate one or more first vector groups with a particularly strong or particularly weak association among the plurality of first vector groups, so that, in addition to considering the data associated with the first data to be processed, the second neural network model can strengthen the influence of data to be processed that is closely related to the first data to be processed, or weaken the influence of data to be processed that is only distantly related to it. In this way, more informative data is available for identifying the first data to be processed.
  • Optionally, obtaining the multiple data to be processed includes: obtaining target data, the target data being one of the multiple data to be processed; and obtaining associated data, where the association relationship between the associated data and the target data satisfies the a priori hypothesis and the multiple data to be processed include the associated data.
  • In this way, associated data can be introduced flexibly according to the data to be processed, which improves the flexibility of obtaining the data to be processed and avoids introducing unnecessary redundant data.
  • Optionally, the first association relationship information includes a second association relationship matrix. The vector in the first dimension of the second association relationship matrix includes a plurality of elements in one-to-one correspondence with the plurality of first vectors, and the vector in the second dimension likewise includes a plurality of elements in one-to-one correspondence with the plurality of first vectors. Any element of the second association relationship matrix indicates whether the first vector corresponding to that element in the first dimension and the first vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • Using a matrix to represent the association relationships among the multiple first vectors avoids introducing several different types of data structures into the second neural network model, which simplifies computation; a sketch of this representation follows.
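  • As an illustration of this representation (the pair choices and names are hypothetical), the at least one first vector group can be turned into such a matrix as follows:

```python
import numpy as np

# Hypothetical first vector groups: each pair (i, j) marks two first
# vectors that satisfy the a priori hypothesis.
vector_groups = [(0, 1), (1, 2), (3, 4)]
num_vectors = 5

A = np.zeros((num_vectors, num_vectors), dtype=np.int8)
for i, j in vector_groups:
    A[i, j] = A[j, i] = 1  # element (i, j) = 1: vectors i and j are associated

# A[i, j] == 0 means vectors i and j have no association relationship
# that satisfies the a priori hypothesis.
```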
  • Optionally, processing the plurality of data to be processed with the first neural network model includes: using the first neural network model to process the plurality of data to be processed together with fifth association relationship information, where the fifth association relationship information indicates at least one data group to be processed, each data group to be processed including two data to be processed that satisfy the a priori hypothesis.
  • The data associated with the first data to be processed is thus also considered while the first data to be processed is identified; as the amount of data considered increases, the recognition accuracy of the first neural network model tends to improve. Moreover, since the correlations between data items are considered, the first neural network model's learning of the relationships among data is enhanced.
  • Optionally, the fifth association relationship information includes a first association relationship matrix. The vector in the first dimension of the first association relationship matrix includes a plurality of elements in one-to-one correspondence with the plurality of data to be processed, and the vector in the second dimension likewise includes a plurality of elements in one-to-one correspondence with the plurality of data to be processed. Any element of the first association relationship matrix indicates whether the data to be processed corresponding to that element in the first dimension and the data to be processed corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis. Using a matrix to represent the association relationships among the multiple data to be processed avoids introducing several different types of data structures into the first neural network model, which simplifies computation.
  • Optionally, the weight parameters of the second neural network model are obtained by: obtaining a plurality of data to be trained; processing the plurality of data to be trained with the first neural network model to obtain a plurality of fourth vectors in one-to-one correspondence with the plurality of data to be trained; obtaining third association relationship information, which indicates at least one third vector group, each third vector group including two fourth vectors that satisfy the a priori hypothesis; and inputting the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain a first processing result for first data to be trained, the first data to be trained being any one of the plurality of data to be trained, where the first processing result is used to modify the weight parameters of the second neural network model.
  • Because the first neural network model is trained on general data, a general model that is unaffected, or only slightly affected, by the scenario can be obtained, so the first neural network model can be applied in a variety of scenarios.
  • the multiple feature vectors output by the first neural network model are input into the second neural network model, so that the second neural network model can realize the recognition of a relatively special scene based on the recognition result of the first neural network model. Therefore, the second neural network model can learn the difference and association between general scenes and special scenes.
  • Data associated with the first data to be trained is also considered; as the amount of data considered increases, the recognition accuracy of the second neural network model tends to improve. Since the correlations between data items are considered, the second neural network model's learning of the relationships among data is enhanced.
  • Optionally, obtaining the first processing result for the first data to be trained includes: obtaining the first processing result and a second processing result for second data to be trained, where the label of the first data to be trained is a first label and the label of the second data to be trained is a second label. The method further includes: matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, the matching result being used to modify the weight parameters of the second neural network model.
  • Through the similarity between the labels, it can be judged whether the similarity between the two processing results is appropriate, which strengthens the second neural network model's learning of the association relationships between data items; one possible form of this matching is sketched below.
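  • A sketch of one way such a matching result could be computed, assuming cosine similarity and a squared difference as the matching error; the patent does not fix a particular similarity measure, so all choices here are illustrative:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def matching_loss(result1, result2, label1, label2):
    # Compare the similarity of the two processing results with the
    # similarity of the corresponding labels; a small value means the
    # model's notion of "similar" agrees with the labels'.
    return (cosine(result1, result2) - cosine(label1, label2)) ** 2

r1, r2 = np.array([0.9, 0.1]), np.array([0.2, 0.8])
l1, l2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])  # one-hot labels (assumed)
print(matching_loss(r1, r2, l1, l2))  # used to modify the weight parameters
```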
  • Optionally, the third association relationship information indicates M third vector groups, where M is an integer greater than 1. Before the plurality of fourth vectors and the third association relationship information are input into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: obtaining fourth association relationship information, which indicates m fourth vector groups, where the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer. Inputting the plurality of fourth vectors and the third association relationship information into the second neural network model then includes: inputting the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
  • When the third association relationship information only indicates that an association exists between two fourth vectors, it cannot reflect the strength of that association.
  • The fourth association relationship information can indicate one or more third vector groups with a particularly strong or particularly weak association among the plurality of third vector groups, so that, in addition to considering the data associated with the first data to be trained, the second neural network model can strengthen the influence of data to be trained that is closely related to the first data to be trained, or weaken the influence of data to be trained that is only distantly related to it. In this way, more informative data is available for identifying the first data to be trained.
  • the first processing result is also used to modify the weight parameter of the first neural network model.
  • Since association relationships between data items can be learned during training, using the first processing result to modify the first neural network model as well strengthens the first neural network model's ability to learn the association relationships between data items.
  • the plurality of data to be trained includes one or more pieces of target-type data, and each piece of target-type data has a label used to modify the weight parameters.
  • a semi-supervised learning method may be used to train the second neural network model.
  • That is, a part of the plurality of data to be trained has labels, and the other part may not. Based on the third association relationship information, the two parts of the data can be merged: even though some of the data to be trained has no label, it can still be taken into account when modifying the second neural network model. This reduces the number of labels required for the data to be trained and simplifies the data processing needed to train the second neural network model; a sketch of such training follows.
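  • A minimal sketch of such semi-supervised training, under the common assumption (not stated in the patent) that the supervised loss is computed only on labeled items while unlabeled items still influence the results through the graph:

```python
import numpy as np

def masked_cross_entropy(logits, labels, labeled_mask):
    # logits: (N, C) outputs of the second model for all N data items.
    # labels: (N,) class indices, only meaningful where labeled_mask is True.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return nll[labeled_mask].mean()  # unlabeled items enter only via the graph

logits = np.random.default_rng(0).standard_normal((4, 3))
labels = np.array([0, 2, 0, 0])              # placeholders where unlabeled
mask = np.array([True, True, False, False])  # only items 0 and 1 have labels
print(masked_cross_entropy(logits, labels, mask))
```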
  • Optionally, the third association relationship information includes a fourth association relationship matrix. The vector in the first dimension of the fourth association relationship matrix includes a plurality of elements in one-to-one correspondence with the plurality of fourth vectors, and the vector in the second dimension likewise includes a plurality of elements in one-to-one correspondence with the plurality of fourth vectors. Any element of the fourth association relationship matrix indicates whether the fourth vector corresponding to that element in the first dimension and the fourth vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis. Using a matrix to represent the association relationships among the multiple fourth vectors avoids introducing several different types of data structures into the second neural network model, which simplifies computation.
  • Optionally, processing the plurality of data to be trained with the first neural network model includes: using the first neural network model to process the plurality of data to be trained together with sixth association relationship information, where the sixth association relationship information indicates at least one data group to be trained, each data group to be trained including two data to be trained that satisfy the a priori hypothesis.
  • The data associated with the first data to be trained is thus also considered while the first data to be trained is identified; as the amount of data considered increases, the recognition accuracy of the first neural network model tends to improve. Moreover, since the correlations between data items are considered, the first neural network model's learning of the relationships among data is enhanced.
  • Optionally, the sixth association relationship information includes a third association relationship matrix. The vector in the first dimension of the third association relationship matrix includes a plurality of elements in one-to-one correspondence with the plurality of data to be trained, and the vector in the second dimension likewise includes a plurality of elements in one-to-one correspondence with the plurality of data to be trained. Any element of the third association relationship matrix indicates whether the data to be trained corresponding to that element in the first dimension and the data to be trained corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis. Using a matrix to represent the association relationships among the multiple data to be trained avoids introducing several different types of data structures into the first neural network model, which simplifies computation.
  • In a second aspect, a method for training a neural network model is provided, which includes: obtaining a plurality of data to be trained; processing the plurality of data to be trained with a first neural network model to obtain a plurality of fourth vectors in one-to-one correspondence with them; obtaining third association relationship information, which indicates at least one third vector group, each third vector group including two fourth vectors that satisfy an a priori hypothesis; and inputting the plurality of fourth vectors and the third association relationship information into a second neural network model to obtain a first processing result for first data to be trained, the first data to be trained being any one of the plurality of data to be trained, where the first processing result is used to modify the weight parameters of the second neural network model.
  • For example, the first neural network model can be obtained by training on the training data of scenario 1. The data to be trained from scenario 2 is input into the first neural network model, which outputs multiple feature vectors; these feature vectors are then input into the second neural network model, so that the second neural network model, building on the recognition results of the first neural network model, realizes recognition for scenario 2. In this way, the second neural network model can learn the differences and associations between scenario 1 and scenario 2.
  • Data associated with the first data to be trained is also considered; as the amount of data considered increases, the recognition accuracy of the second neural network model tends to improve. In addition, since the correlations between data items are considered, the second neural network model's learning of the relationships among data is enhanced.
  • Optionally, obtaining the first processing result for the first data to be trained includes: obtaining the first processing result and a second processing result for second data to be trained, where the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two of the plurality of data to be trained. The method further includes: matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, the matching result being used to modify the weight parameters of the second neural network model.
  • Optionally, the third association relationship information indicates M third vector groups. Before the plurality of fourth vectors and the third association relationship information are input into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: obtaining fourth association relationship information, which indicates m fourth vector groups, where the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer. Inputting the plurality of fourth vectors and the third association relationship information into the second neural network model then includes: inputting the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
  • the first processing result is also used to modify the weight parameter of the first neural network model.
  • the plurality of data to be trained includes one or more target type data, and each target type data has a label used to modify the weight parameter.
  • Optionally, the third association relationship information includes a fourth association relationship matrix. The vector in the first dimension of the fourth association relationship matrix includes a plurality of elements in one-to-one correspondence with the plurality of fourth vectors, and the vector in the second dimension likewise includes a plurality of elements in one-to-one correspondence with the plurality of fourth vectors. Any element of the fourth association relationship matrix indicates whether the fourth vector corresponding to that element in the first dimension and the fourth vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • Optionally, processing the plurality of data to be trained with the first neural network model includes: using the first neural network model to process the plurality of data to be trained together with sixth association relationship information, where the sixth association relationship information indicates at least one data group to be trained, each data group to be trained including two data to be trained that satisfy the a priori hypothesis.
  • Optionally, the sixth association relationship information includes a third association relationship matrix. The vector in the first dimension of the third association relationship matrix includes a plurality of elements in one-to-one correspondence with the plurality of data to be trained, and the vector in the second dimension likewise includes a plurality of elements in one-to-one correspondence with the plurality of data to be trained. Any element of the third association relationship matrix indicates whether the data to be trained corresponding to that element in the first dimension and the data to be trained corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • the first neural network model is obtained based on general data training.
  • Because the first neural network model is trained on general data, a general model that is unaffected, or only slightly affected, by the scenario can be obtained, so the first neural network model can be applied in a variety of scenarios.
  • the multiple feature vectors output by the first neural network model are input into the second neural network model, so that the second neural network model can realize the recognition of a relatively special scene based on the recognition result of the first neural network model. Therefore, the second neural network model can learn the difference and association between the general scene and the special scene.
  • In a third aspect, a method for training a neural network model is provided, including: obtaining a plurality of data to be trained; and inputting the plurality of data to be trained and seventh association relationship information into a second neural network model to obtain a first processing result for first data to be trained and a second processing result for second data to be trained, where the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two of the plurality of data to be trained. The method further includes: matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, the matching result being used to modify the weight parameters of the second neural network model.
  • Through the similarity between the labels, it can be judged whether the similarity between the two processing results is appropriate, which strengthens the second neural network model's learning of the association relationships between data items.
  • Optionally, the method further includes: obtaining the seventh association relationship information, which indicates at least one first data group to be trained, each first data group to be trained including two data to be trained that satisfy the a priori hypothesis.
  • the data associated with the first data to be trained will also be considered while identifying the first data to be trained. As the amount of processed data increases, it is beneficial to increase the recognition accuracy of the second neural network model. In addition, since the correlation between data and data is considered, the learning of the data relationship by the second neural network model can be enhanced.
  • Optionally, the seventh association relationship information indicates H first data groups to be trained. Before the plurality of data to be trained and the seventh association relationship information are input into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: obtaining eighth association relationship information, which indicates h second data groups to be trained, where the h second data groups to be trained belong to the H first data groups to be trained, h is less than H, and h is a positive integer. Inputting the plurality of data to be trained and the seventh association relationship information into the second neural network model then includes: inputting the plurality of data to be trained, the seventh association relationship information, and the eighth association relationship information into the second neural network model to obtain the first processing result.
  • the plurality of to-be-trained data includes one or more target type data, and each target type data has a label for modifying the weight parameter.
  • Optionally, the seventh association relationship information includes a fifth association relationship matrix. The vector in the first dimension of the fifth association relationship matrix includes a plurality of elements in one-to-one correspondence with the plurality of data to be trained, and the vector in the second dimension likewise includes a plurality of elements in one-to-one correspondence with the plurality of data to be trained. Any element of the fifth association relationship matrix indicates whether the data to be trained corresponding to that element in the first dimension and the data to be trained corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • In a fourth aspect, a data processing device is provided, which includes modules for executing the method in the first aspect or any possible implementation of the first aspect.
  • the device may be a cloud server or a terminal device.
  • In a fifth aspect, a device for training a neural network model is provided, which includes modules for executing the method in the second aspect or any possible implementation of the second aspect.
  • the device may be a cloud server or a terminal device.
  • In a sixth aspect, a device for training a neural network model is provided, which includes modules for executing the method in the third aspect or any possible implementation of the third aspect.
  • the device may be a cloud server or a terminal device.
  • In a seventh aspect, a data processing device is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any implementation of the first aspect.
  • the device may be a cloud server or a terminal device.
  • In an eighth aspect, a device for training a neural network model is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any implementation of the second aspect.
  • the device may be a cloud server or a terminal device.
  • In a ninth aspect, a device for training a neural network model is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any implementation of the third aspect.
  • the device may be a cloud server or a terminal device.
  • In a tenth aspect, a computer-readable medium is provided, which stores program code for execution by a device, the program code including instructions for executing the method in any one of the first to third aspects.
  • An eleventh aspect provides a computer program product containing instructions; when the computer program product runs on a computer, the computer executes the method in any one of the first to third aspects.
  • In a twelfth aspect, a chip is provided, which includes a processor and a data interface; the processor reads instructions stored in a memory through the data interface and executes the method in any implementation of the first to third aspects.
  • the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory.
  • the processor is configured to execute the method in any one of the implementation manners of the first aspect to the third aspect.
  • FIG. 1 is a schematic diagram of a convolutional neural network architecture provided by an embodiment of the present application.
  • Fig. 2 is a schematic diagram of a graph model provided by an embodiment of the application.
  • FIG. 3 is a schematic diagram of a system architecture provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of the hardware structure of a chip provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of a system architecture provided by an embodiment of the application.
  • FIG. 6 is a schematic flowchart of a data processing method provided by an embodiment of the application.
  • FIG. 7 is a schematic flowchart of a method for training a neural network model provided by an embodiment of the application.
  • FIG. 8 is a schematic block diagram of a data processing device provided by an embodiment of the application.
  • Fig. 9 is a schematic block diagram of a device for training a neural network model provided by an embodiment of the present application.
  • a neural network can be composed of neural units.
  • a neural unit can refer to an arithmetic unit that takes inputs x_s and an intercept of 1; the output of the arithmetic unit can be h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s x_s + b), where s = 1, 2, …, n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. Here f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network and to convert the input signal of the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
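  • For concreteness, a single neural unit with a sigmoid activation can be written as follows (an illustrative sketch of the formula above; the input and weight values are arbitrary):

```python
import numpy as np

def neural_unit(x, W, b):
    # Output of one unit: f(sum_s W_s * x_s + b), with f the sigmoid.
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

x = np.array([0.5, -1.0, 2.0])   # inputs x_s
W = np.array([0.1, 0.4, -0.3])   # weights W_s
print(neural_unit(x, W, b=0.2))  # output signal, input to the next layer
```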
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with many hidden layers; there is no special metric for "many" here.
  • The layers of a DNN can be divided into three categories: input layer, hidden layers, and output layer.
  • Generally, the first layer is the input layer, the last layer is the output layer, and all the layers in between are hidden layers.
  • The layers are fully connected; that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.
  • Although a DNN looks complicated, the work of each individual layer is not complicated.
  • The coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer can be defined as W_jk^L. It should be noted that the input layer has no W parameter.
  • More hidden layers make the network better able to portray complex situations in the real world. Theoretically, a model with more parameters is more complex and has greater "capacity", which means it can complete more complex learning tasks.
  • Training a deep neural network is thus a process of learning the weight matrices; the ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the matrices formed by the vectors W of many layers). A toy example follows.
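  • A toy forward pass illustrating these fully connected layers; all dimensions and values are arbitrary choices for illustration:

```python
import numpy as np

# W1[j, k] is the coefficient from neuron k of the previous layer to
# neuron j of the current layer; the input layer itself has no W.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)                        # input layer
W1, b1 = rng.standard_normal((5, 4)), np.zeros(5)
W2, b2 = rng.standard_normal((2, 5)), np.zeros(2)
hidden = np.tanh(W1 @ x + b1)                     # hidden layer
output = W2 @ hidden + b2                         # output layer
# Training adjusts W1, b1, W2, b2: the weight matrices of all layers.
```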
  • A convolutional neural network (CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be seen as a filter, and the convolution process can be seen as using a trainable filter to convolve with an input image or convolution feature map.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can be connected to only part of the neighboring neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units.
  • Neural units in the same feature plane share weights, and the shared weights here are the convolution kernels. Weight sharing can be understood as making the way image information is extracted independent of location. The underlying principle is that the statistics of one part of an image are the same as those of other parts, so image information learned in one part can also be used in another part; the same learned image information can therefore be used for all positions on the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information; generally, the more convolution kernels there are, the richer the image information reflected by the convolution operation.
  • the convolution kernel can be initialized in the form of a matrix of random size. During the training of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • a convolutional neural network (CNN) 400 may include an input layer 410, a convolutional layer/pooling layer 420 (the pooling layer is optional), and a neural network layer 430.
  • The convolutional layer/pooling layer 420 may include layers 421 to 426. In one example, layer 421 is a convolutional layer, layer 422 is a pooling layer, layer 423 is a convolutional layer, layer 424 is a pooling layer, layer 425 is a convolutional layer, and layer 426 is a pooling layer. In another example, layers 421 and 422 are convolutional layers, layer 423 is a pooling layer, layers 424 and 425 are convolutional layers, and layer 426 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • The convolutional layer 421 can include many convolution operators. A convolution operator, also called a kernel, acts in image processing like a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During convolution on an image, the weight matrix is typically moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride) to extract specific features from the image.
  • The size of the weight matrix is related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image: during the convolution operation, the weight matrix extends through the entire depth of the input image. Convolution with a single weight matrix therefore produces output with a single depth dimension, but in most cases multiple weight matrices of the same size (row × column), i.e. multiple homogeneous matrices, are applied instead. The output of each weight matrix is stacked to form the depth dimension of the convolved image, where that dimension is determined by the "multiple" just mentioned.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to eliminate unwanted noise in the image.
  • These multiple weight matrices have the same size (row × column), so the feature maps extracted by them also have the same size; the extracted feature maps of the same size are then combined to form the output of the convolution operation.
  • In practical applications, the weight values in these weight matrices need to be obtained through extensive training. Each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 400 makes correct predictions. A bare-bones sketch of the sliding-kernel operation follows.
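  • A minimal, illustrative single-kernel convolution matching the description above; the kernel values here are arbitrary, standing in for trained weights:

```python
import numpy as np

def conv2d_single(img, kernel, stride=1):
    # Slide one weight matrix (kernel) across the image; each position
    # yields one output value, giving a single-depth feature map.
    kh, kw = kernel.shape
    oh = (img.shape[0] - kh) // stride + 1
    ow = (img.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = img[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = (patch * kernel).sum()
    return out

img = np.random.default_rng(0).standard_normal((6, 6))
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # e.g. a simple edge extractor
fmap = conv2d_single(img, edge_kernel)          # 4x4 feature map
# Applying several same-size kernels and stacking the resulting feature
# maps would form the depth dimension described above.
```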
  • the initial convolutional layer (such as 421) often extracts more general features, which can also be called low-level features;
  • the features extracted by the subsequent convolutional layers (for example, 426) become more and more complex, such as features such as high-level semantics, and features with higher semantics are more suitable for the problem to be solved.
  • In the layers 421 to 426, a convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. The sole purpose of the pooling layer is to reduce the spatial size of the image.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller size image.
  • the average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of average pooling.
  • the maximum pooling operator can take the pixel with the largest value within a specific range as the result of the maximum pooling.
  • the operators in the pooling layer should also be related to the image size.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image of the input pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
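  • A minimal illustration of the two pooling operators over 2x2 sub-regions (the region size is an arbitrary choice):

```python
import numpy as np

def pool2x2(img, mode="max"):
    # Each output pixel is the max or average of one 2x2 sub-region of
    # the input, so the output image is smaller than the input image.
    h, w = img.shape
    blocks = img[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
print(pool2x2(img, "max"))   # maximum pooling result (2x2)
print(pool2x2(img, "mean"))  # average pooling result (2x2)
```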
  • Neural network layer 430
  • After processing by the convolutional layer/pooling layer 420, the convolutional neural network 400 is not yet able to output the required output information, because, as described above, the convolutional layer/pooling layer 420 only extracts features and reduces the number of parameters brought by the input image. To generate the final output information (the required class information or other related information), the convolutional neural network 400 uses the neural network layer 430 to generate one output or a group of outputs whose number equals the number of required classes. Therefore, the neural network layer 430 can include multiple hidden layers (431 and 432 to 43n in FIG. 1) and an output layer 440; the parameters of these hidden layers can be obtained by pre-training on relevant training data of a specific task type, where the task type can include, for example, image recognition, image classification, and image super-resolution reconstruction.
  • After the multiple hidden layers in the neural network layer 430, the final layer of the entire convolutional neural network 400 is the output layer 440. The output layer 440 has a loss function similar to categorical cross entropy, which is specifically used to calculate the prediction error.
  • the convolutional neural network 400 shown in FIG. 1 is only used as an example of a convolutional neural network. In specific applications, the convolutional neural network may also exist in the form of other network models.
  • A recurrent neural network (RNN) memorizes previous information and applies it to the calculation of the current output; that is, the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment.
  • RNN can process sequence data of any length.
  • The training of an RNN is the same as the training of a traditional CNN or DNN: the error backpropagation algorithm is also used, but with one difference: if the RNN is unrolled, the parameters, such as W, are shared, which is not the case in the traditional neural networks described above. Moreover, the output of each step depends not only on the network of the current step but also on the network states of the previous steps. This learning algorithm is called backpropagation through time (BPTT).
  • The loss function is usually a multivariate function, and the gradient reflects the rate of change of the loss function's output value as a variable changes: the greater the absolute value of the gradient, the greater the rate of change. When different parameters are updated, the gradient of the loss function with respect to them can be calculated, and the parameters are continuously updated along the direction of fastest gradient descent, reducing the output value of the loss function as quickly as possible; a one-parameter illustration follows.
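  • A one-parameter illustration of this update rule on the toy loss L(w) = (w - 3)^2, whose gradient is 2(w - 3); the learning rate and step count are arbitrary:

```python
def gradient_step(w, grad_fn, lr=0.1):
    # Move the parameter against the gradient: the direction in which the
    # loss function's output value decreases fastest.
    return w - lr * grad_fn(w)

grad = lambda w: 2 * (w - 3)  # gradient of L(w) = (w - 3)**2
w = 0.0
for _ in range(50):
    w = gradient_step(w, grad)
print(w)  # approaches 3, the minimizer of the loss
```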
  • Convolutional neural networks can use the backpropagation (BP) algorithm to modify the parameters of the initial super-resolution model during training, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters of the initial super-resolution model are updated by backpropagating the error loss information, so that the error loss converges. The backpropagation algorithm is a backpropagation movement dominated by the error loss, aimed at obtaining the optimal parameters of the super-resolution model, such as the weight matrices.
  • A generative adversarial network (GAN) includes at least two modules: one is a generative model and the other a discriminative model; through these two modules, which learn from each other in a game, better output is produced.
  • Both the generative model and the discriminant model can be a neural network, specifically a deep neural network, or a convolutional neural network.
  • The basic principle of a GAN, taking one that generates pictures as an example, is as follows: suppose there are two networks, G (the generator) and D (the discriminator). G is a network that generates pictures: it receives random noise z and generates a picture from this noise, denoted G(z). D is a discriminant network that judges whether a picture is "real": its input x represents a picture, and the output D(x) represents the probability that x is a real picture, where 1 means the picture is certainly real and 0 means it cannot be real.
  • The goal of the generative network G is to generate pictures that are as real as possible in order to deceive the discriminant network D, while the goal of D is to distinguish the pictures generated by G from real pictures. G and D thus constitute a dynamic "game", the "adversarial" part of the "generative adversarial network"; the corresponding losses are sketched below.
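  • The game can be summarized with the standard GAN losses; the toy G and D below are stand-ins so the snippet runs, not real networks:

```python
import numpy as np

G = lambda z: np.tanh(z)                       # toy generator: noise -> "picture"
D = lambda x: 1.0 / (1.0 + np.exp(-x.mean()))  # toy discriminator: realness score

z = np.random.default_rng(0).standard_normal(8)    # random noise input to G
real = np.ones(8)                                  # stands in for a real picture
fake = G(z)
d_loss = -np.log(D(real)) - np.log(1.0 - D(fake))  # D: separate real from fake
g_loss = -np.log(D(fake))                          # G: push D(G(z)) toward 1
print(d_loss, g_loss)
```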
  • In a graph data structure, the edge between node n1 and node n2 can be expressed as (n1, n2). A graph neural network (GNN) is a neural network that runs directly on the graph data structure, where the label of a node n in the node set can be represented by a vector, and the label of an edge (n1, n2) in the edge set can likewise be represented by a vector.
  • the graph neural network can include an input layer, an output layer, and one or more hidden layers.
  • the state of node v can be expressed as h_v = f(x_v, x_co[v], h_ne[v], x_ne[v]), where h_v is the state of node v, x_v is the feature representation of node v, x_co[v] is the feature representation of the edges associated with node v, h_ne[v] is the state of the other nodes associated with node v, and x_ne[v] is the feature representation of the other nodes associated with node v.
  • nodes 2, 3, 4, and 6 on the inner side of the dotted line all have edges connecting them to node 1; that is, nodes 2, 3, 4, and 6 are all associated with node 1.
  • Regarding associated nodes: if there is an edge between node v and node i, then node i is a node associated with node v, and node i can be called a neighbor node of node v.
  • Graph Convolutional Neural Network is a method for deep learning of graph data, which can be understood as the application of graph neural network in convolutional neural network.
  • Graph convolutional neural networks are usually divided into two categories: spectral approaches and non-spectral approaches.
  • The spectral method is based on the spectral representation of the graph, with the convolution operation defined in the Fourier domain; the convolution operation requires intensive matrix computation and non-local spatial filtering.
  • the non-spectral method is to directly convolve on the graph instead of on the spectrum of the graph.
  • Because the graph convolutional neural network depends on the structural information of the graph, a model trained on a specific graph structure often cannot be used directly on other graph structures.
  • the graph convolution operator can be h_i' = σ(∑_{j∈N_i} (1/c_ij) · W_{R_j} · h_j), where c_ij denotes a normalization factor related to the graph structure, N_i represents the set of nodes associated with node i (which may include node i itself), and R_j represents the type of node j. An illustrative implementation follows.
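  • A direct, illustrative implementation of this operator, with the normalization factor c_ij taken as node i's degree; that is one common choice, while the text above only says c_ij relates to the graph structure, so treat it as an assumption:

```python
import numpy as np

def graph_conv(h, A, node_type, W_by_type):
    # h_i' = sum over j in N_i of (1 / c_ij) * W_{R_j} @ h_j
    deg = A.sum(axis=1)         # c_ij := degree of node i (assumed)
    out = np.zeros_like(h)
    for i in range(len(h)):
        for j in range(len(h)):
            if A[i, j]:         # j is in N_i
                out[i] += W_by_type[node_type[j]] @ h[j] / deg[i]
    return out

h = np.random.default_rng(0).standard_normal((4, 3))
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)      # N_i here includes node i itself
node_type = [0, 1, 0, 1]                       # R_j: the type of node j
W_by_type = {0: np.eye(3), 1: 0.5 * np.eye(3)} # one weight matrix per node type
print(graph_conv(h, A, node_type, W_by_type))
```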
  • The core layer of the graph attention network distributes attention over the set of neighboring nodes associated with node i through an implicit self-attention layer: according to the characteristics of the neighboring nodes, it assigns node i different weights for them and performs a weighted summation over the neighboring nodes' features.
  • the difference from the graph convolutional neural network is that the graph attention network does not depend on the specific graph structure.
  • The graph attention network uses a multi-layer, multi-head attention mechanism to distribute attention to each node under the association structure of the graph, so it can compute the information each node obtains from the other nodes associated with it. The essence of the multi-head attention mechanism is weighted summation, with the weights coming from the learned attention matrix and the nodes' own information. Unlike a graph convolutional neural network, therefore, the parameters learned by this network do not depend on the specific graph structure; a single-head sketch follows.
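  • A single-head sketch of attention-weighted aggregation over graph neighbors; the dot-product score here is an illustrative stand-in for the learned attention, and every node is assumed to have at least one neighbor (here, itself):

```python
import numpy as np

def attention_layer(h, A):
    # Each node attends only to its neighbors (nonzero entries of A) and
    # takes a weighted sum of their features; weights per node sum to 1.
    scores = h @ h.T                           # stand-in for learned scores
    scores = np.where(A > 0, scores, -np.inf)  # mask non-neighbors
    scores = scores - scores.max(axis=1, keepdims=True)
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)  # attention distribution
    return alpha @ h

h = np.random.default_rng(0).standard_normal((4, 3))
A = np.eye(4) + np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)  # chain + self-loops
print(attention_layer(h, A))
```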
  • an embodiment of the present application provides a system architecture 100.
  • the data collection device 160 is used to collect data to be trained.
  • The data to be trained includes image data, video data, audio data, text data, and so on; the data to be trained is stored in the database 130, and the training device 120 obtains the target model/rule 101 based on the training data maintained in the database 130.
  • Embodiment 1 below describes in more detail how the training device 120 obtains the target model/rule 101 based on the data to be trained.
  • The target model/rule 101 can be used to implement the method for training a neural network model provided in the embodiments of this application; that is, the target model/rule 101 may include a first neural network model and a second neural network model. The data to be trained is input into the first neural network model to obtain multiple fourth vectors, the multiple fourth vectors are input into the second neural network model, and the weight parameters of the target model/rule 101 are adjusted through the loss function, yielding the trained target model/rule 101. It should be noted that, in actual applications, the data to be trained in the database 130 need not all come from the data collection device 160; it may also be received from other devices.
  • It should also be noted that the training device 120 does not necessarily train the target model/rule 101 entirely on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of this application.
  • The target model/rule 101 trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 3, which can be a terminal (for example, a mobile phone, a tablet computer, a laptop, an AR/VR device, or a vehicle-mounted terminal), or a server or cloud.
  • The execution device 110 is equipped with an input/output interface 112 for data interaction with external devices; a user can input data to the input/output interface 112 through the client device 140, and the input data, as described in the embodiments of this application, can include multiple data to be processed.
  • The preprocessing module 113 is configured to perform preprocessing on the input data (such as image data, video data, audio data, or text data) received by the input/output interface 112, where the input data may be the data to be processed in the embodiments of this application. In these embodiments, the preprocessing module 113 may, for example, be used to extract features of the input data.
  • the execution device 110 may call data, codes, etc. in the data storage system 150 for corresponding processing.
  • the data, instructions, etc. obtained by corresponding processing may also be stored in the data storage system 150.
  • the input/output interface 112 returns the processing result to the client device 140 to provide it to the user.
  • the training device 120 can generate corresponding target models/rules 101 based on different data to be trained for different goals or tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete the above tasks, so as to provide users with the desired results.
  • the user can manually set input data, and the manual setting can be operated through the interface provided by the input/output interface 112.
  • the client device 140 can automatically send input data to the input/output interface 112. If the client device 140 is required to automatically send the input data with the user's authorization, the user can set the corresponding permission in the client device 140.
  • the user can view the result output by the execution device 110 on the client device 140, and the specific presentation form may be a specific manner such as display, sound, and action.
  • the client device 140 can also be used as a data collection terminal, collecting the input data of the input/output interface 112 and the output result of the input/output interface 112 as new sample data, as shown in the figure, and storing it in the database 130. Alternatively, the input/output interface 112 may directly store the input data of the input/output interface 112 and the output result of the input/output interface 112 as new sample data in the database 130.
  • Fig. 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • the data storage system 150 is an external memory relative to the execution device 110. In other cases, the data storage system 150 may also be placed in the execution device 110.
  • the target model/rule 101 is obtained by training according to the training device 120, and the target model/rule 101 may include the first neural network model and the second neural network model in the embodiment of the application.
  • the first neural network model may be a convolutional neural network model or a graph neural network model
  • the second neural network model may be a graph neural network model.
  • FIG. 4 is a chip hardware structure provided by an embodiment of the application, and the chip includes a neural network processor 20.
  • a neural network processor (Neural-network Processing Unit, NPU) 20 can be mounted as a coprocessor to a host central processing unit (Host Central Processing Unit, Host CPU), and the Host CPU allocates tasks.
  • the core part of the NPU is the arithmetic circuit 203.
  • the controller 204 controls the arithmetic circuit 203 to extract data from the memory (weight memory or input memory) and perform calculations.
  • the arithmetic circuit 203 includes multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 203 is a two-dimensional systolic array. The arithmetic circuit 203 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 203 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the data corresponding to matrix B from the weight memory 202 and caches it on each PE in the arithmetic circuit.
  • the arithmetic circuit fetches the data of matrix A from the input memory 201, performs matrix operations with matrix B, and the partial or final result of the obtained matrix is stored in the accumulator 208.
  • the vector calculation unit 207 can perform further processing on the output of the operation circuit 203, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • the vector calculation unit 207 can be used for network calculations in the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, and local response normalization.
  • the vector calculation unit 207 can store the processed output vector to the unified buffer 206.
  • the vector calculation unit 207 may apply a nonlinear function to the output of the arithmetic circuit 203, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 207 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 203, for example for use in a subsequent layer in a neural network.
  • Part or all of the steps of the method provided in this application may be executed by the arithmetic circuit 203 or the vector calculation unit 207.
  • the unified memory 206 is used to store input data and output data.
  • the direct memory access controller (DMAC) 205 is used to transfer the input data in the external memory to the input memory 201 and/or the unified memory 206, to store the weight data in the external memory into the weight memory 202, and to save the data in the unified memory 206 into the external memory.
  • a bus interface unit (BIU) 210 is used to implement interaction between the host CPU, the DMAC, and the instruction fetch memory 209 through the bus.
  • An instruction fetch buffer 209 connected to the controller 204 is used to store instructions used by the controller 204.
  • the controller 204 is used to call the instructions cached in the instruction fetch memory 209 to control the working process of the computing accelerator.
  • the unified memory 206, the input memory 201, the weight memory 202, and the fetch memory 209 are all on-chip (On-Chip) memories.
  • the external memory is a memory private to the NPU; the external memory can be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
  • an embodiment of the present application provides a system architecture 300.
  • the system architecture includes a local device 301, a local device 302, an execution device 310, and a data storage system 350.
  • the local device 301 and the local device 302 are connected to the execution device 310 through a communication network.
  • the execution device 310 may be implemented by one or more servers.
  • the execution device 310 can be used in conjunction with other computing devices, such as data storage, routers, load balancers and other devices.
  • the execution device 310 may be arranged on one physical site or distributed on multiple physical sites.
  • the execution device 310 may use the data in the data storage system 350 or call the program code in the data storage system 350 to implement the data processing method or the method for training a neural network model of the embodiment of the present application.
  • for example, the execution device 310 can build an image recognition neural network, which can be used for image recognition or image processing.
  • Each local device can represent any computing device, such as personal computers, computer workstations, smart phones, tablets, smart cameras, smart cars or other types of cellular phones, media consumption devices, wearable devices, set-top boxes, game consoles, etc.
  • the local device of each user can interact with the execution device 310 through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • the above-mentioned execution device 310 may also be referred to as a cloud device. At this time, the execution device 310 is generally deployed in the cloud.
  • the neural network model will depend on the data to be trained.
  • when the trained neural network model processes the data to be trained, the output result is close to the characteristics of the data to be trained and the accuracy rate is high; when the trained neural network model is applied in actual use, the recognition result it outputs is far from the characteristics of the input data itself and the accuracy rate is low.
  • this application provides a data processing method, so that the trained neural network model can achieve high-accuracy recognition when it is applied to a specific scene.
  • FIG. 6 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • the method 500 may be executed by the execution device 110 as shown in FIG. 3.
  • the method 500 may be executed by the neural network processor 20 as shown in FIG. 4.
  • the method 500 may be executed by the execution device 310 as shown in FIG. 5.
  • the data to be processed can be understood as the data that is about to be input to the neural network model and processed by the neural network model.
  • the data to be processed can be text data, image data, video data, audio data, etc., such as a text file, a paragraph of text in a text file, a picture file, an image block in a picture file, a frame in a video file, A video file, a video in a video file, an audio file, and an audio in an audio file.
  • Multiple data to be processed can be multiple text files, multiple texts in a text file, multiple picture files, multiple image blocks in a picture file, multiple frames in a video file, multiple video files, Multiple pieces of video in one video file, multiple audio files, multiple pieces of audio in one audio file, etc. This application does not limit the type of data to be processed.
  • the plurality of to-be-processed data are stored in the database, so the device executing the method 500 can directly retrieve the plurality of to-be-processed data from the database.
  • the plurality of data to be processed can be obtained by using a camera shooting method.
  • the cloud device stores the multiple data to be processed, so the device that executes the method 500 can receive the multiple data to be processed sent by the cloud device through the communication network.
  • input multiple data to be processed into the first neural network model, use the first neural network model to perform processing operations such as feature screening (filtering out useful features) and feature fusion (combining multiple features), and output a plurality of first vectors corresponding one-to-one to the plurality of data to be processed.
  • take the convolutional neural network shown in Figure 1 as an example: the multiple data to be processed can be input from the input layer and pass through hidden layers such as the convolutional layer and/or the pooling layer, where data processing is performed, and a plurality of first vectors corresponding to the plurality of data to be processed are output from the output layer of the first neural network model.
  • the first vector can be a number or a vector containing multiple numbers.
  • the type of the first neural network model may be a convolutional neural network model, a graph neural network model, a graph convolutional neural network model, a graph attention neural network model, and so on. This application does not limit the type of the first neural network model.
  • the first neural network model may be a traditional convolutional neural network model.
  • the output layer of the traditional convolutional neural network is a fully connected layer, which is sometimes called a classifier.
  • the traditional convolutional neural network model can directly output the recognition result of the data to be processed. For example, if the data to be processed is an image, the traditional convolutional neural network model can directly output recognition results such as whether there is a person in the image and whether the person is male or female. The recognition result can often only represent the probability that the data to be processed belongs to a certain feature.
  • the first neural network model may also be a special convolutional neural network model that does not include a fully connected layer, which can output the calculation result of the convolutional layer or the pooling layer.
  • the first neural network model can output processing results that are intermediate calculation results in the traditional convolutional neural network model.
  • the processing results output by this special convolutional neural network model are called intermediate calculation results.
  • the intermediate calculation result can be used to characterize part or all of the information of the data to be processed.
  • the first neural network model may be a graph neural network model.
  • the using the first neural network model to process the plurality of data to be processed includes: using the first neural network model to process the plurality of data to be processed and the fifth association relationship information, where the fifth association relationship information is used to indicate at least one data group to be processed, and each data group to be processed includes two data to be processed that satisfy the a priori hypothesis.
  • the data group to be processed contains two data to be processed that have an association relationship. That is, there is an association relationship between the two to-be-processed data in the to-be-processed data group that satisfies a priori assumption. For example, if the data group to be processed is (data to be processed 1, data to be processed 2), then there is an association relationship between the data to be processed 1 and the data to be processed 2 that satisfies the a priori assumption.
  • the plurality of data to be processed and the fifth association relationship information reflecting the association relationships between the plurality of data to be processed are input into the first neural network model; the first neural network model can determine, according to the fifth association relationship information, whether one piece of data influences another, and the weight parameters in the first neural network model reflect the degree of influence between the data, so as to obtain a plurality of first vectors that can reflect the relevance of the data and that correspond one-to-one to the multiple data to be processed.
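  • as an illustration only, the following is a minimal sketch of how one layer of a graph neural network can combine the data features with such association relationship information; the row normalization of the association matrix and the ReLU activation are assumptions of the sketch, not requirements of this application.

```python
import numpy as np

def gcn_layer(H, P, W):
    # H: (k, d) features of the k data items; P: (k, k) 0/1 association matrix
    # built from the a priori hypothesis; W: (d, d_out) trainable weight matrix.
    P_hat = P + np.eye(P.shape[0])            # let each data item also influence itself
    deg = P_hat.sum(axis=1, keepdims=True)    # number of associated items per row
    P_norm = P_hat / deg                      # average over associated items (an assumption)
    return np.maximum(P_norm @ H @ W, 0.0)    # propagate along associations, mix features, ReLU
```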
  • a hypothesis refers to an explanation of a certain phenomenon made in advance, that is, a speculation about and explanation of the natural phenomenon under study and its regularity based on known scientific facts and scientific principles; after detailed classification, induction, and analysis of the data, a temporary but acceptable explanation is obtained.
  • prior probability appears in Bayesian statistical inference and refers to the prior probability distribution of a random variable (usually referred to simply as the prior), that is, the probability distribution expressing one's beliefs about the variable before some evidence is taken into account.
  • an a priori hypothesis refers to a prior probability distribution proposed for all hypotheses in the hypothesis space.
  • for example, the plurality of data to be processed may be multiple paragraphs of text, where a paragraph of text may include multiple sentences. Different paragraphs of text express different topics; therefore, multiple sentences in one paragraph are more strongly related, while sentences belonging to different paragraphs are weakly related or unrelated. There can then be an a priori hypothesis, such as an association between multiple sentences belonging to the same paragraph.
  • the multiple to-be-processed data may be multiple frames of pictures. Normally, as time passes, the longer the interval between two frames, the smaller the correlation between the two frames; the shorter the interval between the two frames, the greater the correlation between the two frames. There can then be an a priori assumption, such as an association between two frames whose interval is less than a preset threshold.
  • the preset threshold may be 8s, for example.
  • the multiple pieces of to-be-processed data may be multiple pieces of video, where, as time passes, the longer the interval between two pieces of video, the smaller the correlation between them; the shorter the interval, the greater the correlation. There may then be an a priori assumption, such as an association between two videos whose minimum interval length is less than a preset threshold.
  • the preset threshold may be 8s, for example.
  • the multiple pieces of to-be-processed data may be multiple pieces of audio, where, as time passes, the longer the interval between two pieces of audio, the smaller the correlation between them; the shorter the interval, the greater the correlation. There may then be an a priori assumption, such as an association between two audio segments whose minimum interval duration is less than a preset threshold.
  • the preset threshold may be 8s, for example.
  • the fifth association relationship information may be a matrix. Compared with other information types, matrix operations are more convenient.
  • the fifth association relationship information includes a first association relationship matrix. The vector in the first dimension of the first association relationship matrix includes a plurality of elements corresponding one-to-one to the plurality of data to be processed, and the vector in the second dimension of the first association relationship matrix includes a plurality of elements corresponding one-to-one to the plurality of data to be processed. Any element in the first association relationship matrix is used to indicate whether the data corresponding to that element in the first dimension and the data corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • for example, suppose P is a k×k matrix, where the i-th column corresponds to the to-be-processed data i and the j-th row corresponds to the to-be-processed data j; the element p_i,j in the i-th column and the j-th row represents whether there is an association relationship between the to-be-processed data i and the to-be-processed data j that satisfies the a priori hypothesis.
  • when there is such an association relationship, the value of p_i,j can be 1, and when there is no association relationship between the to-be-processed data i and the to-be-processed data j, the value of p_i,j can be 0; alternatively, when there is such an association relationship, the value of p_i,j can be 0, and when there is no association relationship, the value of p_i,j can be 1.
  • when the matrix P^T obtained by transposing the matrix P is the same as the matrix P, that is, p_i,j = p_j,i, the association relationship between the to-be-processed data i and the to-be-processed data j may be non-directional.
  • when the matrix P^T obtained by transposing the matrix P is different from the matrix P, that is, p_i,j ≠ p_j,i, the association relationship between the to-be-processed data i and the to-be-processed data j is directional. For example, p_i,j may indicate an association relationship pointing from the to-be-processed data i to the to-be-processed data j, while p_j,i indicates an association relationship pointing from the to-be-processed data j to the to-be-processed data i; or, p_i,j may indicate an association relationship pointing from the to-be-processed data j to the to-be-processed data i, while p_j,i indicates an association relationship pointing from the to-be-processed data i to the to-be-processed data j.
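  • as an illustration only, the following sketch builds such a 0/1 association matrix for multiple frames of pictures under the a priori hypothesis that frames less than 8s apart are associated; the frame timestamps and the non-directional (symmetric) convention are assumptions of the sketch.

```python
import numpy as np

def build_association_matrix(timestamps, threshold=8.0):
    # timestamps: capture times (in seconds) of k frames (assumed available).
    # Returns a k x k matrix P where p[i, j] = 1 when the interval between
    # frame i and frame j is less than the threshold (the a priori hypothesis).
    t = np.asarray(timestamps, dtype=float)
    P = (np.abs(t[:, None] - t[None, :]) < threshold).astype(int)
    return P  # symmetric by construction, so P.T equals P: non-directional

# frames captured at 0s, 3s, 5s and 20s: the first three are mutually associated
P = build_association_matrix([0.0, 3.0, 5.0, 20.0])
```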
  • in one case, the plurality of to-be-processed data consists of the to-be-processed data 1 and a number of to-be-processed data associated with the to-be-processed data 1. For example, in the graph model, node 1 corresponds to the to-be-processed data 1, and there is an edge connected between node 1 and node 4 and an edge connected between node 1 and node 6.
  • in another case, the plurality of data to be processed includes the data to be processed 1, a number of data to be processed that are associated with the data to be processed 1, and a number of data to be processed that are not associated with the data to be processed 1. For example, there is an edge connected between node 1 and node 4 and an edge connected between node 1 and node 6, while node 5 has no edge connecting it to node 1.
  • a plurality of to-be-processed data is acquired, and based on a priori assumption, it is determined whether there is an association relationship between any two of the plurality of to-be-processed data.
  • a piece of to-be-processed data is acquired, and based on a priori hypothesis, other pieces of to-be-processed data that have an association relationship with the piece of to-be-processed data are determined.
  • the acquiring of multiple to-be-processed data includes: acquiring target data, where the target data is one of the multiple to-be-processed data; and acquiring associated data that has an association relationship with the target data satisfying the a priori hypothesis, where the plurality of data to be processed includes the associated data.
  • the device executing the method 500 first obtains the target data, and then introduces the associated data related to the target data according to a priori assumption.
  • the target data may be sentence 1.
  • the a priori assumption is that there is a correlation between multiple sentences belonging to the same paragraph, then other sentences in the paragraph where the sentence 1 is located except for the sentence 1 are introduced as the correlation data.
  • for another example, the target data can be frame 1 in a video, and the frames whose interval from frame 1 is less than 8s are used as the associated data.
  • for another example, the target data can be video 1, and the videos whose minimum interval from video 1 is less than 8s are used as the associated data.
  • for another example, the target data can be audio 1, and the audio segments whose minimum interval from audio 1 is less than 8s are regarded as the associated data.
  • the above takes a time interval of 8s as an example for obtaining the associated data; those skilled in the art can understand that this time interval can be adjusted for different scenarios.
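  • as an illustration only, the following sketch introduces the associated data for a target frame under the 8s a priori hypothesis; representing frames by their capture timestamps is an assumption of the sketch.

```python
def select_associated(target_time, candidate_times, threshold=8.0):
    # Return indices of the candidate frames whose interval from the target
    # frame is less than the threshold (the a priori hypothesis); the target
    # itself, if present among the candidates, is excluded.
    return [i for i, t in enumerate(candidate_times)
            if t != target_time and abs(t - target_time) < threshold]

# target frame at 10s; candidates at 4s, 9s and 30s -> indices 0 and 1 are associated
associated = select_associated(10.0, [4.0, 9.0, 30.0])
```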
  • the first neural network model can be trained using general data.
  • the so-called general data can be data that is not affected by the scene, or data that has low dependence on the scene.
  • for example, the first neural network model is used to identify character features in images, and its training data set can include various possible scenes, such as street scenes, conference scenes, in-car scenes, rural scenes, Asian scenes, African scenes, European and American scenes, and so on.
  • the multiple to-be-processed data may be data applied in a specific scene.
  • the first neural network model capable of processing general data can be used to process special data.
  • the process of training the first neural network model can be to input general data into the first neural network model, and the first neural network model can perform data processing operations such as feature screening and feature fusion on the general data to obtain feature vectors. Perform matrix operations on the feature vector and the weight matrix containing the weight parameters to obtain the data training result corresponding to the general data. Then, the distance between the data training result and the label of the general data is calculated, so as to modify the weight parameter of the first neural network model.
  • the distance between the data training result and the label of the general data can be understood as the degree of similarity between the data training result and the label of the general data.
  • the specific calculation method of the information distance can be cross entropy, KL divergence, JS divergence, etc.
  • in the process of collecting data, the data is usually collected in the form of video, and the collected data can be tagged so as to obtain the labeled data required for the training process.
  • the specific tagging process and the interpretation of tags are common technical content in the field of deep learning, and will not be repeated in this embodiment of the application.
  • the recognition result of general data 1 is: the confidence that general data 1 belongs to feature 1 is 0.7, and the confidence that general data 1 belongs to feature 2 is 0.3.
  • the label of general data 1 is: label 1, which corresponds to feature 1.
  • the recognition result of general data 1 can be represented by (0.7, 0.3), and the label of general data 1 can be represented by (1, 0).
  • the distance between the data training result and the label of the general data may be the distance between the vector (0.7, 0.3) and the vector (1, 0).
  • the label of the general data may be a vector with the same dimension as the intermediate calculation result. Through vector calculation, the distance between the data training result and the label of the general data can be obtained.
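  • as an illustration only, the following sketch computes one such distance, the cross entropy between the recognition result (0.7, 0.3) and the label (1, 0); the small epsilon added for numerical stability is an assumption of the sketch.

```python
import numpy as np

def cross_entropy(label, prediction, eps=1e-12):
    # Cross entropy between a one-hot label vector and a predicted distribution;
    # eps only guards against log(0) and is an assumption of the sketch.
    prediction = np.clip(prediction, eps, 1.0)
    return -np.sum(label * np.log(prediction))

# distance between the recognition result (0.7, 0.3) and the label (1, 0):
d = cross_entropy(np.array([1.0, 0.0]), np.array([0.7, 0.3]))
# d = -log(0.7), roughly 0.357; a smaller distance means the result is closer to the label
```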
  • acquire first association relationship information, where the first association relationship information is used to indicate at least one first vector group, and each first vector group includes two first vectors that satisfy the a priori hypothesis.
  • the first association relationship information reflects whether there is an association relationship between the multiple first vectors.
  • the first vector group contains two first vectors that have an association relationship. That is, there is an association relationship between the two first vectors in the first vector group that satisfies a priori hypothesis. For example, if the first vector group indicates (first vector 1, first vector 2), then there is an association relationship between the first vector 1 and the first vector 2 that satisfies the a priori assumption.
  • the first association relationship information reflects whether there is an influence between the multiple first vectors, so that the data processing result that can reflect the data association can be obtained according to the first association relationship information. It should be understood that the first vector may have an association relationship with itself.
  • the first association relationship information may be determined according to the association relationship between the multiple to-be-processed data. That is, the first association relationship information is the same or substantially the same as the fifth association relationship information above.
  • in other embodiments, the first association relationship information is different from the fifth association relationship information described above. For example, based on the similarity between any two first vectors among the multiple first vectors, it can be determined whether there is an association relationship between those two first vectors. The greater the similarity, the greater the association; the smaller the similarity, the smaller the association. The a priori hypothesis corresponding to the first association relationship information can then be that when the similarity exceeds a preset value, it can be considered that there is an association relationship between the two first vectors, and when the similarity does not exceed the preset value, it can be considered that there is no association relationship between them.
  • the first association relationship information can be reflected through the graph model.
  • node 1, node 2, and node 3 may correspond to first vector 1, first vector 2, and first vector 3, respectively.
  • there is an edge connected between node 1 and node 2, so there is an association relationship between the first vector 1 and the first vector 2; there is an edge connected between node 2 and node 3, so there is an association relationship between the first vector 2 and the first vector 3.
  • the first association relationship information includes a second association relationship matrix. The vector in the first dimension of the second association relationship matrix includes a plurality of elements corresponding one-to-one to the plurality of first vectors, and the vector in the second dimension of the second association relationship matrix includes a plurality of elements corresponding one-to-one to the plurality of first vectors. Any element in the second association relationship matrix is used to indicate whether the first vector corresponding to that element in the first dimension and the first vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • for example, suppose Q is an l×l matrix, where the i-th column corresponds to the first vector i and the j-th row corresponds to the first vector j; the element q_i,j in the i-th column and the j-th row represents whether there is an association relationship between the first vector i and the first vector j that satisfies the a priori hypothesis.
  • when there is such an association relationship, the value of q_i,j can be 1, and when there is no association relationship between the first vector i and the first vector j, the value of q_i,j can be 0; alternatively, when there is such an association relationship, the value of q_i,j can be 0, and when there is no association relationship, the value of q_i,j can be 1.
  • when the matrix Q^T obtained by transposing the matrix Q is the same as the matrix Q, that is, q_i,j = q_j,i, the association relationship between the first vector i and the first vector j may be non-directional.
  • when the matrix Q^T obtained by transposing the matrix Q is different from the matrix Q, that is, q_i,j ≠ q_j,i, the association relationship between the first vector i and the first vector j is directional. For example, q_i,j may indicate an association relationship pointing from the first vector i to the first vector j, while q_j,i indicates an association relationship pointing from the first vector j to the first vector i; or, q_i,j may indicate an association relationship pointing from the first vector j to the first vector i, while q_j,i indicates an association relationship pointing from the first vector i to the first vector j.
  • in some embodiments, the second association relationship matrix can be compressed to obtain matrices with smaller dimensions.
  • for example, suppose the second association relationship matrix Q is an l×l matrix, and the values of all elements of Q that are more than l' elements away from the diagonal of Q are all 0 (or all 1), with l' ≪ l. Then Q can be divided into several small matrices, where the maximum number of rows of each small matrix is l' and the maximum number of columns of each small matrix is l'. This process can also be referred to as the sparsification of the second association relationship matrix Q.
  • in some embodiments, the second association relationship matrix Q can also be compressed according to a spectral clustering method.
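  • as an illustration only, the following sketch divides such a banded association matrix into diagonal blocks of at most l'×l'; treating the off-band elements as all 0 and keeping only non-overlapping diagonal blocks are simplifying assumptions of the sketch (the application also mentions spectral clustering as an alternative compression method).

```python
import numpy as np

def split_banded_matrix(Q, band):
    # Q: (l, l) association matrix whose elements more than `band` positions
    # away from the diagonal are all 0; returns diagonal blocks of at most
    # band x band. Band elements that cross block boundaries are dropped in
    # this simplified sketch; overlapping blocks would be needed to keep them.
    l = Q.shape[0]
    blocks = []
    for start in range(0, l, band):
        end = min(start + band, l)
        blocks.append(Q[start:end, start:end])  # keep only the diagonal block
    return blocks
```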
  • the a priori hypothesis can indicate a forward association relationship or a reverse association relationship.
  • for example, the shorter the interval between picture frames, the more relevant the content of the pictures. Therefore, when the a priori hypothesis indicates that there is an association relationship between picture frames within 8s of each other, it can be understood that the a priori hypothesis indicates a forward association relationship; when the a priori hypothesis indicates that there is an association relationship between picture frames more than 8s apart, it can be understood that the a priori hypothesis indicates a reverse association relationship.
  • the output result of the first neural network model and the correlation relationship within the output result are input into the second neural network model.
  • Inputting multiple first vectors into the second neural network model can be understood as inputting the characteristic representations of multiple data to be processed into the second neural network model.
  • Inputting the first association relationship information into the second neural network model can be understood as inputting the information about whether there is an influence between any two of the first vectors in the second neural network model.
  • the multiple first vectors can be understood as nodes in the graph model, and the first association relationship information can be used to indicate whether there are edges between nodes. Therefore, the second neural network model may be a graph neural network model.
  • the second neural network model processes the multiple first vectors and the first association relationship information; based on the weight parameters in the second neural network model, it can determine whether any two first vectors influence each other and to what degree, so as to obtain the processing result of the first data to be processed.
  • the processing result of the first to-be-processed data may be a characteristic representation of the first to-be-processed data, or may be a recognition result of the first to-be-processed data.
  • the processing result of the first data to be processed may be a vector.
  • for example, suppose the multiple first vectors are l first vectors, denoted x_1, ..., x_l, and suppose each first vector has dimension s, so that the l first vectors can be stacked into an l×s matrix X; suppose the first association relationship information is the second association relationship matrix Q described above.
  • set h weight matrices W_1, W_2, ..., W_h to be trained, where the dimensions of W_1, W_2, ..., W_h are all s×s_h; that is, each of W_1, W_2, ..., W_h contains s×s_h weight parameters. Here s_h = s/h, where h represents the number of heads of the graph attention neural network (the number of heads can also be called the number of slices), and s_h is commonly known as the single-head dimension.
  • compute U_1 = X·W_1, U_2 = X·W_2, ..., U_h = X·W_h; the dimensions of U_1, U_2, ..., U_h are all l×s_h.
  • compute V_i,j = U_i·U_j^T, where i ≠ j, 1 ≤ i ≤ h, and 1 ≤ j ≤ h; the dimension of V_i,j is then l×l. From V_i,j, a matrix R_i,j is obtained; R_i,j is still an l×l matrix, and it can be understood as the mutual attention intensity matrix between the points.
  • R_i,j and Q are multiplied element by element to obtain E_i,j, that is, the attention matrix after the Q relation mask. E_i,j can be understood as filtering out the related points according to the edge relationships: the attention between related points is kept, while the attention of irrelevant points is not kept. This matrix contains a large amount of information about the interrelations of the nodes, so its information content is relatively rich.
  • compute E_i,j·U_i to obtain the final expression U_i_new of each point after it has been updated by the information of the other points; the dimension of U_i_new is l×s_h. The updated single-head expressions U_1_new, ..., U_h_new can be combined (for example, concatenated) into the output X' of the current layer.
  • the above process is the data processing process of one layer of the network. If the depth of the graph attention neural network model is h', that is, the model includes h' layers, then the X' output by the current layer can be input into the next layer, that is, the X' output by the current layer is taken as the X of the next layer, and the same or a similar data processing process as above is carried out.
  • X' has the same matrix size as X, but each element in X' contains information about one or more elements in X.
  • in this way, the second neural network model can obtain more information when recognizing a certain feature, improving the recognition accuracy. Matrix operations are performed on the matrix X' and a weight parameter matrix to obtain the processing result of the first data to be processed.
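  • as an illustration only, the following is a minimal runnable sketch of one such masked attention layer. It fixes several details that the description above leaves open, so they are assumptions of the sketch rather than requirements of this application: the pairwise scores are computed per head as V = U·U^T, the attention intensity matrix R is obtained from V by a row-wise softmax, and the h updated single-head expressions are concatenated to form X'.

```python
import numpy as np

def masked_attention_layer(X, Q, W_list):
    # X: (l, s) matrix of stacked first vectors; Q: (l, l) 0/1 association mask;
    # W_list: h weight matrices, each of shape (s, s_h) with s_h = s // h.
    heads = []
    for W in W_list:
        U = X @ W                                  # (l, s_h) single-head projection
        V = U @ U.T                                # (l, l) raw pairwise scores
        R = np.exp(V - V.max(axis=1, keepdims=True))
        R = R / R.sum(axis=1, keepdims=True)       # attention intensity (softmax assumed)
        E = R * Q                                  # element-wise mask by the association matrix
        heads.append(E @ U)                        # (l, s_h) points updated by related points
    return np.concatenate(heads, axis=1)           # (l, s) output X' of the current layer

# l = 4 first vectors of dimension s = 6, h = 2 heads, so s_h = 3
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))
Q = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]])
X_new = masked_attention_layer(X, Q, [rng.normal(size=(6, 3)) for _ in range(2)])
```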
  • the plurality of to-be-processed data includes first to-be-processed data
  • the first to-be-processed data may be the target data mentioned above
  • the plurality of to-be-processed data further includes one or more pieces of to-be-processed data associated with the first to-be-processed data.
  • the second neural network model may combine the impact of the associated data on the first data to be processed according to the first association relationship information, so as to obtain a processing result corresponding to the first data to be processed.
  • the second neural network model not only performs feature extraction on the first to-be-processed data, but also performs feature extraction on the other to-be-processed data associated with the first to-be-processed data, thus expanding the amount of data input in the prediction process and helping to improve the accuracy of recognition.
  • the plurality of to-be-processed data includes first to-be-processed data
  • the first to-be-processed data may correspond to a target vector
  • the plurality of first vectors may further include one or more associated vectors associated with the target vector
  • the plurality of data to be processed includes the data to be processed in a one-to-one correspondence with the one or more associated vectors.
  • the second neural network model may combine the influence of the correlation vector on the target vector according to the first correlation information, so as to obtain the processing result corresponding to the first data to be processed.
  • the second neural network model not only performs feature extraction on the target vector, but also performs feature extraction on the associated vectors that have an association relationship with the target vector, thus expanding the amount of data processed in the prediction process and helping to improve the recognition accuracy rate.
  • the second neural network model may output multiple processing results corresponding to the multiple data to be processed in a one-to-one correspondence. That is, the second neural network model synthesizes the multiple first vectors and the association relationship between each first vector, and outputs multiple processing results corresponding to the multiple data to be processed one-to-one.
  • the closeness of the association between two associated pieces of data can be the same or can be different. For example, two sentences that are farther apart in the same paragraph are less closely associated, and two sentences that are closer together in the same paragraph are more closely associated. For another example, two frames with a longer interval have a lower degree of correlation, and two frames with a shorter interval have a higher degree of correlation. In order to express the degree of closeness of the associations, there can be multiple forms of expression.
  • in some embodiments, the first association relationship information is a matrix, and the numerical values of the elements in the matrix are used to indicate the closeness of the association relationship: the larger the value, the tighter the association relationship. However, determining the specific size of each value often introduces redundant artificial settings, or increases the difficulty of training the neural network model.
  • in other embodiments, when the first vector groups indicated by the first association relationship information include both closely associated and distantly associated first vector groups, second association relationship information can be established, and the second association relationship information is used to indicate the first vector groups whose association is close. That is to say, the degree of influence between two closely associated first vectors can be enhanced through the second association relationship information.
  • in some embodiments, the first association relationship information is used to indicate N first vector groups, where N is an integer greater than 1. Before the plurality of first vectors and the first association relationship information are input into the second neural network model to obtain the processing result for the first to-be-processed data, the method further includes: obtaining second association relationship information, where the second association relationship information is used to indicate n second vector groups, the n second vector groups belong to the N first vector groups, n is less than N, and n is a positive integer.
  • the inputting of the plurality of first vectors and the first association relationship information into the second neural network model to obtain the processing result for the first to-be-processed data includes: inputting the plurality of first vectors, the first association relationship information, and the second association relationship information into the second neural network model to obtain the processing result of the first data to be processed.
  • the information indicated in the second association relationship information is included in the first association relationship information. That is to say, there must be an association relationship between the two first vectors in each second vector group that satisfies the a priori hypothesis.
  • for example, the first association relationship information can reflect the association relationships between the multiple data to be processed, and the second association relationship information can reflect whether there is a close association relationship between the multiple data to be processed.
  • for example, the first association relationship information can indicate that there are associations between different sentences in the same paragraph, and the second association relationship information can indicate that there are close associations between adjacent sentences within the same paragraph.
  • for another example, the first association relationship information can indicate that there is an association between two frames whose interval is less than 8s, and the second association relationship information can indicate that there is a close association between two frames whose interval is less than 2s.
  • for another example, the first association relationship information can indicate that there is an association between two videos whose minimum interval is less than 8s, and the second association relationship information can indicate that there is a close association between two videos whose minimum interval is less than 2s.
  • for another example, the first association relationship information can indicate that there is an association between two pieces of audio whose minimum interval is less than 8s, and the second association relationship information can indicate that there is a close association between two audio segments whose minimum interval is less than 2s.
  • for another example, the first association relationship information can reflect the similarity between the multiple first vectors, and the second association relationship information can indicate, among the multiple first vectors, the pairs of first vectors whose similarity is higher.
  • the first association relationship information may indicate that there is an association between two first vectors whose similarity exceeds the preset value 1.
  • the second association relationship information may indicate that there is an association between two first vectors whose similarity exceeds the preset value 2, and the preset value 2 is greater than the preset value 1.
  • the second association relationship information may include a matrix for representing n second vector groups.
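  • as an illustration only, the following sketch derives both kinds of association relationship information from similarities between first vectors, using two thresholds corresponding to preset value 1 and the larger preset value 2; the cosine similarity measure and the concrete threshold values are assumptions of the sketch.

```python
import numpy as np

def association_matrices(X, preset1=0.5, preset2=0.8):
    # X: (l, d) matrix of first vectors. Q1 marks pairs whose similarity
    # exceeds preset value 1 (first association relationship information);
    # Q2 marks pairs whose similarity exceeds the larger preset value 2
    # (second association relationship information, a subset of Q1).
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T                    # cosine similarity (an assumption of the sketch)
    Q1 = (sim > preset1).astype(int)
    Q2 = (sim > preset2).astype(int)
    return Q1, Q2
```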
  • first neural network model and the second neural network model may be two sub-models in one neural network model.
  • the method of training the second neural network model and obtaining the weight parameters of the second neural network model will be described in detail below with reference to FIG. 7.
  • the method 600 may be performed by the training device 120 as shown in FIG. 3.
  • the data to be trained can be understood as the data that will be input to the neural network model and used to train the neural network model. Some or all of the multiple data to be trained have labels.
  • the neural network model can process the training data to obtain the data processing result. By calculating the distance between the label and the data processing result, the weight parameter of the neural network model can be modified.
  • the distance between the data processing result and the label can be understood as the degree of similarity between the data processing result and the label.
  • the specific calculation method of the information distance can be cross entropy, KL divergence, JS divergence, etc.
  • the data to be trained can be text data, image data, video data, audio data, etc., such as a text file, a paragraph of text in a text file, a picture file, an image block in a picture file, a frame in a video file, A video file, a video in a video file, an audio file, and an audio in an audio file.
  • Multiple data to be trained can be multiple text files, multiple texts in a text file, multiple picture files, multiple image blocks in a picture file, multiple frames in a video file, multiple video files, Multiple pieces of video in one video file, multiple audio files, multiple pieces of audio in one audio file, etc. This application does not limit the type of training data.
  • the multiple data to be trained are stored in the database, so the device executing the method 600 can directly retrieve the multiple data to be trained from the database.
  • the multiple data to be trained can be obtained by using a camera shooting method.
  • the cloud device stores the plurality of data to be trained, so the device executing the method 600 can receive the plurality of data to be trained sent by the cloud device through the communication network.
  • the plurality of data to be trained may be general data.
  • the third association relationship information is used to indicate the association relationship between the data. It is assumed that the third vector group indicated by the third association relationship information includes (fourth vector 1, fourth vector 2), and there is an association relationship between the fourth vector 1 and the fourth vector 2.
  • the fourth vector 1 and the third association relationship information are input into the second neural network model to obtain the first processing result 1; in this way, at least the influence and contribution of the data to be trained 2 on the data to be trained 1 can be reflected.
  • input the multiple data to be trained into the first neural network model, use the first neural network model to perform processing operations such as feature screening (filtering out useful features) and feature fusion (combining multiple features) on the multiple data to be trained, and output a plurality of fourth vectors corresponding one-to-one to the multiple data to be trained.
  • the multiple data to be trained can be input from the input layer and pass through hidden layers such as the convolutional layer and/or the pooling layer, where data processing is performed, and a plurality of fourth vectors corresponding to the plurality of data to be trained are output from the output layer of the first neural network model.
  • the fourth vector can be a number or a vector containing multiple numbers.
  • the first neural network model is a neural network model to be trained.
  • the first neural network model can perform data processing operations such as feature screening and feature fusion on the multiple data to be trained to obtain feature vectors. Perform a matrix operation on the feature vector and the weight matrix containing the weight parameter to obtain a plurality of fourth vectors corresponding to the plurality of data to be trained.
  • the multiple fourth vectors are used to modify the weight parameters of the first neural network model. For example, the distance between the fourth vector and the tags of the multiple data to be trained can be calculated, and the loss function can be combined to modify the weight parameter of the first neural network model. Weight parameter.
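  • as an illustration only, the following sketch shows one way such a correction of the weight parameters can look for a single trainable weight matrix with a softmax output and a cross-entropy loss; the choice of loss, the learning rate, and the plain gradient-descent update are assumptions of the sketch.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train_step(F, Y, W, lr=0.1):
    # F: (n, d) fourth vectors; Y: (n, c) one-hot labels; W: (d, c) weight matrix.
    # One gradient-descent correction of W under a cross-entropy loss
    # (the loss and the update rule are assumptions of the sketch).
    P = softmax(F @ W)                        # data training results
    loss = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))
    grad = F.T @ (P - Y) / F.shape[0]         # gradient of the loss w.r.t. W
    return W - lr * grad, loss                # corrected weight parameters
```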
  • the first neural network model is a trained neural network model.
  • the first neural network model can be trained using general data.
  • the so-called general data can be data that is not affected by the scene, or data that has low dependence on the scene.
  • for example, the first neural network model is used to identify character features in images, and its training data set can include various possible scenes, such as street scenes, conference scenes, in-car scenes, rural scenes, Asian scenes, African scenes, European and American scenes, and so on.
  • the plurality of data to be trained may be data applied in a specific scene. That is to say, the first neural network model that can handle general data is migrated to a special scene, and the second neural network model that can handle the special scene is obtained through the method of neural network model training.
  • the process of training the first neural network model can be to input general data into the first neural network model, and the first neural network model can perform data processing operations such as feature screening and feature fusion on the general data to obtain feature vectors. Perform matrix operations on the feature vector and the weight matrix containing the weight parameters to obtain the data training result corresponding to the general data. Then, the distance between the data training result and the label of the general data is calculated, and the weight parameter of the first neural network model is corrected.
  • the distance between the data training result and the label of the general data can be understood as the degree of similarity between the data training result and the label of the general data.
  • the specific calculation method of the information distance can be cross entropy, KL divergence, JS divergence, etc.
  • the recognition result of general data 1 is: the confidence that general data 1 belongs to feature 1 is 0.7, and the confidence that general data 1 belongs to feature 2 is 0.3.
  • the label of general data 1 is: label 1, which corresponds to feature 1.
  • the recognition result of general data 1 can be represented by (0.7, 0.3), and the label of general data 1 can be represented by (1, 0).
  • the distance between the data training result and the label of the general data may be the distance between the vector (0.7, 0.3) and the vector (1, 0).
  • the label of the general data may be a vector with the same dimension as the intermediate calculation result. Through vector calculation, the distance between the data training result and the label of the general data can be obtained.
  • the type of the first neural network model may be a convolutional neural network model, a graph neural network model, a graph convolutional neural network model, a graph attention neural network model, and so on. This application does not limit the type of the first neural network model.
  • the first neural network model may be a traditional convolutional neural network model.
  • the output layer of the traditional convolutional neural network is a fully connected layer, which is sometimes called a classifier.
  • the traditional convolutional neural network model can input the recognition result of the data to be trained into the loss function through the fully connected layer. For example, if the data to be trained is an image, the fully connected layer of the traditional convolutional neural network model can directly output recognition results such as whether there is a person in the image and whether the person is male or female.
  • the recognition result can often only represent the probability that the data to be trained belongs to a certain feature.
  • the first neural network model may also be a special convolutional neural network model that does not include a fully connected layer, and the calculation result of the convolutional layer or the pooling layer may be input to the loss function.
  • the first neural network model can input the processing result that belongs to the intermediate calculation result in the traditional convolutional neural network model into the loss function.
  • the processing result of the input loss function of this special convolutional neural network model is called the intermediate calculation result.
  • the intermediate calculation result can be used to characterize part or all of the information of the data to be trained. In other words, the intermediate calculation result usually contains more information content than the recognition result.
  • the first neural network model may be a graph neural network model.
  • the using the first neural network model to process the plurality of data to be trained includes: using the first neural network model to process the plurality of data to be trained and the sixth association relationship information, where the sixth association relationship information is used to indicate at least one data group to be trained, and each data group to be trained includes two data to be trained that satisfy the a priori hypothesis.
  • the to-be-trained data group contains two pieces of to-be-trained data that have an association relationship. That is, there is an association relationship between the two to-be-trained data in the to-be-trained data group that satisfies a priori hypothesis. For example, if the data group to be trained is (data to be trained 1, data to be trained 2), then there is an association relationship between the data to be trained 1 and the data to be trained 2 that satisfies the prior hypothesis.
  • the plurality of data to be trained and the sixth association relationship information reflecting the association relationships between the plurality of data to be trained are input into the first neural network model; the first neural network model can determine, according to the sixth association relationship information, whether one piece of data influences another, and the weight parameters in the first neural network model reflect the degree of influence between the data, so as to obtain a plurality of fourth vectors that can reflect the relevance of the data and that correspond one-to-one to the multiple data to be trained.
  • the multiple pieces of data to be trained may be multiple paragraphs of text, and one piece of text may include multiple sentences.
  • different paragraphs of text express different topics. Therefore, multiple sentences in a paragraph are more related, and multiple sentences belonging to different paragraphs are weak or non-relevant. Then there can be a priori hypothesis, such as a correlation between multiple sentences belonging to the same paragraph.
  • the multiple data to be trained may be multiple frames of pictures. Normally, as time passes, the longer the interval between two frames, the smaller the correlation between the two frames; the shorter the interval between the two frames, the greater the correlation between the two frames. There can then be an a priori assumption, such as an association between two frames whose interval is less than a preset threshold.
  • the preset threshold may be 8s, for example.
  • the multiple pieces of to-be-trained data may be multiple pieces of video, where, as time passes, the longer the interval between two pieces of video, the smaller the correlation between them; the shorter the interval, the greater the correlation. There may then be an a priori assumption, such as an association between two videos whose minimum interval length is less than a preset threshold.
  • the preset threshold may be 8s, for example.
  • the multiple pieces of to-be-trained data may be multiple pieces of audio, where, as time passes, the longer the interval between two pieces of audio, the smaller the correlation between them; the shorter the interval, the greater the correlation. There may then be an a priori assumption, such as an association between two audio segments whose minimum interval duration is less than a preset threshold.
  • the preset threshold may be 8s, for example.
  • the sixth association relationship information may be a matrix. Compared with other information types, matrix operations are more convenient.
  • the sixth association relationship information includes a third association relationship matrix. The vector in the first dimension of the third association relationship matrix includes a plurality of elements corresponding one-to-one to the plurality of data to be trained, and the vector in the second dimension of the third association relationship matrix includes a plurality of elements corresponding one-to-one to the plurality of data to be trained. Any element in the third association relationship matrix is used to indicate whether the data corresponding to that element in the first dimension and the data corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • for example, A is a k×k matrix, the i-th column corresponds to the first vector i, the j-th row corresponds to the first vector j, and the element a_{i,j} in the i-th column and j-th row represents whether there is an association relationship between the first vector i and the first vector j that satisfies the a priori hypothesis.
  • when there is an association relationship between the first vector i and the first vector j, the value of the element a_{i,j} in the i-th column and j-th row can be 1, and when there is no association relationship, the value of a_{i,j} can be 0;
  • alternatively, when there is an association relationship, the value of a_{i,j} can be 0, and when there is no association relationship, the value of a_{i,j} can be 1. A small sketch of filling such a matrix under the first convention follows below.
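  • The following minimal sketch fills a k×k association relationship matrix under the first convention (1 = associated, 0 = not associated); the pair list is assumed to come from an a priori hypothesis as above, and the self-loop option reflects that a vector may be associated with itself.

```python
import numpy as np

def build_association_matrix(k, pairs, self_loops=True):
    """Binary k x k matrix: A[i, j] = 1 iff (i, j) satisfies the hypothesis."""
    A = np.zeros((k, k), dtype=np.float32)
    for i, j in pairs:
        A[i, j] = 1.0
    if self_loops:  # a vector may have an association relationship with itself
        A[np.arange(k), np.arange(k)] = 1.0
    return A

A = build_association_matrix(5, [(0, 1), (1, 0), (3, 4), (4, 3)])
assert (A == A.T).all()  # symmetric here, i.e. a non-directional relation
```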
  • when the matrix A^T obtained by transposing the matrix A is the same as the matrix A, then a_{i,j} = a_{j,i}, and the association relationship between the first vector i and the first vector j may be non-directional.
  • when the matrix A^T obtained by transposing the matrix A is different from the matrix A, then a_{i,j} ≠ a_{j,i}, and the association relationship between the first vector i and the first vector j is directional.
  • for example, a_{i,j} may indicate that there is an association relationship pointing from the first vector i to the first vector j, and a_{j,i} that there is an association relationship pointing from the first vector j to the first vector i;
  • alternatively, a_{i,j} may indicate that there is an association relationship pointing from the first vector j to the first vector i, and a_{j,i} that there is an association relationship pointing from the first vector i to the first vector j.
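  • Under a directional convention the matrix is simply no longer symmetric; a brief illustration (the indices are hypothetical):

```python
import numpy as np

A = np.zeros((3, 3), dtype=np.float32)
A[0, 1] = 1.0  # e.g. an association pointing from first vector 1 to first vector 0
assert not (A == A.T).all()  # A differs from its transpose: directional
```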
  • the third association relationship information reflects whether there is an association relationship between the multiple fourth vectors.
  • each third vector group contains two fourth vectors that have an association relationship; that is, the two fourth vectors in the third vector group have an association relationship that satisfies the a priori assumption. For example, if the third vector group is (fourth vector 1, fourth vector 2), then there is an association relationship between fourth vector 1 and fourth vector 2 that satisfies the a priori assumption.
  • the third association relationship information reflects whether the multiple fourth vectors influence one another, so that a data processing result reflecting the data associations can be obtained according to the third association relationship information. It should be understood that a fourth vector may have an association relationship with itself.
  • the third association relationship information may be determined according to the association relationships between the multiple to-be-trained data; that is, the third association relationship information is the same or substantially the same as the sixth association relationship information above.
  • alternatively, the third association relationship information may differ from the sixth association relationship information above. For example, whether any two of the multiple fourth vectors have an association relationship can be determined according to the similarity between them: the greater the similarity, the stronger the association; the smaller the similarity, the weaker the association. The a priori hypothesis corresponding to the third association relationship information can then be that when the similarity exceeds a preset value, the two fourth vectors are considered to have an association relationship, and when it does not, they are considered unassociated. A sketch of this similarity-based construction follows below.
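  • A minimal sketch of the similarity-based construction; cosine similarity and the 0.5 preset value are assumptions for illustration, since the embodiments do not fix a particular similarity measure.

```python
import numpy as np

def similarity_adjacency(X, preset_value=0.5):
    """X: (l, s) matrix whose rows are the fourth vectors.
    Edge iff cosine similarity exceeds the preset value."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T  # pairwise cosine similarities; diagonal is 1 (self-loops)
    return (S > preset_value).astype(np.float32)
```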
  • node 1, node 2, and node 3 may correspond to fourth vector 1, fourth vector 2, and fourth vector 3, respectively.
  • the third association relationship information includes a fourth association relationship matrix. The vector located in the first dimension of the fourth association relationship matrix includes a plurality of elements corresponding one-to-one to the plurality of fourth vectors, and the vector located in the second dimension includes a plurality of elements corresponding one-to-one to the plurality of fourth vectors. Any element in the fourth association relationship matrix is used to indicate whether the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • for example, B is an l×l matrix, the i-th column corresponds to the fourth vector i, the j-th row corresponds to the fourth vector j, and the element b_{i,j} in the i-th column and j-th row represents whether there is an association relationship between the fourth vector i and the fourth vector j that satisfies the a priori hypothesis.
  • when there is an association relationship between the fourth vector i and the fourth vector j, the value of the element b_{i,j} in the i-th column and j-th row can be 1, and when there is no association relationship, the value of b_{i,j} can be 0;
  • alternatively, when there is an association relationship, the value of b_{i,j} can be 0, and when there is no association relationship, the value of b_{i,j} can be 1.
  • when the matrix B^T obtained by transposing the matrix B is the same as the matrix B, then b_{i,j} = b_{j,i}, and the association relationship between the fourth vector i and the fourth vector j may be non-directional.
  • when the matrix B^T obtained by transposing the matrix B is different from the matrix B, then b_{i,j} ≠ b_{j,i}, and the association relationship between the fourth vector i and the fourth vector j is directional.
  • for example, b_{i,j} may indicate that there is an association relationship pointing from the fourth vector i to the fourth vector j, and b_{j,i} that there is an association relationship pointing from the fourth vector j to the fourth vector i;
  • alternatively, b_{i,j} may indicate that there is an association relationship pointing from the fourth vector j to the fourth vector i, and b_{j,i} that there is an association relationship pointing from the fourth vector i to the fourth vector j.
  • the fourth correlation matrix can be compressed to obtain a matrix with a smaller dimension.
  • for example, if the fourth association relationship matrix B is an l×l matrix, and the values of all elements of B that are more than l' positions away from the diagonal are all 0 (or all 1), where l' < l, then B can be divided into several small matrices whose maximum number of rows is l' and whose maximum number of columns is l'.
  • this process can also be referred to as sparsification of the fourth association relationship matrix B; a rough sketch follows below.
  • alternatively, the fourth association relationship matrix B can be compressed according to a spectral clustering method.
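  • A rough sketch of the band-limited case described above: when the nonzero entries lie within l' of the diagonal, the matrix can be processed in blocks of at most l' rows and columns. Taking only the diagonal blocks, as here, is one simple choice; a more careful scheme would use overlapping blocks to keep entries that straddle a block boundary.

```python
import numpy as np

def split_banded(B, l_prime):
    """Split an l x l matrix whose nonzeros lie within l' of the diagonal
    into dense diagonal blocks of at most l' rows and l' columns."""
    l = B.shape[0]
    blocks = []
    for start in range(0, l, l_prime):
        end = min(start + l_prime, l)
        blocks.append(B[start:end, start:end].copy())
    return blocks
```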
  • the a priori hypothesis can indicate a forward association relationship or a reverse association relationship.
  • for example, the shorter the interval between picture frames, the more relevant their content. Therefore, when the a priori hypothesis indicates that there is an association relationship between picture frames within 8s of each other, the a priori hypothesis can be understood as indicating a forward association relationship; when the a priori hypothesis indicates that there is an association relationship between picture frames more than 8s apart, it can be understood as indicating a reverse association relationship.
  • the output result of the first neural network model and the correlation relationship within the output result are input into the second neural network model.
  • Inputting multiple fourth vectors into the second neural network model can be understood as inputting multiple feature representations of the data to be trained into the second neural network model.
  • Inputting the third association relationship information into the second neural network model can be understood as inputting information about whether there is an influence between any two fourth vectors among the plurality of fourth vectors into the second neural network model.
  • Multiple fourth vectors can be understood as nodes in the graph model, and the third association relationship information can be used to indicate whether there are edges between nodes. Therefore, the second neural network model may be a graph neural network model.
  • the second neural network model processes the multiple fourth vectors and the third association relationship information; based on the weight parameters in the second neural network model, it can determine whether any two fourth vectors influence each other and to what specific degree, so as to obtain the processing result of the first to-be-trained data.
  • the processing result of the first to-be-trained data may be a feature representation of the first to-be-trained data, or may be a recognition result of the first to-be-trained data.
  • the processing result of the first data to be trained may be a vector.
  • for example, the multiple fourth vectors are l fourth vectors, denoted y_1, ..., y_l respectively; stacked row by row they form a matrix Y of dimension l×s.
  • the third association relationship information is the fourth association relationship matrix mentioned above, denoted Q here.
  • the weight matrices W_1, W_2, ..., W_h each have dimension s×s_h, i.e., each of W_1, W_2, ..., W_h contains s·s_h weight parameters, where s_h = s/h, h represents the number of heads of the graph attention neural network (the number of heads can also be called the number of slices), and s_h is commonly known as the single-head dimension.
  • U_1 = Y·W_1, U_2 = Y·W_2, ..., U_h = Y·W_h; the dimensions of U_1, U_2, ..., U_h are then all l×s_h.
  • V_{i,j} = U_i·U_j^T, where i ≠ j, 1 ≤ i ≤ h, and 1 ≤ j ≤ h; the dimension of V_{i,j} is then l×l.
  • the matrix R_{i,j} obtained from V_{i,j} is still an l×l matrix; it can be understood as the mutual attention intensity matrix between the points.
  • R_{i,j} and Q are multiplied element by element to obtain E_{i,j}, that is, the attention matrix after the Q relation mask is applied.
  • E_{i,j} can be understood as filtering the points according to the edge relationships: the attention between associated points is kept, while the attention between unassociated points is not kept.
  • this matrix contains a large amount of information about the interrelations of the nodes, so its information content is relatively rich.
  • E_{i,j}·U_i gives the final expression U_i^new of each point after it has been updated with the information of the other points; the dimension of U_i^new is l×s_h.
  • the above process is the data processing of a one-layer network. If the depth of the graph attention neural network model is h', i.e., it includes h' layers, the Y' output by the current layer can be input to the next layer; that is, the Y' output by the current layer is regarded as the Y of the next layer, and the data processing is the same as or similar to the above. A runnable sketch of the one-layer computation follows below.
  • Y' has the same matrix size as Y, but each element of Y' contains information about one or more elements of Y.
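  • The following is a minimal NumPy sketch of the one-layer computation described above. Two points are assumptions where the text is not explicit: a row-softmax is used to turn the attention logits V into the intensity matrix R, and attention is computed within each head (V = U_i·U_i^T) with the heads concatenated at the end; the V_{i,j} = U_i·U_j^T form with i ≠ j would pair different heads, and how those pairs recombine is not specified above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(Y, Q, W):
    """One masked multi-head attention layer.
    Y: (l, s) stacked fourth vectors; Q: (l, l) 0/1 relation mask;
    W: (h, s, s_h) per-head weight matrices with s_h = s / h."""
    h = W.shape[0]
    heads = []
    for i in range(h):
        U = Y @ W[i]              # (l, s_h)
        V = U @ U.T               # attention logits between points, (l, l)
        R = softmax(V, axis=-1)   # mutual attention intensity matrix
        E = R * Q                 # Q relation mask, applied elementwise
        heads.append(E @ U)       # U_i_new: each point updated by the others
    return np.concatenate(heads, axis=-1)  # Y': (l, s), same size as Y

# Stacking h' such layers, with each layer's Y' fed in as the next layer's Y,
# matches the multi-layer description above.
```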
  • the second neural network model can obtain more information when recognizing a certain feature, and improve the recognition accuracy.
  • the plurality of to-be-trained data includes the first to-be-trained data, and further includes one or more associated data associated with the first to-be-trained data; the second neural network model can, based on the third association relationship information, take into account the influence of the associated data on the first to-be-trained data, so as to obtain the processing result corresponding to the first to-be-trained data.
  • in other words, in addition to performing feature extraction on the first to-be-trained data, the second neural network model also extracts features from the other to-be-trained data that have an association relationship with the first to-be-trained data, which expands the amount of data input in the prediction process and helps improve recognition accuracy.
  • the plurality of to-be-trained data includes the first to-be-trained data, which may correspond to a target vector; the plurality of fourth vectors further includes one or more associated vectors associated with the target vector, and the plurality of to-be-trained data includes the to-be-trained data corresponding one-to-one to the one or more associated vectors.
  • the second neural network model can, according to the third association relationship information, take into account the influence of the associated vectors on the target vector, so as to obtain the processing result corresponding to the first to-be-trained data. In other words, the second neural network model not only performs feature extraction on the target vector, but also performs feature extraction on the associated vectors that have an association relationship with the target vector, which expands the amount of data processed in the prediction process and helps improve the recognition accuracy rate.
  • the second neural network model may output multiple processing results corresponding to the multiple data to be trained in a one-to-one correspondence. That is, the second neural network model synthesizes multiple fourth vectors and the association relationship between each fourth vector, and outputs multiple processing results corresponding to the multiple data to be trained in a one-to-one correspondence.
  • the closeness of two association relationships can be the same or different. For example, two sentences that are far apart in the same paragraph are less closely associated, while two sentences that are close together in the same paragraph are more closely associated. For another example, two frames with a longer interval have a lower degree of correlation, and two frames with a shorter interval have a higher degree of correlation. There can be multiple ways of expressing the degree of closeness of an association relationship.
  • for example, the third association relationship information is a matrix, and the numerical values of the elements in the matrix are used to indicate the closeness of the association relationships: the larger the value, the closer the association relationship. However, determining the specific size of each value often introduces redundant manual settings, or increases the difficulty of training the neural network model.
  • for another example, when the fourth vector groups in the third association relationship information are of two kinds, those with a close association relationship and those with a distant association relationship, fourth association relationship information can be established, and the fourth association relationship information is used to indicate the closely associated fourth vector groups.
  • in other words, the degree of influence between two closely associated fourth vectors can be strengthened through the fourth association relationship information.
  • the third association relationship information is used to indicate M fourth vector groups, where M is an integer greater than 1. Before the plurality of fourth vectors and the third association relationship information are input into the second neural network model to obtain the first processing result for the first to-be-trained data, the method further includes: obtaining fourth association relationship information, where the fourth association relationship information is used to indicate m fifth vector groups, the m fifth vector groups belong to the M fourth vector groups, m is less than M, and m is a positive integer. Inputting the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first to-be-trained data then includes: inputting the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
  • the information indicated by the fourth association relationship information is included in the third association relationship information; in other words, the two fourth vectors in each indicated vector group necessarily have an association relationship that satisfies the a priori hypothesis.
  • the third association relationship information can reflect the association relationships between the multiple to-be-trained data, and the fourth association relationship information can reflect whether there is a close association relationship between the multiple to-be-trained data.
  • for example, the third association relationship information can indicate that there are association relationships between different sentences in the same paragraph, and the fourth association relationship information can indicate that there are close association relationships between adjacent sentences within the same paragraph.
  • for another example, the third association relationship information can indicate that there is an association relationship between two frames whose interval is less than 8s, and the fourth association relationship information can indicate that there is a close association relationship between two frames whose interval is less than 2s.
  • for another example, the third association relationship information can indicate that there is an association relationship between two videos whose minimum interval is less than 8s, and the fourth association relationship information can indicate that there is a close association relationship between two videos whose minimum interval is less than 2s.
  • for another example, the third association relationship information can indicate that there is an association relationship between two pieces of audio whose minimum interval is less than 8s, and the fourth association relationship information can indicate that there is a close association relationship between two audio segments whose minimum interval is less than 2s.
  • for another example, the third association relationship information can reflect the similarity between the multiple fourth vectors, and the fourth association relationship information can reflect the pairs of fourth vectors with higher similarity among them.
  • the third association relationship information may indicate that there is an association relationship between two fourth vectors whose similarity exceeds preset value 1,
  • and the fourth association relationship information may indicate that there is a close association relationship between two fourth vectors whose similarity exceeds preset value 2, where preset value 2 is greater than preset value 1.
  • the fourth association relationship information may include a matrix for representing the m vector groups; a sketch of applying such a second mask follows below.
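  • One way (an assumption for illustration, not fixed by the embodiments) to let a second, close-association mask strengthen the attention between closely associated vectors is to add a bonus factor wherever the fourth association relationship matrix is set:

```python
# R:  (l, l) mutual attention intensity matrix.
# Q:  (l, l) mask from the third association relationship information.
# Q2: (l, l) mask marking only the closely associated pairs (Q2 <= Q).
# The bonus factor 2.0 is an illustrative choice.
def masked_attention_with_close_pairs(R, Q, Q2, bonus=2.0):
    E = R * Q                            # keep attention between associated points
    E = E * (1.0 + (bonus - 1.0) * Q2)   # strengthen closely associated pairs
    return E
```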
  • the weight parameter of the second neural network model can be corrected through the loss function.
  • for example, the weight parameters of the second neural network model can be modified by using a loss function according to the distance between the label of the first to-be-trained data and the first processing result. When the label of the first to-be-trained data and the first processing result are close to each other (that is, the degree of similarity is high), the weight parameters are relatively appropriate, and the correction amplitude of the weight parameters is small; when the label and the first processing result are far apart (that is, the similarity is low), the weight parameters are not appropriate, and the correction amplitude of the weight parameters can be increased.
  • the plurality of fourth vectors and the third association relationship information are input into the second neural network model to obtain the first processing result for the first to-be-trained data and the second processing result for the second to-be-trained data, where the first to-be-trained data and the second to-be-trained data are any two of the plurality of to-be-trained data, and the similarity between the first processing result and the second processing result is used to modify the weight parameters of the second neural network model.
  • the similarity between the first processing result and the second processing result is similarity 1, the fourth vector corresponding to the first processing result is fourth vector 1, the fourth vector corresponding to the second processing result is fourth vector 2, and the similarity between fourth vector 1 and fourth vector 2 is similarity 2.
  • when the difference between similarity 1 and similarity 2 is small, the weight parameters are relatively appropriate, and the correction amplitude of the weight parameters is small; when the difference between similarity 1 and similarity 2 is large, the weight parameters are not appropriate, and the correction amplitude of the weight parameters can be increased.
  • the obtaining of a first processing result for the first to-be-trained data includes: obtaining the first processing result and a second processing result for the second to-be-trained data, where the label of the first to-be-trained data is the first label, the label of the second to-be-trained data is the second label, and the first to-be-trained data and the second to-be-trained data are any two of the plurality of to-be-trained data;
  • the method further includes: matching the similarity between the first label and the second label with the similarity between the first processing result and the second processing result to obtain a matching result, the The matching result is used to modify the weight parameter of the second neural network model.
  • the sixth association relationship information mentioned above may not include the similarity information between the first label and the second label; that is, the association relationship between the first to-be-processed data and the second to-be-processed data may be independent of the similarity between the first label and the second label.
  • the sixth association relationship information mentioned above can associate multiple pieces of data that may have associations, increasing the amount of data processed by the second neural network model, while the similarity between the first label and the second label is used to evaluate whether the first processing result and the second processing result are accurate.
  • the similarity between the first processing result and the second processing result should be low.
  • if the first processing result is environmental governance and the second processing result is energy supply, but the similarity between the first processing result and the second processing result is relatively high, this indicates that the weight parameters of the second neural network model are inappropriate, and the loss function can be used to modify the weight parameters of the second neural network model.
  • the similarity between the first processing result and the second processing result should be high.
  • the similarity between the first processing result and the second processing result is low, indicating that the weight parameters of the second neural network model may not be appropriate.
  • the similarity between the first processing result and the second processing result should be low.
  • if the first processing result is project investigation and the second processing result is road traffic, then the similarity between the first processing result and the second processing result is low, indicating that the weight parameters of the second neural network model may be appropriate; the correction amplitude applied by the loss function to the weight parameters of the second neural network model is then small.
  • if the second label is also an insect sound, this means that the similarity between the first processing result and the second processing result should be high.
  • if the first processing result is mosquitoes and the second processing result is flies, and the similarity between the first processing result and the second processing result is high, this indicates that the weight parameters of the second neural network model may be appropriate, so the correction amplitude applied by the loss function to the weight parameters of the second neural network model is small.
  • y_i' represents the processing result i for the to-be-trained data i, y_j' represents the processing result j for the to-be-trained data j, z_i represents the label i of the to-be-trained data i, and z_j represents the label j of the to-be-trained data j.
  • the function C(y_i', y_j') represents the similarity between processing result i and processing result j, and the function C(z_i, z_j) represents the similarity between label i and label j.
  • the matrix D may be a matrix for amplifying the similarity between processing result i and processing result j; a sketch of a loss built from these quantities follows below.
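  • Given the notation above, a loss that matches result similarity to label similarity might look like the following sketch; the squared-difference form, the use of cosine similarity for C, and applying D to both result vectors are assumptions for illustration.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity_matching_loss(y_i, y_j, z_i, z_j, D):
    """Penalize mismatch between C(.) on the (D-amplified) processing
    results and C(.) on the corresponding labels."""
    c_results = cosine(D @ y_i, D @ y_j)  # D amplifies the result similarity
    c_labels = cosine(z_i, z_j)
    return (c_results - c_labels) ** 2
```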
  • the label of the data i to be trained can be represented by (1, 0, 0).
  • the label of the data i to be trained can be represented by (0, 1, 0).
  • the label of the data i to be trained can be represented by (1, 0, 1).
  • the label of the data i to be trained can be represented by (1, 1, 1).
  • the plurality of to-be-trained data includes one or more target type data, and each target type data has a label for modifying the weight parameter.
  • the plurality of data to be trained includes the first type of data and the second type of data.
  • the to-be-trained data belonging to the first type of data have labels,
  • while the to-be-trained data belonging to the second type of data do not have labels. Therefore, the weight parameters of the second neural network model can be corrected according to the distance between the processing results of the first type of data and the labels of the first type of data.
  • the distance between the processing result of the first type of data and the label of the first type of data can be understood as the degree of similarity between the processing result of the first type of data and the label of the first type of data.
  • the specific calculation method of the information distance can be cross entropy, KL divergence, JS divergence, etc.; minimal versions of these are sketched below.
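  • For reference, minimal NumPy versions of the three distances named above; the inputs are assumed to be probability vectors of equal length.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    return float(-(p * np.log(q + eps)).sum())

def kl_divergence(p, q, eps=1e-12):
    return float((p * np.log((p + eps) / (q + eps))).sum())

def js_divergence(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```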
  • although the second type of data does not have labels, because there may be association relationships between the first type of data and the second type of data, the second type of data can be introduced into the process of obtaining the processing results of the first type of data.
  • the second neural network model may be a semi-supervised model, that is, the plurality of data to be trained may include data without labels.
  • the proportion of the first type of data among the multiple to-be-trained data is generally not less than 5%-10%; a sketch of such a semi-supervised loss follows below.
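  • A brief sketch of the semi-supervised arrangement: all data flow through the model, but only the labeled (first type) subset contributes to the loss. The cross-entropy form and all names here are illustrative assumptions.

```python
import numpy as np

def semi_supervised_loss(outputs, labels, labeled_mask, eps=1e-12):
    """outputs: (n, c) predicted distributions; labels: (n, c) one-hot rows
    (arbitrary where unlabeled); labeled_mask: (n,) True where a label exists."""
    ce = -(labels * np.log(outputs + eps)).sum(axis=1)  # per-sample loss
    return float(ce[labeled_mask].mean())  # unlabeled samples contribute nothing
```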
  • the first processing result is also used to modify the weight parameter of the first neural network model.
  • the first processing result can be used to modify the weight parameter of the second neural network model as well as the weight parameter of the first neural network model.
  • the first processing result and the label of the first data to be trained may be input to the loss function of the first neural network model to modify the weight parameter of the first neural network model.
  • before the plurality of to-be-trained data are input into the first neural network model, the first neural network model may be a neural network model that is not restricted, or is only slightly restricted, by the scene.
  • the plurality of to-be-trained data may be data of a certain specific scene; therefore, the weight parameters of the first neural network model may be modified according to the first processing result, so that the first neural network model can adapt to that special scene.
  • the first neural network model and the second neural network model may be two sub-models in one neural network model.
  • the following specific examples introduce the effects that the first neural network model and the second neural network model can achieve in training and prediction.
  • the first neural network model may be a multiple granularity network (multiple granularity network, MGN) model.
  • the multi-granularity network model is a convolutional neural network model.
  • Each fourth vector may include 1024 elements, and each fourth vector is a feature representation of a picture.
  • the a priori hypothesis can be one or more of the following, for example:
  • the third association relationship information used to indicate the association relationship between 90,000 fourth vectors can be determined.
  • the 90,000 fourth vectors and the third association relationship information are input into the second neural network model to obtain the processing result for the first type of data.
  • the processing result of the first type of data takes into account the content of the second type of data.
  • Matching the processing result of the first type of data with the label of the first type of data can modify the parameters of the second neural network model.
  • the data in the verification data set are input into the first neural network model to obtain multiple fourth vectors for the verification data set; the multiple fourth vectors for the verification data set are then input into the second neural network model, and, according to the a priori assumptions, the association relationships between the multiple fourth vectors for the verification data set are input to the second neural network model to obtain the data processing result for the verification data set.
  • the data processing result is matched with the label of the verification data set to obtain the recognition ability of the first neural network model and the second neural network model.
  • the mean average precision (mAP) is used to score the trained neural network model. Compared with the traditional neural network model, the scoring result can be improved by 4-20 points.
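  • As a reference point for the mAP scoring mentioned above, one common way to compute it is to average scikit-learn's per-query average precision (a standard tool chosen here for illustration, not one specified by the embodiments):

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(relevance, scores):
    """relevance: (q, n) binary ground truth per query;
    scores: (q, n) model similarity scores per query."""
    aps = [average_precision_score(r, s) for r, s in zip(relevance, scores)]
    return float(np.mean(aps))
```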
  • the method for training a neural network model provided in this application can enhance the neural network model.
  • 8,000 Chinese text questions are input into the first neural network model as multiple data to be trained, and each Chinese text question can be one data to be trained.
  • the remaining 7,000 Chinese text questions can be used as verification data to verify whether the weight parameters of the second neural network model are appropriate.
  • the 8,000 Chinese text questions constitute a training data set
  • the 7,000 Chinese text questions constitute a verification data set.
  • the first neural network model is used to process the training data set, and 8,000 fourth vectors corresponding to the training data set are obtained.
  • the first neural network model may be a bidirectional encoder representations from transformers (BERT) model.
  • the BERT model can be a convolutional neural network model.
  • Each fourth vector may include 768 elements, and each fourth vector is a feature representation of a Chinese text question.
  • the a priori hypothesis can be one or more of the following, for example:
  • the third association relationship information used to indicate the association relationships between the 8,000 fourth vectors can be determined.
  • the 8,000 fourth vectors and the third association relationship information are input into the second neural network model to obtain the processing result for the first type of data.
  • the processing result of the first type of data takes into account the content of the second type of data.
  • Matching the processing result of the first type of data with the label of the first type of data can modify the parameters of the second neural network model.
  • the data processing result is matched with the label of the verification data set to obtain the recognition ability of the first neural network model and the second neural network model.
  • FIG. 8 is a schematic diagram of the hardware structure of a data processing device provided by an embodiment of the present application.
  • the data processing device 700 shown in FIG. 8 (the device 700 may specifically be a computer device) includes a memory 701, a processor 702, a communication interface 703, and a bus 704.
  • the memory 701, the processor 702, and the communication interface 703 realize the communication connection between each other through the bus 704.
  • the memory 701 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 701 may store a program.
  • the processor 702 is configured to execute each step of the data processing method shown in FIG. 6 in the embodiment of the present application.
  • the processor 702 is further configured to execute each step of the method for training a neural network model shown in FIG. 7 in the embodiment of the present application.
  • the processor 702 may adopt a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to implement the data processing method shown in FIG. 6 in the embodiment of the present application.
  • the processor 702 may also adopt a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to implement the method for training a neural network model shown in FIG. 7 in the embodiment of the present application.
  • the processor 702 may also be an integrated circuit chip with signal processing capability.
  • each step of the data processing method shown in FIG. 6 in the embodiment of the present application may be completed by an integrated logic circuit of hardware in the processor 702 or instructions in the form of software.
  • each step of the method for training a neural network model shown in FIG. 7 in the embodiment of the present application may be completed by an integrated logic circuit of hardware in the processor 702 or instructions in the form of software.
  • the aforementioned processor 702 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a storage medium mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 701; the processor 702 reads the information in the memory 701 and, in combination with its hardware, completes the functions required by the units included in the data processing device of the embodiment of the present application, performs the data processing method shown in FIG. 6 in the embodiment of the present application, and is also used to execute the method for training a neural network model shown in FIG. 7 in the embodiment of the present application.
  • the communication interface 703 uses a transceiver device such as but not limited to a transceiver to implement communication between the device 700 and other devices or communication networks.
  • the information of the neural network to be constructed and the data to be processed can be obtained through the communication interface 703.
  • the information of the neural network to be constructed and the data to be trained (the data to be trained in the embodiment shown in FIG. 7) can be obtained through the communication interface 703.
  • the bus 704 may include a path for transferring information between various components of the device 700 (for example, the memory 701, the processor 702, and the communication interface 703).
  • the acquisition module in the data processing device may be equivalent to the communication interface 703 in the data processing device 700; the processing module in the data processing device may be equivalent to the processor 702.
  • Fig. 9 is a schematic diagram of the hardware structure of a device for training a neural network model provided by an embodiment of the present application.
  • the device 800 for training a neural network model shown in FIG. 9 (the device 800 may specifically be a computer device) includes a memory 801, a processor 802, a communication interface 803, and a bus 804. Among them, the memory 801, the processor 802, and the communication interface 803 realize the communication connection between each other through the bus 804.
  • the memory 801 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 801 may store a program. When the program stored in the memory 801 is executed by the processor 802, the processor 802 is configured to execute each step of the method for training a neural network model shown in FIG. 7 in the embodiment of the present application.
  • the processor 802 may adopt a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to implement the method for training a neural network model shown in FIG. 7 in the embodiment of the present application.
  • the processor 802 may also be an integrated circuit chip with signal processing capability.
  • each step of the method for training a neural network model shown in FIG. 7 in the embodiment of the present application can be completed by an integrated logic circuit of hardware in the processor 802 or instructions in the form of software.
  • the aforementioned processor 802 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 801; the processor 802 reads the information in the memory 801 and, in combination with its hardware, completes the functions required by the units included in the device for training a neural network model of the embodiment of the present application, or executes the method for training a neural network model shown in FIG. 7 in the embodiment of the present application.
  • the communication interface 803 uses a transceiver device such as but not limited to a transceiver to implement communication between the device 800 and other devices or a communication network.
  • the information of the neural network to be constructed and the training data required in the process of constructing the neural network (the to-be-trained data in the embodiment shown in FIG. 7) can be obtained through the communication interface 803.
  • the bus 804 may include a path for transferring information between various components of the device 800 (for example, the memory 801, the processor 802, and the communication interface 803).
  • the acquisition module in the neural network model training device may be equivalent to the communication interface 803 in the neural network model training device 800; the processing module in the neural network model training device may be equivalent to the processor 802.
  • it should be noted that although the device 700 and the device 800 show only a memory, a processor, and a communication interface, in a specific implementation process, those skilled in the art should understand that the device 700 and the device 800 may also include other devices necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the device 700 and the device 800 may also include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the device 700 and the device 800 may also include only the components necessary to implement the embodiments of the present application, and need not include all the components shown in FIG. 8 and FIG. 9.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Abstract

A method for processing data, comprising: acquiring a plurality of pieces of data to be processed (501); processing the plurality of pieces of data by using a first neural network model to obtain a plurality of first vectors that are in one-to-one correspondence with the plurality of pieces of data (502), wherein the first neural network model is obtained by training on the basis of general data; acquiring first association relationship information (503), the first association relationship information being used to indicate at least one first vector group, and each first vector group comprising two first vectors that satisfy a priori hypothesis; and inputting the plurality of first vectors and the first association relationship information into a second neural network model to obtain a processing result for first data to be processed (504), said first data being any data among the plurality of pieces of data. The purpose of the described method for processing data is to weaken the dependence of a neural network model on training data.

Description

在一种可能的实施方式中,所述第二神经网络模型为图网络模型,相应地,所述多个第一矢量作为所述图网络模型的节点,所述第一关联关系作为所述图网络模型的边。In a possible implementation manner, the second neural network model is a graph network model, and accordingly, the multiple first vectors are used as nodes of the graph network model, and the first association relationship is used as the graph network model. The side of the network model.
第一神经网络模型、第二神经网络模型可以是某个神经网络模型的两个子模型。The first neural network model and the second neural network model may be two sub-models of a certain neural network model.
第一神经网络模型和第二神经网络模型可以存储在两个不同的设备上,也就是说本申请提供的数据处理的方法中的步骤可以由多个设备执行。例如,第一设备上存储有第一神 经网络模型,第一设备可以执行“获取多个待处理数据”的步骤以及“使用第一神经网络模型对所述多个待处理数据进行处理,得到与所述多个待处理数据一一对应的多个第一矢量”的步骤,第二设备上存储有第二神经网络模型,第二设备可以执行“获取第一关联关系信息,所述第一关联关系信息用于指示至少一个第一矢量组,每个第一矢量组包括满足先验假设的两个第一矢量”的步骤以及“将所述多个第一矢量以及所述第一关联关系信息输入第二神经网络模型,得到针对第一待处理数据的处理结果,所述第一待处理数据是所述多个待处理数据中的任一数据”的步骤。其中,多个第一矢量可以通过第一设备与第二设备之间的通信接口传输。The first neural network model and the second neural network model can be stored on two different devices, that is to say, the steps in the data processing method provided in this application can be executed by multiple devices. For example, the first neural network model is stored on the first device, and the first device can perform the steps of "obtain multiple data to be processed" and "use the first neural network model to process the multiple data to be processed to obtain In the step of “a plurality of first vectors corresponding to the plurality of data to be processed in a one-to-one relationship”, a second neural network model is stored on the second device, and the second device can execute “acquire the first association information, the first association The relationship information is used to indicate at least one first vector group, and each first vector group includes two first vectors satisfying a priori hypothesis" and "combining the multiple first vectors and the first association relationship information Input the second neural network model to obtain the processing result for the first to-be-processed data. The first to-be-processed data is any one of the multiple to-be-processed data" steps. Wherein, multiple first vectors may be transmitted through the communication interface between the first device and the second device.
在本申请实施例中,第一神经网络模型使用通用数据训练,可以得到不受场景影响或受场景影响较小的通用模型,因此该第一神经网络模型模型可以应用在多种场景中。然而,由于第一神经网络模型的应用不受场景限制,仅使用第一神经网络模型很难实现任意场景的高准确率识别。因此,可以将第一神经网络模型输出的多个特征矢量输入第二神经网络模型,使得第一神经网络模型可以应用在相对特殊的场景内,从而第二神经网络模型可以学习通用场景与特殊场景之间的区别与关联。现有的神经网络模型通常只能识别某个特殊的场景,一旦应用在其他领域,神经网络模型的大部分参数均无法再继续使用。由于第二神经网络模型可以学习通用场景、特殊场景之间的区别与关联,由于输入第一神经网络模型的数据可以是通用数据,因此本申请提供的方法可以弱化待处理数据所在场景对神经网络模型架构、参数的限制。另外,为了增强第二神经网络模型的识别准确率,在识别第一待处理数据的同时还会考虑与该第一待处理数据相关联的数据,由于处理数据量增多,因此有利于增加第二神经网络模型的识别准确率。并且,由于考虑到数据与数据之间的关联性,可以增强第二神经网络模型对数据关系的学习。In the embodiments of the present application, the first neural network model is trained using general data, and a general model that is not affected by the scene or less affected by the scene can be obtained, so the first neural network model model can be applied in a variety of scenarios. However, since the application of the first neural network model is not limited by the scene, it is difficult to achieve high-accuracy recognition of any scene using only the first neural network model. Therefore, multiple feature vectors output by the first neural network model can be input into the second neural network model, so that the first neural network model can be applied in relatively special scenes, so that the second neural network model can learn general scenes and special scenes The difference and association between. Existing neural network models usually only recognize a particular scene. Once applied in other fields, most of the parameters of the neural network model can no longer be used. Since the second neural network model can learn the difference and association between general scenarios and special scenarios, and because the data input to the first neural network model can be general data, the method provided in this application can weaken the scenario where the data to be processed is located on the neural network Restrictions on model architecture and parameters. In addition, in order to enhance the recognition accuracy of the second neural network model, while identifying the first data to be processed, data associated with the first data to be processed will also be considered. As the amount of processed data increases, it is beneficial to increase the second The recognition accuracy of the neural network model. In addition, since the correlation between data and data is considered, the learning of the data relationship by the second neural network model can be enhanced.
With reference to the first aspect, in some implementations of the first aspect, the first association relationship information is used to indicate N first vector groups, where N is an integer greater than 1. Before the multiple first vectors and the first association relationship information are input into the second neural network model to obtain the processing result for the first data to be processed, the method further includes: acquiring second association relationship information, where the second association relationship information is used to indicate n second vector groups, the n second vector groups belong to the N first vector groups, n is less than N, and n is a positive integer. The inputting of the multiple first vectors and the first association relationship information into the second neural network model to obtain the processing result for the first data to be processed includes: inputting the multiple first vectors, the first association relationship information, and the second association relationship information into the second neural network model to obtain the processing result for the first data to be processed.
In the embodiments of the present application, when the first association relationship information only indicates that an association relationship exists between two first vectors, it cannot reflect the strength of that association. The second association relationship information can indicate, among the multiple first vector groups, one or more first vector groups whose association is relatively strong or relatively weak. In addition to considering the data associated with the first data to be processed, the second neural network model can thereby strengthen the influence, on the first data to be processed, of the data closely associated with it, or weaken the influence of the data only distantly associated with it; a larger amount of data can therefore be drawn upon to recognize the first data to be processed.
With reference to the first aspect, in some implementations of the first aspect, the acquiring of multiple data to be processed includes: acquiring target data, where the target data is one of the multiple data to be processed; and acquiring associated data, where an association relationship satisfying the a priori hypothesis exists between the associated data and the target data, and the multiple data to be processed include the associated data.
In the embodiments of the present application, associated data can be introduced flexibly according to the data that needs to be processed, which improves the flexibility of acquiring the data to be processed and avoids introducing unnecessary redundant data.
With reference to the first aspect, in some implementations of the first aspect, the first association relationship information includes a second association relationship matrix. The vector in the first dimension of the second association relationship matrix includes multiple elements in one-to-one correspondence with the multiple first vectors, and the vector in the second dimension of the second association relationship matrix includes multiple elements in one-to-one correspondence with the multiple first vectors, where any element of the second association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension.
In the embodiments of the present application, a matrix is used to represent the association relationships among the multiple first vectors, which avoids introducing multiple different types of data structures into the second neural network model and keeps computation simple.
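As a minimal sketch, such an association relationship matrix can be constructed as below; the number of first vectors and the index pairs are purely illustrative:

import numpy as np

num_vectors = 5
relation = np.zeros((num_vectors, num_vectors))  # the association relationship matrix

# Each first vector group contributes a symmetric pair of entries: the
# element at row i, column j indicates whether vectors i and j have an
# association relationship satisfying the a priori hypothesis.
groups = [(0, 2), (1, 4)]                        # hypothetical first vector groups
for i, j in groups:
    relation[i, j] = relation[j, i] = 1.0

Because the one matrix type can carry every pairwise relationship, no additional data structure needs to be introduced into the model.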
With reference to the first aspect, in some implementations of the first aspect, the using of the first neural network model to process the multiple data to be processed includes: using the first neural network model to process the multiple data to be processed together with fifth association relationship information, where the fifth association relationship information is used to indicate at least one data group to be processed, and each data group to be processed includes two data to be processed that satisfy the a priori hypothesis.
In the embodiments of the present application, in order to improve the recognition accuracy of the first neural network model, the data associated with the first data to be processed is also considered while the first data to be processed is recognized. Because the amount of processed data increases, this helps to improve the recognition accuracy of the first neural network model. Moreover, because the associations between data items are taken into account, the first neural network model's learning of data relationships can be enhanced.
With reference to the first aspect, in some implementations of the first aspect, the fifth association relationship information includes a first association relationship matrix. The vector in the first dimension of the first association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be processed, and the vector in the second dimension of the first association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be processed, where any element of the first association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension.
In the embodiments of the present application, a matrix is used to represent the association relationships among the multiple data to be processed, which avoids introducing multiple different types of data structures into the first neural network model and keeps computation simple.
With reference to the first aspect, in some implementations of the first aspect, the weight parameters of the second neural network model are obtained in the following manner: acquiring multiple data to be trained; using the first neural network model to process the multiple data to be trained to obtain multiple fourth vectors in one-to-one correspondence with the multiple data to be trained; acquiring third association relationship information, where the third association relationship information is used to indicate at least one third vector group, and each third vector group includes two fourth vectors that satisfy the a priori hypothesis; and inputting the multiple fourth vectors and the third association relationship information into the second neural network model to obtain a first processing result for first data to be trained, where the first data to be trained is any one of the multiple data to be trained, and the first processing result is used to correct the weight parameters of the second neural network model.
In the embodiments of the present application, the first neural network model is trained using general data, so that a general model that is unaffected or only slightly affected by the scenario can be obtained; the first neural network model can therefore be applied in a variety of scenarios. The multiple feature vectors output by the first neural network model are input into the second neural network model, so that the second neural network model can recognize a relatively special scenario on the basis of the recognition results of the first neural network model; the second neural network model can thereby learn the differences and associations between general scenarios and special scenarios. In order to improve the recognition accuracy of the second neural network model, the data associated with the first data to be trained is also considered while the first data to be trained is recognized. Because the amount of processed data increases, this helps to improve the recognition accuracy of the second neural network model. Moreover, because the associations between data items are taken into account, the second neural network model's learning of data relationships can be enhanced.
With reference to the first aspect, in some implementations of the first aspect, the obtaining of the first processing result for the first data to be trained includes: obtaining the first processing result and a second processing result for second data to be trained, where the label of the first data to be trained is a first label and the label of the second data to be trained is a second label. The method further includes: matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, where the matching result is used to correct the weight parameters of the second neural network model.
In the embodiments of the present application, the similarity between labels makes it possible to judge whether the similarity between two processing results is appropriate, which can strengthen the second neural network model's learning of the association relationships between data items.
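One possible form of such a matching term is sketched below, assuming that labels and processing results are vectors compared by cosine similarity; the squared-difference form is an illustrative choice, not a form prescribed by this application:

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def matching_loss(result_1, result_2, label_1, label_2):
    # Penalize any mismatch between result-result similarity and
    # label-label similarity; the gradient of this term can then be
    # used to correct the weight parameters of the second model.
    return (cosine(result_1, result_2) - cosine(label_1, label_2)) ** 2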
With reference to the first aspect, in some implementations of the first aspect, the third association relationship information is used to indicate M third vector groups, where M is an integer greater than 1. Before the multiple fourth vectors and the third association relationship information are input into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: acquiring fourth association relationship information, where the fourth association relationship information is used to indicate m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer. The inputting of the multiple fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained includes: inputting the multiple fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
In the embodiments of the present application, when the third association relationship information only indicates that an association relationship exists between two fourth vectors, it cannot reflect the strength of that association. The fourth association relationship information can indicate, among the multiple third vector groups, one or more third vector groups whose association is relatively strong or relatively weak. In addition to considering the data associated with the first data to be trained, the second neural network model can thereby strengthen the influence, on the first data to be trained, of the training data closely associated with it, or weaken the influence of the training data only distantly associated with it; a larger amount of data can therefore be drawn upon to recognize the first data to be trained.
With reference to the first aspect, in some implementations of the first aspect, the first processing result is also used to correct the weight parameters of the first neural network model.
In the embodiments of the present application, because the association relationships between data items can be learned during training, using the first processing result to also correct the first neural network model can strengthen the first neural network model's ability to learn the association relationships between data items.
With reference to the first aspect, in some implementations of the first aspect, the multiple data to be trained include one or more target type data, and each target type data has a label used to correct the weight parameters.
In the embodiments of the present application, a semi-supervised learning method may be used to train the second neural network model. That is, some of the multiple data to be trained have labels and the rest may have none. The two parts of the data can be fused according to the third association relationship information. Even if the data to be trained include unlabeled data, the unlabeled data can still be taken into account when correcting the second neural network model. Therefore, the number of labels required for the data to be trained can be reduced, which simplifies the data processing involved in training the second neural network model.
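A minimal sketch of this semi-supervised fusion follows, assuming a cross-entropy-style loss that is simply masked out for unlabeled items; the loss form and all names are hypothetical, and the unlabeled items still influence the results through the association relationship information:

import numpy as np

def semi_supervised_loss(results, labels, has_label):
    # results:   per-item class probabilities output by the second model;
    # labels:    integer class per item (ignored where has_label is False);
    # has_label: boolean mask marking the target type data.
    picked = results[np.arange(len(labels)), labels]
    losses = -np.log(picked + 1e-9)
    # Only labeled items contribute supervised loss, but unlabeled items
    # were already fused into `results` via the association matrix.
    return float((losses * has_label).sum() / max(has_label.sum(), 1))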
With reference to the first aspect, in some implementations of the first aspect, the third association relationship information includes a fourth association relationship matrix. The vector in the first dimension of the fourth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple fourth vectors, and the vector in the second dimension of the fourth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple fourth vectors, where any element of the fourth association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension.
In the embodiments of the present application, a matrix is used to represent the association relationships among the multiple fourth vectors, which avoids introducing multiple different types of data structures into the second neural network model and keeps computation simple.
With reference to the first aspect, in some implementations of the first aspect, the using of the first neural network model to process the multiple data to be trained includes: using the first neural network model to process the multiple data to be trained together with sixth association relationship information, where the sixth association relationship information is used to indicate at least one data group to be trained, and each data group to be trained includes two data to be trained that satisfy the a priori hypothesis.
In the embodiments of the present application, in order to improve the recognition accuracy of the first neural network model, the data associated with the first data to be trained is also considered while the first data to be trained is recognized. Because the amount of processed data increases, this helps to improve the recognition accuracy of the first neural network model. Moreover, because the associations between data items are taken into account, the first neural network model's learning of data relationships can be enhanced.
With reference to the first aspect, in some implementations of the first aspect, the sixth association relationship information includes a third association relationship matrix. The vector in the first dimension of the third association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be trained, and the vector in the second dimension of the third association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be trained, where any element of the third association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension.
In the embodiments of the present application, a matrix is used to represent the association relationships among the multiple data to be trained, which avoids introducing multiple different types of data structures into the first neural network model and keeps computation simple.
In a second aspect, a method for training a neural network model is provided, including: acquiring multiple data to be trained; using a first neural network model to process the multiple data to be trained to obtain multiple fourth vectors in one-to-one correspondence with the multiple data to be trained; acquiring third association relationship information, where the third association relationship information is used to indicate at least one third vector group, and each third vector group includes two fourth vectors that satisfy the a priori hypothesis; and inputting the multiple fourth vectors and the third association relationship information into a second neural network model to obtain a first processing result for first data to be trained, where the first data to be trained is any one of the multiple data to be trained, and the first processing result is used to correct the weight parameters of the second neural network model.
In the embodiments of the present application, the first neural network model can be obtained by training on the training data of scenario 1. Inputting the data to be trained of scenario 2 into the first neural network model yields multiple feature vectors; inputting those feature vectors into the second neural network model then enables the second neural network model to recognize scenario 2 on the basis of the recognition results of the first neural network model. The second neural network model can therefore learn the differences and associations between scenario 1 and scenario 2. In order to improve the recognition accuracy of the second neural network model, the data associated with the first data to be trained is also considered while the first data to be trained is recognized. Because the amount of processed data increases, this helps to improve the recognition accuracy of the second neural network model. Moreover, because the associations between data items are taken into account, the second neural network model's learning of data relationships can be enhanced.
With reference to the second aspect, in some implementations of the second aspect, the obtaining of the first processing result for the first data to be trained includes: obtaining the first processing result and a second processing result for second data to be trained, where the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two of the multiple data to be trained. The method further includes: matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, where the matching result is used to correct the weight parameters of the second neural network model.
With reference to the second aspect, in some implementations of the second aspect, the third association relationship information is used to indicate M third vector groups. Before the multiple fourth vectors and the third association relationship information are input into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: acquiring fourth association relationship information, where the fourth association relationship information is used to indicate m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer. The inputting of the multiple fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained includes: inputting the multiple fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
With reference to the second aspect, in some implementations of the second aspect, the first processing result is also used to correct the weight parameters of the first neural network model.
With reference to the second aspect, in some implementations of the second aspect, the multiple data to be trained include one or more target type data, and each target type data has a label used to correct the weight parameters.
With reference to the second aspect, in some implementations of the second aspect, the third association relationship information includes a fourth association relationship matrix. The vector in the first dimension of the fourth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple fourth vectors, and the vector in the second dimension of the fourth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple fourth vectors, where any element of the fourth association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension.
With reference to the second aspect, in some implementations of the second aspect, the using of the first neural network model to process the multiple data to be trained includes: using the first neural network model to process the multiple data to be trained together with sixth association relationship information, where the sixth association relationship information is used to indicate at least one data group to be trained, and each data group to be trained includes two data to be trained that satisfy the a priori hypothesis.
With reference to the second aspect, in some implementations of the second aspect, the sixth association relationship information includes a third association relationship matrix. The vector in the first dimension of the third association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be trained, and the vector in the second dimension of the third association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be trained, where any element of the third association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension.
With reference to the second aspect, in some implementations of the second aspect, the first neural network model is obtained by training on general data.
In the embodiments of the present application, the first neural network model is trained using general data, so that a general model that is unaffected or only slightly affected by the scenario can be obtained; the first neural network model can therefore be applied in a variety of scenarios. The multiple feature vectors output by the first neural network model are input into the second neural network model, so that the second neural network model can recognize a relatively special scenario on the basis of the recognition results of the first neural network model; the second neural network model can thereby learn the differences and associations between general scenarios and special scenarios.
In a third aspect, a method for training a neural network model is provided, including: acquiring multiple data to be trained; and inputting the multiple data to be trained and seventh association relationship information into a second neural network model to obtain a first processing result for first data to be trained and a second processing result for second data to be trained, where the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two of the multiple data to be trained. The method further includes: matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, where the matching result is used to correct the weight parameters of the second neural network model.
In the embodiments of the present application, the similarity between labels makes it possible to judge whether the similarity between two processing results is appropriate, which can strengthen the second neural network model's learning of the association relationships between data items.
With reference to the third aspect, in some implementations of the third aspect, the method further includes: acquiring the seventh association relationship information, where the seventh association relationship information is used to indicate at least one first training data group, and each first training data group includes two data to be trained that satisfy the a priori hypothesis.
In the embodiments of the present application, in order to improve the recognition accuracy of the second neural network model, the data associated with the first data to be trained is also considered while the first data to be trained is recognized. Because the amount of processed data increases, this helps to improve the recognition accuracy of the second neural network model. Moreover, because the associations between data items are taken into account, the second neural network model's learning of data relationships can be enhanced.
With reference to the third aspect, in some implementations of the third aspect, the seventh association relationship information is used to indicate H first training data groups. Before the multiple data to be trained and the seventh association relationship information are input into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: acquiring eighth association relationship information, where the eighth association relationship information is used to indicate h second training data groups, the h second training data groups belong to the H first training data groups, h is less than H, and h is a positive integer. The inputting of the multiple data to be trained and the seventh association relationship information into the second neural network model to obtain the first processing result for the first data to be trained includes: inputting the multiple data to be trained, the seventh association relationship information, and the eighth association relationship information into the second neural network model to obtain the first processing result.
With reference to the third aspect, in some implementations of the third aspect, the multiple data to be trained include one or more target type data, and each target type data has a label used to correct the weight parameters.
With reference to the third aspect, in some implementations of the third aspect, the seventh association relationship information includes a fifth association relationship matrix. The vector in the first dimension of the fifth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be trained, and the vector in the second dimension of the fifth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be trained, where any element of the fifth association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the data to be trained corresponding to that element in the first dimension and the data to be trained corresponding to that element in the second dimension.
In a fourth aspect, a data processing device is provided, and the device includes modules for executing the method in the first aspect or any possible implementation of the first aspect.
Optionally, the device may be a cloud server or a terminal device.
In a fifth aspect, a device for training a neural network model is provided, and the device includes modules for executing the method in the second aspect or any possible implementation of the second aspect.
Optionally, the device may be a cloud server or a terminal device.
In a sixth aspect, a device for training a neural network model is provided, and the device includes modules for executing the method in the third aspect or any possible implementation of the third aspect.
Optionally, the device may be a cloud server or a terminal device.
In a seventh aspect, a data processing device is provided. The device includes: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any implementation of the first aspect.
Optionally, the device may be a cloud server or a terminal device.
In an eighth aspect, a device for training a neural network model is provided. The device includes: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any implementation of the second aspect.
Optionally, the device may be a cloud server or a terminal device.
In a ninth aspect, a device for training a neural network model is provided. The device includes: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any implementation of the third aspect.
Optionally, the device may be a cloud server or a terminal device.
In a tenth aspect, a computer-readable medium is provided. The computer-readable medium stores program code for execution by a device, and the program code includes instructions for executing the method in any implementation of the first to third aspects.
In an eleventh aspect, a computer program product containing instructions is provided; when the computer program product runs on a computer, the computer is caused to execute the method in any implementation of the first to third aspects.
In a twelfth aspect, a chip is provided. The chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to execute the method in any implementation of the first to third aspects.
Optionally, as an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory; when those instructions are executed, the processor is configured to execute the method in any implementation of the first to third aspects.
Description of the drawings
FIG. 1 is a schematic diagram of a convolutional neural network architecture provided by an embodiment of the present application.
FIG. 2 is a schematic diagram of a graph model provided by an embodiment of the present application.
FIG. 3 is a schematic diagram of a system architecture provided by an embodiment of the present application.
FIG. 4 is a schematic diagram of the hardware structure of a chip provided by an embodiment of the present application.
FIG. 5 is a schematic diagram of a system architecture provided by an embodiment of the present application.
FIG. 6 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
FIG. 7 is a schematic flowchart of a method for training a neural network model provided by an embodiment of the present application.
FIG. 8 is a schematic block diagram of a data processing device provided by an embodiment of the present application.
FIG. 9 is a schematic block diagram of a device for training a neural network model provided by an embodiment of the present application.
Detailed description
The technical solutions in this application are described below with reference to the drawings.
(1) Neural network
A neural network can be composed of neural units. A neural unit can be an arithmetic unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the arithmetic unit can be:

$f\left(\sum_{s=1}^{n} W_s x_s + b\right)$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, used to introduce nonlinear characteristics into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function can serve as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together; that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be a region composed of several neural units.
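A direct numerical sketch of this unit follows; the weights, bias, and inputs are arbitrary illustrative values:

import numpy as np

def neural_unit(x, W, b):
    # Output of a single neural unit: f(sum_s W_s * x_s + b),
    # here with a sigmoid activation as f.
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

x = np.array([0.5, -1.0, 2.0])   # inputs x_s
W = np.array([0.2, 0.4, -0.1])   # weights W_s
b = 0.3                          # bias of the neural unit
print(neural_unit(x, W, b))      # a value in (0, 1)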
(2) Deep neural network
A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with many hidden layers; there is no special metric for "many" here. Divided by the positions of the different layers, the neural network layers inside a DNN can be classified into three categories: input layer, hidden layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all the layers in between are hidden layers. The layers are fully connected; that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer. Although a DNN looks complicated, the work of each layer is actually not complicated; in simple terms it is the following linear relationship expression: $\vec{y} = \alpha(W \vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called the coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are many coefficients $W$ and offset vectors $\vec{b}$. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^3_{24}$, where the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4. In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W^L_{jk}$. It should be noted that the input layer has no $W$ parameters. In a deep neural network, more hidden layers make the network better able to portray complex situations in the real world. In theory, a model with more parameters has higher complexity and greater "capacity", which means that it can complete more complex learning tasks. Training a deep neural network is thus the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors $W$ of many layers).
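The layer-by-layer computation can be sketched as follows, assuming two hidden layers of arbitrary size and a sigmoid activation; all shapes and values are illustrative:

import numpy as np

rng = np.random.default_rng(0)

def alpha(z):
    return 1.0 / (1.0 + np.exp(-z))      # the activation function

sizes = [8, 16, 16, 4]                   # input, two hidden layers, output
# W[i] maps layer i to layer i+1; W[i][j, k] is the coefficient from
# neuron k of the previous layer to neuron j of the next layer.
W = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(3)]
b = [np.zeros(sizes[i + 1]) for i in range(3)]

y = rng.standard_normal(sizes[0])        # the input vector
for Wi, bi in zip(W, b):
    y = alpha(Wi @ y + bi)               # y = alpha(W x + b), layer by layer
# y is now the output vector of the final layer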
(3) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and sub-sampling layers. The feature extractor can be regarded as a filter, and the convolution process can be regarded as convolving a trainable filter with an input image or convolutional feature map. A convolutional layer is a layer of neurons in the convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, one neuron may be connected to only some of the neurons of the neighboring layers. A convolutional layer usually contains several feature maps, and each feature map may be composed of some rectangularly arranged neural units. Neural units of the same feature map share weights, and the shared weights are the convolution kernel. Sharing weights can be understood as meaning that the way image information is extracted is independent of position. The underlying principle is that the statistics of one part of an image are the same as those of the other parts, which means that image information learned in one part can also be used in another part; the same learned image information can therefore be used at all positions on the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information; generally, the larger the number of convolution kernels, the richer the image information reflected by the convolution operation.
A convolution kernel can be initialized in the form of a matrix of random size, and reasonable weights can be obtained for the convolution kernel through learning during the training of the convolutional neural network. In addition, a direct benefit of sharing weights is reducing the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
As shown in FIG. 1, a convolutional neural network (CNN) 400 may include an input layer 410, a convolutional layer/pooling layer 420 (where the pooling layer is optional), and a neural network layer 430.
Convolutional layer/pooling layer 420:
Convolutional layer:
As shown in FIG. 1, the convolutional layer/pooling layer 420 may include layers 421-426. For example, in one implementation, layer 421 is a convolutional layer, layer 422 is a pooling layer, layer 423 is a convolutional layer, layer 424 is a pooling layer, layer 425 is a convolutional layer, and layer 426 is a pooling layer; in another implementation, layers 421 and 422 are convolutional layers, layer 423 is a pooling layer, layers 424 and 425 are convolutional layers, and layer 426 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
The internal working principle of a convolutional layer is described below, taking the convolutional layer 421 as an example.
The convolutional layer 421 may include many convolution operators. A convolution operator is also called a kernel, and its role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator can essentially be a weight matrix, which is usually predefined. In the process of performing a convolution operation on an image, the weight matrix is usually processed along the horizontal direction of the input image one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride), thereby completing the work of extracting a specific feature from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends through the entire depth of the input image. Therefore, convolving with a single weight matrix produces a convolutional output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same size (rows x columns), that is, multiple matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as being determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features in the image: for example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The multiple weight matrices have the same size (rows x columns), so the feature maps extracted by them also have the same size, and the extracted feature maps of the same size are then combined to form the output of the convolution operation.
In practical applications, the weight values in these weight matrices need to be obtained through a large amount of training. The weight matrices formed by the trained weight values can be used to extract information from the input image, thereby enabling the convolutional neural network 400 to make correct predictions.
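The sliding-window computation described above can be sketched, for a single channel with stride 1 and no padding, as follows; the kernel values are arbitrary:

import numpy as np

def conv2d(image, kernel):
    # Slide one weight matrix (kernel) across the image; each output
    # pixel is the weighted sum of the patch it covers, so the same
    # weights are shared at every position.
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25.0).reshape(5, 5)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # crude vertical-edge extractor
print(conv2d(image, edge_kernel))               # a 3x3 feature map

Stacking the outputs of several such kernels along a new depth axis yields the multi-kernel output described above.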
When the convolutional neural network 400 has multiple convolutional layers, the initial convolutional layer (for example, 421) often extracts more general features, which may also be called low-level features. As the depth of the convolutional neural network 400 increases, the features extracted by the later convolutional layers (for example, 426) become more and more complex, such as high-level semantic features; features with higher-level semantics are more applicable to the problem to be solved.
Pooling layer:
Since it is often necessary to reduce the number of training parameters, a pooling layer often needs to be introduced periodically after a convolutional layer. In the layers 421-426 illustrated by 420 in FIG. 1, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. In image processing, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain an image of smaller size. The average pooling operator can compute the pixel values within a specific range of the image to produce an average value as the result of average pooling. The maximum pooling operator can take the pixel with the largest value within a specific range as the result of maximum pooling. In addition, just as the size of the weight matrix in a convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
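Both pooling operators can be sketched in a few lines; here 2x2 blocks with stride 2, on illustrative values:

import numpy as np

def pool2d(image, size=2, mode="max"):
    # Downsample by taking the max (or mean) of each size-by-size block.
    H, W = image.shape
    blocks = image[:H - H % size, :W - W % size]
    blocks = blocks.reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

image = np.array([[1.0, 2.0, 5.0, 6.0],
                  [3.0, 4.0, 7.0, 8.0],
                  [9.0, 10.0, 13.0, 14.0],
                  [11.0, 12.0, 15.0, 16.0]])
print(pool2d(image, mode="max"))   # [[ 4.  8.] [12. 16.]]
print(pool2d(image, mode="avg"))   # [[ 2.5  6.5] [10.5 14.5]]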
Neural network layer 430:
After the processing of the convolutional layer/pooling layer 420, the convolutional neural network 400 is not yet sufficient to output the required output information, because, as described above, the convolutional layer/pooling layer 420 only extracts features and reduces the parameters brought by the input image. However, in order to generate the final output information (the required class information or other relevant information), the convolutional neural network 400 needs to use the neural network layer 430 to generate an output of one or a set of the required number of classes. Therefore, the neural network layer 430 may include multiple hidden layers (431 and 432 to 43n as shown in FIG. 1) and an output layer 440. The parameters contained in these hidden layers can be obtained by pre-training on the relevant training data of a specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and so on.
After the multiple hidden layers in the neural network layer 430, that is, as the final layer of the entire convolutional neural network 400, comes the output layer 440. The output layer 440 has a loss function similar to categorical cross-entropy, specifically used to calculate the prediction error. Once the forward propagation of the entire convolutional neural network 400 (propagation in the direction from 410 to 440 in FIG. 1) is completed, back propagation (propagation in the direction from 440 to 410 in FIG. 1) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 400 and the error between the result output by the convolutional neural network 400 through the output layer and the ideal result.
It should be noted that the convolutional neural network 400 shown in FIG. 1 is only an example of a convolutional neural network; in specific applications, a convolutional neural network may also exist in the form of other network models.
(4) Recurrent neural networks (RNN) are used to process sequence data. In the traditional neural network model, from the input layer to the hidden layer to the output layer, the layers are fully connected, while the nodes within each layer are unconnected. Although this ordinary neural network has solved many difficult problems, it remains powerless for many others. For example, to predict the next word of a sentence, the preceding words generally need to be used, because the preceding and following words in a sentence are not independent. The reason an RNN is called a recurrent neural network is that the current output of a sequence is also related to the previous outputs. Concretely, the network memorizes the previous information and applies it to the calculation of the current output; that is, the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. The training of an RNN is the same as the training of a traditional CNN or DNN: the error back propagation algorithm is likewise used, but with one difference: if the RNN is unrolled, its parameters, such as W, are shared, which is not the case for the traditional neural network exemplified above. Moreover, when the gradient descent algorithm is used, the output of each step depends not only on the network of the current step, but also on the states of the network in the preceding several steps. This learning algorithm is called back propagation through time (BPTT).
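The recurrence just described can be sketched as follows, with the same weights shared across all time steps; all sizes and values are illustrative:

import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.standard_normal((16, 8))    # input -> hidden, shared by all steps
W_hh = rng.standard_normal((16, 16))   # hidden -> hidden (the recurrence)
W_hy = rng.standard_normal((4, 16))    # hidden -> output

def rnn_forward(inputs):
    h = np.zeros(16)                   # hidden state: the network's "memory"
    outputs = []
    for x in inputs:                   # same W_xh, W_hh, W_hy at every step
        h = np.tanh(W_xh @ x + W_hh @ h)   # current input plus previous state
        outputs.append(W_hy @ h)
    return outputs

sequence = [rng.standard_normal(8) for _ in range(5)]
outputs = rnn_forward(sequence)        # one output per time step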
(5) Loss function
In the process of training a deep neural network, since it is hoped that the output of the deep neural network is as close as possible to the value actually desired to be predicted, the weight vector of each layer of the neural network can be updated by comparing the predicted value of the current network with the actually desired target value, according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value". This is the loss function or objective function, which is an important equation used to measure the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a greater difference, so the training of the deep neural network becomes a process of reducing this loss as much as possible. The loss function is usually a multivariate function, and the gradient reflects the rate of change of the output value of the loss function as a variable changes: the greater the absolute value of the gradient, the greater the rate of change of the output value of the loss function. The gradient of the loss function with respect to the different parameters can therefore be computed, and the parameters are continuously updated along the direction of steepest gradient descent, so as to reduce the output value of the loss function as quickly as possible.
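The following is a minimal sketch of the update process described above, assuming a single weight vector w and a mean-squared-error loss; the toy data and the learning rate are illustrative assumptions, not values from the embodiments.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))       # inputs
t = X @ np.array([1.0, -2.0, 0.5])      # actually desired target values

w = np.zeros(3)                          # initialization before the first update
lr = 0.1                                 # learning rate
for _ in range(200):
    y = X @ w                            # predicted values
    grad = 2 * X.T @ (y - t) / len(X)    # gradient of the MSE loss w.r.t. w
    w -= lr * grad                       # step along the steepest-descent direction
```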
(6) Backpropagation algorithm
A convolutional neural network can use the error back propagation (BP) algorithm to correct the values of the parameters of an initial super-resolution model during the training process, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, forward propagation of the input signal up to the output produces an error loss, and the parameters of the initial super-resolution model are updated by back-propagating the error loss information, so that the error loss converges. The backpropagation algorithm is a backpropagation movement dominated by the error loss, and aims to obtain the optimal parameters of the super-resolution model, such as the weight matrices.
(7) Generative adversarial network
A generative adversarial network (GAN) is a deep learning model. The model includes at least two modules: a generative model and a discriminative model, which learn by playing a game against each other, thereby producing better output. Both the generative model and the discriminative model can be neural networks, specifically deep neural networks or convolutional neural networks. The basic principle of a GAN is as follows, taking a GAN that generates pictures as an example. Assume there are two networks, G (generator) and D (discriminator), where G is a network that generates pictures: it receives random noise z and generates a picture from this noise, denoted G(z). D is a discriminative network used to determine whether a picture is "real". Its input parameter is x, where x represents a picture, and the output D(x) represents the probability that x is a real picture: an output of 1 means the picture is certainly real, and an output of 0 means it cannot be real. In the process of training this generative adversarial network, the goal of the generative network G is to generate pictures as realistic as possible in order to deceive the discriminative network D, while the goal of the discriminative network D is to distinguish the pictures generated by G from real pictures as well as possible. In this way, G and D constitute a dynamic "game" process, namely the "adversarial" part of the "generative adversarial network". In the ideal state, the result of the final game is that G can generate pictures G(z) that pass for real, while D finds it difficult to determine whether the pictures generated by G are real, that is, D(G(z)) = 0.5. In this way, an excellent generative model G is obtained, which can be used to generate pictures.
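The following is a minimal sketch of the adversarial objectives described above, assuming that the discriminator outputs a probability in (0, 1) and using the standard binary cross-entropy formulation; the network internals are omitted.

```python
import numpy as np

def d_loss(d_real, d_fake):
    # D wants D(x) -> 1 for real pictures and D(G(z)) -> 0 for generated ones.
    return -np.mean(np.log(d_real) + np.log(1 - d_fake))

def g_loss(d_fake):
    # G wants D(G(z)) -> 1, i.e. generated pictures judged as real.
    return -np.mean(np.log(d_fake))

# At the ideal equilibrium D(G(z)) = 0.5, D can no longer tell real from fake.
print(d_loss(np.array([0.5]), np.array([0.5])))  # ~1.386 = 2*log(2)
```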
(8) Graph neural network
In computer science, a graph is a data structure composed of two parts: nodes and the edges between nodes. A graph can therefore be expressed by the formula G = (V, E), where G is the graph, V is the set of nodes, and E is the set of edges, as shown in Figure 2. Nodes are sometimes called vertices. The edge between node n1 and node n2 can be expressed as (n1, n2). A graph neural network (GNN) is a neural network that operates directly on the graph data structure. The label of a node n in the node set can be represented by a vector, and the label of an edge (n1, n2) in the edge set can also be represented by a vector. Therefore, the features of the nodes n1 and/or n2 can be obtained from the labels of the nodes n1 and n2 and the label of the edge (n1, n2). A graph neural network can include an input layer, an output layer, and one or more hidden layers.
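The following is a minimal sketch of the graph G = (V, E) of Figure 2 as a data structure, assuming illustrative vector labels for the nodes and edges; only the edges of the dashed-line example are encoded.

```python
# Nodes and edges of the Figure 2 example (nodes 2, 3, 4, 6 neighbor node 1).
V = {1, 2, 3, 4, 5, 6}
E = {(1, 2), (1, 3), (1, 4), (1, 6), (5, 4), (5, 6)}

node_label = {n: [float(n), 0.0] for n in V}   # a vector label per node (illustrative)
edge_label = {e: [1.0] for e in E}             # a vector label per edge (illustrative)

def neighbors(v):
    # Nodes i with an edge between v and i are the neighbor nodes of v.
    return {b for a, b in E if a == v} | {a for a, b in E if b == v}

print(neighbors(1))  # {2, 3, 4, 6}
```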
The purpose of the graph neural network is to train a state embedding function h_v = f(x_v, x_co[v], h_ne[v], x_ne[v]), where h_v is the state of node v, x_v is the feature representation of node v, x_co[v] is the feature representation of the edges associated with node v, h_ne[v] is the state of the other nodes associated with node v, and x_ne[v] is the feature representation of the other nodes associated with node v. Taking node 1 shown in Figure 2 as an example, node 2, node 3, node 4, and node 6 inside the dashed line all have edges connecting them to node 1, so node 2, node 3, node 4, and node 6 are all nodes associated with node 1. If an edge connects node v and node i, then node i is a node associated with node v, and node i can be called a neighbor node of node v.
The output function of the graph neural network model is o_v = g(h_v, x_v), and the neural network is optimized through the loss function:
$$\mathrm{loss} = \sum_{v} \left( t_v - o_v \right)$$

where t_v is the label of node v.
(9) Graph convolutional neural network
A graph convolutional network (GCN) is a method for performing deep learning on graph data, and can be understood as the application of the graph neural network within convolutional neural networks. Graph convolutional neural networks are usually divided into two categories: spectral approaches and non-spectral approaches. Spectral approaches are based on the spectral representation of the graph: through the eigendecomposition of the graph Laplacian operator, a convolution operation is defined in the Fourier domain, and this convolution operation requires intensive matrix computation and non-locally-spatial filtering computation. Non-spectral approaches perform the convolution directly on the graph rather than on the spectrum of the graph. However, a graph convolutional neural network depends on the structure information of the graph, so a model trained on a specific graph structure often cannot be directly used on other graph structures. The graph convolution operator can be:

$$h_i^{(l+1)} = \sigma\left( \sum_{j \in N_i} \frac{1}{c_{ij}} W_{R_j} h_j^{(l)} \right)$$

where h_i^{(l)} represents the feature representation of node i at layer l, c_ij represents a normalization factor related to the graph structure, N_i represents the nodes associated with node i (the nodes associated with node i may include node i itself), and R_j represents the type of node j. By collecting the feature information of each node and applying a nonlinear transformation, the expressive ability of the model is enhanced.
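The following is a minimal NumPy sketch of one layer of this operator, assuming the simple normalization c_ij = |N_i| and a single node type, so that one shared weight matrix W is used; both assumptions are illustrative rather than mandated by the operator above.

```python
import numpy as np

def graph_conv_layer(H, A, W):
    """One graph convolution layer: h_i' = sigma(sum_j (1/c_ij) * W @ h_j).

    H: (n, d) node features at layer l
    A: (n, n) adjacency matrix with self-loops (N_i may include i itself)
    W: (d, d_out) shared weight matrix (single node type assumed)
    """
    deg = A.sum(axis=1, keepdims=True)   # c_ij = |N_i|, a simple normalization choice
    H_agg = (A / deg) @ H                # average the features of associated nodes
    return np.maximum(H_agg @ W, 0.0)    # nonlinear transformation (ReLU assumed)

# Toy usage on the Figure 2 neighborhood structure (0-based node indices).
A = np.eye(6)
for a, b in [(0, 1), (0, 2), (0, 3), (0, 5), (4, 3), (4, 5)]:
    A[a, b] = A[b, a] = 1.0
H = np.random.default_rng(0).standard_normal((6, 4))
W = np.random.default_rng(1).standard_normal((4, 4))
H_next = graph_conv_layer(H, A, W)       # features at layer l+1
```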
(10) Graph attention neural network
A graph attention network (GAT) includes a graph attention core layer, which, through an implicit self-attention layer, distributes attention over the set of neighbor nodes associated with node i: according to the features of the neighbor nodes, different weights are assigned for node i, and a weighted summation is performed over the features of the neighbor nodes. The difference from a graph convolutional neural network is that a graph attention network need not depend on the specific graph structure. Through a multi-layer, multi-head attention mechanism, the graph attention network performs attention distribution over the nodes under the association structure of the graph, so the information that each node obtains from other related nodes can be calculated. The essence of the multi-head attention mechanism is a weighted summation, where the weights derive from the learned attention matrices and the nodes' own information. Therefore, this network differs from the graph convolutional neural network in that the parameters it learns do not depend on the specific graph structure.
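The following is a minimal sketch of the weighted summation over neighbor features described above, assuming a single attention head and a learned attention vector a; the shared linear map and LeakyReLU used in a full GAT layer are omitted for brevity.

```python
import numpy as np

def attention_aggregate(h_i, neighbor_feats, a):
    """Weighted summation over the neighbor features of one node.

    h_i: (d,) feature of node i
    neighbor_feats: (k, d) features of the k neighbor nodes of i
    a: (2*d,) learned attention vector (assumed form)
    """
    scores = np.array([a @ np.concatenate([h_i, h_j]) for h_j in neighbor_feats])
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()          # attention weights over the neighbors
    return alpha @ neighbor_feats        # weighted sum of the neighbor features
```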
Referring to Figure 3, an embodiment of the present application provides a system architecture 100. As shown in the system architecture 100, the data collection device 160 is configured to collect data to be trained. In this embodiment of the application, the data to be trained includes image data, video data, audio data, text data, and the like. The data to be trained is stored in the database 130, and the training device 120 trains the target model/rule 101 based on the data to be trained maintained in the database 130. How the training device 120 obtains the target model/rule 101 based on the data to be trained will be described in more detail in Embodiment 1 below. The target model/rule 101 can be used to implement the method for training a neural network model provided in the embodiments of the present application; that is, the target model/rule 101 may include a first neural network model and a second neural network model: the data to be trained is input into the first neural network model to obtain multiple fourth vectors, the multiple fourth vectors are input into the second neural network model, and the weight parameters of the target model/rule 101 are adjusted through a loss function, whereby the trained target model/rule 101 is obtained. It should be noted that, in practical applications, the data to be trained maintained in the database 130 does not necessarily all come from the collection of the data collection device 160, and may also be received from other devices. It should also be noted that the training device 120 does not necessarily train the target model/rule 101 entirely based on the data to be trained maintained by the database 130; it may also obtain data to be trained from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of the present application.
The target model/rule 101 obtained by training by the training device 120 can be applied to different systems or devices, for example to the execution device 110 shown in Figure 3. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, an AR/VR device, or a vehicle-mounted terminal, or may be a server, a cloud, or the like. In Figure 3, the execution device 110 is provided with an input/output interface 112 for data interaction with external devices. A user can input data to the input/output interface 112 through the client device 140; in this embodiment of the application, the input data may include multiple data to be processed.
The preprocessing module 113 is configured to perform preprocessing according to the input data received by the input/output interface 112 (such as the image data, video data, audio data, or text data; the input data may be the data to be processed in the embodiments of the present application). In this embodiment of the application, the preprocessing module 113 may be used, for example, to extract features of the input data.
When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation or other related processing, the execution device 110 may call data, code, and the like in the data storage system 150 for the corresponding processing, and may also store the data, instructions, and the like obtained by the corresponding processing into the data storage system 150.
Finally, the input/output interface 112 returns the processing result to the client device 140, thereby providing it to the user.
It is worth noting that the training device 120 can generate, for different goals or different tasks, corresponding target models/rules 101 based on different data to be trained, and the corresponding target models/rules 101 can then be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results.
In the case shown in Figure 3, the user can manually specify the input data, and this manual specification can be operated through an interface provided by the input/output interface 112. In another case, the client device 140 can automatically send input data to the input/output interface 112; if the user's authorization is required for the client device 140 to automatically send the input data, the user can set the corresponding permission in the client device 140. The user can view, on the client device 140, the result output by the execution device 110, and the specific presentation form may be display, sound, action, or another specific manner. The client device 140 can also serve as a data collection terminal, collecting the input data fed into the input/output interface 112 and the output result of the input/output interface 112 as shown in the figure as new sample data, and storing them into the database 130. Of course, the collection may also bypass the client device 140: the input/output interface 112 may directly store the input data fed into the input/output interface 112 and the output result of the input/output interface 112, as shown in the figure, into the database 130 as new sample data.
It is worth noting that Figure 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in Figure 3, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
As shown in Figure 3, the target model/rule 101 is obtained by training by the training device 120. In this embodiment of the application, the target model/rule 101 may include the first neural network model and the second neural network model of the embodiments of the present application, where the first neural network model may be a convolutional neural network model or a graph neural network model, and the second neural network model may be a graph neural network model.
The following describes a chip hardware structure provided by an embodiment of the present application.
Figure 4 shows a chip hardware structure provided by an embodiment of the application; the chip includes a neural network processor 20.
The neural-network processing unit (NPU) 20 can be mounted as a coprocessor onto a host central processing unit (Host CPU), and the Host CPU allocates tasks. The core part of the NPU is the arithmetic circuit 203; the controller 204 controls the arithmetic circuit 203 to fetch data from memory (the weight memory or the input memory) and perform operations.
In some implementations, the arithmetic circuit 203 internally includes multiple processing engines (PE). In some implementations, the arithmetic circuit 203 is a two-dimensional systolic array. The arithmetic circuit 203 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 203 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 202 and caches it on each PE in the arithmetic circuit. The arithmetic circuit fetches the data of matrix A from the input memory 201, performs a matrix operation with matrix B, and stores the partial or final result of the resulting matrix in the accumulator 208.
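The following is a minimal sketch of the computation C = A·B with explicit accumulation of partial results, mirroring the role of the accumulator 208; the matrix sizes and the rank-1 decomposition are illustrative assumptions, not a description of the actual circuit.

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)       # input matrix A
B = np.arange(12.0).reshape(3, 4)      # weight matrix B

C = np.zeros((2, 4))                   # the accumulator holds partial results
for k in range(A.shape[1]):            # one rank-1 partial result per step
    C += np.outer(A[:, k], B[k, :])

assert np.allclose(C, A @ B)           # the accumulated result equals A.B
```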
The vector calculation unit 207 can further process the output of the arithmetic circuit 203, performing, for example, vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. For example, the vector calculation unit 207 can be used for network calculations in the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, and local response normalization.
In some implementations, the vector calculation unit 207 can store the processed output vector into the unified buffer 206. For example, the vector calculation unit 207 can apply a nonlinear function to the output of the arithmetic circuit 203, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 207 generates normalized values, merged values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 203, for example for use in a subsequent layer of the neural network.
Some or all of the steps of the methods provided in the present application may be executed by the arithmetic circuit 203 or the vector calculation unit 207.
The unified memory 206 is used to store input data and output data.
The direct memory access controller (DMAC) 205 transfers the input data in the external memory into the input memory 201 and/or the unified memory 206, stores the weight data in the external memory into the weight memory 202, and stores the data in the unified memory 206 into the external memory.
The bus interface unit (BIU) 210 is used to implement interaction among the host CPU, the DMAC, and the instruction fetch buffer 209 through the bus.
The instruction fetch buffer 209 connected to the controller 204 is used to store instructions used by the controller 204.
The controller 204 is used to call the instructions cached in the instruction fetch buffer 209 to control the working process of the operation accelerator.
Generally, the unified memory 206, the input memory 201, the weight memory 202, and the instruction fetch buffer 209 are all on-chip memories, while the external memory is a memory external and private to the NPU. The external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
As shown in Figure 5, an embodiment of the present application provides a system architecture 300. The system architecture includes a local device 301, a local device 302, an execution device 310, and a data storage system 350, where the local device 301 and the local device 302 are connected to the execution device 310 through a communication network.
The execution device 310 may be implemented by one or more servers. Optionally, the execution device 310 may be used in cooperation with other computing devices, such as data storage, routers, and load balancers. The execution device 310 may be arranged at one physical site or distributed across multiple physical sites. The execution device 310 may use the data in the data storage system 350, or call the program code in the data storage system 350, to implement the method for searching the neural network structure of the embodiments of the present application.
Specifically, the execution device 310 can be set up as an image recognition neural network, and the image recognition neural network can be used for image recognition, image processing, and the like.
Users can operate their respective user devices (for example, the local device 301 and the local device 302) to interact with the execution device 310. Each local device can represent any computing device, such as a personal computer, a computer workstation, a smartphone, a tablet computer, a smart camera, a smart car, another type of cellular phone, a media consumption device, a wearable device, a set-top box, or a game console.
The local device of each user can interact with the execution device 310 through a communication network of any communication mechanism/communication standard. The communication network may be a wide area network, a local area network, a point-to-point connection, or the like, or any combination thereof.
The above execution device 310 may also be referred to as a cloud device, in which case the execution device 310 is generally deployed in the cloud.
As mentioned above, a neural network model depends on the data to be trained: for the data to be trained, the output result of the neural network model is close to the features of the data to be trained and has high accuracy; but when the trained neural network model is applied in actual use, the recognition result it outputs is far from the features of the input data and has low accuracy. In order to reduce the degree of dependence of the neural network model on the data to be trained, the present application provides a data processing method, so that the trained neural network model can achieve high-accuracy recognition when applied to a specific scenario.
Figure 6 is a schematic flowchart of a data processing method provided by an embodiment of the present application. The method 500 may be executed by the execution device 110 shown in Figure 3, by the neural network processor 20 shown in Figure 4, or by the execution device 310 shown in Figure 5.
501. Obtain multiple data to be processed.
The data to be processed can be understood as data that is about to be input into a neural network model and processed by that neural network model. The data to be processed may be text data, image data, video data, audio data, and the like, for example a text file, a passage of text in a text file, a picture file, an image block in a picture file, a frame in a video file, a video file, a video segment in a video file, an audio file, or an audio segment in an audio file. The multiple data to be processed may be multiple text files, multiple passages of text in one text file, multiple picture files, multiple image blocks in one picture file, multiple frames in one video file, multiple video files, multiple video segments in one video file, multiple audio files, multiple audio segments in one audio file, and so on. The present application does not limit the type of the data to be processed.
The data to be processed can be obtained in multiple ways. In one example, the multiple data to be processed are stored in a database, so the device executing the method 500 can directly retrieve them from the database. In another example, the device executing the method 500 is provided with a camera, so the multiple data to be processed can be obtained by shooting with the camera. In another example, the multiple data to be processed are stored on a cloud device, so the device executing the method 500 can receive, through a communication network, the multiple data to be processed sent by the cloud device.
502. Use a first neural network model to process the multiple data to be processed to obtain multiple first vectors in one-to-one correspondence with the multiple data to be processed, where the first neural network model is obtained based on general data training.
That is, the multiple data to be processed are input into the first neural network model, and the first neural network model performs processing operations on them, such as feature screening (screening out the useful features) and feature fusion (merging multiple features), and outputs multiple first vectors in one-to-one correspondence with the multiple data to be processed. Taking the convolutional neural network shown in Figure 1 as an example, processing the multiple data to be processed may consist of inputting them at the input layer, performing data processing through hidden layers such as convolutional layers and/or pooling layers, and outputting, from the output layer of the first neural network model, multiple first vectors in one-to-one correspondence with the multiple data to be processed. A first vector may be a single number or a vector containing multiple numbers.
The type of the first neural network model may be a convolutional neural network model, a graph neural network model, a graph convolutional neural network model, a graph attention neural network model, or the like. The present application does not limit the type of the first neural network model.
In particular, the first neural network model may be a traditional convolutional neural network model. The output layer of a traditional convolutional neural network is a fully connected layer, which is sometimes called a classifier. That is, a traditional convolutional neural network model can directly output the recognition result of the data to be processed. For example, if the data to be processed is an image, a traditional convolutional neural network model can directly output recognition results such as whether a person is present in the image and whether the person is male or female. Such a recognition result can often only represent the probability that the data to be processed possesses a certain feature.
In particular, the first neural network model may also be a special convolutional neural network model that does not include a fully connected layer and can output the calculation result of a convolutional layer or a pooling layer. That is, the first neural network model can output a processing result that would be an intermediate calculation result in a traditional convolutional neural network model. For ease of description, the processing result output by this special convolutional neural network model is called an intermediate calculation result. Generally, the intermediate calculation result can be used to represent part or all of the information of the data to be processed.
In particular, the first neural network model may be a graph neural network model.
Optionally, using the first neural network model to process the multiple data to be processed includes: using the first neural network model to process the multiple data to be processed and fifth association relationship information, where the fifth association relationship information is used to indicate at least one data group to be processed, and each data group to be processed includes two data to be processed that satisfy the a priori hypothesis.
A data group to be processed contains two data to be processed that have an association relationship; that is, an association relationship satisfying the a priori hypothesis exists between the two data to be processed in the data group. For example, if the data group to be processed is (data to be processed 1, data to be processed 2), then an association relationship satisfying the a priori hypothesis exists between data to be processed 1 and data to be processed 2. In other words, the multiple data to be processed, together with the fifth association relationship information reflecting the association relationships among them, are input into the first neural network model. According to the fifth association relationship information, the first neural network model can determine whether one piece of data influences another, and can reflect the degree of influence between data through the weight parameters within the first neural network model, thereby obtaining multiple first vectors that can reflect the relevance of the data, the multiple first vectors being in one-to-one correspondence with the multiple data to be processed.
A hypothesis refers to an explanation of a certain phenomenon made according to a presupposition, that is, a conjecture and explanation, proposed on the basis of known scientific facts and scientific principles, of the natural phenomenon under study and its regularity; after detailed classification, induction, and analysis of the data, a provisional but acceptable explanation is obtained.
Prior probability appears in Bayesian statistical inference and refers to the prior probability distribution of a random variable (usually referred to simply as the prior), that is, a probability distribution expressing one's belief about the variable before certain evidence is taken into account.
An a priori hypothesis refers to a prior probability distribution proposed over all hypotheses in the hypothesis space. Taking text data as an example, the multiple data to be processed may be multiple passages of text, where one passage may include multiple sentences. Normally, different passages express different topics; therefore, multiple sentences within one passage are strongly related, while sentences belonging to different passages are weakly related or unrelated. There may then be an a priori hypothesis such as: multiple sentences belonging to the same passage are associated with one another.
Taking picture data as an example, the multiple data to be processed may be multiple frames. Normally, the longer the time interval between two frames, the smaller the correlation between them; the shorter the time interval between two frames, the greater the correlation between them. There may then be an a priori hypothesis such as: two frames whose time interval is less than a preset threshold are associated. The preset threshold may be, for example, 8s.
Taking video data as an example, the multiple data to be processed may be multiple video segments, where the longer the interval between two video segments, the smaller the correlation between them, and the shorter the interval, the greater the correlation. There may then be an a priori hypothesis such as: two video segments whose minimum interval is less than a preset threshold are associated. The preset threshold may be, for example, 8s.
Taking audio data as an example, the multiple data to be processed may be multiple audio segments, where the longer the interval between two audio segments, the smaller the correlation between them, and the shorter the interval, the greater the correlation. There may then be an a priori hypothesis such as: two audio segments whose minimum interval is less than a preset threshold are associated. The preset threshold may be, for example, 8s.
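The following is a minimal sketch of such an interval-based a priori hypothesis, assuming that each piece of data carries a timestamp in seconds and using the 8s threshold from the examples above; the timestamps are illustrative.

```python
THRESHOLD_S = 8.0  # the preset threshold from the examples above

def associated_pairs(timestamps):
    """Return the index pairs (i, j) whose time interval is below the threshold."""
    pairs = []
    for i in range(len(timestamps)):
        for j in range(i + 1, len(timestamps)):
            if abs(timestamps[i] - timestamps[j]) < THRESHOLD_S:
                pairs.append((i, j))
    return pairs

print(associated_pairs([0.0, 3.0, 12.0, 14.5]))  # [(0, 1), (2, 3)]
```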
The fifth association relationship information may be a matrix. Compared with other information types, matrix operations are more convenient.
Optionally, the fifth association relationship information includes a first association relationship matrix. The vectors in the first dimension of the first association relationship matrix include multiple elements in one-to-one correspondence with the multiple data to be processed, and the vectors in the second dimension of the first association relationship matrix include multiple elements in one-to-one correspondence with the multiple data to be processed, where any element in the first association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension.
Assume the first association relationship matrix is P:

$$P = \begin{pmatrix} p_{1,1} & p_{2,1} & \cdots & p_{k,1} \\ p_{1,2} & p_{2,2} & \cdots & p_{k,2} \\ \vdots & \vdots & \ddots & \vdots \\ p_{1,k} & p_{2,k} & \cdots & p_{k,k} \end{pmatrix}$$
Here, P is a k×k matrix, where the i-th column corresponds to data to be processed i and the j-th row corresponds to data to be processed j; the element p_i,j in the i-th column and j-th row indicates whether an association relationship satisfying the a priori hypothesis exists between data to be processed i and data to be processed j. When an association relationship exists between data to be processed i and data to be processed j, the element p_i,j in the i-th column and j-th row may take the value 1, and when no association relationship exists between them, p_i,j may take the value 0. Alternatively, when an association relationship exists between data to be processed i and data to be processed j, p_i,j may take the value 0, and when no association relationship exists, p_i,j may take the value 1.
In one example, the matrix P^T obtained by transposing the matrix P is the same as the matrix P; that is, p_i,j = p_j,i. In this case, the association relationship between data to be processed i and data to be processed j may be non-directional.
In another example, the matrix P^T obtained by transposing the matrix P is different from the matrix P; that is, p_i,j ≠ p_j,i. In this case, the association relationship between data to be processed i and data to be processed j is directional. For example, p_i,j may indicate that an association relationship pointing from data to be processed i to data to be processed j exists between them, and p_j,i may indicate that an association relationship pointing from data to be processed j to data to be processed i exists between them. Alternatively, p_i,j may indicate an association relationship pointing from data to be processed j to data to be processed i, and p_j,i may indicate an association relationship pointing from data to be processed i to data to be processed j.
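The following is a minimal sketch of constructing a non-directional first association relationship matrix P from the data groups indicated by the fifth association relationship information, with 1 indicating an association; the pair list and the size k are illustrative.

```python
import numpy as np

k = 4
pairs = [(0, 1), (2, 3)]        # data groups satisfying the a priori hypothesis

P = np.zeros((k, k), dtype=int)
for i, j in pairs:
    P[i, j] = P[j, i] = 1       # non-directional: P equals its transpose

assert (P == P.T).all()         # p_i,j == p_j,i in this example
```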
There are at least two cases for the degree of association among the multiple data to be processed.
In one example, the multiple data to be processed consist of data to be processed 1 and several data to be processed associated with data to be processed 1, such as node 1, node 2, node 3, node 4, and node 6 shown in Figure 2, where an edge connects node 1 and node 2, an edge connects node 1 and node 3, an edge connects node 1 and node 4, and an edge connects node 1 and node 6.
In another example, the multiple data to be processed include data to be processed 1, several data to be processed associated with data to be processed 1, and several data to be processed not associated with data to be processed 1, such as node 1, node 4, node 5, and node 6 shown in Figure 2, where edges connect node 1 with node 4 and node 1 with node 6, edges connect node 5 with node 4 and node 5 with node 6, and no edge connects node 1 and node 5.
For the above two cases, there can be different ways of obtaining the multiple data to be processed.
In one example, multiple data to be processed are obtained, and according to the a priori hypothesis, it is determined whether an association relationship exists between any two of the multiple data to be processed.
In another example, one data to be processed is obtained, and according to the a priori hypothesis, the other data to be processed that have an association relationship with it are determined.
Optionally, obtaining the multiple data to be processed includes: obtaining target data, where the target data is one of the multiple data to be processed; and obtaining associated data that satisfies the a priori hypothesis with respect to the target data, where the multiple data to be processed include the associated data.
That is, the device executing the method 500 first obtains the target data, and then, according to the a priori hypothesis, introduces the associated data related to the target data.
Taking text data as an example, the target data may be sentence 1. When the a priori hypothesis is that multiple sentences belonging to the same passage are associated, the other sentences in the passage containing sentence 1, apart from sentence 1 itself, are introduced as the associated data.
Taking picture data as an example, the target data may be frame 1 in a video. When the a priori hypothesis is that two frames whose time interval is less than 8s are associated, the frames whose interval from frame 1 is less than 8s are taken as the associated data.
Taking video data as an example, the target data may be video segment 1. When the a priori hypothesis is that two video segments whose minimum interval is less than 8s are associated, the video segments whose minimum interval from video segment 1 is less than 8s are taken as the associated data.
Taking audio data as an example, the target data may be audio segment 1. When the a priori hypothesis is that two audio segments whose minimum interval is less than 8s are associated, the audio segments whose minimum interval from audio segment 1 is less than 8s are taken as the associated data.
In the above examples, a time interval of 8s is used as an example for obtaining the associated data. Those skilled in the art can understand that the above time interval can be adjusted according to different scenarios.
In addition, in order to reduce the dependence of the neural network model on the data to be trained, the first neural network model can be trained using general data. The so-called general data may be data that is not affected by the scenario, or data with low dependence on the scenario. For example, if the first neural network model is used to recognize character features in images, its training data set can include various possible scenarios, such as street scenes, conference scenes, in-vehicle scenes, rural scenes, Asian scenes, African scenes, and European and American scenes. The multiple data to be processed may be data applied in a specific scenario. That is, the first neural network model, which is capable of processing general data, can be used to process special data.
The process of training the first neural network model may be as follows: general data is input into the first neural network model, and the first neural network model performs data processing operations on the general data, such as feature screening and feature fusion, to obtain a feature vector. A matrix operation is performed on the feature vector and a weight matrix containing the weight parameters to obtain the data training result corresponding to the general data. The distance between the data training result and the label of the general data is then calculated, so as to correct the weight parameters of the first neural network model. The distance between the data training result and the label of the general data can be understood as the degree of similarity between the two. The specific calculation method of the information distance can be cross entropy, KL divergence, JS divergence, or the like.
Exemplarily, in order to obtain a large amount of picture training data, data is often collected in the form of video during the data collection process, and the training data can be labeled, so as to obtain the labeled data required by the training process. The specific labeling process and the interpretation of labels are common technical content in the field of deep learning and will not be repeated in the embodiments of the present application.
When the data training result is the recognition result of the general data, the distance between the data training result and the label of the general data can be obtained according to the recognition result. For example, the recognition result of general data 1 is: the confidence that general data 1 possesses feature 1 is 0.7, and the confidence that general data 1 possesses feature 2 is 0.3. The label of general data 1 is label 1, and label 1 corresponds to feature 1. Then, the recognition result of general data 1 can be represented by (0.7, 0.3), and the label of general data 1 can be represented by (1, 0). The distance between the data training result and the label of the general data may be the distance between the vector (0.7, 0.3) and the vector (1, 0).
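The following is a minimal sketch of computing the distance between the recognition result (0.7, 0.3) and the label (1, 0) using cross entropy, one of the distance measures named above.

```python
import numpy as np

pred = np.array([0.7, 0.3])   # recognition result of general data 1
label = np.array([1.0, 0.0])  # one-hot label; label 1 corresponds to feature 1

cross_entropy = -np.sum(label * np.log(pred))
print(cross_entropy)          # -log(0.7) ~= 0.357; a smaller value means closer to the label
```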
When the data training result is an intermediate calculation result, the label of the general data may be a vector with the same dimension as the intermediate calculation result, and the distance between the data training result and the label of the general data can be obtained through vector calculation.
503. Obtain first association relationship information, where the first association relationship information is used to indicate at least one first vector group, and each first vector group includes two first vectors that satisfy the a priori hypothesis.
That is, the first association relationship information reflects whether association relationships exist among the multiple first vectors. A first vector group contains two first vectors that have an association relationship; that is, an association relationship satisfying the a priori hypothesis exists between the two first vectors in the group. For example, if a first vector group indicates (first vector 1, first vector 2), then an association relationship satisfying the a priori hypothesis exists between first vector 1 and first vector 2. The first association relationship information reflects whether the multiple first vectors influence one another, so that a data processing result reflecting the relevance of the data can be obtained according to the first association relationship information. It should be understood that a first vector may have an association relationship with itself.
In one example, since the multiple first vectors are in one-to-one correspondence with the multiple data to be processed, the first association relationship information may be determined according to the association relationships among the multiple data to be processed. That is, the first association relationship information is the same as or substantially the same as the fifth association relationship information above.
In another example, the first association relationship information differs from the fifth association relationship information above. For example, whether an association relationship exists between any two of the multiple first vectors can be determined according to the similarity between the two: the greater the similarity, the greater the association; the smaller the similarity, the smaller the association. The a priori hypothesis corresponding to the first association relationship information may then be: when the similarity exceeds a preset value, it can be considered that an association relationship exists between the two first vectors; when the similarity does not exceed the preset value, it can be considered that no association relationship exists between them.
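The following is a minimal sketch of such a similarity-based a priori hypothesis, assuming cosine similarity as the measure and 0.8 as the preset value; both choices are illustrative, not fixed by the text.

```python
import numpy as np

PRESET = 0.8  # assumed preset value

def associated(v1, v2):
    """Whether two first vectors are associated under the similarity hypothesis."""
    sim = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return sim > PRESET

print(associated(np.array([1.0, 0.1]), np.array([0.9, 0.2])))  # True
print(associated(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # False
```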
The first association relationship information can be reflected through a graph model. As shown in Figure 2, node 1, node 2, and node 3 may correspond to first vector 1, first vector 2, and first vector 3, respectively. An edge connects node 1 and node 2, so an association relationship exists between first vector 1 and first vector 2; an edge connects node 2 and node 3, so an association relationship exists between first vector 2 and first vector 3; no edge connects node 1 and node 3, so no association relationship exists between first vector 1 and first vector 3.
Optionally, the first association relationship information includes a second association relationship matrix. The vector in the first dimension of the second association relationship matrix includes multiple elements in one-to-one correspondence with the multiple first vectors, and the vector in the second dimension of the second association relationship matrix includes multiple elements in one-to-one correspondence with the multiple first vectors, where any element of the second association relationship matrix is used to indicate whether the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
Assume the second association relationship matrix is Q:

$$Q = \begin{pmatrix} q_{1,1} & \cdots & q_{1,l} \\ \vdots & \ddots & \vdots \\ q_{l,1} & \cdots & q_{l,l} \end{pmatrix}$$
Here Q is an l×l matrix; the i-th column corresponds to first vector i, the j-th row corresponds to first vector j, and the element q_{i,j} in the i-th column and j-th row indicates whether first vector i and first vector j have an association relationship that satisfies the a priori hypothesis. When first vector i and first vector j are associated, q_{i,j} may take the value 1, and when they are not associated, q_{i,j} may take the value 0. Alternatively, q_{i,j} may take the value 0 when first vector i and first vector j are associated, and the value 1 when they are not.
In one example, the matrix Q^T obtained by transposing Q is identical to Q, that is, q_{i,j} = q_{j,i}. In this case the association relationship between first vector i and first vector j may be non-directional.
In another example, the matrix Q^T obtained by transposing Q differs from Q, that is, q_{i,j} ≠ q_{j,i}. In this case the association relationship between first vector i and first vector j is directional. For example, q_{i,j} may indicate an association relationship pointing from first vector i to first vector j, while q_{j,i} indicates an association relationship pointing from first vector j to first vector i; or, conversely, q_{i,j} may indicate an association relationship pointing from first vector j to first vector i, while q_{j,i} indicates an association relationship pointing from first vector i to first vector j.
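As a minimal sketch of assembling such a matrix Q from the indicated first vector groups, adopting the first convention above (1 for an associated pair, 0 otherwise) and marking each vector as associated with itself; the function name and the chosen 0/1 convention are illustrative.

```python
import numpy as np

def build_association_matrix(num_vectors, vector_groups, directed=False):
    """Build the l x l second association relationship matrix Q from the
    first vector groups indicated by the association information."""
    Q = np.zeros((num_vectors, num_vectors), dtype=np.int8)
    np.fill_diagonal(Q, 1)          # a vector may be associated with itself
    for i, j in vector_groups:      # each group holds two associated vectors
        Q[i, j] = 1
        if not directed:            # non-directional case: Q equals its transpose
            Q[j, i] = 1
    return Q
```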
To avoid the computational difficulty caused by an excessively large matrix, the second association relationship matrix can be compressed to obtain a matrix of smaller dimensions.
In one example, suppose the second association relationship matrix Q is an l×l matrix and every element of Q that is more than l' elements away from the diagonal of Q takes the value 0, or every such element takes the value 1, with l' < l. Then Q can be divided into several small matrices, each with at most l' rows and at most l' columns. This process may also be called sparsifying the second association relationship matrix Q.
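A rough sketch of this sparsification under the all-zero variant of the assumption (entries more than l' positions off the diagonal are zero): the banded matrix is kept as small blocks of at most l' rows and l' columns that together cover the band. The block layout is one plausible reading, since the text does not fix it.

```python
import numpy as np

def sparsify_banded(Q, band):
    """Split an l x l matrix whose entries vanish more than `band` positions
    off the diagonal into blocks of at most band x band that jointly cover
    every possibly non-zero entry."""
    l = Q.shape[0]
    n_blocks = (l + band - 1) // band
    blocks = {}
    for bi in range(n_blocks):
        for bj in range(n_blocks):
            if abs(bi - bj) <= 1:      # only blocks touching the diagonal band
                i0, j0 = bi * band, bj * band
                blocks[(bi, bj)] = Q[i0:i0 + band, j0:j0 + band]
    return blocks
```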
In another example, if the second association relationship matrix Q cannot be sparsified, it can be compressed according to a spectral clustering method.
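The text does not detail this spectral clustering compression, so the following is only one plausible reading: cluster the nodes using Q as a precomputed affinity, then record, per pair of clusters, whether any association links them. The use of scikit-learn's SpectralClustering and the cluster count are assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def compress_by_spectral_clustering(Q, n_clusters):
    """Compress an l x l association matrix to n_clusters x n_clusters by
    grouping nodes and recording whether any association links two groups."""
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(Q)
    compressed = np.zeros((n_clusters, n_clusters), dtype=np.int8)
    for a in range(n_clusters):
        for b in range(n_clusters):
            compressed[a, b] = int(Q[np.ix_(labels == a, labels == b)].any())
    return compressed
```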
It should be understood that the a priori hypothesis may indicate a forward association relationship or a reverse association relationship. For example, because the content of picture frames is usually more related the shorter the interval between them, an a priori hypothesis indicating that picture frames within 8s of each other are associated can be understood as indicating a forward association relationship, while an a priori hypothesis indicating that picture frames more than 8s apart are associated can be understood as indicating a reverse association relationship.
504. Input the multiple first vectors and the first association relationship information into the second neural network model to obtain a processing result for first to-be-processed data, where the first to-be-processed data is any one of the multiple pieces of to-be-processed data.
In other words, the output of the first neural network model and the association relationships within that output are input into the second neural network model. Inputting the multiple first vectors into the second neural network model can be understood as inputting the feature representations of the multiple pieces of to-be-processed data; inputting the first association relationship information can be understood as inputting, for any two first vectors, information about whether one influences the other. The multiple first vectors can be understood as the nodes of a graph model, and the first association relationship information can be used to indicate whether an edge exists between two nodes. The second neural network model may therefore be a graph neural network model.
When the second neural network model processes the multiple first vectors and the first association relationship information, it may determine, according to its weight parameters, whether any two first vectors influence each other and to what degree, thereby obtaining the processing result for the first to-be-processed data. The processing result may be a feature representation of the first to-be-processed data or a recognition result for it, and it may be a vector.
Assume the multiple first vectors are l first vectors, denoted x_1, …, x_l, where each x_i, 1 ≤ i ≤ l, is an s-dimensional vector with components x_{i,t}, 1 ≤ t ≤ s:

$$x_i = (x_{i,1}, \ldots, x_{i,t}, \ldots, x_{i,s})$$

Combining the multiple first vectors then yields the matrix X = {x_1, …, x_i, …, x_l}. Assume the first association relationship information is the second association relationship matrix Q mentioned above.
First assume h weight matrices to be trained, W_1, W_2, …, W_h, each of dimension s × s_h, meaning that each of W_1, W_2, …, W_h contains s·s_h weight parameters. Here s_h = s/h, where h denotes the number of heads of the graph attention neural network (the number of heads may also be called the number of slices), and s_h is usually called the single-head dimension.
Then compute U_1 = X·W_1, U_2 = X·W_2, …, U_h = X·W_h. Clearly, the dimensions of U_1, U_2, …, U_h are all l × s_h.
Next compute V_{i,j} = U_i·U_j^T, with i ≠ j, 1 ≤ i ≤ h, and 1 ≤ j ≤ h. The dimension of V_{i,j} is l × l. Then apply the Softmax function to each row of V_{i,j} to obtain normalized probabilities, yielding R_{i,j}. R_{i,j} is still an l × l matrix, and it can be understood as the matrix of mutual attention strengths between the points.
After that, multiply R_{i,j} and Q element-wise to obtain E_{i,j}, the result of masking with the Q relation. E_{i,j} can be understood as screening out the associated points according to the edge relationships: the attention between associated points is retained, while the attention of unrelated points is not retained. This matrix contains a large amount of information about how the nodes relate to one another, so its information content is relatively rich. Then E_{i,j}·U_i gives the final expression U_{i,new} of each point after it has been updated with the information of the other points. The dimension of U_{i,new} is l × s_h.
Finally, U_{1,new}, …, U_{i,new}, …, U_{h,new} are concatenated to obtain the matrix X' = {U_{1,new}, …, U_{i,new}, …, U_{h,new}}, whose dimension is l × s. It can be seen that X' contains both the information about how the nodes relate to one another and the weight parameters.
The above is the data processing procedure of one network layer. If the depth of the graph attention neural network model is h', that is, it comprises h' layers, then the X' output by the current layer can be input into the next layer; in other words, the X' output by the current layer is treated as the X of the next layer, which performs the same or a similar data processing procedure.
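The per-layer computation just described can be sketched in a few lines of Python. This is only a sketch under stated assumptions: it uses the common per-head form V = U_i·U_i^T (each head attending within itself) rather than the cross-head pairing V_{i,j} = U_i·U_j^T with i ≠ j written above, and the function names and random initialization are illustrative.

```python
import numpy as np

def softmax(rows):
    e = np.exp(rows - rows.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def graph_attention_layer(X, Q, weights):
    """One layer of the masked multi-head graph attention described above.
    X: (l, s) first vectors; Q: (l, l) 0/1 association mask;
    weights: list of h matrices, each (s, s_h) with s_h = s // h."""
    heads = []
    for W in weights:
        U = X @ W                  # (l, s_h) per-head projection
        V = U @ U.T                # (l, l) raw attention scores
        R = softmax(V)             # row-wise normalized attention strengths
        E = R * Q                  # keep attention only on associated points
        heads.append(E @ U)        # (l, s_h) updated expression U_new
    return np.concatenate(heads, axis=1)   # (l, s): the matrix X'

# Usage: l = 4 nodes, s = 8 features, h = 2 heads; stack layers by
# feeding the returned X' back in as the next layer's X.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Q = np.eye(4, dtype=int); Q[0, 1] = Q[1, 0] = 1
Ws = [rng.normal(size=(8, 4)) for _ in range(2)]
X_new = graph_attention_layer(X, Q, Ws)
```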
Compared with X, X' has the same matrix size, but every element of X' contains the information of one or more elements of X. By integrating associated data, the second neural network model can draw on a larger amount of information when recognizing a given feature, improving recognition accuracy. A matrix operation between the matrix X' and a weight parameter matrix then yields the processing result for the first to-be-processed data.
In one example, the multiple pieces of to-be-processed data include the first to-be-processed data, which may be the target data mentioned above, and also include one or more pieces of associated data related to the first to-be-processed data. According to the first association relationship information, the second neural network model can take into account the influence of the associated data on the first to-be-processed data and thereby obtain the processing result corresponding to the first to-be-processed data. In other words, besides extracting features from the first to-be-processed data, the second neural network model also extracts features from the other to-be-processed data associated with it, which expands the amount of data fed into the prediction process and helps improve recognition accuracy.
In another example, the multiple pieces of to-be-processed data include the first to-be-processed data, which may correspond to a target vector; the multiple first vectors further include one or more associated vectors related to the target vector, and the multiple pieces of to-be-processed data include the pieces of to-be-processed data in one-to-one correspondence with those associated vectors. According to the first association relationship information, the second neural network model can take into account the influence of the associated vectors on the target vector and thereby obtain the processing result corresponding to the first to-be-processed data. In other words, besides extracting features from the target vector, the second neural network model also extracts features from the associated vectors related to the target vector, which expands the amount of data processed during prediction and helps improve recognition accuracy.
In addition, the second neural network model may output multiple processing results in one-to-one correspondence with the multiple pieces of to-be-processed data. That is, the second neural network model synthesizes the multiple first vectors and the association relationships among them, and outputs multiple processing results in one-to-one correspondence with the multiple pieces of to-be-processed data.
Consider a scenario in which a first association relationship exists between first vector A and first vector B, and a second association relationship exists between first vector A and first vector C. The closeness of these two association relationships may be the same or different. For example, two sentences far apart within the same paragraph are less closely associated, while two sentences close together within the same paragraph are more closely associated. Likewise, two frames separated by a long interval are less closely associated, while two frames separated by a short interval are more closely associated. There are several ways to express how close two association relationships are.
In one example, the first association relationship information is a matrix, and the numerical value of each element expresses the closeness of the association relationship: the larger the value, the closer the association. However, determining the specific value often introduces superfluous manual settings or increases the difficulty of training the neural network model.
In another example, when the first association relationship information contains both closely associated and distantly associated first vector groups, second association relationship information can be established to indicate the closely associated first vector groups. In other words, the degree of influence between two closely associated first vectors can be reinforced through the second association relationship information.
Optionally, the first association relationship information is used to indicate N of the first vector groups, N being an integer greater than 1. Before the multiple first vectors and the first association relationship information are input into the second neural network model to obtain the processing result for the first to-be-processed data, the method further includes: obtaining second association relationship information, where the second association relationship information is used to indicate n second vector groups, the n second vector groups belong to the N first vector groups, n is less than N, and n is a positive integer. Inputting the multiple first vectors and the first association relationship information into the second neural network model to obtain the processing result for the first to-be-processed data then includes: inputting the multiple first vectors, the first association relationship information, and the second association relationship information into the second neural network model to obtain the processing result for the first to-be-processed data.
The information indicated by the second association relationship information is contained in the first association relationship information. That is, the two first vectors in each second vector group necessarily have an association relationship that satisfies the a priori hypothesis.
Assuming the first association relationship information is the same as, or substantially the same as, the fifth association relationship information above, the first association relationship information can reflect the association relationships among the multiple pieces of to-be-processed data, and the second association relationship information can reflect whether a close association relationship exists among them.
Taking text data as an example, when the a priori hypothesis is that multiple sentences belonging to the same paragraph are associated, the first association relationship information may indicate that different sentences within the same paragraph are associated, and the second association relationship information may indicate that adjacent sentences within the same paragraph are closely associated.
Taking picture data as an example, when the a priori hypothesis is that two frames less than 8s apart are associated, the first association relationship information may indicate that two frames less than 8s apart are associated, and the second association relationship information may indicate that two frames less than 2s apart are closely associated.
Taking video data as an example, when the a priori hypothesis is that two video segments whose minimum interval is less than 8s are associated, the first association relationship information may indicate that two video segments whose minimum interval is less than 8s are associated, and the second association relationship information may indicate that two video segments whose minimum interval is less than 2s are closely associated.
Taking audio data as an example, when the a priori hypothesis is that two audio segments whose minimum interval is less than 8s are associated, the first association relationship information may indicate that two audio segments whose minimum interval is less than 8s are associated, and the second association relationship information may indicate that two audio segments whose minimum interval is less than 2s are closely associated.
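A minimal sketch of these interval-based hypotheses, assuming each frame or segment carries a timestamp in seconds; the 8s and 2s thresholds follow the examples above, and the function name and pair representation are illustrative.

```python
def interval_association(timestamps, loose=8.0, tight=2.0):
    """Build the first association info (pairs closer than `loose` seconds)
    and the second association info (pairs closer than `tight` seconds)."""
    first_groups, second_groups = [], []
    n = len(timestamps)
    for i in range(n):
        for j in range(i + 1, n):
            gap = abs(timestamps[i] - timestamps[j])
            if gap < loose:
                first_groups.append((i, j))   # associated
            if gap < tight:
                second_groups.append((i, j))  # closely associated
    return first_groups, second_groups
```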
Assuming the first association relationship information differs from the fifth association relationship information above, the first association relationship information can reflect the similarity between the multiple first vectors, and the second association relationship information can reflect which pairs of first vectors have a relatively high similarity.
For example, when the a priori hypothesis is that the similarity between two first vectors exceeds a preset value, the first association relationship information may indicate that two first vectors whose similarity exceeds preset value 1 are associated, and the second association relationship information may indicate that two first vectors whose similarity exceeds preset value 2 are associated, where preset value 2 is greater than preset value 1.
It should be understood that, like the first association relationship information, the second association relationship information may contain a matrix representing the n second vector groups.
It should be understood that the first neural network model and the second neural network model may be two sub-models within a single neural network model.
The method of training the second neural network model and obtaining its weight parameters is described in detail below with reference to Figure 7. Method 600 may be performed by the training device 120 shown in Figure 3.
601. Obtain multiple pieces of data to be trained.
Data to be trained can be understood as data that is about to be input into a neural network model and used to train it. Some or all of the multiple pieces of data to be trained carry labels. Processing the data to be trained with the neural network model yields a data processing result, and by computing the distance between the label and that result, the weight parameters of the neural network model can be corrected. The distance between the data processing result and the label can be understood as the degree of similarity between them. The information distance can be computed by, for example, cross entropy, KL divergence, or JS divergence.
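As a minimal sketch of these information distances, treating the data processing result and the label as probability distributions; the NumPy implementation and the epsilon guard are illustrative assumptions.

```python
import numpy as np

def cross_entropy(label, output, eps=1e-12):
    """Cross entropy between the label distribution and the model output."""
    return -np.sum(label * np.log(output + eps))

def kl_divergence(label, output, eps=1e-12):
    """KL divergence from the model output to the label distribution."""
    return np.sum(label * np.log((label + eps) / (output + eps)))
```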
The data to be trained can be text data, image data, video data, audio data, and so on, for example a text file, a passage of text within a text file, a picture file, an image block within a picture file, a frame within a video file, a video file, a video segment within a video file, an audio file, or an audio segment within an audio file. The multiple pieces of data to be trained can be multiple text files, multiple passages of text within one text file, multiple picture files, multiple image blocks within one picture file, multiple frames within one video file, multiple video files, multiple video segments within one video file, multiple audio files, multiple audio segments within one audio file, and so on. This application does not limit the type of the data to be trained.
The data to be trained can be obtained in several ways. In one example, the multiple pieces of data to be trained are stored in a database, so the device executing method 600 can retrieve them directly from the database. In another example, the device executing method 600 is equipped with a camera, so the multiple pieces of data to be trained can be obtained by shooting with the camera. In another example, the multiple pieces of data to be trained are stored on a cloud device, so the device executing method 600 can receive them from the cloud device over a communication network.
602. Use the first neural network model to process the multiple pieces of data to be trained, obtaining multiple fourth vectors in one-to-one correspondence with the multiple pieces of data to be trained.
The multiple pieces of data to be trained here may be general data.
Input data to be trained 1 into the first neural network model to obtain fourth vector 1; input data to be trained 2 into the first neural network model to obtain fourth vector 2.
The third association relationship information is used to indicate association relationships between the data. Assume the third vector group indicated by the third association relationship information includes (fourth vector 1, fourth vector 2); then an association relationship exists between fourth vector 1 and fourth vector 2.
Inputting fourth vector 1 and the third association relationship information into the second neural network model yields first processing result 1. In this way, at least the influence and contribution of data to be trained 2 to data to be trained 1 can be obtained.
That is, the multiple pieces of data to be trained are input into the first neural network model, which performs processing operations on them such as feature screening (filtering out the useful features) and feature fusion (merging multiple features), and outputs multiple fourth vectors in one-to-one correspondence with the multiple pieces of data to be trained. Taking the convolutional neural network shown in Figure 1 as an example, processing the multiple pieces of data to be trained may consist of inputting them at the input layer, performing data processing through hidden layers such as convolutional and/or pooling layers, and outputting, from the output layer of the first neural network model, the multiple fourth vectors in one-to-one correspondence with the multiple pieces of data to be trained. A fourth vector may be a single number or a vector containing multiple numbers.
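As a minimal sketch of such a first model, assuming PyTorch and image inputs, the following encoder maps each input to an s-dimensional fourth vector; the layer widths, the class name, and the default s = 8 are illustrative choices, not values specified by this application.

```python
import torch
import torch.nn as nn

class FirstModel(nn.Module):
    """Convolutional feature extractor: image -> s-dimensional fourth vector."""
    def __init__(self, s=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),      # pool to a single spatial position
        )
        self.proj = nn.Linear(32, s)      # weight matrix producing the vector

    def forward(self, x):                 # x: (batch, 3, H, W)
        return self.proj(self.features(x).flatten(1))   # (batch, s)
```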
In one example, the first neural network model is a neural network model to be trained. The first neural network model can perform data processing operations on the multiple pieces of data to be trained, such as feature screening and feature fusion, to obtain feature vectors, then perform a matrix operation between the feature vectors and a weight matrix containing weight parameters to obtain the multiple fourth vectors in one-to-one correspondence with the multiple pieces of data to be trained. The multiple fourth vectors are used to correct the weight parameters of the first neural network model: for example, the distance between a fourth vector and the labels of the multiple pieces of data to be trained can be computed and combined with a loss function to correct the weight parameters of the first neural network model.
In another example, the first neural network model is a neural network model that has already been trained.
To reduce the neural network model's dependence on the data to be trained, the first neural network model can be trained with general data. General data means data unaffected by the scene, or data with low dependence on the scene. For example, if the first neural network model is used to recognize person features in images, its training data set can include all kinds of scenes that may occur, such as street scenes, conference scenes, in-vehicle scenes, rural scenes, Asian scenes, African scenes, and European and American scenes. The multiple pieces of data to be trained may then be data applied within a specific scene. In other words, the first neural network model capable of processing general data is migrated to a particular scene, and through neural network model training a second neural network model capable of processing that particular scene is obtained.
The process of training the first neural network model may be as follows: general data is input into the first neural network model; the first neural network model performs data processing operations on the general data such as feature screening and feature fusion to obtain feature vectors; a matrix operation between the feature vectors and a weight matrix containing weight parameters yields the data training result corresponding to the general data; the distance between the data training result and the label of the general data is then computed, and the weight parameters of the first neural network model are corrected. The distance between the data training result and the label of the general data can be understood as the degree of similarity between them. The information distance can be computed by, for example, cross entropy, KL divergence, or JS divergence.
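A minimal sketch of one weight-correction step in this training process, assuming PyTorch and reusing the hypothetical FirstModel sketched earlier in this section with an added classification head; the optimizer, the learning rate, and the two-class head are illustrative assumptions.

```python
import torch
import torch.nn as nn

# FirstModel is the hypothetical encoder sketched above; a linear head
# turns its s-dimensional feature vector into class scores.
model = nn.Sequential(FirstModel(s=8), nn.Linear(8, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()          # cross entropy as the information distance

def train_step(images, labels):
    """One weight correction: forward pass, distance to labels, backprop.
    images: (batch, 3, H, W) float tensor; labels: (batch,) class indices."""
    optimizer.zero_grad()
    outputs = model(images)              # data training results
    loss = loss_fn(outputs, labels)      # distance between results and labels
    loss.backward()                      # gradients of the loss
    optimizer.step()                     # correct the weight parameters
    return loss.item()
```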
When the data training result is a recognition result for the general data, the distance between the data training result and the label of the general data can be obtained from that recognition result. For example, the recognition result for general data 1 is: the confidence that general data 1 has feature 1 is 0.7, and the confidence that it has feature 2 is 0.3. The label of general data 1 is label 1, which corresponds to feature 1. The recognition result of general data 1 can then be represented by (0.7, 0.3), and its label by (1, 0). The distance between the data training result and the label of the general data may be the distance between the vector (0.7, 0.3) and the vector (1, 0).
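To make this concrete, a quick computation of two possible distances between the recognition result (0.7, 0.3) and the label (1, 0); which distance is actually used is a design choice, and the two below are only examples.

```python
import numpy as np

output = np.array([0.7, 0.3])    # recognition result of general data 1
label = np.array([1.0, 0.0])     # label 1, corresponding to feature 1

euclidean = np.linalg.norm(output - label)                 # about 0.424
cross_entropy = -np.sum(label * np.log(output + 1e-12))    # -log(0.7), about 0.357
```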
When the data training result is an intermediate calculation result, the label of the general data may be a vector with the same dimensions as that intermediate calculation result, and the distance between the data training result and the label of the general data can be obtained by vector computation.
The type of the first neural network model may be a convolutional neural network model, a graph neural network model, a graph convolutional neural network model, a graph attention neural network model, and so on. This application does not limit the type of the first neural network model.
In particular, the first neural network model may be a traditional convolutional neural network model. The output layer of a traditional convolutional neural network is a fully connected layer, which is sometimes called a classifier. That is, a traditional convolutional neural network model can feed the recognition result of the data to be trained into the loss function through the fully connected layer. For example, if the data to be trained is an image, the fully connected layer of a traditional convolutional neural network model can directly output recognition results such as whether a person is present in the image and whether the person is male or female. Such a recognition result can often only represent the probability that the data to be trained has a certain feature.
In particular, the first neural network model may also be a special convolutional neural network model that does not include a fully connected layer and can feed the calculation result of a convolutional layer or a pooling layer into the loss function. In other words, the first neural network model can feed into the loss function a processing result that, in a traditional convolutional neural network model, would be an intermediate calculation result. For brevity of description, the processing result that this special convolutional neural network model feeds into the loss function is called the intermediate calculation result. Usually, the intermediate calculation result can characterize some or all of the information of the data to be trained; that is, it usually contains more information than the recognition result.
In particular, the first neural network model may be a graph neural network model.
Optionally, using the first neural network model to process the multiple pieces of data to be trained includes: using the first neural network model to process the multiple pieces of data to be trained together with sixth association relationship information, where the sixth association relationship information is used to indicate at least one to-be-trained data group, each of which includes two pieces of data to be trained that satisfy the a priori hypothesis.
A to-be-trained data group contains two pieces of data to be trained that are associated with each other; that is, the two pieces of data to be trained in the group have an association relationship that satisfies the a priori hypothesis. For example, if a to-be-trained data group is (data to be trained 1, data to be trained 2), then data to be trained 1 and data to be trained 2 have an association relationship that satisfies the a priori hypothesis. In other words, the multiple pieces of data to be trained and the sixth association relationship information reflecting the association relationships among them are input into the first neural network model; according to the sixth association relationship information, the first neural network model can determine whether one piece of data influences another, and reflect the degree of influence between pieces of data through its weight parameters, thereby obtaining multiple first vectors that reflect the associations among the data and are in one-to-one correspondence with the multiple pieces of data to be trained.
Taking text data as an example, the multiple pieces of data to be trained may be multiple passages of text, each of which may include multiple sentences. Usually, different passages express different topics; therefore, the sentences within one passage are strongly related, while sentences belonging to different passages are weakly related or unrelated. An a priori hypothesis may then exist, for example, that multiple sentences belonging to the same paragraph are associated.
Taking picture data as an example, the multiple pieces of data to be trained may be multiple frames. Usually, as time passes, the longer the interval between two frames, the weaker the relation between them; the shorter the interval, the stronger the relation. An a priori hypothesis may then exist, for example, that two frames whose interval is less than a preset threshold are associated. The preset threshold may be, for example, 8s.
Taking video data as an example, the multiple pieces of data to be trained may be multiple video segments, where, as time passes, the longer the interval between two segments, the weaker the relation between them; the shorter the interval, the stronger the relation. An a priori hypothesis may then exist, for example, that two video segments whose minimum interval is less than a preset threshold are associated. The preset threshold may be, for example, 8s.
Taking audio data as an example, the multiple pieces of data to be trained may be multiple audio segments, where, as time passes, the longer the interval between two segments, the weaker the relation between them; the shorter the interval, the stronger the relation. An a priori hypothesis may then exist, for example, that two audio segments whose minimum interval is less than a preset threshold are associated. The preset threshold may be, for example, 8s.
The sixth association relationship information may be a matrix. Compared with other information types, matrix operations are more convenient.
Optionally, the sixth association relationship information includes a third association relationship matrix. The vector in the first dimension of the third association relationship matrix includes multiple elements in one-to-one correspondence with the multiple pieces of data to be trained, and the vector in the second dimension of the third association relationship matrix includes multiple elements in one-to-one correspondence with the multiple pieces of data to be trained, where any element of the third association relationship matrix is used to indicate whether the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
Assume the third association relationship matrix is A:

$$A = \begin{pmatrix} a_{1,1} & \cdots & a_{1,k} \\ \vdots & \ddots & \vdots \\ a_{k,1} & \cdots & a_{k,k} \end{pmatrix}$$
Here A is a k×k matrix; the i-th column corresponds to data to be trained i, the j-th row corresponds to data to be trained j, and the element a_{i,j} in the i-th column and j-th row indicates whether data to be trained i and data to be trained j have an association relationship that satisfies the a priori hypothesis. When data to be trained i and data to be trained j are associated, a_{i,j} may take the value 1, and when they are not associated, a_{i,j} may take the value 0. Alternatively, a_{i,j} may take the value 0 when they are associated, and the value 1 when they are not.
In one example, the matrix A^T obtained by transposing A is identical to A, that is, a_{i,j} = a_{j,i}. In this case the association relationship between data to be trained i and data to be trained j may be non-directional.
In another example, the matrix A^T obtained by transposing A differs from A, that is, a_{i,j} ≠ a_{j,i}. In this case the association relationship between data to be trained i and data to be trained j is directional. For example, a_{i,j} may indicate an association relationship pointing from data to be trained i to data to be trained j, while a_{j,i} indicates an association relationship pointing from data to be trained j to data to be trained i; or, conversely, a_{i,j} may indicate an association relationship pointing from data to be trained j to data to be trained i, while a_{j,i} indicates an association relationship pointing from data to be trained i to data to be trained j.
603. Obtain third association relationship information, where the third association relationship information is used to indicate at least one third vector group, each of which includes two fourth vectors that satisfy the a priori hypothesis.
In other words, the third association relationship information reflects whether an association relationship exists among the multiple fourth vectors. Each third vector group contains two fourth vectors that are associated with each other; that is, the two fourth vectors in the group have an association relationship that satisfies the a priori hypothesis. For example, if a third vector group indicates (fourth vector 1, fourth vector 2), then fourth vector 1 and fourth vector 2 have an association relationship that satisfies the a priori hypothesis. Because the third association relationship information reflects whether the multiple fourth vectors influence one another, a data processing result that reflects the associations among the data can be obtained according to the third association relationship information. It should be understood that a fourth vector may have an association relationship with itself.
In one example, because the multiple fourth vectors are in one-to-one correspondence with the multiple pieces of data to be trained, the third association relationship information may be determined according to the association relationships among the multiple pieces of data to be trained. In that case, the third association relationship information is the same as, or substantially the same as, the sixth association relationship information described above.
In another example, the third association relationship information differs from the sixth association relationship information described above. For example, whether an association relationship exists between any two of the multiple fourth vectors may be determined according to the similarity between those two fourth vectors: the greater the similarity, the stronger the association; the smaller the similarity, the weaker the association. The a priori hypothesis corresponding to the third association relationship information may then be that the two fourth vectors are considered associated when their similarity exceeds a preset value, and not associated when the similarity does not exceed the preset value.
The third association relationship information can be represented by a graph model. As shown in Figure 2, node 1, node 2, and node 3 may correspond to fourth vector 1, fourth vector 2, and fourth vector 3, respectively. An edge connects node 1 and node 2, so an association relationship exists between fourth vector 1 and fourth vector 2; an edge connects node 2 and node 3, so an association relationship exists between fourth vector 2 and fourth vector 3; no edge connects node 1 and node 3, so no association relationship exists between fourth vector 1 and fourth vector 3.
Optionally, the third association relationship information includes a fourth association relationship matrix. The vector in the first dimension of the fourth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple fourth vectors, and the vector in the second dimension of the fourth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple fourth vectors, where any element of the fourth association relationship matrix is used to indicate whether the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
Assume the fourth association relationship matrix is B:

$$B = \begin{pmatrix} b_{1,1} & \cdots & b_{1,l} \\ \vdots & \ddots & \vdots \\ b_{l,1} & \cdots & b_{l,l} \end{pmatrix}$$
Here B is an l×l matrix; the i-th column corresponds to fourth vector i, the j-th row corresponds to fourth vector j, and the element b_{i,j} in the i-th column and j-th row indicates whether fourth vector i and fourth vector j have an association relationship that satisfies the a priori hypothesis. When fourth vector i and fourth vector j are associated, b_{i,j} may take the value 1, and when they are not associated, b_{i,j} may take the value 0. Alternatively, b_{i,j} may take the value 0 when fourth vector i and fourth vector j are associated, and the value 1 when they are not.
In one example, the matrix B^T obtained by transposing B is identical to B, that is, b_{i,j} = b_{j,i}. In this case the association relationship between fourth vector i and fourth vector j may be non-directional.
In another example, the matrix B^T obtained by transposing B differs from B, that is, b_{i,j} ≠ b_{j,i}. In this case the association relationship between fourth vector i and fourth vector j is directional. For example, b_{i,j} may indicate an association relationship pointing from fourth vector i to fourth vector j, while b_{j,i} indicates an association relationship pointing from fourth vector j to fourth vector i; or, conversely, b_{i,j} may indicate an association relationship pointing from fourth vector j to fourth vector i, while b_{j,i} indicates an association relationship pointing from fourth vector i to fourth vector j.
To avoid the computational difficulty caused by an excessively large matrix, the fourth association relationship matrix can be compressed to obtain a matrix of smaller dimensions.
In one example, suppose the fourth association relationship matrix B is an l×l matrix and every element of B that is more than l' elements away from the diagonal of B takes the value 0, or every such element takes the value 1, with l' < l. Then B can be divided into several small matrices, each with at most l' rows and at most l' columns. This process may also be called sparsifying the fourth association relationship matrix B.
In another example, if the fourth association relationship matrix B cannot be sparsified, it can be compressed according to a spectral clustering method.
It should be understood that the a priori hypothesis may indicate a forward association relationship or a reverse association relationship. For example, because the content of picture frames is usually more related the shorter the interval between them, an a priori hypothesis indicating that picture frames within 8s of each other are associated can be understood as indicating a forward association relationship, while an a priori hypothesis indicating that picture frames more than 8s apart are associated can be understood as indicating a reverse association relationship.
604. Input the multiple fourth vectors and the third association relationship information into the second neural network model to obtain a first processing result for first data to be trained, where the first data to be trained is any one of the multiple pieces of data to be trained, and the first processing result is used to correct the weight parameters of the second neural network model.
In other words, the output of the first neural network model and the association relationships within that output are input into the second neural network model. Inputting the multiple fourth vectors into the second neural network model can be understood as inputting the feature representations of the multiple pieces of data to be trained; inputting the third association relationship information can be understood as inputting, for any two fourth vectors, information about whether one influences the other. The multiple fourth vectors can be understood as the nodes of a graph model, and the third association relationship information can be used to indicate whether an edge exists between two nodes. The second neural network model may therefore be a graph neural network model.
When the second neural network model processes the multiple fourth vectors and the third association relationship information, it may determine, according to its weight parameters, whether any two fourth vectors influence each other and to what degree, thereby obtaining the processing result for the first data to be trained. The processing result may be a feature representation of the first data to be trained or a recognition result for it, and it may be a vector.
Assume the multiple fourth vectors are l fourth vectors, denoted y_1, …, y_l, where each y_i, 1 ≤ i ≤ l, is an s-dimensional vector with components y_{i,t}, 1 ≤ t ≤ s:

$$y_i = (y_{i,1}, \ldots, y_{i,t}, \ldots, y_{i,s})$$

Combining the multiple fourth vectors then yields the matrix Y = {y_1, …, y_i, …, y_l}. Assume the third association relationship information is the fourth association relationship matrix B mentioned above.
First, assume h weight matrices to be trained, W_1, W_2, …, W_h. The dimensions of W_1, W_2, …, W_h are all s×s_h, meaning that each of W_1, W_2, …, W_h contains s×s_h weight parameters. Here s_h = s/h, where h denotes the number of heads of the graph attention neural network (the number of heads may also be called the number of slices), and s_h is usually called the single-head dimension.
Then compute U_1 = Y·W_1, U_2 = Y·W_2, …, U_h = Y·W_h. Clearly, the dimensions of U_1, U_2, …, U_h are all l×s_h.
Next compute V_{i,j} = U_i·U_j^T, where i ≠ j, 1 ≤ i ≤ h, and 1 ≤ j ≤ h. The dimension of V_{i,j} is then l×l. Applying the Softmax function to each row of V_{i,j} to obtain normalized probabilities yields R_{i,j}. R_{i,j} is still an l×l matrix, and can be understood as the matrix of mutual attention strengths between the points.
After that, R_{i,j} and Q are multiplied element-wise, giving E_{i,j}, i.e., R_{i,j} masked by the relationships in Q. E_{i,j} can be understood as selecting the associated points according to the edge relationships: the attention between associated points is kept, while the attention between unassociated points is not. This matrix contains a large amount of inter-node association information and is therefore information-rich. Then E_{i,j}·U_i gives the final expression U_inew of each point after it has been updated with information from the other points. The dimension of U_inew is l×s_h.
Finally, U_1new, …, U_inew, …, U_hnew are concatenated to obtain the matrix Y', Y' = {U_1new, …, U_inew, …, U_hnew}, whose dimension is l×s. It can be seen that Y' contains both the information about the associations between nodes and the weight parameters.
The above process is the data processing of one network layer. If the depth of the graph attention neural network model is h', i.e., it includes h' layers, the Y' output by the current layer can be input into the next layer — that is, the Y' output by the current layer is treated as the Y of the next layer — and the same or a similar data processing process is performed.
It can be seen that Y' has the same matrix size as Y, but each element of Y' contains information from one or more elements of Y. By integrating associated data, the second neural network model can obtain more information when recognizing a feature, improving recognition accuracy.
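The per-layer computation just described can be summarized in the following Python sketch. This is a minimal reading of the text, not a reference implementation: the description leaves the pairing of heads (i, j) open, so head i is paired with head j = (i+1) mod h here as an assumption, and the trainable matrices are random stand-ins:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(Y, Q, W):
    """One layer as described above: Y is l x s, Q is the l x l
    association mask, W is a list of h matrices, each s x (s // h)."""
    h = len(W)
    U = [Y @ W_i for W_i in W]           # U_i = Y . W_i, each l x s_h
    outs = []
    for i in range(h):
        j = (i + 1) % h                  # assumed pairing of heads i and j
        V = U[i] @ U[j].T                # V_{i,j}, raw attention, l x l
        R = softmax(V, axis=1)           # row-wise normalized probabilities
        E = R * Q                        # mask: keep associated pairs only
        outs.append(E @ U[i])            # U_inew, l x s_h
    return np.concatenate(outs, axis=1)  # Y' = {U_1new, ..., U_hnew}, l x s

rng = np.random.default_rng(0)
l, s, h = 6, 8, 2
Y = rng.normal(size=(l, s))
Q = (rng.random((l, l)) < 0.5).astype(float)
W = [rng.normal(size=(s, s // h)) for _ in range(h)]
Y_next = gat_layer(Y, Q, W)   # for an h'-layer model, feed Y_next back in
print(Y_next.shape)           # (6, 8): same size as Y
```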
In one example, the plurality of data to be trained includes first data to be trained and one or more pieces of associated data associated with that first data. According to the third association relationship information, the second neural network model can take into account the influence of the associated data on the first data to be trained, so as to obtain the processing result corresponding to the first data to be trained. In other words, besides extracting features from the first data to be trained, the second neural network model also extracts features from the other data to be trained that are associated with it, which enlarges the amount of data fed into the prediction process and helps improve recognition accuracy.
In one example, the plurality of data to be trained includes first data to be trained, which may correspond to a target vector; the plurality of fourth vectors further includes one or more associated vectors associated with the target vector, and the plurality of data to be trained includes the data to be trained in one-to-one correspondence with those associated vectors. According to the third association relationship information, the second neural network model can take into account the influence of the associated vectors on the target vector, so as to obtain the processing result corresponding to the first data to be trained. In other words, besides extracting features from the target vector, the second neural network model also extracts features from the associated vectors that are related to it, which enlarges the amount of data processed during prediction and helps improve recognition accuracy.
In addition, the second neural network model may output multiple processing results in one-to-one correspondence with the plurality of data to be trained. That is, the second neural network model synthesizes the plurality of fourth vectors and the association relationships among them, and outputs multiple processing results, one for each data to be trained.
Consider a scenario in which there is a first association relationship between fourth vector A and fourth vector B, and a second association relationship between fourth vector A and fourth vector C. The closeness of these two association relationships may be the same or different. For example, two sentences far apart within the same paragraph are less closely associated, while two sentences close together within the same paragraph are more closely associated. Similarly, two picture frames separated by a long interval are less closely associated, while two frames separated by a short interval are more closely associated. The relative closeness of two association relationships can be expressed in multiple ways.
In one example, the third association relationship information is a matrix, and the magnitudes of the elements in the matrix indicate the closeness of the association: the larger the value, the closer the association. However, determining specific values often introduces extra manual settings, or makes the neural network model harder to train.
In one example, when the third association relationship information contains both closely associated and loosely associated fourth vector groups, fourth association relationship information can be established to indicate the closely associated fourth vector groups. In other words, the degree of mutual influence between two closely associated fourth vectors can be reinforced by the fourth association relationship information.
Optionally, the third association relationship information is used to indicate M fourth vector groups, M being an integer greater than 1. Before the inputting of the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: obtaining fourth association relationship information, where the fourth association relationship information is used to indicate m fifth vector groups, the m fifth vector groups belong to the M fourth vector groups, m is less than M, and m is a positive integer. The inputting of the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained then includes: inputting the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
The information indicated by the fourth association relationship information is contained in the third association relationship information. That is, between the two fourth vectors in each fourth vector group there must be an association relationship satisfying the a priori hypothesis.
Assuming the third association relationship information is the same as, or substantially the same as, the sixth association relationship information mentioned above, the third association relationship information can reflect the association relationships among the plurality of data to be trained, and the fourth association relationship information can reflect whether close association relationships exist among the data to be trained.
Taking text data as an example, when the a priori hypothesis is that sentences belonging to the same paragraph are associated, the third association relationship information may indicate that different sentences within the same paragraph are associated, and the fourth association relationship information may indicate that adjacent sentences within the same paragraph are closely associated.
Taking picture data as an example, when the a priori hypothesis is that two picture frames less than 8 s apart are associated, the third association relationship information may indicate that two frames less than 8 s apart are associated, and the fourth association relationship information may indicate that two frames less than 2 s apart are closely associated.
Taking video data as an example, when the a priori hypothesis is that two video segments whose minimum interval is less than 8 s are associated, the third association relationship information may indicate that two video segments whose minimum interval is less than 8 s are associated, and the fourth association relationship information may indicate that two video segments whose minimum interval is less than 2 s are closely associated.
Taking audio data as an example, when the a priori hypothesis is that two audio segments whose minimum interval is less than 8 s are associated, the third association relationship information may indicate that two audio segments whose minimum interval is less than 8 s are associated, and the fourth association relationship information may indicate that two audio segments whose minimum interval is less than 2 s are closely associated.
Assuming the third association relationship information differs from the sixth association relationship information mentioned above, the third association relationship information can reflect the similarity among the plurality of fourth vectors, and the fourth association relationship information can reflect pairs of fourth vectors with higher similarity.
For example, when the a priori hypothesis is that the similarity between two fourth vectors exceeds a preset value, the third association relationship information may indicate an association between two fourth vectors whose similarity exceeds preset value 1, and the fourth association relationship information may indicate an association between two fourth vectors whose similarity exceeds preset value 2, where preset value 2 is greater than preset value 1.
It should be understood that, similarly to the third association relationship information, the fourth association relationship information may include a matrix representing the m fifth vector groups.
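A minimal sketch of this two-level scheme, assuming frame timestamps and the 8 s / 2 s thresholds of the picture example above (the names are ours): by construction, the pairs marked by the fine matrix are a subset of those marked by the coarse one, matching the requirement that the m fifth vector groups belong to the M fourth vector groups.

```python
import numpy as np

def two_level_association(timestamps, coarse=8.0, fine=2.0):
    t = np.asarray(timestamps, dtype=float)
    gap = np.abs(t[:, None] - t[None, :])
    third = (gap < coarse).astype(np.int8)   # third info: the M groups
    fourth = (gap < fine).astype(np.int8)    # fourth info: the m groups
    assert np.all(fourth <= third)           # m groups lie within the M groups
    return third, fourth

third, fourth = two_level_association([0.0, 1.5, 5.0, 20.0])
print(third)   # pairs within 8 s are associated
print(fourth)  # only pairs within 2 s are closely associated
```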
After the first processing result for the first data to be trained is obtained, the weight parameters of the second neural network model can be corrected through a loss function.
In one example, a loss function can be used to correct the weight parameters of the second neural network model according to the distance between the label of the first data to be trained and the first processing result. For example, when the label of the first data to be trained and the first processing result are close (i.e., highly similar), the weight parameters are relatively suitable and the correction to the weight parameters is small; when the label and the first processing result are far apart (i.e., the similarity is low), the weight parameters are less suitable, and the correction to the weight parameters can be increased.
In one example, the plurality of fourth vectors and the third association relationship information are input into the second neural network model to obtain a first processing result for first data to be trained and a second processing result for second data to be trained, the first data to be trained and the second data to be trained being any two of the plurality of data to be trained; the similarity between the first processing result and the second processing result is used to correct the weight parameters of the second neural network model. For example, let the similarity between the first and second processing results be similarity 1, let the fourth vector corresponding to the first processing result be fourth vector 1 and the fourth vector corresponding to the second processing result be fourth vector 2, and let the similarity between fourth vector 1 and fourth vector 2 be similarity 2. When the difference between similarity 1 and similarity 2 is small, the weight parameters are relatively suitable and the correction to the weight parameters is small; when the difference between similarity 1 and similarity 2 is large, the weight parameters are less suitable, and the correction to the weight parameters can be increased.
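A small sketch of this consistency signal, assuming cosine similarity as the similarity measure (the text does not fix one):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def consistency_gap(result1, result2, vec1, vec2):
    """Gap between similarity 1 (between the two processing results) and
    similarity 2 (between the corresponding fourth vectors); a small gap
    suggests a small weight correction, a large gap a larger one."""
    return abs(cosine(result1, result2) - cosine(vec1, vec2))

rng = np.random.default_rng(1)
a, b = rng.normal(size=8), rng.normal(size=8)
print(consistency_gap(a, b, a, b))  # identical pairs give a gap of 0.0
```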
Optionally, the obtaining of the first processing result for the first data to be trained includes: obtaining the first processing result and a second processing result for second data to be trained, where the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two of the plurality of data to be trained. The method further includes: matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, the matching result being used to correct the weight parameters of the second neural network model.
The sixth association relationship information mentioned above may not include information about the similarity between the first label and the second label; that is, the association relationship between the first data to be processed and the second data to be processed may be independent of the similarity between the first label and the second label. The sixth association relationship information mentioned above can associate multiple pieces of data that may be related, increasing the amount of data processed by the second neural network model. The similarity between the first label and the second label is used to evaluate whether the first processing result and the second processing result are accurate.
Taking text data as an example, when the first label is "prose" and the second label is "argumentative essay", the similarity between the first and second processing results should be low. If the first processing result is "environmental governance" and the second processing result is "energy supply", and the similarity between the two processing results is high, this indicates that the weight parameters of the second neural network model are unsuitable, and a loss function can be used to correct them.
Taking picture data as an example, when the first label is "rabbit" and the second label is also "rabbit", the similarity between the first and second processing results should be high. If the first processing result is "long ears" and the second processing result is "short ears", the similarity between the two processing results is low, indicating that the weight parameters of the second neural network model may be unsuitable, and a loss function can be used to correct them.
Taking video data as an example, when the first label is "meeting" and the second label is "in-vehicle", the similarity between the first and second processing results should be low. If the first processing result is "project survey" and the second processing result is "road traffic", the similarity between the two processing results is low, indicating that the weight parameters of the second neural network model may be suitable, so the loss function corrects the weight parameters of the second neural network model only slightly.
Taking audio data as an example, when the first label is "insect sounds" and the second label is also "insect sounds", the similarity between the first and second processing results should be high. If the first processing result is "mosquito" and the second processing result is "fly", the similarity between the two processing results is high, indicating that the weight parameters of the second neural network model may be suitable, so the loss function corrects the weight parameters of the second neural network model only slightly.
A possible form of the loss function loss is given below.
(The specific form of the loss function is given in the original as an image: Figure PCTCN2019099653-appb-000020.)
Here, y_i' denotes the processing result i for data to be trained i, y_j' denotes the processing result j for data to be trained j, z_i denotes the label i of data to be trained i, and z_j denotes the label j of data to be trained j. The function C(y_i', y_j') denotes the similarity between processing result i and processing result j, and the function C(z_i, z_j) denotes the similarity between label i and label j. The matrix D may be a matrix used to amplify the similarity between processing result i and processing result j.
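Since the formula itself survives only as an image, the following LaTeX block is one plausible reconstruction consistent with the components described above; it is an assumption, not the patent's exact expression:

```latex
% Hypothetical reconstruction: penalize the mismatch between the
% similarity of two processing results and the similarity of their
% labels, with D amplifying the result-side similarity.
\[
  \mathrm{loss} \;=\; \sum_{i \neq j}
    \Bigl( C\bigl(y_i',\, y_j'\bigr) - C\bigl(z_i,\, z_j\bigr) \Bigr)^{2},
  \qquad
  C\bigl(y_i',\, y_j'\bigr) \;=\; {y_i'}^{\top} D \, y_j'.
\]
```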
For example, suppose there are labels a, b, and c. When the label of data to be trained i includes label a but not labels b and c, it can be represented as (1, 0, 0). When it includes label b but not labels a and c, it can be represented as (0, 1, 0). When it includes labels a and c but not label b, it can be represented as (1, 0, 1). When it includes labels a, b, and c, it can be represented as (1, 1, 1).
Optionally, the plurality of data to be trained includes one or more pieces of target-type data, each piece of target-type data having a label used to correct the weight parameters.
That is, the plurality of data to be trained includes first-type data and second-type data: data to be trained belonging to the first type has labels, while data to be trained belonging to the second type does not. Therefore, the weight parameters of the second neural network model can be corrected according to the distance between the processing results of the first-type data and the labels of the first-type data. This distance can be understood as the degree of similarity between a processing result of the first-type data and the corresponding label. The information distance can be computed by, for example, cross-entropy, KL divergence, or JS divergence. The second-type data has no labels, but since associations may exist between the first-type data and the second-type data, the second-type data can be brought into the process of obtaining the processing results of the first-type data. In other words, the second neural network model may be a semi-supervised model, i.e., the plurality of data to be trained may include unlabeled data. To ensure reliable training of the second neural network model, the proportion of first-type data among the data to be trained is generally not less than 5%-10%.
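A hedged sketch of this semi-supervised setup, using cross-entropy (one of the distances named above) and masking so that only the labeled first-type data contributes to the loss; the array layout and names are our own illustration:

```python
import numpy as np

def semi_supervised_loss(probs, labels, labeled_mask, eps=1e-9):
    """probs: N x K predicted distributions for all N data to be trained.
    labels: N x K one-/multi-hot labels; rows of unlabeled data are ignored.
    labeled_mask: length-N boolean array marking the first-type data."""
    ce = -(labels * np.log(probs + eps)).sum(axis=1)  # per-sample cross-entropy
    return ce[labeled_mask].mean()                    # only labeled data counts

probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]])
labels = np.array([[1, 0], [0, 1], [0, 0]])           # third sample unlabeled
mask = np.array([True, True, False])
print(semi_supervised_loss(probs, labels, mask))      # about 0.308
```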
Optionally, the first processing result is also used to correct the weight parameters of the first neural network model.
That is, besides being used to correct the weight parameters of the second neural network model, the first processing result can also be used to correct the weight parameters of the first neural network model.
In one example, the first processing result and the label of the first data to be trained can be input into the loss function of the first neural network model to correct the weight parameters of the first neural network model.
Before the plurality of data to be trained is input into the first neural network model, the first neural network model may be one that is not restricted by a scenario, or only weakly constrained by one. The plurality of data to be trained may be data from a specific scenario; therefore, the weight parameters of the first neural network model can be corrected according to the first processing result, so that the first neural network model can adapt to that particular scenario.
It should be understood that the first neural network model and the second neural network model may be two sub-models of a single neural network model.
The following specific examples describe the effects that the first neural network model and the second neural network model can achieve in training and prediction.
Example one
Obtain all the pictures captured by all the cameras of a company within a certain month, totaling about 100,000 pictures. Input 90,000 of these pictures into the first neural network model as the plurality of data to be trained, where each picture may be one data to be trained. The remaining 10,000 pictures can serve as verification data, used to verify whether the weight parameters of the second neural network model are suitable. For ease of description, the 90,000 pictures form the training data set and the 10,000 pictures form the verification data set.
Select 10,000 pictures from the training data set as labeled first-type data; the remaining 80,000 pictures in the training data set are then unlabeled second-type data. Obtain the labels of the first-type data.
Use the first neural network model to process the training data set, obtaining 90,000 fourth vectors in one-to-one correspondence with the training data set. The first neural network model may be a multiple granularity network (MGN) model, which is a convolutional neural network model. Each fourth vector may include 1024 elements, and each fourth vector is the feature representation of one picture.
Obtain the a priori hypothesis. It may, for example, be one or more of the following:
(1) Two pictures taken less than 8 s apart are associated.
(2) Two pictures from the same camera are associated.
(3) Two pictures whose image similarity exceeds 50% are associated.
It should be understood that the specific content of the a priori hypothesis depends on the scenario in which the first neural network model and the second neural network model are applied, and is not limited here.
According to the a priori hypothesis, the third association relationship information indicating the association relationships among the 90,000 fourth vectors can be determined.
Input the 90,000 fourth vectors and the third association relationship information into the second neural network model to obtain processing results for the first-type data. Since the first-type data may be associated with the second-type data, the processing results of the first-type data take the content of the second-type data into account.
Matching the processing results of the first-type data against the labels of the first-type data allows the parameters of the second neural network model to be corrected.
Afterwards, the data of the verification data set is input into the first neural network model to obtain multiple fourth vectors for the verification data set; these fourth vectors are then input into the second neural network model, together with the association relationships among them determined according to the a priori hypothesis, to obtain the data processing results for the verification data set. Matching these results against the labels of the verification data set gives the recognition capability of the first neural network model and the second neural network model. In practical application, the trained neural network model is scored by mean average precision (mAP); compared with a conventional neural network model, the score improves by 4-20 points. That is, the method of training a neural network model provided in this application can strengthen the neural network model.
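Putting Example one together, the following sketch shows the shape of the training flow; `extract_features`, `graph_model`, and the single masked propagation step are stand-ins of our own (the real models are MGN and a graph attention network), and only the 8 s timestamp hypothesis is used to build the association matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(n_pictures, dim=1024):
    # Stand-in for the MGN extractor: one 1024-element fourth vector per picture.
    return rng.normal(size=(n_pictures, dim))

def association_from_timestamps(timestamps, threshold=8.0):
    t = np.asarray(timestamps, dtype=float)
    return (np.abs(t[:, None] - t[None, :]) < threshold).astype(float)

def graph_model(feats, Q, W):
    # Stand-in for the second model: one masked propagation step.
    A = Q / Q.sum(axis=1, keepdims=True)
    return A @ feats @ W

n, dim, n_cls = 20, 1024, 5
feats = extract_features(n, dim)                       # first model's output
Q = association_from_timestamps(rng.uniform(0, 60, size=n))
W = rng.normal(size=(dim, n_cls)) * 0.01               # second model's weights
labels = np.eye(n_cls)[rng.integers(0, n_cls, size=n)]
mask = np.zeros(n, dtype=bool)
mask[:2] = True                                        # ~10% labeled first-type data

preds = graph_model(feats, Q, W)
loss = ((preds[mask] - labels[mask]) ** 2).mean()      # drives the weight correction
print(loss)
```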
Example two
Obtain the Chinese text questions collected by a company's customer-service robot within a certain month, totaling about 15,000 Chinese text questions. Input 8,000 of these questions into the first neural network model as the plurality of data to be trained, where each Chinese text question may be one data to be trained. The remaining 7,000 questions can serve as verification data, used to verify whether the weight parameters of the second neural network model are suitable. For ease of description, the 8,000 questions form the training data set and the 7,000 questions form the verification data set.
Select 2,000 Chinese text questions from the training data set as labeled first-type data; the remaining 6,000 questions in the training data set are then unlabeled second-type data. Obtain the labels of the first-type data.
Use the first neural network model to process the training data set, obtaining 8,000 fourth vectors in one-to-one correspondence with the training data set. The first neural network model may be a bidirectional encoder representations from transformers (BERT) model. The BERT model may be a convolutional neural network model. Each fourth vector may include 768 elements, and each fourth vector is the feature representation of one Chinese text question.
Obtain the a priori hypothesis. It may, for example, be one or more of the following:
(1) Two Chinese text questions whose texts contain the same keywords are associated.
(2) Two Chinese text questions whose text similarity exceeds 50% are associated.
It should be understood that the specific content of the a priori hypothesis depends on the scenario in which the first neural network model and the second neural network model are applied, and is not limited here.
According to the a priori hypothesis, the third association relationship information indicating the association relationships among the 8,000 fourth vectors can be determined.
Input the 8,000 fourth vectors and the third association relationship information into the second neural network model to obtain processing results for the first-type data. Since the first-type data may be associated with the second-type data, the processing results of the first-type data take the content of the second-type data into account.
Matching the processing results of the first-type data against the labels of the first-type data allows the parameters of the second neural network model to be corrected.
Afterwards, the data of the verification data set is input into the first neural network model to obtain multiple fourth vectors for the verification data set; these fourth vectors are then input into the second neural network model, together with the association relationships among them determined according to the a priori hypothesis, to obtain the data processing results for the verification data set. Matching these results against the labels of the verification data set gives the recognition capability of the first neural network model and the second neural network model. In practical application, the trained neural network model is scored by mean average precision (mAP); compared with a conventional neural network model, the score improves by 10-15 points. That is, the method of training a neural network model provided in this application can strengthen the neural network model.
FIG. 8 is a schematic diagram of the hardware structure of a data processing device provided by an embodiment of this application. The data processing device 700 shown in FIG. 8 (which may specifically be a computer device) includes a memory 701, a processor 702, a communication interface 703, and a bus 704, where the memory 701, the processor 702, and the communication interface 703 are communicatively connected to one another through the bus 704.
The memory 701 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 701 may store a program; when the program stored in the memory 701 is executed by the processor 702, the processor 702 is configured to execute the steps of the data processing method shown in FIG. 6 of the embodiments of this application. Optionally, the processor 702 is further configured to execute the steps of the method for training a neural network model shown in FIG. 7 of the embodiments of this application.
The processor 702 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, configured to execute related programs so as to implement the data processing method shown in FIG. 6 of the embodiments of this application. Optionally, the processor 702 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, configured to execute related programs so as to implement the method for training a neural network model shown in FIG. 7 of the embodiments of this application.
The processor 702 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the data processing method shown in FIG. 6 of the embodiments of this application may be completed by integrated logic circuits of hardware in the processor 702 or by instructions in the form of software. Optionally, the steps of the method for training a neural network model shown in FIG. 7 of the embodiments of this application may be completed by integrated logic circuits of hardware in the processor 702 or by instructions in the form of software.
The processor 702 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory 701; the processor 702 reads the information in the memory 701 and, in combination with its hardware, completes the functions to be executed by the units included in the data processing device of the embodiments of this application, or executes the data processing method shown in FIG. 6 of the embodiments of this application. Optionally, it is also used to execute the method for training a neural network model shown in FIG. 7 of the embodiments of this application.
The communication interface 703 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the device 700 and other devices or communication networks. For example, the information of the neural network to be constructed and the data to be processed needed in constructing the neural network (the data to be processed in the embodiment shown in FIG. 6) can be obtained through the communication interface 703. Optionally, the information of the neural network to be constructed and the data to be trained needed in constructing the neural network (the data to be trained in the embodiment shown in FIG. 7) can be obtained through the communication interface 703.
The bus 704 may include a path for transferring information between the components of the device 700 (for example, the memory 701, the processor 702, and the communication interface 703).
It should be understood that the obtaining module in the data processing device may be equivalent to the communication interface 703 in the data processing device 700, and the processing module in the data processing device may be equivalent to the processor 702.
FIG. 9 is a schematic diagram of the hardware structure of a device for training a neural network model provided by an embodiment of this application. The device 800 for training a neural network model shown in FIG. 9 (which may specifically be a computer device) includes a memory 801, a processor 802, a communication interface 803, and a bus 804, where the memory 801, the processor 802, and the communication interface 803 are communicatively connected to one another through the bus 804.
The memory 801 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 801 may store a program; when the program stored in the memory 801 is executed by the processor 802, the processor 802 is configured to execute the steps of the method for training a neural network model shown in FIG. 7 of the embodiments of this application.
The processor 802 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, configured to execute related programs so as to implement the method for training a neural network model shown in FIG. 7 of the embodiments of this application.
The processor 802 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the method for training a neural network model shown in FIG. 7 of the embodiments of this application may be completed by integrated logic circuits of hardware in the processor 802 or by instructions in the form of software.
The processor 802 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory 801; the processor 802 reads the information in the memory 801 and, in combination with its hardware, completes the functions to be executed by the units included in the neural network model training device of the embodiments of this application, or executes the method for training a neural network model shown in FIG. 7 of the embodiments of this application.
The communication interface 803 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the device 800 and other devices or communication networks. For example, the information of the neural network to be constructed and the training data needed in constructing the neural network (the data to be trained in the embodiment shown in FIG. 7) can be obtained through the communication interface 803.
The bus 804 may include a path for transferring information between the components of the device 800 (for example, the memory 801, the processor 802, and the communication interface 803).
It should be understood that the obtaining module in the neural network model training device may be equivalent to the communication interface 803 in the neural network model training device 800, and the processing module in the neural network model training device may be equivalent to the processor 802.
It should be noted that although the devices 700 and 800 described above show only a memory, a processor, and a communication interface, in a specific implementation those skilled in the art should understand that the devices 700 and 800 may also include other components necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the devices 700 and 800 may further include hardware components implementing other additional functions. In addition, those skilled in the art should understand that the devices 700 and 800 may also include only the components necessary to implement the embodiments of this application, and need not include all the components shown in FIG. 8 and FIG. 9.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods for each particular application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the systems, apparatuses, and units described above, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (30)

1. A data processing method, characterized by comprising:
obtaining a plurality of data to be processed;
processing the plurality of data to be processed using a first neural network model to obtain a plurality of first vectors in one-to-one correspondence with the plurality of data to be processed, wherein the first neural network model is obtained by training based on general data;
obtaining first association relationship information, wherein the first association relationship information is used to indicate at least one first vector group, and each first vector group comprises two first vectors satisfying an a priori hypothesis; and
inputting the plurality of first vectors and the first association relationship information into a second neural network model to obtain a processing result for first data to be processed, wherein the first data to be processed is any one of the plurality of data to be processed.
2. The method according to claim 1, wherein the first association relationship information is used to indicate N first vector groups, N being an integer greater than 1, and before the inputting of the plurality of first vectors and the first association relationship information into the second neural network model to obtain the processing result for the first data to be processed, the method further comprises:
obtaining second association relationship information, wherein the second association relationship information is used to indicate n second vector groups, the n second vector groups belong to the N first vector groups, n is less than N, and n is a positive integer; and
the inputting of the plurality of first vectors and the first association relationship information into the second neural network model to obtain the processing result for the first data to be processed comprises:
inputting the plurality of first vectors, the first association relationship information, and the second association relationship information into the second neural network model to obtain the processing result for the first data to be processed.
  3. The method according to claim 1 or 2, wherein the obtaining a plurality of data to be processed comprises:
    obtaining target data, wherein the target data is one of the plurality of data to be processed; and
    obtaining associated data, wherein the associated data and the target data have an association relationship that satisfies the a priori hypothesis, and the plurality of data to be processed comprise the associated data.
  4. The method according to any one of claims 1 to 3, wherein the first association relationship information comprises an association relationship matrix, a vector in a first dimension of the association relationship matrix comprises a plurality of elements in one-to-one correspondence with the plurality of first vectors, and a vector in a second dimension of the association relationship matrix comprises a plurality of elements in one-to-one correspondence with the plurality of first vectors, wherein any element in the association relationship matrix is used to indicate whether the first vector corresponding to that element in the first dimension and the first vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  5. The method according to any one of claims 1 to 4, wherein the weight parameters of the second neural network model are obtained in the following manner:
    obtaining a plurality of data to be trained;
    processing the plurality of data to be trained by using the first neural network model to obtain a plurality of fourth vectors in one-to-one correspondence with the plurality of data to be trained;
    obtaining third association relationship information, wherein the third association relationship information is used to indicate at least one third vector group, and each third vector group comprises two fourth vectors that satisfy the a priori hypothesis; and
    inputting the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain a first processing result for first data to be trained, wherein the first data to be trained is any one of the plurality of data to be trained, and the first processing result is used to modify the weight parameters of the second neural network model.
  6. The method according to claim 5, wherein the obtaining a first processing result for first data to be trained comprises:
    obtaining the first processing result and a second processing result for second data to be trained, wherein the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two of the plurality of data to be trained; and
    the method further comprises:
    matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, wherein the matching result is used to modify the weight parameters of the second neural network model.
  7. The method according to claim 5 or 6, wherein the third association relationship information is used to indicate M third vector groups, M being an integer greater than 1, and before the inputting of the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained, the method further comprises:
    obtaining fourth association relationship information, wherein the fourth association relationship information is used to indicate m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer; and
    the inputting of the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained comprises:
    inputting the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
  8. The method according to any one of claims 5 to 7, wherein the first processing result is further used to modify the weight parameters of the first neural network model.
  9. The method according to any one of claims 5 to 8, wherein the plurality of data to be trained comprise one or more pieces of target type data, and each piece of target type data has a label used to modify the weight parameters.
  10. A method for training a neural network model, characterized in that the method comprises:
    obtaining a plurality of data to be trained;
    processing the plurality of data to be trained by using a first neural network model to obtain a plurality of fourth vectors in one-to-one correspondence with the plurality of data to be trained;
    obtaining third association relationship information, wherein the third association relationship information is used to indicate at least one third vector group, and each third vector group comprises two fourth vectors that satisfy an a priori hypothesis; and
    inputting the plurality of fourth vectors and the third association relationship information into a second neural network model to obtain a first processing result for first data to be trained, wherein the first data to be trained is any one of the plurality of data to be trained, and the first processing result is used to modify the weight parameters of the second neural network model.
  11. The method according to claim 10, wherein the obtaining a first processing result for first data to be trained comprises:
    obtaining the first processing result and a second processing result for second data to be trained, wherein the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two of the plurality of data to be trained; and
    the method further comprises:
    matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, wherein the matching result is used to modify the weight parameters of the second neural network model.
  12. The method according to claim 10 or 11, wherein the third association relationship information is used to indicate M third vector groups, and before the inputting of the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained, the method further comprises:
    obtaining fourth association relationship information, wherein the fourth association relationship information is used to indicate m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer; and
    the inputting of the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained comprises:
    inputting the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
  13. The method according to any one of claims 10 to 12, wherein the first processing result is further used to modify the weight parameters of the first neural network model.
  14. The method according to any one of claims 10 to 13, wherein the plurality of data to be trained comprise one or more pieces of target type data, and each piece of target type data has a label used to modify the weight parameters.
  15. A data processing device, characterized in that the device comprises:
    an acquisition module, configured to obtain a plurality of data to be processed; and
    a processing module, configured to process the plurality of data to be processed by using a first neural network model to obtain a plurality of first vectors in one-to-one correspondence with the plurality of data to be processed, wherein the first neural network model is obtained through training based on general data;
    wherein the acquisition module is further configured to obtain first association relationship information, the first association relationship information being used to indicate at least one first vector group, each first vector group comprising two first vectors that satisfy an a priori hypothesis; and
    the processing module is further configured to input the plurality of first vectors and the first association relationship information into a second neural network model to obtain a processing result for first data to be processed, the first data to be processed being any one of the plurality of data to be processed.
  16. The device according to claim 15, wherein the first association relationship information is used to indicate N first vector groups, N being an integer greater than 1, and before the processing module inputs the plurality of first vectors and the first association relationship information into the second neural network model to obtain the processing result for the first data to be processed,
    the acquisition module is further configured to obtain second association relationship information, wherein the second association relationship information is used to indicate n second vector groups, the n second vector groups belong to the N first vector groups, n is less than N, and n is a positive integer; and
    the processing module is specifically configured to input the plurality of first vectors, the first association relationship information, and the second association relationship information into the second neural network model to obtain the processing result for the first data to be processed.
  17. The device according to claim 15 or 16, wherein the acquisition module is specifically configured to:
    obtain target data, wherein the target data is one of the plurality of data to be processed; and
    obtain associated data, wherein the associated data and the target data have an association relationship that satisfies the a priori hypothesis, and the plurality of data to be processed comprise the associated data.
  18. The device according to any one of claims 15 to 17, wherein the first association relationship information comprises an association relationship matrix, a vector in a first dimension of the association relationship matrix comprises a plurality of elements in one-to-one correspondence with the plurality of first vectors, and a vector in a second dimension of the association relationship matrix comprises a plurality of elements in one-to-one correspondence with the plurality of first vectors, wherein any element in the association relationship matrix is used to indicate whether the first vector corresponding to that element in the first dimension and the first vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  19. The device according to any one of claims 15 to 18, wherein:
    the acquisition module is further configured to obtain a plurality of data to be trained;
    the processing module is further configured to process the plurality of data to be trained by using the first neural network model to obtain a plurality of fourth vectors in one-to-one correspondence with the plurality of data to be trained;
    the acquisition module is further configured to obtain third association relationship information, wherein the third association relationship information is used to indicate at least one third vector group, and each third vector group comprises two fourth vectors that satisfy the a priori hypothesis; and
    the processing module is further configured to input the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain a first processing result for first data to be trained, wherein the first data to be trained is any one of the plurality of data to be trained, and the first processing result is used to modify the weight parameters of the second neural network model.
  20. The device according to claim 19, wherein:
    the processing module is specifically configured to obtain the first processing result and a second processing result for second data to be trained, wherein the label of the first data to be trained is a first label and the label of the second data to be trained is a second label; and
    the processing module is further configured to match the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, wherein the matching result is used to modify the weight parameters of the second neural network model.
  21. The device according to claim 19 or 20, wherein the third association relationship information is used to indicate M third vector groups, M being an integer greater than 1, and before the processing module inputs the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained,
    the acquisition module is further configured to obtain fourth association relationship information, wherein the fourth association relationship information is used to indicate m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer; and
    the processing module is specifically configured to input the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
  22. The device according to any one of claims 19 to 21, wherein the first processing result is further used to modify the weight parameters of the first neural network model.
  23. The device according to any one of claims 19 to 22, wherein the plurality of data to be trained comprise one or more pieces of target type data, and each piece of target type data has a label used to modify the weight parameters.
  24. A device for training a neural network model, characterized in that the device comprises:
    an acquisition module, configured to obtain a plurality of data to be trained; and
    a processing module, configured to process the plurality of data to be trained by using a first neural network model to obtain a plurality of fourth vectors in one-to-one correspondence with the plurality of data to be trained;
    wherein the acquisition module is further configured to obtain third association relationship information, the third association relationship information being used to indicate at least one third vector group, each third vector group comprising two fourth vectors that satisfy an a priori hypothesis; and
    the processing module is further configured to input the plurality of fourth vectors and the third association relationship information into a second neural network model to obtain a first processing result for first data to be trained, the first data to be trained being any one of the plurality of data to be trained, and the first processing result being used to modify the weight parameters of the second neural network model.
  25. The device according to claim 24, wherein the processing module is specifically configured to obtain the first processing result and a second processing result for second data to be trained, wherein the label of the first data to be trained is a first label and the label of the second data to be trained is a second label; and
    the processing module is further configured to match the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, wherein the matching result is used to modify the weight parameters of the second neural network model.
  26. The device according to claim 24 or 25, wherein the third association relationship information is used to indicate M third vector groups, and before the processing module inputs the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained,
    the acquisition module is further configured to obtain fourth association relationship information, wherein the fourth association relationship information is used to indicate m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer; and
    the processing module is specifically configured to input the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
  27. The device according to any one of claims 24 to 26, wherein the first processing result is further used to modify the weight parameters of the first neural network model.
  28. The device according to any one of claims 24 to 27, wherein the plurality of data to be trained comprise one or more pieces of target type data, and each piece of target type data has a label used to modify the weight parameters.
  29. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code for execution by a device, and the program code comprises instructions for performing the method according to any one of claims 1 to 14.
  30. A chip, characterized in that the chip comprises a processor and a data interface, wherein the processor reads, through the data interface, instructions stored in a memory, to perform the method according to any one of claims 1 to 14.
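
It should be understood that the claims above specify the pipeline in prose only. The following Python/PyTorch sketch is a purely illustrative reading of claims 1 and 4, not part of the claimed subject matter: a first model standing in for an encoder pretrained on general data produces the first vectors, an association relationship matrix marks which vector pairs satisfy the a priori hypothesis (here assumed, for concreteness, to be a cosine-similarity threshold), and a second, graph-style model combines the vectors with the matrix to yield one processing result per input datum. All names, dimensions, the threshold value, and the choice of a single graph-convolution-style layer are assumptions of this sketch.

    # Illustrative sketch only; names, sizes, and the similarity
    # threshold are assumptions, not details fixed by the application.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SecondModel(nn.Module):
        # A toy "second neural network model": one graph-convolution-style
        # layer that mixes each first vector with the vectors it is
        # associated to (per the association relationship matrix),
        # followed by a linear classification head.
        def __init__(self, dim, num_classes):
            super().__init__()
            self.mix = nn.Linear(dim, dim)
            self.head = nn.Linear(dim, num_classes)

        def forward(self, vectors, adj):
            # Row-normalize the matrix so each vector averages over
            # the vectors it is associated with.
            deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
            mixed = torch.relu(self.mix((adj / deg) @ vectors))
            return self.head(mixed)  # one processing result per datum

    # "First neural network model": stands in for any encoder pretrained
    # on general data; a random linear map keeps the sketch self-contained.
    first_model = nn.Linear(32, 16)
    data = torch.randn(5, 32)          # five pieces of data to be processed
    first_vectors = first_model(data)  # the plurality of first vectors

    # Association relationship matrix (claim 4): entry (i, j) is 1 iff
    # vectors i and j satisfy the a priori hypothesis -- assumed here to
    # be cosine similarity above 0.5.
    normed = F.normalize(first_vectors, dim=1)
    adj = (normed @ normed.T > 0.5).float()

    second_model = SecondModel(dim=16, num_classes=3)
    results = second_model(first_vectors, adj)
    print(results[0])  # processing result for the first data to be processed

On the same reading, the second association relationship information of claims 2 and 16 would simply be a second, sparser matrix indicating a subset of the vector groups above, passed to the second model alongside the first matrix.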
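
Claims 6, 11, 20 and 25 modify the second model's weights by matching label similarity against output similarity for pairs of training data. The claims do not fix a concrete loss function; the squared gap below is one assumed way to turn that matching step into a differentiable loss, again a sketch rather than the application's method.

    import torch
    import torch.nn.functional as F

    def matching_loss(result_1, result_2, label_similarity):
        # Penalize any gap between how similar the two processing results
        # are and how similar the two labels are (an assumed squared loss).
        output_similarity = F.cosine_similarity(result_1, result_2, dim=0)
        return (output_similarity - label_similarity) ** 2

    # Toy usage: two processing results and a label similarity of 1.0
    # (e.g. the first label equals the second label).
    r1 = torch.randn(3, requires_grad=True)
    r2 = torch.randn(3)
    loss = matching_loss(r1, r2, label_similarity=1.0)
    loss.backward()  # this gradient is what modifies the weight parameters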
PCT/CN2019/099653 2019-08-07 2019-08-07 Method for processing data, and method and device for training neural network model WO2021022521A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980010339.0A CN112639828A (en) 2019-08-07 2019-08-07 Data processing method, method and equipment for training neural network model
PCT/CN2019/099653 WO2021022521A1 (en) 2019-08-07 2019-08-07 Method for processing data, and method and device for training neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/099653 WO2021022521A1 (en) 2019-08-07 2019-08-07 Method for processing data, and method and device for training neural network model

Publications (1)

Publication Number Publication Date
WO2021022521A1 (en)

Family

ID=74503009

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/099653 WO2021022521A1 (en) 2019-08-07 2019-08-07 Method for processing data, and method and device for training neural network model

Country Status (2)

Country Link
CN (1) CN112639828A (en)
WO (1) WO2021022521A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113518035B (en) * 2021-05-26 2023-01-31 香港中文大学(深圳) Route determining method and device
CN113360659B (en) * 2021-07-19 2022-11-22 云南大学 Cross-domain emotion classification method and system based on semi-supervised learning
CN113642807B (en) * 2021-09-01 2022-04-12 智慧足迹数据科技有限公司 Population mobility prediction method and related device
CN114238692A (en) * 2022-02-23 2022-03-25 北京嘉沐安科技有限公司 Network live broadcast-oriented video big data accurate retrieval method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897746A (en) * 2017-02-28 2017-06-27 北京京东尚科信息技术有限公司 Data classification model training method and device
US20190139622A1 (en) * 2017-08-03 2019-05-09 Zymergen, Inc. Graph neural networks for representing microorganisms
CN107766324A (en) * 2017-09-25 2018-03-06 浙江大学 A kind of text coherence analysis method based on deep neural network
CN109766840A (en) * 2019-01-10 2019-05-17 腾讯科技(深圳)有限公司 Facial expression recognizing method, device, terminal and storage medium
CN110083829A (en) * 2019-04-03 2019-08-02 平安科技(深圳)有限公司 Feeling polarities analysis method and relevant apparatus

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095870A (en) * 2021-03-16 2021-07-09 支付宝(杭州)信息技术有限公司 Prediction method, prediction device, computer equipment and storage medium
CN112926569B (en) * 2021-03-16 2022-10-18 重庆邮电大学 Method for detecting natural scene image text in social network
CN112926569A (en) * 2021-03-16 2021-06-08 重庆邮电大学 Method for detecting natural scene image text in social network
CN113222328B (en) * 2021-03-25 2022-02-25 中国科学技术大学先进技术研究院 Air quality monitoring equipment point arrangement and site selection method based on road section pollution similarity
CN113222328A (en) * 2021-03-25 2021-08-06 中国科学技术大学先进技术研究院 Air quality monitoring equipment point arrangement and site selection method based on road section pollution similarity
CN112989134A (en) * 2021-03-29 2021-06-18 腾讯科技(深圳)有限公司 Node relation graph processing method, device, equipment and storage medium
CN112989134B (en) * 2021-03-29 2023-08-25 腾讯科技(深圳)有限公司 Processing method, device, equipment and storage medium of node relation graph
CN112950291A (en) * 2021-03-31 2021-06-11 北京奇艺世纪科技有限公司 Model deviation optimization method, device, equipment and computer readable medium
CN113194458A (en) * 2021-04-08 2021-07-30 南京中新赛克科技有限责任公司 Multi-card treasure number identification method and device
CN113194458B (en) * 2021-04-08 2022-05-13 南京中新赛克科技有限责任公司 Multi-card treasure number identification method and device
CN115396831A (en) * 2021-05-08 2022-11-25 中国移动通信集团浙江有限公司 Interaction model generation method, device, equipment and storage medium
CN113239844A (en) * 2021-05-26 2021-08-10 哈尔滨理工大学 Intelligent cosmetic mirror system based on multi-head attention target detection
CN113505193A (en) * 2021-06-01 2021-10-15 华为技术有限公司 Data processing method and related equipment
CN113724036A (en) * 2021-07-29 2021-11-30 阿里巴巴(中国)有限公司 Method and electronic equipment for providing question consultation service
WO2023011237A1 (en) * 2021-08-04 2023-02-09 支付宝(杭州)信息技术有限公司 Service processing
CN114529765A (en) * 2022-02-16 2022-05-24 腾讯科技(深圳)有限公司 Data processing method, data processing equipment and computer readable storage medium
CN114863162A (en) * 2022-03-28 2022-08-05 北京百度网讯科技有限公司 Object classification method, deep learning model training method, device and equipment
CN116935230A (en) * 2023-09-13 2023-10-24 山东建筑大学 Crop pest identification method, device, equipment and medium
CN116935230B (en) * 2023-09-13 2023-12-15 山东建筑大学 Crop pest identification method, device, equipment and medium

Also Published As

Publication number Publication date
CN112639828A (en) 2021-04-09

Similar Documents

Publication Publication Date Title
WO2021022521A1 (en) Method for processing data, and method and device for training neural network model
WO2020221200A1 (en) Neural network construction method, image processing method and devices
US20220092351A1 (en) Image classification method, neural network training method, and apparatus
WO2021043168A1 (en) Person re-identification network training method and person re-identification method and apparatus
CN110084281B (en) Image generation method, neural network compression method, related device and equipment
WO2021190451A1 (en) Method and apparatus for training image processing model
WO2021057056A1 (en) Neural architecture search method, image processing method and device, and storage medium
WO2021042828A1 (en) Neural network model compression method and apparatus, and storage medium and chip
WO2021120719A1 (en) Neural network model update method, and image processing method and device
WO2021043112A1 (en) Image classification method and apparatus
WO2021218517A1 (en) Method for acquiring neural network model, and image processing method and apparatus
WO2021164772A1 (en) Method for training cross-modal retrieval model, cross-modal retrieval method, and related device
WO2022001805A1 (en) Neural network distillation method and device
US20220375213A1 (en) Processing Apparatus and Method and Storage Medium
WO2022052601A1 (en) Neural network model training method, and image processing method and device
WO2021008206A1 (en) Neural architecture search method, and image processing method and device
CN110222718B (en) Image processing method and device
WO2021018245A1 (en) Image classification method and apparatus
WO2021073311A1 (en) Image recognition method and apparatus, computer-readable storage medium and chip
WO2021018251A1 (en) Image classification method and device
WO2021227787A1 (en) Neural network predictor training method and apparatus, and image processing method and apparatus
CN113807399A (en) Neural network training method, neural network detection method and neural network detection device
WO2022007867A1 (en) Method and device for constructing neural network
CN113326930A (en) Data processing method, neural network training method, related device and equipment
WO2021129668A1 (en) Neural network training method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19940931

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19940931

Country of ref document: EP

Kind code of ref document: A1