WO2021022521A1 - Method for processing data, and method and device for training neural network model

Method for processing data, and method and device for training neural network model

Info

Publication number
WO2021022521A1
Authority
WO
WIPO (PCT)
Prior art keywords: data, neural network, association relationship, network model, trained
Prior art date
Application number
PCT/CN2019/099653
Other languages: French (fr), Chinese (zh)
Inventor
李成 (LI Cheng)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to CN201980010339.0A (published as CN112639828A)
Priority to PCT/CN2019/099653
Publication of WO2021022521A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks

Definitions

  • This application relates to the field of neural networks, and in particular to methods for data processing in neural network systems and to methods and devices for training neural network models.
  • Abbreviations: AI, artificial intelligence; DL, deep learning.
  • A trained neural network model often depends on its training data and cannot solve problems in fields other than the one the training data comes from. For example, when data from the training field is input into a deep neural network model, the processing results usually match the characteristics of the input data well; but when the deep neural network model is actually used on data from another field, the output matches the characteristics of the input data poorly. Therefore, to weaken the dependence of the neural network model on the training data, a new way of constructing the neural network model is needed.
  • This application provides a data processing method, and a method and device for training a neural network model, with the purpose of reducing the dependence of the neural network model on training data.
  • In a first aspect, a data processing method is provided, which includes: obtaining a plurality of data to be processed; processing the plurality of data to be processed with a first neural network model to obtain a plurality of first vectors in one-to-one correspondence with the plurality of data to be processed, where the first neural network model is obtained by training on general data; obtaining first association relationship information, which indicates at least one first vector group, each first vector group including two first vectors that satisfy an a priori hypothesis; and inputting the plurality of first vectors and the first association relationship information into a second neural network model to obtain a processing result for first data to be processed, the first data to be processed being any one of the plurality of data to be processed.
  • the first neural network model is a convolutional neural network model or a graph neural network model.
  • the first neural network model may be a deep convolutional neural network model, a graph convolutional neural network model, or a graph attention neural network model.
  • the second neural network model is a graph network model; accordingly, the plurality of first vectors serve as the nodes of the graph network model, and the first association relationship information serves as its edges. A minimal sketch of this pipeline is given below.
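  • The following is an illustrative sketch of that pipeline, not the patent's implementation: a stand-in feature extractor plays the role of the first neural network model, and a single degree-normalized graph-aggregation layer plays the role of the second. All names, shapes, and the random weights are assumptions made for illustration.

```python
import numpy as np

def first_model(x):
    # Stand-in for the first neural network model: any encoder trained on
    # general data could be used; here a fixed random projection plus tanh.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((x.shape[-1], 16))
    return np.tanh(x @ W)  # one "first vector" per data item

def second_model(H, A):
    # Stand-in for the second (graph network) model: each node aggregates
    # its neighbors' vectors along the edges given by the association matrix A.
    A_hat = A + np.eye(A.shape[0])            # keep each node's own vector
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # normalize by node degree
    rng = np.random.default_rng(1)
    W = rng.standard_normal((H.shape[1], 4))
    return D_inv @ A_hat @ H @ W              # a processing result per node

X = np.random.default_rng(2).standard_normal((5, 8))  # 5 data items to process
A = np.zeros((5, 5))
A[0, 1] = A[1, 0] = 1.0  # pairs assumed to satisfy the a priori hypothesis
A[1, 2] = A[2, 1] = 1.0
H = first_model(X)        # the plurality of "first vectors"
out = second_model(H, A)  # result for each item, including the first one
```

  • In this reading, the rows of H are the nodes of the graph network model and the nonzero entries of A are its edges, matching the node and edge roles described above.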
  • the first neural network model and the second neural network model may be two sub-models of a certain neural network model.
  • the first neural network model and the second neural network model can be stored on two different devices; that is, the steps of the data processing method provided in this application can be executed by multiple devices.
  • For example, the first neural network model is stored on a first device, which can perform the steps of obtaining the multiple data to be processed and processing them with the first neural network model to obtain the multiple first vectors; the second neural network model is stored on a second device, which can perform the steps of obtaining the first association relationship information (which indicates at least one first vector group, each including two first vectors that satisfy the a priori hypothesis) and inputting the multiple first vectors and the first association relationship information into the second neural network model to obtain the processing result for the first data to be processed, the first data to be processed being any one of the multiple data to be processed.
  • Because the first neural network model is trained on general data, a general model that is unaffected, or only slightly affected, by the scenario can be obtained, so the first neural network model can be applied in a variety of scenarios.
  • Although the application of the first neural network model is not limited to a particular scenario, it is difficult to achieve high-accuracy recognition in an arbitrary scenario using the first neural network model alone. Therefore, the multiple feature vectors output by the first neural network model can be input into the second neural network model, so that the models can be applied in relatively special scenarios and the second neural network model can learn the differences and associations between general scenarios and special scenarios.
  • Existing neural network models usually recognize only a particular scenario; once applied in another field, most of the parameters of the neural network model can no longer be used.
  • Because the second neural network model can learn the differences and associations between general scenarios and special scenarios, and because the data input to the first neural network model can be general data, the method provided in this application weakens the restrictions that the scenario of the data to be processed places on the architecture and parameters of the neural network model.
  • While the first data to be processed is being identified, data associated with it is also considered; as the amount of data considered increases, the recognition accuracy of the second neural network model tends to improve. Moreover, since the correlations between data items are considered, the second neural network model's learning of the relationships among data is enhanced.
  • Optionally, the first association relationship information indicates N first vector groups, where N is an integer greater than 1. Before the plurality of first vectors and the first association relationship information are input into the second neural network model to obtain the processing result for the first data to be processed, the method further includes: obtaining second association relationship information, which indicates n second vector groups, where the n second vector groups belong to the N first vector groups, n is less than N, and n is a positive integer. Inputting the plurality of first vectors and the first association relationship information into the second neural network model then includes: inputting the plurality of first vectors, the first association relationship information, and the second association relationship information into the second neural network model to obtain the processing result for the first data to be processed.
  • When the first association relationship information only indicates that an association exists between two first vectors, it cannot reflect the strength of that association. The second association relationship information can indicate one or more first vector groups with a particularly strong or particularly weak association among the plurality of first vector groups, so that, in addition to considering the data associated with the first data to be processed, the second neural network model can strengthen the influence of data to be processed that is closely related to the first data to be processed, or weaken the influence of data to be processed that is only distantly related to it. In this way, more informative data is available for identifying the first data to be processed.
  • Optionally, obtaining the multiple data to be processed includes: obtaining target data, the target data being one of the multiple data to be processed; and obtaining associated data, where the association relationship between the associated data and the target data satisfies the a priori hypothesis and the multiple data to be processed include the associated data.
  • In this way, associated data can be introduced flexibly according to the data to be processed, which improves the flexibility of obtaining the data to be processed and avoids introducing unnecessary redundant data.
  • Optionally, the first association relationship information includes a second association relationship matrix. The vector in the first dimension of the second association relationship matrix includes a plurality of elements in one-to-one correspondence with the plurality of first vectors, and the vector in the second dimension likewise includes a plurality of elements in one-to-one correspondence with the plurality of first vectors. Any element of the second association relationship matrix indicates whether the first vector corresponding to that element in the first dimension and the first vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • Using a matrix to represent the association relationships among the multiple first vectors avoids introducing several different types of data structures into the second neural network model, which simplifies computation; a sketch of this representation follows.
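  • As an illustration of this representation (the pair choices and names are hypothetical), the at least one first vector group can be turned into such a matrix as follows:

```python
import numpy as np

# Hypothetical first vector groups: each pair (i, j) marks two first
# vectors that satisfy the a priori hypothesis.
vector_groups = [(0, 1), (1, 2), (3, 4)]
num_vectors = 5

A = np.zeros((num_vectors, num_vectors), dtype=np.int8)
for i, j in vector_groups:
    A[i, j] = A[j, i] = 1  # element (i, j) = 1: vectors i and j are associated

# A[i, j] == 0 means vectors i and j have no association relationship
# that satisfies the a priori hypothesis.
```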
  • Optionally, processing the plurality of data to be processed with the first neural network model includes: using the first neural network model to process the plurality of data to be processed together with fifth association relationship information, where the fifth association relationship information indicates at least one data group to be processed, each data group to be processed including two data to be processed that satisfy the a priori hypothesis.
  • The data associated with the first data to be processed is thus also considered while the first data to be processed is identified; as the amount of data considered increases, the recognition accuracy of the first neural network model tends to improve. Moreover, since the correlations between data items are considered, the first neural network model's learning of the relationships among data is enhanced.
  • Optionally, the fifth association relationship information includes a first association relationship matrix. The vector in the first dimension of the first association relationship matrix includes a plurality of elements in one-to-one correspondence with the plurality of data to be processed, and the vector in the second dimension likewise includes a plurality of elements in one-to-one correspondence with the plurality of data to be processed. Any element of the first association relationship matrix indicates whether the data to be processed corresponding to that element in the first dimension and the data to be processed corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis. Using a matrix to represent the association relationships among the multiple data to be processed avoids introducing several different types of data structures into the first neural network model, which simplifies computation.
  • Optionally, the weight parameters of the second neural network model are obtained by: obtaining a plurality of data to be trained; processing the plurality of data to be trained with the first neural network model to obtain a plurality of fourth vectors in one-to-one correspondence with the plurality of data to be trained; obtaining third association relationship information, which indicates at least one third vector group, each third vector group including two fourth vectors that satisfy the a priori hypothesis; and inputting the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain a first processing result for first data to be trained, the first data to be trained being any one of the plurality of data to be trained, where the first processing result is used to modify the weight parameters of the second neural network model.
  • Because the first neural network model is trained on general data, a general model that is unaffected, or only slightly affected, by the scenario can be obtained, so the first neural network model can be applied in a variety of scenarios.
  • the multiple feature vectors output by the first neural network model are input into the second neural network model, so that the second neural network model can realize the recognition of a relatively special scene based on the recognition result of the first neural network model. Therefore, the second neural network model can learn the difference and association between general scenes and special scenes.
  • Data associated with the first data to be trained is also considered; as the amount of data considered increases, the recognition accuracy of the second neural network model tends to improve. Since the correlations between data items are considered, the second neural network model's learning of the relationships among data is enhanced.
  • Optionally, obtaining the first processing result for the first data to be trained includes: obtaining the first processing result and a second processing result for second data to be trained, where the label of the first data to be trained is a first label and the label of the second data to be trained is a second label. The method further includes: matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, the matching result being used to modify the weight parameters of the second neural network model.
  • Through the similarity between the labels, it can be judged whether the similarity between the two processing results is appropriate, which strengthens the second neural network model's learning of the association relationships between data items; one possible form of this matching is sketched below.
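  • A sketch of one way such a matching result could be computed, assuming cosine similarity and a squared difference as the matching error; the patent does not fix a particular similarity measure, so all choices here are illustrative:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def matching_loss(result1, result2, label1, label2):
    # Compare the similarity of the two processing results with the
    # similarity of the corresponding labels; a small value means the
    # model's notion of "similar" agrees with the labels'.
    return (cosine(result1, result2) - cosine(label1, label2)) ** 2

r1, r2 = np.array([0.9, 0.1]), np.array([0.2, 0.8])
l1, l2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])  # one-hot labels (assumed)
print(matching_loss(r1, r2, l1, l2))  # used to modify the weight parameters
```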
  • Optionally, the third association relationship information indicates M third vector groups, where M is an integer greater than 1. Before the plurality of fourth vectors and the third association relationship information are input into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: obtaining fourth association relationship information, which indicates m fourth vector groups, where the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer. Inputting the plurality of fourth vectors and the third association relationship information into the second neural network model then includes: inputting the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
  • When the third association relationship information only indicates that an association exists between two fourth vectors, it cannot reflect the strength of that association.
  • The fourth association relationship information can indicate one or more third vector groups with a particularly strong or particularly weak association among the plurality of third vector groups, so that, in addition to considering the data associated with the first data to be trained, the second neural network model can strengthen the influence of data to be trained that is closely related to the first data to be trained, or weaken the influence of data to be trained that is only distantly related to it. In this way, more informative data is available for identifying the first data to be trained.
  • the first processing result is also used to modify the weight parameter of the first neural network model.
  • Since association relationships between data items can be learned during training, using the first processing result to modify the first neural network model as well strengthens the first neural network model's ability to learn the association relationships between data items.
  • the plurality of data to be trained includes one or more pieces of target-type data, and each piece of target-type data has a label used to modify the weight parameters.
  • a semi-supervised learning method may be used to train the second neural network model.
  • That is, a part of the plurality of data to be trained has labels, and the other part may not. Based on the third association relationship information, the two parts of the data can be merged: even though some of the data to be trained has no label, it can still be taken into account when modifying the second neural network model. This reduces the number of labels required for the data to be trained and simplifies the data processing needed to train the second neural network model; a sketch of such training follows.
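  • A minimal sketch of such semi-supervised training, under the common assumption (not stated in the patent) that the supervised loss is computed only on labeled items while unlabeled items still influence the results through the graph:

```python
import numpy as np

def masked_cross_entropy(logits, labels, labeled_mask):
    # logits: (N, C) outputs of the second model for all N data items.
    # labels: (N,) class indices, only meaningful where labeled_mask is True.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    return nll[labeled_mask].mean()  # unlabeled items enter only via the graph

logits = np.random.default_rng(0).standard_normal((4, 3))
labels = np.array([0, 2, 0, 0])              # placeholders where unlabeled
mask = np.array([True, True, False, False])  # only items 0 and 1 have labels
print(masked_cross_entropy(logits, labels, mask))
```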
  • Optionally, the third association relationship information includes a fourth association relationship matrix. The vector in the first dimension of the fourth association relationship matrix includes a plurality of elements in one-to-one correspondence with the plurality of fourth vectors, and the vector in the second dimension likewise includes a plurality of elements in one-to-one correspondence with the plurality of fourth vectors. Any element of the fourth association relationship matrix indicates whether the fourth vector corresponding to that element in the first dimension and the fourth vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis. Using a matrix to represent the association relationships among the multiple fourth vectors avoids introducing several different types of data structures into the second neural network model, which simplifies computation.
  • Optionally, processing the plurality of data to be trained with the first neural network model includes: using the first neural network model to process the plurality of data to be trained together with sixth association relationship information, where the sixth association relationship information indicates at least one data group to be trained, each data group to be trained including two data to be trained that satisfy the a priori hypothesis.
  • The data associated with the first data to be trained is thus also considered while the first data to be trained is identified; as the amount of data considered increases, the recognition accuracy of the first neural network model tends to improve. Moreover, since the correlations between data items are considered, the first neural network model's learning of the relationships among data is enhanced.
  • Optionally, the sixth association relationship information includes a third association relationship matrix. The vector in the first dimension of the third association relationship matrix includes a plurality of elements in one-to-one correspondence with the plurality of data to be trained, and the vector in the second dimension likewise includes a plurality of elements in one-to-one correspondence with the plurality of data to be trained. Any element of the third association relationship matrix indicates whether the data to be trained corresponding to that element in the first dimension and the data to be trained corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis. Using a matrix to represent the association relationships among the multiple data to be trained avoids introducing several different types of data structures into the first neural network model, which simplifies computation.
  • In a second aspect, a method for training a neural network model is provided, which includes: obtaining a plurality of data to be trained; processing the plurality of data to be trained with a first neural network model to obtain a plurality of fourth vectors in one-to-one correspondence with them; obtaining third association relationship information, which indicates at least one third vector group, each third vector group including two fourth vectors that satisfy an a priori hypothesis; and inputting the plurality of fourth vectors and the third association relationship information into a second neural network model to obtain a first processing result for first data to be trained, the first data to be trained being any one of the plurality of data to be trained, where the first processing result is used to modify the weight parameters of the second neural network model.
  • For example, the first neural network model can be obtained by training on the training data of scenario 1. The data to be trained from scenario 2 is input into the first neural network model, which outputs multiple feature vectors; these feature vectors are then input into the second neural network model, so that the second neural network model, building on the recognition results of the first neural network model, realizes recognition for scenario 2. In this way, the second neural network model can learn the differences and associations between scenario 1 and scenario 2.
  • Data associated with the first data to be trained is also considered; as the amount of data considered increases, the recognition accuracy of the second neural network model tends to improve. In addition, since the correlations between data items are considered, the second neural network model's learning of the relationships among data is enhanced.
  • Optionally, obtaining the first processing result for the first data to be trained includes: obtaining the first processing result and a second processing result for second data to be trained, where the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two of the plurality of data to be trained. The method further includes: matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, the matching result being used to modify the weight parameters of the second neural network model.
  • Optionally, the third association relationship information indicates M third vector groups. Before the plurality of fourth vectors and the third association relationship information are input into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: obtaining fourth association relationship information, which indicates m fourth vector groups, where the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer. Inputting the plurality of fourth vectors and the third association relationship information into the second neural network model then includes: inputting the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
  • the first processing result is also used to modify the weight parameter of the first neural network model.
  • the plurality of data to be trained includes one or more target type data, and each target type data has a label used to modify the weight parameter.
  • Optionally, the third association relationship information includes a fourth association relationship matrix. The vector in the first dimension of the fourth association relationship matrix includes a plurality of elements in one-to-one correspondence with the plurality of fourth vectors, and the vector in the second dimension likewise includes a plurality of elements in one-to-one correspondence with the plurality of fourth vectors. Any element of the fourth association relationship matrix indicates whether the fourth vector corresponding to that element in the first dimension and the fourth vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • Optionally, processing the plurality of data to be trained with the first neural network model includes: using the first neural network model to process the plurality of data to be trained together with sixth association relationship information, where the sixth association relationship information indicates at least one data group to be trained, each data group to be trained including two data to be trained that satisfy the a priori hypothesis.
  • Optionally, the sixth association relationship information includes a third association relationship matrix. The vector in the first dimension of the third association relationship matrix includes a plurality of elements in one-to-one correspondence with the plurality of data to be trained, and the vector in the second dimension likewise includes a plurality of elements in one-to-one correspondence with the plurality of data to be trained. Any element of the third association relationship matrix indicates whether the data to be trained corresponding to that element in the first dimension and the data to be trained corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • the first neural network model is obtained based on general data training.
  • Because the first neural network model is trained on general data, a general model that is unaffected, or only slightly affected, by the scenario can be obtained, so the first neural network model can be applied in a variety of scenarios.
  • the multiple feature vectors output by the first neural network model are input into the second neural network model, so that the second neural network model can realize the recognition of a relatively special scene based on the recognition result of the first neural network model. Therefore, the second neural network model can learn the difference and association between the general scene and the special scene.
  • In a third aspect, a method for training a neural network model is provided, including: obtaining a plurality of data to be trained; and inputting the plurality of data to be trained and seventh association relationship information into a second neural network model to obtain a first processing result for first data to be trained and a second processing result for second data to be trained, where the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two of the plurality of data to be trained. The method further includes: matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, the matching result being used to modify the weight parameters of the second neural network model.
  • Through the similarity between the labels, it can be judged whether the similarity between the two processing results is appropriate, which strengthens the second neural network model's learning of the association relationships between data items.
  • Optionally, the method further includes: obtaining the seventh association relationship information, which indicates at least one first data group to be trained, each first data group to be trained including two data to be trained that satisfy the a priori hypothesis.
  • the data associated with the first data to be trained will also be considered while identifying the first data to be trained. As the amount of processed data increases, it is beneficial to increase the recognition accuracy of the second neural network model. In addition, since the correlation between data and data is considered, the learning of the data relationship by the second neural network model can be enhanced.
  • Optionally, the seventh association relationship information indicates H first data groups to be trained. Before the plurality of data to be trained and the seventh association relationship information are input into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: obtaining eighth association relationship information, which indicates h second data groups to be trained, where the h second data groups to be trained belong to the H first data groups to be trained, h is less than H, and h is a positive integer. Inputting the plurality of data to be trained and the seventh association relationship information into the second neural network model then includes: inputting the plurality of data to be trained, the seventh association relationship information, and the eighth association relationship information into the second neural network model to obtain the first processing result.
  • the plurality of to-be-trained data includes one or more target type data, and each target type data has a label for modifying the weight parameter.
  • Optionally, the seventh association relationship information includes a fifth association relationship matrix. The vector in the first dimension of the fifth association relationship matrix includes a plurality of elements in one-to-one correspondence with the plurality of data to be trained, and the vector in the second dimension likewise includes a plurality of elements in one-to-one correspondence with the plurality of data to be trained. Any element of the fifth association relationship matrix indicates whether the data to be trained corresponding to that element in the first dimension and the data to be trained corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • In a fourth aspect, a data processing device is provided, which includes modules for executing the method in the first aspect or any possible implementation of the first aspect.
  • the device may be a cloud server or a terminal device.
  • In a fifth aspect, a device for training a neural network model is provided, which includes modules for executing the method in the second aspect or any possible implementation of the second aspect.
  • the device may be a cloud server or a terminal device.
  • In a sixth aspect, a device for training a neural network model is provided, which includes modules for executing the method in the third aspect or any possible implementation of the third aspect.
  • the device may be a cloud server or a terminal device.
  • In a seventh aspect, a data processing device is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any implementation of the first aspect.
  • the device may be a cloud server or a terminal device.
  • In an eighth aspect, a device for training a neural network model is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any implementation of the second aspect.
  • the device may be a cloud server or a terminal device.
  • In a ninth aspect, a device for training a neural network model is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any implementation of the third aspect.
  • the device may be a cloud server or a terminal device.
  • In a tenth aspect, a computer-readable medium is provided, which stores program code for execution by a device, the program code including instructions for executing the method in any one of the first to third aspects.
  • An eleventh aspect provides a computer program product containing instructions; when the computer program product runs on a computer, the computer executes the method in any one of the first to third aspects.
  • In a twelfth aspect, a chip is provided, which includes a processor and a data interface; the processor reads instructions stored in a memory through the data interface and executes the method in any implementation of the first to third aspects.
  • the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory.
  • the processor is configured to execute the method in any one of the implementation manners of the first aspect to the third aspect.
  • FIG. 1 is a schematic diagram of a convolutional neural network architecture provided by an embodiment of the present application.
  • Fig. 2 is a schematic diagram of a graph model provided by an embodiment of the application.
  • FIG. 3 is a schematic diagram of a system architecture provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of the hardware structure of a chip provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of a system architecture provided by an embodiment of the application.
  • FIG. 6 is a schematic flowchart of a data processing method provided by an embodiment of the application.
  • FIG. 7 is a schematic flowchart of a method for training a neural network model provided by an embodiment of the application.
  • FIG. 8 is a schematic block diagram of a data processing device provided by an embodiment of the application.
  • Fig. 9 is a schematic block diagram of a device for training a neural network model provided by an embodiment of the present application.
  • a neural network can be composed of neural units.
  • a neural unit can refer to an arithmetic unit that takes inputs x_s and an intercept of 1; the output of the arithmetic unit can be h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s x_s + b), where s = 1, 2, …, n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. Here f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network and to convert the input signal of the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
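  • For concreteness, a single neural unit with a sigmoid activation can be written as follows (an illustrative sketch of the formula above; the input and weight values are arbitrary):

```python
import numpy as np

def neural_unit(x, W, b):
    # Output of one unit: f(sum_s W_s * x_s + b), with f the sigmoid.
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

x = np.array([0.5, -1.0, 2.0])   # inputs x_s
W = np.array([0.1, 0.4, -0.3])   # weights W_s
print(neural_unit(x, W, b=0.2))  # output signal, input to the next layer
```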
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with many hidden layers; there is no special metric for "many" here.
  • The layers of a DNN can be divided into three categories: input layer, hidden layers, and output layer.
  • Generally, the first layer is the input layer, the last layer is the output layer, and all the layers in between are hidden layers.
  • The layers are fully connected; that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.
  • Although a DNN looks complicated, the work of each individual layer is not complicated.
  • The coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer can be defined as W_jk^L. It should be noted that the input layer has no W parameter.
  • More hidden layers make the network better able to portray complex situations in the real world. Theoretically, a model with more parameters is more complex and has greater "capacity", which means it can complete more complex learning tasks.
  • Training a deep neural network is thus a process of learning the weight matrices; the ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the matrices formed by the vectors W of many layers). A toy example follows.
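  • A toy forward pass illustrating these fully connected layers; all dimensions and values are arbitrary choices for illustration:

```python
import numpy as np

# W1[j, k] is the coefficient from neuron k of the previous layer to
# neuron j of the current layer; the input layer itself has no W.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)                        # input layer
W1, b1 = rng.standard_normal((5, 4)), np.zeros(5)
W2, b2 = rng.standard_normal((2, 5)), np.zeros(2)
hidden = np.tanh(W1 @ x + b1)                     # hidden layer
output = W2 @ hidden + b2                         # output layer
# Training adjusts W1, b1, W2, b2: the weight matrices of all layers.
```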
  • A convolutional neural network (CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be seen as a filter, and the convolution process can be seen as using a trainable filter to convolve with an input image or convolution feature map.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can be connected to only part of the neighboring neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units.
  • Neural units in the same feature plane share weights, and the shared weights here are the convolution kernels. Weight sharing can be understood as making the way image information is extracted independent of location. The underlying principle is that the statistics of one part of an image are the same as those of other parts, so image information learned in one part can also be used in another part; the same learned image information can therefore be used for all positions on the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information; generally, the more convolution kernels there are, the richer the image information reflected by the convolution operation.
  • the convolution kernel can be initialized in the form of a matrix of random size. During the training of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • a convolutional neural network (CNN) 400 may include an input layer 410, a convolutional layer/pooling layer 420 (the pooling layer is optional), and a neural network layer 430.
  • The convolutional layer/pooling layer 420 may include layers 421 to 426. In one example, layer 421 is a convolutional layer, layer 422 is a pooling layer, layer 423 is a convolutional layer, layer 424 is a pooling layer, layer 425 is a convolutional layer, and layer 426 is a pooling layer. In another example, layers 421 and 422 are convolutional layers, layer 423 is a pooling layer, layers 424 and 425 are convolutional layers, and layer 426 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • The convolutional layer 421 can include many convolution operators. A convolution operator, also called a kernel, acts in image processing like a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During convolution on an image, the weight matrix is typically moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride) to extract specific features from the image.
  • The size of the weight matrix is related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image: during the convolution operation, the weight matrix extends through the entire depth of the input image. Convolution with a single weight matrix therefore produces output with a single depth dimension, but in most cases multiple weight matrices of the same size (row × column), i.e. multiple homogeneous matrices, are applied instead. The output of each weight matrix is stacked to form the depth dimension of the convolved image, where that dimension is determined by the "multiple" just mentioned.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to eliminate unwanted noise in the image.
  • These multiple weight matrices have the same size (row × column), so the feature maps extracted by them also have the same size; the extracted feature maps of the same size are then combined to form the output of the convolution operation.
  • In practical applications, the weight values in these weight matrices need to be obtained through extensive training. Each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 400 makes correct predictions. A bare-bones sketch of the sliding-kernel operation follows.
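  • A minimal, illustrative single-kernel convolution matching the description above; the kernel values here are arbitrary, standing in for trained weights:

```python
import numpy as np

def conv2d_single(img, kernel, stride=1):
    # Slide one weight matrix (kernel) across the image; each position
    # yields one output value, giving a single-depth feature map.
    kh, kw = kernel.shape
    oh = (img.shape[0] - kh) // stride + 1
    ow = (img.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = img[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = (patch * kernel).sum()
    return out

img = np.random.default_rng(0).standard_normal((6, 6))
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # e.g. a simple edge extractor
fmap = conv2d_single(img, edge_kernel)          # 4x4 feature map
# Applying several same-size kernels and stacking the resulting feature
# maps would form the depth dimension described above.
```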
  • the initial convolutional layer (such as 421) often extracts more general features, which can also be called low-level features;
  • the features extracted by the subsequent convolutional layers (for example, 426) become more and more complex, such as features such as high-level semantics, and features with higher semantics are more suitable for the problem to be solved.
  • In the layers 421 to 426, a convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. The sole purpose of the pooling layer is to reduce the spatial size of the image.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller size image.
  • the average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of average pooling.
  • the maximum pooling operator can take the pixel with the largest value within a specific range as the result of the maximum pooling.
  • the operators in the pooling layer should also be related to the image size.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image of the input pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
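  • A minimal illustration of the two pooling operators over 2x2 sub-regions (the region size is an arbitrary choice):

```python
import numpy as np

def pool2x2(img, mode="max"):
    # Each output pixel is the max or average of one 2x2 sub-region of
    # the input, so the output image is smaller than the input image.
    h, w = img.shape
    blocks = img[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
print(pool2x2(img, "max"))   # maximum pooling result (2x2)
print(pool2x2(img, "mean"))  # average pooling result (2x2)
```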
  • Neural network layer 430
  • After processing by the convolutional layer/pooling layer 420, the convolutional neural network 400 is not yet able to output the required output information, because, as described above, the convolutional layer/pooling layer 420 only extracts features and reduces the number of parameters brought by the input image. To generate the final output information (the required class information or other related information), the convolutional neural network 400 uses the neural network layer 430 to generate one output or a group of outputs whose number equals the number of required classes. Therefore, the neural network layer 430 can include multiple hidden layers (431 and 432 to 43n in FIG. 1) and an output layer 440; the parameters of these hidden layers can be obtained by pre-training on relevant training data of a specific task type, where the task type can include, for example, image recognition, image classification, and image super-resolution reconstruction.
  • After the multiple hidden layers in the neural network layer 430, the final layer of the entire convolutional neural network 400 is the output layer 440. The output layer 440 has a loss function similar to categorical cross entropy, which is specifically used to calculate the prediction error.
  • the convolutional neural network 400 shown in FIG. 1 is only used as an example of a convolutional neural network. In specific applications, the convolutional neural network may also exist in the form of other network models.
  • A recurrent neural network (RNN) memorizes previous information and applies it to the calculation of the current output; that is, the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment.
  • RNN can process sequence data of any length.
  • The training of an RNN is the same as the training of a traditional CNN or DNN: the error backpropagation algorithm is also used, but with one difference: if the RNN is unrolled, the parameters, such as W, are shared, which is not the case in the traditional neural networks described above. Moreover, the output of each step depends not only on the network of the current step but also on the network states of the previous steps. This learning algorithm is called backpropagation through time (BPTT).
  • The loss function is usually a multivariate function, and the gradient reflects the rate of change of the loss function's output value as a variable changes: the greater the absolute value of the gradient, the greater the rate of change. When different parameters are updated, the gradient of the loss function with respect to them can be calculated, and the parameters are continuously updated along the direction of fastest gradient descent, reducing the output value of the loss function as quickly as possible; a one-parameter illustration follows.
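  • A one-parameter illustration of this update rule on the toy loss L(w) = (w - 3)^2, whose gradient is 2(w - 3); the learning rate and step count are arbitrary:

```python
def gradient_step(w, grad_fn, lr=0.1):
    # Move the parameter against the gradient: the direction in which the
    # loss function's output value decreases fastest.
    return w - lr * grad_fn(w)

grad = lambda w: 2 * (w - 3)  # gradient of L(w) = (w - 3)**2
w = 0.0
for _ in range(50):
    w = gradient_step(w, grad)
print(w)  # approaches 3, the minimizer of the loss
```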
  • Convolutional neural networks can use the backpropagation (BP) algorithm to modify the parameters of the initial super-resolution model during training, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters of the initial super-resolution model are updated by backpropagating the error loss information, so that the error loss converges. The backpropagation algorithm is a backpropagation movement dominated by the error loss, aimed at obtaining the optimal parameters of the super-resolution model, such as the weight matrices.
  • A generative adversarial network (GAN) includes at least two modules: one is a generative model and the other a discriminative model; through these two modules, which learn from each other in a game, better output is produced.
  • Both the generative model and the discriminant model can be a neural network, specifically a deep neural network, or a convolutional neural network.
  • The basic principle of a GAN, taking one that generates pictures as an example, is as follows: suppose there are two networks, G (the generator) and D (the discriminator). G is a network that generates pictures: it receives random noise z and generates a picture from this noise, denoted G(z). D is a discriminant network that judges whether a picture is "real": its input x represents a picture, and the output D(x) represents the probability that x is a real picture, where 1 means the picture is certainly real and 0 means it cannot be real.
  • The goal of the generative network G is to generate pictures that are as real as possible in order to deceive the discriminant network D, while the goal of D is to distinguish the pictures generated by G from real pictures. G and D thus constitute a dynamic "game", the "adversarial" part of the "generative adversarial network"; the corresponding losses are sketched below.
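  • The game can be summarized with the standard GAN losses; the toy G and D below are stand-ins so the snippet runs, not real networks:

```python
import numpy as np

G = lambda z: np.tanh(z)                       # toy generator: noise -> "picture"
D = lambda x: 1.0 / (1.0 + np.exp(-x.mean()))  # toy discriminator: realness score

z = np.random.default_rng(0).standard_normal(8)    # random noise input to G
real = np.ones(8)                                  # stands in for a real picture
fake = G(z)
d_loss = -np.log(D(real)) - np.log(1.0 - D(fake))  # D: separate real from fake
g_loss = -np.log(D(fake))                          # G: push D(G(z)) toward 1
print(d_loss, g_loss)
```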
  • In a graph data structure, the edge between node n1 and node n2 can be expressed as (n1, n2). A graph neural network (GNN) is a neural network that runs directly on the graph data structure, where the label of a node n in the node set can be represented by a vector, and the label of an edge (n1, n2) in the edge set can likewise be represented by a vector.
  • the graph neural network can include an input layer, an output layer, and one or more hidden layers.
  • the state of node v can be expressed as h_v = f(x_v, x_co[v], h_ne[v], x_ne[v]), where h_v is the state of node v, x_v is the feature representation of node v, x_co[v] is the feature representation of the edges associated with node v, h_ne[v] is the state of the other nodes associated with node v, and x_ne[v] is the feature representation of the other nodes associated with node v.
  • nodes 2, 3, 4, and 6 on the inner side of the dotted line all have edges connecting them to node 1; that is, nodes 2, 3, 4, and 6 are all associated with node 1.
  • Regarding associated nodes: if there is an edge between node v and node i, then node i is a node associated with node v, and node i can be called a neighbor node of node v.
  • Graph Convolutional Neural Network is a method for deep learning of graph data, which can be understood as the application of graph neural network in convolutional neural network.
  • Graph convolutional neural networks are usually divided into two categories: spectral approaches and non-spectral approaches.
  • The spectral method is based on the spectral representation of the graph, with the convolution operation defined in the Fourier domain; the convolution operation requires intensive matrix computation and non-local spatial filtering.
  • the non-spectral method is to directly convolve on the graph instead of on the spectrum of the graph.
  • Because the graph convolutional neural network depends on the structural information of the graph, a model trained on a specific graph structure often cannot be used directly on other graph structures.
  • the graph convolution operator can be h_i' = σ(∑_{j∈N_i} (1/c_ij) · W_{R_j} · h_j), where c_ij denotes a normalization factor related to the graph structure, N_i represents the set of nodes associated with node i (which may include node i itself), and R_j represents the type of node j. An illustrative implementation follows.
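  • A direct, illustrative implementation of this operator, with the normalization factor c_ij taken as node i's degree; that is one common choice, while the text above only says c_ij relates to the graph structure, so treat it as an assumption:

```python
import numpy as np

def graph_conv(h, A, node_type, W_by_type):
    # h_i' = sum over j in N_i of (1 / c_ij) * W_{R_j} @ h_j
    deg = A.sum(axis=1)         # c_ij := degree of node i (assumed)
    out = np.zeros_like(h)
    for i in range(len(h)):
        for j in range(len(h)):
            if A[i, j]:         # j is in N_i
                out[i] += W_by_type[node_type[j]] @ h[j] / deg[i]
    return out

h = np.random.default_rng(0).standard_normal((4, 3))
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)      # N_i here includes node i itself
node_type = [0, 1, 0, 1]                       # R_j: the type of node j
W_by_type = {0: np.eye(3), 1: 0.5 * np.eye(3)} # one weight matrix per node type
print(graph_conv(h, A, node_type, W_by_type))
```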
  • The core layer of the graph attention network distributes attention over the set of neighboring nodes associated with node i through an implicit self-attention layer: according to the characteristics of the neighboring nodes, it assigns node i different weights for them and performs a weighted summation over the neighboring nodes' features.
  • the difference from the graph convolutional neural network is that the graph attention network does not depend on the specific graph structure.
  • The graph attention network uses a multi-layer, multi-head attention mechanism to distribute attention to each node under the association structure of the graph, so it can compute the information each node obtains from the other nodes associated with it. The essence of the multi-head attention mechanism is weighted summation, with the weights coming from the learned attention matrix and the nodes' own information. Unlike a graph convolutional neural network, therefore, the parameters learned by this network do not depend on the specific graph structure; a single-head sketch follows.
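  • A single-head sketch of attention-weighted aggregation over graph neighbors; the dot-product score here is an illustrative stand-in for the learned attention, and every node is assumed to have at least one neighbor (here, itself):

```python
import numpy as np

def attention_layer(h, A):
    # Each node attends only to its neighbors (nonzero entries of A) and
    # takes a weighted sum of their features; weights per node sum to 1.
    scores = h @ h.T                           # stand-in for learned scores
    scores = np.where(A > 0, scores, -np.inf)  # mask non-neighbors
    scores = scores - scores.max(axis=1, keepdims=True)
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)  # attention distribution
    return alpha @ h

h = np.random.default_rng(0).standard_normal((4, 3))
A = np.eye(4) + np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)  # chain + self-loops
print(attention_layer(h, A))
```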
  • an embodiment of the present application provides a system architecture 100.
  • the data collection device 160 is used to collect data to be trained.
  • The data to be trained includes image data, video data, audio data, text data, and so on; the data to be trained is stored in the database 130, and the training device 120 obtains the target model/rule 101 based on the training data maintained in the database 130.
  • Embodiment 1 below describes in more detail how the training device 120 obtains the target model/rule 101 based on the data to be trained.
  • The target model/rule 101 can be used to implement the method for training a neural network model provided in the embodiments of this application; that is, the target model/rule 101 may include a first neural network model and a second neural network model. The data to be trained is input into the first neural network model to obtain multiple fourth vectors, the multiple fourth vectors are input into the second neural network model, and the weight parameters of the target model/rule 101 are adjusted through the loss function, yielding the trained target model/rule 101. It should be noted that, in actual applications, the data to be trained in the database 130 need not all come from the data collection device 160; it may also be received from other devices.
  • It should also be noted that the training device 120 does not necessarily train the target model/rule 101 entirely on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of this application.
  • The target model/rule 101 trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 3, which can be a terminal (for example, a mobile phone, a tablet computer, a laptop, an AR/VR device, or a vehicle-mounted terminal), or a server or cloud.
  • The execution device 110 is equipped with an input/output interface 112 for data interaction with external devices; a user can input data to the input/output interface 112 through the client device 140, and the input data, as described in the embodiments of this application, can include multiple data to be processed.
  • The preprocessing module 113 is configured to perform preprocessing on the input data (such as image data, video data, audio data, or text data) received by the input/output interface 112, where the input data may be the data to be processed in the embodiments of this application. In these embodiments, the preprocessing module 113 may, for example, be used to extract features of the input data.
  • the execution device 110 may call data, codes, etc. in the data storage system 150 for corresponding processing.
  • the data, instructions, etc. obtained by corresponding processing may also be stored in the data storage system 150.
  • the input/output interface 112 returns the processing result to the client device 140 to provide it to the user.
  • the training device 120 can generate corresponding target models/rules 101 based on different data to be trained for different goals or tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete the above tasks, so as to provide users with the desired results.
  • the user can manually set input data, and the manual setting can be operated through the interface provided by the input/output interface 112.
  • the client device 140 can automatically send input data to the input/output interface 112. If the client device 140 is required to automatically send the input data with the user's authorization, the user can set the corresponding permission in the client device 140.
  • the user can view the result output by the execution device 110 on the client device 140, and the specific presentation form may be a specific manner such as display, sound, and action.
  • the client device 140 can also be used as a data collection terminal, collecting the input data of the input/output interface 112 and the output result of the input/output interface 112 as new sample data, as shown in the figure, and storing it in the database 130. Alternatively, the input/output interface 112 may directly store the input data of the input/output interface 112 and the output result of the input/output interface 112 as new sample data in the database 130.
  • Fig. 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • the data storage system 150 is an external memory relative to the execution device 110. In other cases, the data storage system 150 may also be placed in the execution device 110.
  • the target model/rule 101 is obtained by training according to the training device 120, and the target model/rule 101 may include the first neural network model and the second neural network model in the embodiment of the application.
  • the first neural network model may be a convolutional neural network model or a graph neural network model
  • the second neural network model may be a graph neural network model.
  • FIG. 4 is a chip hardware structure provided by an embodiment of the application, and the chip includes a neural network processor 20.
  • a neural network processor (Neural-network Processing Unit, NPU) 20 can be mounted as a coprocessor to a host central processing unit (Host Central Processing Unit, Host CPU), and the Host CPU allocates tasks.
  • the core part of the NPU is the arithmetic circuit 203.
  • the controller 204 controls the arithmetic circuit 203 to extract data from the memory (weight memory or input memory) and perform calculations.
  • the arithmetic circuit 203 includes multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 203 is a two-dimensional systolic array. The arithmetic circuit 203 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 203 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the data corresponding to matrix B from the weight memory 202 and caches it on each PE in the arithmetic circuit.
  • the arithmetic circuit fetches the data of matrix A from the input memory 201, performs matrix operations with matrix B, and the partial or final result of the obtained matrix is stored in the accumulator 208.
  • the vector calculation unit 207 can perform further processing on the output of the operation circuit 203, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • the vector calculation unit 207 can be used for network calculations in the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, and local response normalization.
  • the vector calculation unit 207 can store the processed output vector to the unified buffer 206.
  • the vector calculation unit 207 may apply a nonlinear function to the output of the arithmetic circuit 203, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 207 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 203, for example for use in a subsequent layer in a neural network.
  • Part or all of the steps of the method provided in this application may be executed by the arithmetic circuit 203 or the vector calculation unit 207.
  • the unified memory 206 is used to store input data and output data.
  • the direct memory access controller (DMAC) 205 is used to transfer the input data in the external memory to the input memory 201 and/or the unified memory 206, to store the weight data in the external memory into the weight memory 202, and to save the data in the unified memory 206 into the external memory.
  • a bus interface unit (BIU) 210 is used to implement interaction between the host CPU, the DMAC, and the instruction fetch memory 209 through the bus.
  • An instruction fetch buffer 209 connected to the controller 204 is used to store instructions used by the controller 204.
  • the controller 204 is used to call the instructions cached in the instruction fetch memory 209 to control the working process of the computing accelerator.
  • the unified memory 206, the input memory 201, the weight memory 202, and the fetch memory 209 are all on-chip (On-Chip) memories.
  • the external memory is a memory private to the NPU; the external memory can be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
  • an embodiment of the present application provides a system architecture 300.
  • the system architecture includes a local device 301, a local device 302, an execution device 310, and a data storage system 350.
  • the local device 301 and the local device 302 are connected to the execution device 310 through a communication network.
  • the execution device 310 may be implemented by one or more servers.
  • the execution device 310 can be used in conjunction with other computing devices, such as data storage, routers, load balancers and other devices.
  • the execution device 310 may be arranged on one physical site or distributed on multiple physical sites.
  • the execution device 310 may use the data in the data storage system 350 or call the program code in the data storage system 350 to implement the data processing method or the method for training a neural network model of the embodiment of the present application.
  • for example, the execution device 310 can build an image recognition neural network, which can be used for image recognition or image processing.
  • Each local device can represent any computing device, such as personal computers, computer workstations, smart phones, tablets, smart cameras, smart cars or other types of cellular phones, media consumption devices, wearable devices, set-top boxes, game consoles, etc.
  • the local device of each user can interact with the execution device 310 through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • the above-mentioned execution device 310 may also be referred to as a cloud device. At this time, the execution device 310 is generally deployed in the cloud.
  • the neural network model will depend on the data to be trained.
  • when the trained neural network model processes the data to be trained, the output result is close to the characteristics of the data to be trained and the accuracy rate is high; when the trained neural network model is applied in actual use, the recognition result it outputs is far from the characteristics of the input data itself and the accuracy rate is low.
  • this application provides a data processing method, so that the trained neural network model can achieve high-accuracy recognition when it is applied to a specific scene.
  • FIG. 6 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • the method 500 may be executed by the execution device 110 as shown in FIG. 3.
  • the method 500 may be executed by the neural network processor 20 as shown in FIG. 4.
  • the method 500 may be executed by the execution device 310 as shown in FIG. 5.
  • the data to be processed can be understood as the data that is about to be input to the neural network model and processed by the neural network model.
  • the data to be processed can be text data, image data, video data, audio data, etc., such as a text file, a paragraph of text in a text file, a picture file, an image block in a picture file, a frame in a video file, A video file, a video in a video file, an audio file, and an audio in an audio file.
  • Multiple data to be processed can be multiple text files, multiple texts in a text file, multiple picture files, multiple image blocks in a picture file, multiple frames in a video file, multiple video files, Multiple pieces of video in one video file, multiple audio files, multiple pieces of audio in one audio file, etc. This application does not limit the type of data to be processed.
  • the plurality of to-be-processed data are stored in the database, so the device executing the method 500 can directly retrieve the plurality of to-be-processed data from the database.
  • the plurality of data to be processed can be obtained by using a camera shooting method.
  • the cloud device stores the multiple data to be processed, so the device that executes the method 500 can receive the multiple data to be processed sent by the cloud device through the communication network.
  • input multiple data to be processed into the first neural network model, use the first neural network model to perform processing operations such as feature screening (filtering out useful features) and feature fusion (combining multiple features), and output a plurality of first vectors corresponding one-to-one to the plurality of data to be processed.
  • take the convolutional neural network shown in Figure 1 as an example: the multiple data to be processed can be input from the input layer and pass through hidden layers such as the convolutional layer and/or the pooling layer, where data processing is performed, and a plurality of first vectors corresponding to the plurality of data to be processed are output from the output layer of the first neural network model.
  • the first vector can be a number or a vector containing multiple numbers.
  • the type of the first neural network model may be a convolutional neural network model, a graph neural network model, a graph convolutional neural network model, a graph attention neural network model, and so on. This application does not limit the type of the first neural network model.
  • the first neural network model may be a traditional convolutional neural network model.
  • the output layer of the traditional convolutional neural network is a fully connected layer, which is sometimes called a classifier.
  • the traditional convolutional neural network model can directly output the recognition result of the data to be processed. For example, if the data to be processed is an image, the traditional convolutional neural network model can directly output recognition results such as whether there is a person in the image and whether the person is male or female. The recognition result can often only represent the probability that the data to be processed belongs to a certain feature.
  • the first neural network model may also be a special convolutional neural network model that does not include a fully connected layer, which can output the calculation result of the convolutional layer or the pooling layer.
  • the first neural network model can output processing results that are intermediate calculation results in the traditional convolutional neural network model.
  • the processing results output by this special convolutional neural network model are called intermediate calculation results.
  • the intermediate calculation result can be used to characterize part or all of the information of the data to be processed.
  • the first neural network model may be a graph neural network model.
  • the using the first neural network model to process the plurality of data to be processed includes: using the first neural network model to process the plurality of data to be processed and the fifth association relationship information, where the fifth association relationship information is used to indicate at least one data group to be processed, and each data group to be processed includes two data to be processed that satisfy the a priori hypothesis.
  • the data group to be processed contains two data to be processed that have an association relationship. That is, there is an association relationship between the two to-be-processed data in the to-be-processed data group that satisfies a priori assumption. For example, if the data group to be processed is (data to be processed 1, data to be processed 2), then there is an association relationship between the data to be processed 1 and the data to be processed 2 that satisfies the a priori assumption.
  • the plurality of data to be processed and the fifth association relationship information reflecting the association relationships between the plurality of data to be processed are input into the first neural network model; the first neural network model can determine, according to the fifth association relationship information, whether one piece of data influences another, and the weight parameters in the first neural network model reflect the degree of influence between the data, so as to obtain a plurality of first vectors that can reflect the relevance of the data and that correspond one-to-one to the multiple data to be processed.
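  • as an illustration only, the following is a minimal sketch of how one layer of a graph neural network can combine the data features with such association relationship information; the row normalization of the association matrix and the ReLU activation are assumptions of the sketch, not requirements of this application.

```python
import numpy as np

def gcn_layer(H, P, W):
    # H: (k, d) features of the k data items; P: (k, k) 0/1 association matrix
    # built from the a priori hypothesis; W: (d, d_out) trainable weight matrix.
    P_hat = P + np.eye(P.shape[0])            # let each data item also influence itself
    deg = P_hat.sum(axis=1, keepdims=True)    # number of associated items per row
    P_norm = P_hat / deg                      # average over associated items (an assumption)
    return np.maximum(P_norm @ H @ W, 0.0)    # propagate along associations, mix features, ReLU
```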
  • a hypothesis refers to an explanation of a certain phenomenon made in advance, that is, a speculation about and explanation of the natural phenomenon under study and its regularity based on known scientific facts and scientific principles; after detailed classification, induction, and analysis of the data, a temporary but acceptable explanation is obtained.
  • prior probability appears in Bayesian statistical inference and refers to the prior probability distribution of a random variable (usually referred to simply as the prior), that is, the probability distribution expressing one's beliefs about the variable before some evidence is taken into account.
  • an a priori hypothesis refers to a prior probability distribution proposed for all hypotheses in the hypothesis space.
  • for example, the plurality of data to be processed may be multiple paragraphs of text, where a paragraph of text may include multiple sentences. Different paragraphs of text express different topics; therefore, multiple sentences in one paragraph are more strongly related, while sentences belonging to different paragraphs are weakly related or unrelated. There can then be an a priori hypothesis, such as an association between multiple sentences belonging to the same paragraph.
  • the multiple to-be-processed data may be multiple frames of pictures. Normally, as time passes, the longer the interval between two frames, the smaller the correlation between the two frames; the shorter the interval between the two frames, the greater the correlation between the two frames. There can then be an a priori assumption, such as an association between two frames whose interval is less than a preset threshold.
  • the preset threshold may be 8s, for example.
  • the multiple pieces of to-be-processed data may be multiple pieces of video, where, as time passes, the longer the interval between two pieces of video, the smaller the correlation between them; the shorter the interval, the greater the correlation. There may then be an a priori assumption, such as an association between two videos whose minimum interval length is less than a preset threshold.
  • the preset threshold may be 8s, for example.
  • the multiple pieces of to-be-processed data may be multiple pieces of audio, where, as time passes, the longer the interval between two pieces of audio, the smaller the correlation between them; the shorter the interval, the greater the correlation. There may then be an a priori assumption, such as an association between two audio segments whose minimum interval duration is less than a preset threshold.
  • the preset threshold may be 8s, for example.
  • the fifth association relationship information may be a matrix. Compared with other information types, matrix operations are more convenient.
  • the fifth association relationship information includes a first association relationship matrix. The vector in the first dimension of the first association relationship matrix includes a plurality of elements corresponding one-to-one to the plurality of data to be processed, and the vector in the second dimension of the first association relationship matrix includes a plurality of elements corresponding one-to-one to the plurality of data to be processed. Any element in the first association relationship matrix is used to indicate whether the data corresponding to that element in the first dimension and the data corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • for example, suppose P is a k×k matrix, where the i-th column corresponds to the to-be-processed data i and the j-th row corresponds to the to-be-processed data j; the element p_i,j in the i-th column and the j-th row represents whether there is an association relationship between the to-be-processed data i and the to-be-processed data j that satisfies the a priori hypothesis.
  • when there is such an association relationship, the value of p_i,j can be 1, and when there is no association relationship between the to-be-processed data i and the to-be-processed data j, the value of p_i,j can be 0; alternatively, when there is such an association relationship, the value of p_i,j can be 0, and when there is no association relationship, the value of p_i,j can be 1.
  • when the matrix P^T obtained by transposing the matrix P is the same as the matrix P, that is, p_i,j = p_j,i, the association relationship between the to-be-processed data i and the to-be-processed data j may be non-directional.
  • when the matrix P^T obtained by transposing the matrix P is different from the matrix P, that is, p_i,j ≠ p_j,i, the association relationship between the to-be-processed data i and the to-be-processed data j is directional. For example, p_i,j may indicate an association relationship pointing from the to-be-processed data i to the to-be-processed data j, while p_j,i indicates an association relationship pointing from the to-be-processed data j to the to-be-processed data i; or, p_i,j may indicate an association relationship pointing from the to-be-processed data j to the to-be-processed data i, while p_j,i indicates an association relationship pointing from the to-be-processed data i to the to-be-processed data j.
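  • as an illustration only, the following sketch builds such a 0/1 association matrix for multiple frames of pictures under the a priori hypothesis that frames less than 8s apart are associated; the frame timestamps and the non-directional (symmetric) convention are assumptions of the sketch.

```python
import numpy as np

def build_association_matrix(timestamps, threshold=8.0):
    # timestamps: capture times (in seconds) of k frames (assumed available).
    # Returns a k x k matrix P where p[i, j] = 1 when the interval between
    # frame i and frame j is less than the threshold (the a priori hypothesis).
    t = np.asarray(timestamps, dtype=float)
    P = (np.abs(t[:, None] - t[None, :]) < threshold).astype(int)
    return P  # symmetric by construction, so P.T equals P: non-directional

# frames captured at 0s, 3s, 5s and 20s: the first three are mutually associated
P = build_association_matrix([0.0, 3.0, 5.0, 20.0])
```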
  • in one case, the plurality of to-be-processed data consists of the to-be-processed data 1 and a number of to-be-processed data associated with the to-be-processed data 1. For example, in the graph model, node 1 corresponds to the to-be-processed data 1, and there is an edge connected between node 1 and node 4 and an edge connected between node 1 and node 6.
  • in another case, the plurality of data to be processed includes the data to be processed 1, a number of data to be processed that are associated with the data to be processed 1, and a number of data to be processed that are not associated with the data to be processed 1. For example, there is an edge connected between node 1 and node 4 and an edge connected between node 1 and node 6, while node 5 has no edge connecting it to node 1.
  • a plurality of to-be-processed data is acquired, and based on a priori assumption, it is determined whether there is an association relationship between any two of the plurality of to-be-processed data.
  • a piece of to-be-processed data is acquired, and based on a priori hypothesis, other pieces of to-be-processed data that have an association relationship with the piece of to-be-processed data are determined.
  • the acquiring of multiple to-be-processed data includes: acquiring target data, where the target data is one of the multiple to-be-processed data; and acquiring associated data that has an association relationship with the target data satisfying the a priori hypothesis, where the plurality of data to be processed includes the associated data.
  • the device executing the method 500 first obtains the target data, and then introduces the associated data related to the target data according to a priori assumption.
  • the target data may be sentence 1.
  • the a priori assumption is that there is a correlation between multiple sentences belonging to the same paragraph, then other sentences in the paragraph where the sentence 1 is located except for the sentence 1 are introduced as the correlation data.
  • for another example, the target data can be frame 1 in a video, and the frames whose interval from frame 1 is less than 8s are used as the associated data.
  • for another example, the target data can be video 1, and the videos whose minimum interval from video 1 is less than 8s are used as the associated data.
  • for another example, the target data can be audio 1, and the audio segments whose minimum interval from audio 1 is less than 8s are regarded as the associated data.
  • the above takes a time interval of 8s as an example for obtaining the associated data; those skilled in the art can understand that this time interval can be adjusted for different scenarios.
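  • as an illustration only, the following sketch introduces the associated data for a target frame under the 8s a priori hypothesis; representing frames by their capture timestamps is an assumption of the sketch.

```python
def select_associated(target_time, candidate_times, threshold=8.0):
    # Return indices of the candidate frames whose interval from the target
    # frame is less than the threshold (the a priori hypothesis); the target
    # itself, if present among the candidates, is excluded.
    return [i for i, t in enumerate(candidate_times)
            if t != target_time and abs(t - target_time) < threshold]

# target frame at 10s; candidates at 4s, 9s and 30s -> indices 0 and 1 are associated
associated = select_associated(10.0, [4.0, 9.0, 30.0])
```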
  • the first neural network model can be trained using general data.
  • the so-called general data can be data that is not affected by the scene, or data that has low dependence on the scene.
  • for example, the first neural network model is used to identify character features in images, and its training data set can include various possible scenes, such as street scenes, conference scenes, in-car scenes, rural scenes, Asian scenes, African scenes, European and American scenes, and so on.
  • the multiple to-be-processed data may be data applied in a specific scene.
  • the first neural network model capable of processing general data can be used to process special data.
  • the process of training the first neural network model can be to input general data into the first neural network model, and the first neural network model can perform data processing operations such as feature screening and feature fusion on the general data to obtain feature vectors. Perform matrix operations on the feature vector and the weight matrix containing the weight parameters to obtain the data training result corresponding to the general data. Then, the distance between the data training result and the label of the general data is calculated, so as to modify the weight parameter of the first neural network model.
  • the distance between the data training result and the label of the general data can be understood as the degree of similarity between the data training result and the label of the general data.
  • the specific calculation method of the information distance can be cross entropy, KL divergence, JS divergence, etc.
  • in the process of collecting data, the data is usually collected in the form of video, and the collected data can be tagged so as to obtain the labeled data required for the training process.
  • the specific tagging process and the interpretation of tags are common technical content in the field of deep learning, and will not be repeated in this embodiment of the application.
  • the recognition result of general data 1 is: the confidence that general data 1 belongs to feature 1 is 0.7, and the confidence that general data 1 belongs to feature 2 is 0.3.
  • the label of general data 1 is: label 1, which corresponds to feature 1.
  • the recognition result of general data 1 can be represented by (0.7, 0.3), and the label of general data 1 can be represented by (1, 0).
  • the distance between the data training result and the label of the general data may be the distance between the vector (0.7, 0.3) and the vector (1, 0).
  • the label of the general data may be a vector with the same dimension as the intermediate calculation result. Through vector calculation, the distance between the data training result and the label of the general data can be obtained.
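  • as an illustration only, the following sketch computes one such distance, the cross entropy between the recognition result (0.7, 0.3) and the label (1, 0); the small epsilon added for numerical stability is an assumption of the sketch.

```python
import numpy as np

def cross_entropy(label, prediction, eps=1e-12):
    # Cross entropy between a one-hot label vector and a predicted distribution;
    # eps only guards against log(0) and is an assumption of the sketch.
    prediction = np.clip(prediction, eps, 1.0)
    return -np.sum(label * np.log(prediction))

# distance between the recognition result (0.7, 0.3) and the label (1, 0):
d = cross_entropy(np.array([1.0, 0.0]), np.array([0.7, 0.3]))
# d = -log(0.7), roughly 0.357; a smaller distance means the result is closer to the label
```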
  • acquire first association relationship information, where the first association relationship information is used to indicate at least one first vector group, and each first vector group includes two first vectors that satisfy the a priori hypothesis.
  • the first association relationship information reflects whether there is an association relationship between the multiple first vectors.
  • the first vector group contains two first vectors that have an association relationship. That is, there is an association relationship between the two first vectors in the first vector group that satisfies a priori hypothesis. For example, if the first vector group indicates (first vector 1, first vector 2), then there is an association relationship between the first vector 1 and the first vector 2 that satisfies the a priori assumption.
  • the first association relationship information reflects whether there is an influence between the multiple first vectors, so that the data processing result that can reflect the data association can be obtained according to the first association relationship information. It should be understood that the first vector may have an association relationship with itself.
  • the first association relationship information may be determined according to the association relationship between the multiple to-be-processed data. That is, the first association relationship information is the same or substantially the same as the fifth association relationship information above.
  • in other embodiments, the first association relationship information is different from the fifth association relationship information described above. For example, based on the similarity between any two first vectors among the multiple first vectors, it can be determined whether there is an association relationship between those two first vectors. The greater the similarity, the greater the association; the smaller the similarity, the smaller the association. The a priori hypothesis corresponding to the first association relationship information can then be that when the similarity exceeds a preset value, it can be considered that there is an association relationship between the two first vectors, and when the similarity does not exceed the preset value, it can be considered that there is no association relationship between them.
  • the first association relationship information can be reflected through the graph model.
  • node 1, node 2, and node 3 may correspond to first vector 1, first vector 2, and first vector 3, respectively.
  • there is an edge connected between node 1 and node 2, so there is an association relationship between the first vector 1 and the first vector 2; there is an edge connected between node 2 and node 3, so there is an association relationship between the first vector 2 and the first vector 3.
  • the first association relationship information includes a second association relationship matrix. The vector in the first dimension of the second association relationship matrix includes a plurality of elements corresponding one-to-one to the plurality of first vectors, and the vector in the second dimension of the second association relationship matrix includes a plurality of elements corresponding one-to-one to the plurality of first vectors. Any element in the second association relationship matrix is used to indicate whether the first vector corresponding to that element in the first dimension and the first vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • for example, suppose Q is an l×l matrix, where the i-th column corresponds to the first vector i and the j-th row corresponds to the first vector j; the element q_i,j in the i-th column and the j-th row represents whether there is an association relationship between the first vector i and the first vector j that satisfies the a priori hypothesis.
  • when there is such an association relationship, the value of q_i,j can be 1, and when there is no association relationship between the first vector i and the first vector j, the value of q_i,j can be 0; alternatively, when there is such an association relationship, the value of q_i,j can be 0, and when there is no association relationship, the value of q_i,j can be 1.
  • when the matrix Q^T obtained by transposing the matrix Q is the same as the matrix Q, that is, q_i,j = q_j,i, the association relationship between the first vector i and the first vector j may be non-directional.
  • when the matrix Q^T obtained by transposing the matrix Q is different from the matrix Q, that is, q_i,j ≠ q_j,i, the association relationship between the first vector i and the first vector j is directional. For example, q_i,j may indicate an association relationship pointing from the first vector i to the first vector j, while q_j,i indicates an association relationship pointing from the first vector j to the first vector i; or, q_i,j may indicate an association relationship pointing from the first vector j to the first vector i, while q_j,i indicates an association relationship pointing from the first vector i to the first vector j.
  • in some embodiments, the second association relationship matrix can be compressed to obtain matrices with smaller dimensions.
  • for example, suppose the second association relationship matrix Q is an l×l matrix, and the values of all elements of Q that are more than l' elements away from the diagonal of Q are all 0 (or all 1), with l' ≪ l. Then Q can be divided into several small matrices, where the maximum number of rows of each small matrix is l' and the maximum number of columns of each small matrix is l'. This process can also be referred to as the sparsification of the second association relationship matrix Q.
  • in some embodiments, the second association relationship matrix Q can also be compressed according to a spectral clustering method.
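  • as an illustration only, the following sketch divides such a banded association matrix into diagonal blocks of at most l'×l'; treating the off-band elements as all 0 and keeping only non-overlapping diagonal blocks are simplifying assumptions of the sketch (the application also mentions spectral clustering as an alternative compression method).

```python
import numpy as np

def split_banded_matrix(Q, band):
    # Q: (l, l) association matrix whose elements more than `band` positions
    # away from the diagonal are all 0; returns diagonal blocks of at most
    # band x band. Band elements that cross block boundaries are dropped in
    # this simplified sketch; overlapping blocks would be needed to keep them.
    l = Q.shape[0]
    blocks = []
    for start in range(0, l, band):
        end = min(start + band, l)
        blocks.append(Q[start:end, start:end])  # keep only the diagonal block
    return blocks
```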
  • the a priori hypothesis can indicate a forward association relationship or a reverse association relationship.
  • for example, the shorter the interval between picture frames, the more relevant the content of the pictures. Therefore, when the a priori hypothesis indicates that there is an association relationship between picture frames within 8s of each other, it can be understood that the a priori hypothesis indicates a forward association relationship; when the a priori hypothesis indicates that there is an association relationship between picture frames more than 8s apart, it can be understood that the a priori hypothesis indicates a reverse association relationship.
  • the output result of the first neural network model and the correlation relationship within the output result are input into the second neural network model.
  • Inputting multiple first vectors into the second neural network model can be understood as inputting the characteristic representations of multiple data to be processed into the second neural network model.
  • Inputting the first association relationship information into the second neural network model can be understood as inputting the information about whether there is an influence between any two of the first vectors in the second neural network model.
  • the multiple first vectors can be understood as nodes in the graph model, and the first association relationship information can be used to indicate whether there are edges between nodes. Therefore, the second neural network model may be a graph neural network model.
  • the second neural network model processes the multiple first vectors and the first association relationship information; based on the weight parameters in the second neural network model, it can determine whether any two first vectors influence each other and to what degree, so as to obtain the processing result of the first data to be processed.
  • the processing result of the first to-be-processed data may be a characteristic representation of the first to-be-processed data, or may be a recognition result of the first to-be-processed data.
  • the processing result of the first data to be processed may be a vector.
  • for example, suppose the multiple first vectors are l first vectors, denoted x_1, ..., x_l, and suppose each first vector has dimension s, so that the l first vectors can be stacked into an l×s matrix X; suppose the first association relationship information is the second association relationship matrix Q described above.
  • set h weight matrices W_1, W_2, ..., W_h to be trained, where the dimensions of W_1, W_2, ..., W_h are all s×s_h; that is, each of W_1, W_2, ..., W_h contains s×s_h weight parameters. Here s_h = s/h, where h represents the number of heads of the graph attention neural network (the number of heads can also be called the number of slices), and s_h is commonly known as the single-head dimension.
  • compute U_1 = X·W_1, U_2 = X·W_2, ..., U_h = X·W_h; the dimensions of U_1, U_2, ..., U_h are all l×s_h.
  • compute V_i,j = U_i·U_j^T, where i ≠ j, 1 ≤ i ≤ h, and 1 ≤ j ≤ h; the dimension of V_i,j is then l×l. From V_i,j, a matrix R_i,j is obtained; R_i,j is still an l×l matrix, and it can be understood as the mutual attention intensity matrix between the points.
  • R_i,j and Q are multiplied element by element to obtain E_i,j, that is, the attention matrix after the Q relation mask. E_i,j can be understood as filtering out the related points according to the edge relationships: the attention between related points is kept, while the attention of irrelevant points is not kept. This matrix contains a large amount of information about the interrelations of the nodes, so its information content is relatively rich.
  • compute E_i,j·U_i to obtain the final expression U_i_new of each point after it has been updated by the information of the other points; the dimension of U_i_new is l×s_h. The updated single-head expressions U_1_new, ..., U_h_new can be combined (for example, concatenated) into the output X' of the current layer.
  • the above process is the data processing process of one layer of the network. If the depth of the graph attention neural network model is h', that is, the model includes h' layers, then the X' output by the current layer can be input into the next layer, that is, the X' output by the current layer is taken as the X of the next layer, and the same or a similar data processing process as above is carried out.
  • X' has the same matrix size as X, but each element in X' contains information about one or more elements in X.
  • in this way, the second neural network model can obtain more information when recognizing a certain feature, improving the recognition accuracy. Matrix operations are performed on the matrix X' and a weight parameter matrix to obtain the processing result of the first data to be processed.
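  • as an illustration only, the following is a minimal runnable sketch of one such masked attention layer. It fixes several details that the description above leaves open, so they are assumptions of the sketch rather than requirements of this application: the pairwise scores are computed per head as V = U·U^T, the attention intensity matrix R is obtained from V by a row-wise softmax, and the h updated single-head expressions are concatenated to form X'.

```python
import numpy as np

def masked_attention_layer(X, Q, W_list):
    # X: (l, s) matrix of stacked first vectors; Q: (l, l) 0/1 association mask;
    # W_list: h weight matrices, each of shape (s, s_h) with s_h = s // h.
    heads = []
    for W in W_list:
        U = X @ W                                  # (l, s_h) single-head projection
        V = U @ U.T                                # (l, l) raw pairwise scores
        R = np.exp(V - V.max(axis=1, keepdims=True))
        R = R / R.sum(axis=1, keepdims=True)       # attention intensity (softmax assumed)
        E = R * Q                                  # element-wise mask by the association matrix
        heads.append(E @ U)                        # (l, s_h) points updated by related points
    return np.concatenate(heads, axis=1)           # (l, s) output X' of the current layer

# l = 4 first vectors of dimension s = 6, h = 2 heads, so s_h = 3
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))
Q = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]])
X_new = masked_attention_layer(X, Q, [rng.normal(size=(6, 3)) for _ in range(2)])
```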
  • the plurality of to-be-processed data includes first to-be-processed data
  • the first to-be-processed data may be the target data mentioned above
  • the plurality of to-be-processed data further includes one or more pieces of to-be-processed data associated with the first to-be-processed data.
  • the second neural network model may combine the impact of the associated data on the first data to be processed according to the first association relationship information, so as to obtain a processing result corresponding to the first data to be processed.
  • the second neural network model not only performs feature extraction on the first to-be-processed data, but also performs feature extraction on the other to-be-processed data associated with the first to-be-processed data, thus expanding the amount of data input in the prediction process and helping to improve the accuracy of recognition.
  • the plurality of to-be-processed data includes first to-be-processed data
  • the first to-be-processed data may correspond to a target vector
  • the plurality of first vectors may further include one or more associated vectors associated with the target vector
  • the plurality of data to be processed includes the data to be processed in a one-to-one correspondence with the one or more associated vectors.
  • the second neural network model may combine the influence of the correlation vector on the target vector according to the first correlation information, so as to obtain the processing result corresponding to the first data to be processed.
  • the second neural network model not only performs feature extraction on the target vector, but also performs feature extraction on the associated vectors that have an association relationship with the target vector, thus expanding the amount of data processed in the prediction process and helping to improve the recognition accuracy rate.
  • the second neural network model may output multiple processing results corresponding to the multiple data to be processed in a one-to-one correspondence. That is, the second neural network model synthesizes the multiple first vectors and the association relationship between each first vector, and outputs multiple processing results corresponding to the multiple data to be processed one-to-one.
  • the closeness of the association between two associated pieces of data can be the same or can be different. For example, two sentences that are farther apart in the same paragraph are less closely associated, and two sentences that are closer together in the same paragraph are more closely associated. For another example, two frames with a longer interval have a lower degree of correlation, and two frames with a shorter interval have a higher degree of correlation. In order to express the degree of closeness of the associations, there can be multiple forms of expression.
  • in some embodiments, the first association relationship information is a matrix, and the numerical values of the elements in the matrix are used to indicate the closeness of the association relationship: the larger the value, the tighter the association relationship. However, determining the specific size of each value often introduces redundant artificial settings, or increases the difficulty of training the neural network model.
  • in other embodiments, when the first vector groups indicated by the first association relationship information include both closely associated and distantly associated first vector groups, second association relationship information can be established, and the second association relationship information is used to indicate the first vector groups whose association is close. That is to say, the degree of influence between two closely associated first vectors can be enhanced through the second association relationship information.
  • in some embodiments, the first association relationship information is used to indicate N first vector groups, where N is an integer greater than 1. Before the plurality of first vectors and the first association relationship information are input into the second neural network model to obtain the processing result for the first to-be-processed data, the method further includes: obtaining second association relationship information, where the second association relationship information is used to indicate n second vector groups, the n second vector groups belong to the N first vector groups, n is less than N, and n is a positive integer.
  • the inputting of the plurality of first vectors and the first association relationship information into the second neural network model to obtain the processing result for the first to-be-processed data includes: inputting the plurality of first vectors, the first association relationship information, and the second association relationship information into the second neural network model to obtain the processing result of the first data to be processed.
  • the information indicated in the second association relationship information is included in the first association relationship information. That is to say, there must be an association relationship between the two first vectors in each second vector group that satisfies the a priori hypothesis.
  • for example, the first association relationship information can reflect the association relationships between the multiple data to be processed, and the second association relationship information can reflect whether there is a close association relationship between the multiple data to be processed.
  • for example, the first association relationship information can indicate that there are associations between different sentences in the same paragraph, and the second association relationship information can indicate that there are close associations between adjacent sentences within the same paragraph.
  • for another example, the first association relationship information can indicate that there is an association between two frames whose interval is less than 8s, and the second association relationship information can indicate that there is a close association between two frames whose interval is less than 2s.
  • for another example, the first association relationship information can indicate that there is an association between two videos whose minimum interval is less than 8s, and the second association relationship information can indicate that there is a close association between two videos whose minimum interval is less than 2s.
  • for another example, the first association relationship information can indicate that there is an association between two pieces of audio whose minimum interval is less than 8s, and the second association relationship information can indicate that there is a close association between two audio segments whose minimum interval is less than 2s.
  • for another example, the first association relationship information can reflect the similarity between the multiple first vectors, and the second association relationship information can indicate, among the multiple first vectors, the pairs of first vectors whose similarity is higher.
  • the first association relationship information may indicate that there is an association between two first vectors whose similarity exceeds the preset value 1.
  • the second association relationship information may indicate that there is an association between two first vectors whose similarity exceeds the preset value 2, and the preset value 2 is greater than the preset value 1.
  • the second association relationship information may include a matrix for representing n second vector groups.
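  • as an illustration only, the following sketch derives both kinds of association relationship information from similarities between first vectors, using two thresholds corresponding to preset value 1 and the larger preset value 2; the cosine similarity measure and the concrete threshold values are assumptions of the sketch.

```python
import numpy as np

def association_matrices(X, preset1=0.5, preset2=0.8):
    # X: (l, d) matrix of first vectors. Q1 marks pairs whose similarity
    # exceeds preset value 1 (first association relationship information);
    # Q2 marks pairs whose similarity exceeds the larger preset value 2
    # (second association relationship information, a subset of Q1).
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = Xn @ Xn.T                    # cosine similarity (an assumption of the sketch)
    Q1 = (sim > preset1).astype(int)
    Q2 = (sim > preset2).astype(int)
    return Q1, Q2
```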
  • first neural network model and the second neural network model may be two sub-models in one neural network model.
  • the method of training the second neural network model and obtaining the weight parameters of the second neural network model will be described in detail below with reference to FIG. 7.
  • the method 600 may be performed by the training device 120 as shown in FIG. 3.
  • the data to be trained can be understood as the data that will be input to the neural network model and used to train the neural network model. Some or all of the multiple data to be trained have labels.
  • the neural network model can process the training data to obtain the data processing result. By calculating the distance between the label and the data processing result, the weight parameter of the neural network model can be modified.
  • the distance between the data processing result and the label can be understood as the degree of similarity between the data processing result and the label.
  • the specific calculation method of the information distance can be cross entropy, KL divergence, JS divergence, etc.
  • the data to be trained can be text data, image data, video data, audio data, etc., such as a text file, a paragraph of text in a text file, a picture file, an image block in a picture file, a frame in a video file, A video file, a video in a video file, an audio file, and an audio in an audio file.
  • Multiple data to be trained can be multiple text files, multiple texts in a text file, multiple picture files, multiple image blocks in a picture file, multiple frames in a video file, multiple video files, Multiple pieces of video in one video file, multiple audio files, multiple pieces of audio in one audio file, etc. This application does not limit the type of training data.
  • the multiple data to be trained are stored in the database, so the device executing the method 600 can directly retrieve the multiple data to be trained from the database.
  • the multiple data to be trained can be obtained by using a camera shooting method.
  • the cloud device stores the plurality of data to be trained, so the device executing the method 600 can receive the plurality of data to be trained sent by the cloud device through the communication network.
  • the plurality of data to be trained may be general data.
  • the third association relationship information is used to indicate the association relationship between the data. It is assumed that the third vector group indicated by the third association relationship information includes (fourth vector 1, fourth vector 2), and there is an association relationship between the fourth vector 1 and the fourth vector 2.
  • the fourth vector 1 and the third association relationship information are input into the second neural network model to obtain the first processing result 1; in this way, at least the influence and contribution of the data to be trained 2 on the data to be trained 1 can be reflected.
  • input the multiple data to be trained into the first neural network model, use the first neural network model to perform processing operations such as feature screening (filtering out useful features) and feature fusion (combining multiple features) on the multiple data to be trained, and output a plurality of fourth vectors corresponding one-to-one to the multiple data to be trained.
  • the multiple data to be trained can be input from the input layer and pass through hidden layers such as the convolutional layer and/or the pooling layer, where data processing is performed, and a plurality of fourth vectors corresponding to the plurality of data to be trained are output from the output layer of the first neural network model.
  • the fourth vector can be a number or a vector containing multiple numbers.
  • the first neural network model is a neural network model to be trained.
  • the first neural network model can perform data processing operations such as feature screening and feature fusion on the multiple data to be trained to obtain feature vectors. Perform a matrix operation on the feature vector and the weight matrix containing the weight parameter to obtain a plurality of fourth vectors corresponding to the plurality of data to be trained.
  • the multiple fourth vectors are used to modify the weight parameters of the first neural network model. For example, the distance between the fourth vector and the tags of the multiple data to be trained can be calculated, and the loss function can be combined to modify the weight parameter of the first neural network model. Weight parameter.
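  • as an illustration only, the following sketch shows one way such a correction of the weight parameters can look for a single trainable weight matrix with a softmax output and a cross-entropy loss; the choice of loss, the learning rate, and the plain gradient-descent update are assumptions of the sketch.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train_step(F, Y, W, lr=0.1):
    # F: (n, d) fourth vectors; Y: (n, c) one-hot labels; W: (d, c) weight matrix.
    # One gradient-descent correction of W under a cross-entropy loss
    # (the loss and the update rule are assumptions of the sketch).
    P = softmax(F @ W)                        # data training results
    loss = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))
    grad = F.T @ (P - Y) / F.shape[0]         # gradient of the loss w.r.t. W
    return W - lr * grad, loss                # corrected weight parameters
```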
  • the first neural network model is a trained neural network model.
  • the first neural network model can be trained using general data.
  • the so-called general data can be data that is not affected by the scene, or data that has low dependence on the scene.
  • for example, the first neural network model is used to identify character features in images, and its training data set can include various possible scenes, such as street scenes, conference scenes, in-car scenes, rural scenes, Asian scenes, African scenes, European and American scenes, and so on.
  • the plurality of data to be trained may be data applied in a specific scene. That is to say, the first neural network model that can handle general data is migrated to a special scene, and the second neural network model that can handle the special scene is obtained through the method of neural network model training.
  • the process of training the first neural network model can be to input general data into the first neural network model, and the first neural network model can perform data processing operations such as feature screening and feature fusion on the general data to obtain feature vectors. Perform matrix operations on the feature vector and the weight matrix containing the weight parameters to obtain the data training result corresponding to the general data. Then, the distance between the data training result and the label of the general data is calculated, and the weight parameter of the first neural network model is corrected.
  • the distance between the data training result and the label of the general data can be understood as the degree of similarity between the data training result and the label of the general data.
  • the specific calculation method of the information distance can be cross entropy, KL divergence, JS divergence, etc.
  • the recognition result of general data 1 is: the confidence that general data 1 belongs to feature 1 is 0.7, and the confidence that general data 1 belongs to feature 2 is 0.3.
  • the label of general data 1 is: label 1, which corresponds to feature 1.
  • the recognition result of general data 1 can be represented by (0.7, 0.3), and the label of general data 1 can be represented by (1, 0).
  • the distance between the data training result and the label of the general data may be the distance between the vector (0.7, 0.3) and the vector (1, 0).
  • the label of the general data may be a vector with the same dimension as the intermediate calculation result. Through vector calculation, the distance between the data training result and the label of the general data can be obtained.
  • the type of the first neural network model may be a convolutional neural network model, a graph neural network model, a graph convolutional neural network model, a graph attention neural network model, and so on. This application does not limit the type of the first neural network model.
  • the first neural network model may be a traditional convolutional neural network model.
  • the output layer of the traditional convolutional neural network is a fully connected layer, which is sometimes called a classifier.
  • the traditional convolutional neural network model can input the recognition result of the data to be trained into the loss function through the fully connected layer. For example, if the data to be trained is an image, the fully connected layer of the traditional convolutional neural network model can directly output recognition results such as whether there is a person in the image and whether the person is male or female.
  • the recognition result can often only represent the probability that the data to be trained belongs to a certain feature.
  • the first neural network model may also be a special convolutional neural network model that does not include a fully connected layer, and the calculation result of the convolutional layer or the pooling layer may be input to the loss function.
  • the first neural network model can input the processing result that belongs to the intermediate calculation result in the traditional convolutional neural network model into the loss function.
  • the processing result of the input loss function of this special convolutional neural network model is called the intermediate calculation result.
  • the intermediate calculation result can be used to characterize part or all of the information of the data to be trained. In other words, the intermediate calculation result usually contains more information content than the recognition result.
  • the first neural network model may be a graph neural network model.
  • the using the first neural network model to process the plurality of data to be trained includes: using the first neural network model to process the plurality of data to be trained and the sixth association relationship information, where the sixth association relationship information is used to indicate at least one data group to be trained, and each data group to be trained includes two data to be trained that satisfy the a priori hypothesis.
  • the to-be-trained data group contains two pieces of to-be-trained data that have an association relationship. That is, there is an association relationship between the two to-be-trained data in the to-be-trained data group that satisfies a priori hypothesis. For example, if the data group to be trained is (data to be trained 1, data to be trained 2), then there is an association relationship between the data to be trained 1 and the data to be trained 2 that satisfies the prior hypothesis.
  • the plurality of data to be trained and the sixth association relationship information reflecting the association relationships between the plurality of data to be trained are input into the first neural network model; the first neural network model can determine, according to the sixth association relationship information, whether one piece of data influences another, and the weight parameters in the first neural network model reflect the degree of influence between the data, so as to obtain a plurality of fourth vectors that can reflect the relevance of the data and that correspond one-to-one to the multiple data to be trained.
  • the multiple pieces of data to be trained may be multiple paragraphs of text, and one piece of text may include multiple sentences.
  • different paragraphs of text express different topics. Therefore, multiple sentences in a paragraph are more related, and multiple sentences belonging to different paragraphs are weak or non-relevant. Then there can be a priori hypothesis, such as a correlation between multiple sentences belonging to the same paragraph.
  • the multiple data to be trained may be multiple frames of pictures. Normally, as time passes, the longer the interval between two frames, the smaller the correlation between the two frames; the shorter the interval between the two frames, the greater the correlation between the two frames. There can then be an a priori assumption, such as an association between two frames whose interval is less than a preset threshold.
  • the preset threshold may be 8s, for example.
  • the multiple pieces of to-be-trained data may be multiple pieces of video, where, as time passes, the longer the interval between two pieces of video, the smaller the correlation between them; the shorter the interval, the greater the correlation. There may then be an a priori assumption, such as an association between two videos whose minimum interval length is less than a preset threshold.
  • the preset threshold may be 8s, for example.
  • the multiple pieces of to-be-trained data may be multiple pieces of audio, where, as time passes, the longer the interval between two pieces of audio, the smaller the correlation between them; the shorter the interval, the greater the correlation. There may then be an a priori assumption, such as an association between two audio segments whose minimum interval duration is less than a preset threshold.
  • the preset threshold may be 8s, for example.
  • the sixth association relationship information may be a matrix. Compared with other information types, matrix operations are more convenient.
  • the sixth association relationship information includes a third association relationship matrix. The vector in the first dimension of the third association relationship matrix includes a plurality of elements corresponding one-to-one to the plurality of data to be trained, and the vector in the second dimension of the third association relationship matrix includes a plurality of elements corresponding one-to-one to the plurality of data to be trained. Any element in the third association relationship matrix is used to indicate whether the data corresponding to that element in the first dimension and the data corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • for example, A is a k×k matrix, the i-th column corresponds to the first vector i, the j-th row corresponds to the first vector j, and the element a_{i,j} in the i-th column and j-th row represents whether there is an association relationship between the first vector i and the first vector j that satisfies the a priori hypothesis.
  • when there is an association relationship between the first vector i and the first vector j, the value of the element a_{i,j} in the i-th column and j-th row can be 1, and when there is no association relationship, the value of a_{i,j} can be 0;
  • alternatively, when there is an association relationship, the value of a_{i,j} can be 0, and when there is no association relationship, the value of a_{i,j} can be 1. A small sketch of filling such a matrix under the first convention follows below.
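  • The following minimal sketch fills a k×k association relationship matrix under the first convention (1 = associated, 0 = not associated); the pair list is assumed to come from an a priori hypothesis as above, and the self-loop option reflects that a vector may be associated with itself.

```python
import numpy as np

def build_association_matrix(k, pairs, self_loops=True):
    """Binary k x k matrix: A[i, j] = 1 iff (i, j) satisfies the hypothesis."""
    A = np.zeros((k, k), dtype=np.float32)
    for i, j in pairs:
        A[i, j] = 1.0
    if self_loops:  # a vector may have an association relationship with itself
        A[np.arange(k), np.arange(k)] = 1.0
    return A

A = build_association_matrix(5, [(0, 1), (1, 0), (3, 4), (4, 3)])
assert (A == A.T).all()  # symmetric here, i.e. a non-directional relation
```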
  • when the matrix A^T obtained by transposing the matrix A is the same as the matrix A, then a_{i,j} = a_{j,i}, and the association relationship between the first vector i and the first vector j may be non-directional.
  • when the matrix A^T obtained by transposing the matrix A is different from the matrix A, then a_{i,j} ≠ a_{j,i}, and the association relationship between the first vector i and the first vector j is directional.
  • for example, a_{i,j} may indicate that there is an association relationship pointing from the first vector i to the first vector j, and a_{j,i} that there is an association relationship pointing from the first vector j to the first vector i;
  • alternatively, a_{i,j} may indicate that there is an association relationship pointing from the first vector j to the first vector i, and a_{j,i} that there is an association relationship pointing from the first vector i to the first vector j.
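  • Under a directional convention the matrix is simply no longer symmetric; a brief illustration (the indices are hypothetical):

```python
import numpy as np

A = np.zeros((3, 3), dtype=np.float32)
A[0, 1] = 1.0  # e.g. an association pointing from first vector 1 to first vector 0
assert not (A == A.T).all()  # A differs from its transpose: directional
```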
  • the third association relationship information reflects whether there is an association relationship between the multiple fourth vectors.
  • each third vector group contains two fourth vectors that have an association relationship; that is, the two fourth vectors in the third vector group have an association relationship that satisfies the a priori assumption. For example, if the third vector group is (fourth vector 1, fourth vector 2), then there is an association relationship between fourth vector 1 and fourth vector 2 that satisfies the a priori assumption.
  • the third association relationship information reflects whether the multiple fourth vectors influence one another, so that a data processing result reflecting the data associations can be obtained according to the third association relationship information. It should be understood that a fourth vector may have an association relationship with itself.
  • the third association relationship information may be determined according to the association relationships between the multiple to-be-trained data; that is, the third association relationship information is the same or substantially the same as the sixth association relationship information above.
  • alternatively, the third association relationship information may differ from the sixth association relationship information above. For example, whether any two of the multiple fourth vectors have an association relationship can be determined according to the similarity between them: the greater the similarity, the stronger the association; the smaller the similarity, the weaker the association. The a priori hypothesis corresponding to the third association relationship information can then be that when the similarity exceeds a preset value, the two fourth vectors are considered to have an association relationship, and when it does not, they are considered unassociated. A sketch of this similarity-based construction follows below.
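  • A minimal sketch of the similarity-based construction; cosine similarity and the 0.5 preset value are assumptions for illustration, since the embodiments do not fix a particular similarity measure.

```python
import numpy as np

def similarity_adjacency(X, preset_value=0.5):
    """X: (l, s) matrix whose rows are the fourth vectors.
    Edge iff cosine similarity exceeds the preset value."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T  # pairwise cosine similarities; diagonal is 1 (self-loops)
    return (S > preset_value).astype(np.float32)
```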
  • node 1, node 2, and node 3 may correspond to fourth vector 1, fourth vector 2, and fourth vector 3, respectively.
  • the third association relationship information includes a fourth association relationship matrix. The vector located in the first dimension of the fourth association relationship matrix includes a plurality of elements corresponding one-to-one to the plurality of fourth vectors, and the vector located in the second dimension includes a plurality of elements corresponding one-to-one to the plurality of fourth vectors. Any element in the fourth association relationship matrix is used to indicate whether the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  • for example, B is an l×l matrix, the i-th column corresponds to the fourth vector i, the j-th row corresponds to the fourth vector j, and the element b_{i,j} in the i-th column and j-th row represents whether there is an association relationship between the fourth vector i and the fourth vector j that satisfies the a priori hypothesis.
  • when there is an association relationship between the fourth vector i and the fourth vector j, the value of the element b_{i,j} in the i-th column and j-th row can be 1, and when there is no association relationship, the value of b_{i,j} can be 0;
  • alternatively, when there is an association relationship, the value of b_{i,j} can be 0, and when there is no association relationship, the value of b_{i,j} can be 1.
  • when the matrix B^T obtained by transposing the matrix B is the same as the matrix B, then b_{i,j} = b_{j,i}, and the association relationship between the fourth vector i and the fourth vector j may be non-directional.
  • when the matrix B^T obtained by transposing the matrix B is different from the matrix B, then b_{i,j} ≠ b_{j,i}, and the association relationship between the fourth vector i and the fourth vector j is directional.
  • for example, b_{i,j} may indicate that there is an association relationship pointing from the fourth vector i to the fourth vector j, and b_{j,i} that there is an association relationship pointing from the fourth vector j to the fourth vector i;
  • alternatively, b_{i,j} may indicate that there is an association relationship pointing from the fourth vector j to the fourth vector i, and b_{j,i} that there is an association relationship pointing from the fourth vector i to the fourth vector j.
  • the fourth correlation matrix can be compressed to obtain a matrix with a smaller dimension.
  • for example, if the fourth association relationship matrix B is an l×l matrix, and the values of all elements of B that are more than l' positions away from the diagonal are all 0 (or all 1), where l' < l, then B can be divided into several small matrices whose maximum number of rows is l' and whose maximum number of columns is l'.
  • this process can also be referred to as sparsification of the fourth association relationship matrix B; a rough sketch follows below.
  • alternatively, the fourth association relationship matrix B can be compressed according to a spectral clustering method.
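  • A rough sketch of the band-limited case described above: when the nonzero entries lie within l' of the diagonal, the matrix can be processed in blocks of at most l' rows and columns. Taking only the diagonal blocks, as here, is one simple choice; a more careful scheme would use overlapping blocks to keep entries that straddle a block boundary.

```python
import numpy as np

def split_banded(B, l_prime):
    """Split an l x l matrix whose nonzeros lie within l' of the diagonal
    into dense diagonal blocks of at most l' rows and l' columns."""
    l = B.shape[0]
    blocks = []
    for start in range(0, l, l_prime):
        end = min(start + l_prime, l)
        blocks.append(B[start:end, start:end].copy())
    return blocks
```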
  • the a priori hypothesis can indicate a forward association relationship or a reverse association relationship.
  • for example, the shorter the interval between picture frames, the more relevant their content. Therefore, when the a priori hypothesis indicates that there is an association relationship between picture frames within 8s of each other, the a priori hypothesis can be understood as indicating a forward association relationship; when the a priori hypothesis indicates that there is an association relationship between picture frames more than 8s apart, it can be understood as indicating a reverse association relationship.
  • the output result of the first neural network model and the correlation relationship within the output result are input into the second neural network model.
  • Inputting multiple fourth vectors into the second neural network model can be understood as inputting multiple feature representations of the data to be trained into the second neural network model.
  • Inputting the third association relationship information into the second neural network model can be understood as inputting information about whether there is an influence between any two fourth vectors among the plurality of fourth vectors into the second neural network model.
  • Multiple fourth vectors can be understood as nodes in the graph model, and the third association relationship information can be used to indicate whether there are edges between nodes. Therefore, the second neural network model may be a graph neural network model.
  • the second neural network model processes the multiple fourth vectors and the third association relationship information; based on the weight parameters in the second neural network model, it can determine whether any two fourth vectors influence each other and to what specific degree, so as to obtain the processing result of the first to-be-trained data.
  • the processing result of the first to-be-trained data may be a feature representation of the first to-be-trained data, or may be a recognition result of the first to-be-trained data.
  • the processing result of the first data to be trained may be a vector.
  • for example, the multiple fourth vectors are l fourth vectors, denoted y_1, ..., y_l respectively; stacked row by row they form a matrix Y of dimension l×s.
  • the third association relationship information is the fourth association relationship matrix mentioned above, denoted Q here.
  • the weight matrices W_1, W_2, ..., W_h each have dimension s×s_h, i.e., each of W_1, W_2, ..., W_h contains s·s_h weight parameters, where s_h = s/h, h represents the number of heads of the graph attention neural network (the number of heads can also be called the number of slices), and s_h is commonly known as the single-head dimension.
  • U_1 = Y·W_1, U_2 = Y·W_2, ..., U_h = Y·W_h; the dimensions of U_1, U_2, ..., U_h are then all l×s_h.
  • V_{i,j} = U_i·U_j^T, where i ≠ j, 1 ≤ i ≤ h, and 1 ≤ j ≤ h; the dimension of V_{i,j} is then l×l.
  • the matrix R_{i,j} obtained from V_{i,j} is still an l×l matrix; it can be understood as the mutual attention intensity matrix between the points.
  • R_{i,j} and Q are multiplied element by element to obtain E_{i,j}, that is, the attention matrix after the Q relation mask is applied.
  • E_{i,j} can be understood as filtering the points according to the edge relationships: the attention between associated points is kept, while the attention between unassociated points is not kept.
  • this matrix contains a large amount of information about the interrelations of the nodes, so its information content is relatively rich.
  • E_{i,j}·U_i gives the final expression U_i^new of each point after it has been updated with the information of the other points; the dimension of U_i^new is l×s_h.
  • the above process is the data processing of a one-layer network. If the depth of the graph attention neural network model is h', i.e., it includes h' layers, the Y' output by the current layer can be input to the next layer; that is, the Y' output by the current layer is regarded as the Y of the next layer, and the data processing is the same as or similar to the above. A runnable sketch of the one-layer computation follows below.
  • Y' has the same matrix size as Y, but each element of Y' contains information about one or more elements of Y.
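  • The following is a minimal NumPy sketch of the one-layer computation described above. Two points are assumptions where the text is not explicit: a row-softmax is used to turn the attention logits V into the intensity matrix R, and attention is computed within each head (V = U_i·U_i^T) with the heads concatenated at the end; the V_{i,j} = U_i·U_j^T form with i ≠ j would pair different heads, and how those pairs recombine is not specified above.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(Y, Q, W):
    """One masked multi-head attention layer.
    Y: (l, s) stacked fourth vectors; Q: (l, l) 0/1 relation mask;
    W: (h, s, s_h) per-head weight matrices with s_h = s / h."""
    h = W.shape[0]
    heads = []
    for i in range(h):
        U = Y @ W[i]              # (l, s_h)
        V = U @ U.T               # attention logits between points, (l, l)
        R = softmax(V, axis=-1)   # mutual attention intensity matrix
        E = R * Q                 # Q relation mask, applied elementwise
        heads.append(E @ U)       # U_i_new: each point updated by the others
    return np.concatenate(heads, axis=-1)  # Y': (l, s), same size as Y

# Stacking h' such layers, with each layer's Y' fed in as the next layer's Y,
# matches the multi-layer description above.
```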
  • the second neural network model can obtain more information when recognizing a certain feature, and improve the recognition accuracy.
  • the plurality of to-be-trained data includes the first to-be-trained data, and further includes one or more associated data associated with the first to-be-trained data; the second neural network model can, based on the third association relationship information, take into account the influence of the associated data on the first to-be-trained data, so as to obtain the processing result corresponding to the first to-be-trained data.
  • in other words, in addition to performing feature extraction on the first to-be-trained data, the second neural network model also extracts features from the other to-be-trained data that have an association relationship with the first to-be-trained data, which expands the amount of data input in the prediction process and helps improve recognition accuracy.
  • the plurality of to-be-trained data includes the first to-be-trained data, which may correspond to a target vector; the plurality of fourth vectors further includes one or more associated vectors associated with the target vector, and the plurality of to-be-trained data includes the to-be-trained data corresponding one-to-one to the one or more associated vectors.
  • the second neural network model can, according to the third association relationship information, take into account the influence of the associated vectors on the target vector, so as to obtain the processing result corresponding to the first to-be-trained data. In other words, the second neural network model not only performs feature extraction on the target vector, but also performs feature extraction on the associated vectors that have an association relationship with the target vector, which expands the amount of data processed in the prediction process and helps improve the recognition accuracy rate.
  • the second neural network model may output multiple processing results corresponding to the multiple data to be trained in a one-to-one correspondence. That is, the second neural network model synthesizes multiple fourth vectors and the association relationship between each fourth vector, and outputs multiple processing results corresponding to the multiple data to be trained in a one-to-one correspondence.
  • the closeness of two association relationships can be the same or different. For example, two sentences that are far apart in the same paragraph are less closely associated, while two sentences that are close together in the same paragraph are more closely associated. For another example, two frames with a longer interval have a lower degree of correlation, and two frames with a shorter interval have a higher degree of correlation. There can be multiple ways of expressing the degree of closeness of an association relationship.
  • for example, the third association relationship information is a matrix, and the numerical values of the elements in the matrix are used to indicate the closeness of the association relationships: the larger the value, the closer the association relationship. However, determining the specific size of each value often introduces redundant manual settings, or increases the difficulty of training the neural network model.
  • for another example, when the fourth vector groups in the third association relationship information are of two kinds, those with a close association relationship and those with a distant association relationship, fourth association relationship information can be established, and the fourth association relationship information is used to indicate the closely associated fourth vector groups.
  • in other words, the degree of influence between two closely associated fourth vectors can be strengthened through the fourth association relationship information.
  • the third association relationship information is used to indicate M fourth vector groups, where M is an integer greater than 1. Before the plurality of fourth vectors and the third association relationship information are input into the second neural network model to obtain the first processing result for the first to-be-trained data, the method further includes: obtaining fourth association relationship information, where the fourth association relationship information is used to indicate m fifth vector groups, the m fifth vector groups belong to the M fourth vector groups, m is less than M, and m is a positive integer. Inputting the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first to-be-trained data then includes: inputting the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
  • the information indicated by the fourth association relationship information is included in the third association relationship information; in other words, the two fourth vectors in each indicated vector group necessarily have an association relationship that satisfies the a priori hypothesis.
  • the third association relationship information can reflect the association relationships between the multiple to-be-trained data, and the fourth association relationship information can reflect whether there is a close association relationship between the multiple to-be-trained data.
  • for example, the third association relationship information can indicate that there are association relationships between different sentences in the same paragraph, and the fourth association relationship information can indicate that there are close association relationships between adjacent sentences within the same paragraph.
  • for another example, the third association relationship information can indicate that there is an association relationship between two frames whose interval is less than 8s, and the fourth association relationship information can indicate that there is a close association relationship between two frames whose interval is less than 2s.
  • for another example, the third association relationship information can indicate that there is an association relationship between two videos whose minimum interval is less than 8s, and the fourth association relationship information can indicate that there is a close association relationship between two videos whose minimum interval is less than 2s.
  • for another example, the third association relationship information can indicate that there is an association relationship between two pieces of audio whose minimum interval is less than 8s, and the fourth association relationship information can indicate that there is a close association relationship between two audio segments whose minimum interval is less than 2s.
  • for another example, the third association relationship information can reflect the similarity between the multiple fourth vectors, and the fourth association relationship information can reflect the pairs of fourth vectors with higher similarity among them.
  • the third association relationship information may indicate that there is an association relationship between two fourth vectors whose similarity exceeds preset value 1,
  • and the fourth association relationship information may indicate that there is a close association relationship between two fourth vectors whose similarity exceeds preset value 2, where preset value 2 is greater than preset value 1.
  • the fourth association relationship information may include a matrix for representing the m vector groups; a sketch of applying such a second mask follows below.
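  • One way (an assumption for illustration, not fixed by the embodiments) to let a second, close-association mask strengthen the attention between closely associated vectors is to add a bonus factor wherever the fourth association relationship matrix is set:

```python
# R:  (l, l) mutual attention intensity matrix.
# Q:  (l, l) mask from the third association relationship information.
# Q2: (l, l) mask marking only the closely associated pairs (Q2 <= Q).
# The bonus factor 2.0 is an illustrative choice.
def masked_attention_with_close_pairs(R, Q, Q2, bonus=2.0):
    E = R * Q                            # keep attention between associated points
    E = E * (1.0 + (bonus - 1.0) * Q2)   # strengthen closely associated pairs
    return E
```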
  • the weight parameter of the second neural network model can be corrected through the loss function.
  • for example, the weight parameters of the second neural network model can be modified by using a loss function according to the distance between the label of the first to-be-trained data and the first processing result. When the label of the first to-be-trained data and the first processing result are close to each other (that is, the degree of similarity is high), the weight parameters are relatively appropriate, and the correction amplitude of the weight parameters is small; when the label and the first processing result are far apart (that is, the similarity is low), the weight parameters are not appropriate, and the correction amplitude of the weight parameters can be increased.
  • the plurality of fourth vectors and the third association relationship information are input into the second neural network model to obtain the first processing result for the first to-be-trained data and the second processing result for the second to-be-trained data, where the first to-be-trained data and the second to-be-trained data are any two of the plurality of to-be-trained data, and the similarity between the first processing result and the second processing result is used to modify the weight parameters of the second neural network model.
  • the similarity between the first processing result and the second processing result is similarity 1, the fourth vector corresponding to the first processing result is fourth vector 1, the fourth vector corresponding to the second processing result is fourth vector 2, and the similarity between fourth vector 1 and fourth vector 2 is similarity 2.
  • when the difference between similarity 1 and similarity 2 is small, the weight parameters are relatively appropriate, and the correction amplitude of the weight parameters is small; when the difference between similarity 1 and similarity 2 is large, the weight parameters are not appropriate, and the correction amplitude of the weight parameters can be increased.
  • the obtaining of a first processing result for the first to-be-trained data includes: obtaining the first processing result and a second processing result for the second to-be-trained data, where the label of the first to-be-trained data is the first label, the label of the second to-be-trained data is the second label, and the first to-be-trained data and the second to-be-trained data are any two of the plurality of to-be-trained data;
  • the method further includes: matching the similarity between the first label and the second label with the similarity between the first processing result and the second processing result to obtain a matching result, the The matching result is used to modify the weight parameter of the second neural network model.
  • the sixth association relationship information mentioned above may not include the similarity information between the first label and the second label; that is, the association relationship between the first to-be-processed data and the second to-be-processed data may be independent of the similarity between the first label and the second label.
  • the sixth association relationship information mentioned above can associate multiple pieces of data that may have associations, increasing the amount of data processed by the second neural network model, while the similarity between the first label and the second label is used to evaluate whether the first processing result and the second processing result are accurate.
  • the similarity between the first processing result and the second processing result should be low.
  • if the first processing result is environmental governance and the second processing result is energy supply, but the similarity between the first processing result and the second processing result is relatively high, this indicates that the weight parameters of the second neural network model are inappropriate, and the loss function can be used to modify the weight parameters of the second neural network model.
  • the similarity between the first processing result and the second processing result should be high.
  • the similarity between the first processing result and the second processing result is low, indicating that the weight parameters of the second neural network model may not be appropriate.
  • the similarity between the first processing result and the second processing result should be low.
  • if the first processing result is project investigation and the second processing result is road traffic, then the similarity between the first processing result and the second processing result is low, indicating that the weight parameters of the second neural network model may be appropriate; the correction amplitude applied by the loss function to the weight parameters of the second neural network model is then small.
  • if the second label is also an insect sound, this means that the similarity between the first processing result and the second processing result should be high.
  • if the first processing result is mosquitoes and the second processing result is flies, and the similarity between the first processing result and the second processing result is high, this indicates that the weight parameters of the second neural network model may be appropriate, so the correction amplitude applied by the loss function to the weight parameters of the second neural network model is small.
  • y_i' represents the processing result i for the to-be-trained data i, y_j' represents the processing result j for the to-be-trained data j, z_i represents the label i of the to-be-trained data i, and z_j represents the label j of the to-be-trained data j.
  • the function C(y_i', y_j') represents the similarity between processing result i and processing result j, and the function C(z_i, z_j) represents the similarity between label i and label j.
  • the matrix D may be a matrix for amplifying the similarity between processing result i and processing result j; a sketch of a loss built from these quantities follows below.
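  • Given the notation above, a loss that matches result similarity to label similarity might look like the following sketch; the squared-difference form, the use of cosine similarity for C, and applying D to both result vectors are assumptions for illustration.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def similarity_matching_loss(y_i, y_j, z_i, z_j, D):
    """Penalize mismatch between C(.) on the (D-amplified) processing
    results and C(.) on the corresponding labels."""
    c_results = cosine(D @ y_i, D @ y_j)  # D amplifies the result similarity
    c_labels = cosine(z_i, z_j)
    return (c_results - c_labels) ** 2
```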
  • the label of the data i to be trained can be represented by (1, 0, 0).
  • the label of the data i to be trained can be represented by (0, 1, 0).
  • the label of the data i to be trained can be represented by (1, 0, 1).
  • the label of the data i to be trained can be represented by (1, 1, 1).
  • the plurality of to-be-trained data includes one or more target type data, and each target type data has a label for modifying the weight parameter.
  • the plurality of data to be trained includes the first type of data and the second type of data.
  • the to-be-trained data belonging to the first type of data have labels,
  • while the to-be-trained data belonging to the second type of data do not have labels. Therefore, the weight parameters of the second neural network model can be corrected according to the distance between the processing results of the first type of data and the labels of the first type of data.
  • the distance between the processing result of the first type of data and the label of the first type of data can be understood as the degree of similarity between the processing result of the first type of data and the label of the first type of data.
  • the specific calculation method of the information distance can be cross entropy, KL divergence, JS divergence, etc.; minimal versions of these are sketched below.
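  • For reference, minimal NumPy versions of the three distances named above; the inputs are assumed to be probability vectors of equal length.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    return float(-(p * np.log(q + eps)).sum())

def kl_divergence(p, q, eps=1e-12):
    return float((p * np.log((p + eps) / (q + eps))).sum())

def js_divergence(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```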
  • although the second type of data does not have labels, because there may be association relationships between the first type of data and the second type of data, the second type of data can be introduced into the process of obtaining the processing results of the first type of data.
  • the second neural network model may be a semi-supervised model, that is, the plurality of data to be trained may include data without labels.
  • the proportion of the first type of data among the multiple to-be-trained data is generally not less than 5%-10%; a sketch of such a semi-supervised loss follows below.
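  • A brief sketch of the semi-supervised arrangement: all data flow through the model, but only the labeled (first type) subset contributes to the loss. The cross-entropy form and all names here are illustrative assumptions.

```python
import numpy as np

def semi_supervised_loss(outputs, labels, labeled_mask, eps=1e-12):
    """outputs: (n, c) predicted distributions; labels: (n, c) one-hot rows
    (arbitrary where unlabeled); labeled_mask: (n,) True where a label exists."""
    ce = -(labels * np.log(outputs + eps)).sum(axis=1)  # per-sample loss
    return float(ce[labeled_mask].mean())  # unlabeled samples contribute nothing
```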
  • the first processing result is also used to modify the weight parameter of the first neural network model.
  • the first processing result can be used to modify the weight parameter of the second neural network model as well as the weight parameter of the first neural network model.
  • the first processing result and the label of the first data to be trained may be input to the loss function of the first neural network model to modify the weight parameter of the first neural network model.
  • before the plurality of to-be-trained data are input into the first neural network model, the first neural network model may be a neural network model that is not restricted, or is only slightly restricted, by the scene.
  • the plurality of to-be-trained data may be data of a certain specific scene; therefore, the weight parameters of the first neural network model may be modified according to the first processing result, so that the first neural network model can adapt to that special scene.
  • the first neural network model and the second neural network model may be two sub-models in one neural network model.
  • the following specific examples introduce the effects that the first neural network model and the second neural network model can achieve in training and prediction.
  • the first neural network model may be a multiple granularity network (multiple granularity network, MGN) model.
  • the multi-granularity network model is a convolutional neural network model.
  • Each fourth vector may include 1024 elements, and each fourth vector is a feature representation of a picture.
  • the a priori hypothesis can be one or more of the following, for example:
  • the third association relationship information used to indicate the association relationship between 90,000 fourth vectors can be determined.
  • the 90,000 fourth vectors and the third association relationship information are input into the second neural network model to obtain the processing result for the first type of data.
  • the processing result of the first type of data takes into account the content of the second type of data.
  • Matching the processing result of the first type of data with the label of the first type of data can modify the parameters of the second neural network model.
  • the data in the verification data set are input into the first neural network model to obtain multiple fourth vectors for the verification data set; the multiple fourth vectors for the verification data set are then input into the second neural network model, and, according to the a priori assumptions, the association relationships between the multiple fourth vectors for the verification data set are input to the second neural network model to obtain the data processing result for the verification data set.
  • the data processing result is matched with the label of the verification data set to obtain the recognition ability of the first neural network model and the second neural network model.
  • the mean average precision (mAP) is used to score the trained neural network model. Compared with the traditional neural network model, the scoring result can be improved by 4-20 points.
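  • As a reference point for the mAP scoring mentioned above, one common way to compute it is to average scikit-learn's per-query average precision (a standard tool chosen here for illustration, not one specified by the embodiments):

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(relevance, scores):
    """relevance: (q, n) binary ground truth per query;
    scores: (q, n) model similarity scores per query."""
    aps = [average_precision_score(r, s) for r, s in zip(relevance, scores)]
    return float(np.mean(aps))
```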
  • the method for training a neural network model provided in this application can enhance the neural network model.
  • 8,000 Chinese text questions are input into the first neural network model as multiple data to be trained, and each Chinese text question can be one data to be trained.
  • the remaining 7,000 Chinese text questions can be used as verification data to verify whether the weight parameters of the second neural network model are appropriate.
  • the 8,000 Chinese text questions constitute a training data set
  • the 7,000 Chinese text questions constitute a verification data set.
  • the first neural network model is used to process the training data set, and 8,000 fourth vectors corresponding to the training data set are obtained.
  • the first neural network model may be a bidirectional encoder representations from transformers (BERT) model.
  • the BERT model can be a convolutional neural network model.
  • Each fourth vector may include 768 elements, and each fourth vector is a feature representation of a Chinese text question.
  • the a priori hypothesis can be one or more of the following, for example:
  • the third association relationship information used to indicate the association relationships between the 8,000 fourth vectors can be determined.
  • the 8,000 fourth vectors and the third association relationship information are input into the second neural network model to obtain the processing result for the first type of data.
  • the processing result of the first type of data takes into account the content of the second type of data.
  • Matching the processing result of the first type of data with the label of the first type of data can modify the parameters of the second neural network model.
  • the data processing result is matched with the label of the verification data set to obtain the recognition ability of the first neural network model and the second neural network model.
  • FIG. 8 is a schematic diagram of the hardware structure of a data processing device provided by an embodiment of the present application.
  • the data processing device 700 shown in FIG. 8 (the device 700 may specifically be a computer device) includes a memory 701, a processor 702, a communication interface 703, and a bus 704.
  • the memory 701, the processor 702, and the communication interface 703 realize the communication connection between each other through the bus 704.
  • the memory 701 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 701 may store a program.
  • the processor 702 is configured to execute each step of the data processing method shown in FIG. 6 in the embodiment of the present application.
  • the processor 702 is further configured to execute each step of the method for training a neural network model shown in FIG. 7 in the embodiment of the present application.
  • the processor 702 may adopt a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to implement the data processing method shown in FIG. 6 in the embodiment of the present application.
  • the processor 702 may also adopt a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to implement the method for training a neural network model shown in FIG. 7 in the embodiment of the present application.
  • the processor 702 may also be an integrated circuit chip with signal processing capability.
  • each step of the data processing method shown in FIG. 6 in the embodiment of the present application may be completed by an integrated logic circuit of hardware in the processor 702 or instructions in the form of software.
  • each step of the method for training a neural network model shown in FIG. 7 in the embodiment of the present application may be completed by an integrated logic circuit of hardware in the processor 702 or instructions in the form of software.
  • the aforementioned processor 702 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a storage medium mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 701; the processor 702 reads the information in the memory 701 and, in combination with its hardware, completes the functions required by the units included in the data processing device of the embodiment of the present application, performs the data processing method shown in FIG. 6 in the embodiment of the present application, and is also used to execute the method for training a neural network model shown in FIG. 7 in the embodiment of the present application.
  • the communication interface 703 uses a transceiver device such as but not limited to a transceiver to implement communication between the device 700 and other devices or communication networks.
  • the information of the neural network to be constructed and the data to be processed can be obtained through the communication interface 703.
  • the information of the neural network to be constructed and the data to be trained (the data to be trained in the embodiment shown in FIG. 7) can be obtained through the communication interface 703.
  • the bus 704 may include a path for transferring information between various components of the device 700 (for example, the memory 701, the processor 702, and the communication interface 703).
  • the acquisition module in the data processing device may be equivalent to the communication interface 703 in the data processing device 700; the processing module in the data processing device may be equivalent to the processor 702.
  • Fig. 9 is a schematic diagram of the hardware structure of a device for training a neural network model provided by an embodiment of the present application.
  • the device 800 for training a neural network model shown in FIG. 9 (the device 800 may specifically be a computer device) includes a memory 801, a processor 802, a communication interface 803, and a bus 804. Among them, the memory 801, the processor 802, and the communication interface 803 realize the communication connection between each other through the bus 804.
  • the memory 801 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 801 may store a program. When the program stored in the memory 801 is executed by the processor 802, the processor 802 is configured to execute each step of the method for training a neural network model shown in FIG. 7 in the embodiment of the present application.
  • the processor 802 may adopt a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to implement the method for training a neural network model shown in FIG. 7 in the embodiment of the present application.
  • the processor 802 may also be an integrated circuit chip with signal processing capability.
  • each step of the method for training a neural network model shown in FIG. 7 in the embodiment of the present application can be completed by an integrated logic circuit of hardware in the processor 802 or instructions in the form of software.
  • the aforementioned processor 802 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 801; the processor 802 reads the information in the memory 801 and, in combination with its hardware, completes the functions required by the units included in the device for training a neural network model of the embodiment of the present application, or executes the method for training a neural network model shown in FIG. 7 in the embodiment of the present application.
  • the communication interface 803 uses a transceiver device such as but not limited to a transceiver to implement communication between the device 800 and other devices or a communication network.
  • the information of the neural network to be constructed and the training data required in the process of constructing the neural network (the to-be-trained data in the embodiment shown in FIG. 7) can be obtained through the communication interface 803.
  • the bus 804 may include a path for transferring information between various components of the device 800 (for example, the memory 801, the processor 802, and the communication interface 803).
  • the acquisition module in the neural network model training device may be equivalent to the communication interface 803 in the neural network model training device 800; the processing module in the neural network model training device may be equivalent to the processor 802.
  • it should be noted that although the device 700 and the device 800 show only a memory, a processor, and a communication interface, in a specific implementation process, those skilled in the art should understand that the device 700 and the device 800 may also include other devices necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the device 700 and the device 800 may also include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the device 700 and the device 800 may also include only the components necessary to implement the embodiments of the present application, and need not include all the components shown in FIG. 8 and FIG. 9.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Abstract

A method for processing data, comprising: acquiring a plurality of pieces of data to be processed (501); processing the plurality of pieces of data by using a first neural network model to obtain a plurality of first vectors that are in one-to-one correspondence with the plurality of pieces of data (502), wherein the first neural network model is obtained by training on the basis of general data; acquiring first association relationship information (503), the first association relationship information being used to indicate at least one first vector group, and each first vector group comprising two first vectors that satisfy a priori hypothesis; and inputting the plurality of first vectors and the first association relationship information into a second neural network model to obtain a processing result for first data to be processed (504), said first data being any data among the plurality of pieces of data. The purpose of the described method for processing data is to weaken the dependence of a neural network model on training data.

Description

在一种可能的实施方式中,所述第二神经网络模型为图网络模型,相应地,所述多个第一矢量作为所述图网络模型的节点,所述第一关联关系作为所述图网络模型的边。In a possible implementation manner, the second neural network model is a graph network model, and accordingly, the multiple first vectors are used as nodes of the graph network model, and the first association relationship is used as the graph network model. The side of the network model.
第一神经网络模型、第二神经网络模型可以是某个神经网络模型的两个子模型。The first neural network model and the second neural network model may be two sub-models of a certain neural network model.
第一神经网络模型和第二神经网络模型可以存储在两个不同的设备上,也就是说本申请提供的数据处理的方法中的步骤可以由多个设备执行。例如,第一设备上存储有第一神 经网络模型,第一设备可以执行“获取多个待处理数据”的步骤以及“使用第一神经网络模型对所述多个待处理数据进行处理,得到与所述多个待处理数据一一对应的多个第一矢量”的步骤,第二设备上存储有第二神经网络模型,第二设备可以执行“获取第一关联关系信息,所述第一关联关系信息用于指示至少一个第一矢量组,每个第一矢量组包括满足先验假设的两个第一矢量”的步骤以及“将所述多个第一矢量以及所述第一关联关系信息输入第二神经网络模型,得到针对第一待处理数据的处理结果,所述第一待处理数据是所述多个待处理数据中的任一数据”的步骤。其中,多个第一矢量可以通过第一设备与第二设备之间的通信接口传输。The first neural network model and the second neural network model can be stored on two different devices, that is to say, the steps in the data processing method provided in this application can be executed by multiple devices. For example, the first neural network model is stored on the first device, and the first device can perform the steps of "obtain multiple data to be processed" and "use the first neural network model to process the multiple data to be processed to obtain In the step of “a plurality of first vectors corresponding to the plurality of data to be processed in a one-to-one relationship”, a second neural network model is stored on the second device, and the second device can execute “acquire the first association information, the first association The relationship information is used to indicate at least one first vector group, and each first vector group includes two first vectors satisfying a priori hypothesis" and "combining the multiple first vectors and the first association relationship information Input the second neural network model to obtain the processing result for the first to-be-processed data. The first to-be-processed data is any one of the multiple to-be-processed data" steps. Wherein, multiple first vectors may be transmitted through the communication interface between the first device and the second device.
在本申请实施例中,第一神经网络模型使用通用数据训练,可以得到不受场景影响或受场景影响较小的通用模型,因此该第一神经网络模型模型可以应用在多种场景中。然而,由于第一神经网络模型的应用不受场景限制,仅使用第一神经网络模型很难实现任意场景的高准确率识别。因此,可以将第一神经网络模型输出的多个特征矢量输入第二神经网络模型,使得第一神经网络模型可以应用在相对特殊的场景内,从而第二神经网络模型可以学习通用场景与特殊场景之间的区别与关联。现有的神经网络模型通常只能识别某个特殊的场景,一旦应用在其他领域,神经网络模型的大部分参数均无法再继续使用。由于第二神经网络模型可以学习通用场景、特殊场景之间的区别与关联,由于输入第一神经网络模型的数据可以是通用数据,因此本申请提供的方法可以弱化待处理数据所在场景对神经网络模型架构、参数的限制。另外,为了增强第二神经网络模型的识别准确率,在识别第一待处理数据的同时还会考虑与该第一待处理数据相关联的数据,由于处理数据量增多,因此有利于增加第二神经网络模型的识别准确率。并且,由于考虑到数据与数据之间的关联性,可以增强第二神经网络模型对数据关系的学习。In the embodiments of the present application, the first neural network model is trained using general data, and a general model that is not affected by the scene or less affected by the scene can be obtained, so the first neural network model model can be applied in a variety of scenarios. However, since the application of the first neural network model is not limited by the scene, it is difficult to achieve high-accuracy recognition of any scene using only the first neural network model. Therefore, multiple feature vectors output by the first neural network model can be input into the second neural network model, so that the first neural network model can be applied in relatively special scenes, so that the second neural network model can learn general scenes and special scenes The difference and association between. Existing neural network models usually only recognize a particular scene. Once applied in other fields, most of the parameters of the neural network model can no longer be used. Since the second neural network model can learn the difference and association between general scenarios and special scenarios, and because the data input to the first neural network model can be general data, the method provided in this application can weaken the scenario where the data to be processed is located on the neural network Restrictions on model architecture and parameters. In addition, in order to enhance the recognition accuracy of the second neural network model, while identifying the first data to be processed, data associated with the first data to be processed will also be considered. As the amount of processed data increases, it is beneficial to increase the second The recognition accuracy of the neural network model. In addition, since the correlation between data and data is considered, the learning of the data relationship by the second neural network model can be enhanced.
With reference to the first aspect, in some implementations of the first aspect, the first association relationship information is used to indicate N first vector groups, where N is an integer greater than 1. Before the multiple first vectors and the first association relationship information are input into the second neural network model to obtain the processing result for the first data to be processed, the method further includes: acquiring second association relationship information, where the second association relationship information is used to indicate n second vector groups, the n second vector groups belong to the N first vector groups, n is less than N, and n is a positive integer. The inputting of the multiple first vectors and the first association relationship information into the second neural network model to obtain the processing result for the first data to be processed includes: inputting the multiple first vectors, the first association relationship information, and the second association relationship information into the second neural network model to obtain the processing result for the first data to be processed.
In the embodiments of the present application, when the first association relationship information only indicates that an association relationship exists between two first vectors, it cannot reflect the strength of that association. The second association relationship information can indicate, among the multiple first vector groups, one or more first vector groups whose association is relatively strong or relatively weak. In addition to considering the data associated with the first data to be processed, the second neural network model can thereby strengthen the influence, on the first data to be processed, of the data closely associated with it, or weaken the influence of the data only distantly associated with it; a larger amount of data can therefore be drawn upon to recognize the first data to be processed.
With reference to the first aspect, in some implementations of the first aspect, the acquiring of multiple data to be processed includes: acquiring target data, where the target data is one of the multiple data to be processed; and acquiring associated data, where an association relationship satisfying the a priori hypothesis exists between the associated data and the target data, and the multiple data to be processed include the associated data.
In the embodiments of the present application, associated data can be introduced flexibly according to the data that needs to be processed, which improves the flexibility of acquiring the data to be processed and avoids introducing unnecessary redundant data.
With reference to the first aspect, in some implementations of the first aspect, the first association relationship information includes a second association relationship matrix. The vector in the first dimension of the second association relationship matrix includes multiple elements in one-to-one correspondence with the multiple first vectors, and the vector in the second dimension of the second association relationship matrix includes multiple elements in one-to-one correspondence with the multiple first vectors, where any element of the second association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension.
In the embodiments of the present application, a matrix is used to represent the association relationships among the multiple first vectors, which avoids introducing multiple different types of data structures into the second neural network model and keeps computation simple.
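As a minimal sketch, such an association relationship matrix can be constructed as below; the number of first vectors and the index pairs are purely illustrative:

import numpy as np

num_vectors = 5
relation = np.zeros((num_vectors, num_vectors))  # the association relationship matrix

# Each first vector group contributes a symmetric pair of entries: the
# element at row i, column j indicates whether vectors i and j have an
# association relationship satisfying the a priori hypothesis.
groups = [(0, 2), (1, 4)]                        # hypothetical first vector groups
for i, j in groups:
    relation[i, j] = relation[j, i] = 1.0

Because the one matrix type can carry every pairwise relationship, no additional data structure needs to be introduced into the model.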
With reference to the first aspect, in some implementations of the first aspect, the using of the first neural network model to process the multiple data to be processed includes: using the first neural network model to process the multiple data to be processed together with fifth association relationship information, where the fifth association relationship information is used to indicate at least one data group to be processed, and each data group to be processed includes two data to be processed that satisfy the a priori hypothesis.
In the embodiments of the present application, in order to improve the recognition accuracy of the first neural network model, the data associated with the first data to be processed is also considered while the first data to be processed is recognized. Because the amount of processed data increases, this helps to improve the recognition accuracy of the first neural network model. Moreover, because the associations between data items are taken into account, the first neural network model's learning of data relationships can be enhanced.
With reference to the first aspect, in some implementations of the first aspect, the fifth association relationship information includes a first association relationship matrix. The vector in the first dimension of the first association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be processed, and the vector in the second dimension of the first association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be processed, where any element of the first association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension.
In the embodiments of the present application, a matrix is used to represent the association relationships among the multiple data to be processed, which avoids introducing multiple different types of data structures into the first neural network model and keeps computation simple.
With reference to the first aspect, in some implementations of the first aspect, the weight parameters of the second neural network model are obtained in the following manner: acquiring multiple data to be trained; using the first neural network model to process the multiple data to be trained to obtain multiple fourth vectors in one-to-one correspondence with the multiple data to be trained; acquiring third association relationship information, where the third association relationship information is used to indicate at least one third vector group, and each third vector group includes two fourth vectors that satisfy the a priori hypothesis; and inputting the multiple fourth vectors and the third association relationship information into the second neural network model to obtain a first processing result for first data to be trained, where the first data to be trained is any one of the multiple data to be trained, and the first processing result is used to correct the weight parameters of the second neural network model.
In the embodiments of the present application, the first neural network model is trained using general data, so that a general model that is unaffected or only slightly affected by the scenario can be obtained; the first neural network model can therefore be applied in a variety of scenarios. The multiple feature vectors output by the first neural network model are input into the second neural network model, so that the second neural network model can recognize a relatively special scenario on the basis of the recognition results of the first neural network model; the second neural network model can thereby learn the differences and associations between general scenarios and special scenarios. In order to improve the recognition accuracy of the second neural network model, the data associated with the first data to be trained is also considered while the first data to be trained is recognized. Because the amount of processed data increases, this helps to improve the recognition accuracy of the second neural network model. Moreover, because the associations between data items are taken into account, the second neural network model's learning of data relationships can be enhanced.
With reference to the first aspect, in some implementations of the first aspect, the obtaining of the first processing result for the first data to be trained includes: obtaining the first processing result and a second processing result for second data to be trained, where the label of the first data to be trained is a first label and the label of the second data to be trained is a second label. The method further includes: matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, where the matching result is used to correct the weight parameters of the second neural network model.
In the embodiments of the present application, the similarity between labels makes it possible to judge whether the similarity between two processing results is appropriate, which can strengthen the second neural network model's learning of the association relationships between data items.
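One possible form of such a matching term is sketched below, assuming that labels and processing results are vectors compared by cosine similarity; the squared-difference form is an illustrative choice, not a form prescribed by this application:

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def matching_loss(result_1, result_2, label_1, label_2):
    # Penalize any mismatch between result-result similarity and
    # label-label similarity; the gradient of this term can then be
    # used to correct the weight parameters of the second model.
    return (cosine(result_1, result_2) - cosine(label_1, label_2)) ** 2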
With reference to the first aspect, in some implementations of the first aspect, the third association relationship information is used to indicate M third vector groups, where M is an integer greater than 1. Before the multiple fourth vectors and the third association relationship information are input into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: acquiring fourth association relationship information, where the fourth association relationship information is used to indicate m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer. The inputting of the multiple fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained includes: inputting the multiple fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
In the embodiments of the present application, when the third association relationship information only indicates that an association relationship exists between two fourth vectors, it cannot reflect the strength of that association. The fourth association relationship information can indicate, among the multiple third vector groups, one or more third vector groups whose association is relatively strong or relatively weak. In addition to considering the data associated with the first data to be trained, the second neural network model can thereby strengthen the influence, on the first data to be trained, of the training data closely associated with it, or weaken the influence of the training data only distantly associated with it; a larger amount of data can therefore be drawn upon to recognize the first data to be trained.
With reference to the first aspect, in some implementations of the first aspect, the first processing result is also used to correct the weight parameters of the first neural network model.
In the embodiments of the present application, because the association relationships between data items can be learned during training, using the first processing result to also correct the first neural network model can strengthen the first neural network model's ability to learn the association relationships between data items.
With reference to the first aspect, in some implementations of the first aspect, the multiple data to be trained include one or more target type data, and each target type data has a label used to correct the weight parameters.
In the embodiments of the present application, a semi-supervised learning method may be used to train the second neural network model. That is, some of the multiple data to be trained have labels and the rest may have none. The two parts of the data can be fused according to the third association relationship information. Even if the data to be trained include unlabeled data, the unlabeled data can still be taken into account when correcting the second neural network model. Therefore, the number of labels required for the data to be trained can be reduced, which simplifies the data processing involved in training the second neural network model.
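A minimal sketch of this semi-supervised fusion follows, assuming a cross-entropy-style loss that is simply masked out for unlabeled items; the loss form and all names are hypothetical, and the unlabeled items still influence the results through the association relationship information:

import numpy as np

def semi_supervised_loss(results, labels, has_label):
    # results:   per-item class probabilities output by the second model;
    # labels:    integer class per item (ignored where has_label is False);
    # has_label: boolean mask marking the target type data.
    picked = results[np.arange(len(labels)), labels]
    losses = -np.log(picked + 1e-9)
    # Only labeled items contribute supervised loss, but unlabeled items
    # were already fused into `results` via the association matrix.
    return float((losses * has_label).sum() / max(has_label.sum(), 1))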
With reference to the first aspect, in some implementations of the first aspect, the third association relationship information includes a fourth association relationship matrix. The vector in the first dimension of the fourth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple fourth vectors, and the vector in the second dimension of the fourth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple fourth vectors, where any element of the fourth association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension.
In the embodiments of the present application, a matrix is used to represent the association relationships among the multiple fourth vectors, which avoids introducing multiple different types of data structures into the second neural network model and keeps computation simple.
With reference to the first aspect, in some implementations of the first aspect, the using of the first neural network model to process the multiple data to be trained includes: using the first neural network model to process the multiple data to be trained together with sixth association relationship information, where the sixth association relationship information is used to indicate at least one data group to be trained, and each data group to be trained includes two data to be trained that satisfy the a priori hypothesis.
In the embodiments of the present application, in order to improve the recognition accuracy of the first neural network model, the data associated with the first data to be trained is also considered while the first data to be trained is recognized. Because the amount of processed data increases, this helps to improve the recognition accuracy of the first neural network model. Moreover, because the associations between data items are taken into account, the first neural network model's learning of data relationships can be enhanced.
With reference to the first aspect, in some implementations of the first aspect, the sixth association relationship information includes a third association relationship matrix. The vector in the first dimension of the third association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be trained, and the vector in the second dimension of the third association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be trained, where any element of the third association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension.
In the embodiments of the present application, a matrix is used to represent the association relationships among the multiple data to be trained, which avoids introducing multiple different types of data structures into the first neural network model and keeps computation simple.
In a second aspect, a method for training a neural network model is provided, including: acquiring multiple data to be trained; using a first neural network model to process the multiple data to be trained to obtain multiple fourth vectors in one-to-one correspondence with the multiple data to be trained; acquiring third association relationship information, where the third association relationship information is used to indicate at least one third vector group, and each third vector group includes two fourth vectors that satisfy the a priori hypothesis; and inputting the multiple fourth vectors and the third association relationship information into a second neural network model to obtain a first processing result for first data to be trained, where the first data to be trained is any one of the multiple data to be trained, and the first processing result is used to correct the weight parameters of the second neural network model.
In the embodiments of the present application, the first neural network model can be obtained by training on the training data of scenario 1. Inputting the data to be trained of scenario 2 into the first neural network model yields multiple feature vectors; inputting those feature vectors into the second neural network model then enables the second neural network model to recognize scenario 2 on the basis of the recognition results of the first neural network model. The second neural network model can therefore learn the differences and associations between scenario 1 and scenario 2. In order to improve the recognition accuracy of the second neural network model, the data associated with the first data to be trained is also considered while the first data to be trained is recognized. Because the amount of processed data increases, this helps to improve the recognition accuracy of the second neural network model. Moreover, because the associations between data items are taken into account, the second neural network model's learning of data relationships can be enhanced.
With reference to the second aspect, in some implementations of the second aspect, the obtaining of the first processing result for the first data to be trained includes: obtaining the first processing result and a second processing result for second data to be trained, where the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two of the multiple data to be trained. The method further includes: matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, where the matching result is used to correct the weight parameters of the second neural network model.
With reference to the second aspect, in some implementations of the second aspect, the third association relationship information is used to indicate M third vector groups. Before the multiple fourth vectors and the third association relationship information are input into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: acquiring fourth association relationship information, where the fourth association relationship information is used to indicate m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer. The inputting of the multiple fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained includes: inputting the multiple fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
With reference to the second aspect, in some implementations of the second aspect, the first processing result is also used to correct the weight parameters of the first neural network model.
With reference to the second aspect, in some implementations of the second aspect, the multiple data to be trained include one or more target type data, and each target type data has a label used to correct the weight parameters.
With reference to the second aspect, in some implementations of the second aspect, the third association relationship information includes a fourth association relationship matrix. The vector in the first dimension of the fourth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple fourth vectors, and the vector in the second dimension of the fourth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple fourth vectors, where any element of the fourth association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension.
With reference to the second aspect, in some implementations of the second aspect, the using of the first neural network model to process the multiple data to be trained includes: using the first neural network model to process the multiple data to be trained together with sixth association relationship information, where the sixth association relationship information is used to indicate at least one data group to be trained, and each data group to be trained includes two data to be trained that satisfy the a priori hypothesis.
With reference to the second aspect, in some implementations of the second aspect, the sixth association relationship information includes a third association relationship matrix. The vector in the first dimension of the third association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be trained, and the vector in the second dimension of the third association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be trained, where any element of the third association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension.
With reference to the second aspect, in some implementations of the second aspect, the first neural network model is obtained by training on general data.
In the embodiments of the present application, the first neural network model is trained using general data, so that a general model that is unaffected or only slightly affected by the scenario can be obtained; the first neural network model can therefore be applied in a variety of scenarios. The multiple feature vectors output by the first neural network model are input into the second neural network model, so that the second neural network model can recognize a relatively special scenario on the basis of the recognition results of the first neural network model; the second neural network model can thereby learn the differences and associations between general scenarios and special scenarios.
In a third aspect, a method for training a neural network model is provided, including: acquiring multiple data to be trained; and inputting the multiple data to be trained and seventh association relationship information into a second neural network model to obtain a first processing result for first data to be trained and a second processing result for second data to be trained, where the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two of the multiple data to be trained. The method further includes: matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, where the matching result is used to correct the weight parameters of the second neural network model.
In the embodiments of the present application, the similarity between labels makes it possible to judge whether the similarity between two processing results is appropriate, which can strengthen the second neural network model's learning of the association relationships between data items.
With reference to the third aspect, in some implementations of the third aspect, the method further includes: acquiring the seventh association relationship information, where the seventh association relationship information is used to indicate at least one first training data group, and each first training data group includes two data to be trained that satisfy the a priori hypothesis.
In the embodiments of the present application, in order to improve the recognition accuracy of the second neural network model, the data associated with the first data to be trained is also considered while the first data to be trained is recognized. Because the amount of processed data increases, this helps to improve the recognition accuracy of the second neural network model. Moreover, because the associations between data items are taken into account, the second neural network model's learning of data relationships can be enhanced.
With reference to the third aspect, in some implementations of the third aspect, the seventh association relationship information is used to indicate H first training data groups. Before the multiple data to be trained and the seventh association relationship information are input into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: acquiring eighth association relationship information, where the eighth association relationship information is used to indicate h second training data groups, the h second training data groups belong to the H first training data groups, h is less than H, and h is a positive integer. The inputting of the multiple data to be trained and the seventh association relationship information into the second neural network model to obtain the first processing result for the first data to be trained includes: inputting the multiple data to be trained, the seventh association relationship information, and the eighth association relationship information into the second neural network model to obtain the first processing result.
With reference to the third aspect, in some implementations of the third aspect, the multiple data to be trained include one or more target type data, and each target type data has a label used to correct the weight parameters.
With reference to the third aspect, in some implementations of the third aspect, the seventh association relationship information includes a fifth association relationship matrix. The vector in the first dimension of the fifth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be trained, and the vector in the second dimension of the fifth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple data to be trained, where any element of the fifth association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the data to be trained corresponding to that element in the first dimension and the data to be trained corresponding to that element in the second dimension.
In a fourth aspect, a data processing device is provided, and the device includes modules for executing the method in the first aspect or any possible implementation of the first aspect.
Optionally, the device may be a cloud server or a terminal device.
In a fifth aspect, a device for training a neural network model is provided, and the device includes modules for executing the method in the second aspect or any possible implementation of the second aspect.
Optionally, the device may be a cloud server or a terminal device.
In a sixth aspect, a device for training a neural network model is provided, and the device includes modules for executing the method in the third aspect or any possible implementation of the third aspect.
Optionally, the device may be a cloud server or a terminal device.
In a seventh aspect, a data processing device is provided. The device includes: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any implementation of the first aspect.
Optionally, the device may be a cloud server or a terminal device.
In an eighth aspect, a device for training a neural network model is provided. The device includes: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any implementation of the second aspect.
Optionally, the device may be a cloud server or a terminal device.
In a ninth aspect, a device for training a neural network model is provided. The device includes: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any implementation of the third aspect.
Optionally, the device may be a cloud server or a terminal device.
In a tenth aspect, a computer-readable medium is provided. The computer-readable medium stores program code for execution by a device, and the program code includes instructions for executing the method in any implementation of the first to third aspects.
In an eleventh aspect, a computer program product containing instructions is provided; when the computer program product runs on a computer, the computer is caused to execute the method in any implementation of the first to third aspects.
In a twelfth aspect, a chip is provided. The chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to execute the method in any implementation of the first to third aspects.
Optionally, as an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory; when those instructions are executed, the processor is configured to execute the method in any implementation of the first to third aspects.
Description of the drawings
FIG. 1 is a schematic diagram of a convolutional neural network architecture provided by an embodiment of the present application.
FIG. 2 is a schematic diagram of a graph model provided by an embodiment of the present application.
FIG. 3 is a schematic diagram of a system architecture provided by an embodiment of the present application.
FIG. 4 is a schematic diagram of the hardware structure of a chip provided by an embodiment of the present application.
FIG. 5 is a schematic diagram of a system architecture provided by an embodiment of the present application.
FIG. 6 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
FIG. 7 is a schematic flowchart of a method for training a neural network model provided by an embodiment of the present application.
FIG. 8 is a schematic block diagram of a data processing device provided by an embodiment of the present application.
FIG. 9 is a schematic block diagram of a device for training a neural network model provided by an embodiment of the present application.
Detailed description
The technical solutions in this application are described below with reference to the drawings.
(1) Neural network
A neural network can be composed of neural units. A neural unit can be an arithmetic unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the arithmetic unit can be:

$f\left(\sum_{s=1}^{n} W_s x_s + b\right)$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, used to introduce nonlinear characteristics into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function can serve as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together; that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field can be a region composed of several neural units.
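A direct numerical sketch of this unit follows; the weights, bias, and inputs are arbitrary illustrative values:

import numpy as np

def neural_unit(x, W, b):
    # Output of a single neural unit: f(sum_s W_s * x_s + b),
    # here with a sigmoid activation as f.
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

x = np.array([0.5, -1.0, 2.0])   # inputs x_s
W = np.array([0.2, 0.4, -0.1])   # weights W_s
b = 0.3                          # bias of the neural unit
print(neural_unit(x, W, b))      # a value in (0, 1)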
(2) Deep neural network
A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with many hidden layers; there is no special metric for "many" here. Divided by the positions of the different layers, the neural network layers inside a DNN can be classified into three categories: input layer, hidden layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all the layers in between are hidden layers. The layers are fully connected; that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer. Although a DNN looks complicated, the work of each layer is actually not complicated; in simple terms it is the following linear relationship expression: $\vec{y} = \alpha(W \vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called the coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are many coefficients $W$ and offset vectors $\vec{b}$. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^3_{24}$, where the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4. In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W^L_{jk}$. It should be noted that the input layer has no $W$ parameters. In a deep neural network, more hidden layers make the network better able to portray complex situations in the real world. In theory, a model with more parameters has higher complexity and greater "capacity", which means that it can complete more complex learning tasks. Training a deep neural network is thus the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors $W$ of many layers).
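The layer-by-layer computation can be sketched as follows, assuming two hidden layers of arbitrary size and a sigmoid activation; all shapes and values are illustrative:

import numpy as np

rng = np.random.default_rng(0)

def alpha(z):
    return 1.0 / (1.0 + np.exp(-z))      # the activation function

sizes = [8, 16, 16, 4]                   # input, two hidden layers, output
# W[i] maps layer i to layer i+1; W[i][j, k] is the coefficient from
# neuron k of the previous layer to neuron j of the next layer.
W = [rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(3)]
b = [np.zeros(sizes[i + 1]) for i in range(3)]

y = rng.standard_normal(sizes[0])        # the input vector
for Wi, bi in zip(W, b):
    y = alpha(Wi @ y + bi)               # y = alpha(W x + b), layer by layer
# y is now the output vector of the final layer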
(3) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and sub-sampling layers. The feature extractor can be regarded as a filter, and the convolution process can be regarded as convolving a trainable filter with an input image or convolutional feature map. A convolutional layer is a layer of neurons in the convolutional neural network that performs convolution processing on the input signal. In a convolutional layer of a convolutional neural network, one neuron may be connected to only some of the neurons of the neighboring layers. A convolutional layer usually contains several feature maps, and each feature map may be composed of some rectangularly arranged neural units. Neural units of the same feature map share weights, and the shared weights are the convolution kernel. Sharing weights can be understood as meaning that the way image information is extracted is independent of position. The underlying principle is that the statistics of one part of an image are the same as those of the other parts, which means that image information learned in one part can also be used in another part; the same learned image information can therefore be used at all positions on the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information; generally, the larger the number of convolution kernels, the richer the image information reflected by the convolution operation.
A convolution kernel can be initialized in the form of a matrix of random size, and reasonable weights can be obtained for the convolution kernel through learning during the training of the convolutional neural network. In addition, a direct benefit of sharing weights is reducing the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
As shown in FIG. 1, a convolutional neural network (CNN) 400 may include an input layer 410, a convolutional layer/pooling layer 420 (where the pooling layer is optional), and a neural network layer 430.
Convolutional layer/pooling layer 420:
Convolutional layer:
As shown in FIG. 1, the convolutional layer/pooling layer 420 may include layers 421-426. For example, in one implementation, layer 421 is a convolutional layer, layer 422 is a pooling layer, layer 423 is a convolutional layer, layer 424 is a pooling layer, layer 425 is a convolutional layer, and layer 426 is a pooling layer; in another implementation, layers 421 and 422 are convolutional layers, layer 423 is a pooling layer, layers 424 and 425 are convolutional layers, and layer 426 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
The internal working principle of a convolutional layer is described below, taking the convolutional layer 421 as an example.
The convolutional layer 421 may include many convolution operators. A convolution operator is also called a kernel, and its role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator can essentially be a weight matrix, which is usually predefined. In the process of performing a convolution operation on an image, the weight matrix is usually processed along the horizontal direction of the input image one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride), thereby completing the work of extracting a specific feature from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends through the entire depth of the input image. Therefore, convolving with a single weight matrix produces a convolutional output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same size (rows x columns), that is, multiple matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as being determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features in the image: for example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The multiple weight matrices have the same size (rows x columns), so the feature maps extracted by them also have the same size, and the extracted feature maps of the same size are then combined to form the output of the convolution operation.
In practical applications, the weight values in these weight matrices need to be obtained through a large amount of training. The weight matrices formed by the trained weight values can be used to extract information from the input image, thereby enabling the convolutional neural network 400 to make correct predictions.
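The sliding-window computation described above can be sketched, for a single channel with stride 1 and no padding, as follows; the kernel values are arbitrary:

import numpy as np

def conv2d(image, kernel):
    # Slide one weight matrix (kernel) across the image; each output
    # pixel is the weighted sum of the patch it covers, so the same
    # weights are shared at every position.
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25.0).reshape(5, 5)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # crude vertical-edge extractor
print(conv2d(image, edge_kernel))               # a 3x3 feature map

Stacking the outputs of several such kernels along a new depth axis yields the multi-kernel output described above.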
When the convolutional neural network 400 has multiple convolutional layers, the initial convolutional layer (for example, 421) often extracts more general features, which may also be called low-level features. As the depth of the convolutional neural network 400 increases, the features extracted by the later convolutional layers (for example, 426) become more and more complex, such as high-level semantic features; features with higher-level semantics are more applicable to the problem to be solved.
Pooling layer:
Since it is often necessary to reduce the number of training parameters, a pooling layer often needs to be introduced periodically after a convolutional layer. In the layers 421-426 illustrated by 420 in FIG. 1, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. In image processing, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain an image of smaller size. The average pooling operator can compute the pixel values within a specific range of the image to produce an average value as the result of average pooling. The maximum pooling operator can take the pixel with the largest value within a specific range as the result of maximum pooling. In addition, just as the size of the weight matrix in a convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
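Both pooling operators can be sketched in a few lines; here 2x2 blocks with stride 2, on illustrative values:

import numpy as np

def pool2d(image, size=2, mode="max"):
    # Downsample by taking the max (or mean) of each size-by-size block.
    H, W = image.shape
    blocks = image[:H - H % size, :W - W % size]
    blocks = blocks.reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

image = np.array([[1.0, 2.0, 5.0, 6.0],
                  [3.0, 4.0, 7.0, 8.0],
                  [9.0, 10.0, 13.0, 14.0],
                  [11.0, 12.0, 15.0, 16.0]])
print(pool2d(image, mode="max"))   # [[ 4.  8.] [12. 16.]]
print(pool2d(image, mode="avg"))   # [[ 2.5  6.5] [10.5 14.5]]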
Neural network layer 430:
After the processing of the convolutional layer/pooling layer 420, the convolutional neural network 400 is not yet sufficient to output the required output information, because, as described above, the convolutional layer/pooling layer 420 only extracts features and reduces the parameters brought by the input image. However, in order to generate the final output information (the required class information or other relevant information), the convolutional neural network 400 needs to use the neural network layer 430 to generate an output of one or a set of the required number of classes. Therefore, the neural network layer 430 may include multiple hidden layers (431 and 432 to 43n as shown in FIG. 1) and an output layer 440. The parameters contained in these hidden layers can be obtained by pre-training on the relevant training data of a specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and so on.
After the multiple hidden layers in the neural network layer 430, that is, as the final layer of the entire convolutional neural network 400, comes the output layer 440. The output layer 440 has a loss function similar to categorical cross-entropy, specifically used to calculate the prediction error. Once the forward propagation of the entire convolutional neural network 400 (propagation in the direction from 410 to 440 in FIG. 1) is completed, back propagation (propagation in the direction from 440 to 410 in FIG. 1) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 400 and the error between the result output by the convolutional neural network 400 through the output layer and the ideal result.
It should be noted that the convolutional neural network 400 shown in FIG. 1 is only an example of a convolutional neural network; in specific applications, a convolutional neural network may also exist in the form of other network models.
(4) Recurrent neural networks (RNN) are used to process sequence data. In the traditional neural network model, from the input layer to the hidden layer to the output layer, the layers are fully connected, while the nodes within each layer are unconnected. Although this ordinary neural network has solved many difficult problems, it remains powerless for many others. For example, to predict the next word of a sentence, the preceding words generally need to be used, because the preceding and following words in a sentence are not independent. The reason an RNN is called a recurrent neural network is that the current output of a sequence is also related to the previous outputs. Concretely, the network memorizes the previous information and applies it to the calculation of the current output; that is, the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. The training of an RNN is the same as the training of a traditional CNN or DNN: the error back propagation algorithm is likewise used, but with one difference: if the RNN is unrolled, its parameters, such as W, are shared, which is not the case for the traditional neural network exemplified above. Moreover, when the gradient descent algorithm is used, the output of each step depends not only on the network of the current step, but also on the states of the network in the preceding several steps. This learning algorithm is called back propagation through time (BPTT).
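The recurrence just described can be sketched as follows, with the same weights shared across all time steps; all sizes and values are illustrative:

import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.standard_normal((16, 8))    # input -> hidden, shared by all steps
W_hh = rng.standard_normal((16, 16))   # hidden -> hidden (the recurrence)
W_hy = rng.standard_normal((4, 16))    # hidden -> output

def rnn_forward(inputs):
    h = np.zeros(16)                   # hidden state: the network's "memory"
    outputs = []
    for x in inputs:                   # same W_xh, W_hh, W_hy at every step
        h = np.tanh(W_xh @ x + W_hh @ h)   # current input plus previous state
        outputs.append(W_hy @ h)
    return outputs

sequence = [rng.standard_normal(8) for _ in range(5)]
outputs = rnn_forward(sequence)        # one output per time step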
(5) Loss function
In the process of training a deep neural network, since it is hoped that the output of the deep neural network is as close as possible to the value actually desired to be predicted, the weight vector of each layer of the neural network can be updated by comparing the predicted value of the current network with the actually desired target value, according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value". This is the loss function or objective function, which is an important equation used to measure the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a greater difference, so the training of the deep neural network becomes a process of reducing this loss as much as possible. The loss function is usually a multivariate function, and the gradient reflects the rate of change of the output value of the loss function as a variable changes: the greater the absolute value of the gradient, the greater the rate of change of the output value of the loss function. The gradient of the loss function with respect to the different parameters can therefore be computed, and the parameters are continuously updated along the direction of steepest gradient descent, so as to reduce the output value of the loss function as quickly as possible.
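The following is a minimal sketch of the update process described above, assuming a single weight vector w and a mean-squared-error loss; the toy data and the learning rate are illustrative assumptions, not values from the embodiments.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))       # inputs
t = X @ np.array([1.0, -2.0, 0.5])      # actually desired target values

w = np.zeros(3)                          # initialization before the first update
lr = 0.1                                 # learning rate
for _ in range(200):
    y = X @ w                            # predicted values
    grad = 2 * X.T @ (y - t) / len(X)    # gradient of the MSE loss w.r.t. w
    w -= lr * grad                       # step along the steepest-descent direction
```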
(6) Backpropagation algorithm
A convolutional neural network can use the error back propagation (BP) algorithm to correct the values of the parameters of an initial super-resolution model during the training process, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, forward propagation of the input signal up to the output produces an error loss, and the parameters of the initial super-resolution model are updated by back-propagating the error loss information, so that the error loss converges. The backpropagation algorithm is a backpropagation movement dominated by the error loss, and aims to obtain the optimal parameters of the super-resolution model, such as the weight matrices.
(7) Generative adversarial network
A generative adversarial network (GAN) is a deep learning model. The model includes at least two modules: a generative model and a discriminative model, which learn by playing a game against each other, thereby producing better output. Both the generative model and the discriminative model can be neural networks, specifically deep neural networks or convolutional neural networks. The basic principle of a GAN is as follows, taking a GAN that generates pictures as an example. Assume there are two networks, G (generator) and D (discriminator), where G is a network that generates pictures: it receives random noise z and generates a picture from this noise, denoted G(z). D is a discriminative network used to determine whether a picture is "real". Its input parameter is x, where x represents a picture, and the output D(x) represents the probability that x is a real picture: an output of 1 means the picture is certainly real, and an output of 0 means it cannot be real. In the process of training this generative adversarial network, the goal of the generative network G is to generate pictures as realistic as possible in order to deceive the discriminative network D, while the goal of the discriminative network D is to distinguish the pictures generated by G from real pictures as well as possible. In this way, G and D constitute a dynamic "game" process, namely the "adversarial" part of the "generative adversarial network". In the ideal state, the result of the final game is that G can generate pictures G(z) that pass for real, while D finds it difficult to determine whether the pictures generated by G are real, that is, D(G(z)) = 0.5. In this way, an excellent generative model G is obtained, which can be used to generate pictures.
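The following is a minimal sketch of the adversarial objectives described above, assuming that the discriminator outputs a probability in (0, 1) and using the standard binary cross-entropy formulation; the network internals are omitted.

```python
import numpy as np

def d_loss(d_real, d_fake):
    # D wants D(x) -> 1 for real pictures and D(G(z)) -> 0 for generated ones.
    return -np.mean(np.log(d_real) + np.log(1 - d_fake))

def g_loss(d_fake):
    # G wants D(G(z)) -> 1, i.e. generated pictures judged as real.
    return -np.mean(np.log(d_fake))

# At the ideal equilibrium D(G(z)) = 0.5, D can no longer tell real from fake.
print(d_loss(np.array([0.5]), np.array([0.5])))  # ~1.386 = 2*log(2)
```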
(8) Graph neural network
In computer science, a graph is a data structure composed of two parts: nodes and the edges between nodes. A graph can therefore be expressed by the formula G = (V, E), where G is the graph, V is the set of nodes, and E is the set of edges, as shown in Figure 2. Nodes are sometimes called vertices. The edge between node n1 and node n2 can be expressed as (n1, n2). A graph neural network (GNN) is a neural network that operates directly on the graph data structure. The label of a node n in the node set can be represented by a vector, and the label of an edge (n1, n2) in the edge set can also be represented by a vector. Therefore, the features of the nodes n1 and/or n2 can be obtained from the labels of the nodes n1 and n2 and the label of the edge (n1, n2). A graph neural network can include an input layer, an output layer, and one or more hidden layers.
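The following is a minimal sketch of the graph G = (V, E) of Figure 2 as a data structure, assuming illustrative vector labels for the nodes and edges; only the edges of the dashed-line example are encoded.

```python
# Nodes and edges of the Figure 2 example (nodes 2, 3, 4, 6 neighbor node 1).
V = {1, 2, 3, 4, 5, 6}
E = {(1, 2), (1, 3), (1, 4), (1, 6), (5, 4), (5, 6)}

node_label = {n: [float(n), 0.0] for n in V}   # a vector label per node (illustrative)
edge_label = {e: [1.0] for e in E}             # a vector label per edge (illustrative)

def neighbors(v):
    # Nodes i with an edge between v and i are the neighbor nodes of v.
    return {b for a, b in E if a == v} | {a for a, b in E if b == v}

print(neighbors(1))  # {2, 3, 4, 6}
```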
The purpose of the graph neural network is to train a state embedding function h_v = f(x_v, x_co[v], h_ne[v], x_ne[v]), where h_v is the state of node v, x_v is the feature representation of node v, x_co[v] is the feature representation of the edges associated with node v, h_ne[v] is the state of the other nodes associated with node v, and x_ne[v] is the feature representation of the other nodes associated with node v. Taking node 1 shown in Figure 2 as an example, node 2, node 3, node 4, and node 6 inside the dashed line all have edges connecting them to node 1, so node 2, node 3, node 4, and node 6 are all nodes associated with node 1. If an edge connects node v and node i, then node i is a node associated with node v, and node i can be called a neighbor node of node v.
The output function of the graph neural network model is o_v = g(h_v, x_v), and the neural network is optimized through the loss function:
$$\mathrm{loss} = \sum_{v} \left( t_v - o_v \right)$$

where t_v is the label of node v.
(9) Graph convolutional neural network
A graph convolutional network (GCN) is a method for performing deep learning on graph data, and can be understood as the application of the graph neural network within convolutional neural networks. Graph convolutional neural networks are usually divided into two categories: spectral approaches and non-spectral approaches. Spectral approaches are based on the spectral representation of the graph: through the eigendecomposition of the graph Laplacian operator, a convolution operation is defined in the Fourier domain, and this convolution operation requires intensive matrix computation and non-locally-spatial filtering computation. Non-spectral approaches perform the convolution directly on the graph rather than on the spectrum of the graph. However, a graph convolutional neural network depends on the structure information of the graph, so a model trained on a specific graph structure often cannot be directly used on other graph structures. The graph convolution operator can be:

$$h_i^{(l+1)} = \sigma\left( \sum_{j \in N_i} \frac{1}{c_{ij}} W_{R_j} h_j^{(l)} \right)$$

where h_i^{(l)} represents the feature representation of node i at layer l, c_ij represents a normalization factor related to the graph structure, N_i represents the nodes associated with node i (the nodes associated with node i may include node i itself), and R_j represents the type of node j. By collecting the feature information of each node and applying a nonlinear transformation, the expressive ability of the model is enhanced.
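The following is a minimal NumPy sketch of one layer of this operator, assuming the simple normalization c_ij = |N_i| and a single node type, so that one shared weight matrix W is used; both assumptions are illustrative rather than mandated by the operator above.

```python
import numpy as np

def graph_conv_layer(H, A, W):
    """One graph convolution layer: h_i' = sigma(sum_j (1/c_ij) * W @ h_j).

    H: (n, d) node features at layer l
    A: (n, n) adjacency matrix with self-loops (N_i may include i itself)
    W: (d, d_out) shared weight matrix (single node type assumed)
    """
    deg = A.sum(axis=1, keepdims=True)   # c_ij = |N_i|, a simple normalization choice
    H_agg = (A / deg) @ H                # average the features of associated nodes
    return np.maximum(H_agg @ W, 0.0)    # nonlinear transformation (ReLU assumed)

# Toy usage on the Figure 2 neighborhood structure (0-based node indices).
A = np.eye(6)
for a, b in [(0, 1), (0, 2), (0, 3), (0, 5), (4, 3), (4, 5)]:
    A[a, b] = A[b, a] = 1.0
H = np.random.default_rng(0).standard_normal((6, 4))
W = np.random.default_rng(1).standard_normal((4, 4))
H_next = graph_conv_layer(H, A, W)       # features at layer l+1
```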
(10) Graph attention neural network
A graph attention network (GAT) includes a graph attention core layer, which, through an implicit self-attention layer, distributes attention over the set of neighbor nodes associated with node i: according to the features of the neighbor nodes, different weights are assigned for node i, and a weighted summation is performed over the features of the neighbor nodes. The difference from a graph convolutional neural network is that a graph attention network need not depend on the specific graph structure. Through a multi-layer, multi-head attention mechanism, the graph attention network performs attention distribution over the nodes under the association structure of the graph, so the information that each node obtains from other related nodes can be calculated. The essence of the multi-head attention mechanism is a weighted summation, where the weights derive from the learned attention matrices and the nodes' own information. Therefore, this network differs from the graph convolutional neural network in that the parameters it learns do not depend on the specific graph structure.
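The following is a minimal sketch of the weighted summation over neighbor features described above, assuming a single attention head and a learned attention vector a; the shared linear map and LeakyReLU used in a full GAT layer are omitted for brevity.

```python
import numpy as np

def attention_aggregate(h_i, neighbor_feats, a):
    """Weighted summation over the neighbor features of one node.

    h_i: (d,) feature of node i
    neighbor_feats: (k, d) features of the k neighbor nodes of i
    a: (2*d,) learned attention vector (assumed form)
    """
    scores = np.array([a @ np.concatenate([h_i, h_j]) for h_j in neighbor_feats])
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()          # attention weights over the neighbors
    return alpha @ neighbor_feats        # weighted sum of the neighbor features
```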
Referring to Figure 3, an embodiment of the present application provides a system architecture 100. As shown in the system architecture 100, the data collection device 160 is configured to collect data to be trained. In this embodiment of the application, the data to be trained includes image data, video data, audio data, text data, and the like. The data to be trained is stored in the database 130, and the training device 120 trains the target model/rule 101 based on the data to be trained maintained in the database 130. How the training device 120 obtains the target model/rule 101 based on the data to be trained will be described in more detail in Embodiment 1 below. The target model/rule 101 can be used to implement the method for training a neural network model provided in the embodiments of the present application; that is, the target model/rule 101 may include a first neural network model and a second neural network model: the data to be trained is input into the first neural network model to obtain multiple fourth vectors, the multiple fourth vectors are input into the second neural network model, and the weight parameters of the target model/rule 101 are adjusted through a loss function, whereby the trained target model/rule 101 is obtained. It should be noted that, in practical applications, the data to be trained maintained in the database 130 does not necessarily all come from the collection of the data collection device 160, and may also be received from other devices. It should also be noted that the training device 120 does not necessarily train the target model/rule 101 entirely based on the data to be trained maintained by the database 130; it may also obtain data to be trained from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of the present application.
The target model/rule 101 obtained by training by the training device 120 can be applied to different systems or devices, for example to the execution device 110 shown in Figure 3. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, an AR/VR device, or a vehicle-mounted terminal, or may be a server, a cloud, or the like. In Figure 3, the execution device 110 is provided with an input/output interface 112 for data interaction with external devices. A user can input data to the input/output interface 112 through the client device 140; in this embodiment of the application, the input data may include multiple data to be processed.
The preprocessing module 113 is configured to perform preprocessing according to the input data received by the input/output interface 112 (such as the image data, video data, audio data, or text data; the input data may be the data to be processed in the embodiments of the present application). In this embodiment of the application, the preprocessing module 113 may be used, for example, to extract features of the input data.
When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation or other related processing, the execution device 110 may call data, code, and the like in the data storage system 150 for the corresponding processing, and may also store the data, instructions, and the like obtained by the corresponding processing into the data storage system 150.
Finally, the input/output interface 112 returns the processing result to the client device 140, thereby providing it to the user.
It is worth noting that the training device 120 can generate, for different goals or different tasks, corresponding target models/rules 101 based on different data to be trained, and the corresponding target models/rules 101 can then be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results.
In the case shown in Figure 3, the user can manually specify the input data, and this manual specification can be operated through an interface provided by the input/output interface 112. In another case, the client device 140 can automatically send input data to the input/output interface 112; if the user's authorization is required for the client device 140 to automatically send the input data, the user can set the corresponding permission in the client device 140. The user can view, on the client device 140, the result output by the execution device 110, and the specific presentation form may be display, sound, action, or another specific manner. The client device 140 can also serve as a data collection terminal, collecting the input data fed into the input/output interface 112 and the output result of the input/output interface 112 as shown in the figure as new sample data, and storing them into the database 130. Of course, the collection may also bypass the client device 140: the input/output interface 112 may directly store the input data fed into the input/output interface 112 and the output result of the input/output interface 112, as shown in the figure, into the database 130 as new sample data.
It is worth noting that Figure 3 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in Figure 3, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
As shown in Figure 3, the target model/rule 101 is obtained by training by the training device 120. In this embodiment of the application, the target model/rule 101 may include the first neural network model and the second neural network model of the embodiments of the present application, where the first neural network model may be a convolutional neural network model or a graph neural network model, and the second neural network model may be a graph neural network model.
The following describes a chip hardware structure provided by an embodiment of the present application.
Figure 4 shows a chip hardware structure provided by an embodiment of the application; the chip includes a neural network processor 20.
The neural-network processing unit (NPU) 20 can be mounted as a coprocessor onto a host central processing unit (Host CPU), and the Host CPU allocates tasks. The core part of the NPU is the arithmetic circuit 203; the controller 204 controls the arithmetic circuit 203 to fetch data from memory (the weight memory or the input memory) and perform operations.
In some implementations, the arithmetic circuit 203 internally includes multiple processing engines (PE). In some implementations, the arithmetic circuit 203 is a two-dimensional systolic array. The arithmetic circuit 203 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 203 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 202 and caches it on each PE in the arithmetic circuit. The arithmetic circuit fetches the data of matrix A from the input memory 201, performs a matrix operation with matrix B, and stores the partial or final result of the resulting matrix in the accumulator 208.
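The following is a minimal sketch of the computation C = A·B with explicit accumulation of partial results, mirroring the role of the accumulator 208; the matrix sizes and the rank-1 decomposition are illustrative assumptions, not a description of the actual circuit.

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)       # input matrix A
B = np.arange(12.0).reshape(3, 4)      # weight matrix B

C = np.zeros((2, 4))                   # the accumulator holds partial results
for k in range(A.shape[1]):            # one rank-1 partial result per step
    C += np.outer(A[:, k], B[k, :])

assert np.allclose(C, A @ B)           # the accumulated result equals A.B
```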
The vector calculation unit 207 can further process the output of the arithmetic circuit 203, performing, for example, vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. For example, the vector calculation unit 207 can be used for network calculations in the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, and local response normalization.
In some implementations, the vector calculation unit 207 can store the processed output vector into the unified buffer 206. For example, the vector calculation unit 207 can apply a nonlinear function to the output of the arithmetic circuit 203, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 207 generates normalized values, merged values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 203, for example for use in a subsequent layer of the neural network.
Some or all of the steps of the methods provided in the present application may be executed by the arithmetic circuit 203 or the vector calculation unit 207.
The unified memory 206 is used to store input data and output data.
The direct memory access controller (DMAC) 205 transfers the input data in the external memory into the input memory 201 and/or the unified memory 206, stores the weight data in the external memory into the weight memory 202, and stores the data in the unified memory 206 into the external memory.
The bus interface unit (BIU) 210 is used to implement interaction among the host CPU, the DMAC, and the instruction fetch buffer 209 through the bus.
The instruction fetch buffer 209 connected to the controller 204 is used to store instructions used by the controller 204.
The controller 204 is used to call the instructions cached in the instruction fetch buffer 209 to control the working process of the operation accelerator.
Generally, the unified memory 206, the input memory 201, the weight memory 202, and the instruction fetch buffer 209 are all on-chip memories, while the external memory is a memory external and private to the NPU. The external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
As shown in Figure 5, an embodiment of the present application provides a system architecture 300. The system architecture includes a local device 301, a local device 302, an execution device 310, and a data storage system 350, where the local device 301 and the local device 302 are connected to the execution device 310 through a communication network.
The execution device 310 may be implemented by one or more servers. Optionally, the execution device 310 may be used in cooperation with other computing devices, such as data storage, routers, and load balancers. The execution device 310 may be arranged at one physical site or distributed across multiple physical sites. The execution device 310 may use the data in the data storage system 350, or call the program code in the data storage system 350, to implement the method for searching the neural network structure of the embodiments of the present application.
Specifically, the execution device 310 can be set up as an image recognition neural network, and the image recognition neural network can be used for image recognition, image processing, and the like.
Users can operate their respective user devices (for example, the local device 301 and the local device 302) to interact with the execution device 310. Each local device can represent any computing device, such as a personal computer, a computer workstation, a smartphone, a tablet computer, a smart camera, a smart car, another type of cellular phone, a media consumption device, a wearable device, a set-top box, or a game console.
The local device of each user can interact with the execution device 310 through a communication network of any communication mechanism/communication standard. The communication network may be a wide area network, a local area network, a point-to-point connection, or the like, or any combination thereof.
The above execution device 310 may also be referred to as a cloud device, in which case the execution device 310 is generally deployed in the cloud.
As mentioned above, a neural network model depends on the data to be trained: for the data to be trained, the output result of the neural network model is close to the features of the data to be trained and has high accuracy; but when the trained neural network model is applied in actual use, the recognition result it outputs is far from the features of the input data and has low accuracy. In order to reduce the degree of dependence of the neural network model on the data to be trained, the present application provides a data processing method, so that the trained neural network model can achieve high-accuracy recognition when applied to a specific scenario.
Figure 6 is a schematic flowchart of a data processing method provided by an embodiment of the present application. The method 500 may be executed by the execution device 110 shown in Figure 3, by the neural network processor 20 shown in Figure 4, or by the execution device 310 shown in Figure 5.
501. Obtain multiple data to be processed.
The data to be processed can be understood as data that is about to be input into a neural network model and processed by that neural network model. The data to be processed may be text data, image data, video data, audio data, and the like, for example a text file, a passage of text in a text file, a picture file, an image block in a picture file, a frame in a video file, a video file, a video segment in a video file, an audio file, or an audio segment in an audio file. The multiple data to be processed may be multiple text files, multiple passages of text in one text file, multiple picture files, multiple image blocks in one picture file, multiple frames in one video file, multiple video files, multiple video segments in one video file, multiple audio files, multiple audio segments in one audio file, and so on. The present application does not limit the type of the data to be processed.
The data to be processed can be obtained in multiple ways. In one example, the multiple data to be processed are stored in a database, so the device executing the method 500 can directly retrieve them from the database. In another example, the device executing the method 500 is provided with a camera, so the multiple data to be processed can be obtained by shooting with the camera. In another example, the multiple data to be processed are stored on a cloud device, so the device executing the method 500 can receive, through a communication network, the multiple data to be processed sent by the cloud device.
502. Use a first neural network model to process the multiple data to be processed to obtain multiple first vectors in one-to-one correspondence with the multiple data to be processed, where the first neural network model is obtained based on general data training.
That is, the multiple data to be processed are input into the first neural network model, and the first neural network model performs processing operations on them, such as feature screening (screening out the useful features) and feature fusion (merging multiple features), and outputs multiple first vectors in one-to-one correspondence with the multiple data to be processed. Taking the convolutional neural network shown in Figure 1 as an example, processing the multiple data to be processed may consist of inputting them at the input layer, performing data processing through hidden layers such as convolutional layers and/or pooling layers, and outputting, from the output layer of the first neural network model, multiple first vectors in one-to-one correspondence with the multiple data to be processed. A first vector may be a single number or a vector containing multiple numbers.
The type of the first neural network model may be a convolutional neural network model, a graph neural network model, a graph convolutional neural network model, a graph attention neural network model, or the like. The present application does not limit the type of the first neural network model.
In particular, the first neural network model may be a traditional convolutional neural network model. The output layer of a traditional convolutional neural network is a fully connected layer, which is sometimes called a classifier. That is, a traditional convolutional neural network model can directly output the recognition result of the data to be processed. For example, if the data to be processed is an image, a traditional convolutional neural network model can directly output recognition results such as whether a person is present in the image and whether the person is male or female. Such a recognition result can often only represent the probability that the data to be processed possesses a certain feature.
In particular, the first neural network model may also be a special convolutional neural network model that does not include a fully connected layer and can output the calculation result of a convolutional layer or a pooling layer. That is, the first neural network model can output a processing result that would be an intermediate calculation result in a traditional convolutional neural network model. For ease of description, the processing result output by this special convolutional neural network model is called an intermediate calculation result. Generally, the intermediate calculation result can be used to represent part or all of the information of the data to be processed.
In particular, the first neural network model may be a graph neural network model.
Optionally, using the first neural network model to process the multiple data to be processed includes: using the first neural network model to process the multiple data to be processed and fifth association relationship information, where the fifth association relationship information is used to indicate at least one data group to be processed, and each data group to be processed includes two data to be processed that satisfy the a priori hypothesis.
A data group to be processed contains two data to be processed that have an association relationship; that is, an association relationship satisfying the a priori hypothesis exists between the two data to be processed in the data group. For example, if the data group to be processed is (data to be processed 1, data to be processed 2), then an association relationship satisfying the a priori hypothesis exists between data to be processed 1 and data to be processed 2. In other words, the multiple data to be processed, together with the fifth association relationship information reflecting the association relationships among them, are input into the first neural network model. According to the fifth association relationship information, the first neural network model can determine whether one piece of data influences another, and can reflect the degree of influence between data through the weight parameters within the first neural network model, thereby obtaining multiple first vectors that can reflect the relevance of the data, the multiple first vectors being in one-to-one correspondence with the multiple data to be processed.
A hypothesis refers to an explanation of a certain phenomenon made according to a presupposition, that is, a conjecture and explanation, proposed on the basis of known scientific facts and scientific principles, of the natural phenomenon under study and its regularity; after detailed classification, induction, and analysis of the data, a provisional but acceptable explanation is obtained.
Prior probability appears in Bayesian statistical inference and refers to the prior probability distribution of a random variable (usually referred to simply as the prior), that is, a probability distribution expressing one's belief about the variable before certain evidence is taken into account.
An a priori hypothesis refers to a prior probability distribution proposed over all hypotheses in the hypothesis space. Taking text data as an example, the multiple data to be processed may be multiple passages of text, where one passage may include multiple sentences. Normally, different passages express different topics; therefore, multiple sentences within one passage are strongly related, while sentences belonging to different passages are weakly related or unrelated. There may then be an a priori hypothesis such as: multiple sentences belonging to the same passage are associated with one another.
Taking picture data as an example, the multiple data to be processed may be multiple frames. Normally, the longer the time interval between two frames, the smaller the correlation between them; the shorter the time interval between two frames, the greater the correlation between them. There may then be an a priori hypothesis such as: two frames whose time interval is less than a preset threshold are associated. The preset threshold may be, for example, 8s.
Taking video data as an example, the multiple data to be processed may be multiple video segments, where the longer the interval between two video segments, the smaller the correlation between them, and the shorter the interval, the greater the correlation. There may then be an a priori hypothesis such as: two video segments whose minimum interval is less than a preset threshold are associated. The preset threshold may be, for example, 8s.
Taking audio data as an example, the multiple data to be processed may be multiple audio segments, where the longer the interval between two audio segments, the smaller the correlation between them, and the shorter the interval, the greater the correlation. There may then be an a priori hypothesis such as: two audio segments whose minimum interval is less than a preset threshold are associated. The preset threshold may be, for example, 8s.
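The following is a minimal sketch of such an interval-based a priori hypothesis, assuming that each piece of data carries a timestamp in seconds and using the 8s threshold from the examples above; the timestamps are illustrative.

```python
THRESHOLD_S = 8.0  # the preset threshold from the examples above

def associated_pairs(timestamps):
    """Return the index pairs (i, j) whose time interval is below the threshold."""
    pairs = []
    for i in range(len(timestamps)):
        for j in range(i + 1, len(timestamps)):
            if abs(timestamps[i] - timestamps[j]) < THRESHOLD_S:
                pairs.append((i, j))
    return pairs

print(associated_pairs([0.0, 3.0, 12.0, 14.5]))  # [(0, 1), (2, 3)]
```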
The fifth association relationship information may be a matrix. Compared with other information types, matrix operations are more convenient.
Optionally, the fifth association relationship information includes a first association relationship matrix. The vectors in the first dimension of the first association relationship matrix include multiple elements in one-to-one correspondence with the multiple data to be processed, and the vectors in the second dimension of the first association relationship matrix include multiple elements in one-to-one correspondence with the multiple data to be processed, where any element in the first association relationship matrix is used to indicate whether an association relationship satisfying the a priori hypothesis exists between the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension.
Assume the first association relationship matrix is P:

$$P = \begin{pmatrix} p_{1,1} & p_{2,1} & \cdots & p_{k,1} \\ p_{1,2} & p_{2,2} & \cdots & p_{k,2} \\ \vdots & \vdots & \ddots & \vdots \\ p_{1,k} & p_{2,k} & \cdots & p_{k,k} \end{pmatrix}$$
Here, P is a k×k matrix, where the i-th column corresponds to data to be processed i and the j-th row corresponds to data to be processed j; the element p_i,j in the i-th column and j-th row indicates whether an association relationship satisfying the a priori hypothesis exists between data to be processed i and data to be processed j. When an association relationship exists between data to be processed i and data to be processed j, the element p_i,j in the i-th column and j-th row may take the value 1, and when no association relationship exists between them, p_i,j may take the value 0. Alternatively, when an association relationship exists between data to be processed i and data to be processed j, p_i,j may take the value 0, and when no association relationship exists, p_i,j may take the value 1.
In one example, the matrix P^T obtained by transposing the matrix P is the same as the matrix P; that is, p_i,j = p_j,i. In this case, the association relationship between data to be processed i and data to be processed j may be non-directional.
In another example, the matrix P^T obtained by transposing the matrix P is different from the matrix P; that is, p_i,j ≠ p_j,i. In this case, the association relationship between data to be processed i and data to be processed j is directional. For example, p_i,j may indicate that an association relationship pointing from data to be processed i to data to be processed j exists between them, and p_j,i may indicate that an association relationship pointing from data to be processed j to data to be processed i exists between them. Alternatively, p_i,j may indicate an association relationship pointing from data to be processed j to data to be processed i, and p_j,i may indicate an association relationship pointing from data to be processed i to data to be processed j.
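The following is a minimal sketch of constructing a non-directional first association relationship matrix P from the data groups indicated by the fifth association relationship information, with 1 indicating an association; the pair list and the size k are illustrative.

```python
import numpy as np

k = 4
pairs = [(0, 1), (2, 3)]        # data groups satisfying the a priori hypothesis

P = np.zeros((k, k), dtype=int)
for i, j in pairs:
    P[i, j] = P[j, i] = 1       # non-directional: P equals its transpose

assert (P == P.T).all()         # p_i,j == p_j,i in this example
```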
There are at least two cases for the degree of association among the multiple data to be processed.
In one example, the multiple data to be processed consist of data to be processed 1 and several data to be processed associated with data to be processed 1, such as node 1, node 2, node 3, node 4, and node 6 shown in Figure 2, where an edge connects node 1 and node 2, an edge connects node 1 and node 3, an edge connects node 1 and node 4, and an edge connects node 1 and node 6.
In another example, the multiple data to be processed include data to be processed 1, several data to be processed associated with data to be processed 1, and several data to be processed not associated with data to be processed 1, such as node 1, node 4, node 5, and node 6 shown in Figure 2, where edges connect node 1 with node 4 and node 1 with node 6, edges connect node 5 with node 4 and node 5 with node 6, and no edge connects node 1 and node 5.
For the above two cases, there can be different ways of obtaining the multiple data to be processed.
In one example, multiple data to be processed are obtained, and according to the a priori hypothesis, it is determined whether an association relationship exists between any two of the multiple data to be processed.
In another example, one data to be processed is obtained, and according to the a priori hypothesis, the other data to be processed that have an association relationship with it are determined.
Optionally, obtaining the multiple data to be processed includes: obtaining target data, where the target data is one of the multiple data to be processed; and obtaining associated data that satisfies the a priori hypothesis with respect to the target data, where the multiple data to be processed include the associated data.
That is, the device executing the method 500 first obtains the target data, and then, according to the a priori hypothesis, introduces the associated data related to the target data.
Taking text data as an example, the target data may be sentence 1. When the a priori hypothesis is that multiple sentences belonging to the same passage are associated, the other sentences in the passage containing sentence 1, apart from sentence 1 itself, are introduced as the associated data.
Taking picture data as an example, the target data may be frame 1 in a video. When the a priori hypothesis is that two frames whose time interval is less than 8s are associated, the frames whose interval from frame 1 is less than 8s are taken as the associated data.
Taking video data as an example, the target data may be video segment 1. When the a priori hypothesis is that two video segments whose minimum interval is less than 8s are associated, the video segments whose minimum interval from video segment 1 is less than 8s are taken as the associated data.
Taking audio data as an example, the target data may be audio segment 1. When the a priori hypothesis is that two audio segments whose minimum interval is less than 8s are associated, the audio segments whose minimum interval from audio segment 1 is less than 8s are taken as the associated data.
In the above examples, a time interval of 8s is used as an example for obtaining the associated data. Those skilled in the art can understand that the above time interval can be adjusted according to different scenarios.
In addition, in order to reduce the dependence of the neural network model on the data to be trained, the first neural network model can be trained using general data. The so-called general data may be data that is not affected by the scenario, or data with low dependence on the scenario. For example, if the first neural network model is used to recognize character features in images, its training data set can include various possible scenarios, such as street scenes, conference scenes, in-vehicle scenes, rural scenes, Asian scenes, African scenes, and European and American scenes. The multiple data to be processed may be data applied in a specific scenario. That is, the first neural network model, which is capable of processing general data, can be used to process special data.
The process of training the first neural network model may be as follows: general data is input into the first neural network model, and the first neural network model performs data processing operations on the general data, such as feature screening and feature fusion, to obtain a feature vector. A matrix operation is performed on the feature vector and a weight matrix containing the weight parameters to obtain the data training result corresponding to the general data. The distance between the data training result and the label of the general data is then calculated, so as to correct the weight parameters of the first neural network model. The distance between the data training result and the label of the general data can be understood as the degree of similarity between the two. The specific calculation method of the information distance can be cross entropy, KL divergence, JS divergence, or the like.
Exemplarily, in order to obtain a large amount of picture training data, data is often collected in the form of video during the data collection process, and the training data can be labeled, so as to obtain the labeled data required by the training process. The specific labeling process and the interpretation of labels are common technical content in the field of deep learning and will not be repeated in the embodiments of the present application.
When the data training result is the recognition result of the general data, the distance between the data training result and the label of the general data can be obtained according to the recognition result. For example, the recognition result of general data 1 is: the confidence that general data 1 possesses feature 1 is 0.7, and the confidence that general data 1 possesses feature 2 is 0.3. The label of general data 1 is label 1, and label 1 corresponds to feature 1. Then, the recognition result of general data 1 can be represented by (0.7, 0.3), and the label of general data 1 can be represented by (1, 0). The distance between the data training result and the label of the general data may be the distance between the vector (0.7, 0.3) and the vector (1, 0).
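The following is a minimal sketch of computing the distance between the recognition result (0.7, 0.3) and the label (1, 0) using cross entropy, one of the distance measures named above.

```python
import numpy as np

pred = np.array([0.7, 0.3])   # recognition result of general data 1
label = np.array([1.0, 0.0])  # one-hot label; label 1 corresponds to feature 1

cross_entropy = -np.sum(label * np.log(pred))
print(cross_entropy)          # -log(0.7) ~= 0.357; a smaller value means closer to the label
```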
When the data training result is an intermediate calculation result, the label of the general data may be a vector with the same dimension as the intermediate calculation result, and the distance between the data training result and the label of the general data can be obtained through vector calculation.
503. Obtain first association relationship information, where the first association relationship information is used to indicate at least one first vector group, and each first vector group includes two first vectors that satisfy the a priori hypothesis.
That is, the first association relationship information reflects whether association relationships exist among the multiple first vectors. A first vector group contains two first vectors that have an association relationship; that is, an association relationship satisfying the a priori hypothesis exists between the two first vectors in the group. For example, if a first vector group indicates (first vector 1, first vector 2), then an association relationship satisfying the a priori hypothesis exists between first vector 1 and first vector 2. The first association relationship information reflects whether the multiple first vectors influence one another, so that a data processing result reflecting the relevance of the data can be obtained according to the first association relationship information. It should be understood that a first vector may have an association relationship with itself.
In one example, since the multiple first vectors are in one-to-one correspondence with the multiple data to be processed, the first association relationship information may be determined according to the association relationships among the multiple data to be processed. That is, the first association relationship information is the same as or substantially the same as the fifth association relationship information above.
In another example, the first association relationship information differs from the fifth association relationship information above. For example, whether an association relationship exists between any two of the multiple first vectors can be determined according to the similarity between the two: the greater the similarity, the greater the association; the smaller the similarity, the smaller the association. The a priori hypothesis corresponding to the first association relationship information may then be: when the similarity exceeds a preset value, it can be considered that an association relationship exists between the two first vectors; when the similarity does not exceed the preset value, it can be considered that no association relationship exists between them.
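The following is a minimal sketch of such a similarity-based a priori hypothesis, assuming cosine similarity as the measure and 0.8 as the preset value; both choices are illustrative, not fixed by the text.

```python
import numpy as np

PRESET = 0.8  # assumed preset value

def associated(v1, v2):
    """Whether two first vectors are associated under the similarity hypothesis."""
    sim = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return sim > PRESET

print(associated(np.array([1.0, 0.1]), np.array([0.9, 0.2])))  # True
print(associated(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # False
```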
The first association relationship information can be reflected through a graph model. As shown in Figure 2, node 1, node 2, and node 3 may correspond to first vector 1, first vector 2, and first vector 3, respectively. An edge connects node 1 and node 2, so an association relationship exists between first vector 1 and first vector 2; an edge connects node 2 and node 3, so an association relationship exists between first vector 2 and first vector 3; no edge connects node 1 and node 3, so no association relationship exists between first vector 1 and first vector 3.
Optionally, the first association relationship information includes a second association relationship matrix. The vector in the first dimension of the second association relationship matrix includes multiple elements in one-to-one correspondence with the multiple first vectors, and the vector in the second dimension of the second association relationship matrix includes multiple elements in one-to-one correspondence with the multiple first vectors, where any element of the second association relationship matrix is used to indicate whether the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
Assume the second association relationship matrix is Q:

$$Q = \begin{pmatrix} q_{1,1} & \cdots & q_{1,l} \\ \vdots & \ddots & \vdots \\ q_{l,1} & \cdots & q_{l,l} \end{pmatrix}$$
Here Q is an l×l matrix; the i-th column corresponds to first vector i, the j-th row corresponds to first vector j, and the element q_{i,j} in the i-th column and j-th row indicates whether first vector i and first vector j have an association relationship that satisfies the a priori hypothesis. When first vector i and first vector j are associated, q_{i,j} may take the value 1, and when they are not associated, q_{i,j} may take the value 0. Alternatively, q_{i,j} may take the value 0 when first vector i and first vector j are associated, and the value 1 when they are not.
In one example, the matrix Q^T obtained by transposing Q is identical to Q, that is, q_{i,j} = q_{j,i}. In this case the association relationship between first vector i and first vector j may be non-directional.
In another example, the matrix Q^T obtained by transposing Q differs from Q, that is, q_{i,j} ≠ q_{j,i}. In this case the association relationship between first vector i and first vector j is directional. For example, q_{i,j} may indicate an association relationship pointing from first vector i to first vector j, while q_{j,i} indicates an association relationship pointing from first vector j to first vector i; or, conversely, q_{i,j} may indicate an association relationship pointing from first vector j to first vector i, while q_{j,i} indicates an association relationship pointing from first vector i to first vector j.
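As a minimal sketch of assembling such a matrix Q from the indicated first vector groups, adopting the first convention above (1 for an associated pair, 0 otherwise) and marking each vector as associated with itself; the function name and the chosen 0/1 convention are illustrative.

```python
import numpy as np

def build_association_matrix(num_vectors, vector_groups, directed=False):
    """Build the l x l second association relationship matrix Q from the
    first vector groups indicated by the association information."""
    Q = np.zeros((num_vectors, num_vectors), dtype=np.int8)
    np.fill_diagonal(Q, 1)          # a vector may be associated with itself
    for i, j in vector_groups:      # each group holds two associated vectors
        Q[i, j] = 1
        if not directed:            # non-directional case: Q equals its transpose
            Q[j, i] = 1
    return Q
```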
To avoid the computational difficulty caused by an excessively large matrix, the second association relationship matrix can be compressed to obtain a matrix of smaller dimensions.
In one example, suppose the second association relationship matrix Q is an l×l matrix and every element of Q that is more than l' elements away from the diagonal of Q takes the value 0, or every such element takes the value 1, with l' < l. Then Q can be divided into several small matrices, each with at most l' rows and at most l' columns. This process may also be called sparsifying the second association relationship matrix Q.
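A rough sketch of this sparsification under the all-zero variant of the assumption (entries more than l' positions off the diagonal are zero): the banded matrix is kept as small blocks of at most l' rows and l' columns that together cover the band. The block layout is one plausible reading, since the text does not fix it.

```python
import numpy as np

def sparsify_banded(Q, band):
    """Split an l x l matrix whose entries vanish more than `band` positions
    off the diagonal into blocks of at most band x band that jointly cover
    every possibly non-zero entry."""
    l = Q.shape[0]
    n_blocks = (l + band - 1) // band
    blocks = {}
    for bi in range(n_blocks):
        for bj in range(n_blocks):
            if abs(bi - bj) <= 1:      # only blocks touching the diagonal band
                i0, j0 = bi * band, bj * band
                blocks[(bi, bj)] = Q[i0:i0 + band, j0:j0 + band]
    return blocks
```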
In another example, if the second association relationship matrix Q cannot be sparsified, it can be compressed according to a spectral clustering method.
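The text does not detail this spectral clustering compression, so the following is only one plausible reading: cluster the nodes using Q as a precomputed affinity, then record, per pair of clusters, whether any association links them. The use of scikit-learn's SpectralClustering and the cluster count are assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def compress_by_spectral_clustering(Q, n_clusters):
    """Compress an l x l association matrix to n_clusters x n_clusters by
    grouping nodes and recording whether any association links two groups."""
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(Q)
    compressed = np.zeros((n_clusters, n_clusters), dtype=np.int8)
    for a in range(n_clusters):
        for b in range(n_clusters):
            compressed[a, b] = int(Q[np.ix_(labels == a, labels == b)].any())
    return compressed
```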
It should be understood that the a priori hypothesis may indicate a forward association relationship or a reverse association relationship. For example, because the content of picture frames is usually more related the shorter the interval between them, an a priori hypothesis indicating that picture frames within 8s of each other are associated can be understood as indicating a forward association relationship, while an a priori hypothesis indicating that picture frames more than 8s apart are associated can be understood as indicating a reverse association relationship.
504. Input the multiple first vectors and the first association relationship information into the second neural network model to obtain a processing result for first to-be-processed data, where the first to-be-processed data is any one of the multiple pieces of to-be-processed data.
In other words, the output of the first neural network model and the association relationships within that output are input into the second neural network model. Inputting the multiple first vectors into the second neural network model can be understood as inputting the feature representations of the multiple pieces of to-be-processed data; inputting the first association relationship information can be understood as inputting, for any two first vectors, information about whether one influences the other. The multiple first vectors can be understood as the nodes of a graph model, and the first association relationship information can be used to indicate whether an edge exists between two nodes. The second neural network model may therefore be a graph neural network model.
When the second neural network model processes the multiple first vectors and the first association relationship information, it may determine, according to its weight parameters, whether any two first vectors influence each other and to what degree, thereby obtaining the processing result for the first to-be-processed data. The processing result may be a feature representation of the first to-be-processed data or a recognition result for it, and it may be a vector.
Assume the multiple first vectors are l first vectors, denoted x_1, …, x_l, where each x_i, 1 ≤ i ≤ l, is an s-dimensional vector with components x_{i,t}, 1 ≤ t ≤ s:

$$x_i = (x_{i,1}, \ldots, x_{i,t}, \ldots, x_{i,s})$$

Combining the multiple first vectors then yields the matrix X = {x_1, …, x_i, …, x_l}. Assume the first association relationship information is the second association relationship matrix Q mentioned above.
First assume h weight matrices to be trained, W_1, W_2, …, W_h, each of dimension s × s_h, meaning that each of W_1, W_2, …, W_h contains s·s_h weight parameters. Here s_h = s/h, where h denotes the number of heads of the graph attention neural network (the number of heads may also be called the number of slices), and s_h is usually called the single-head dimension.
Then compute U_1 = X·W_1, U_2 = X·W_2, …, U_h = X·W_h. Clearly, the dimensions of U_1, U_2, …, U_h are all l × s_h.
Next compute V_{i,j} = U_i·U_j^T, with i ≠ j, 1 ≤ i ≤ h, and 1 ≤ j ≤ h. The dimension of V_{i,j} is l × l. Then apply the Softmax function to each row of V_{i,j} to obtain normalized probabilities, yielding R_{i,j}. R_{i,j} is still an l × l matrix, and it can be understood as the matrix of mutual attention strengths between the points.
After that, multiply R_{i,j} and Q element-wise to obtain E_{i,j}, the result of masking with the Q relation. E_{i,j} can be understood as screening out the associated points according to the edge relationships: the attention between associated points is retained, while the attention of unrelated points is not retained. This matrix contains a large amount of information about how the nodes relate to one another, so its information content is relatively rich. Then E_{i,j}·U_i gives the final expression U_{i,new} of each point after it has been updated with the information of the other points. The dimension of U_{i,new} is l × s_h.
Finally, U_{1,new}, …, U_{i,new}, …, U_{h,new} are concatenated to obtain the matrix X' = {U_{1,new}, …, U_{i,new}, …, U_{h,new}}, whose dimension is l × s. It can be seen that X' contains both the information about how the nodes relate to one another and the weight parameters.
The above is the data processing procedure of one network layer. If the depth of the graph attention neural network model is h', that is, it comprises h' layers, then the X' output by the current layer can be input into the next layer; in other words, the X' output by the current layer is treated as the X of the next layer, which performs the same or a similar data processing procedure.
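The per-layer computation just described can be sketched in a few lines of Python. This is only a sketch under stated assumptions: it uses the common per-head form V = U_i·U_i^T (each head attending within itself) rather than the cross-head pairing V_{i,j} = U_i·U_j^T with i ≠ j written above, and the function names and random initialization are illustrative.

```python
import numpy as np

def softmax(rows):
    e = np.exp(rows - rows.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def graph_attention_layer(X, Q, weights):
    """One layer of the masked multi-head graph attention described above.
    X: (l, s) first vectors; Q: (l, l) 0/1 association mask;
    weights: list of h matrices, each (s, s_h) with s_h = s // h."""
    heads = []
    for W in weights:
        U = X @ W                  # (l, s_h) per-head projection
        V = U @ U.T                # (l, l) raw attention scores
        R = softmax(V)             # row-wise normalized attention strengths
        E = R * Q                  # keep attention only on associated points
        heads.append(E @ U)        # (l, s_h) updated expression U_new
    return np.concatenate(heads, axis=1)   # (l, s): the matrix X'

# Usage: l = 4 nodes, s = 8 features, h = 2 heads; stack layers by
# feeding the returned X' back in as the next layer's X.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Q = np.eye(4, dtype=int); Q[0, 1] = Q[1, 0] = 1
Ws = [rng.normal(size=(8, 4)) for _ in range(2)]
X_new = graph_attention_layer(X, Q, Ws)
```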
Compared with X, X' has the same matrix size, but every element of X' contains the information of one or more elements of X. By integrating associated data, the second neural network model can draw on a larger amount of information when recognizing a given feature, improving recognition accuracy. A matrix operation between the matrix X' and a weight parameter matrix then yields the processing result for the first to-be-processed data.
In one example, the multiple pieces of to-be-processed data include the first to-be-processed data, which may be the target data mentioned above, and also include one or more pieces of associated data related to the first to-be-processed data. According to the first association relationship information, the second neural network model can take into account the influence of the associated data on the first to-be-processed data and thereby obtain the processing result corresponding to the first to-be-processed data. In other words, besides extracting features from the first to-be-processed data, the second neural network model also extracts features from the other to-be-processed data associated with it, which expands the amount of data fed into the prediction process and helps improve recognition accuracy.
In another example, the multiple pieces of to-be-processed data include the first to-be-processed data, which may correspond to a target vector; the multiple first vectors further include one or more associated vectors related to the target vector, and the multiple pieces of to-be-processed data include the pieces of to-be-processed data in one-to-one correspondence with those associated vectors. According to the first association relationship information, the second neural network model can take into account the influence of the associated vectors on the target vector and thereby obtain the processing result corresponding to the first to-be-processed data. In other words, besides extracting features from the target vector, the second neural network model also extracts features from the associated vectors related to the target vector, which expands the amount of data processed during prediction and helps improve recognition accuracy.
In addition, the second neural network model may output multiple processing results in one-to-one correspondence with the multiple pieces of to-be-processed data. That is, the second neural network model synthesizes the multiple first vectors and the association relationships among them, and outputs multiple processing results in one-to-one correspondence with the multiple pieces of to-be-processed data.
Consider a scenario in which a first association relationship exists between first vector A and first vector B, and a second association relationship exists between first vector A and first vector C. The closeness of these two association relationships may be the same or different. For example, two sentences far apart within the same paragraph are less closely associated, while two sentences close together within the same paragraph are more closely associated. Likewise, two frames separated by a long interval are less closely associated, while two frames separated by a short interval are more closely associated. There are several ways to express how close two association relationships are.
In one example, the first association relationship information is a matrix, and the numerical value of each element expresses the closeness of the association relationship: the larger the value, the closer the association. However, determining the specific value often introduces superfluous manual settings or increases the difficulty of training the neural network model.
In another example, when the first association relationship information contains both closely associated and distantly associated first vector groups, second association relationship information can be established to indicate the closely associated first vector groups. In other words, the degree of influence between two closely associated first vectors can be reinforced through the second association relationship information.
Optionally, the first association relationship information is used to indicate N of the first vector groups, N being an integer greater than 1. Before the multiple first vectors and the first association relationship information are input into the second neural network model to obtain the processing result for the first to-be-processed data, the method further includes: obtaining second association relationship information, where the second association relationship information is used to indicate n second vector groups, the n second vector groups belong to the N first vector groups, n is less than N, and n is a positive integer. Inputting the multiple first vectors and the first association relationship information into the second neural network model to obtain the processing result for the first to-be-processed data then includes: inputting the multiple first vectors, the first association relationship information, and the second association relationship information into the second neural network model to obtain the processing result for the first to-be-processed data.
The information indicated by the second association relationship information is contained in the first association relationship information. That is, the two first vectors in each second vector group necessarily have an association relationship that satisfies the a priori hypothesis.
Assuming the first association relationship information is the same as, or substantially the same as, the fifth association relationship information above, the first association relationship information can reflect the association relationships among the multiple pieces of to-be-processed data, and the second association relationship information can reflect whether a close association relationship exists among them.
Taking text data as an example, when the a priori hypothesis is that multiple sentences belonging to the same paragraph are associated, the first association relationship information may indicate that different sentences within the same paragraph are associated, and the second association relationship information may indicate that adjacent sentences within the same paragraph are closely associated.
Taking picture data as an example, when the a priori hypothesis is that two frames less than 8s apart are associated, the first association relationship information may indicate that two frames less than 8s apart are associated, and the second association relationship information may indicate that two frames less than 2s apart are closely associated.
Taking video data as an example, when the a priori hypothesis is that two video segments whose minimum interval is less than 8s are associated, the first association relationship information may indicate that two video segments whose minimum interval is less than 8s are associated, and the second association relationship information may indicate that two video segments whose minimum interval is less than 2s are closely associated.
Taking audio data as an example, when the a priori hypothesis is that two audio segments whose minimum interval is less than 8s are associated, the first association relationship information may indicate that two audio segments whose minimum interval is less than 8s are associated, and the second association relationship information may indicate that two audio segments whose minimum interval is less than 2s are closely associated.
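A minimal sketch of these interval-based hypotheses, assuming each frame or segment carries a timestamp in seconds; the 8s and 2s thresholds follow the examples above, and the function name and pair representation are illustrative.

```python
def interval_association(timestamps, loose=8.0, tight=2.0):
    """Build the first association info (pairs closer than `loose` seconds)
    and the second association info (pairs closer than `tight` seconds)."""
    first_groups, second_groups = [], []
    n = len(timestamps)
    for i in range(n):
        for j in range(i + 1, n):
            gap = abs(timestamps[i] - timestamps[j])
            if gap < loose:
                first_groups.append((i, j))   # associated
            if gap < tight:
                second_groups.append((i, j))  # closely associated
    return first_groups, second_groups
```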
Assuming the first association relationship information differs from the fifth association relationship information above, the first association relationship information can reflect the similarity between the multiple first vectors, and the second association relationship information can reflect which pairs of first vectors have a relatively high similarity.
For example, when the a priori hypothesis is that the similarity between two first vectors exceeds a preset value, the first association relationship information may indicate that two first vectors whose similarity exceeds preset value 1 are associated, and the second association relationship information may indicate that two first vectors whose similarity exceeds preset value 2 are associated, where preset value 2 is greater than preset value 1.
It should be understood that, like the first association relationship information, the second association relationship information may contain a matrix representing the n second vector groups.
It should be understood that the first neural network model and the second neural network model may be two sub-models within a single neural network model.
The method of training the second neural network model and obtaining its weight parameters is described in detail below with reference to Figure 7. Method 600 may be performed by the training device 120 shown in Figure 3.
601. Obtain multiple pieces of data to be trained.
Data to be trained can be understood as data that is about to be input into a neural network model and used to train it. Some or all of the multiple pieces of data to be trained carry labels. Processing the data to be trained with the neural network model yields a data processing result, and by computing the distance between the label and that result, the weight parameters of the neural network model can be corrected. The distance between the data processing result and the label can be understood as the degree of similarity between them. The information distance can be computed by, for example, cross entropy, KL divergence, or JS divergence.
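As a minimal sketch of these information distances, treating the data processing result and the label as probability distributions; the NumPy implementation and the epsilon guard are illustrative assumptions.

```python
import numpy as np

def cross_entropy(label, output, eps=1e-12):
    """Cross entropy between the label distribution and the model output."""
    return -np.sum(label * np.log(output + eps))

def kl_divergence(label, output, eps=1e-12):
    """KL divergence from the model output to the label distribution."""
    return np.sum(label * np.log((label + eps) / (output + eps)))
```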
The data to be trained can be text data, image data, video data, audio data, and so on, for example a text file, a passage of text within a text file, a picture file, an image block within a picture file, a frame within a video file, a video file, a video segment within a video file, an audio file, or an audio segment within an audio file. The multiple pieces of data to be trained can be multiple text files, multiple passages of text within one text file, multiple picture files, multiple image blocks within one picture file, multiple frames within one video file, multiple video files, multiple video segments within one video file, multiple audio files, multiple audio segments within one audio file, and so on. This application does not limit the type of the data to be trained.
The data to be trained can be obtained in several ways. In one example, the multiple pieces of data to be trained are stored in a database, so the device executing method 600 can retrieve them directly from the database. In another example, the device executing method 600 is equipped with a camera, so the multiple pieces of data to be trained can be obtained by shooting with the camera. In another example, the multiple pieces of data to be trained are stored on a cloud device, so the device executing method 600 can receive them from the cloud device over a communication network.
602. Use the first neural network model to process the multiple pieces of data to be trained, obtaining multiple fourth vectors in one-to-one correspondence with the multiple pieces of data to be trained.
The multiple pieces of data to be trained here may be general data.
Input data to be trained 1 into the first neural network model to obtain fourth vector 1; input data to be trained 2 into the first neural network model to obtain fourth vector 2.
The third association relationship information is used to indicate association relationships between the data. Assume the third vector group indicated by the third association relationship information includes (fourth vector 1, fourth vector 2); then an association relationship exists between fourth vector 1 and fourth vector 2.
Inputting fourth vector 1 and the third association relationship information into the second neural network model yields first processing result 1. In this way, at least the influence and contribution of data to be trained 2 to data to be trained 1 can be obtained.
That is, the multiple pieces of data to be trained are input into the first neural network model, which performs processing operations on them such as feature screening (filtering out the useful features) and feature fusion (merging multiple features), and outputs multiple fourth vectors in one-to-one correspondence with the multiple pieces of data to be trained. Taking the convolutional neural network shown in Figure 1 as an example, processing the multiple pieces of data to be trained may consist of inputting them at the input layer, performing data processing through hidden layers such as convolutional and/or pooling layers, and outputting, from the output layer of the first neural network model, the multiple fourth vectors in one-to-one correspondence with the multiple pieces of data to be trained. A fourth vector may be a single number or a vector containing multiple numbers.
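As a minimal sketch of such a first model, assuming PyTorch and image inputs, the following encoder maps each input to an s-dimensional fourth vector; the layer widths, the class name, and the default s = 8 are illustrative choices, not values specified by this application.

```python
import torch
import torch.nn as nn

class FirstModel(nn.Module):
    """Convolutional feature extractor: image -> s-dimensional fourth vector."""
    def __init__(self, s=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),      # pool to a single spatial position
        )
        self.proj = nn.Linear(32, s)      # weight matrix producing the vector

    def forward(self, x):                 # x: (batch, 3, H, W)
        return self.proj(self.features(x).flatten(1))   # (batch, s)
```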
In one example, the first neural network model is a neural network model to be trained. The first neural network model can perform data processing operations on the multiple pieces of data to be trained, such as feature screening and feature fusion, to obtain feature vectors, then perform a matrix operation between the feature vectors and a weight matrix containing weight parameters to obtain the multiple fourth vectors in one-to-one correspondence with the multiple pieces of data to be trained. The multiple fourth vectors are used to correct the weight parameters of the first neural network model: for example, the distance between a fourth vector and the labels of the multiple pieces of data to be trained can be computed and combined with a loss function to correct the weight parameters of the first neural network model.
In another example, the first neural network model is a neural network model that has already been trained.
To reduce the neural network model's dependence on the data to be trained, the first neural network model can be trained with general data. General data means data unaffected by the scene, or data with low dependence on the scene. For example, if the first neural network model is used to recognize person features in images, its training data set can include all kinds of scenes that may occur, such as street scenes, conference scenes, in-vehicle scenes, rural scenes, Asian scenes, African scenes, and European and American scenes. The multiple pieces of data to be trained may then be data applied within a specific scene. In other words, the first neural network model capable of processing general data is migrated to a particular scene, and through neural network model training a second neural network model capable of processing that particular scene is obtained.
The process of training the first neural network model may be as follows: general data is input into the first neural network model; the first neural network model performs data processing operations on the general data such as feature screening and feature fusion to obtain feature vectors; a matrix operation between the feature vectors and a weight matrix containing weight parameters yields the data training result corresponding to the general data; the distance between the data training result and the label of the general data is then computed, and the weight parameters of the first neural network model are corrected. The distance between the data training result and the label of the general data can be understood as the degree of similarity between them. The information distance can be computed by, for example, cross entropy, KL divergence, or JS divergence.
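A minimal sketch of one weight-correction step in this training process, assuming PyTorch and reusing the hypothetical FirstModel sketched earlier in this section with an added classification head; the optimizer, the learning rate, and the two-class head are illustrative assumptions.

```python
import torch
import torch.nn as nn

# FirstModel is the hypothetical encoder sketched above; a linear head
# turns its s-dimensional feature vector into class scores.
model = nn.Sequential(FirstModel(s=8), nn.Linear(8, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()          # cross entropy as the information distance

def train_step(images, labels):
    """One weight correction: forward pass, distance to labels, backprop.
    images: (batch, 3, H, W) float tensor; labels: (batch,) class indices."""
    optimizer.zero_grad()
    outputs = model(images)              # data training results
    loss = loss_fn(outputs, labels)      # distance between results and labels
    loss.backward()                      # gradients of the loss
    optimizer.step()                     # correct the weight parameters
    return loss.item()
```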
When the data training result is a recognition result for the general data, the distance between the data training result and the label of the general data can be obtained from that recognition result. For example, the recognition result for general data 1 is: the confidence that general data 1 has feature 1 is 0.7, and the confidence that it has feature 2 is 0.3. The label of general data 1 is label 1, which corresponds to feature 1. The recognition result of general data 1 can then be represented by (0.7, 0.3), and its label by (1, 0). The distance between the data training result and the label of the general data may be the distance between the vector (0.7, 0.3) and the vector (1, 0).
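To make this concrete, a quick computation of two possible distances between the recognition result (0.7, 0.3) and the label (1, 0); which distance is actually used is a design choice, and the two below are only examples.

```python
import numpy as np

output = np.array([0.7, 0.3])    # recognition result of general data 1
label = np.array([1.0, 0.0])     # label 1, corresponding to feature 1

euclidean = np.linalg.norm(output - label)                 # about 0.424
cross_entropy = -np.sum(label * np.log(output + 1e-12))    # -log(0.7), about 0.357
```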
When the data training result is an intermediate calculation result, the label of the general data may be a vector with the same dimensions as that intermediate calculation result, and the distance between the data training result and the label of the general data can be obtained by vector computation.
The type of the first neural network model may be a convolutional neural network model, a graph neural network model, a graph convolutional neural network model, a graph attention neural network model, and so on. This application does not limit the type of the first neural network model.
In particular, the first neural network model may be a traditional convolutional neural network model. The output layer of a traditional convolutional neural network is a fully connected layer, which is sometimes called a classifier. That is, a traditional convolutional neural network model can feed the recognition result of the data to be trained into the loss function through the fully connected layer. For example, if the data to be trained is an image, the fully connected layer of a traditional convolutional neural network model can directly output recognition results such as whether a person is present in the image and whether the person is male or female. Such a recognition result can often only represent the probability that the data to be trained has a certain feature.
In particular, the first neural network model may also be a special convolutional neural network model that does not include a fully connected layer and can feed the calculation result of a convolutional layer or a pooling layer into the loss function. In other words, the first neural network model can feed into the loss function a processing result that, in a traditional convolutional neural network model, would be an intermediate calculation result. For brevity of description, the processing result that this special convolutional neural network model feeds into the loss function is called the intermediate calculation result. Usually, the intermediate calculation result can characterize some or all of the information of the data to be trained; that is, it usually contains more information than the recognition result.
In particular, the first neural network model may be a graph neural network model.
Optionally, using the first neural network model to process the multiple pieces of data to be trained includes: using the first neural network model to process the multiple pieces of data to be trained together with sixth association relationship information, where the sixth association relationship information is used to indicate at least one to-be-trained data group, each of which includes two pieces of data to be trained that satisfy the a priori hypothesis.
A to-be-trained data group contains two pieces of data to be trained that are associated with each other; that is, the two pieces of data to be trained in the group have an association relationship that satisfies the a priori hypothesis. For example, if a to-be-trained data group is (data to be trained 1, data to be trained 2), then data to be trained 1 and data to be trained 2 have an association relationship that satisfies the a priori hypothesis. In other words, the multiple pieces of data to be trained and the sixth association relationship information reflecting the association relationships among them are input into the first neural network model; according to the sixth association relationship information, the first neural network model can determine whether one piece of data influences another, and reflect the degree of influence between pieces of data through its weight parameters, thereby obtaining multiple first vectors that reflect the associations among the data and are in one-to-one correspondence with the multiple pieces of data to be trained.
Taking text data as an example, the multiple pieces of data to be trained may be multiple passages of text, each of which may include multiple sentences. Usually, different passages express different topics; therefore, the sentences within one passage are strongly related, while sentences belonging to different passages are weakly related or unrelated. An a priori hypothesis may then exist, for example, that multiple sentences belonging to the same paragraph are associated.
Taking picture data as an example, the multiple pieces of data to be trained may be multiple frames. Usually, as time passes, the longer the interval between two frames, the weaker the relation between them; the shorter the interval, the stronger the relation. An a priori hypothesis may then exist, for example, that two frames whose interval is less than a preset threshold are associated. The preset threshold may be, for example, 8s.
Taking video data as an example, the multiple pieces of data to be trained may be multiple video segments, where, as time passes, the longer the interval between two segments, the weaker the relation between them; the shorter the interval, the stronger the relation. An a priori hypothesis may then exist, for example, that two video segments whose minimum interval is less than a preset threshold are associated. The preset threshold may be, for example, 8s.
Taking audio data as an example, the multiple pieces of data to be trained may be multiple audio segments, where, as time passes, the longer the interval between two segments, the weaker the relation between them; the shorter the interval, the stronger the relation. An a priori hypothesis may then exist, for example, that two audio segments whose minimum interval is less than a preset threshold are associated. The preset threshold may be, for example, 8s.
The sixth association relationship information may be a matrix. Compared with other information types, matrix operations are more convenient.
Optionally, the sixth association relationship information includes a third association relationship matrix. The vector in the first dimension of the third association relationship matrix includes multiple elements in one-to-one correspondence with the multiple pieces of data to be trained, and the vector in the second dimension of the third association relationship matrix includes multiple elements in one-to-one correspondence with the multiple pieces of data to be trained, where any element of the third association relationship matrix is used to indicate whether the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
Assume the third association relationship matrix is A:

$$A = \begin{pmatrix} a_{1,1} & \cdots & a_{1,k} \\ \vdots & \ddots & \vdots \\ a_{k,1} & \cdots & a_{k,k} \end{pmatrix}$$
Here A is a k×k matrix; the i-th column corresponds to data to be trained i, the j-th row corresponds to data to be trained j, and the element a_{i,j} in the i-th column and j-th row indicates whether data to be trained i and data to be trained j have an association relationship that satisfies the a priori hypothesis. When data to be trained i and data to be trained j are associated, a_{i,j} may take the value 1, and when they are not associated, a_{i,j} may take the value 0. Alternatively, a_{i,j} may take the value 0 when they are associated, and the value 1 when they are not.
In one example, the matrix A^T obtained by transposing A is identical to A, that is, a_{i,j} = a_{j,i}. In this case the association relationship between data to be trained i and data to be trained j may be non-directional.
In another example, the matrix A^T obtained by transposing A differs from A, that is, a_{i,j} ≠ a_{j,i}. In this case the association relationship between data to be trained i and data to be trained j is directional. For example, a_{i,j} may indicate an association relationship pointing from data to be trained i to data to be trained j, while a_{j,i} indicates an association relationship pointing from data to be trained j to data to be trained i; or, conversely, a_{i,j} may indicate an association relationship pointing from data to be trained j to data to be trained i, while a_{j,i} indicates an association relationship pointing from data to be trained i to data to be trained j.
603. Obtain third association relationship information, where the third association relationship information is used to indicate at least one third vector group, each of which includes two fourth vectors that satisfy the a priori hypothesis.
In other words, the third association relationship information reflects whether an association relationship exists among the multiple fourth vectors. Each third vector group contains two fourth vectors that are associated with each other; that is, the two fourth vectors in the group have an association relationship that satisfies the a priori hypothesis. For example, if a third vector group indicates (fourth vector 1, fourth vector 2), then fourth vector 1 and fourth vector 2 have an association relationship that satisfies the a priori hypothesis. Because the third association relationship information reflects whether the multiple fourth vectors influence one another, a data processing result that reflects the associations among the data can be obtained according to the third association relationship information. It should be understood that a fourth vector may have an association relationship with itself.
In one example, because the multiple fourth vectors are in one-to-one correspondence with the multiple pieces of data to be trained, the third association relationship information may be determined according to the association relationships among the multiple pieces of data to be trained. In that case, the third association relationship information is the same as, or substantially the same as, the sixth association relationship information described above.
In another example, the third association relationship information differs from the sixth association relationship information described above. For example, whether an association relationship exists between any two of the multiple fourth vectors may be determined according to the similarity between those two fourth vectors: the greater the similarity, the stronger the association; the smaller the similarity, the weaker the association. The a priori hypothesis corresponding to the third association relationship information may then be that the two fourth vectors are considered associated when their similarity exceeds a preset value, and not associated when the similarity does not exceed the preset value.
The third association relationship information can be represented by a graph model. As shown in Figure 2, node 1, node 2, and node 3 may correspond to fourth vector 1, fourth vector 2, and fourth vector 3, respectively. An edge connects node 1 and node 2, so an association relationship exists between fourth vector 1 and fourth vector 2; an edge connects node 2 and node 3, so an association relationship exists between fourth vector 2 and fourth vector 3; no edge connects node 1 and node 3, so no association relationship exists between fourth vector 1 and fourth vector 3.
Optionally, the third association relationship information includes a fourth association relationship matrix. The vector in the first dimension of the fourth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple fourth vectors, and the vector in the second dimension of the fourth association relationship matrix includes multiple elements in one-to-one correspondence with the multiple fourth vectors, where any element of the fourth association relationship matrix is used to indicate whether the vector corresponding to that element in the first dimension and the vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
Assume the fourth association relationship matrix is B:

$$B = \begin{pmatrix} b_{1,1} & \cdots & b_{1,l} \\ \vdots & \ddots & \vdots \\ b_{l,1} & \cdots & b_{l,l} \end{pmatrix}$$
Here B is an l×l matrix; the i-th column corresponds to fourth vector i, the j-th row corresponds to fourth vector j, and the element b_{i,j} in the i-th column and j-th row indicates whether fourth vector i and fourth vector j have an association relationship that satisfies the a priori hypothesis. When fourth vector i and fourth vector j are associated, b_{i,j} may take the value 1, and when they are not associated, b_{i,j} may take the value 0. Alternatively, b_{i,j} may take the value 0 when fourth vector i and fourth vector j are associated, and the value 1 when they are not.
In one example, the matrix B^T obtained by transposing B is identical to B, that is, b_{i,j} = b_{j,i}. In this case the association relationship between fourth vector i and fourth vector j may be non-directional.
In another example, the matrix B^T obtained by transposing B differs from B, that is, b_{i,j} ≠ b_{j,i}. In this case the association relationship between fourth vector i and fourth vector j is directional. For example, b_{i,j} may indicate an association relationship pointing from fourth vector i to fourth vector j, while b_{j,i} indicates an association relationship pointing from fourth vector j to fourth vector i; or, conversely, b_{i,j} may indicate an association relationship pointing from fourth vector j to fourth vector i, while b_{j,i} indicates an association relationship pointing from fourth vector i to fourth vector j.
To avoid the computational difficulty caused by an excessively large matrix, the fourth association relationship matrix can be compressed to obtain a matrix of smaller dimensions.
In one example, suppose the fourth association relationship matrix B is an l×l matrix and every element of B that is more than l' elements away from the diagonal of B takes the value 0, or every such element takes the value 1, with l' < l. Then B can be divided into several small matrices, each with at most l' rows and at most l' columns. This process may also be called sparsifying the fourth association relationship matrix B.
In another example, if the fourth association relationship matrix B cannot be sparsified, it can be compressed according to a spectral clustering method.
It should be understood that the a priori hypothesis may indicate a forward association relationship or a reverse association relationship. For example, because the content of picture frames is usually more related the shorter the interval between them, an a priori hypothesis indicating that picture frames within 8s of each other are associated can be understood as indicating a forward association relationship, while an a priori hypothesis indicating that picture frames more than 8s apart are associated can be understood as indicating a reverse association relationship.
604. Input the multiple fourth vectors and the third association relationship information into the second neural network model to obtain a first processing result for first data to be trained, where the first data to be trained is any one of the multiple pieces of data to be trained, and the first processing result is used to correct the weight parameters of the second neural network model.
In other words, the output of the first neural network model and the association relationships within that output are input into the second neural network model. Inputting the multiple fourth vectors into the second neural network model can be understood as inputting the feature representations of the multiple pieces of data to be trained; inputting the third association relationship information can be understood as inputting, for any two fourth vectors, information about whether one influences the other. The multiple fourth vectors can be understood as the nodes of a graph model, and the third association relationship information can be used to indicate whether an edge exists between two nodes. The second neural network model may therefore be a graph neural network model.
When the second neural network model processes the multiple fourth vectors and the third association relationship information, it may determine, according to its weight parameters, whether any two fourth vectors influence each other and to what degree, thereby obtaining the processing result for the first data to be trained. The processing result may be a feature representation of the first data to be trained or a recognition result for it, and it may be a vector.
Assume the multiple fourth vectors are l fourth vectors, denoted y_1, …, y_l, where each y_i, 1 ≤ i ≤ l, is an s-dimensional vector with components y_{i,t}, 1 ≤ t ≤ s:

$$y_i = (y_{i,1}, \ldots, y_{i,t}, \ldots, y_{i,s})$$

Combining the multiple fourth vectors then yields the matrix Y = {y_1, …, y_i, …, y_l}. Assume the third association relationship information is the fourth association relationship matrix B mentioned above.
First, assume h weight matrices to be trained, W_1, W_2, …, W_h. The dimensions of W_1, W_2, …, W_h are all s×s_h, meaning that each of W_1, W_2, …, W_h contains s×s_h weight parameters. Here s_h = s/h, where h denotes the number of heads of the graph attention neural network (the number of heads may also be called the number of slices), and s_h is usually called the single-head dimension.
Then compute U_1 = Y·W_1, U_2 = Y·W_2, …, U_h = Y·W_h. Clearly, the dimensions of U_1, U_2, …, U_h are all l×s_h.
Next compute V_{i,j} = U_i·U_j^T, where i ≠ j, 1 ≤ i ≤ h, and 1 ≤ j ≤ h. The dimension of V_{i,j} is then l×l. Applying the Softmax function to each row of V_{i,j} to obtain normalized probabilities yields R_{i,j}. R_{i,j} is still an l×l matrix, and can be understood as the matrix of mutual attention strengths between the points.
After that, R_{i,j} and Q are multiplied element-wise, giving E_{i,j}, i.e., R_{i,j} masked by the relationships in Q. E_{i,j} can be understood as selecting the associated points according to the edge relationships: the attention between associated points is kept, while the attention between unassociated points is not. This matrix contains a large amount of inter-node association information and is therefore information-rich. Then E_{i,j}·U_i gives the final expression U_inew of each point after it has been updated with information from the other points. The dimension of U_inew is l×s_h.
Finally, U_1new, …, U_inew, …, U_hnew are concatenated to obtain the matrix Y', Y' = {U_1new, …, U_inew, …, U_hnew}, whose dimension is l×s. It can be seen that Y' contains both the information about the associations between nodes and the weight parameters.
The above process is the data processing of one network layer. If the depth of the graph attention neural network model is h', i.e., it includes h' layers, the Y' output by the current layer can be input into the next layer — that is, the Y' output by the current layer is treated as the Y of the next layer — and the same or a similar data processing process is performed.
It can be seen that Y' has the same matrix size as Y, but each element of Y' contains information from one or more elements of Y. By integrating associated data, the second neural network model can obtain more information when recognizing a feature, improving recognition accuracy.
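The per-layer computation just described can be summarized in the following Python sketch. This is a minimal reading of the text, not a reference implementation: the description leaves the pairing of heads (i, j) open, so head i is paired with head j = (i+1) mod h here as an assumption, and the trainable matrices are random stand-ins:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(Y, Q, W):
    """One layer as described above: Y is l x s, Q is the l x l
    association mask, W is a list of h matrices, each s x (s // h)."""
    h = len(W)
    U = [Y @ W_i for W_i in W]           # U_i = Y . W_i, each l x s_h
    outs = []
    for i in range(h):
        j = (i + 1) % h                  # assumed pairing of heads i and j
        V = U[i] @ U[j].T                # V_{i,j}, raw attention, l x l
        R = softmax(V, axis=1)           # row-wise normalized probabilities
        E = R * Q                        # mask: keep associated pairs only
        outs.append(E @ U[i])            # U_inew, l x s_h
    return np.concatenate(outs, axis=1)  # Y' = {U_1new, ..., U_hnew}, l x s

rng = np.random.default_rng(0)
l, s, h = 6, 8, 2
Y = rng.normal(size=(l, s))
Q = (rng.random((l, l)) < 0.5).astype(float)
W = [rng.normal(size=(s, s // h)) for _ in range(h)]
Y_next = gat_layer(Y, Q, W)   # for an h'-layer model, feed Y_next back in
print(Y_next.shape)           # (6, 8): same size as Y
```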
In one example, the plurality of data to be trained includes first data to be trained and one or more pieces of associated data associated with that first data. According to the third association relationship information, the second neural network model can take into account the influence of the associated data on the first data to be trained, so as to obtain the processing result corresponding to the first data to be trained. In other words, besides extracting features from the first data to be trained, the second neural network model also extracts features from the other data to be trained that are associated with it, which enlarges the amount of data fed into the prediction process and helps improve recognition accuracy.
In one example, the plurality of data to be trained includes first data to be trained, which may correspond to a target vector; the plurality of fourth vectors further includes one or more associated vectors associated with the target vector, and the plurality of data to be trained includes the data to be trained in one-to-one correspondence with those associated vectors. According to the third association relationship information, the second neural network model can take into account the influence of the associated vectors on the target vector, so as to obtain the processing result corresponding to the first data to be trained. In other words, besides extracting features from the target vector, the second neural network model also extracts features from the associated vectors that are related to it, which enlarges the amount of data processed during prediction and helps improve recognition accuracy.
In addition, the second neural network model may output multiple processing results in one-to-one correspondence with the plurality of data to be trained. That is, the second neural network model synthesizes the plurality of fourth vectors and the association relationships among them, and outputs multiple processing results, one for each data to be trained.
Consider a scenario in which there is a first association relationship between fourth vector A and fourth vector B, and a second association relationship between fourth vector A and fourth vector C. The closeness of these two association relationships may be the same or different. For example, two sentences far apart within the same paragraph are less closely associated, while two sentences close together within the same paragraph are more closely associated. Similarly, two picture frames separated by a long interval are less closely associated, while two frames separated by a short interval are more closely associated. The relative closeness of two association relationships can be expressed in multiple ways.
In one example, the third association relationship information is a matrix, and the magnitudes of the elements in the matrix indicate the closeness of the association: the larger the value, the closer the association. However, determining specific values often introduces extra manual settings, or makes the neural network model harder to train.
In one example, when the third association relationship information contains both closely associated and loosely associated fourth vector groups, fourth association relationship information can be established to indicate the closely associated fourth vector groups. In other words, the degree of mutual influence between two closely associated fourth vectors can be reinforced by the fourth association relationship information.
Optionally, the third association relationship information is used to indicate M fourth vector groups, M being an integer greater than 1. Before the inputting of the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained, the method further includes: obtaining fourth association relationship information, where the fourth association relationship information is used to indicate m fifth vector groups, the m fifth vector groups belong to the M fourth vector groups, m is less than M, and m is a positive integer. The inputting of the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained then includes: inputting the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
The information indicated by the fourth association relationship information is contained in the third association relationship information. That is, between the two fourth vectors in each fourth vector group there must be an association relationship satisfying the a priori hypothesis.
Assuming the third association relationship information is the same as, or substantially the same as, the sixth association relationship information mentioned above, the third association relationship information can reflect the association relationships among the plurality of data to be trained, and the fourth association relationship information can reflect whether close association relationships exist among the data to be trained.
Taking text data as an example, when the a priori hypothesis is that sentences belonging to the same paragraph are associated, the third association relationship information may indicate that different sentences within the same paragraph are associated, and the fourth association relationship information may indicate that adjacent sentences within the same paragraph are closely associated.
Taking picture data as an example, when the a priori hypothesis is that two picture frames less than 8 s apart are associated, the third association relationship information may indicate that two frames less than 8 s apart are associated, and the fourth association relationship information may indicate that two frames less than 2 s apart are closely associated.
Taking video data as an example, when the a priori hypothesis is that two video segments whose minimum interval is less than 8 s are associated, the third association relationship information may indicate that two video segments whose minimum interval is less than 8 s are associated, and the fourth association relationship information may indicate that two video segments whose minimum interval is less than 2 s are closely associated.
Taking audio data as an example, when the a priori hypothesis is that two audio segments whose minimum interval is less than 8 s are associated, the third association relationship information may indicate that two audio segments whose minimum interval is less than 8 s are associated, and the fourth association relationship information may indicate that two audio segments whose minimum interval is less than 2 s are closely associated.
Assuming the third association relationship information differs from the sixth association relationship information mentioned above, the third association relationship information can reflect the similarity among the plurality of fourth vectors, and the fourth association relationship information can reflect pairs of fourth vectors with higher similarity.
For example, when the a priori hypothesis is that the similarity between two fourth vectors exceeds a preset value, the third association relationship information may indicate an association between two fourth vectors whose similarity exceeds preset value 1, and the fourth association relationship information may indicate an association between two fourth vectors whose similarity exceeds preset value 2, where preset value 2 is greater than preset value 1.
It should be understood that, similarly to the third association relationship information, the fourth association relationship information may include a matrix representing the m fifth vector groups.
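A minimal sketch of this two-level scheme, assuming frame timestamps and the 8 s / 2 s thresholds of the picture example above (the names are ours): by construction, the pairs marked by the fine matrix are a subset of those marked by the coarse one, matching the requirement that the m fifth vector groups belong to the M fourth vector groups.

```python
import numpy as np

def two_level_association(timestamps, coarse=8.0, fine=2.0):
    t = np.asarray(timestamps, dtype=float)
    gap = np.abs(t[:, None] - t[None, :])
    third = (gap < coarse).astype(np.int8)   # third info: the M groups
    fourth = (gap < fine).astype(np.int8)    # fourth info: the m groups
    assert np.all(fourth <= third)           # m groups lie within the M groups
    return third, fourth

third, fourth = two_level_association([0.0, 1.5, 5.0, 20.0])
print(third)   # pairs within 8 s are associated
print(fourth)  # only pairs within 2 s are closely associated
```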
After the first processing result for the first data to be trained is obtained, the weight parameters of the second neural network model can be corrected through a loss function.
In one example, a loss function can be used to correct the weight parameters of the second neural network model according to the distance between the label of the first data to be trained and the first processing result. For example, when the label of the first data to be trained and the first processing result are close (i.e., highly similar), the weight parameters are relatively suitable and the correction to the weight parameters is small; when the label and the first processing result are far apart (i.e., the similarity is low), the weight parameters are less suitable, and the correction to the weight parameters can be increased.
In one example, the plurality of fourth vectors and the third association relationship information are input into the second neural network model to obtain a first processing result for first data to be trained and a second processing result for second data to be trained, the first data to be trained and the second data to be trained being any two of the plurality of data to be trained; the similarity between the first processing result and the second processing result is used to correct the weight parameters of the second neural network model. For example, let the similarity between the first and second processing results be similarity 1, let the fourth vector corresponding to the first processing result be fourth vector 1 and the fourth vector corresponding to the second processing result be fourth vector 2, and let the similarity between fourth vector 1 and fourth vector 2 be similarity 2. When the difference between similarity 1 and similarity 2 is small, the weight parameters are relatively suitable and the correction to the weight parameters is small; when the difference between similarity 1 and similarity 2 is large, the weight parameters are less suitable, and the correction to the weight parameters can be increased.
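A small sketch of this consistency signal, assuming cosine similarity as the similarity measure (the text does not fix one):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def consistency_gap(result1, result2, vec1, vec2):
    """Gap between similarity 1 (between the two processing results) and
    similarity 2 (between the corresponding fourth vectors); a small gap
    suggests a small weight correction, a large gap a larger one."""
    return abs(cosine(result1, result2) - cosine(vec1, vec2))

rng = np.random.default_rng(1)
a, b = rng.normal(size=8), rng.normal(size=8)
print(consistency_gap(a, b, a, b))  # identical pairs give a gap of 0.0
```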
Optionally, the obtaining of the first processing result for the first data to be trained includes: obtaining the first processing result and a second processing result for second data to be trained, where the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two of the plurality of data to be trained. The method further includes: matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, the matching result being used to correct the weight parameters of the second neural network model.
The sixth association relationship information mentioned above may not include information about the similarity between the first label and the second label; that is, the association relationship between the first data to be processed and the second data to be processed may be independent of the similarity between the first label and the second label. The sixth association relationship information mentioned above can associate multiple pieces of data that may be related, increasing the amount of data processed by the second neural network model. The similarity between the first label and the second label is used to evaluate whether the first processing result and the second processing result are accurate.
Taking text data as an example, when the first label is "prose" and the second label is "argumentative essay", the similarity between the first and second processing results should be low. If the first processing result is "environmental governance" and the second processing result is "energy supply", and the similarity between the two processing results is high, this indicates that the weight parameters of the second neural network model are unsuitable, and a loss function can be used to correct them.
Taking picture data as an example, when the first label is "rabbit" and the second label is also "rabbit", the similarity between the first and second processing results should be high. If the first processing result is "long ears" and the second processing result is "short ears", the similarity between the two processing results is low, indicating that the weight parameters of the second neural network model may be unsuitable, and a loss function can be used to correct them.
Taking video data as an example, when the first label is "meeting" and the second label is "in-vehicle", the similarity between the first and second processing results should be low. If the first processing result is "project survey" and the second processing result is "road traffic", the similarity between the two processing results is low, indicating that the weight parameters of the second neural network model may be suitable, so the loss function corrects the weight parameters of the second neural network model only slightly.
Taking audio data as an example, when the first label is "insect sounds" and the second label is also "insect sounds", the similarity between the first and second processing results should be high. If the first processing result is "mosquito" and the second processing result is "fly", the similarity between the two processing results is high, indicating that the weight parameters of the second neural network model may be suitable, so the loss function corrects the weight parameters of the second neural network model only slightly.
A possible form of the loss function loss is given below.
(The specific form of the loss function is given in the original as an image: Figure PCTCN2019099653-appb-000020.)
Here, y_i' denotes the processing result i for data to be trained i, y_j' denotes the processing result j for data to be trained j, z_i denotes the label i of data to be trained i, and z_j denotes the label j of data to be trained j. The function C(y_i', y_j') denotes the similarity between processing result i and processing result j, and the function C(z_i, z_j) denotes the similarity between label i and label j. The matrix D may be a matrix used to amplify the similarity between processing result i and processing result j.
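Since the formula itself survives only as an image, the following LaTeX block is one plausible reconstruction consistent with the components described above; it is an assumption, not the patent's exact expression:

```latex
% Hypothetical reconstruction: penalize the mismatch between the
% similarity of two processing results and the similarity of their
% labels, with D amplifying the result-side similarity.
\[
  \mathrm{loss} \;=\; \sum_{i \neq j}
    \Bigl( C\bigl(y_i',\, y_j'\bigr) - C\bigl(z_i,\, z_j\bigr) \Bigr)^{2},
  \qquad
  C\bigl(y_i',\, y_j'\bigr) \;=\; {y_i'}^{\top} D \, y_j'.
\]
```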
For example, suppose there are labels a, b, and c. When the label of data to be trained i includes label a but not labels b and c, it can be represented as (1, 0, 0). When it includes label b but not labels a and c, it can be represented as (0, 1, 0). When it includes labels a and c but not label b, it can be represented as (1, 0, 1). When it includes labels a, b, and c, it can be represented as (1, 1, 1).
Optionally, the plurality of data to be trained includes one or more pieces of target-type data, each piece of target-type data having a label used to correct the weight parameters.
That is, the plurality of data to be trained includes first-type data and second-type data: data to be trained belonging to the first type has labels, while data to be trained belonging to the second type does not. Therefore, the weight parameters of the second neural network model can be corrected according to the distance between the processing results of the first-type data and the labels of the first-type data. This distance can be understood as the degree of similarity between a processing result of the first-type data and the corresponding label. The information distance can be computed by, for example, cross-entropy, KL divergence, or JS divergence. The second-type data has no labels, but since associations may exist between the first-type data and the second-type data, the second-type data can be brought into the process of obtaining the processing results of the first-type data. In other words, the second neural network model may be a semi-supervised model, i.e., the plurality of data to be trained may include unlabeled data. To ensure reliable training of the second neural network model, the proportion of first-type data among the data to be trained is generally not less than 5%-10%.
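A hedged sketch of this semi-supervised setup, using cross-entropy (one of the distances named above) and masking so that only the labeled first-type data contributes to the loss; the array layout and names are our own illustration:

```python
import numpy as np

def semi_supervised_loss(probs, labels, labeled_mask, eps=1e-9):
    """probs: N x K predicted distributions for all N data to be trained.
    labels: N x K one-/multi-hot labels; rows of unlabeled data are ignored.
    labeled_mask: length-N boolean array marking the first-type data."""
    ce = -(labels * np.log(probs + eps)).sum(axis=1)  # per-sample cross-entropy
    return ce[labeled_mask].mean()                    # only labeled data counts

probs = np.array([[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]])
labels = np.array([[1, 0], [0, 1], [0, 0]])           # third sample unlabeled
mask = np.array([True, True, False])
print(semi_supervised_loss(probs, labels, mask))      # about 0.308
```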
Optionally, the first processing result is also used to correct the weight parameters of the first neural network model.
That is, besides being used to correct the weight parameters of the second neural network model, the first processing result can also be used to correct the weight parameters of the first neural network model.
In one example, the first processing result and the label of the first data to be trained can be input into the loss function of the first neural network model to correct the weight parameters of the first neural network model.
Before the plurality of data to be trained is input into the first neural network model, the first neural network model may be one that is not restricted by a scenario, or only weakly constrained by one. The plurality of data to be trained may be data from a specific scenario; therefore, the weight parameters of the first neural network model can be corrected according to the first processing result, so that the first neural network model can adapt to that particular scenario.
It should be understood that the first neural network model and the second neural network model may be two sub-models of a single neural network model.
The following specific examples describe the effects that the first neural network model and the second neural network model can achieve in training and prediction.
Example one
Obtain all the pictures captured by all the cameras of a company within a certain month, totaling about 100,000 pictures. Input 90,000 of these pictures into the first neural network model as the plurality of data to be trained, where each picture may be one data to be trained. The remaining 10,000 pictures can serve as verification data, used to verify whether the weight parameters of the second neural network model are suitable. For ease of description, the 90,000 pictures form the training data set and the 10,000 pictures form the verification data set.
Select 10,000 pictures from the training data set as labeled first-type data; the remaining 80,000 pictures in the training data set are then unlabeled second-type data. Obtain the labels of the first-type data.
Use the first neural network model to process the training data set, obtaining 90,000 fourth vectors in one-to-one correspondence with the training data set. The first neural network model may be a multiple granularity network (MGN) model, which is a convolutional neural network model. Each fourth vector may include 1024 elements, and each fourth vector is the feature representation of one picture.
Obtain the a priori hypothesis. It may, for example, be one or more of the following:
(1) Two pictures taken less than 8 s apart are associated.
(2) Two pictures from the same camera are associated.
(3) Two pictures whose image similarity exceeds 50% are associated.
It should be understood that the specific content of the a priori hypothesis depends on the scenario in which the first neural network model and the second neural network model are applied, and is not limited here.
According to the a priori hypothesis, the third association relationship information indicating the association relationships among the 90,000 fourth vectors can be determined.
Input the 90,000 fourth vectors and the third association relationship information into the second neural network model to obtain processing results for the first-type data. Since the first-type data may be associated with the second-type data, the processing results of the first-type data take the content of the second-type data into account.
Matching the processing results of the first-type data against the labels of the first-type data allows the parameters of the second neural network model to be corrected.
Afterwards, the data of the verification data set is input into the first neural network model to obtain multiple fourth vectors for the verification data set; these fourth vectors are then input into the second neural network model, together with the association relationships among them determined according to the a priori hypothesis, to obtain the data processing results for the verification data set. Matching these results against the labels of the verification data set gives the recognition capability of the first neural network model and the second neural network model. In practical application, the trained neural network model is scored by mean average precision (mAP); compared with a conventional neural network model, the score improves by 4-20 points. That is, the method of training a neural network model provided in this application can strengthen the neural network model.
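Putting Example one together, the following sketch shows the shape of the training flow; `extract_features`, `graph_model`, and the single masked propagation step are stand-ins of our own (the real models are MGN and a graph attention network), and only the 8 s timestamp hypothesis is used to build the association matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(n_pictures, dim=1024):
    # Stand-in for the MGN extractor: one 1024-element fourth vector per picture.
    return rng.normal(size=(n_pictures, dim))

def association_from_timestamps(timestamps, threshold=8.0):
    t = np.asarray(timestamps, dtype=float)
    return (np.abs(t[:, None] - t[None, :]) < threshold).astype(float)

def graph_model(feats, Q, W):
    # Stand-in for the second model: one masked propagation step.
    A = Q / Q.sum(axis=1, keepdims=True)
    return A @ feats @ W

n, dim, n_cls = 20, 1024, 5
feats = extract_features(n, dim)                       # first model's output
Q = association_from_timestamps(rng.uniform(0, 60, size=n))
W = rng.normal(size=(dim, n_cls)) * 0.01               # second model's weights
labels = np.eye(n_cls)[rng.integers(0, n_cls, size=n)]
mask = np.zeros(n, dtype=bool)
mask[:2] = True                                        # ~10% labeled first-type data

preds = graph_model(feats, Q, W)
loss = ((preds[mask] - labels[mask]) ** 2).mean()      # drives the weight correction
print(loss)
```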
Example two
Obtain the Chinese text questions collected by a company's customer-service robot within a certain month, totaling about 15,000 Chinese text questions. Input 8,000 of these questions into the first neural network model as the plurality of data to be trained, where each Chinese text question may be one data to be trained. The remaining 7,000 questions can serve as verification data, used to verify whether the weight parameters of the second neural network model are suitable. For ease of description, the 8,000 questions form the training data set and the 7,000 questions form the verification data set.
Select 2,000 Chinese text questions from the training data set as labeled first-type data; the remaining 6,000 questions in the training data set are then unlabeled second-type data. Obtain the labels of the first-type data.
Use the first neural network model to process the training data set, obtaining 8,000 fourth vectors in one-to-one correspondence with the training data set. The first neural network model may be a bidirectional encoder representations from transformers (BERT) model. The BERT model may be a convolutional neural network model. Each fourth vector may include 768 elements, and each fourth vector is the feature representation of one Chinese text question.
Obtain the a priori hypothesis. It may, for example, be one or more of the following:
(1) Two Chinese text questions whose texts contain the same keywords are associated.
(2) Two Chinese text questions whose text similarity exceeds 50% are associated.
It should be understood that the specific content of the a priori hypothesis depends on the scenario in which the first neural network model and the second neural network model are applied, and is not limited here.
According to the a priori hypothesis, the third association relationship information indicating the association relationships among the 8,000 fourth vectors can be determined.
Input the 8,000 fourth vectors and the third association relationship information into the second neural network model to obtain processing results for the first-type data. Since the first-type data may be associated with the second-type data, the processing results of the first-type data take the content of the second-type data into account.
Matching the processing results of the first-type data against the labels of the first-type data allows the parameters of the second neural network model to be corrected.
Afterwards, the data of the verification data set is input into the first neural network model to obtain multiple fourth vectors for the verification data set; these fourth vectors are then input into the second neural network model, together with the association relationships among them determined according to the a priori hypothesis, to obtain the data processing results for the verification data set. Matching these results against the labels of the verification data set gives the recognition capability of the first neural network model and the second neural network model. In practical application, the trained neural network model is scored by mean average precision (mAP); compared with a conventional neural network model, the score improves by 10-15 points. That is, the method of training a neural network model provided in this application can strengthen the neural network model.
FIG. 8 is a schematic diagram of the hardware structure of a data processing device provided by an embodiment of this application. The data processing device 700 shown in FIG. 8 (which may specifically be a computer device) includes a memory 701, a processor 702, a communication interface 703, and a bus 704, where the memory 701, the processor 702, and the communication interface 703 are communicatively connected to one another through the bus 704.
The memory 701 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 701 may store a program; when the program stored in the memory 701 is executed by the processor 702, the processor 702 is configured to execute the steps of the data processing method shown in FIG. 6 of the embodiments of this application. Optionally, the processor 702 is further configured to execute the steps of the method for training a neural network model shown in FIG. 7 of the embodiments of this application.
The processor 702 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, configured to execute related programs so as to implement the data processing method shown in FIG. 6 of the embodiments of this application. Optionally, the processor 702 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, configured to execute related programs so as to implement the method for training a neural network model shown in FIG. 7 of the embodiments of this application.
The processor 702 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the data processing method shown in FIG. 6 of the embodiments of this application may be completed by integrated logic circuits of hardware in the processor 702 or by instructions in the form of software. Optionally, the steps of the method for training a neural network model shown in FIG. 7 of the embodiments of this application may be completed by integrated logic circuits of hardware in the processor 702 or by instructions in the form of software.
The processor 702 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory 701; the processor 702 reads the information in the memory 701 and, in combination with its hardware, completes the functions to be executed by the units included in the data processing device of the embodiments of this application, or executes the data processing method shown in FIG. 6 of the embodiments of this application. Optionally, it is also used to execute the method for training a neural network model shown in FIG. 7 of the embodiments of this application.
The communication interface 703 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the device 700 and other devices or communication networks. For example, the information of the neural network to be constructed and the data to be processed needed in constructing the neural network (the data to be processed in the embodiment shown in FIG. 6) can be obtained through the communication interface 703. Optionally, the information of the neural network to be constructed and the data to be trained needed in constructing the neural network (the data to be trained in the embodiment shown in FIG. 7) can be obtained through the communication interface 703.
The bus 704 may include a path for transferring information between the components of the device 700 (for example, the memory 701, the processor 702, and the communication interface 703).
It should be understood that the obtaining module in the data processing device may be equivalent to the communication interface 703 in the data processing device 700, and the processing module in the data processing device may be equivalent to the processor 702.
FIG. 9 is a schematic diagram of the hardware structure of a device for training a neural network model provided by an embodiment of this application. The device 800 for training a neural network model shown in FIG. 9 (which may specifically be a computer device) includes a memory 801, a processor 802, a communication interface 803, and a bus 804, where the memory 801, the processor 802, and the communication interface 803 are communicatively connected to one another through the bus 804.
The memory 801 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 801 may store a program; when the program stored in the memory 801 is executed by the processor 802, the processor 802 is configured to execute the steps of the method for training a neural network model shown in FIG. 7 of the embodiments of this application.
The processor 802 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, configured to execute related programs so as to implement the method for training a neural network model shown in FIG. 7 of the embodiments of this application.
The processor 802 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the method for training a neural network model shown in FIG. 7 of the embodiments of this application may be completed by integrated logic circuits of hardware in the processor 802 or by instructions in the form of software.
The processor 802 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory 801; the processor 802 reads the information in the memory 801 and, in combination with its hardware, completes the functions to be executed by the units included in the neural network model training device of the embodiments of this application, or executes the method for training a neural network model shown in FIG. 7 of the embodiments of this application.
The communication interface 803 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the device 800 and other devices or communication networks. For example, the information of the neural network to be constructed and the training data needed in constructing the neural network (the data to be trained in the embodiment shown in FIG. 7) can be obtained through the communication interface 803.
The bus 804 may include a path for transferring information between the components of the device 800 (for example, the memory 801, the processor 802, and the communication interface 803).
It should be understood that the obtaining module in the neural network model training device may be equivalent to the communication interface 803 in the neural network model training device 800, and the processing module in the neural network model training device may be equivalent to the processor 802.
It should be noted that although the devices 700 and 800 described above show only a memory, a processor, and a communication interface, in a specific implementation those skilled in the art should understand that the devices 700 and 800 may also include other components necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the devices 700 and 800 may further include hardware components implementing other additional functions. In addition, those skilled in the art should understand that the devices 700 and 800 may also include only the components necessary to implement the embodiments of this application, and need not include all the components shown in FIG. 8 and FIG. 9.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods for each particular application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the systems, apparatuses, and units described above, which are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (30)

1. A data processing method, characterized by comprising:
obtaining a plurality of data to be processed;
processing the plurality of data to be processed using a first neural network model to obtain a plurality of first vectors in one-to-one correspondence with the plurality of data to be processed, wherein the first neural network model is obtained by training based on general data;
obtaining first association relationship information, wherein the first association relationship information is used to indicate at least one first vector group, and each first vector group comprises two first vectors satisfying an a priori hypothesis; and
inputting the plurality of first vectors and the first association relationship information into a second neural network model to obtain a processing result for first data to be processed, wherein the first data to be processed is any one of the plurality of data to be processed.
2. The method according to claim 1, wherein the first association relationship information is used to indicate N first vector groups, N being an integer greater than 1, and before the inputting of the plurality of first vectors and the first association relationship information into the second neural network model to obtain the processing result for the first data to be processed, the method further comprises:
obtaining second association relationship information, wherein the second association relationship information is used to indicate n second vector groups, the n second vector groups belong to the N first vector groups, n is less than N, and n is a positive integer; and
the inputting of the plurality of first vectors and the first association relationship information into the second neural network model to obtain the processing result for the first data to be processed comprises:
inputting the plurality of first vectors, the first association relationship information, and the second association relationship information into the second neural network model to obtain the processing result for the first data to be processed.
  3. The method according to claim 1 or 2, wherein the obtaining a plurality of data to be processed comprises:
    obtaining target data, wherein the target data is one of the plurality of data to be processed; and
    obtaining associated data, wherein the associated data and the target data have an association relationship that satisfies the a priori hypothesis, and the plurality of data to be processed comprise the associated data.
  4. The method according to any one of claims 1 to 3, wherein the first association relationship information comprises an association relationship matrix, a vector in a first dimension of the association relationship matrix comprises a plurality of elements in one-to-one correspondence with the plurality of first vectors, and a vector in a second dimension of the association relationship matrix comprises a plurality of elements in one-to-one correspondence with the plurality of first vectors, wherein any element in the association relationship matrix is used to indicate whether the first vector corresponding to that element in the first dimension and the first vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  5. The method according to any one of claims 1 to 4, wherein the weight parameters of the second neural network model are obtained in the following manner:
    obtaining a plurality of data to be trained;
    processing the plurality of data to be trained by using the first neural network model to obtain a plurality of fourth vectors in one-to-one correspondence with the plurality of data to be trained;
    obtaining third association relationship information, wherein the third association relationship information is used to indicate at least one third vector group, and each third vector group comprises two fourth vectors that satisfy the a priori hypothesis; and
    inputting the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain a first processing result for first data to be trained, wherein the first data to be trained is any one of the plurality of data to be trained, and the first processing result is used to modify the weight parameters of the second neural network model.
  6. The method according to claim 5, wherein the obtaining a first processing result for first data to be trained comprises:
    obtaining the first processing result and a second processing result for second data to be trained, wherein the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two of the plurality of data to be trained; and
    the method further comprises:
    matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, wherein the matching result is used to modify the weight parameters of the second neural network model.
  7. The method according to claim 5 or 6, wherein the third association relationship information is used to indicate M third vector groups, M being an integer greater than 1, and before the inputting of the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained, the method further comprises:
    obtaining fourth association relationship information, wherein the fourth association relationship information is used to indicate m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer; and
    the inputting of the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained comprises:
    inputting the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
  8. The method according to any one of claims 5 to 7, wherein the first processing result is further used to modify the weight parameters of the first neural network model.
  9. The method according to any one of claims 5 to 8, wherein the plurality of data to be trained comprise one or more pieces of target type data, and each piece of target type data has a label used to modify the weight parameters.
  10. A method for training a neural network model, characterized in that the method comprises:
    obtaining a plurality of data to be trained;
    processing the plurality of data to be trained by using a first neural network model to obtain a plurality of fourth vectors in one-to-one correspondence with the plurality of data to be trained;
    obtaining third association relationship information, wherein the third association relationship information is used to indicate at least one third vector group, and each third vector group comprises two fourth vectors that satisfy an a priori hypothesis; and
    inputting the plurality of fourth vectors and the third association relationship information into a second neural network model to obtain a first processing result for first data to be trained, wherein the first data to be trained is any one of the plurality of data to be trained, and the first processing result is used to modify the weight parameters of the second neural network model.
  11. The method according to claim 10, wherein the obtaining a first processing result for first data to be trained comprises:
    obtaining the first processing result and a second processing result for second data to be trained, wherein the label of the first data to be trained is a first label, the label of the second data to be trained is a second label, and the first data to be trained and the second data to be trained are any two of the plurality of data to be trained; and
    the method further comprises:
    matching the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, wherein the matching result is used to modify the weight parameters of the second neural network model.
  12. The method according to claim 10 or 11, wherein the third association relationship information is used to indicate M third vector groups, and before the inputting of the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained, the method further comprises:
    obtaining fourth association relationship information, wherein the fourth association relationship information is used to indicate m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer; and
    the inputting of the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained comprises:
    inputting the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
  13. The method according to any one of claims 10 to 12, wherein the first processing result is further used to modify the weight parameters of the first neural network model.
  14. The method according to any one of claims 10 to 13, wherein the plurality of data to be trained comprise one or more pieces of target type data, and each piece of target type data has a label used to modify the weight parameters.
  15. A data processing device, characterized in that the device comprises:
    an acquisition module, configured to obtain a plurality of data to be processed; and
    a processing module, configured to process the plurality of data to be processed by using a first neural network model to obtain a plurality of first vectors in one-to-one correspondence with the plurality of data to be processed, wherein the first neural network model is obtained through training based on general data;
    wherein the acquisition module is further configured to obtain first association relationship information, the first association relationship information being used to indicate at least one first vector group, each first vector group comprising two first vectors that satisfy an a priori hypothesis; and
    the processing module is further configured to input the plurality of first vectors and the first association relationship information into a second neural network model to obtain a processing result for first data to be processed, the first data to be processed being any one of the plurality of data to be processed.
  16. The device according to claim 15, wherein the first association relationship information is used to indicate N first vector groups, N being an integer greater than 1, and before the processing module inputs the plurality of first vectors and the first association relationship information into the second neural network model to obtain the processing result for the first data to be processed,
    the acquisition module is further configured to obtain second association relationship information, wherein the second association relationship information is used to indicate n second vector groups, the n second vector groups belong to the N first vector groups, n is less than N, and n is a positive integer; and
    the processing module is specifically configured to input the plurality of first vectors, the first association relationship information, and the second association relationship information into the second neural network model to obtain the processing result for the first data to be processed.
  17. The device according to claim 15 or 16, wherein the acquisition module is specifically configured to:
    obtain target data, wherein the target data is one of the plurality of data to be processed; and
    obtain associated data, wherein the associated data and the target data have an association relationship that satisfies the a priori hypothesis, and the plurality of data to be processed comprise the associated data.
  18. The device according to any one of claims 15 to 17, wherein the first association relationship information comprises an association relationship matrix, a vector in a first dimension of the association relationship matrix comprises a plurality of elements in one-to-one correspondence with the plurality of first vectors, and a vector in a second dimension of the association relationship matrix comprises a plurality of elements in one-to-one correspondence with the plurality of first vectors, wherein any element in the association relationship matrix is used to indicate whether the first vector corresponding to that element in the first dimension and the first vector corresponding to that element in the second dimension have an association relationship that satisfies the a priori hypothesis.
  19. The device according to any one of claims 15 to 18, wherein:
    the acquisition module is further configured to obtain a plurality of data to be trained;
    the processing module is further configured to process the plurality of data to be trained by using the first neural network model to obtain a plurality of fourth vectors in one-to-one correspondence with the plurality of data to be trained;
    the acquisition module is further configured to obtain third association relationship information, wherein the third association relationship information is used to indicate at least one third vector group, and each third vector group comprises two fourth vectors that satisfy the a priori hypothesis; and
    the processing module is further configured to input the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain a first processing result for first data to be trained, wherein the first data to be trained is any one of the plurality of data to be trained, and the first processing result is used to modify the weight parameters of the second neural network model.
  20. The device according to claim 19, wherein:
    the processing module is specifically configured to obtain the first processing result and a second processing result for second data to be trained, wherein the label of the first data to be trained is a first label and the label of the second data to be trained is a second label; and
    the processing module is further configured to match the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, wherein the matching result is used to modify the weight parameters of the second neural network model.
  21. The device according to claim 19 or 20, wherein the third association relationship information is used to indicate M third vector groups, M being an integer greater than 1, and before the processing module inputs the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained,
    the acquisition module is further configured to obtain fourth association relationship information, wherein the fourth association relationship information is used to indicate m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer; and
    the processing module is specifically configured to input the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
  22. The device according to any one of claims 19 to 21, wherein the first processing result is further used to modify the weight parameters of the first neural network model.
  23. The device according to any one of claims 19 to 22, wherein the plurality of data to be trained comprise one or more pieces of target type data, and each piece of target type data has a label used to modify the weight parameters.
  24. A device for training a neural network model, characterized in that the device comprises:
    an acquisition module, configured to obtain a plurality of data to be trained; and
    a processing module, configured to process the plurality of data to be trained by using a first neural network model to obtain a plurality of fourth vectors in one-to-one correspondence with the plurality of data to be trained;
    wherein the acquisition module is further configured to obtain third association relationship information, the third association relationship information being used to indicate at least one third vector group, each third vector group comprising two fourth vectors that satisfy an a priori hypothesis; and
    the processing module is further configured to input the plurality of fourth vectors and the third association relationship information into a second neural network model to obtain a first processing result for first data to be trained, the first data to be trained being any one of the plurality of data to be trained, and the first processing result being used to modify the weight parameters of the second neural network model.
  25. The device according to claim 24, wherein the processing module is specifically configured to obtain the first processing result and a second processing result for second data to be trained, wherein the label of the first data to be trained is a first label and the label of the second data to be trained is a second label; and
    the processing module is further configured to match the similarity between the first label and the second label against the similarity between the first processing result and the second processing result to obtain a matching result, wherein the matching result is used to modify the weight parameters of the second neural network model.
  26. The device according to claim 24 or 25, wherein the third association relationship information is used to indicate M third vector groups, and before the processing module inputs the plurality of fourth vectors and the third association relationship information into the second neural network model to obtain the first processing result for the first data to be trained,
    the acquisition module is further configured to obtain fourth association relationship information, wherein the fourth association relationship information is used to indicate m fourth vector groups, the m fourth vector groups belong to the M third vector groups, m is less than M, and m is a positive integer; and
    the processing module is specifically configured to input the plurality of fourth vectors, the third association relationship information, and the fourth association relationship information into the second neural network model to obtain the first processing result.
  27. The device according to any one of claims 24 to 26, wherein the first processing result is further used to modify the weight parameters of the first neural network model.
  28. The device according to any one of claims 24 to 27, wherein the plurality of data to be trained comprise one or more pieces of target type data, and each piece of target type data has a label used to modify the weight parameters.
  29. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code for execution by a device, and the program code comprises instructions for performing the method according to any one of claims 1 to 14.
  30. A chip, characterized in that the chip comprises a processor and a data interface, wherein the processor reads, through the data interface, instructions stored in a memory, to perform the method according to any one of claims 1 to 14.
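
It should be understood that the claims above specify the pipeline in prose only. The following Python/PyTorch sketch is a purely illustrative reading of claims 1 and 4, not part of the claimed subject matter: a first model standing in for an encoder pretrained on general data produces the first vectors, an association relationship matrix marks which vector pairs satisfy the a priori hypothesis (here assumed, for concreteness, to be a cosine-similarity threshold), and a second, graph-style model combines the vectors with the matrix to yield one processing result per input datum. All names, dimensions, the threshold value, and the choice of a single graph-convolution-style layer are assumptions of this sketch.

    # Illustrative sketch only; names, sizes, and the similarity
    # threshold are assumptions, not details fixed by the application.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SecondModel(nn.Module):
        # A toy "second neural network model": one graph-convolution-style
        # layer that mixes each first vector with the vectors it is
        # associated to (per the association relationship matrix),
        # followed by a linear classification head.
        def __init__(self, dim, num_classes):
            super().__init__()
            self.mix = nn.Linear(dim, dim)
            self.head = nn.Linear(dim, num_classes)

        def forward(self, vectors, adj):
            # Row-normalize the matrix so each vector averages over
            # the vectors it is associated with.
            deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
            mixed = torch.relu(self.mix((adj / deg) @ vectors))
            return self.head(mixed)  # one processing result per datum

    # "First neural network model": stands in for any encoder pretrained
    # on general data; a random linear map keeps the sketch self-contained.
    first_model = nn.Linear(32, 16)
    data = torch.randn(5, 32)          # five pieces of data to be processed
    first_vectors = first_model(data)  # the plurality of first vectors

    # Association relationship matrix (claim 4): entry (i, j) is 1 iff
    # vectors i and j satisfy the a priori hypothesis -- assumed here to
    # be cosine similarity above 0.5.
    normed = F.normalize(first_vectors, dim=1)
    adj = (normed @ normed.T > 0.5).float()

    second_model = SecondModel(dim=16, num_classes=3)
    results = second_model(first_vectors, adj)
    print(results[0])  # processing result for the first data to be processed

On the same reading, the second association relationship information of claims 2 and 16 would simply be a second, sparser matrix indicating a subset of the vector groups above, passed to the second model alongside the first matrix.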
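
Claims 6, 11, 20 and 25 modify the second model's weights by matching label similarity against output similarity for pairs of training data. The claims do not fix a concrete loss function; the squared gap below is one assumed way to turn that matching step into a differentiable loss, again a sketch rather than the application's method.

    import torch
    import torch.nn.functional as F

    def matching_loss(result_1, result_2, label_similarity):
        # Penalize any gap between how similar the two processing results
        # are and how similar the two labels are (an assumed squared loss).
        output_similarity = F.cosine_similarity(result_1, result_2, dim=0)
        return (output_similarity - label_similarity) ** 2

    # Toy usage: two processing results and a label similarity of 1.0
    # (e.g. the first label equals the second label).
    r1 = torch.randn(3, requires_grad=True)
    r2 = torch.randn(3)
    loss = matching_loss(r1, r2, label_similarity=1.0)
    loss.backward()  # this gradient is what modifies the weight parameters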
PCT/CN2019/099653 2019-08-07 2019-08-07 Method for processing data, and method and device for training neural network model WO2021022521A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980010339.0A CN112639828A (en) 2019-08-07 2019-08-07 Data processing method, method and equipment for training neural network model
PCT/CN2019/099653 WO2021022521A1 (en) 2019-08-07 2019-08-07 Method for processing data, and method and device for training neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/099653 WO2021022521A1 (en) 2019-08-07 2019-08-07 Method for processing data, and method and device for training neural network model

Publications (1)

Publication Number Publication Date
WO2021022521A1 (en)

Family

ID=74503009

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/099653 WO2021022521A1 (en) 2019-08-07 2019-08-07 Method for processing data, and method and device for training neural network model

Country Status (2)

Country Link
CN (1) CN112639828A (en)
WO (1) WO2021022521A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113518035B (en) * 2021-05-26 2023-01-31 香港中文大学(深圳) Route determining method and device
CN113360659B (en) * 2021-07-19 2022-11-22 云南大学 Cross-domain emotion classification method and system based on semi-supervised learning
CN113642807B (en) * 2021-09-01 2022-04-12 智慧足迹数据科技有限公司 Population mobility prediction method and related device
CN114238692A (en) * 2022-02-23 2022-03-25 北京嘉沐安科技有限公司 Network live broadcast-oriented video big data accurate retrieval method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897746A (en) * 2017-02-28 2017-06-27 北京京东尚科信息技术有限公司 Data classification model training method and device
US20190139622A1 (en) * 2017-08-03 2019-05-09 Zymergen, Inc. Graph neural networks for representing microorganisms
CN107766324A (en) * 2017-09-25 2018-03-06 浙江大学 A kind of text coherence analysis method based on deep neural network
CN109766840A (en) * 2019-01-10 2019-05-17 腾讯科技(深圳)有限公司 Facial expression recognizing method, device, terminal and storage medium
CN110083829A (en) * 2019-04-03 2019-08-02 平安科技(深圳)有限公司 Feeling polarities analysis method and relevant apparatus

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095870A (en) * 2021-03-16 2021-07-09 支付宝(杭州)信息技术有限公司 Prediction method, prediction device, computer equipment and storage medium
CN112926569B (en) * 2021-03-16 2022-10-18 重庆邮电大学 Method for detecting natural scene image text in social network
CN112926569A (en) * 2021-03-16 2021-06-08 重庆邮电大学 Method for detecting natural scene image text in social network
CN113222328B (en) * 2021-03-25 2022-02-25 中国科学技术大学先进技术研究院 Air quality monitoring equipment point arrangement and site selection method based on road section pollution similarity
CN113222328A (en) * 2021-03-25 2021-08-06 中国科学技术大学先进技术研究院 Air quality monitoring equipment point arrangement and site selection method based on road section pollution similarity
CN112989134A (en) * 2021-03-29 2021-06-18 腾讯科技(深圳)有限公司 Node relation graph processing method, device, equipment and storage medium
CN112989134B (en) * 2021-03-29 2023-08-25 腾讯科技(深圳)有限公司 Processing method, device, equipment and storage medium of node relation graph
CN112950291A (en) * 2021-03-31 2021-06-11 北京奇艺世纪科技有限公司 Model deviation optimization method, device, equipment and computer readable medium
CN113194458A (en) * 2021-04-08 2021-07-30 南京中新赛克科技有限责任公司 Multi-card treasure number identification method and device
CN113194458B (en) * 2021-04-08 2022-05-13 南京中新赛克科技有限责任公司 Multi-card treasure number identification method and device
CN115396831A (en) * 2021-05-08 2022-11-25 中国移动通信集团浙江有限公司 Interaction model generation method, device, equipment and storage medium
CN113239844A (en) * 2021-05-26 2021-08-10 哈尔滨理工大学 Intelligent cosmetic mirror system based on multi-head attention target detection
CN113505193A (en) * 2021-06-01 2021-10-15 华为技术有限公司 Data processing method and related equipment
CN113724036A (en) * 2021-07-29 2021-11-30 阿里巴巴(中国)有限公司 Method and electronic equipment for providing question consultation service
WO2023011237A1 (en) * 2021-08-04 2023-02-09 支付宝(杭州)信息技术有限公司 Service processing
CN114529765A (en) * 2022-02-16 2022-05-24 腾讯科技(深圳)有限公司 Data processing method, data processing equipment and computer readable storage medium
CN114863162A (en) * 2022-03-28 2022-08-05 北京百度网讯科技有限公司 Object classification method, deep learning model training method, device and equipment
CN116935230A (en) * 2023-09-13 2023-10-24 山东建筑大学 Crop pest identification method, device, equipment and medium
CN116935230B (en) * 2023-09-13 2023-12-15 山东建筑大学 Crop pest identification method, device, equipment and medium

Also Published As

Publication number Publication date
CN112639828A (en) 2021-04-09

Similar Documents

Publication Publication Date Title
WO2021022521A1 (en) Method for processing data, and method and device for training neural network model
WO2020221200A1 (en) Neural network construction method, image processing method and devices
US20220092351A1 (en) Image classification method, neural network training method, and apparatus
WO2021043168A1 (en) Person re-identification network training method and person re-identification method and apparatus
CN110084281B (en) Image generation method, neural network compression method, related device and equipment
WO2021190451A1 (en) Method and apparatus for training image processing model
WO2021057056A1 (en) Neural architecture search method, image processing method and device, and storage medium
WO2021042828A1 (en) Neural network model compression method and apparatus, and storage medium and chip
WO2021120719A1 (en) Neural network model update method, and image processing method and device
WO2021043112A1 (en) Image classification method and apparatus
WO2021218517A1 (en) Method for acquiring neural network model, and image processing method and apparatus
WO2021164772A1 (en) Method for training cross-modal retrieval model, cross-modal retrieval method, and related device
WO2022001805A1 (en) Neural network distillation method and device
US20220375213A1 (en) Processing Apparatus and Method and Storage Medium
WO2022052601A1 (en) Neural network model training method, and image processing method and device
WO2021008206A1 (en) Neural architecture search method, and image processing method and device
CN110222718B (en) Image processing method and device
WO2021018245A1 (en) Image classification method and apparatus
WO2021073311A1 (en) Image recognition method and apparatus, computer-readable storage medium and chip
WO2021018251A1 (en) Image classification method and device
WO2021227787A1 (en) Neural network predictor training method and apparatus, and image processing method and apparatus
CN113807399A (en) Neural network training method, neural network detection method and neural network detection device
WO2022007867A1 (en) Method and device for constructing neural network
CN113326930A (en) Data processing method, neural network training method, related device and equipment
WO2021129668A1 (en) Neural network training method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19940931

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19940931

Country of ref document: EP

Kind code of ref document: A1