WO2021258967A1 - Neural network training method and device, and data acquisition method and device

Neural network training method and device, and data acquisition method and device

Info

Publication number
WO2021258967A1
WO2021258967A1; PCT/CN2021/096019; CN2021096019W
Authority
WO
WIPO (PCT)
Prior art keywords
domain
feature
data
training
neural network
Prior art date
Application number
PCT/CN2021/096019
Other languages
French (fr)
Chinese (zh)
Inventor
韩亚洪
姜品
武阿明
邵云峰
齐美玉
李秉帅
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021258967A1 publication Critical patent/WO2021258967A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Definitions

  • This application relates to the field of artificial intelligence, in particular to a neural network training method, data acquisition method and device.
  • Artificial intelligence (AI) is a theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theories.
  • Neural networks trained by machine learning can be used to complete multiple tasks such as target classification/detection/recognition/segmentation/prediction.
  • In practice, training samples and test samples are likely to come from different domains, which causes problems for the practical application of neural networks.
  • For example, the source domain data may be traffic scene images taken on a sunny day,
  • while the target domain data may be traffic scene images taken on a foggy day.
  • A target detection model trained on source domain data therefore has difficulty achieving good results on target domain data.
  • As a result, domain adaptation (DA) learning, an important research field of machine learning, has received extensive attention in recent years.
  • Domain adaptive learning usually uses a distribution alignment method to align the probability distributions of the source domain data and the target domain data, so as to alleviate the adverse effects of domain deviation on the domain adaptive learning task. Since this distribution alignment is performed only at the level of the overall feature representation, the domain adaptive learning task is inevitably affected by domain-specific features. Therefore, the trained neural network still suffers from poor migration performance.
  • This application provides a neural network training method, a data acquisition method, and corresponding devices, which can improve the migration performance of a neural network between different domains.
  • In a first aspect, a neural network training method is provided, including: obtaining training data; and training the neural network using the training data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data; wherein the domain-specific features are features that characterize the domain to which the training data belongs, and the domain-invariant features are features that are unrelated to the domain to which the training data belongs.
  • By decomposing domain-invariant features and domain-specific features from the training data, the domain-invariant features can be decoupled from the domain-specific features. Since the neural network obtained by the training method of this application uses the domain-invariant features to perform tasks, the influence of domain-specific features on the neural network is avoided, thereby improving the migration performance of the neural network between different domains.
  • In some implementations, training the neural network using the training data includes: decomposing the domain-invariant features and the domain-specific features from the training data; performing a task with the domain-invariant feature to obtain a task loss, and calculating a mutual information loss between the domain-invariant feature and the domain-specific feature, where the task loss is used to characterize the difference between the result obtained by performing the task with the domain-invariant feature and the task label, and the mutual information loss is used to represent the difference between the domain-invariant feature and the domain-specific feature; and training the neural network according to the task loss and the mutual information loss.
  • In some implementations, the method further includes: performing domain classification using the domain-specific feature to obtain a domain classification loss; and training the neural network according to the task loss and the mutual information loss includes: training the neural network according to the task loss, the mutual information loss, and the domain classification loss.
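  • As an illustrative sketch only (the module interfaces, loss weights and the cross-entropy task loss are assumptions for a classification task, not the claimed implementation), the combination of task loss, mutual information loss and domain classification loss described above could be assembled in a PyTorch-style training step as follows:

```python
import torch.nn.functional as F

def training_step(batch, task_labels, domain_labels, model, mi_estimator, optimizer,
                  lam_mi=0.1, lam_dom=0.1):
    """Hedged sketch of one training step. `model` is assumed to return
    (task_logits, domain_logits, dir_feat, dsr_feat); `mi_estimator` is assumed to
    return an estimate of the mutual information between the two feature sets."""
    task_logits, domain_logits, dir_feat, dsr_feat = model(batch)

    # Task loss: difference between the result obtained with the domain-invariant
    # feature and the task label.
    task_loss = F.cross_entropy(task_logits, task_labels)

    # Mutual information loss between domain-invariant and domain-specific features.
    mi_loss = mi_estimator(dir_feat, dsr_feat)

    # Domain classification loss obtained from the domain-specific feature.
    dom_loss = F.cross_entropy(domain_logits, domain_labels)

    total_loss = task_loss + lam_mi * mi_loss + lam_dom * dom_loss
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```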
  • In some implementations, decomposing the domain-invariant features and the domain-specific features from the training data includes: extracting an initial feature from the training data; and decomposing the initial feature into the domain-invariant feature and the domain-specific feature. The method further includes: training the neural network to reduce the difference between the information contained in the initial feature and the information jointly contained in the domain-invariant feature and the domain-specific feature.
  • In this way, the decoupled domain-invariant features and domain-specific features can contain all the feature information of the training data, improving the completeness and rationality of the feature decoupling.
  • In some implementations, the method further includes: reconstructing the initial feature using the domain-invariant feature and the domain-specific feature to obtain a reconstructed feature; and comparing the initial feature with the reconstructed feature to determine the difference between the information contained in the initial feature and the information jointly contained in the domain-invariant feature and the domain-specific feature.
  • Using the reconstruction loss to train the neural network can make the decoupled domain-invariant features and domain-specific features contain all the feature information of the training data, so as to improve the completeness and rationality of the feature decoupling.
  • In some implementations, the method further includes: reconstructing the initial feature using the domain-invariant feature and the domain-specific feature to obtain a reconstructed feature, where the domain-invariant feature and the domain-specific feature are features decomposed from the initial feature; and comparing the initial feature with the reconstructed feature to obtain a reconstruction loss, where the reconstruction loss is used to characterize the difference between the information contained in the initial feature and the information jointly contained in the domain-invariant feature and the domain-specific feature. Training the neural network according to the task loss and the mutual information loss then includes: training the neural network in a first stage according to the task loss; and training the neural network in a second stage according to the mutual information loss; and the method further includes: training the neural network in a third stage according to the reconstruction loss.
  • Carrying out the training process of the neural network in stages can simplify the amount of training in each stage and speed up the convergence speed of the parameters of the neural network.
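  • For illustration only (the decoder module and the mean-squared-error distance are assumptions, not specified by the application), the reconstruction loss described above could be computed as in the following sketch:

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(initial_feat, dir_feat, dsr_feat, decoder):
    """Sketch: rebuild the initial feature from the decoupled features and compare.
    `decoder` is an assumed module mapping the concatenated features back to the
    shape of the initial feature."""
    reconstructed = decoder(torch.cat([dir_feat, dsr_feat], dim=1))
    # Characterizes the difference between the information in the initial feature and
    # the information jointly contained in the two decoupled features.
    return F.mse_loss(reconstructed, initial_feat)
```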
  • In some implementations, the neural network includes a first decoupler and a second decoupler, and decomposing the domain-invariant features and the domain-specific features from the training data includes: extracting a first feature of the training data from the training data; using the first decoupler to extract a preliminary domain-invariant feature and a preliminary domain-specific feature from the first feature; fusing the preliminary domain-invariant feature with the first feature to obtain a second feature; extracting a third feature of the training data from the second feature; and using the second decoupler to extract the domain-invariant feature and the domain-specific feature from the third feature.
  • Fusing the preliminary domain-invariant feature with the first feature to obtain the second feature enhances the domain-invariant feature information at the level of the first feature.
  • Decoupling the domain-invariant feature with the second decoupler on the basis of this second feature therefore further improves the decoupling accuracy of the domain-invariant feature, which allows the trained neural network to perform better in both task execution and domain adaptability.
  • In some implementations, the method further includes: training the neural network to reduce the difference between the information contained in the third feature and the information jointly contained in the domain-invariant feature and the domain-specific feature.
  • This further encourages the decoupled domain-invariant features and domain-specific features to contain all the feature information of the training data, improving the completeness and rationality of the feature decoupling.
  • In some implementations, the neural network is used for domain adaptive learning, and the training data includes image data from different domains.
  • In this case the domain-invariant features can be decoupled from the domain-specific features. Because the domain-invariant features are used to perform the tasks, the neural network obtained by the training method of this application can, through domain adaptive learning, adapt to processing tasks for images from a variety of different domains, thereby realizing adaptive processing of image data in different domains.
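  • The two-stage decoupling pipeline described above can be sketched as follows (module names and the element-wise-sum fusion are illustrative assumptions; the application does not fix these choices):

```python
import torch.nn as nn

class TwoStageDecouplingNet(nn.Module):
    """Sketch: extract a first feature, decouple preliminary features, fuse the
    preliminary domain-invariant feature back into the first feature, extract a third
    feature, then decouple the final domain-invariant and domain-specific features."""
    def __init__(self, extractor1, decoupler1, extractor2, decoupler2):
        super().__init__()
        self.extractor1, self.decoupler1 = extractor1, decoupler1
        self.extractor2, self.decoupler2 = extractor2, decoupler2

    def forward(self, x):
        feat1 = self.extractor1(x)                   # first feature
        pre_dir, pre_dsr = self.decoupler1(feat1)    # preliminary DIR / DSR
        feat2 = feat1 + pre_dir                      # fusion (element-wise sum assumed)
        feat3 = self.extractor2(feat2)               # third feature
        dir_feat, dsr_feat = self.decoupler2(feat3)  # final DIR / DSR
        return dir_feat, dsr_feat, feat3
```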
  • In a second aspect, a data acquisition method is provided, including: acquiring data of a source domain and/or data of a target domain; inputting the data of the source domain and/or the data of the target domain into a neural network for training, so as to obtain gradient information of a loss function; and perturbing the data of the source domain and/or the data of the target domain according to the gradient information to obtain data of an intermediate domain; wherein the source domain and the target domain are two domains with different data characteristics, and the difference in data characteristics between the intermediate domain and either of the source domain and the target domain is smaller than the difference in data characteristics between the source domain and the target domain.
  • Introducing the direction information between the source domain and the target domain makes the perturbation of the training data more targeted.
  • The intermediate-domain training data obtained through the perturbation can fill the "domain gap" between the source domain and the target domain and alleviate the large difference between the distribution of the source-domain training data and that of the target-domain training data.
  • In some implementations, inputting the data of the source domain and/or the data of the target domain into a neural network for training, so as to obtain the gradient information of the loss function, includes: inputting the labeled data of the source domain into a first neural network and performing training to obtain first gradient information, where the first neural network is generated by training on the labeled data of the target domain.
  • Because the first neural network is generated by training on the labeled data of the target domain, the first gradient information obtained after inputting the labeled data of the source domain into the first neural network is a good measure of the direction from the source domain to the target domain.
  • In some implementations, inputting the data of the source domain and/or the data of the target domain into a neural network for training, so as to obtain the gradient information of the loss function, includes: inputting the unlabeled data of the target domain into a second neural network and performing training in the manner of virtual adversarial training to obtain second gradient information, where the second neural network is generated by training on the labeled data of the source domain.
  • Because the second neural network is generated by training on the labeled data of the source domain, the second gradient information obtained through virtual adversarial training after inputting the unlabeled data of the target domain into the second neural network is a good measure of the direction from the target domain to the source domain.
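  • As a hedged sketch of the intermediate-domain data acquisition described above (the signed-gradient step and the step size epsilon are illustrative assumptions; the application only specifies perturbing the data according to the gradient information):

```python
import torch
import torch.nn.functional as F

def perturb_towards_other_domain(x, labels, model, epsilon=0.01):
    """Sketch: feed labeled source-domain data into a network trained on the target
    domain (or unlabeled target-domain data into a source-trained network), take the
    gradient of the loss with respect to the input, and apply a small perturbation
    along that gradient to obtain intermediate-domain data."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), labels)
    grad = torch.autograd.grad(loss, x)[0]
    # Small step along the gradient direction; the result lies between the two domains.
    return (x + epsilon * grad.sign()).detach()
```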
  • In another aspect, a neural network training device is provided, including modules for executing the method described in the first aspect.
  • In another aspect, a data acquisition device is provided, including modules for executing the method described in the second aspect.
  • In another aspect, a neural network training device is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to perform the method described in the first aspect or the second aspect.
  • In a sixth aspect, a neural network is provided, including: a first feature extraction layer for extracting a first feature based on input data; a first domain-invariant feature decoupling layer for extracting a first domain-invariant feature based on the first feature; a feature fusion layer for fusing the first feature and the first domain-invariant feature to obtain a second feature; a second feature extraction layer for extracting a third feature based on the second feature; and a second domain-invariant feature decoupling layer for extracting a second domain-invariant feature based on the third feature; wherein the first domain-invariant feature and the second domain-invariant feature are each features unrelated to the domain to which the input data belongs, and the first domain-specific feature and the second domain-specific feature are each features that characterize the domain to which the input data belongs.
  • In another aspect, a data processing system is provided, including: a data acquisition network for acquiring gradient information of a loss function based on first data, and perturbing the first data according to the gradient information to acquire second data;
  • and a feature decoupling network for training the neural network using training data that includes the second data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data; wherein the domain-specific features are features that characterize the domain to which the training data belongs, and the domain-invariant features are features unrelated to the domain to which the training data belongs.
  • In some implementations, the feature decoupling network includes: a first feature extraction layer for extracting a first feature based on the training data; a first domain-invariant feature extraction layer for extracting a first domain-invariant feature based on the first feature; a first domain-specific feature extraction layer for extracting a first domain-specific feature based on the first feature; a first mutual information loss acquisition layer for obtaining a first mutual information loss based on the first domain-invariant feature and the first domain-specific feature; a feature fusion layer for fusing the first feature and the first domain-invariant feature to obtain a second feature; a second feature extraction layer for extracting a third feature based on the second feature; a second domain-invariant feature decoupling layer for extracting a second domain-invariant feature based on the third feature; a second domain-specific feature extraction layer for extracting a second domain-specific feature based on the third feature; and a second mutual information loss acquisition layer for obtaining a second mutual information loss based on the second domain-invariant feature and the second domain-specific feature.
  • In some implementations, the data processing system further includes: a first domain classifier, configured to perform a classification task based on the first domain-specific feature to obtain a first classification loss;
  • a first gradient reversal layer, configured to invert the gradient information of the first classification loss;
  • a second domain classifier, configured to perform a classification task based on the second domain-specific feature to obtain a second classification loss;
  • and a second gradient reversal layer, configured to invert the gradient information of the second classification loss.
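  • A gradient reversal layer of this kind is commonly realized as a custom autograd function that is the identity in the forward pass and flips the sign of the gradient in the backward pass; the following is an assumed sketch only, not the implementation specified by the application:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the incoming gradient by -lam in the
    backward pass, so the layers before it are trained adversarially against the
    domain classifier behind it."""
    @staticmethod
    def forward(ctx, x, lam=1.0):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)
```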
  • In some implementations, the data processing system further includes: a reconstruction loss acquisition layer, configured to reconstruct the third feature using the second domain-invariant feature and the second domain-specific feature to obtain a reconstructed feature, and to compare the third feature with the reconstructed feature to obtain a reconstruction loss.
  • In some implementations, the first data includes data of a source domain and/or data of a target domain,
  • and the data acquisition network includes: a first training network generated by training on the labeled data of the target domain; and/or a second training network generated by training on the labeled data of the source domain.
  • In another aspect, a security device is provided, including the neural network described in the sixth aspect.
  • In another aspect, a computer-readable storage medium is provided, which stores computer program instructions; when the computer program instructions are executed by a processor, the processor performs the method described in the first aspect or the second aspect.
  • In another aspect, a computer program product is provided, including computer program instructions that, when run by a processor, cause the processor to execute the method described in the first aspect or the second aspect.
  • In an eleventh aspect, a chip is provided, including a processor and a data interface.
  • The processor reads instructions stored in a memory through the data interface and executes the method described in the first aspect or the second aspect.
  • Optionally, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory.
  • The processor is configured to execute the method described in the first aspect or the second aspect.
  • Figure 1 is a schematic diagram of an artificial intelligence main frame.
  • Fig. 2 is a system architecture provided by an embodiment of the application.
  • FIG. 3 is a diagram of the chip hardware structure provided by an embodiment of the application.
  • Fig. 4 is a system architecture provided by an embodiment of the application.
  • FIG. 5 is a schematic flowchart of a neural network training method provided by an embodiment of this application.
  • FIG. 6 is a schematic flowchart of a neural network training method provided by an embodiment of this application.
  • FIG. 7 is a schematic structural diagram of a neural network provided by an embodiment of this application.
  • FIG. 8 is a schematic diagram of the principle of feature decoupling provided by an embodiment of this application.
  • Fig. 9 is a schematic structural diagram of a neural network provided by another embodiment of the application.
  • FIG. 10 is a schematic structural diagram of a neural network provided by an embodiment of this application.
  • FIG. 11 is a schematic diagram of the process of extracting domain invariant features and domain specific features based on the neural network architecture shown in FIG. 10.
  • FIG. 12 is a schematic diagram of the principle of the training process provided by an embodiment of the application.
  • FIG. 13 is a schematic structural diagram of a neural network provided by an embodiment of this application.
  • FIG. 14 is a schematic diagram of a process for obtaining data of an intermediate domain according to an embodiment of the application.
  • FIG. 15 is a schematic structural diagram of a neural network provided by an embodiment of this application.
  • FIG. 16 is a schematic diagram of two-way confrontation training provided by another embodiment of this application.
  • FIG. 17 is a schematic structural diagram of a data processing system provided by an embodiment of this application.
  • FIG. 18 is a schematic structural diagram of a neural network training device provided by an embodiment of the application.
  • FIG. 19 is a schematic structural diagram of a data acquisition device provided by another embodiment of this application.
  • FIG. 20 is a schematic diagram of the hardware structure of a neural network training device provided by an embodiment of the application.
  • Figure 1 is a schematic diagram of an artificial intelligence main frame.
  • the main framework describes the overall workflow of the artificial intelligence system, which is suitable for general artificial intelligence field requirements.
  • Intelligent Information Chain reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensing process of "data-information-knowledge-wisdom".
  • The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of intelligence and information (the technologies for providing and processing information) to the industrial ecology of the system.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the basic platform.
  • The basic platform includes distributed computing frameworks, networks, and related platform guarantees and support, which can include cloud storage and computing, interconnection networks, and so on.
  • sensors communicate with the outside to obtain data, and these data are provided to the smart chip in the distributed computing system provided by the basic platform for calculation.
  • Data at the layer above the infrastructure represents the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, text, and IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, training, etc.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formal information to conduct machine thinking and solving problems based on reasoning control strategies.
  • the typical function is search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, and usually provides functions such as classification, ranking, and prediction.
  • Based on the results of data processing, some general capabilities can be formed, such as an algorithm or a general system, for example translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are an encapsulation of the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent manufacturing, intelligent transportation, smart home, smart healthcare, smart security, autonomous driving, smart city, smart terminal, and so on.
  • A model that performs well in the source domain will suffer performance limitations if it is directly applied to the target domain.
  • In the conventional approach, a distribution alignment strategy is adopted, that is, the data of the source domain and the data of the target domain are aligned at the level of feature representation. Since this distribution alignment is performed only at the level of the overall feature representation, the domain adaptive learning task is inevitably affected by domain-specific features. Therefore, the trained neural network model still suffers from poor migration performance.
  • For this reason, this application proposes a method for training a neural network model, which decouples domain-invariant features (features at the instance level that are unrelated to the domain) from the features of the data during the training process, so that the domain adaptive learning task is not affected by the specific features of different domains, thereby improving the migration performance of the neural network model.
  • The neural network model trained in the embodiments of this application can be applied to various application scenarios, and the neural network model can also have different structures depending on the specific application scenario.
  • For example, in image classification application scenarios (such as vehicle recognition, face recognition, etc.) the neural network model can be a convolutional neural network model, while in regression prediction application scenarios (such as energy consumption prediction for industrial production lines, weather prediction, landslide prediction, etc.) the neural network model can include a multilayer perceptron architecture.
  • The embodiments of this application do not limit the specific application scenario or structure of the trained neural network model.
  • Domain adaptive learning is a machine learning method used to solve the problem of inconsistent probability distributions of training samples and test samples. It aims to overcome the difference between the probability distribution of the source domain samples and that of the target domain samples during training, so as to accomplish the learning task in the target domain.
  • A neural network can be composed of neural units. A neural unit can refer to an operation unit that takes x_s and an intercept of 1 as inputs.
  • The output of this operation unit can be expressed by the following formula (1): h_{W,b}(x) = f(W^T x + b) = f(∑_{s=1}^{n} W_s x_s + b), where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network,
  • can be understood as a neural network with many hidden layers; there is no special metric for "many" here. According to the positions of the different layers, the layers inside a DNN can be divided into three categories: input layer, hidden layers, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.
  • Although a DNN looks complicated, the work of each layer can be expressed by the linear relationship in the following formula (2): y = a(W·x + b), where x is the input vector, y is the output vector, b is the bias vector, W is the weight matrix (the coefficients), and a() is the activation function.
  • For example, the linear coefficient from the 4th neuron in the 2nd layer to the 2nd neuron in the 3rd layer is defined as W^3_{24}:
  • the superscript 3 represents the layer in which the coefficient W is located, and the subscript corresponds to the output index 2 of the 3rd layer and the input index 4 of the 2nd layer.
  • In summary, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as W^L_{jk}. It should be noted that the input layer has no W parameters.
  • more hidden layers make the network more capable of portraying complex situations in the real world. In theory, a model with more parameters is more complex and has a greater "capacity", which means that it can complete more complex learning tasks.
  • the process of training the deep neural network is also the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vector W of many layers).
  • Taking the loss function as an example: the higher the output value (loss) of the loss function, the greater the difference; training the deep neural network then becomes a process of reducing this loss as much as possible.
  • A convolutional neural network can use the backpropagation (BP) algorithm to adjust the values of the parameters of the initial neural network during training, so that the reconstruction error loss of the neural network becomes smaller and smaller. Specifically, forward-propagating the input signal to the output produces an error loss, and the parameters of the initial neural network are updated by backpropagating the error loss information, so that the error loss converges.
  • The backpropagation algorithm is a backpropagation movement dominated by the error loss, aiming to obtain the optimal parameters of the neural network, such as the weight matrices.
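  • As a generic illustration of this training process (the small network and random data below are placeholders, not the networks of this application), one backpropagation update in PyTorch looks like:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(4, 8), torch.tensor([0, 1, 0, 1])
loss = criterion(net(x), y)   # forward pass produces the error loss
optimizer.zero_grad()
loss.backward()               # backpropagate the error loss information
optimizer.step()              # update the parameters so the loss converges
```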
  • An adversarial sample refers to an input sample formed by adding a perturbation to data in the data set, which causes the neural network to give an incorrect output with high confidence. Since the ultimate goal of the neural network is to produce correct outputs, adversarial samples are used to train the neural network under this adversarial training strategy, so that the neural network adapts to the perturbation and thereby becomes robust to adversarial samples.
  • Virtual adversarial training refers to an adversarial training method that does not rely on training labels. Virtual adversarial training generates a perturbation based on a first output of the neural network; this perturbation makes the second output, obtained by feeding the generated adversarial sample into the neural network, differ from the previous first output, thereby realizing the adversarial training strategy.
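  • A simplified sketch of virtual adversarial training (one power-iteration step; the hyper-parameters xi and eps and the KL-divergence measure are common choices assumed here for illustration, not details taken from the application):

```python
import torch
import torch.nn.functional as F

def virtual_adversarial_loss(model, x, xi=1e-6, eps=1.0):
    """Sketch: without using labels, find a small perturbation that changes the model's
    prediction the most, then penalize the change in prediction under that perturbation."""
    with torch.no_grad():
        pred = F.softmax(model(x), dim=1)                 # first output of the network

    d = torch.randn_like(x)
    d = (xi * F.normalize(d.flatten(1), dim=1).view_as(x)).requires_grad_(True)
    adv_dist = F.kl_div(F.log_softmax(model(x + d), dim=1), pred, reduction="batchmean")
    grad = torch.autograd.grad(adv_dist, d)[0]

    r_adv = eps * F.normalize(grad.flatten(1), dim=1).view_as(x)   # adversarial perturbation
    pred_adv = F.log_softmax(model(x + r_adv.detach()), dim=1)     # second output
    return F.kl_div(pred_adv, pred, reduction="batchmean")
```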
  • FIG. 2 is a system architecture 200 provided by an embodiment of the application.
  • the system architecture 200 includes an execution device 210, a training device 220, a database 230, a client device 240, a data storage system 250, and a data collection system 260.
  • the execution device 210 includes a calculation module 211, an I/O interface 212, a preprocessing module 213, and a preprocessing module 214.
  • the calculation module 211 may include the target model/rule 201, and the preprocessing module 213 and the preprocessing module 214 are optional.
  • the data collection device 260 is used to collect training data (or sample data for training) and store it in the database 230.
  • the training data in the embodiment of the present application may include training data in different fields, such as training data in the source domain and the target domain.
  • the training device 220 trains the target model/rule 201 based on the training data maintained in the database 230, so that the target model/rule 201 has the function of decoupling domain invariant features and domain specific features from the input data, and uses the Domain-invariant features can complete tasks required by actual application scenarios, such as the ability to complete tasks such as target classification/detection/recognition/segmentation.
  • the target model/rule 201 may be a neural network model.
  • The work of each layer in the neural network model can be described by the mathematical expression y = a(W·x + b). From a physical perspective, the work of each layer can be understood as completing the transformation from the input space to the output space (that is, from the row space to the column space of the matrix) through five operations on the input space (the set of input vectors): 1. raising/lowering the dimension; 2. enlarging/shrinking; 3. rotation; 4. translation; and 5. "bending". Operations 1, 2 and 3 are completed by W·x, operation 4 is completed by +b, and operation 5 is realized by a().
  • W is a weight vector, and each value in the vector represents the weight value of a neuron in the layer of neural network.
  • This vector W determines the spatial transformation from the input space to the output space described above, that is, the weight W of each layer controls how the space is transformed.
  • The purpose of training the neural network model is to finally obtain the weight matrices of all layers of the trained neural network (the weight matrices formed by the vectors W of many layers). Therefore, the training process of the neural network is essentially a way of learning how to control the space transformation, and more specifically of learning the weight matrices.
  • To make the output of the neural network model as close as possible to the value that is really desired, the current predicted value of the network can be compared with the really desired target value, and the weight vector of each layer of the neural network can then be updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the neural network model). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the neural network can predict the really desired target value. It is therefore necessary to predefine "how to compare the difference between the predicted value and the target value"; this is the loss function or objective function, an important equation used to measure the difference between the predicted value and the target value. Taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference, and training the neural network model becomes a process of reducing this loss as much as possible.
  • the target model/rule obtained by the training device 220 can be applied to different systems or devices.
  • the execution device 210 is configured with an I/O interface 212 to perform data interaction with external devices.
  • the "user" can input data to the I/O interface 212 through the client device 240.
  • the execution device 210 can call data, codes, etc. in the data storage system 250, and can also store data, instructions, etc. in the data storage system 250.
  • the calculation module 211 uses the target model/rule 201 to process the input data.
  • the specific input data of the calculation module 211 is related to the specific application scenario.
  • the input data of the calculation module 211 may be image data including a face image. Since the calculation module 211 uses the target model/rule 201 to process the input data, the calculation module actually obtains instance-level features based on the input data, and then uses the instance-level features to perform specific tasks.
  • the system architecture 200 may also include some management function modules connected to the calculation module 211 to complete more flexible subdivision tasks based on the output result of the calculation module 211.
  • For example, one associated function module may be configured to further identify, based on the vehicle features output by the calculation module 211, information such as the license plate number and model of the vehicle, and another associated function module 214 may be configured to further identify the gender, height and age of a pedestrian based on the pedestrian features output by the calculation module 211.
  • this application does not limit whether the system architecture includes these associated function modules, and the specific functions performed by these associated function modules.
  • the I/O interface 212 returns the processing result to the client device 240 and provides it to the user.
  • the training device 220 can generate corresponding target models/rules 201 based on different data for different targets, so as to provide users with better results.
  • the user can manually specify to input data in the execution device 210, for example, to operate in the interface provided by the I/O interface 212.
  • the client device 240 can automatically input data to the I/O interface 212 and obtain the result. If the client device 240 automatically inputs data and needs the user's authorization, the user can set the corresponding authority in the client device 240.
  • the user can view the result output by the execution device 210 on the client device 240, and the specific presentation form may be a specific manner such as display, sound, and action.
  • the client device 240 can also serve as a data collection terminal to store the collected sample data in the database 230.
  • Figure 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • the data storage system 250 is an external memory relative to the execution device 210. In other cases, the data storage system 250 may also be placed in the execution device 210.
  • FIG. 3 is a diagram of the chip hardware structure provided by an embodiment of the application.
  • the chip includes a neural-network processing unit (NPU) 300.
  • the chip can be set in the execution device 210 as shown in FIG. 2 to complete the calculation work of the calculation module 211.
  • the chip can also be set in the training device 220 shown in FIG. 2 to complete the training work of the training device 220 and output the target model/rule 201.
  • the following neural network training methods shown in FIG. 4, FIG. 9 and FIG. 11 can all be implemented in the chip shown in FIG. 3.
  • the neural network processor 300 is mounted on a main central processing unit (host central processing unit, host CPU) as a coprocessor, and the main CPU distributes tasks.
  • the core part of the neural network processor 300 is the arithmetic circuit 303, and the controller 304 controls the arithmetic circuit 303 to extract data from the memory (weight memory 302 or input memory 301) and perform calculations.
  • the arithmetic circuit 303 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 303 is a two-dimensional systolic array. The arithmetic circuit 303 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 303 is a general-purpose matrix processor.
  • the arithmetic circuit 303 fetches the data corresponding to the weight matrix B from the weight memory 302 and caches it on each PE in the arithmetic circuit 303.
  • the arithmetic circuit 303 fetches the input matrix A and the weight matrix B from the input memory 301 to perform matrix operations to obtain partial results or final results of the matrix, and store them in an accumulator 308.
  • the vector calculation unit 307 can perform further processing on the output of the operation circuit 303, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • The vector calculation unit 307 can be used for network calculations in the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, local response normalization, and so on.
  • the vector calculation unit 307 can store the processed output vector to the unified memory 306.
  • the vector calculation unit 307 may apply a nonlinear function to the output of the arithmetic circuit 303, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 307 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 303, for example for use in a subsequent layer in a neural network.
  • the unified memory 306 is used to store input data and output data.
  • A direct memory access controller (DMAC) 305 transfers the input data in the external memory to the input memory 301 and/or the unified memory 306, stores the weight data in the external memory into the weight memory 302, and stores the data in the unified memory 306 into the external memory.
  • the bus interface unit (BIU) 310 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 309 through the bus.
  • An instruction fetch buffer 309 connected to the controller 304 is used to store instructions used by the controller 304.
  • the controller 304 is used to call the instructions cached in the instruction fetch memory 309 to control the working process of the computing accelerator.
  • the unified memory 306, the input memory 301, the weight memory 302, and the fetch memory 309 are all on-chip memories.
  • the external memory is a memory external to the NPU.
  • The external memory can be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
  • FIG. 4 is a system architecture 400 provided by an embodiment of this application.
  • the execution device 410 is implemented by one or more servers set in the cloud.
  • The servers can also cooperate with other computing devices, such as data storage devices, routers, load balancers and the like; the execution device 410 can be arranged on one physical site or distributed across multiple physical sites.
  • The execution device 410 may use the data in the data storage system 420, or call the program code in the data storage system 420, to implement the neural network training method provided by the embodiments of this application; specifically, the execution device 410 can train the neural network according to the training data in the data storage system 420 using the method provided in the embodiments of this application, and complete the corresponding intelligent task according to requests from the local device 401/402.
  • Alternatively, the execution device 410 may not itself have the function of training a neural network, but a neural network trained according to the neural network training method provided by the embodiments of this application can be deployed on it to complete the corresponding intelligent task; specifically, after the execution device 410 is configured with a neural network trained by the method provided in the embodiments of this application, it can complete the corresponding intelligent task upon receiving a request from the local device 401/402 and feed the result back to the local device 401/402.
  • the user can operate respective user devices (for example, the local device 401 and the local device 402) to interact with the execution device 410.
  • Each local device can represent any computing device, such as personal computers, computer workstations, smart phones, tablets, smart cameras, smart cars or other types of cellular phones, media consumption devices, wearable devices, set-top boxes, game consoles, etc.
  • the local device may be a security device, such as a surveillance camera device, a smoke alarm device, or a fire extinguishing device.
  • the local device of each user can interact with the execution device 410 through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • one or more aspects of the execution device 410 may be implemented by each local device.
  • the local device 401 may provide the execution device 410 with local data or feed back calculation results.
  • all the functions of the foregoing execution device 410 may also be implemented by a local device.
  • the local device 401 executes the neural network training method provided in the embodiments of the present application, and uses the trained neural network to provide services to users.
  • FIG. 5 is a schematic flowchart of a neural network training method provided by an embodiment of the application.
  • the training method of the neural network shown in FIG. 5 can be executed by the training device 220 shown in FIG. 2, and the target model/rule 201 trained by the training device 220 is the neural network.
  • the neural network training method includes the following steps:
  • Step 501 Obtain training data.
  • the training data is the input data of the training process.
  • the training data can be collected by the user, or an existing training database can be used. It should be understood that the training data may have different formats and forms according to different requirements of actual scenarios. For example, in a target detection or target recognition scenario, the training data may be image data. In the scenario of regression prediction, the training data can be the collected historical housing price data.
  • the training data input to the neural network may include training data of different domains.
  • different domains can include target domains and source domains.
  • the training data may include the training data of the source domain and the training data of the target domain.
  • the difference in domains may be embodied as the difference in scenarios.
  • the training data of the source domain may be a large number of traffic scene images in a sunny scene
  • the training data of the target domain may be a large number of traffic scene images in a foggy scene.
  • the training data of the source domain and the training data of the target domain may also reflect this domain difference in other aspects.
  • For example, the training data of the source domain may be the energy consumption data of a production line collected last year,
  • and the training data of the target domain may be the energy consumption data of the production line collected this year.
  • In this case, the domain difference is reflected in the inconsistent value distributions of the energy consumption data caused by the change in time.
  • Step 502 Use the training data to train the neural network so that the neural network learns to decompose domain invariant features and domain specific features from the training data.
  • the neural network can use any of the methods of supervised learning, semi-supervised learning or unsupervised learning to learn from the training data.
  • For example, the training data can include source-domain training data with sufficient labels and target-domain training data with a small number of labels, in which case the neural network can learn from the training data using semi-supervised learning;
  • alternatively, the training data can include source-domain training data with sufficient labels and target-domain training data without labels, in which case the neural network can learn from the training data using unsupervised learning.
  • A domain-invariant representation (domain-invariant feature) is a feature that is unrelated to the domain to which the training data belongs, and is a feature that does not change with domain differences. Domain-invariant features can sometimes be referred to as task-related instance-level features.
  • For example, the domain difference between a traffic scene image taken on a sunny day and a traffic scene image taken on a foggy day is reflected in the image differences caused by the change in weather, while the characteristics of the vehicles in the traffic scene do not change with the weather.
  • The target object (i.e. the instance) of the target detection task is the vehicle in the image, so the vehicle features are the domain-invariant features to be extracted.
  • A neural network trained in this way can accurately extract the vehicle features to complete the target detection task.
  • A domain-specific representation (domain-specific feature) is a feature that characterizes the domain to which the training data belongs. It is a feature unique to that domain and changes with domain differences; at the same time, domain-specific features are irrelevant to the instance and to the goal of the task in the actual task execution process. For example, in the aforementioned vehicle detection application scenario, the features of the surroundings of the vehicle in the traffic scene image (trees, sky, street scene, etc.) are not related to the vehicle features, because recognizing or detecting the vehicle does not require knowledge of the surrounding environment, and the feature information of the surrounding environment (such as the sky) changes with the domain difference (the weather change).
  • the embodiment of the present application decomposes the domain invariant feature and the domain specific feature from the training data, so that the domain invariant feature can be decoupled from the domain specific feature. Since the neural network obtained by the training method of the present application uses domain invariant features to perform tasks, the influence of domain-specific features on the neural network is avoided, thereby improving the migration performance of the neural network between different domains.
  • FIG. 6 is a schematic flowchart of a neural network training method provided by an embodiment of the application.
  • Fig. 7 is a schematic diagram of the structure of the neural network trained in Fig. 6.
  • the training method of the neural network shown in FIG. 6 can be executed by the training device 220 shown in FIG. 2, and the target model/rule 201 trained by the training device 220 is the neural network.
  • the training method of the neural network includes the following steps:
  • Step 601 Decompose domain-invariant features and domain-specific features from the training data.
  • The processes of extracting the domain-invariant feature (DIR) and the domain-specific feature (DSR) can be completed, respectively, by the domain-invariant feature extractor E_DIR and the domain-specific feature extractor E_DSR in the neural network.
  • the goal of the task is to detect objects (including people and vehicles) in the image data.
  • the training data of the source domain on the left side of Fig. 8 is a photo image
  • the training data of the target domain on the right side is a cartoon image.
  • the domain invariant features extracted from the training data of the source domain are the characters and vehicles in the photo image
  • the domain invariant features extracted from the training data of the target domain are the characters and vehicles in the cartoon image.
  • The line C_1 represents the classification boundary between the domain-invariant features of the persons and the domain-invariant features of the vehicles in the domain-invariant space.
  • The domain-specific features extracted from the training data of the source domain are the features of the photo image other than the persons and vehicles,
  • and the domain-specific features extracted from the training data of the target domain are the features of the cartoon image other than the persons and vehicles.
  • The line C_2 represents the distribution boundary between the domain-specific features from the source domain and the domain-specific features from the target domain in the domain-specific space.
  • Step 602 Use the domain-invariant feature to perform the task to obtain the task loss, and calculate the mutual information loss between the domain-invariant feature and the domain-specific feature.
  • The mutual information loss is used to represent the difference between the domain-invariant feature and the domain-specific feature.
  • Domain-invariant features are used to characterize feature information at the instance level. Therefore, performing the task with the domain-invariant features and obtaining the task loss can improve the accuracy and completeness with which the domain-invariant features characterize the task-related instances.
  • The task loss is used to characterize the gap between the result of performing the task with the domain-invariant feature and the task label. For example, when the domain-invariant feature is used to perform a target detection task, the task result can include the attribute features of the detected target object, and the task label corresponds to the standard attribute features of the target object to which the domain-invariant feature actually corresponds; in this way, the difference between the detected attribute features and the standard attribute features can be characterized by the task loss.
  • Mutual information (MI) characterizes the interdependence between two variables.
  • The mutual information I of two random variables X and Z can be defined by the following formula (3): I(X; Z) = H(X) - H(X|Z), where H(X) is the marginal entropy of X and H(X|Z) is the conditional entropy of X given Z.
  • The mutual information loss is used to represent the difference between domain-invariant features and domain-specific features. Calculating the mutual information loss between the domain-invariant features and the domain-specific features and training the neural network based on it helps to further distinguish the domain-invariant features from the domain-specific features, thereby forcing the decoupling of the two. It should be understood that the calculation method of the mutual information loss can be selected according to the requirements of the actual scenario; for example, a mutual information neural estimator (MINE) can be used to obtain the mutual information loss. This application does not strictly limit the specific calculation method of the mutual information loss.
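  • As a minimal sketch of such a MINE-style estimate (the statistics network architecture and the use of shuffled pairs to sample the product of marginals are standard MINE choices, assumed here for illustration rather than taken from the application):

```python
import torch
import torch.nn as nn

class MINE(nn.Module):
    """Statistics network T(x, z); the Donsker-Varadhan bound
    I(X; Z) >= E_joint[T] - log(E_marginal[exp(T)]) is used as the MI estimate."""
    def __init__(self, dim_dir, dim_dsr, hidden=128):
        super().__init__()
        self.T = nn.Sequential(nn.Linear(dim_dir + dim_dsr, hidden), nn.ReLU(),
                               nn.Linear(hidden, 1))

    def forward(self, dir_feat, dsr_feat):
        joint = self.T(torch.cat([dir_feat, dsr_feat], dim=1))
        shuffled = dsr_feat[torch.randperm(dsr_feat.size(0))]        # break the pairing
        marginal = self.T(torch.cat([dir_feat, shuffled], dim=1))
        # Estimated mutual information, used as the mutual information loss to minimize.
        return joint.mean() - torch.log(torch.exp(marginal).mean() + 1e-8)
```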
  • Step 603 Train the neural network according to the task loss and the mutual information loss.
  • the training process of the neural network is actually the process of adjusting the weight vector according to the value of the loss function.
  • the task loss here characterizes the ability to complete the task based on the domain invariant features extracted from the training data. If the domain invariant feature cannot correspond to the instance accurately enough, the value of the task loss will be relatively large; at this time, the weight vector in the neural network needs to be adjusted so that a lower task loss can be obtained for the domain invariant feature in the next prediction pass.
  • As training iterates, the domain invariant features extracted by the domain invariant feature extractor will correspond to the instances more and more accurately.
  • the process of training the neural network based on the mutual information loss may be a process of training the neural network to reduce the mutual information loss between the domain invariant feature and the domain specific feature, for example, to minimize the mutual information loss.
  • the mutual information loss between the domain invariant feature and the domain specific feature can be calculated, and the mutual information loss can be used to further improve the accuracy of the domain invariant feature extraction.
  • the mutual information loss characterizes the correlation between domain invariant features and domain-specific features.
  • Therefore, adjusting the weight vector of the neural network according to the mutual information loss can make the extracted domain invariant features better distinguished from the domain-specific features, which plays a role in forcing the decoupling of features. If the mutual information loss is large, it means that the current domain-invariant features and domain-specific features are still relatively correlated, that is, the features extracted by the current domain-invariant feature extractor may still include domain-specific information; at this time, the weight vector of the neural network needs to be adjusted to reduce the mutual information loss.
  • Considering that, at the start of training, the features extracted by the domain-specific feature extractor may still have some relevance to the instance, the training process based on the mutual information loss can also be regarded as a process of "removing" domain-specific information from the domain invariant features, so that the features extracted by the domain invariant feature extractor become more and more consistent with the instance as training iterates, while the features extracted by the domain-specific feature extractor become more and more irrelevant to the instance, thereby realizing the decoupling of domain-invariant features and domain-specific features.
  • the domain-specific feature extractor is also trained in this training process based on the mutual information loss.
  • the foregoing training process based on the task loss and the training process based on the mutual information loss are not necessarily performed at the same time.
  • the training process based on the mutual information loss may also be performed after the training process based on the task loss starts; this application does not strictly limit the specific execution sequence of the two training processes.
  • This application trains the neural network based on the task loss and the mutual information loss, which not only makes the decomposed domain invariant features correspond to the instance more accurately, but also reduces the correlation between the domain invariant features and the domain specific features during training, so as to promote the complete decoupling of domain-invariant features and domain-specific features and further reduce the influence of domain-specific features on domain-invariant features.
  • one or more combinations of the following loss information between domain invariant features and domain specific features can be calculated: mutual information loss, metric loss (for example, L1 distance or L2 distance), a loss measuring the difference in data distribution (such as KL (Kullback-Leibler) divergence), and the Wasserstein distance.
  • This application does not strictly limit the form of loss information used to characterize the correlation between domain invariant features and domain specific features.
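  • As an illustrative sketch only, the alternative correlation measures mentioned above could be computed roughly as follows, assuming the two feature tensors have the same shape; normalizing the features into distributions for the KL term is an assumption made here for illustration.

```python
import torch.nn.functional as F

def correlation_losses(dir_feat, dsr_feat):
    l1 = (dir_feat - dsr_feat).abs().mean()       # L1 metric loss
    l2 = (dir_feat - dsr_feat).pow(2).mean()      # L2 metric loss
    # KL divergence between the two features viewed as softmax-normalized distributions
    kl = F.kl_div(F.log_softmax(dir_feat, dim=1), F.softmax(dsr_feat, dim=1),
                  reduction="batchmean")
    return l1, l2, kl
```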
  • the neural network can be used for domain adaptive learning, and the training data can come from image data of different domains (for example, different styles), such as photorealistic style, comic style, etc.
  • the domain-invariant features can be decoupled from the domain-specific features. Since the domain invariant features are used to perform tasks, the neural network obtained by the training method of the present application can adapt, through domain adaptive learning, to various image processing tasks in different fields, such as target detection/recognition/segmentation, so as to achieve adaptive processing of image data from different fields.
  • the domain-specific feature extractor will also be trained in the training process based on the mutual information loss. Considering that, when the extraction accuracy of the domain-specific feature extractor for domain-specific features is improved, the training process based on the mutual information loss can distinguish domain-specific features from domain invariant features more effectively, the extraction accuracy of domain invariant features will be further improved indirectly. Therefore, it is worthwhile to further improve the extraction accuracy of domain-specific features, so as to indirectly improve the extraction accuracy of the domain invariant feature extractor through the training process based on the mutual information loss.
  • the domain-specific features extracted by the domain-specific feature extractor may be subjected to domain classification to obtain the domain classification loss, and then the neural network can be trained according to the task loss, mutual information loss, and domain classification loss.
  • the domain-specific feature extractor can be connected to a domain classifier, and a gradient reversal layer (GRL) can be set between the feature extractor and the domain classifier.
  • the domain-specific features extracted by the domain-specific feature extractor are input into the domain classifier to distinguish whether the domain-specific features are really domain-specific features to obtain the domain classification loss.
  • the domain classification loss actually reflects the accuracy of the extraction result of the domain-specific feature extractor; during back propagation, the domain classification loss passes through the gradient reversal layer before reaching the domain specific feature extractor, so that the gradient direction of the domain classification loss is automatically reversed in order to "confuse" the domain specific feature extractor.
  • the goal of the domain classifier is actually to confuse the domain-specific feature extractor, while the goal of the domain-specific feature extractor is to ensure that the extracted features are indeed domain-specific features; through this adversarial strategy between the domain classifier and the domain-specific feature extractor, the accuracy with which the domain-specific feature extractor extracts domain-specific features is finally improved.
  • This application introduces the domain classification loss, which indirectly helps to extract the domain invariant features from the features of the training data more accurately.
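  • A commonly used implementation of a gradient reversal layer is an autograd function that is the identity in the forward pass and negates (and scales) the gradient in the backward pass; the sketch below follows that standard pattern, with layer sizes chosen arbitrarily, and is not a verbatim description of this application's network.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

class DomainClassifierWithGRL(nn.Module):
    """Domain classifier preceded by a gradient reversal layer (GRL)."""
    def __init__(self, feat_dim, num_domains=2, lamb=1.0):
        super().__init__()
        self.lamb = lamb
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_domains),
        )

    def forward(self, dsr_feat):
        reversed_feat = GradReverse.apply(dsr_feat, self.lamb)
        return self.classifier(reversed_feat)  # logits used for the domain classification loss
```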
  • In order to further promote that the decoupled domain-invariant features and domain-specific features contain all the feature information of the training data, so as to improve the completeness and rationality of the feature decoupling, the initial features can first be extracted from the training data, the initial features can be decomposed into domain-invariant features and domain-specific features, and the neural network can then be trained to reduce the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
  • Specifically, the initial features can be reconstructed using the domain-invariant features and the domain-specific features to obtain reconstructed features, and the initial features can then be compared with the reconstructed features to determine the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features, that is, the reconstruction loss; the reconstruction loss is then used to train the neural network, so that the domain invariant features extracted by the domain invariant feature extractor and the domain-specific features extracted by the domain-specific feature extractor can better cover the feature information of the training data.
  • This application reduces the difference between the information contained in the initial features and the information jointly contained in the domain invariant features and domain specific features, so that the decoupled domain invariant features and domain specific features can contain all the feature information of the training data, thereby improving the completeness and rationality of the feature decoupling.
  • FIG. 10 is a schematic structural diagram of a neural network provided by an embodiment of this application.
  • the neural network includes a first decoupler U1 and a second decoupler U2, and the extraction of the domain invariant features and the domain specific features is completed through the joint action of the first decoupler U1 and the second decoupler U2.
  • FIG. 11 is a schematic diagram of the extraction process of domain invariant features and domain specific features based on the neural network architecture shown in FIG. 10 according to an embodiment of the application. As shown in Figure 11, the extraction process of the domain invariant features and domain specific features may include the following steps:
  • Step 1101 Extract the first feature of the training data from the training data.
  • the neural network includes a feature extractor, which is used to extract the first feature from the training data; the first feature is the feature basis for the subsequent domain invariant feature enhancement.
  • the qualifier "first" in the first feature means that it is the result of "preliminary" feature extraction performed by this feature extractor on the training data. For example, when the training data is image data, the first feature is the result of feature extraction at the image texture level.
  • Step 1102 Use the first decoupler U1 to extract preliminary domain invariant features and preliminary domain specific features from the first feature.
  • the first decoupler U1 includes a domain invariant feature extractor and a domain specific feature extractor, which are respectively used to extract the preliminary domain invariant features and the preliminary domain-specific features from the first feature.
  • the respective extraction processes can be expressed by formula (4).
  • the first decoupler U1 can be trained using mutual information loss to ensure the extraction accuracy of the preliminary domain invariant features and the preliminary domain specific features.
  • the mutual information (MI) loss characterizes the interdependence of two variables
  • the mutual information loss here characterizes the difference between the preliminary domain invariant features and the preliminary domain-specific features.
  • Therefore, adjusting the weight vector of the network structure in the first decoupler U1 according to the mutual information loss can make the extracted preliminary domain invariant features better distinguished from the preliminary domain-specific features, playing a role in forcing the decoupling of features.
  • If the mutual information loss is large, the weight vector of the network structure of the first decoupler U1 needs to be adjusted to reduce the mutual information loss.
  • In the first decoupler U1, the domain classifier and the gradient reversal layer (GRL) can also be used to improve the extraction accuracy of the preliminary domain-specific features.
  • Step 1103 The preliminary domain invariant feature is merged with the first feature to obtain the second feature.
  • the fusion process that produces the second feature F_1 can be expressed by formula (5).
  • the specific method of feature fusion can be selected according to the requirements of actual application scenarios. For example, the preliminary domain invariant feature can be superimposed on the first feature while keeping the number of channels unchanged, forming a second feature with the same number of channels; the preliminary domain invariant feature can also be "spliced" with the first feature in a concatenated manner, forming a second feature with an increased number of channels. This application does not strictly limit the specific implementation of the fusion process.
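  • Both fusion options can be sketched in a few lines, assuming the preliminary domain invariant feature has the same spatial size and channel count as the first feature (NCHW layout); this is only an illustration of the two options mentioned above, not the specific fusion prescribed here.

```python
import torch

def fuse_by_addition(first_feature, preliminary_dir):
    # Superimpose: element-wise addition keeps the number of channels unchanged.
    return first_feature + preliminary_dir

def fuse_by_concatenation(first_feature, preliminary_dir):
    # Splice: concatenation along the channel dimension increases the number of channels.
    return torch.cat([first_feature, preliminary_dir], dim=1)
```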
  • Step 1104 Extract the third feature of the training data from the second feature.
  • the neural network also includes a second feature extractor, which is used to extract the third feature from the second feature F_1; the third feature serves as the basis for the extraction of the subsequent domain invariant features and domain-specific features. It should be understood that the qualifier "third" in the third feature means that the third feature is extracted based on the second feature, which includes the first feature, so the extraction process is more refined; for example, when the training data is image data, the third feature may be an extracted feature map that represents the semantic level of the image.
  • the feature extraction process can be expressed by formula (6).
  • Step 1105 Use the second decoupler U2 to extract domain-invariant features and domain-specific features from the third feature.
  • the second decoupler U2 includes a domain invariant feature extractor and a domain specific feature extractor, which are respectively used to extract the domain invariant features and the domain-specific features from the third feature.
  • the respective extraction processes can be expressed by formula (7).
  • the domain invariant features are used to perform tasks to obtain the task loss, and the mutual information (MI) loss between the domain-invariant features and the domain-specific features is calculated.
  • using the domain invariant features to perform tasks and obtain the task loss can improve the accuracy and completeness with which the domain invariant features characterize the instances related to the task.
  • In order to ensure that the domain invariant features correspond to the instances more accurately, the mutual information loss between the domain invariant features and the domain-specific features can also be calculated during training, and the mutual information loss can be used to further improve the accuracy of domain invariant feature extraction.
  • In the training process of the neural network based on the task loss and the mutual information loss, the domain invariant feature extractor used to extract the preliminary domain invariant features and/or the domain specific feature extractor used to extract the preliminary domain-specific features in the first decoupler U1 can also participate in the parameter tuning process, so as to ensure the extraction accuracy of the first decoupler U1 for the preliminary domain invariant features, thereby further improving the domain invariant feature enhancement effect realized by the first decoupler U1.
  • the domain classifier and the gradient reversal layer can also be used in the second decoupler U2: through the adversarial strategy between the domain classifier and the domain specific feature extractor, the extraction accuracy of the domain specific feature extractor for the domain-specific features is improved, which, combined with the training process based on the mutual information loss, indirectly improves the extraction accuracy of the domain invariant features.
  • In order to further promote that the decoupled domain invariant features and domain-specific features contain all the feature information of the training data, so as to improve the completeness and rationality of the feature decoupling, the neural network can be trained to reduce the difference between the information contained in the third feature and the information jointly contained in the domain invariant features and the domain-specific features.
  • After the domain invariant features and the domain-specific features are extracted, they can be used to reconstruct the third feature; the difference between the information contained in the third feature and the information jointly contained in the two is the reconstruction loss.
  • the calculation process of the reconstruction loss can be expressed by formula (8), where R represents the reconstruction network, F_r is the reconstructed feature obtained by feeding the domain invariant features and the domain-specific features into R, and L_recon is the reconstruction loss, reflected as the L2 distance between the reconstructed feature F_r and the third feature.
  • the reconstruction loss is used to train the neural network so that the domain invariant features and the domain-specific features can better cover the feature information of the training data.
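  • A rough sketch of the reconstruction loss could look as follows; the structure of the reconstruction network R (here, two convolutions over concatenated feature maps) and the exact form of the L2 term are assumptions for illustration, not the specific design of this application.

```python
import torch
import torch.nn as nn

class ReconstructionNetwork(nn.Module):
    """Hypothetical reconstruction network R mapping (DIR, DSR) back to the feature space."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, dir_feat, dsr_feat):
        return self.net(torch.cat([dir_feat, dsr_feat], dim=1))

def reconstruction_loss(recon_net, dir_feat, dsr_feat, third_feature):
    f_r = recon_net(dir_feat, dsr_feat)          # reconstructed feature F_r
    return ((f_r - third_feature) ** 2).mean()   # mean squared L2 distance to the third feature
```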
  • the concept of "two-layer domain invariant feature decoupling” can be used to train a neural network to extract domain invariant features.
  • the preliminary domain invariant feature is merged with the first feature to obtain the second feature, so that the domain invariant feature information is enhanced at the first feature level.
  • the second feature is then used to decouple the domain invariant features through the second decoupler U2, which further enhances the decoupling accuracy of the domain invariant features, making the task execution performance of the trained neural network stronger and its domain adaptation ability better.
  • the training process related to the neural network is described in detail.
  • the training process of the neural network can include: (1) training related to the task loss and the domain classification loss; (2) training related to the mutual information loss; (3) training related to the reconstruction loss.
  • the above-mentioned three kinds of training can be carried out at the same time or carried out in stages, which is not limited in the embodiment of the present application.
  • the training sequence of the above three types of training will be illustrated with examples in conjunction with FIG. 12.
  • the training process of the neural network can be divided into the following three stages in sequence.
  • In the first stage, the neural network is controlled to perform training related to the task loss and the domain classification loss.
  • This training stage aims to allow the neural network to learn the ability to decompose domain-invariant features and domain-specific features from training data; therefore, the first stage can also be called the feature decomposition stage (referred to as stage-fd, where fd stands for feature decomposition).
  • In the second stage, the neural network is controlled to perform training related to the mutual information loss.
  • This training stage aims to allow the neural network to learn the ability to increase the difference between domain-invariant features and domain-specific features; therefore, the second stage can also be called the feature separation stage (referred to as stage-fs, where fs stands for feature separation).
  • In the third stage, the neural network is controlled to perform training related to the reconstruction loss.
  • This training stage aims to make the domain-invariant features and domain-specific features decomposed by the neural network contain as much of the information in the initial features as possible; therefore, the third stage can also be called the feature reconstruction stage (referred to as stage-fr, where fr stands for feature reconstruction).
  • Carrying out the training process of the neural network in stages can simplify the amount of training in each stage and speed up the convergence speed of the parameters of the neural network.
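  • A schematic three-stage training loop might look like the sketch below; the model is assumed to return its individual loss terms in a dictionary, and the stage lengths and loss names are placeholders rather than values taken from this application.

```python
def train_in_stages(model, data_loader, optimizer, epochs_per_stage=(10, 10, 10)):
    stages = [
        ("stage-fd", lambda out: out["task_loss"] + out["domain_cls_loss"]),
        ("stage-fs", lambda out: out["mutual_info_loss"]),
        ("stage-fr", lambda out: out["reconstruction_loss"]),
    ]
    for (name, stage_loss), epochs in zip(stages, epochs_per_stage):
        for _ in range(epochs):
            for batch in data_loader:
                out = model(batch)          # assumed to return a dict of loss terms
                loss = stage_loss(out)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```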
  • FIG. 13 is a schematic structural diagram of a neural network provided by an embodiment of this application.
  • the neural network is trained by using the training method provided in the above-mentioned embodiment of the present application.
  • the neural network 130 includes:
  • the first feature extraction layer 1301 is used to extract first features based on input data.
  • the first domain invariant feature decoupling layer 1302 is used to extract the first domain invariant feature based on the first feature.
  • the feature fusion layer 1303 is used to fuse the first feature and the invariant feature of the first domain to obtain the second feature.
  • the second feature extraction layer 1304 is used to extract a third feature based on the second feature.
  • the second domain invariant feature decoupling layer 1305 is used to extract the second domain invariant feature based on the third feature.
  • the first domain invariant feature and the second domain invariant feature are respectively features that are irrelevant to the field to which the input data belongs, and the first domain specific feature and the second domain specific feature are respectively features that characterize the field to which the input data belongs.
  • Although the neural network has the ability to decompose domain invariant features and domain specific features, the trained neural network shown in Figure 13 does not actually need to extract domain-specific features when performing tasks.
  • the first feature is extracted by the first feature extraction layer 1301;
  • the first domain invariant feature is then extracted based on the first feature, and domain invariant feature enhancement is realized by fusing it with the first feature;
  • the second domain invariant feature is further extracted on the basis of the second feature, and the extracted second domain invariant feature can accurately correspond to the instances, so that the neural network has stronger performance when performing specific tasks and better domain adaptability.
  • domain adaptive learning actually solves the cross-domain migration capability of neural networks.
  • To improve the domain generalization ability of the neural network, it is necessary to train not only on the feature information of the source domain but also on the feature information of the target domain. Therefore, when training the neural network, training data of the intermediate domain between the source domain and the target domain can be added. By generating training data located in the intermediate domain, the "domain gap" between the source domain and the target domain is filled, and the problem of a large distribution difference between the training data of the source domain and the training data of the target domain is alleviated.
  • FIG. 14 is a schematic diagram of a process for obtaining data of an intermediate domain according to an embodiment of the application.
  • FIG. 15 is a schematic diagram of a principle for obtaining data of an intermediate domain provided by an embodiment of this application. As shown in FIG. 14 and FIG. 15, the process of obtaining data of the intermediate domain may include the following steps:
  • Step 1401 Obtain data of the source domain and/or data of the target domain.
  • the source domain and the target domain are two domains with differences in data characteristics, and the difference in data characteristics between the intermediate domain and any one of the source domain and the target domain is smaller than the difference in data characteristics between the source domain and the target domain.
  • the data of the intermediate domain is actually generated by adding disturbances on the basis of the data of the source domain and/or the data of the target domain. Therefore, the data of the source domain and/or the data of the target domain must be obtained first.
  • Step 1402 Input the data of the source domain and/or the data of the target domain into the neural network for training, so as to obtain gradient information of the loss function.
  • Since the data to be generated belongs to the intermediate domain between the source domain and the target domain, it is necessary to obtain the gradient information of the loss function to guide the subsequent perturbation process that generates the intermediate domain data.
  • Step 1403 Perturb the data of the source domain and/or the data of the target domain according to the gradient information to obtain the data of the intermediate domain.
  • Perturb the data in the source domain or the data in the target domain to generate new data, and these newly generated data can be used as the data in the intermediate domain.
  • the gradient information introduces directional information between the source domain and the target domain, which makes the perturbation of the data more targeted.
  • the data of the intermediate domain obtained through the perturbation can fill the "domain gap" between the source domain and the target domain and alleviate the problem of a large distribution difference between the data of the source domain and the data of the target domain.
  • the data of the source domain, the data of the target domain, and the data of the intermediate domain can be used as training data to train the neural network, so that the trained neural network can have better domain adaptability.
  • the labeled data X s of the source domain can be input to the neural network TNet for training to obtain the gradient information of the loss function.
  • the neural network TNet is generated by training based on the labeled data X_l of the target domain, and may include a feature extractor F_T and a classifier C_T.
  • During training, the feature information P_T extracted by the feature extractor F_T is input into the classifier C_T to obtain the cross-entropy loss L_ce of the classification task, which guides the parameter tuning process of TNet.
  • Since the neural network TNet calculates the task loss and adjusts its network parameters based on the input X_l, TNet is actually better suited to the target domain; in this way, inputting X_s into the neural network TNet will generate the first gradient information, which points from the source domain to the target domain.
  • X_s is regarded as an object that can be optimized, and a gradient perturbation of a certain magnitude is superimposed on X_s according to the first gradient information back-propagated from the task loss;
  • the new samples obtained after superimposing this perturbation, which point from the source domain toward the target domain, can be used as intermediate domain data, as shown by AAT in Figure 15.
  • Since the neural network TNet is generated by training on the labeled data of the target domain, the first gradient information obtained after inputting the labeled data of the source domain into the neural network is a good measure of the direction from the source domain to the target domain.
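  • The source-to-target perturbation could be sketched roughly as below, where tnet is assumed to be a classifier already trained on labeled target-domain data; the sign and magnitude of the gradient step are design choices not fixed by the text, so the values used here are only illustrative.

```python
import torch
import torch.nn.functional as F

def perturb_source_towards_target(tnet, x_s, y_s, epsilon=0.01):
    x_adv = x_s.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(tnet(x_adv), y_s)   # task loss under the target-adapted network TNet
    loss.backward()
    # Step along the back-propagated gradient direction; the perturbed samples lie between
    # the source domain and the target domain and can serve as intermediate-domain data.
    intermediate = x_adv - epsilon * x_adv.grad.sign()
    return intermediate.detach()
```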
  • unlabeled data X_u of the target domain can be input into the neural network HNet; since X_u is unlabeled, virtual adversarial training can be used to obtain the gradient information.
  • the neural network HNet may be generated by training based on the labeled data X s of the source domain.
  • HNet may include a feature extractor F_H and a classifier C_H. During training, the feature information P_H extracted by the feature extractor F_H is input into the classifier C_H to obtain the cross-entropy loss L_ce of the classification task, which is used to guide the parameter tuning process of HNet.
  • X s is input to HNet to calculate the task loss, and the network parameters of HNet are updated according to the task loss.
  • the labeled data X l of the target domain can also be used to train the neural network HNet together with the labeled data X s of the source domain to further improve the accuracy of the neural network HNet executing tasks.
  • the virtual adversarial training method is used to generate a predicted virtual label for X_u;
  • the task loss is calculated based on the virtual label;
  • the second gradient information back-propagated from the task loss is generated on X_u, and a perturbation is superimposed on X_u accordingly;
  • the new samples obtained after superimposing this perturbation, which point from the target domain toward the source domain, can be used as intermediate domain data, as shown by E-VAT in Figure 15.
  • Since the neural network HNet is generated from the labeled data of the source domain and the labeled data of the target domain, the back-propagated second gradient information obtained after the unlabeled data of the target domain is input into the neural network and virtual adversarial training is performed is a good measure of the direction from the target domain to the source domain.
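  • A simplified virtual adversarial perturbation for unlabeled target samples (a single power-iteration step) might look as follows; hnet is assumed to be a classifier trained on labeled source-domain (and optionally target-domain) data, and the step sizes are illustrative rather than values specified by this application.

```python
import torch
import torch.nn.functional as F

def vat_perturbation(hnet, x_u, xi=1e-6, epsilon=0.01):
    with torch.no_grad():
        virtual_label = F.softmax(hnet(x_u), dim=1)          # predicted "virtual" label for X_u

    dims = tuple(range(1, x_u.dim()))
    d = torch.randn_like(x_u)
    d = xi * d / d.norm(p=2, dim=dims, keepdim=True)         # small random probe direction
    d.requires_grad_(True)

    adv_loss = F.kl_div(F.log_softmax(hnet(x_u + d), dim=1), virtual_label,
                        reduction="batchmean")
    adv_loss.backward()

    # The gradient of the divergence gives the (target-to-source) perturbation direction.
    r_adv = epsilon * d.grad / d.grad.norm(p=2, dim=dims, keepdim=True)
    return (x_u + r_adv).detach()                            # candidate intermediate-domain data
```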
  • In addition, the labeled data X_l of the target domain may also be input into an auxiliary neural network to obtain the gradient information of the loss function.
  • the auxiliary neural network is generated based on the labeled data X_s of the source domain. Since the auxiliary neural network calculates the task loss and adjusts its network parameters based on the input X_s, it is actually better suited to the source domain; inputting X_l into the auxiliary neural network will therefore generate gradient information pointing from the target domain to the source domain. At this time, X_l is regarded as an object that can be optimized, and a gradient perturbation of a certain magnitude is superimposed on X_l according to the gradient information back-propagated from the task loss. The new samples obtained after superimposing this perturbation, which point from the target domain toward the source domain, can also be used as intermediate domain data.
  • the embodiment shown in Figure 15 actually proposes a "two-way adversarial training" method to generate data of the intermediate domain, that is, the gradient information of the network is used to guide the perturbation direction of the sample, and the sample generated after superimposing the perturbation is used as the data of the intermediate domain.
  • In Figure 16, circles and triangles represent different sample categories.
  • the gradient information can be used to obtain the perturbation direction from the source domain to the target domain (as shown by the arrow from left to right in Figure 16), and a perturbation is then added to the data of the source domain to generate data of the intermediate domain; at the same time, the gradient information can also be used to obtain the perturbation direction from the target domain to the source domain (as shown by the arrow from right to left in Figure 16), and a perturbation is then added to the data of the target domain to generate data of the intermediate domain.
  • the auxiliary network obtained through training can give the gradient direction from the source domain to the target domain or from the target domain to the source domain, and this gradient direction is used to perturb the data of the source domain or the data of the target domain to generate adversarial samples; virtual adversarial training can also be used to generate adversarial samples from the target domain toward the source domain, so that adversarial samples are generated in both directions within the "domain gap" between the source domain and the target domain to construct the intermediate domain.
  • the acquired data of the intermediate domain, together with the data of the source domain and the data of the target domain, can be input into the neural network shown in FIG. 9, and the neural network can be trained with the training method provided by the embodiment of this application, so as to realize the combination of "two-way adversarial training" and "two-layer domain invariant feature decoupling".
  • Since the training data used for feature decoupling includes the data of the intermediate domain, the data of the source domain and the data of the target domain are effectively supplemented and the difference between the source domain and the target domain is reduced; using the data of the intermediate domain as training data for feature decoupling can greatly improve the domain invariant feature decoupling ability, so that the domain generalization performance and cross-domain migration ability of the trained neural network are more significantly improved.
  • After the neural network HNet is generated by training based on the labeled data X_s of the source domain, random noise perturbations can also be generated near X_s, and these noise perturbations are correspondingly superimposed on X_s to generate adversarial samples in the neighborhood; the adversarial samples of the neighborhood are also input into the neural network for training as part of the training data.
  • the adversarial samples in the neighborhood can be input into HNet; the feature map extracted by the feature extractor F_H of HNet from these neighborhood adversarial samples is input into the classifier C_H to obtain the cross-entropy loss L_at of the classification task, which guides the adjustment of HNet's network parameters so that HNet can be further trained.
  • random noise disturbances can also be generated near X l , and these noise disturbances are correspondingly superimposed on X l to supplement the adversarial samples in the neighborhood.
  • the embodiment of the present application can also generate adversarial samples in the neighborhood based on the data of the source domain and the target domain, so as to effectively supplement the data of the source domain and the target domain and reduce the difference between the source domain and the target domain, so that the domain generalization performance and cross-domain migration ability of the trained neural network are further improved.
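  • Generating a neighborhood adversarial sample and using it to further train HNet can be sketched as follows; the noise scale is arbitrary, and the cross-entropy term merely stands in for the L_at loss described above.

```python
import torch
import torch.nn.functional as F

def neighborhood_adversarial_step(hnet, x, y, noise_scale=0.05):
    # Superimpose random noise near x to obtain a neighborhood adversarial sample.
    x_neigh = x + noise_scale * torch.randn_like(x)
    l_at = F.cross_entropy(hnet(x_neigh), y)   # loss used to further train HNet
    return x_neigh, l_at
```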
  • FIG. 17 is a schematic structural diagram of a data processing system provided by an embodiment of this application. As shown in FIG. 17, the data processing system 170 is used to train a neural network, and includes: a data acquisition network 1701 and a feature decoupling network 1702.
  • the data acquisition network 1701 is used to acquire the gradient information of the loss function based on the first data, and perturb the input data according to the gradient information to acquire the second data.
  • the second data is an adversarial sample that fills the "domain gap" of the first data, so that the training process can have better domain adaptability.
  • the feature decoupling network 1702 is used to train a neural network according to the training data including the second data, so that the neural network learns to decompose domain invariant features and domain specific features from the training data.
  • the feature decoupling network 1702 includes: a first feature extraction layer 17021, used to extract a first feature based on the training data; a first domain invariant feature extraction layer 17022, used to extract a first domain invariant feature based on the first feature; a first domain specific feature extraction layer 17023, used to extract a first domain specific feature based on the first feature; a first mutual information loss acquisition layer 17024, used to obtain a first mutual information loss based on the first domain invariant feature and the first domain specific feature; a feature fusion layer 17025, used to fuse the first feature and the first domain invariant feature to obtain a second feature; a second feature extraction layer 17026, used to extract a third feature based on the second feature; a second domain invariant feature decoupling layer 17027, used to extract a second domain invariant feature based on the third feature; a second domain specific feature extraction layer 17028, used to extract a second domain specific feature based on the third feature; and a second mutual information loss acquisition layer 17029, used to acquire a second mutual information loss based on the second domain invariant feature and the second domain specific feature.
  • the data processing system 170 may further include: a first domain classifier 17031, configured to perform a classification task based on the first domain specific feature to obtain a first classification loss; and a first gradient reversal layer 17032, configured to reverse the gradient information of the first classification loss.
  • the data processing system 170 may further include: a second domain classifier 17033, configured to perform a classification task based on the second domain specific feature to obtain a second classification loss; and a second gradient reversal layer 17034, configured to reverse the gradient information of the second classification loss.
  • the data processing system 170 may further include: a reconstruction loss acquisition layer 17035, used to reconstruct the third feature using the second domain invariant feature and the second domain specific feature to obtain a reconstructed feature, and to compare the third feature with the reconstructed feature to obtain the reconstruction loss.
  • the first data includes data of the source domain and/or data of the target domain.
  • the data acquisition network 1701 includes: a first training network generated by training based on the labeled data of the target domain; and/or a second training network generated by training based on the labeled data of the source domain.
  • the first training network or the second training network may include a feature extractor and a classifier. In the training process, the feature information extracted by the feature extractor is input into the classifier to obtain the cross-entropy loss of the classification task to guide the parameter adjustment process of the first training network or the second training network.
  • the data processing system 170 shown in FIG. 17 realizes the combination of "adversarial training to fill the domain gap" and "two-layer domain invariant feature decoupling". Since the training data used for feature decoupling includes data that can fill the domain gap of the first data, the original training data is effectively supplemented and the difference between training data from different fields is reduced; using the data output by the data acquisition network for feature-decoupling training can greatly improve the domain invariant feature decoupling ability, so that the domain generalization performance and cross-domain migration ability of the trained neural network are more significantly improved.
  • FIG. 18 is a schematic structural diagram of a neural network training device provided by an embodiment of the application. As shown in Fig. 18, the neural network training device 180 includes:
  • the obtaining module 1801 is configured to obtain training data
  • the training module 1802 is configured to use training data to train the neural network, so that the neural network learns to decompose domain invariant features and domain specific features from the training data.
  • the neural network training device 180 provided by the embodiment of this application decomposes domain invariant features and domain specific features from the training data. Since the neural network obtained by the training method of this application uses domain invariant features to perform tasks, the influence of domain-specific features on the neural network is avoided, and the migration performance of the neural network between different fields is improved.
  • the training module 1802 is configured to decompose domain-invariant features and domain-specific features from the training data; use the domain-invariant features to perform tasks, obtain the task loss, and calculate the mutual information loss between the domain-invariant features and the domain-specific features, where the mutual information loss is used to represent the difference between the domain-invariant features and the domain-specific features; and train the neural network according to the task loss and the mutual information loss.
  • the training module 1802 is further configured to perform domain classification using domain-specific features to obtain domain classification loss; and train a neural network based on task loss, mutual information loss, and domain classification loss.
  • the training module 1802 is further configured to extract initial features from the training data; decompose the initial features into domain-invariant features and domain-specific features; and train the neural network to reduce the difference between the information contained in the initial features and the information jointly contained in the domain invariant features and the domain specific features.
  • the training module 1802 is configured to reconstruct the initial features using the domain-invariant features and the domain-specific features to obtain reconstructed features; and compare the initial features with the reconstructed features to determine the difference between the information contained in the initial features and the information jointly contained in the domain invariant features and the domain specific features.
  • the training module 1802 is further configured to reconstruct the initial features using the domain invariant features and the domain specific features to obtain reconstructed features, where the domain invariant features and the domain specific features are features decomposed from the initial features; and compare the initial features with the reconstructed features to obtain the reconstruction loss.
  • the reconstruction loss is used to characterize the difference between the information contained in the initial feature and the information contained in the domain invariant feature and the domain specific feature.
  • the training module is configured to train the neural network in the first stage according to the task loss; train the neural network in the second stage according to the mutual information loss; and train the neural network in the third stage according to the reconstruction loss.
  • the neural network includes a first decoupler and a second decoupler
  • the training module 1802 is configured to extract the first feature of the training data from the training data; use the first decoupler to extract the preliminary domain invariant features and preliminary domain specific features from the first feature; fuse the preliminary domain invariant features with the first feature to obtain the second feature; extract the third feature of the training data from the second feature; and use the second decoupler to extract domain-invariant features and domain-specific features from the third feature.
  • the training module 1802 is further configured to train a neural network to reduce the difference between the information contained in the third feature and the information jointly contained in the domain invariant feature and the domain specific feature.
  • FIG. 19 is a schematic structural diagram of a data acquisition device provided by an embodiment of this application. As shown in FIG. 19, the data acquisition device 190 includes:
  • the data acquisition module 1901 is configured to acquire the data of the source domain and/or the data of the target domain; wherein the source domain and the target domain are two domains with different data characteristics, and the difference in data characteristics between the intermediate domain and either the source domain or the target domain is smaller than the difference in data characteristics between the source domain and the target domain;
  • the gradient information acquisition module 1902 is configured to input data of the source domain and/or data of the target domain into the neural network for training, so as to acquire gradient information of the loss function;
  • the intermediate domain data generating module 1903 is configured to perturb the data of the source domain and/or the data of the target domain according to the gradient information to obtain the data of the intermediate domain.
  • the gradient information acquisition module 1902 is configured to input the labeled data of the source domain into the first neural network for training to obtain the first gradient information, where the first neural network is generated by training based on the labeled data of the target domain.
  • the gradient information acquisition module 1902 is configured to input unlabeled data of the target domain into the second neural network, and perform training in a virtual adversarial training manner to obtain the second gradient information, where the second neural network is generated by training based on labeled data of the source domain.
  • each module in the neural network training device 180 and the data acquisition device 190 has been described in detail in the corresponding methods above; therefore, repeated descriptions are omitted here.
  • FIG. 20 is a schematic diagram of the hardware structure of a neural network training device provided by an embodiment of the application.
  • the neural network training device 2000 shown in FIG. 20 includes a memory 2001, a processor 2002, a communication interface 2003, and a bus 2004.
  • the memory 2001, the processor 2002, and the communication interface 2003 realize the communication connection between each other through the bus 2004.
  • the memory 2001 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 2001 may store a program.
  • the processor 2002 and the communication interface 2003 are used to execute each step of the neural network training method of the embodiment of the present application.
  • the processor 2002 may adopt a general-purpose central processing unit (central processing unit, CPU), a microprocessor, an application specific integrated circuit (application specific integrated circuit, ASIC), a graphics processing unit (GPU), or one or more integrated circuits, which are used to execute related programs to realize the functions required by the units in the neural network training device of the embodiment of the present application, or to execute the neural network training method of the method embodiment of the present application.
  • the processor 2002 may also be an integrated circuit chip with signal processing capability. In the implementation process, each step of the neural network training method of the present application can be completed by the integrated logic circuit of hardware in the processor 2002 or instructions in the form of software.
  • the aforementioned processor 2002 may also be a general-purpose processor, a digital signal processor (digital signal processing, DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 2001, and the processor 2002 reads the information in the memory 2001 and, in combination with its hardware, completes the functions required by the units included in the neural network training device of the embodiment of the present application, or executes the neural network training method of the method embodiment of the present application.
  • the communication interface 2003 uses a transceiver device such as but not limited to a transceiver to implement communication between the device 2000 and other devices or a communication network.
  • the training data can be obtained through the communication interface 2003.
  • the bus 2004 may include a path for transferring information between various components of the device 2000 (for example, the memory 2001, the processor 2002, and the communication interface 2003).
  • the acquisition module 1801 and the training module 1802 in the neural network training device 180 can be equivalent to the processor 2002.
  • Although the device 2000 shown in FIG. 20 only shows a memory, a processor, and a communication interface, in the specific implementation process those skilled in the art should understand that the device 2000 also includes other devices necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the device 2000 may also include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the device 2000 may also include only the components necessary to implement the embodiments of the present application, and does not necessarily include all the components shown in FIG. 20.
  • the apparatus 2000 is equivalent to the training device 220 in FIG. 2.
  • a person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered as going beyond the scope of this application.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • the technical solution of this application essentially or the part that contributes to the existing technology or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory, a random access memory, a magnetic disk, an optical disc, and other media that can store program code.

Abstract

The present application discloses a neural network training method and device, and a data acquisition method and device in the field of artificial intelligence. The neural network training method comprises: acquiring training data; and training the neural network by using the training data, so that the neural network learns to decompose domain-invariant representation and domain-specific representation from the training data. By decomposing the domain-invariant representation and the domain-specific representation from the training data, the domain-invariant representation can be decoupled from the domain-specific representation, wherein the domain-specific representation refers to features characterizing a domain to which the training data belongs, and the domain-invariant representation refers to features irrelevant to the domain to which the training data belongs. As the neural network trained by the method of the present application uses domain-invariant representation obtained by feature decoupling to execute a task, the influence of domain-specific representation on the neural network is avoided, thereby improving the migration performance of the neural network between different domains.

Description

Neural network training method, data acquisition method and device
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on June 24, 2020, with application number 202010594053.6 and the invention title "Neural Network Training Method, Data Acquisition Method and Device", the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the field of artificial intelligence, in particular to a neural network training method, data acquisition method and device.
Background
Artificial intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence. Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theories.
例如,在计算机视觉相关的应用场景中,以机器学习方式训练的神经网络可用于完成目标分类/检测/识别/分割/预测等多种任务。在很多应用场景下,训练样本和测试样本很可能来自不同的域,这会为神经网络的实际应用带来问题。例如,在车辆检测的应用场景下,源域数据可能是晴天拍摄的交通场景图像,而目标域数据却是雾天拍摄的交通场景图像。此时使用源域数据训练得到的目标检测模型就很难在目标域数据场景下取得好的效果。为了解决这种由于训练样本和测试样本之间的域偏差所带来的模型应用问题,域自适应(domain adaptation,DA)学习作为机器学习的重要研究领域在近几年受到了广泛的关注。For example, in computer vision-related application scenarios, neural networks trained by machine learning can be used to complete multiple tasks such as target classification/detection/recognition/segmentation/prediction. In many application scenarios, training samples and test samples are likely to come from different domains, which will cause problems for the practical application of neural networks. For example, in the application scenario of vehicle detection, the source domain data may be a traffic scene image taken on a sunny day, while the target domain data may be a traffic scene image taken on a foggy day. At this time, the target detection model trained with source domain data is difficult to achieve good results in the target domain data scenario. In order to solve this model application problem caused by the domain deviation between training samples and test samples, domain adaptation (DA) learning as an important research field of machine learning has received extensive attention in recent years.
域自适应学习通常使用分布对齐的方法来对齐源域和目标域的数据之间的概率分布,以缓解域偏差对域自适应学习任务带来的不利影响。由于这种分布对齐的过程只是在整体特征表示层面进行的,使得域自适应学习任务不可避免受到不同领域的特定特征的影响,因此,训练出的神经网络仍然存在迁移性能差的问题。Domain adaptive learning usually uses a distribution alignment method to align the probability distribution between the source domain and target domain data, so as to alleviate the adverse effects of domain deviation on the domain adaptive learning task. Since this distribution alignment process is only performed at the overall feature representation level, the domain adaptive learning task is inevitably affected by specific features in different fields. Therefore, the trained neural network still has the problem of poor migration performance.
Summary of the invention
This application provides a neural network training method, a data acquisition method, and corresponding devices, which can better improve the migration performance of a neural network between different domains.
According to a first aspect, a neural network training method is provided, including: obtaining training data; and training a neural network by using the training data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data, where the domain-specific features are features that characterize the domain to which the training data belongs, and the domain-invariant features are features that are irrelevant to the domain to which the training data belongs.
By decomposing domain-invariant features and domain-specific features from the training data, the domain-invariant features can be decoupled from the domain-specific features. Because the neural network obtained by the training method of this application uses the domain-invariant features to perform a task, the influence of the domain-specific features on the neural network is avoided, thereby improving the migration performance of the neural network between different domains.
With reference to the first aspect, in a possible implementation, the training of the neural network by using the training data includes: decomposing domain-invariant features and domain-specific features from the training data; performing a task by using the domain-invariant features to obtain a task loss, and calculating a mutual information loss between the domain-invariant features and the domain-specific features, where the task loss characterizes the gap between the result obtained by performing the task with the domain-invariant features and the task label, and the mutual information loss represents the difference between the domain-invariant features and the domain-specific features; and training the neural network according to the task loss and the mutual information loss.
By training the neural network according to the task loss and the mutual information loss, not only can the decomposed domain-invariant features correspond to instances more accurately, but the mutual information loss between the domain-invariant features and the domain-specific features can also be reduced during training, which promotes complete decoupling of the domain-invariant features and the domain-specific features and further reduces the influence of the domain-specific features on the domain-invariant features.
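A minimal PyTorch-style sketch of how such a training step might combine a task loss with a mutual information penalty is given below. The module and variable names (MINEEstimator, decoupler, task_head, z_di, z_ds, lam) are illustrative assumptions rather than the names used in this application, and the mutual information term is approximated here with a MINE-style Donsker-Varadhan lower bound, which is only one possible estimator; in full MINE the statistics network is additionally trained to maximize the bound, a detail omitted here for brevity.

    import math
    import torch
    import torch.nn as nn

    class MINEEstimator(nn.Module):
        # Statistics network giving a Donsker-Varadhan lower bound on I(a; b).
        def __init__(self, dim_a, dim_b, hidden=128):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim_a + dim_b, hidden),
                                     nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, a, b):
            joint = self.net(torch.cat([a, b], dim=1)).mean()
            b_perm = b[torch.randperm(b.size(0))]            # break the pairing
            marginal = torch.logsumexp(self.net(torch.cat([a, b_perm], dim=1)),
                                       dim=0) - math.log(b.size(0))
            return (joint - marginal).squeeze()              # estimated I(a; b)

    def training_step(decoupler, task_head, mi_estimator, optimizer, x, y, lam=0.1):
        # decoupler(x) is assumed to return (domain_invariant, domain_specific).
        z_di, z_ds = decoupler(x)
        task_loss = nn.functional.cross_entropy(task_head(z_di), y)
        mi_loss = mi_estimator(z_di, z_ds)   # minimized so the two parts share little information
        loss = task_loss + lam * mi_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return task_loss.item(), mi_loss.item()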
With reference to the first aspect, in a possible implementation, the method further includes: performing domain classification by using the domain-specific features to obtain a domain classification loss, where the training of the neural network according to the task loss and the mutual information loss includes: training the neural network according to the task loss, the mutual information loss, and the domain classification loss.
Introducing the domain classification loss helps to extract the domain-invariant features from the features of the training data.
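One common way to wire a domain classifier into such training is with a gradient reversal layer, which the data processing system described later in this application also uses. The sketch below is an illustrative PyTorch implementation of that mechanism, not the exact implementation of this application; the network sizes and the lambda coefficient are arbitrary assumptions.

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        # Identity in the forward pass; multiplies the gradient by -lambda in the
        # backward pass, so upstream layers are trained adversarially against
        # the domain classifier.
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lam * grad_output, None

    class DomainClassifier(nn.Module):
        def __init__(self, feat_dim, n_domains=2, lam=1.0):
            super().__init__()
            self.lam = lam
            self.head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                      nn.Linear(128, n_domains))

        def forward(self, features, reverse=False):
            if reverse:
                features = GradReverse.apply(features, self.lam)
            return self.head(features)

    # Example use: classify the domain from the domain-specific features and
    # obtain the domain classification loss.
    # logits = domain_classifier(z_ds, reverse=True)
    # domain_loss = nn.functional.cross_entropy(logits, domain_labels)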
With reference to the first aspect, in a possible implementation, the decomposing of domain-invariant features and domain-specific features from the training data includes: extracting initial features from the training data; and decomposing the initial features into the domain-invariant features and the domain-specific features, where the method further includes: training the neural network to reduce the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
By reducing the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features, the decoupled domain-invariant features and domain-specific features can contain all the feature information of the training data, which improves the completeness and rationality of the feature decoupling.
With reference to the first aspect, in a possible implementation, before the training of the neural network to reduce the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features, the method further includes: reconstructing the initial features by using the domain-invariant features and the domain-specific features to obtain reconstructed features; and comparing the initial features with the reconstructed features to determine the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
Using this reconstruction loss to train the neural network enables the decoupled domain-invariant features and domain-specific features to contain all the feature information of the training data, which improves the completeness and rationality of the feature decoupling.
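A minimal sketch of such a reconstruction constraint is shown below. The decoder architecture, the use of feature concatenation, and the L2 distance are illustrative assumptions rather than details taken from this application.

    import torch
    import torch.nn as nn

    class FeatureReconstructor(nn.Module):
        # Rebuilds the initial feature from the concatenation of the
        # domain-invariant part and the domain-specific part.
        def __init__(self, feat_dim):
            super().__init__()
            self.decoder = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim),
                                         nn.ReLU(), nn.Linear(feat_dim, feat_dim))

        def forward(self, z_di, z_ds):
            return self.decoder(torch.cat([z_di, z_ds], dim=1))

    def reconstruction_loss(reconstructor, initial_feat, z_di, z_ds):
        recon = reconstructor(z_di, z_ds)
        # Penalize any information in the initial feature that is not captured
        # jointly by the two decoupled parts.
        return nn.functional.mse_loss(recon, initial_feat)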
With reference to the first aspect, in a possible implementation, the method further includes: reconstructing initial features by using the domain-invariant features and the domain-specific features to obtain reconstructed features, where the domain-invariant features and the domain-specific features are features decomposed from the initial features; and comparing the initial features with the reconstructed features to obtain a reconstruction loss, where the reconstruction loss characterizes the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features. The training of the neural network according to the task loss and the mutual information loss includes: performing a first stage of training of the neural network according to the task loss; and performing a second stage of training of the neural network according to the mutual information loss, where the method further includes: performing a third stage of training of the neural network according to the reconstruction loss.
Performing the training process of the neural network in stages simplifies the amount of training in each stage and speeds up the convergence of the parameters of the neural network.
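An illustrative way to organize such staged training is sketched below. The stage ordering within each iteration, the loss weights, and the helper methods decouple, task_loss, mi_loss, and reconstruction_loss on the model are assumptions made only for illustration.

    def train_in_stages(model, optimizer, loader, epochs=10, lam_mi=0.1, lam_rec=0.1):
        # Each iteration runs three short stages on the same batch:
        # stage 1 uses the task loss, stage 2 the mutual information loss,
        # and stage 3 the reconstruction loss.
        for _ in range(epochs):
            for x, y in loader:
                for stage in ("task", "mi", "rec"):
                    z_di, z_ds, initial_feat = model.decouple(x)
                    if stage == "task":
                        loss = model.task_loss(z_di, y)
                    elif stage == "mi":
                        loss = lam_mi * model.mi_loss(z_di, z_ds)
                    else:
                        loss = lam_rec * model.reconstruction_loss(initial_feat, z_di, z_ds)
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()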
With reference to the first aspect, in a possible implementation, the neural network includes a first decoupler and a second decoupler, and the decomposing of domain-invariant features and domain-specific features from the training data includes: extracting a first feature of the training data from the training data; extracting a preliminary domain-invariant feature and a preliminary domain-specific feature from the first feature by using the first decoupler; fusing the preliminary domain-invariant feature with the first feature to obtain a second feature; extracting a third feature of the training data from the second feature; and extracting the domain-invariant features and the domain-specific features from the third feature by using the second decoupler.
By first obtaining the first feature, decoupling a preliminary domain-invariant feature from it with the first decoupler, and fusing the preliminary domain-invariant feature with the first feature to obtain the second feature, the domain-invariant feature information is enhanced at the level of the first feature. The second feature is then used to decouple the domain-invariant features with the second decoupler, so that the decoupling accuracy of the domain-invariant features is further improved, which gives the trained neural network stronger task execution performance and better domain adaptation capability.
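The two-decoupler pipeline can be sketched as follows. All sub-modules are placeholders, and the additive fusion of the first feature with the preliminary domain-invariant feature is an assumption; the application itself does not specify the fusion operation here.

    import torch.nn as nn

    class TwoStageDecouplingNet(nn.Module):
        # Illustrative two-decoupler pipeline: extract -> decouple -> fuse ->
        # extract again -> decouple again.
        def __init__(self, backbone1, decoupler1, backbone2, decoupler2):
            super().__init__()
            self.backbone1, self.decoupler1 = backbone1, decoupler1
            self.backbone2, self.decoupler2 = backbone2, decoupler2

        def forward(self, x):
            feat1 = self.backbone1(x)                # first feature
            pre_di, pre_ds = self.decoupler1(feat1)  # preliminary decoupling
            feat2 = feat1 + pre_di                   # fusion (addition assumed)
            feat3 = self.backbone2(feat2)            # third feature
            z_di, z_ds = self.decoupler2(feat3)      # final decoupling
            return z_di, z_ds, feat3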
With reference to the first aspect, in a possible implementation, the method further includes: training the neural network to reduce the difference between the information contained in the third feature and the information jointly contained in the domain-invariant features and the domain-specific features.
By reducing the difference between the information contained in the third feature and the information jointly contained in the domain-invariant features and the domain-specific features, the decoupled domain-invariant features and domain-specific features are further encouraged to contain all the feature information of the training data, which improves the completeness and rationality of the feature decoupling.
With reference to the first aspect, in a possible implementation, the neural network is used for domain adaptation learning, and the training data includes image data of different domains.
By extracting the domain-invariant features and the domain-specific features of image data of different domains, and training the neural network based on the task loss obtained by performing the task with the domain-invariant features, the domain-invariant features can be decoupled from the domain-specific features. Because the domain-invariant features are used to perform the task, the neural network obtained by the training method of this application can adapt itself, through domain adaptation learning, to processing tasks on images of multiple different domains, thereby realizing adaptive processing of image data of different domains.
According to a second aspect, a data acquisition method is provided, including: obtaining data of a source domain and/or data of a target domain; inputting the data of the source domain and/or the data of the target domain into a neural network for training, to obtain gradient information of a loss function; and perturbing the data of the source domain and/or the data of the target domain according to the gradient information to obtain data of an intermediate domain, where the source domain and the target domain are two domains whose data characteristics differ, and the difference in data characteristics between the intermediate domain and either of the source domain and the target domain is smaller than the difference in data characteristics between the source domain and the target domain.
The introduction of the direction information between the source domain and the target domain makes the perturbation of the training data more targeted. The intermediate-domain training data obtained through the perturbation can fill the "domain gap" between the source domain and the target domain, alleviating the problem that the distributions of the source-domain training data and the target-domain training data differ greatly.
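A simplified sketch of gradient-based perturbation of this kind is given below. The descent direction (reducing the loss of a model trained on the other domain), the signed-gradient step, and the step size eps are illustrative assumptions, not values or choices stated in this application.

    import torch
    import torch.nn as nn

    def perturb_towards_other_domain(other_domain_model, x, y, eps=0.03):
        # Take labeled samples of one domain and a model trained on the other
        # domain, and nudge the samples along the gradient of that model's loss
        # so that the perturbed samples lie between the two domains.
        x = x.clone().detach().requires_grad_(True)
        loss = nn.functional.cross_entropy(other_domain_model(x), y)
        grad = torch.autograd.grad(loss, x)[0]
        x_intermediate = (x - eps * grad.sign()).detach()   # one small signed step
        return x_intermediate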
With reference to the second aspect, in a possible implementation, the inputting of the data of the source domain and/or the data of the target domain into a neural network for training to obtain gradient information of a loss function includes: inputting labeled data of the source domain into a first neural network for training to obtain first gradient information, where the first neural network is generated by training based on labeled data of the target domain.
Because the first neural network is generated by training on labeled data of the target domain, the first gradient information obtained after the labeled data of the source domain is input into the first neural network is a good measure of the direction from the source domain to the target domain.
With reference to the second aspect, in a possible implementation, the inputting of the data of the source domain and/or the data of the target domain into a neural network for training to obtain gradient information of a loss function includes: inputting unlabeled data of the target domain into a second neural network and training it in a virtual adversarial training manner to obtain second gradient information, where the second neural network is generated by training based on labeled data.
Because the second neural network is generated by training on labeled data of the source domain, the second gradient information obtained through virtual adversarial training after the unlabeled data of the target domain is input into the second neural network is a good measure of the direction from the target domain to the source domain.
According to a third aspect, a neural network training apparatus is provided, including modules configured to perform the method of the first aspect.
According to a fourth aspect, a data acquisition apparatus is provided, including modules configured to perform the method described in the second aspect.
According to a fifth aspect, a neural network training apparatus is provided, including: a memory, configured to store a program; and a processor, configured to execute the program stored in the memory, where when the program stored in the memory is executed, the processor is configured to perform the method described in the first aspect or the second aspect.
According to a sixth aspect, a neural network is provided, including: a first feature extraction layer, configured to extract a first feature based on input data; a first domain-invariant feature decoupling layer, configured to extract a first domain-invariant feature based on the first feature; a feature fusion layer, configured to fuse the first feature and the first domain-invariant feature to obtain a second feature; a second feature extraction layer, configured to extract a third feature based on the second feature; and a second domain-invariant feature decoupling layer, configured to extract a second domain-invariant feature based on the third feature, where the first domain-invariant feature and the second domain-invariant feature are features irrelevant to the domain to which the input data belongs, and the first domain-specific feature and the second domain-specific feature are features that characterize the domain to which the input data belongs.
According to a seventh aspect, a data processing system is provided, including: a data acquisition network, configured to obtain gradient information of a loss function based on first data and perturb the first data according to the gradient information to obtain second data; and a feature decoupling network, configured to train a neural network by using training data that includes the second data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data, where the domain-specific features are features that characterize the domain to which the training data belongs, and the domain-invariant features are features irrelevant to the domain to which the training data belongs.
With reference to the seventh aspect, in a possible implementation, the feature decoupling network includes: a first feature extraction layer, configured to extract a first feature based on the training data; a first domain-invariant feature extraction layer, configured to extract a first domain-invariant feature based on the first feature; a first domain-specific feature extraction layer, configured to extract a first domain-specific feature based on the first feature; a first mutual information loss acquisition layer, configured to obtain a first mutual information loss based on the first domain-invariant feature and the first domain-specific feature; a feature fusion layer, configured to fuse the first feature and the first domain-invariant feature to obtain a second feature; a second feature extraction layer, configured to extract a third feature based on the second feature; a second domain-invariant feature decoupling layer, configured to extract a second domain-invariant feature based on the third feature; a second domain-specific feature extraction layer, configured to extract a second domain-specific feature based on the third feature; a second mutual information loss acquisition layer, configured to obtain a second mutual information loss based on the second domain-invariant feature and the second domain-specific feature; and a task loss acquisition layer, configured to perform a task by using the second domain-invariant feature to obtain a task loss.
With reference to the seventh aspect, in a possible implementation, the data processing system further includes: a first domain classifier, configured to perform a classification task based on the first domain-specific feature to obtain a first classification loss, and a first gradient reversal layer, configured to invert the gradient information of the first classification loss; and/or a second domain classifier, configured to perform a classification task based on the second domain-specific feature to obtain a second classification loss, and a second gradient reversal layer, configured to invert the gradient information of the second classification loss.
With reference to the seventh aspect, in a possible implementation, the data processing system further includes a reconstruction loss acquisition layer, configured to reconstruct the third feature by using the second domain-invariant feature and the second domain-specific feature to obtain a reconstructed feature, and to compare the third feature with the reconstructed feature to obtain a reconstruction loss.
With reference to the seventh aspect, in a possible implementation, the first data includes data of a source domain and/or data of a target domain, and the data acquisition network includes: a first training network generated by training based on labeled data of the target domain; and/or a second training network generated by training based on labeled data.
According to an eighth aspect, a security device is provided, including the neural network described in the sixth aspect.
According to a ninth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores computer program instructions, and when the computer program instructions are run by a processor, the processor is caused to perform the method described in the first aspect or the second aspect.
According to a tenth aspect, a computer program product is provided, including computer program instructions that, when run by a processor, cause the processor to perform the method described in the first aspect or the second aspect.
According to an eleventh aspect, a chip is provided. The chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to perform the method described in the first aspect or the second aspect.
Optionally, as an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory. When the instructions are executed, the processor is configured to perform the method described in the first aspect or the second aspect.
Description of the drawings
Fig. 1 is a schematic diagram of an artificial intelligence main framework.
Fig. 2 is a system architecture provided by an embodiment of this application.
Fig. 3 is a diagram of a chip hardware structure provided by an embodiment of this application.
Fig. 4 is a system architecture provided by an embodiment of this application.
Fig. 5 is a schematic flowchart of a neural network training method provided by an embodiment of this application.
Fig. 6 is a schematic flowchart of a neural network training method provided by an embodiment of this application.
Fig. 7 is a schematic structural diagram of a neural network provided by an embodiment of this application.
Fig. 8 is a schematic diagram of the feature decoupling principle provided by an embodiment of this application.
Fig. 9 is a schematic structural diagram of a neural network provided by another embodiment of this application.
Fig. 10 is a schematic structural diagram of a neural network provided by an embodiment of this application.
Fig. 11 is a schematic flowchart of extracting domain-invariant features and domain-specific features based on the neural network architecture shown in Fig. 10.
Fig. 12 is a schematic diagram of the principle of a training process provided by an embodiment of this application.
Fig. 13 is a schematic structural diagram of a neural network provided by an embodiment of this application.
Fig. 14 is a schematic flowchart of obtaining data of an intermediate domain provided by an embodiment of this application.
Fig. 15 is a schematic structural diagram of a neural network provided by an embodiment of this application.
Fig. 16 is a schematic diagram of bidirectional adversarial training provided by another embodiment of this application.
Fig. 17 is a schematic structural diagram of a data processing system provided by an embodiment of this application.
Fig. 18 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of this application.
Fig. 19 is a schematic structural diagram of a data acquisition apparatus provided by another embodiment of this application.
Fig. 20 is a schematic diagram of the hardware structure of a neural network training apparatus provided by an embodiment of this application.
Detailed description
The technical solutions in this application are described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an artificial intelligence main framework. This main framework describes the overall workflow of an artificial intelligence system and is applicable to general requirements in the field of artificial intelligence.
The above artificial intelligence framework is described below from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
The "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a condensation process of "data - information - knowledge - wisdom".
The "IT value chain", from the underlying infrastructure of artificial intelligence and information (technologies for providing and processing information) up to the industrial ecology of the system, reflects the value that artificial intelligence brings to the information technology industry.
(1) Infrastructure
The infrastructure provides computing capability support for the artificial intelligence system, implements communication with the external world, and provides support through a basic platform. The infrastructure communicates with the outside through sensors; the computing capability is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); and the basic platform includes related platform assurance and support such as a distributed computing framework and a network, and may include cloud storage and computing, an interconnection network, and the like. For example, the sensors communicate with the outside to obtain data, and the data is provided to intelligent chips in the distributed computing system provided by the basic platform for computation.
(2) Data
Data at the layer above the infrastructure is used to represent the data sources in the field of artificial intelligence. The data involves graphics, images, speech, and text, and also involves Internet-of-Things data of conventional devices, including service data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and the like.
Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to perform machine thinking and problem solving according to a reasoning control strategy; typical functions are searching and matching.
Decision-making refers to the process of making decisions after reasoning on intelligent information, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data processing mentioned above is performed on the data, some general capabilities can be further formed based on the data processing results, for example, an algorithm or a general system such as translation, text analysis, computer vision processing, speech recognition, or image recognition.
(5) Intelligent products and industry applications
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields, and are an encapsulation of the overall artificial intelligence solution, productizing intelligent information decision-making and realizing its application. The application fields mainly include intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, autonomous driving, smart cities, intelligent terminals, and the like.
As described above, for a domain adaptation learning task, because of the distribution difference between the source domain and the target domain, a model that performs well on the source domain suffers limited performance if it is directly applied in the target-domain scenario. When a neural network model is trained for domain adaptation learning, a distribution alignment strategy is used, that is, the data of the source domain and the data of the target domain are aligned at the level of feature representation. Because this distribution alignment is performed only at the level of the overall feature representation, the domain adaptation learning task is inevitably affected by the specific features of different domains, and therefore the trained neural network model still suffers from poor migration performance.
In view of the foregoing technical problem, this application proposes a way of training a neural network model that can, during training, decouple domain-invariant features (domain-invariant features can be understood as instance-level features that are independent of the domain) from the features of the data, so that the domain adaptation learning task is not affected by the specific features of different domains, thereby improving the migration performance of the neural network model.
It should be understood that the neural network model trained in the embodiments of this application can be applied in a variety of application scenarios, and depending on the specific application scenario, the neural network model may also have a different structure. For example, in image classification application scenarios (such as vehicle recognition and face recognition), the neural network model may be a convolutional neural network model, whereas in regression prediction application scenarios (such as energy consumption prediction for industrial production lines, weather prediction, and landslide prediction), the neural network model may include a multilayer perceptron architecture. The embodiments of this application do not limit the specific application scenario or structure of the trained neural network model.
Because the embodiments of this application involve applications of domain adaptation learning and neural networks, for ease of understanding, related terms and concepts such as neural networks that may be involved in the embodiments of this application are briefly introduced below.
(1) Domain adaptation learning
Domain adaptation learning is a machine learning approach for solving the problem of inconsistent probability distributions of training samples and test samples. It aims to overcome the difference between the probability distribution of source-domain samples and the probability distribution of target-domain samples during training, so as to accomplish the learning task on the target domain.
(2) Neural network
A neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes x_s and an intercept of 1 as inputs. The output of the arithmetic unit can be expressed by the following formula (1):
h_{W,b}(x) = f(W^T x) = f( \sum_{s=1}^{n} W_s x_s + b )    (1)
In formula (1), s = 1, 2, ..., n, where n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by connecting many of the foregoing single neural units, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be a region composed of several neural units.
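As a quick illustration of formula (1), the following sketch computes the output of a single neural unit with a sigmoid activation; the concrete numbers are arbitrary examples.

    import numpy as np

    def neural_unit(x, w, b):
        # Formula (1): f(sum_s W_s * x_s + b) with a sigmoid activation f.
        z = np.dot(w, x) + b
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.2, 3.0])   # inputs x_s
    w = np.array([0.8, 0.1, -0.4])   # weights W_s
    print(neural_unit(x, w, b=0.2))  # scalar output of the unit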
(3) Deep neural network
A deep neural network (DNN), also called a multilayer neural network, can be understood as a neural network with many hidden layers; there is no particular metric for "many" here. Based on the positions of the layers, the neural network inside the DNN can be divided into three types: an input layer, hidden layers, and an output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to any neuron in the (i+1)-th layer. Although the DNN looks complicated, the work of each layer can be expressed by the linear relation described in the following formula (2):
y = α(W x + b)    (2)
In formula (2), x denotes the input vector, y denotes the output vector, b denotes the offset vector, W denotes the weight matrix (also called coefficients), and α(·) denotes the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has many layers, there are also many coefficients W and offset vectors b. These parameters are defined in the DNN as follows.
Take the coefficient W as an example. Suppose that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}, where the superscript 3 represents the layer in which the coefficient W is located, and the subscript 24 corresponds to the output index 2 of the third layer and the input index 4 of the second layer. In summary, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as W^L_{jk}. It should be noted that the input layer has no W parameters. In a deep neural network, more hidden layers enable the network to better characterize complex situations in the real world. In theory, a model with more parameters has higher complexity and a larger "capacity", which means that it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
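As a concrete check of formula (2) and the W^L_{jk} indexing convention, the following sketch runs a small layered forward pass; the layer sizes and the ReLU activation are arbitrary illustrative choices.

    import numpy as np

    def dnn_forward(x, weights, biases, act=lambda z: np.maximum(z, 0.0)):
        # Applies y = act(W x + b) for each layer in turn (formula (2)).
        for W, b in zip(weights, biases):
            x = act(W @ x + b)
        return x

    # Three-layer example with 4 -> 3 -> 2 neurons; entry W[j, k] plays the role
    # of the coefficient from neuron k of the previous layer to neuron j.
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3))]
    biases = [np.zeros(3), np.zeros(2)]
    print(dnn_forward(rng.normal(size=4), weights, biases))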
(4) Loss function
In the process of training a deep neural network, because it is hoped that the output of the deep neural network is as close as possible to the value that is actually to be predicted, the predicted value of the current network can be compared with the actually desired target value, and the weight vector of each layer of the neural network is then updated according to the difference between the two (certainly, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to lower the prediction, and adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, it is necessary to predefine "how to compare the difference between the predicted value and the target value". This is the loss function or objective function, which is an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
(5) Backpropagation algorithm
A convolutional neural network may use an error backpropagation (BP) algorithm to correct the values of the parameters in the initial neural network during training, so that the reconstruction error loss of the neural network becomes smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters of the initial neural network are updated by backpropagating the error loss information, so that the error loss converges. The backpropagation algorithm is a backpropagation movement dominated by the error loss, aiming to obtain optimal parameters of the neural network, for example, the weight matrices.
(6) Adversarial samples
An adversarial sample is an input sample formed by adding a perturbation to a data set, which causes the neural network to give a wrong output with high confidence. Because the ultimate goal of the neural network is to obtain correct output results, adversarial samples are used to train the neural network with this adversarial training strategy, so that the neural network adapts to the perturbation and is thereby robust to adversarial samples.
(7) Virtual adversarial training
Virtual adversarial training is an adversarial training method that does not rely on training labels. Virtual adversarial training generates a perturbation based on a first output of the neural network, such that when the generated adversarial sample is input into the neural network, the obtained second output differs from the previous first output, thereby implementing the adversarial training strategy.
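A simplified PyTorch-style sketch of how a virtual adversarial perturbation can be computed without labels is given below. The single power-iteration step, the KL divergence measure, and the hyperparameters xi and eps are common choices in the virtual adversarial training literature and are assumptions here, not details of this application.

    import torch
    import torch.nn.functional as F

    def virtual_adversarial_perturbation(model, x, xi=1e-6, eps=1.0):
        # Find a small input perturbation that changes the model's output
        # distribution the most, without using any labels.
        with torch.no_grad():
            p = F.softmax(model(x), dim=1)          # first output
        d = torch.randn_like(x)
        shape = (-1,) + (1,) * (x.dim() - 1)
        d = xi * d / (d.flatten(1).norm(dim=1).view(shape) + 1e-12)
        d.requires_grad_(True)
        p_hat = F.log_softmax(model(x + d), dim=1)
        adv_dist = F.kl_div(p_hat, p, reduction="batchmean")
        grad = torch.autograd.grad(adv_dist, d)[0]  # gradient w.r.t. the noise
        r_adv = eps * grad / (grad.flatten(1).norm(dim=1).view(shape) + 1e-12)
        return r_adv.detach()   # second output is then model(x + r_adv)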
The system architecture provided by an embodiment of this application is described in detail below with reference to Fig. 2.
Fig. 2 shows a system architecture 200 provided by an embodiment of this application. As shown in Fig. 2, the system architecture 200 includes an execution device 210, a training device 220, a database 230, a client device 240, a data storage system 250, and a data collection system 260. The execution device 210 includes a calculation module 211, an I/O interface 212, a preprocessing module 213, and a preprocessing module 214. The calculation module 211 may include a target model/rule 201, and the preprocessing module 213 and the preprocessing module 214 are optional.
The data collection device 260 is configured to collect training data (or sample data for training) and store it in the database 230. The training data in this embodiment of this application may include training data of different domains, such as training data of the source domain and training data of the target domain. The training device 220 trains the target model/rule 201 based on the training data maintained in the database 230, so that the target model/rule 201 has the function of decoupling domain-invariant features and domain-specific features from input data, and of using the domain-invariant features to complete the tasks required by actual application scenarios, such as target classification, detection, recognition, and segmentation.
The target model/rule 201 may be a neural network model. The work of each layer in the neural network model can be described by the mathematical expression y = a(W·x + b). From a physical perspective, the work of each layer in the neural network model can be understood as completing a transformation from the input space to the output space (that is, from the row space to the column space of a matrix) through five operations on the input space (the set of input vectors). The five operations are: 1. raising/reducing the dimension; 2. enlargement/reduction; 3. rotation; 4. translation; 5. "bending". Operations 1, 2, and 3 are performed by W·x, operation 4 is performed by +b, and operation 5 is implemented by a(). The word "space" is used here because the object to be classified is not a single thing but a class of things, and space refers to the set of all individuals of this class of things. W is a weight vector, and each value in the vector represents the weight value of one neuron in this layer of the neural network. The vector W determines the spatial transformation from the input space to the output space described above, that is, the weight W of each layer controls how the space is transformed. The purpose of training the neural network model is to finally obtain the weight matrices of all layers of the trained neural network (the weight matrices formed by the vectors W of many layers). Therefore, the training process of the neural network is essentially learning how to control the spatial transformation, and more specifically, learning the weight matrices.
因为希望神经网络模型的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为神经网络模型中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断的调整,直到神经网络能够预测出真正想要的目标值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数 (objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么神经网络模型的训练就变成了尽可能缩小这个loss的过程。Because it is hoped that the output of the neural network model is as close as possible to the value that you really want to predict, you can compare the current network's predicted value with the really desired target value, and then update each layer of neural network according to the difference between the two. The weight vector of the network (of course, there is usually an initialization process before the first update, which is to pre-configure parameters for each layer in the neural network model). For example, if the predicted value of the network is high, adjust the weight vector to make it The prediction is lower and keep adjusting until the neural network can predict the target value you really want. Therefore, it is necessary to predefine "how to compare the difference between the predicted value and the target value". This is the loss function or objective function, which is used to measure the difference between the predicted value and the target value. Important equation. Among them, taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference, and the training of the neural network model becomes a process of reducing this loss as much as possible.
训练设备220得到的目标模型/规则可以应用在不同的系统或设备中。在附图2中,执行设备210配置有I/O接口212,与外部设备进行数据交互,“用户”可以通过客户设备240向I/O接口212输入数据。The target model/rule obtained by the training device 220 can be applied to different systems or devices. In FIG. 2, the execution device 210 is configured with an I/O interface 212 to perform data interaction with external devices. The "user" can input data to the I/O interface 212 through the client device 240.
执行设备210可以调用数据存储系统250中的数据、代码等,也可以将数据、指令等存入数据存储系统250中。The execution device 210 can call data, codes, etc. in the data storage system 250, and can also store data, instructions, etc. in the data storage system 250.
计算模块211使用目标模型/规则201对输入的数据进行处理。在实际应用场景的工作过程中,计算模块211的具体输入数据与具体的应用场景相关。例如,在人脸识别的应用场景,计算模块211的输入数据就可能是包括人脸图像的图像数据。由于计算模块211是使用目标模型/规则201对输入的数据进行处理,因此计算模块其实也是基于输入的数据获取实例层面的特征,然后将该实例层面的特征用于执行具体的任务。The calculation module 211 uses the target model/rule 201 to process the input data. In the working process of the actual application scenario, the specific input data of the calculation module 211 is related to the specific application scenario. For example, in an application scenario of face recognition, the input data of the calculation module 211 may be image data including a face image. Since the calculation module 211 uses the target model/rule 201 to process the input data, the calculation module actually obtains instance-level features based on the input data, and then uses the instance-level features to perform specific tasks.
在本申请一实施例中,该系统架构200还可能包括一些与计算模块211连接的管理功能模块,以基于计算模块211的输出结果完成更灵活的细分任务。例如,当“用户”可以通过客户设备240向I/O接口212输入的数据是交通场景的图像数据时,图2所示的关联功能模块213就可配置为根据计算模块211所输出车辆对象的特征信息进一步识别车辆的车牌号和型号等信息;而关联功能模块214可配置为根据计算模块211所输出的行人的特征进一步识别行人的性别、身高和年龄等信息。然而,本申请对该系统架构是否包括这些关联功能模块,以及这些关联功能模块具体所执行的功能并不做限定。In an embodiment of the present application, the system architecture 200 may also include some management function modules connected to the calculation module 211 to complete more flexible subdivision tasks based on the output result of the calculation module 211. For example, when the data that the "user" can input to the I/O interface 212 through the client device 240 is image data of a traffic scene, the associated function module 213 shown in FIG. The characteristic information further identifies information such as the license plate number and model of the vehicle; and the correlation function module 214 may be configured to further identify the gender, height, and age of the pedestrian based on the characteristics of the pedestrian output by the calculation module 211. However, this application does not limit whether the system architecture includes these associated function modules, and the specific functions performed by these associated function modules.
最后,I/O接口212将处理结果返回给客户设备240,提供给用户。Finally, the I/O interface 212 returns the processing result to the client device 240 and provides it to the user.
更深层地,训练设备220可以针对不同的目标,基于不同的数据生成相应的目标模型/规则201,以给用户提供更佳的结果。At a deeper level, the training device 220 can generate corresponding target models/rules 201 based on different data for different targets, so as to provide users with better results.
在附图2中所示情况下,用户可以手动指定输入执行设备210中的数据,例如,在I/O接口212提供的界面中操作。另一种情况下,客户设备240可以自动地向I/O接口212输入数据并获得结果,如果客户设备240自动输入数据需要获得用户的授权,用户可以在客户设备240中设置相应权限。用户可以在客户设备240查看执行设备210输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备240也可以作为数据采集端将采集到样本数据存入数据库230。In the case shown in FIG. 2, the user can manually specify to input data in the execution device 210, for example, to operate in the interface provided by the I/O interface 212. In another case, the client device 240 can automatically input data to the I/O interface 212 and obtain the result. If the client device 240 automatically inputs data and needs the user's authorization, the user can set the corresponding authority in the client device 240. The user can view the result output by the execution device 210 on the client device 240, and the specific presentation form may be a specific manner such as display, sound, and action. The client device 240 can also serve as a data collection terminal to store the collected sample data in the database 230.
值得注意的是,附图2仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在附图2中,数据存储系统250相对执行设备210是外部存储器,在其它情况下,也可以将数据存储系统250置于执行设备210中。It is worth noting that Figure 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation. For example, in Figure 2 The data storage system 250 is an external memory relative to the execution device 210. In other cases, the data storage system 250 may also be placed in the execution device 210.
下面结合图3介绍本申请实施例提供的一种芯片硬件结构。The following describes a chip hardware structure provided by an embodiment of the present application in conjunction with FIG. 3.
图3为本申请一实施例提供的芯片硬件结构图。如图3所示,该芯片包括神经网络处理器(neural-network processing unit,NPU)300。该芯片可以被设置在如图2所示的执行设备210中,用以完成计算模块211的计算工作。该芯片也可以被设置在如图2所示的训练设备220中,用以完成训练设备220的训练工作并输出目标模型/规则201。此外,下述图4、图9和图11所示的神经网络的训练方法均可在如图3所示的芯片中得以实现。FIG. 3 is a diagram of the chip hardware structure provided by an embodiment of the application. As shown in FIG. 3, the chip includes a neural-network processing unit (NPU) 300. The chip can be set in the execution device 210 as shown in FIG. 2 to complete the calculation work of the calculation module 211. The chip can also be set in the training device 220 shown in FIG. 2 to complete the training work of the training device 220 and output the target model/rule 201. In addition, the following neural network training methods shown in FIG. 4, FIG. 9 and FIG. 11 can all be implemented in the chip shown in FIG. 3.
神经网络处理器300作为协处理器挂载到主中央处理单元(host central processing unit,host CPU)上,由主CPU分配任务。神经网络处理器300的核心部分为运算电路303,控制器304控制运算电路303提取存储器(权重存储器302或输入存储器301)中的数据并进行运算。The neural network processor 300 is mounted on a main central processing unit (host central processing unit, host CPU) as a coprocessor, and the main CPU distributes tasks. The core part of the neural network processor 300 is the arithmetic circuit 303, and the controller 304 controls the arithmetic circuit 303 to extract data from the memory (weight memory 302 or input memory 301) and perform calculations.
在一些实现中,运算电路303内部包括多个处理单元(process engine,PE)。在一些实现中,运算电路303是二维脉动阵列。运算电路303还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路303是通用的矩阵处理器。In some implementations, the arithmetic circuit 303 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 303 is a two-dimensional systolic array. The arithmetic circuit 303 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 303 is a general-purpose matrix processor.
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路303从权重存储器302中取权重矩阵B相应的数据,并缓存在运算电路303中每一个PE上。运算电路303从输入存储器301中取输入矩阵A与权重矩阵B进行矩阵运算,以得到矩阵的部分结果或最终结果,并保存在累加器(accumulator)308中。For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit 303 fetches the data corresponding to the weight matrix B from the weight memory 302 and caches it on each PE in the arithmetic circuit 303. The arithmetic circuit 303 fetches the input matrix A and the weight matrix B from the input memory 301 to perform matrix operations to obtain partial results or final results of the matrix, and store them in an accumulator 308.
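As a purely illustrative model of the accumulation described above, the following Python/NumPy sketch computes the product of an input matrix A and a weight matrix B by summing partial products into an accumulator buffer; it is only a schematic of the data flow, not a description of the actual arithmetic circuit 303 or accumulator 308.

import numpy as np

A = np.random.randn(4, 8)    # input matrix A
B = np.random.randn(8, 16)   # weight matrix B
acc = np.zeros((4, 16))      # accumulator buffer, playing the role of accumulator 308

# Accumulate one rank-1 partial product per step of the inner dimension.
for k in range(A.shape[1]):
    acc += np.outer(A[:, k], B[k, :])

assert np.allclose(acc, A @ B)   # the accumulated partial results equal the full matrix product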
向量计算单元307可以对运算电路303的输出做进一步处理，如向量乘，向量加，指数运算，对数运算，大小比较等等。例如，向量计算单元307可以用于神经网络中非卷积/非FC层的网络计算，如池化(pooling)，批归一化(batch normalization)，局部响应归一化(local response normalization)等。The vector calculation unit 307 can perform further processing on the output of the arithmetic circuit 303, such as vector multiplication, vector addition, exponential operations, logarithmic operations, magnitude comparison, and so on. For example, the vector calculation unit 307 can be used for network calculations in the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, and local response normalization.
在一些实现中,向量计算单元307能将经处理的输出的向量存储到统一存储器306。例如,向量计算单元307可以将非线性函数应用到运算电路303的输出,例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元307生成归一化的值、合并值,或二者均有。在一些实现中,处理过的输出的向量能够用作到运算电路303的激活输入,例如用于在神经网络中的后续层中的使用。In some implementations, the vector calculation unit 307 can store the processed output vector to the unified memory 306. For example, the vector calculation unit 307 may apply a nonlinear function to the output of the arithmetic circuit 303, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit 307 generates a normalized value, a combined value, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 303, for example for use in a subsequent layer in a neural network.
统一存储器306用于存放输入数据以及输出数据。The unified memory 306 is used to store input data and output data.
存储单元访问控制器(direct memory access controller,DMAC)305用于将外部存储器中的输入数据搬运到输入存储器301和/或统一存储器306、将外部存储器中的权重数据存入权重存储器302，以及将统一存储器306中的数据存入外部存储器。A direct memory access controller (DMAC) 305 is used to transfer input data in the external memory to the input memory 301 and/or the unified memory 306, to store weight data in the external memory into the weight memory 302, and to store data in the unified memory 306 into the external memory.
总线接口单元(bus interface unit,BIU)310,用于通过总线实现主CPU、DMAC和取指存储器309之间进行交互。The bus interface unit (BIU) 310 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 309 through the bus.
与控制器304连接的取指存储器(instruction fetch buffer)309,用于存储控制器304使用的指令。An instruction fetch buffer 309 connected to the controller 304 is used to store instructions used by the controller 304.
控制器304,用于调用取指存储器309中缓存的指令,实现控制该运算加速器的工作过程。The controller 304 is used to call the instructions cached in the instruction fetch memory 309 to control the working process of the computing accelerator.
一般地,统一存储器306、输入存储器301、权重存储器302以及取指存储器309均为片上(on-chip)存储器,外部存储器为该NPU外部的存储器,该外部存储器可以为双倍数据率同步动态随机存储器(double data rate synchronous dynamic random access memory,DDR SDRAM)、高带宽存储器(high bandwidth memory,HBM)或其他可读可写的存储器。Generally, the unified memory 306, the input memory 301, the weight memory 302, and the fetch memory 309 are all on-chip memories. The external memory is a memory external to the NPU. The external memory can be a double data rate synchronous dynamic random access memory. Memory (double data rate synchronous dynamic random access memory, DDR SDRAM), high bandwidth memory (HBM) or other readable and writable memory.
图4为本申请一实施例提供的一种系统架构400。执行设备410由一个或多个设置在云端的服务器实现,服务器也可与其它计算设备配合,例如:数据存储、路由器、负载均衡器等设备;执行设备410可以布置在一个物理站点上,或者分布在多个物理站点 上。在本申请一实施例中,执行设备410可以使用数据存储系统420中的数据,或者调用数据存储系统420中的程序代码实现本申请的实施例所提供的神经网络的训练方法;具体地,执行设备410可以根据数据存储系统420中的训练数据以本申请的实施例所提供的方法训练神经网络,以及根据本地设备401/402的请求完成对应的智能任务。在本申请另一实施例中,执行设备410也可并不具备训练神经网络的功能,但是可以根据本申请的实施例所提供的神经网络的训练方法训练好的神经网络完成对应的智能任务;具体地,执行设备410配置有本申请的实施例所提供的神经网络的训练方法训练好的神经网络后,在接收到本地设备401/402的请求后即可完成对应的智能任务,并反馈结果给本地设备401/402。FIG. 4 is a system architecture 400 provided by an embodiment of this application. The execution device 410 is implemented by one or more servers set in the cloud. The server can also cooperate with other computing devices, such as data storage, routers, load balancers and other devices; the execution device 410 can be arranged on a physical site or distributed On multiple physical sites. In an embodiment of the present application, the execution device 410 may use the data in the data storage system 420 or call the program code in the data storage system 420 to implement the neural network training method provided by the embodiment of the present application; specifically, execute The device 410 can train the neural network according to the training data in the data storage system 420 in the method provided in the embodiment of the present application, and complete the corresponding intelligent task according to the request of the local device 401/402. In another embodiment of the present application, the execution device 410 may not have the function of training a neural network, but the neural network trained according to the neural network training method provided by the embodiment of the present application can complete the corresponding intelligent task; Specifically, after the execution device 410 is configured with the neural network training method provided by the embodiment of the present application, after the neural network is trained, the corresponding intelligent task can be completed after receiving the request of the local device 401/402, and the result will be fed back. To the local device 401/402.
用户可以操作各自的用户设备(例如本地设备401和本地设备402)与执行设备410进行交互。每个本地设备可以表示任何计算设备,例如个人计算机、计算机工作站、智能手机、平板电脑、智能摄像头、智能汽车或其他类型蜂窝电话、媒体消费设备、可穿戴设备、机顶盒、游戏机等。在本申请一实施例中,本地设备可以是一种安防设备,例如监控摄像设备、烟雾报警设备或灭火设备等。The user can operate respective user devices (for example, the local device 401 and the local device 402) to interact with the execution device 410. Each local device can represent any computing device, such as personal computers, computer workstations, smart phones, tablets, smart cameras, smart cars or other types of cellular phones, media consumption devices, wearable devices, set-top boxes, game consoles, etc. In an embodiment of the present application, the local device may be a security device, such as a surveillance camera device, a smoke alarm device, or a fire extinguishing device.
每个用户的本地设备可以通过任何通信机制/通信标准的通信网络与执行设备410进行交互,通信网络可以是广域网、局域网、点对点连接等方式,或它们的任意组合。The local device of each user can interact with the execution device 410 through a communication network of any communication mechanism/communication standard. The communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
在另一种实现中,执行设备410的一个方面或多个方面可以由每个本地设备实现,例如,本地设备401可以为执行设备410提供本地数据或反馈计算结果。In another implementation, one or more aspects of the execution device 410 may be implemented by each local device. For example, the local device 401 may provide the execution device 410 with local data or feed back calculation results.
在另一种实现中,上述执行设备410的所有功能也可以由本地设备实现。本地设备401执行本申请的实施例所提供的神经网络的训练方法,并使用训练好的神经网络为用户提供服务。In another implementation, all the functions of the foregoing execution device 410 may also be implemented by a local device. The local device 401 executes the neural network training method provided in the embodiments of the present application, and uses the trained neural network to provide services to users.
图5为本申请一实施例提供的神经网络的训练方法的流程示意图。图5所示的神经网络的训练方法可由图2所示的训练设备220执行,该训练设备220训练得到的目标模型/规则201即为该神经网络。如图5所示,该神经网络的训练方法包括如下步骤:FIG. 5 is a schematic flowchart of a neural network training method provided by an embodiment of the application. The training method of the neural network shown in FIG. 5 can be executed by the training device 220 shown in FIG. 2, and the target model/rule 201 trained by the training device 220 is the neural network. As shown in Figure 5, the neural network training method includes the following steps:
步骤501:获取训练数据。Step 501: Obtain training data.
训练数据为训练过程的输入数据,训练数据可以由用户采集获取,也可采用现有的训练数据库。应当理解,根据实际场景的需求不同,训练数据可具备不同的格式和形式。例如在目标检测或目标识别场景下,训练数据可以是图像数据。而在回归预测场景下,训练数据就可为采集的过往的房价历史数据。The training data is the input data of the training process. The training data can be collected by the user, or an existing training database can be used. It should be understood that the training data may have different formats and forms according to different requirements of actual scenarios. For example, in a target detection or target recognition scenario, the training data may be image data. In the scenario of regression prediction, the training data can be the collected historical housing price data.
在本申请一实施例中,输入神经网络的训练数据可以包括不同领域(domain)的训练数据。例如,不同领域可以包括目标域和源域。以域自适应学习任务为例,训练数据可以包括源域的训练数据和目标域的训练数据。在本申请一实施例中,领域(domain)的差异可体现为场景(scenario)的差异。以车辆检测的应用场景为例,源域的训练数据就可能是大量的晴天场景下的交通场景图像,目标域的训练数据就可能是大量的雾天场景下的交通场景图像。然而应当理解,根据应用场景的不同,源域的训练数据和目标域的训练数据也可能在其他方面体现这种领域差异,例如,在回归预测场景下,源域的训练数据可能是采集于去年的生产线的能耗数据,而目标域的训练数据可能是采集于今年的生产线的能耗数据,此时领域差异体现在由于时间变换而出现的能耗数据的数值分布不一致。In an embodiment of the present application, the training data input to the neural network may include training data of different domains. For example, different domains can include target domains and source domains. Taking the domain adaptive learning task as an example, the training data may include the training data of the source domain and the training data of the target domain. In an embodiment of the present application, the difference in domains may be embodied as the difference in scenarios. Taking the application scenario of vehicle detection as an example, the training data of the source domain may be a large number of traffic scene images in a sunny scene, and the training data of the target domain may be a large number of traffic scene images in a foggy scene. However, it should be understood that depending on the application scenario, the training data of the source domain and the training data of the target domain may also reflect this domain difference in other aspects. For example, in a regression prediction scenario, the training data of the source domain may be collected last year. The energy consumption data of the production line, and the training data of the target domain may be the energy consumption data of the production line collected this year. At this time, the domain difference is reflected in the inconsistent value distribution of the energy consumption data due to time changes.
步骤502:使用训练数据对神经网络进行训练,使得神经网络从训练数据中学习分解域不变特征和域特定特征。Step 502: Use the training data to train the neural network so that the neural network learns to decompose domain invariant features and domain specific features from the training data.
在训练神经网络的过程中,神经网络可以采用监督学习、半监督学习或无监督学习等方式中的任意一种对训练数据进行学习。以域自适应学习任务为例,训练数据可以包括带有足量标签的源域的训练数据,以及带有少量标签的目标域的训练数据,此时,神经网络可以采用半监督学习的方式对训练数据进行学习;或者,训练数据可以包括带有足量标签的源域的训练数据,以及不带标签的目标域的训练数据,此时,神经网络可以采用无监督学习的方式对训练数据进行学习。In the process of training the neural network, the neural network can use any of the methods of supervised learning, semi-supervised learning or unsupervised learning to learn from the training data. Taking the domain adaptive learning task as an example, the training data can include the training data of the source domain with sufficient labels and the training data of the target domain with a small number of labels. In this case, the neural network can use semi-supervised learning to perform Training data for learning; alternatively, the training data can include the training data of the source domain with sufficient labels and the training data of the target domain without labels. At this time, the neural network can use unsupervised learning on the training data. learn.
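For illustration, a minimal PyTorch-style sketch of how labeled source-domain data and unlabeled target-domain data might be batched together in the unsupervised setting is given below; the tensor shapes, label meanings and batch sizes are placeholders and not part of this application.

import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical toy data: labeled source-domain images and unlabeled target-domain images.
source_images = torch.randn(64, 3, 32, 32)
source_labels = torch.randint(0, 2, (64,))      # e.g. 0 = person, 1 = vehicle
target_images = torch.randn(64, 3, 32, 32)      # no labels in the unsupervised case

source_loader = DataLoader(TensorDataset(source_images, source_labels), batch_size=8, shuffle=True)
target_loader = DataLoader(TensorDataset(target_images), batch_size=8, shuffle=True)

# Each training step sees one labeled source batch and one unlabeled target batch.
for (xs, ys), (xt,) in zip(source_loader, target_loader):
    pass  # forward both batches through the network; only (xs, ys) contributes to the task loss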
域不变特征(domain-invariant representation,DIR)为与训练数据所属领域无关的特征,为不会因为领域差异而产生变化的特征。域不变特征有时也可称为与任务相关的实例层面的特征。以车辆检测的应用场景为例,领域差异体现在晴天拍摄的交通场景图像和雾天拍摄的交通场景图像的天气变化所呈现出来的图像差异,而交通场景中的车辆的特征是不会随着天气的变化而变化的,同时,目标检测任务的目标对象(即实例)就是图像中的车辆,因此车辆的特征就是要提取出的域不变特征。在目标检测的域自适应学习场景中,所要实现的就是无论在实际工作中获取到的图像是拍摄于晴天还是雾天,这个训练好的神经网络都能够准确地提取出车辆的特征以完成目标检测任务。Domain-invariant representation (DIR) is a feature that has nothing to do with the domain to which the training data belongs, and is a feature that does not change due to domain differences. Domain-invariant features can sometimes be referred to as task-related instance-level features. Taking the application scenario of vehicle detection as an example, the domain difference is reflected in the image difference between the weather changes between the traffic scene image taken on a sunny day and the traffic scene image taken on a foggy day, and the characteristics of the vehicle in the traffic scene will not follow The weather changes. At the same time, the target object (ie, instance) of the target detection task is the vehicle in the image, so the characteristics of the vehicle are the domain invariant features to be extracted. In the domain adaptive learning scene of target detection, what we want to achieve is that regardless of whether the image obtained in actual work is taken on a sunny day or a foggy day, this trained neural network can accurately extract the characteristics of the vehicle to complete the target. Inspection task.
域特定特征(domain-specific representation,DSR)为表征训练数据所属的领域的特征,是训练数据所属的域所特有的特征,会因为领域差异而产生变化;同时域特定特征也是与实例不相关的特征,在实际任务执行过程中也是与任务的目标不相关的。例如在前述车辆检测的应用场景下,交通场景图像中车辆周遭的环境(树木、天空、街景等)的特征就是与车辆的特征不相关的,因为车辆的识别或检测并不需要了解周遭环境的特征,且这些周遭的环境(例如天空)的特征信息会随着领域差异(天气变化)而变化。Domain-specific representation (DSR) is a feature that characterizes the domain to which the training data belongs. It is a feature unique to the domain to which the training data belongs, and will change due to domain differences; at the same time, domain-specific features are also irrelevant to the instance The characteristics are also irrelevant to the goal of the task in the actual task execution process. For example, in the aforementioned vehicle detection application scenario, the characteristics of the surrounding environment (trees, sky, street scene, etc.) of the vehicle in the traffic scene image are not related to the characteristics of the vehicle, because the recognition or detection of the vehicle does not require knowledge of the surrounding environment. Features, and the feature information of the surrounding environment (such as the sky) will change with the domain difference (weather change).
由此可见,本申请实施例通过从训练数据中分解域不变特征和域特定特征,使得域不变特征能够与域特定特征解耦。由于本申请的训练方法得到的神经网络使用域不变特征来执行任务,这样避免了域特定特征对于神经网络的影响,从而提升了神经网络在不同领域之间的迁移性能。It can be seen that the embodiment of the present application decomposes the domain invariant feature and the domain specific feature from the training data, so that the domain invariant feature can be decoupled from the domain specific feature. Since the neural network obtained by the training method of the present application uses domain invariant features to perform tasks, the influence of domain-specific features on the neural network is avoided, thereby improving the migration performance of the neural network between different domains.
图6为本申请一实施例提供的神经网络的训练方法的流程示意图。图7为图6训练得到的神经网络的结构示意图。图6所示的神经网络的训练方法可由图2所示的训练设备220执行,该训练设备220训练得到的目标模型/规则201即为该神经网络。如图6和图7所示,该神经网络的训练方法包括如下步骤:FIG. 6 is a schematic flowchart of a neural network training method provided by an embodiment of the application. Fig. 7 is a schematic diagram of the structure of the neural network trained in Fig. 6. The training method of the neural network shown in FIG. 6 can be executed by the training device 220 shown in FIG. 2, and the target model/rule 201 trained by the training device 220 is the neural network. As shown in Fig. 6 and Fig. 7, the training method of the neural network includes the following steps:
步骤601:从训练数据中分解出域不变特征和域特定特征。Step 601: Decompose domain-invariant features and domain-specific features from the training data.
如图7所示,该提取域不变特征DIR和域特定特征DSR的过程可分别由神经网络中的域不变特征提取器E DIR和域特定特征提取器E DSR完成。训练数据输入到神经网络中后,可以使用该域不变特征提取器E DIR和该域特定特征提取器E DSR完成域不变特征DIR和域特定特征DSR的提取过程。 As shown in Figure 7, the process of extracting the domain invariant feature DIR and the domain specific feature DSR can be respectively completed by the domain invariant feature extractor E DIR and the domain specific feature extractor E DSR in the neural network. After the training data is input into the neural network, the domain invariant feature extractor E DIR and the domain specific feature extractor E DSR can be used to complete the extraction process of the domain invariant feature DIR and the domain specific feature DSR.
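As an illustration of the two extractors E DIR and E DSR operating on the same input, the following PyTorch sketch splits a backbone feature map into a domain-invariant branch and a domain-specific branch; the layer types and channel sizes are placeholders chosen for the example, not values specified by this application.

import torch
import torch.nn as nn

class FeatureDecoupler(nn.Module):
    # Two parallel heads produce the domain-invariant feature (DIR) and the domain-specific feature (DSR).
    def __init__(self, in_ch=256, out_ch=256):
        super().__init__()
        self.e_dir = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())
        self.e_dsr = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())
    def forward(self, feat):
        return self.e_dir(feat), self.e_dsr(feat)

feat = torch.randn(2, 256, 32, 32)              # feature map of a batch of training images
dir_feat, dsr_feat = FeatureDecoupler()(feat)   # decomposed DIR and DSR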
下面结合图8来说明目标检测场景下的域不变特征和域特定特征的解耦。在如图8所示的目标检测场景下,任务的目标是要检测出图像数据中的对象(包括人物和车辆)。图8左侧的源域的训练数据是照片图像,右侧的目标域的训练数据是卡通图像。在域不变空间中,源域的训练数据中提取出的域不变特征就是照片图像中的人物和车辆,目标 域的训练数据中提取出的域不变特征就是卡通图像中的人物和车辆,线条C 1表征的是在域不变空间中的人物的域不变特征和车辆的域不变特征的分类界限。在域特定空间中,源域的训练数据中提取出的域特定特征就是照片图像中除去人物和车辆外的其他特征,目标域的训练数据中提取出的域特定特征就是卡通图像中除去人物和车辆外的其他特征,线条C 2表征的是来自源域的域特定特征和来自目标域的域特定特征在域特定空间中的分布界限。 The decoupling of domain-invariant features and domain-specific features in the target detection scene will be explained below in conjunction with FIG. 8. In the target detection scene shown in Figure 8, the goal of the task is to detect objects (including people and vehicles) in the image data. The training data of the source domain on the left side of Fig. 8 is a photo image, and the training data of the target domain on the right side is a cartoon image. In the domain invariant space, the domain invariant features extracted from the training data of the source domain are the characters and vehicles in the photo image, and the domain invariant features extracted from the training data of the target domain are the characters and vehicles in the cartoon image. , The line C 1 represents the classification boundary between the domain invariant feature of the person and the domain invariant feature of the vehicle in the domain invariant space. In the domain-specific space, the domain-specific features extracted from the training data of the source domain are other features other than the characters and vehicles in the photo image, and the domain-specific features extracted from the training data of the target domain are the cartoon images that remove the characters and For other features outside the vehicle, the line C 2 represents the distribution boundary of the domain-specific features from the source domain and the domain-specific features from the target domain in the domain-specific space.
步骤602:使用域不变特征执行任务,得到任务损失,并计算域不变特征和域特定特征之间的互信息损失,互信息损失用于表示域不变特征和域特定特征之间的差异。Step 602: Use the domain invariant feature to perform the task, obtain the task loss, and calculate the mutual information loss between the domain invariant feature and the domain specific feature. The mutual information loss is used to represent the difference between the domain invariant feature and the domain specific feature .
如前所述,域不变特征用于表征的是实例层面的特征信息,因此将域不变特征用于执行任务并获得任务损失(task loss),可提高域不变特征对于与任务相关的实例表征的准确性和完整性。任务损失用于表征使用域不变特征执行任务所得到的结果与任务标签之间的差距。例如,当域不变特征用于执行目标检测任务时,执行任务所得到的结果就可包括检测出的目标对象的属性特征,而任务标签则对应该域不变特征实际上所对应的目标对象的标准属性特征,这样检测出的属性特征和标准属性特征之间的差异可通过任务损失来表征。As mentioned earlier, domain invariant features are used to characterize feature information at the instance level. Therefore, domain invariant features are used to perform tasks and obtain task loss, which can improve the effect of domain invariant features on tasks related to tasks. The accuracy and completeness of the instance characterization. The task loss is used to characterize the gap between the result of using domain invariant features to perform the task and the task label. For example, when the domain invariant feature is used to perform a target detection task, the results obtained from the task can include the attribute features of the detected target object, and the task label corresponds to the target object to which the domain invariant feature actually corresponds In this way, the difference between the detected attribute feature and the standard attribute feature can be characterized by the task loss.
互信息(mutual information,MI)损失表征的是两个变量的相互依赖性。两个随机变量X和Z的互信息损失I可通过如下公式(3)定义,其中H(X)是边缘熵,H(X|Z)是条件熵:Mutual information (MI) loss characterizes the interdependence of two variables. The mutual information loss I of two random variables X and Z can be defined by the following formula (3), where H(X) is the edge entropy and H(X|Z) is the conditional entropy:
I(X,Z)=H(X)-H(X|Z)           (3)
互信息损失用于表示域不变特征和域特定特征之间的差异。通过计算域不变特征和域特定特征之间的互信息损失,并基于该互信息损失对该神经网络进行训练,可有助于进一步将域不变特征和域特定特征区分开,起到迫使域不变特征和域特定特征解耦的作用。应当理解,互信息损失的计算方法可根据实际场景需求选择,例如,可选择使用互信息神经估计器(mutual information neural estimator,MINE)来获取该互信息损失,本申请对互信息损失的具体计算方法不做严格限定。Mutual information loss is used to represent the difference between domain-invariant features and domain-specific features. By calculating the mutual information loss between domain-invariant features and domain-specific features, and training the neural network based on the mutual information loss, it can help to further distinguish between domain-invariant features and domain-specific features to force The role of domain-invariant features and domain-specific features decoupling. It should be understood that the calculation method of mutual information loss can be selected according to actual scenario requirements. For example, mutual information neural estimator (MINE) can be selected to obtain the mutual information loss. The specific calculation of mutual information loss in this application The method is not strictly limited.
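For reference, one possible MINE-style estimator is sketched below: a small statistics network T(x, z) is evaluated on paired (joint) and shuffled (marginal) feature batches, and the Donsker-Varadhan bound gives an estimate that can serve as the mutual information loss. The network sizes are placeholders, and this sketch is not necessarily the estimator used in this application.

import math
import torch
import torch.nn as nn

class MineEstimator(nn.Module):
    # Statistics network T(x, z); the layer sizes here are illustrative only.
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def mi_lower_bound(self, x, z):
        n = x.size(0)
        joint = self.net(torch.cat([x, z], dim=1)).mean()
        z_perm = z[torch.randperm(n)]                      # shuffled pairs approximate p(x)p(z)
        marg = self.net(torch.cat([x, z_perm], dim=1))
        # Donsker-Varadhan bound: I(X;Z) >= E_joint[T] - log E_marginal[exp(T)]
        return joint - (torch.logsumexp(marg, dim=0) - math.log(n)).squeeze()

dir_vec = torch.randn(16, 64)   # flattened domain-invariant features (placeholder)
dsr_vec = torch.randn(16, 64)   # flattened domain-specific features (placeholder)
mi_loss = MineEstimator(64).mi_lower_bound(dir_vec, dsr_vec)

In practice the statistics network is usually updated to tighten the bound while the feature extractors are updated to reduce it; that training schedule is an implementation choice outside the scope of this sketch.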
步骤603:根据任务损失和互信息损失,训练神经网络。Step 603: Train the neural network according to the task loss and the mutual information loss.
如前所述,神经网络的训练过程其实是根据损失函数的值调整权重向量的过程。这里的任务损失所表征的就是基于训练数据提取出的域不变特征能够完成任务的能力。如果该域不变特征还不能足够精准地与实例对应,那么任务损失的值就会比较大,此时就需要调整该神经网络中的权重向量以使得域不变特征能够在下一次预测过程中获得更低的任务损失。通过训练的迭代,域不变特征提取器所提取到的域不变特征也就会越来越精准地与实例对应。As mentioned earlier, the training process of the neural network is actually the process of adjusting the weight vector according to the value of the loss function. The task loss here characterizes the ability to complete the task based on the domain invariant features extracted from the training data. If the domain invariant feature cannot correspond to the instance accurately enough, then the value of the task loss will be relatively large. At this time, the weight vector in the neural network needs to be adjusted so that the domain invariant feature can be obtained in the next prediction process Lower mission loss. Through the iteration of training, the domain invariant features extracted by the domain invariant feature extractor will correspond to the examples more and more accurately.
根据互信息损失训练神经网络的过程可以是训练神经网络以减小域不变特征和域特定特征之间互信息损失的过程,例如,最小化该互信息损失。为了保证域不变特征能够更精准地与实例对应,可以计算域不变特征和域特定特征之间的互信息损失,并使用该互信息损失来进一步提高域不变特征提取的精确性。The process of training the neural network based on the mutual information loss may be a process of training the neural network to reduce the mutual information loss between the domain invariant feature and the domain specific feature, for example, to minimize the mutual information loss. In order to ensure that the domain invariant feature can correspond to the instance more accurately, the mutual information loss between the domain invariant feature and the domain specific feature can be calculated, and the mutual information loss can be used to further improve the accuracy of the domain invariant feature extraction.
由于互信息损失表征的是域不变特征和域特定特征的相关性,因此根据该互信息损失来调整该神经网络的权重向量,可使得提取到的域不变特征能够更好的与域特定特征区分开,起到迫使特征解耦的作用。如果该互信息损失较大,则说明目前域不变特征和 域特定特征之间是较为相关的,即,目前域不变特征提取器所提取到的特征中很可能还是包括了域特定特征的信息内容,此时则需要调整该神经网络的权重向量以减小该互信息损失。Since the mutual information loss characterizes the correlation between domain invariant features and domain-specific features, adjusting the weight vector of the neural network according to the mutual information loss can make the extracted domain invariant features better and domain-specific The features are distinguished, and they play a role in forcing the decoupling of features. If the mutual information loss is large, it means that the current domain-invariant features and domain-specific features are relatively related, that is, the current domain-invariant feature extractor may still include domain-specific features in the features extracted Information content, at this time, the weight vector of the neural network needs to be adjusted to reduce the mutual information loss.
由于域不变特征的提取在基于任务损失的训练过程中会得到训练，域不变特征提取器所提取出的特征可能会与任务有一定相关性，因此基于互信息损失的训练过程也可被视作将域特定特征从域不变特征中“剔除”的过程，使得域不变特征提取器所提取到的特征随着训练的迭代越来越与实例一致，同时也使得域特定特征提取器所提取到的特征随着训练的迭代越来越与实例不相关，从而实现了对于域不变特征和域特定特征的特征解耦。由此可见，由于随着训练过程的迭代，域特定特征提取器所提取到的特征也会越来越与实例不相关，即越来越贴近于表征领域本身特有的特征，因此域特定特征提取器在这个基于互信息损失的训练过程中也是得到了训练的。Since the extraction of the domain-invariant feature is trained in the training process based on the task loss, the features extracted by the domain-invariant feature extractor may have some correlation with the task. The training process based on the mutual information loss can therefore also be regarded as a process of "removing" the domain-specific feature from the domain-invariant feature, so that the features extracted by the domain-invariant feature extractor become more and more consistent with the instances as training iterates, while the features extracted by the domain-specific feature extractor become more and more irrelevant to the instances, thereby realizing the decoupling of the domain-invariant feature and the domain-specific feature. It can be seen that, as the training process iterates, the features extracted by the domain-specific feature extractor become increasingly irrelevant to the instances, that is, increasingly close to the features that characterize the domain itself, so the domain-specific feature extractor is also trained in this training process based on the mutual information loss.
应当理解,上述基于任务损失的训练过程和基于互信息损失的训练过程并不一定是同时进行的,在本申请一实施例中,基于互信息损失的训练过程也可以是在基于任务损失的训练过程开始之后进行的,本申请对这两个训练过程的具体执行顺序不做严格限定。It should be understood that the foregoing training process based on task loss and the training process based on mutual information loss are not necessarily performed at the same time. In an embodiment of the present application, the training process based on mutual information loss may also be performed during training based on task loss. After the process starts, this application does not strictly limit the specific execution sequence of the two training processes.
本申请通过根据任务损失和互信息损失来训练神经网络,不仅可使得分解出的域不变特征更加精准的与实例对应,还可在训练的过程中减少域不变特征和域特定特征之间的互信息损失,以促进域不变特征和域特定特征的完全解耦,进一步降低域特定特征对域不变特征的影响。This application trains the neural network based on the task loss and the mutual information loss, which not only makes the decomposed domain invariant features more accurately correspond to the instance, but also reduces the gap between the domain invariant features and the domain specific features during the training process. In order to promote the complete decoupling of domain-invariant features and domain-specific features, the influence of domain-specific features on domain-invariant features is further reduced.
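A schematic training step that combines the two loss terms is sketched below; the task head, the statistics network, the 0.1 weighting coefficient and all tensor shapes are illustrative placeholders rather than values from this application.

import torch
import torch.nn as nn
import torch.nn.functional as F

dir_head = nn.Linear(64, 2)                                     # task head applied to the domain-invariant feature
mine_net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(list(dir_head.parameters()) + list(mine_net.parameters()), lr=0.01)

dir_feat = torch.randn(8, 64, requires_grad=True)               # decoupled features (placeholders)
dsr_feat = torch.randn(8, 64, requires_grad=True)
labels = torch.randint(0, 2, (8,))

task_loss = F.cross_entropy(dir_head(dir_feat), labels)         # task loss from the DIR-based prediction
joint = mine_net(torch.cat([dir_feat, dsr_feat], dim=1)).mean()
marg = mine_net(torch.cat([dir_feat, dsr_feat[torch.randperm(8)]], dim=1))
mi_loss = joint - (torch.logsumexp(marg, dim=0) - torch.log(torch.tensor(8.0))).squeeze()

loss = task_loss + 0.1 * mi_loss                                # 0.1 is an arbitrary illustrative weight
optimizer.zero_grad()
loss.backward()
optimizer.step()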
应当理解,虽然上面以互信息损失为例表征了域不变特征和域特定特征的相关性,在本申请的其他实施例中,也可采用其他形式的损失信息来表征域不变特征和域特定特征的相关性;然后基于根据任务损失和该其他形式的损失信息训练神经网络,以使得提取到的域不变特征能够更好的与域特定特征区分开,起到迫使特征解耦的作用。在本申请一实施例中,可以计算域不变特征和域特定特征之间的如下损失信息中的一种或多种组合:互信息损失、度量损失(例如,L1距离或L2距离)、衡量数据分布的损失(例如KL(kullback-leibler)散度)和瓦瑟斯坦(wasserstein)距离。本申请对该用于表征域不变特征和域特定特征之间的相关性的损失信息的形式不做严格限定。It should be understood that although the mutual information loss is taken as an example above to characterize the correlation between domain invariant features and domain specific features, in other embodiments of the present application, other forms of loss information may also be used to represent domain invariant features and domains. The correlation of specific features; then based on the task loss and the other forms of loss information training neural network, so that the extracted domain invariant features can be better distinguished from the domain specific features, play a role in forcing feature decoupling . In an embodiment of the present application, one or more combinations of the following loss information between domain invariant features and domain specific features can be calculated: mutual information loss, metric loss (for example, L1 distance or L2 distance), measurement Loss of data distribution (such as KL (kullback-leibler) divergence) and wasserstein distance. This application does not strictly limit the form of loss information used to characterize the correlation between domain invariant features and domain specific features.
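For reference, the simpler candidates mentioned above can be computed directly on feature tensors, for example as follows; the normalization of the features into probability-like distributions for the KL term, and the per-dimension one-dimensional approximation of the Wasserstein distance, are assumptions made only for this illustration.

import torch
import torch.nn.functional as F

dir_feat = torch.randn(8, 64)   # placeholder domain-invariant features
dsr_feat = torch.randn(8, 64)   # placeholder domain-specific features

l1 = (dir_feat - dsr_feat).abs().mean()                        # L1 (metric) distance
l2 = (dir_feat - dsr_feat).pow(2).mean()                       # squared L2 distance
log_p = F.log_softmax(dir_feat, dim=1)                         # features turned into log-probabilities
q = F.softmax(dsr_feat, dim=1)                                 # features turned into probabilities
kl = F.kl_div(log_p, q, reduction='batchmean')                 # KL divergence between the induced distributions
w1 = (dir_feat.sort(dim=0).values - dsr_feat.sort(dim=0).values).abs().mean()  # per-dimension 1-D Wasserstein-1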
在本申请一实施例中,该神经网络可用于进行域自适应学习,训练数据可来自不同领域(例如,不同风格)的图像数据,例如照片写实风格、漫画风格等。通过提取不同风格的图像数据的域不变特征和域特定特征,并基于使用域不变特征执行任务得到的任务损失对神经网络进行训练,使得域不变特征能够与域特定特征解耦。由于使用域不变特征来执行任务,本申请的训练方法得到的神经网络可通过域自适应学习以自行适应对于多种不同领域图像的处理任务,例如目标检测/识别/分割等,从而实现对不同领域的图像数据的自适应处理。In an embodiment of the present application, the neural network can be used for domain adaptive learning, and the training data can come from image data of different domains (for example, different styles), such as photorealistic style, comic style, etc. By extracting domain-invariant features and domain-specific features of image data of different styles, and training the neural network based on the task loss obtained by using domain-invariant features to perform tasks, the domain-invariant features can be decoupled from domain-specific features. Since domain invariant features are used to perform tasks, the neural network obtained by the training method of the present application can adapt to various image processing tasks in different fields through domain adaptive learning, such as target detection/recognition/segmentation, etc., so as to achieve Adaptive processing of image data in different fields.
虽然在上面的描述中提到，域特定特征提取器在基于互信息损失的训练过程中也会得到训练，但考虑到当域特定特征提取器对于域特定特征的提取精度被提高时，基于互信息损失的训练过程可以有效地将域特定特征和域不变特征区分开，域不变特征的提取精度也会间接地得到进一步提升。因此有必要进一步提升域特定特征的提取精度，以通过基于互信息损失的训练过程来间接地提高域不变特征提取器的提取精度。Although it is mentioned in the above description that the domain-specific feature extractor is also trained in the training process based on the mutual information loss, when the extraction accuracy of the domain-specific feature extractor for the domain-specific feature is improved, the training process based on the mutual information loss can more effectively distinguish the domain-specific feature from the domain-invariant feature, and the extraction accuracy of the domain-invariant feature is in turn further improved indirectly. Therefore, it is necessary to further improve the extraction accuracy of the domain-specific feature, so as to indirectly improve the extraction accuracy of the domain-invariant feature extractor through the training process based on the mutual information loss.
在本申请的一些实施例中,可将域特定特征提取器提取出的域特定特征进行域分类, 得到域分类损失,然后根据任务损失、互信息损失和域分类损失来训练神经网络。In some embodiments of the present application, the domain-specific features extracted by the domain-specific feature extractor may be subjected to domain classification to obtain the domain classification loss, and then the neural network can be trained according to the task loss, mutual information loss, and domain classification loss.
例如图9所示，域特定特征提取器后面可连接域分类器(domain classifier)，在特征提取器和域分类器之间还设置梯度反转层(GRL)。域特定特征提取器提取出的域特定特征输入到域分类器中分辨该域特定特征是否真的是域特有的特征，以获得域分类损失，该域分类损失表征的其实是域特定特征提取器提取结果的准确程度；然后该域分类损失在向域特定特征提取器的反向传播过程中会经过梯度反转层，以使得域分类损失在反向传播过程中的梯度方向自动取反，以“混淆”域特定特征提取器。由于域分类损失在反向传播的过程中会被自动取反，因此域分类器的目标其实在于混淆域特定特征提取器；而域特定特征提取器的目标则是确保提取出的特征是域特有的特征，通过域分类器和域特定特征提取器之间的这种对抗策略，以最终达到提高域特定特征提取器提取域特定特征的精度的目的。For example, as shown in FIG. 9, a domain classifier can be connected after the domain-specific feature extractor, and a gradient reversal layer (GRL) is further arranged between the feature extractor and the domain classifier. The domain-specific feature extracted by the domain-specific feature extractor is input into the domain classifier to judge whether it is really a domain-specific feature, so as to obtain a domain classification loss; this domain classification loss actually characterizes the accuracy of the extraction result of the domain-specific feature extractor. The domain classification loss then passes through the gradient reversal layer during back-propagation towards the domain-specific feature extractor, so that the gradient direction of the domain classification loss is automatically reversed during back-propagation, thereby "confusing" the domain-specific feature extractor. Since the domain classification loss is automatically reversed during back-propagation, the goal of the domain classifier is in effect to confuse the domain-specific feature extractor, while the goal of the domain-specific feature extractor is to ensure that the extracted features are indeed domain-specific. Through this adversarial strategy between the domain classifier and the domain-specific feature extractor, the accuracy with which the domain-specific feature extractor extracts the domain-specific feature is ultimately improved.
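A standard construction of such a gradient reversal layer and domain classifier is sketched below in PyTorch; it is a generic sketch rather than the exact structure of FIG. 9, and the classifier sizes are placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; the gradient is negated (and scaled) in the backward pass.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

domain_classifier = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))  # two domains: source / target

dsr_feat = torch.randn(8, 64, requires_grad=True)     # domain-specific features (placeholder)
domain_labels = torch.randint(0, 2, (8,))
logits = domain_classifier(GradReverse.apply(dsr_feat, 1.0))
domain_loss = F.cross_entropy(logits, domain_labels)
domain_loss.backward()   # the gradient reaching dsr_feat has its sign flipped by the GRL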
本申请通过引入域分类损失,有助于从训练数据的特征中提取域不变特征。This application introduces domain classification loss, which helps to extract domain invariant features from the features of training data.
在本申请一实施例中,为了进一步促使解耦出的域不变特征和域特定特征能够包含训练数据的全部特征信息,以提高特征解耦的完整性和合理性,可以先从训练数据中提取初始特征,将该初始特征分解成域不变特征和域特定特征,然后训练神经网络,以减小初始特征所包含的信息与域不变特征和域特定特征共同包含的信息之间的差异。In an embodiment of the present application, in order to further promote the decoupling of domain-invariant features and domain-specific features to contain all the feature information of the training data, so as to improve the completeness and rationality of the feature decoupling, the training data can be obtained first Extract the initial features, decompose the initial features into domain-invariant features and domain-specific features, and then train the neural network to reduce the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and domain-specific features .
具体而言，如图9所示，域特定特征和域不变特征在被提取出来后，可使用域不变特征和域特定特征对初始特征进行重建，得到重建特征，然后比较初始特征和该重建特征，以确定初始特征所包含的信息与域不变特征和域特定特征共同包含的信息之间的差异，即重建损失(reconstruction loss)；然后使用该重建损失来训练神经网络，以使得域不变特征提取器所提取出的域不变特征和域特定特征提取器所提取出的域特定特征能够更好地覆盖训练数据的特征信息。Specifically, as shown in FIG. 9, after the domain-specific feature and the domain-invariant feature are extracted, the initial feature can be reconstructed from the domain-invariant feature and the domain-specific feature to obtain a reconstructed feature, and the initial feature is then compared with the reconstructed feature to determine the difference between the information contained in the initial feature and the information jointly contained in the domain-invariant feature and the domain-specific feature, namely the reconstruction loss. The reconstruction loss is then used to train the neural network, so that the domain-invariant feature extracted by the domain-invariant feature extractor and the domain-specific feature extracted by the domain-specific feature extractor can better cover the feature information of the training data.
本申请通过减小初始特征所包含的信息与域不变特征和域特定特征共同包含的信息之间的差异,可使得解耦出的域不变特征和域特定特征能够包含训练数据的全部特征信息,以提高特征解耦的完整性和合理性。This application reduces the difference between the information contained in the initial features and the information jointly contained in the domain invariant features and domain specific features, so that the decoupled domain invariant features and domain specific features can contain all the features of the training data Information to improve the completeness and rationality of feature decoupling.
下面结合图10和图11来进一步描述本申请实施例的域不变特征和域特定特征的提取过程。The following further describes the extraction process of domain invariant features and domain specific features in the embodiments of the present application in conjunction with FIG. 10 and FIG. 11.
图10为本申请一实施例提供的神经网络的结构示意图。如图10所示,该神经网络包括第一解耦器U1和第二解耦器U2,通过第一解耦器U1和第二解耦器U2的共同作用来完成域不变特征和域特定特征的提取过程。图11为本申请一实施例提供的基于图10所示神经网络架构的域不变特征和域特定特征的提取流程示意图。如图11所示,该域不变特征和域特定特征的提取过程可包括如下步骤:FIG. 10 is a schematic structural diagram of a neural network provided by an embodiment of this application. As shown in Figure 10, the neural network includes a first decoupler U1 and a second decoupler U2, through the joint action of the first decoupler U1 and the second decoupler U2 to complete the domain invariant features and domain specific Feature extraction process. FIG. 11 is a schematic diagram of the extraction process of domain invariant features and domain specific features based on the neural network architecture shown in FIG. 10 according to an embodiment of the application. As shown in Figure 11, the extraction process of the domain invariant features and domain specific features may include the following steps:
步骤1101:从训练数据中提取训练数据的第一特征。Step 1101: Extract the first feature of the training data from the training data.
如图10所示，神经网络中包括特征提取器，该特征提取器用于具体完成从训练数据中提取第一特征，第一特征为后续进行域不变特征增强的特征基础。应当理解，第一特征中的限定词“第一”意味着该特征提取器对训练数据进行的是“初步”的特征提取，例如当训练数据是图像数据时，该第一特征其实是对图像纹理层面进行特征提取得到的结果。As shown in FIG. 10, the neural network includes a feature extractor, which is used to extract the first feature from the training data; the first feature serves as the feature basis for the subsequent domain-invariant feature enhancement. It should be understood that the qualifier "first" in the first feature means that this feature extractor performs a "preliminary" feature extraction on the training data; for example, when the training data is image data, the first feature is actually the result of feature extraction at the image texture level.
步骤1102：采用第一解耦器U1从第一特征中提取初步域不变特征和初步域特定特征。Step 1102: Use the first decoupler U1 to extract preliminary domain invariant features and preliminary domain specific features from the first feature.
第一解耦器U1中包括域不变特征提取器和域特定特征提取器，分别用于从第一特征中提取初步域不变特征和初步域特定特征。初步域不变特征和初步域特定特征各自的提取过程可通过公式(4)表示，即分别以第一特征作为输入，由域不变特征提取器输出初步域不变特征，由域特定特征提取器输出初步域特定特征。The first decoupler U1 includes a domain-invariant feature extractor and a domain-specific feature extractor, which are respectively used to extract the preliminary domain-invariant feature and the preliminary domain-specific feature from the first feature. The respective extraction processes of the preliminary domain-invariant feature and the preliminary domain-specific feature can be expressed by formula (4): each takes the first feature as input, with the domain-invariant feature extractor outputting the preliminary domain-invariant feature and the domain-specific feature extractor outputting the preliminary domain-specific feature.
在本申请一实施例中，如图10所示，可使用互信息损失训练该第一解耦器U1，以保证初步域不变特征和初步域特定特征的提取精度。如前所述，互信息(mutual information,MI)损失表征的是两个变量的相互依赖性，这里的互信息损失表征的是初步域不变特征和初步域特定特征之间的差异。因此根据该互信息损失来调整该第一解耦器U1中网络结构的权重向量，可使得提取到的初步域不变特征能够更好地与初步域特定特征区分开，起到迫使特征解耦的作用。如果该互信息损失较大，则说明目前初步域不变特征和初步域特定特征是较为相关的，即，目前域不变特征提取器所提取到的特征中很可能还包括了初步域特定特征的信息内容，此时则需要调整该第一解耦器U1的网络结构的权重向量以减小该互信息损失。In an embodiment of the present application, as shown in FIG. 10, the first decoupler U1 can be trained using a mutual information loss to ensure the extraction accuracy of the preliminary domain-invariant feature and the preliminary domain-specific feature. As mentioned above, the mutual information (MI) loss characterizes the interdependence of two variables; here, the mutual information loss characterizes the difference between the preliminary domain-invariant feature and the preliminary domain-specific feature. Therefore, adjusting the weight vectors of the network structure in the first decoupler U1 according to this mutual information loss can make the extracted preliminary domain-invariant feature better distinguishable from the preliminary domain-specific feature, thereby forcing the features to decouple. If the mutual information loss is large, it means that the preliminary domain-invariant feature and the preliminary domain-specific feature are still relatively correlated, that is, the features currently extracted by the domain-invariant feature extractor probably still contain information of the preliminary domain-specific feature; in this case, the weight vectors of the network structure of the first decoupler U1 need to be adjusted to reduce the mutual information loss.
在本申请一实施例中，如图10所示，为了进一步提高初步域不变特征的提取精度，在第一解耦器U1中也可使用域分类器(domain classifier)和梯度反转层(GRL)。通过域分类器和域特定特征提取器之间的对抗策略，可以提高域特定特征提取器对于初步域特定特征的提取精度，从而结合互信息损失的训练过程来间接地达到提高初步域不变特征的提取精度的目的。In an embodiment of the present application, as shown in FIG. 10, in order to further improve the extraction accuracy of the preliminary domain-invariant feature, a domain classifier and a gradient reversal layer (GRL) can also be used in the first decoupler U1. Through the adversarial strategy between the domain classifier and the domain-specific feature extractor, the extraction accuracy of the domain-specific feature extractor for the preliminary domain-specific feature can be improved, so that, combined with the training process based on the mutual information loss, the extraction accuracy of the preliminary domain-invariant feature is indirectly improved.
步骤1103:将初步域不变特征与第一特征融合,得到第二特征。Step 1103: The preliminary domain invariant feature is merged with the first feature to obtain the second feature.
将初步域不变特征与第一特征融合得到第二特征F 1的融合过程可通过公式(5)表示，即以初步域不变特征和第一特征作为融合操作的输入，输出第二特征F 1。The fusion process in which the preliminary domain-invariant feature is fused with the first feature to obtain the second feature F 1 can be expressed by formula (5): the preliminary domain-invariant feature and the first feature are taken as the inputs of the fusion operation, and the second feature F 1 is output.
应当理解，特征融合的具体方式可根据实际应用场景的需求进行选择。例如，可以在保持通道数不变的基础上，将初步域不变特征与第一特征进行叠加，以形成通道数不变的第二特征；也可将初步域不变特征与第一特征以连接的方式“拼接”，形成通道数增加的第二特征。本申请对该融合过程的具体实现方式并不做严格限定。It should be understood that the specific manner of feature fusion can be selected according to the requirements of the actual application scenario. For example, the preliminary domain-invariant feature can be superimposed on the first feature while keeping the number of channels unchanged, so as to form a second feature with an unchanged number of channels; alternatively, the preliminary domain-invariant feature and the first feature can be "spliced" by concatenation to form a second feature with an increased number of channels. This application does not strictly limit the specific implementation of the fusion process.
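Both fusion options described above can be written in one line each; the feature shapes below are placeholders used only to make the channel counts concrete.

import torch

first_feat = torch.randn(2, 256, 32, 32)    # first feature (placeholder shape)
prelim_dir = torch.randn(2, 256, 32, 32)    # preliminary domain-invariant feature

fused_add = first_feat + prelim_dir                        # superposition: channel count unchanged (256)
fused_cat = torch.cat([first_feat, prelim_dir], dim=1)     # concatenation: channel count increased (512)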
由于初步域不变特征中包括了与实例对应的域不变特征信息，因此将该初步域不变特征与第一特征融合，实现了在第一特征层面的域不变特征的数据增强，从而使得第一特征中能够包括更多的域不变特征信息，因而使得训练完毕的神经网络能够更好地适应实际应用场景中的领域差异。Since the preliminary domain-invariant feature contains the domain-invariant feature information corresponding to the instance, fusing the preliminary domain-invariant feature with the first feature realizes data enhancement of the domain-invariant feature at the first-feature level, so that the first feature can contain more domain-invariant feature information, which in turn enables the trained neural network to better adapt to domain differences in actual application scenarios.
步骤1104:从第二特征中提取训练数据的第三特征。Step 1104: Extract the third feature of the training data from the second feature.
如图10所示，神经网络中还包括另一特征提取器，该特征提取器用于具体完成从第二特征F 1中提取第三特征，第三特征则作为后续域不变特征和域特定特征的提取基础。应当理解，第三特征中的限定词“第三”意味着第三特征是基于包括了第一特征的第二特征提取的，同时这种提取过程也会更为精细，例如当训练数据是图像数据时，第三特征则可能是提取的表征图像语义层面的特征图。该特征提取过程可通过公式(6)表示，即以第二特征F 1作为该特征提取器的输入，输出第三特征。As shown in FIG. 10, the neural network further includes another feature extractor, which is used to extract the third feature from the second feature F 1; the third feature then serves as the basis for the subsequent extraction of the domain-invariant feature and the domain-specific feature. It should be understood that the qualifier "third" in the third feature means that the third feature is extracted based on the second feature, which includes the first feature, and that this extraction process is also finer; for example, when the training data is image data, the third feature may be an extracted feature map representing the semantic level of the image. This feature extraction process can be expressed by formula (6): the second feature F 1 is taken as the input of this feature extractor, and the third feature is output.
步骤1105:采用第二解耦器U2从第三特征中提取域不变特征和域特定特征。Step 1105: Use the second decoupler U2 to extract domain-invariant features and domain-specific features from the third feature.
第二解耦器U2中包括域不变特征提取器和域特定特征提取器，分别用于从第三特征中提取域不变特征和域特定特征。域不变特征和域特定特征各自的提取过程可通过公式(7)表示，即分别以第三特征作为输入，由域不变特征提取器输出域不变特征，由域特定特征提取器输出域特定特征。The second decoupler U2 includes a domain-invariant feature extractor and a domain-specific feature extractor, which are respectively used to extract the domain-invariant feature and the domain-specific feature from the third feature. The respective extraction processes of the domain-invariant feature and the domain-specific feature can be expressed by formula (7): each takes the third feature as input, with the domain-invariant feature extractor outputting the domain-invariant feature and the domain-specific feature extractor outputting the domain-specific feature.
如图10所示，在获得了域不变特征后，便可使用域不变特征执行任务以得到任务损失(task loss)，并计算域不变特征和域特定特征之间的互信息(mutual information,MI)损失。如前所述，将域不变特征用于执行任务并获得任务损失，可提高域不变特征对于与任务相关的实例表征的准确性和完整性。同时，为了保证域不变特征能够更精准地与实例对应，在训练的过程中还可以计算域不变特征和域特定特征之间的互信息损失，并使用该互信息损失来进一步提高域不变特征提取的精确性。在本申请一实施例中，在该基于任务损失和互信息损失对于神经网络的训练过程中，第一解耦器U1中用于提取初步域不变特征的域不变特征提取器和/或用于提取初步域特定特征的域特定特征提取器也可参与该训练过程中的调参过程，以此来保证第一解耦器U1对于初步域不变特征的提取精度，从而进一步改进第一解耦器U1所实现的域不变特征的数据增强效果。As shown in FIG. 10, after the domain-invariant feature is obtained, it can be used to perform the task to obtain a task loss, and the mutual information (MI) loss between the domain-invariant feature and the domain-specific feature can be calculated. As mentioned above, using the domain-invariant feature to perform the task and obtain the task loss can improve the accuracy and completeness with which the domain-invariant feature represents the task-related instances. At the same time, in order to ensure that the domain-invariant feature corresponds to the instances more accurately, the mutual information loss between the domain-invariant feature and the domain-specific feature can also be calculated during training and used to further improve the accuracy of domain-invariant feature extraction. In an embodiment of the present application, during this training process of the neural network based on the task loss and the mutual information loss, the domain-invariant feature extractor used in the first decoupler U1 to extract the preliminary domain-invariant feature and/or the domain-specific feature extractor used in the first decoupler U1 to extract the preliminary domain-specific feature can also participate in the parameter tuning of this training process, so as to ensure the extraction accuracy of the first decoupler U1 for the preliminary domain-invariant feature, thereby further improving the data enhancement effect of the domain-invariant feature realized by the first decoupler U1.
在本申请一实施例中，如图10所示，为了进一步提高域不变特征的提取精度，在第二解耦器U2中也可使用域分类器和梯度反转层。通过域分类器和域特定特征提取器之间的对抗策略，可以提高域特定特征提取器对于域特定特征的提取精度，从而结合基于互信息损失的训练过程来间接达到提高域不变特征的提取精度的目的。In an embodiment of the present application, as shown in FIG. 10, in order to further improve the extraction accuracy of the domain-invariant feature, a domain classifier and a gradient reversal layer can also be used in the second decoupler U2. Through the adversarial strategy between the domain classifier and the domain-specific feature extractor, the extraction accuracy of the domain-specific feature extractor for the domain-specific feature can be improved, so that, combined with the training process based on the mutual information loss, the extraction accuracy of the domain-invariant feature is indirectly improved.
在本申请一实施例中，为了进一步促使解耦出的域不变特征和域特定特征能够包含训练数据的全部特征信息，以提高特征解耦的完整性和合理性，可训练神经网络，以减小第三特征所包含的信息与域不变特征和域特定特征共同包含的信息之间的差异。具体而言，如图10所示，域不变特征和域特定特征在被提取出来后，可使用域不变特征和域特定特征对第三特征进行重建，得到重建特征，然后比较第三特征和该重建特征，以确定第三特征所包含的信息与域不变特征和域特定特征共同包含的信息之间的差异，即重建损失(reconstruction loss)。在本申请一实施例中，重建损失的计算过程可通过公式(8)表示：以域不变特征和域特定特征作为重建网络R的输入得到重建之后的特征F r，重建损失L recon体现为重建特征F r与第三特征的L2距离。该重建损失被用来训练神经网络，以使得域不变特征和域特定特征能够更好地覆盖训练数据的特征信息。In an embodiment of the present application, in order to further ensure that the decoupled domain-invariant feature and domain-specific feature contain all the feature information of the training data, thereby improving the completeness and rationality of the feature decoupling, the neural network can be trained to reduce the difference between the information contained in the third feature and the information jointly contained in the domain-invariant feature and the domain-specific feature. Specifically, as shown in FIG. 10, after the domain-invariant feature and the domain-specific feature are extracted, the third feature can be reconstructed from the domain-invariant feature and the domain-specific feature to obtain a reconstructed feature, and the third feature is then compared with the reconstructed feature to determine the difference between the information contained in the third feature and the information jointly contained in the domain-invariant feature and the domain-specific feature, namely the reconstruction loss. In an embodiment of the present application, the calculation of the reconstruction loss can be expressed by formula (8): the domain-invariant feature and the domain-specific feature are taken as the inputs of a reconstruction network R to obtain the reconstructed feature F r, and the reconstruction loss L recon is embodied as the L2 distance between the reconstructed feature F r and the third feature. The reconstruction loss is used to train the neural network, so that the domain-invariant feature and the domain-specific feature can better cover the feature information of the training data.
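A minimal sketch of formula (8) in PyTorch is given below; the reconstruction network R is reduced to a single 1x1 convolution, and the use of a mean-squared (L2-style) error as well as all channel sizes are assumptions made only for the illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

R = nn.Conv2d(512, 256, kernel_size=1)      # hypothetical reconstruction network R

third_feat = torch.randn(2, 256, 16, 16)    # third feature (placeholder)
dir_feat = torch.randn(2, 256, 16, 16)      # domain-invariant feature
dsr_feat = torch.randn(2, 256, 16, 16)      # domain-specific feature

f_r = R(torch.cat([dir_feat, dsr_feat], dim=1))   # reconstructed feature F_r
recon_loss = F.mse_loss(f_r, third_feat)          # L2-style distance between F_r and the third feature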
It should be understood that although qualifiers such as "first", "second" and "third" are used in the foregoing description, these qualifiers are only used to set out the technical solution more clearly and to distinguish similar concepts; they cannot in themselves be used to limit the scope of protection of this application.
It can thus be seen that the embodiments of this application may adopt the idea of "two-layer domain-invariant feature decoupling" to train a neural network to extract domain-invariant features. A first feature is obtained, and a preliminary domain-invariant feature is decoupled from it by the first decoupler U1; the preliminary domain-invariant feature is then fused with the first feature to obtain a second feature, so that the domain-invariant feature information is enhanced at the level of the first feature. The second feature is then used to decouple the domain-invariant feature by means of the second decoupler U2, which further improves the decoupling accuracy of the domain-invariant feature, so that the trained neural network has stronger task performance and better domain adaptation capability.
The training process related to the neural network has been described in detail above with reference to Figure 10 and Figure 11. As can be seen from that description, in some implementations the training process of the neural network may include: (1) training related to the task loss and the domain classification loss; (2) training related to the mutual information loss; and (3) training related to the reconstruction loss.
The above three kinds of training may be carried out simultaneously or in stages; the embodiments of this application do not limit this. The training order of the three kinds of training is illustrated below with reference to Figure 12.
As shown in Figure 12, the training process of the neural network can be divided into the following three stages in sequence.
First stage: the neural network is controlled to perform the training related to the task loss and the domain classification loss. This stage aims to let the neural network learn the ability to decompose domain-invariant features and domain-specific features from the training data; the first stage may therefore also be called the feature decomposition stage (stage-fd for short, where fd stands for feature decomposition).
Second stage: the neural network is controlled to perform the training related to the mutual information loss. This stage aims to let the neural network learn the ability to enlarge the difference between domain-invariant features and domain-specific features; the second stage may therefore also be called the feature separation stage (stage-fs for short, where fs stands for feature separation). In the second stage, the parameters of one sub-network shown in Figure 10 may be fixed, or the parameters of two sub-networks shown in Figure 10 may be fixed.
Third stage: the neural network is controlled to perform the training related to the reconstruction loss. This stage aims to make the domain-invariant features and domain-specific features decomposed by the neural network contain as much as possible of the information in the initial features; the third stage may therefore also be called the feature reconstruction stage (stage-fr for short, where fr stands for feature reconstruction).
Carrying out the training of the neural network in stages reduces the amount of training in each stage and speeds up the convergence of the parameters of the neural network.
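A schematic outline of such a staged schedule is sketched below. The loop structure, the per-stage loss callables and the epoch counts are assumptions made for illustration; the losses of each stage are those described above for stage-fd, stage-fs and stage-fr.

```python
def train_in_stages(model, loader, optimizer, stage_losses,
                    epochs_per_stage=(1, 1, 1)):
    """Run the three stages in sequence.

    stage_losses: dict mapping 'fd', 'fs' and 'fr' to a callable
    (model, batch) -> scalar loss implementing, respectively, the task +
    domain classification losses, the mutual information loss and the
    reconstruction loss."""
    for stage, n_epochs in zip(("fd", "fs", "fr"), epochs_per_stage):
        # In stage 'fs', some sub-networks may additionally be frozen
        # (e.g. via requires_grad_(False)), as described above for Figure 10.
        for _ in range(n_epochs):
            for batch in loader:
                loss = stage_losses[stage](model, batch)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```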
Figure 13 is a schematic structural diagram of a neural network provided by an embodiment of this application. The neural network is trained using the training method provided by the above embodiments of this application. As shown in Figure 13, the neural network 130 includes:
a first feature extraction layer 1301, configured to extract a first feature based on the input data;
a first domain-invariant feature decoupling layer 1302, configured to extract a first domain-invariant feature based on the first feature;
a feature fusion layer 1303, configured to fuse the first feature and the first domain-invariant feature to obtain a second feature;
a second feature extraction layer 1304, configured to extract a third feature based on the second feature;
a second domain-invariant feature decoupling layer 1305, configured to extract a second domain-invariant feature based on the third feature.
The first domain-invariant feature and the second domain-invariant feature are each features that are independent of the domain to which the input data belongs, while the first domain-specific feature and the second domain-specific feature are each features that characterize the domain to which the input data belongs.
It can thus be seen that although, in the training process shown in Figure 10 and Figure 11, domain-specific features are extracted in order to compute the mutual information loss and the domain classification loss, which gives the neural network the ability to decompose domain-invariant features and domain-specific features, the trained neural network shown in Figure 13 does not actually need to extract domain-specific features in actual use. After the first feature extraction layer 1301 extracts the first feature, the first domain-invariant feature is extracted based on the first feature and fused with the first feature to achieve domain-invariant feature enhancement; the second domain-invariant feature is then further extracted based on the second feature. The extracted second domain-invariant feature can accurately correspond to the instances, so that the neural network performs better when executing specific tasks and has better domain adaptation capability.
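The following sketch illustrates what the inference-time forward pass of a network organized as in Figure 13 might look like. It assumes PyTorch modules as stand-ins for layers 1301 to 1305 and element-wise addition as the fusion operation of layer 1303; the actual internal architectures and fusion operation are not restated here.

```python
import torch
import torch.nn as nn

class DomainInvariantNet(nn.Module):
    """Inference-time structure sketched after Figure 13; the five layers are
    stand-in modules whose internal architectures are not specified here."""
    def __init__(self, extractor1: nn.Module, di_decoupler1: nn.Module,
                 extractor2: nn.Module, di_decoupler2: nn.Module):
        super().__init__()
        self.first_feature_extraction = extractor1     # layer 1301
        self.first_di_decoupling = di_decoupler1       # layer 1302
        self.second_feature_extraction = extractor2    # layer 1304
        self.second_di_decoupling = di_decoupler2      # layer 1305

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = self.first_feature_extraction(x)          # first feature
        di1 = self.first_di_decoupling(f1)             # first domain-invariant feature
        f2 = f1 + di1                                   # layer 1303: fusion (element-wise addition assumed)
        f3 = self.second_feature_extraction(f2)        # third feature
        return self.second_di_decoupling(f3)           # second domain-invariant feature
```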
In an embodiment of this application, it is considered that in domain-adaptive learning scenarios, conventional training data usually comes from the source domain and/or the target domain, whereas what domain-adaptive learning actually addresses is the cross-domain transfer capability of the neural network. To improve the domain generalization capability of the neural network, training must be based not only on the feature information of the source domain but also on the feature information of the target domain. Therefore, when training the neural network, training data of an intermediate domain lying between the source domain and the target domain may be added to the training data. By generating training data located in the intermediate domain, the "domain gap" between the source domain and the target domain is filled, alleviating the problem of a large distribution difference between the training data of the source domain and that of the target domain.
Figure 14 is a schematic flowchart of obtaining data of the intermediate domain according to an embodiment of this application. Figure 15 is a schematic diagram of the principle of obtaining data of the intermediate domain according to an embodiment of this application. As shown in Figure 14 and Figure 15, the process of obtaining data of the intermediate domain may include the following steps.
Step 1401: obtain data of the source domain and/or data of the target domain.
The source domain and the target domain are two domains whose data characteristics differ, and the difference in data characteristics between the intermediate domain and either of the source domain and the target domain is smaller than the difference in data characteristics between the source domain and the target domain. The data of the intermediate domain is generated by adding perturbations on the basis of the data of the source domain and/or the target domain, so the data of the source domain and/or the target domain must be obtained first.
Step 1402: input the data of the source domain and/or the data of the target domain into a neural network for training, so as to obtain gradient information of the loss function.
Since the data to be generated lies in the intermediate domain between the source domain and the target domain, the gradient information of the loss function needs to be obtained to guide the subsequent perturbation process that generates the intermediate-domain data.
Step 1403: perturb the data of the source domain and/or the data of the target domain according to the gradient information to obtain data of the intermediate domain.
New data is generated by perturbing the data of the source domain or the data of the target domain, and this newly generated data can be used as the data of the intermediate domain.
In this application, the introduction of directional information between the source domain and the target domain makes the perturbation of the data more targeted. The intermediate-domain data obtained through the perturbation can fill the "domain gap" between the source domain and the target domain and alleviate the problem of a large distribution difference between the data of the source domain and that of the target domain. In an embodiment of this application, the data of the source domain, the data of the target domain and the data of the intermediate domain may be used as training data to train the neural network, so that the trained neural network has better domain adaptability.
In an embodiment of this application, as shown in Figure 15, the labeled data X_s of the source domain may be input into the neural network TNet for training so as to obtain gradient information of the loss function.
Specifically, the neural network TNet is generated by training on the labeled data X_l of the target domain and may include a feature extractor F_T and a classifier C_T. During training, the feature information P_T extracted by the feature extractor F_T is input into the classifier C_T to obtain the cross-entropy loss L_ce of the classification task, which guides the parameter tuning of TNet. Since TNet is obtained by computing the task loss on the input X_l and adjusting the network parameters accordingly, it is in fact better suited to the target domain; inputting X_s into TNet therefore produces first gradient information pointing from the source domain to the target domain. At this point, X_s is treated as an optimizable object: according to the first gradient information back-propagated from the task loss, a gradient perturbation of a certain magnitude is superimposed on X_s, and the new samples obtained after superimposing this source-to-target perturbation can be used as intermediate-domain data, shown as AAT in Figure 15.
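A minimal sketch of this source-to-target perturbation is given below. It assumes a PyTorch classifier for TNet and a sign-based gradient step; the application only states that a gradient perturbation of a certain magnitude is superimposed, so the step form and the step size are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def aat_samples(tnet, x_s, y_s, step_size=0.01):
    """Source-to-target perturbation (AAT in Figure 15), as a sketch.
    tnet maps an input batch to classification logits and is assumed to have been
    trained on labeled target-domain data; x_s, y_s are labeled source-domain samples."""
    x = x_s.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(tnet(x), y_s)       # task loss of the source samples under TNet
    loss.backward()                             # first gradient information (source -> target)
    with torch.no_grad():
        # Superimpose a gradient perturbation of a certain magnitude on X_s.
        # The sign()-based ascent step follows the usual adversarial-example
        # convention and is an assumption.
        x_mid = x + step_size * x.grad.sign()
    return x_mid.detach()                       # candidate intermediate-domain data
```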
In this application, the neural network TNet is generated by training on the labeled data of the target domain. Therefore, the back-propagated first gradient information obtained after inputting the labeled data of the source domain into this neural network is a good measure of the direction from the source domain to the target domain.
In another embodiment of this application, as shown in Figure 15, the unlabeled data X_u of the target domain may be input into the neural network HNet. Since X_u carries no labels, virtual adversarial training may be used to obtain the gradient information.
Specifically, the neural network HNet may be generated by training on the labeled data X_s of the source domain. Similar to the architecture of TNet, HNet may include a feature extractor F_H and a classifier C_H. During training, the feature information P_H extracted by the feature extractor F_H is input into the classifier C_H to obtain the cross-entropy loss L_ce of the classification task, which guides the parameter tuning of HNet: X_s is input into HNet to compute the task loss, and the network parameters of HNet are updated according to the task loss. In a further embodiment, the labeled data X_l of the target domain may also be used, together with the labeled data X_s of the source domain, to train the neural network HNet, so as to further improve the accuracy with which HNet performs the task.
After the unlabeled target-domain data X_u is input into HNet, the virtual adversarial training method is used to generate predicted virtual labels, the task loss is computed based on the virtual labels, and a gradient perturbation of a certain magnitude is produced on X_u according to the second gradient information back-propagated from the task loss. The new samples obtained after superimposing this target-to-source perturbation can be used as intermediate-domain data, shown as E-VAT in Figure 15.
In this application, the neural network HNet is generated by training on the labeled data of the source domain and the labeled data of the target domain. Therefore, the back-propagated second gradient information obtained through virtual adversarial training after inputting the unlabeled data of the target domain into this neural network is a good measure of the direction from the target domain to the source domain.
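A corresponding sketch for the target-to-source perturbation is given below. The hard virtual labels and the sign-based step follow the common virtual adversarial training recipe and are assumptions; the application does not restate these details here.

```python
import torch
import torch.nn.functional as F

def evat_samples(hnet, x_u, step_size=0.01):
    """Target-to-source perturbation (E-VAT in Figure 15), as a sketch.
    hnet maps an input batch to classification logits and is assumed to have been
    trained on labeled source-domain (and optionally target-domain) data;
    x_u are unlabeled target-domain samples."""
    with torch.no_grad():
        virtual_labels = hnet(x_u).argmax(dim=1)      # predicted virtual labels
    x = x_u.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(hnet(x), virtual_labels)   # task loss against the virtual labels
    loss.backward()                                    # second gradient information (target -> source)
    with torch.no_grad():
        x_mid = x + step_size * x.grad.sign()         # superimpose a gradient perturbation on X_u
    return x_mid.detach()                              # candidate intermediate-domain data
```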
In another embodiment of this application, when both the data of the source domain and the data of the target domain carry labels, the labeled data X_l of the target domain may also be input into an auxiliary neural network to obtain gradient information of the loss function.
The auxiliary neural network is generated by training on the labeled data X_s of the source domain. Since the auxiliary neural network is obtained by computing the task loss on the input X_s and adjusting the network parameters accordingly, it is in fact better suited to the source domain, and inputting X_l into it produces gradient information pointing from the target domain to the source domain. At this point, X_l is treated as an optimizable object: according to the gradient information back-propagated from the task loss, a gradient perturbation of a certain magnitude is superimposed on X_l, and the new samples obtained after superimposing this target-to-source perturbation can also be used as intermediate-domain data.
It can thus be seen that the embodiment shown in Figure 15 in fact proposes a "bidirectional adversarial training" scheme for generating intermediate-domain data: the gradient information of a network is used to guide the perturbation direction of the samples, and the samples generated after superimposing the perturbation are used as intermediate-domain data. For example, in Figure 16, circles and triangles represent different sample categories. Gradient information can be used to obtain the perturbation direction from the source domain to the target domain (the left-to-right arrows in Figure 16), and perturbations are added to the source-domain data to generate intermediate-domain data; at the same time, gradient information can also be used to obtain the perturbation direction from the target domain to the source domain (the right-to-left arrows in Figure 16), and perturbations are then added to the target-domain data. Specifically, an auxiliary network obtained through training can give the gradient direction from the source domain to the target domain or from the target domain to the source domain, and this gradient direction is used to perturb the source-domain data or the target-domain data to generate adversarial samples; virtual adversarial training can also be used to generate adversarial samples from the target domain towards the source domain. In this way, adversarial samples are generated bidirectionally in the "domain gap" between the source domain and the target domain, constructing the intermediate domain.
It should be understood, however, that depending on the domain-adaptive learning scenario, it is also possible to obtain only data superimposed with the source-to-target perturbation as the intermediate-domain data, or only data superimposed with the target-to-source perturbation as the intermediate-domain data. For example, in an unsupervised learning scenario the data of the target domain carries no labels, so the neural network TNet cannot be trained on labeled target-domain data X_l; in this case only data superimposed with the target-to-source perturbation is obtained as the intermediate-domain data.
In an embodiment of this application, the obtained intermediate-domain data, together with the source-domain data and the target-domain data, can be input into the neural network shown in Figure 9 and trained with the neural network training method provided by the embodiments of this application, thereby combining "bidirectional adversarial training" with "two-layer domain-invariant feature decoupling". Since the data used for feature decoupling includes the intermediate-domain data, the source-domain data and the target-domain data are effectively supplemented and the difference between the source domain and the target domain is reduced; using the intermediate-domain data as training data for the feature decoupling training substantially improves the domain-invariant feature decoupling capability, so that the domain generalization performance and cross-domain transfer capability of the trained neural network are improved even more significantly.
In an embodiment of this application, in order to further improve the robustness of the trained neural network when performing its task, as shown in Figure 15, after the neural network HNet is generated by training on the labeled source-domain data X_s, random noise perturbations may also be generated in the vicinity of X_s and superimposed on X_s to generate adversarial samples within the neighborhood. These neighborhood adversarial samples are also input into the neural network for training as part of the training data. In an embodiment of this application, the adversarial samples in the neighborhood may be input into HNet; the feature maps extracted from them by the feature extractor F_H in HNet are input into the classifier C_H to obtain the cross-entropy loss L_at of the classification task, which guides the adjustment of HNet's network parameters, so that HNet is further trained. In a further embodiment, when the labeled target-domain data X_l also participates in the training of HNet, random noise perturbations may likewise be generated in the vicinity of X_l and superimposed on X_l to supplement the adversarial samples in the neighborhood.
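A sketch of this neighborhood-sample training step is shown below; the Gaussian noise model, its magnitude and the optimizer interface are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def neighborhood_step(hnet, optimizer, x, y, noise_std=0.05):
    """One training step of HNet on neighborhood samples, as a sketch.
    Random noise is superimposed on the labeled samples x (X_s, and optionally X_l)
    to form samples in their neighborhood; the cross-entropy loss L_at on these
    samples then guides the adjustment of HNet's parameters."""
    x_neigh = x + noise_std * torch.randn_like(x)   # random noise perturbation near x
    loss_at = F.cross_entropy(hnet(x_neigh), y)     # L_at
    optimizer.zero_grad()
    loss_at.backward()
    optimizer.step()
    return loss_at.item()
```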
It can thus be seen that the embodiments of this application can also generate adversarial samples in the neighborhood based on the data of the source domain and the target domain, so as to effectively supplement the source-domain and target-domain data and reduce the difference between the source domain and the target domain, further improving the domain generalization performance and cross-domain transfer capability of the trained neural network.
Figure 17 is a schematic structural diagram of a data processing system provided by an embodiment of this application. As shown in Figure 17, the data processing system 170 is used to train a neural network and includes a data acquisition network 1701 and a feature decoupling network 1702.
The data acquisition network 1701 is configured to obtain gradient information of the loss function based on first data, and to perturb the input data according to the gradient information to obtain second data. By obtaining adversarial samples that fill the "domain gap" of the first data as the new second data, the training process can achieve better domain adaptability.
The feature decoupling network 1702 is configured to train the neural network with training data that includes the second data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data.
In an embodiment of this application, the feature decoupling network 1702 includes: a first feature extraction layer 17021, configured to extract a first feature based on the training data; a first domain-invariant feature extraction layer 17022, configured to extract a first domain-invariant feature based on the first feature; a first domain-specific feature extraction layer 17023, configured to extract a first domain-specific feature based on the first feature; a first mutual information loss acquisition layer 17024, configured to obtain a first mutual information loss based on the first domain-invariant feature and the first domain-specific feature; a feature fusion layer 17025, configured to fuse the first feature and the first domain-invariant feature to obtain a second feature; a second feature extraction layer 17026, configured to extract a third feature based on the second feature; a second domain-invariant feature decoupling layer 17027, configured to extract a second domain-invariant feature based on the third feature; a second domain-specific feature extraction layer 17028, configured to extract a second domain-specific feature based on the third feature; a second mutual information loss acquisition layer 17029, configured to obtain a second mutual information loss based on the second domain-invariant feature and the second domain-specific feature; and a task loss acquisition layer 17030, configured to perform a task using the second domain-invariant feature to obtain a task loss.
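The following sketch shows how one forward pass through these layers could produce the two mutual information losses and the task loss. The dictionary-based wiring, the module names and the callable loss functions are assumptions made for illustration; the exact forms of the mutual information loss and the task loss are those defined elsewhere in this application.

```python
def decoupling_losses(layers, x, y, mutual_info_loss, task_loss):
    """One forward pass through the feature decoupling network 1702, as a sketch.
    `layers` is a dict of stand-in modules keyed by short names for layers
    17021-17030; `mutual_info_loss` and `task_loss` are callables supplied by
    the caller."""
    f1 = layers["feat1"](x)             # 17021: first feature
    di1 = layers["di1"](f1)             # 17022: first domain-invariant feature
    ds1 = layers["ds1"](f1)             # 17023: first domain-specific feature
    l_mi1 = mutual_info_loss(di1, ds1)  # 17024: first mutual information loss
    f2 = layers["fuse"](f1, di1)        # 17025: second feature (fusion)
    f3 = layers["feat2"](f2)            # 17026: third feature
    di2 = layers["di2"](f3)             # 17027: second domain-invariant feature
    ds2 = layers["ds2"](f3)             # 17028: second domain-specific feature
    l_mi2 = mutual_info_loss(di2, ds2)  # 17029: second mutual information loss
    l_task = task_loss(di2, y)          # 17030: task loss from the second domain-invariant feature
    return l_task, l_mi1, l_mi2
```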
In an embodiment of this application, the data processing system 170 may further include: a first domain classifier 17031, configured to perform a classification task based on the first domain-specific feature to obtain a first classification loss; and a first gradient reversal layer 17032, configured to invert the gradient information of the first classification loss;
and/or, the data processing system 170 may further include: a second domain classifier 17033, configured to perform a classification task based on the second domain-specific feature to obtain a second classification loss; and a second gradient reversal layer 17034, configured to invert the gradient information of the second classification loss.
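For illustration, a standard gradient reversal layer of the kind described here can be sketched as follows; a PyTorch autograd function is assumed, and the scaling factor lambd is an assumed hyper-parameter rather than a value specified in this application.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Standard gradient reversal: identity in the forward pass, negated
    (and optionally scaled) gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    # Placed between the domain-specific feature and the domain classifier, so that
    # the gradient of the domain classification loss is inverted before it reaches
    # the preceding feature extraction layers.
    return GradReverse.apply(x, lambd)
```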
In an embodiment of this application, the data processing system 170 may further include a reconstruction loss acquisition layer 17035, configured to reconstruct the third feature using the second domain-invariant feature and the second domain-specific feature to obtain a reconstructed feature, and to compare the third feature with the reconstructed feature to obtain a reconstruction loss.
In an embodiment of this application, the first data includes data of the source domain and/or data of the target domain. The data acquisition network 1701 includes: a first training network generated by training on labeled data of the target domain; and/or a second training network generated by training on labeled data. In an embodiment of this application, the first training network or the second training network may include a feature extractor and a classifier. During training, the feature information extracted by the feature extractor is input into the classifier to obtain the cross-entropy loss of the classification task, which guides the parameter tuning of the first training network or the second training network.
The specific functions and operations of the modules of the above data processing system 170 have already been described in detail in the neural network training method above, so their repeated description is omitted here.
It can thus be seen that the data processing system 170 shown in Figure 17 combines "adversarial training to fill the domain gap" with "two-layer domain-invariant feature decoupling". Since the training data used for feature decoupling includes data that can fill the domain gap of the first data, the original training data is effectively supplemented and the difference between training data of different domains is reduced; performing the feature decoupling training on the data output by the data acquisition network substantially improves the domain-invariant feature decoupling capability, so that the domain generalization performance and cross-domain transfer capability of the trained neural network are improved even more significantly.
Figure 18 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of this application. As shown in Figure 18, the neural network training apparatus 180 includes:
an acquisition module 1801, configured to acquire training data;
a training module 1802, configured to train a neural network using the training data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data.
The neural network training apparatus 180 provided by the embodiments of this application decomposes domain-invariant features and domain-specific features from the training data, and the neural network obtained by the training method of this application performs its task using the domain-invariant features. This avoids the influence of domain-specific features on the neural network and thus improves the transfer performance of the neural network between different domains.
In an embodiment of this application, the training module 1802 is configured to decompose domain-invariant features and domain-specific features from the training data; perform a task using the domain-invariant features to obtain a task loss, and compute a mutual information loss between the domain-invariant features and the domain-specific features, the mutual information loss representing the difference between the domain-invariant features and the domain-specific features; and train the neural network according to the task loss and the mutual information loss.
In an embodiment of this application, the training module 1802 is further configured to perform domain classification using the domain-specific features to obtain a domain classification loss, and to train the neural network according to the task loss, the mutual information loss and the domain classification loss.
In an embodiment of this application, the training module 1802 is further configured to extract initial features from the training data; decompose the initial features into the domain-invariant features and the domain-specific features; and train the neural network to reduce the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
In an embodiment of this application, the training module 1802 is configured to reconstruct the initial features using the domain-invariant features and the domain-specific features to obtain reconstructed features, and to compare the initial features with the reconstructed features to determine the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
In an embodiment of this application, the training module 1802 is further configured to reconstruct the initial features using the domain-invariant features and the domain-specific features to obtain reconstructed features, where the domain-invariant features and the domain-specific features are features decomposed from the initial features; and to compare the initial features with the reconstructed features to obtain a reconstruction loss, the reconstruction loss characterizing the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
The training module is configured to train the neural network in a first stage according to the task loss, in a second stage according to the mutual information loss, and in a third stage according to the reconstruction loss.
In an embodiment of this application, the neural network includes a first decoupler and a second decoupler, and the training module 1802 is configured to extract a first feature of the training data from the training data; extract a preliminary domain-invariant feature and a preliminary domain-specific feature from the first feature using the first decoupler; fuse the preliminary domain-invariant feature with the first feature to obtain a second feature; extract a third feature of the training data from the second feature; and extract the domain-invariant features and the domain-specific features from the third feature using the second decoupler.
In an embodiment of this application, the training module 1802 is further configured to train the neural network to reduce the difference between the information contained in the third feature and the information jointly contained in the domain-invariant features and the domain-specific features.
Figure 19 is a schematic structural diagram of a data acquisition apparatus provided by an embodiment of this application. As shown in Figure 19, the data acquisition apparatus 190 includes:
a data acquisition module 1901, configured to acquire data of the source domain and/or data of the target domain, where the source domain and the target domain are two domains whose data characteristics differ, and the difference in data characteristics between the intermediate domain and either of the source domain and the target domain is smaller than the difference in data characteristics between the source domain and the target domain;
a gradient information acquisition module 1902, configured to input the data of the source domain and/or the data of the target domain into a neural network for training, so as to obtain gradient information of the loss function;
an intermediate-domain data generation module 1903, configured to perturb the data of the source domain and/or the data of the target domain according to the gradient information to obtain data of the intermediate domain.
In an embodiment of this application, the gradient information acquisition module 1902 is configured to input labeled data of the source domain into a first neural network for training to obtain first gradient information, where the first neural network is generated by training on labeled data of the target domain.
In an embodiment of this application, the gradient information acquisition module 1902 is configured to input unlabeled data of the target domain into a second neural network and train it by way of virtual adversarial training to obtain second gradient information, where the second neural network is generated by training on labeled data.
The specific functions and operations of the modules in the above training apparatus 180 and data acquisition apparatus 190 have already been described in detail in the methods described above, so their repeated description is omitted here.
Figure 20 is a schematic diagram of the hardware structure of a neural network training apparatus provided by an embodiment of this application. The neural network training apparatus 2000 shown in Figure 20 (which may specifically be a computer device) includes a memory 2001, a processor 2002, a communication interface 2003 and a bus 2004. The memory 2001, the processor 2002 and the communication interface 2003 are communicatively connected to one another through the bus 2004.
The memory 2001 may be a read-only memory (ROM), a static storage device, a dynamic storage device or a random access memory (RAM). The memory 2001 may store a program; when the program stored in the memory 2001 is executed by the processor 2002, the processor 2002 and the communication interface 2003 are used to execute the steps of the neural network training method of the embodiments of this application.
The processor 2002 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU) or one or more integrated circuits, and is used to execute related programs so as to implement the functions to be performed by the units in the neural network training apparatus of the embodiments of this application, or to execute the neural network training method of the method embodiments of this application.
The processor 2002 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the neural network training method of this application may be completed by integrated logic circuits of hardware in the processor 2002 or by instructions in the form of software. The processor 2002 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory or a register. The storage medium is located in the memory 2001; the processor 2002 reads the information in the memory 2001 and, in combination with its hardware, completes the functions to be performed by the units included in the neural network training apparatus of the embodiments of this application, or executes the neural network training method of the method embodiments of this application.
The communication interface 2003 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 2000 and other devices or a communication network. For example, the training data may be obtained through the communication interface 2003.
The bus 2004 may include a path for transferring information between the components of the apparatus 2000 (for example, the memory 2001, the processor 2002 and the communication interface 2003).
It should be understood that the acquisition module 1801 and the training module 1802 in the neural network training apparatus 180, or the data acquisition module 1901, the gradient information acquisition module 1902, the intermediate-domain data generation module 1903 and the training execution module 1904 in the data acquisition apparatus 190, may correspond to the processor 2002.
It should be noted that although the apparatus 2000 shown in Figure 20 only shows a memory, a processor and a communication interface, those skilled in the art should understand that in a specific implementation the apparatus 2000 also includes other components necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the apparatus 2000 may also include hardware components implementing other additional functions. In addition, those skilled in the art should understand that the apparatus 2000 may also include only the components necessary to implement the embodiments of this application, and need not include all the components shown in Figure 20.
It can be understood that the apparatus 2000 corresponds to the training device 220 in Figure 2. Those of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods for each specific application to implement the described functions, but such implementations should not be considered to go beyond the scope of this application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; for example, the division of the units is only a logical functional division, and there may be other divisions in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk.
The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (33)

1. A neural network training method, characterized in that the method comprises:
    obtaining training data;
    training a neural network using the training data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data;
    wherein the domain-specific features are features that characterize the domain to which the training data belongs, and the domain-invariant features are features that are independent of the domain to which the training data belongs.
2. The method according to claim 1, characterized in that training the neural network using the training data comprises:
    decomposing domain-invariant features and domain-specific features from features of the training data;
    performing a task using the domain-invariant features to obtain a task loss, and computing a mutual information loss between the domain-invariant features and the domain-specific features, wherein the task loss characterizes the gap between the result obtained by performing the task using the domain-invariant features and the task label, and the mutual information loss represents the difference between the domain-invariant features and the domain-specific features;
    training the neural network according to the task loss and the mutual information loss.
3. The method according to claim 2, characterized in that the method further comprises:
    performing domain classification using the domain-specific features to obtain a domain classification loss;
    wherein training the neural network according to the task loss and the mutual information loss comprises:
    training the neural network according to the task loss, the mutual information loss and the domain classification loss.
4. The method according to claim 2 or 3, characterized in that decomposing domain-invariant features and domain-specific features from the features of the training data comprises:
    extracting initial features from the training data;
    decomposing the initial features into the domain-invariant features and the domain-specific features,
    wherein the method further comprises:
    training the neural network to reduce the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
5. The method according to claim 4, characterized in that, before training the neural network to reduce the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features, the method further comprises:
    reconstructing the initial features using the domain-invariant features and the domain-specific features to obtain reconstructed features;
    comparing the initial features with the reconstructed features to determine the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
6. The method according to claim 2, characterized in that the method further comprises:
    reconstructing initial features using the domain-invariant features and the domain-specific features to obtain reconstructed features, wherein the domain-invariant features and the domain-specific features are features decomposed from the initial features;
    comparing the initial features with the reconstructed features to obtain a reconstruction loss, the reconstruction loss characterizing the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features,
    wherein training the neural network according to the task loss and the mutual information loss comprises:
    training the neural network in a first stage according to the task loss;
    training the neural network in a second stage according to the mutual information loss,
    wherein the method further comprises:
    training the neural network in a third stage according to the reconstruction loss.
7. The method according to claim 2 or 3, characterized in that the neural network comprises a first decoupler and a second decoupler, and decomposing domain-invariant features and domain-specific features from the features of the training data comprises:
    extracting a first feature of the training data from the training data;
    extracting a preliminary domain-invariant feature and a preliminary domain-specific feature from the first feature using the first decoupler;
    fusing the preliminary domain-invariant feature with the first feature to obtain a second feature;
    extracting a third feature of the training data from the second feature;
    extracting the domain-invariant features and the domain-specific features from the third feature using the second decoupler.
8. The method according to claim 7, characterized in that the method further comprises:
    training the neural network to reduce the difference between the information contained in the third feature and the information jointly contained in the domain-invariant features and the domain-specific features.
9. The method according to any one of claims 1 to 8, characterized in that the neural network is used for domain-adaptive learning, and the training data comprises image data of different domains.
  10. A data acquisition method, comprising:
    acquiring data of a source domain and/or data of a target domain;
    inputting the data of the source domain and/or the data of the target domain into a neural network for training, so as to obtain gradient information of a loss function;
    perturbing the data of the source domain and/or the data of the target domain according to the gradient information to obtain data of an intermediate domain;
    wherein the source domain and the target domain are two domains whose data features differ, and the difference in data features between the intermediate domain and either of the source domain and the target domain is smaller than the difference in data features between the source domain and the target domain.
  11. The method according to claim 10, wherein the inputting of the data of the source domain and/or the data of the target domain into a neural network for training to obtain gradient information of a loss function comprises:
    inputting labeled data of the source domain into a first neural network for training to obtain first gradient information, wherein the first neural network is generated by training on labeled data of the target domain.
  12. The method according to claim 10, wherein the inputting of the data of the source domain and/or the data of the target domain into a neural network for training to obtain gradient information of a loss function comprises:
    inputting unlabeled data of the target domain into a second neural network and training it by means of virtual adversarial training to obtain second gradient information, wherein the second neural network is generated by training on labeled data.
  13. A neural network training apparatus, comprising:
    an acquisition module configured to acquire training data;
    a training module configured to train a neural network by using the training data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data;
    wherein a domain-specific feature is a feature that characterizes the domain to which the training data belongs, and a domain-invariant feature is a feature unrelated to the domain to which the training data belongs.
  14. The apparatus according to claim 13, wherein the training module is configured to: decompose a domain-invariant feature and a domain-specific feature from features of the training data; perform a task by using the domain-invariant feature to obtain a task loss, and calculate a mutual information loss between the domain-invariant feature and the domain-specific feature, the task loss being used to characterize the gap between the result obtained by performing the task using the domain-invariant feature and a task label, and the mutual information loss being used to represent the difference between the domain-invariant feature and the domain-specific feature; and train the neural network according to the task loss and the mutual information loss.
  15. The apparatus according to claim 14, wherein the training module is further configured to: perform domain classification by using the domain-specific feature to obtain a domain classification loss; and train the neural network according to the task loss, the mutual information loss and the domain classification loss.
  16. The apparatus according to claim 14 or 15, wherein the training module is further configured to: extract an initial feature from the training data; decompose the initial feature into the domain-invariant feature and the domain-specific feature; and train the neural network to reduce the difference between the information contained in the initial feature and the information jointly contained in the domain-invariant feature and the domain-specific feature.
  17. The apparatus according to claim 16, wherein the training module is configured to: reconstruct the initial feature by using the domain-invariant feature and the domain-specific feature to obtain a reconstructed feature; and compare the initial feature with the reconstructed feature to determine the difference between the information contained in the initial feature and the information jointly contained in the domain-invariant feature and the domain-specific feature.
  18. The apparatus according to claim 17, wherein the training module is further configured to: reconstruct an initial feature by using the domain-invariant feature and the domain-specific feature to obtain a reconstructed feature, wherein the domain-invariant feature and the domain-specific feature are features decomposed from the initial feature; and compare the initial feature with the reconstructed feature to obtain a reconstruction loss, the reconstruction loss being used to characterize the difference between the information contained in the initial feature and the information jointly contained in the domain-invariant feature and the domain-specific feature,
    wherein the training module is configured to: perform a first stage of training on the neural network according to the task loss; perform a second stage of training on the neural network according to the mutual information loss; and perform a third stage of training on the neural network according to the reconstruction loss.
  19. The apparatus according to claim 14 or 15, wherein the neural network comprises a first decoupler and a second decoupler, and the training module is configured to: extract a first feature of the training data from the training data; extract a preliminary domain-invariant feature and a preliminary domain-specific feature from the first feature by using the first decoupler; fuse the preliminary domain-invariant feature with the first feature to obtain a second feature; extract a third feature of the training data from the second feature; and extract the domain-invariant feature and the domain-specific feature from the third feature by using the second decoupler.
  20. The apparatus according to claim 19, wherein the training module is further configured to train the neural network to reduce the difference between the information contained in the third feature and the information jointly contained in the domain-invariant feature and the domain-specific feature.
  21. A data acquisition apparatus, comprising:
    a data acquisition module configured to acquire data of a source domain and/or data of a target domain;
    a gradient information acquisition module configured to input the data of the source domain and/or the data of the target domain into a neural network for training, so as to obtain gradient information of a loss function;
    an intermediate domain data generation module configured to perturb the data of the source domain and/or the data of the target domain according to the gradient information to obtain data of an intermediate domain;
    wherein the source domain and the target domain are two domains whose data features differ, and the difference in data features between the intermediate domain and either of the source domain and the target domain is smaller than the difference in data features between the source domain and the target domain.
  22. The apparatus according to claim 21, wherein the gradient information acquisition module is configured to input labeled data of the source domain into a first neural network for training to obtain first gradient information, wherein the first neural network is generated by training on labeled data of the target domain.
  23. The apparatus according to claim 21, wherein the gradient information acquisition module is configured to input unlabeled data of the target domain into a second neural network and train it by means of virtual adversarial training to obtain second gradient information, wherein the second neural network is generated by training on labeled data.
  24. A neural network training apparatus, comprising:
    a memory for storing a program; and
    a processor for executing the program stored in the memory, wherein when the program stored in the memory is executed, the processor is configured to perform the neural network training method according to any one of claims 1 to 9 or the data acquisition method according to any one of claims 10 to 12.
  25. A neural network, comprising:
    a first feature extraction layer for extracting a first feature based on input data;
    a first domain-invariant feature decoupling layer for extracting a first domain-invariant feature based on the first feature;
    a feature fusion layer for fusing the first feature and the first domain-invariant feature to obtain a second feature;
    a second feature extraction layer for extracting a third feature based on the second feature;
    a second domain-invariant feature decoupling layer for extracting a second domain-invariant feature based on the third feature;
    wherein the first domain-invariant feature and the second domain-invariant feature are each features unrelated to the domain to which the input data belongs, and the first domain-specific feature and the second domain-specific feature are each features that characterize the domain to which the input data belongs.
  26. A data processing system, comprising:
    a data acquisition network for acquiring gradient information of a loss function based on first data, and perturbing the first data according to the gradient information to acquire second data;
    a feature decoupling network for training a neural network by using training data that includes the second data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data;
    wherein a domain-specific feature is a feature that characterizes the domain to which the training data belongs, and a domain-invariant feature is a feature unrelated to the domain to which the training data belongs.
  27. The data processing system according to claim 26, wherein the feature decoupling network comprises:
    a first feature extraction layer for extracting a first feature based on the training data;
    a first domain-invariant feature extraction layer for extracting a first domain-invariant feature based on the first feature;
    a first domain-specific feature extraction layer for extracting a first domain-specific feature based on the first feature;
    a first mutual information loss acquisition layer for acquiring a first mutual information loss based on the first domain-invariant feature and the first domain-specific feature;
    a feature fusion layer for fusing the first feature and the first domain-invariant feature to obtain a second feature;
    a second feature extraction layer for extracting a third feature based on the second feature;
    a second domain-invariant feature decoupling layer for extracting a second domain-invariant feature based on the third feature;
    a second domain-specific feature extraction layer for extracting a second domain-specific feature based on the third feature;
    a second mutual information loss acquisition layer for acquiring a second mutual information loss based on the second domain-invariant feature and the second domain-specific feature;
    a task loss acquisition layer for performing a task using the second domain-invariant feature to acquire a task loss.
  28. The data processing system according to claim 27, further comprising:
    a first domain classifier for performing a classification task based on the first domain-specific feature to acquire a first classification loss;
    a first gradient reversal layer for reversing the gradient information of the first classification loss;
    and/or,
    a second domain classifier for performing a classification task based on the second domain-specific feature to acquire a second classification loss;
    a second gradient reversal layer for reversing the gradient information of the second classification loss.
  29. The data processing system according to claim 27 or 28, further comprising:
    a reconstruction loss acquisition layer for reconstructing the third feature by using the second domain-invariant feature and the second domain-specific feature to obtain a reconstructed feature, and comparing the third feature with the reconstructed feature to acquire a reconstruction loss.
  30. The data processing system according to any one of claims 26 to 29, wherein the first data comprises data of a source domain and/or data of a target domain, and the data acquisition network comprises:
    a first training network generated by training on labeled data of the target domain;
    and/or,
    a second training network generated by training on labeled data.
  31. A security device, comprising the neural network according to claim 25.
  32. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 12.
  33. A computer program product containing instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 12.
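
A minimal sketch (not the applicant's reference implementation) of the loss structure described in claims 14 to 18 and 27 to 29: an initial feature is decomposed into a domain-invariant part and a domain-specific part, the invariant part is supervised by a task loss, the two parts are pushed apart by a mutual-information term, the specific part feeds a domain classifier through a gradient reversal layer, and a decoder checks that the two parts together still reconstruct the initial feature. Module names, layer sizes, the joint single-step optimization and the cross-covariance surrogate for the mutual information loss are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT, LATENT, CLASSES, DOMAINS = 256, 64, 10, 2

class Decoupler(nn.Module):
    """Splits an initial feature into a domain-invariant part and a domain-specific part."""
    def __init__(self):
        super().__init__()
        self.inv = nn.Sequential(nn.Linear(FEAT, LATENT), nn.ReLU(), nn.Linear(LATENT, LATENT))
        self.spec = nn.Sequential(nn.Linear(FEAT, LATENT), nn.ReLU(), nn.Linear(LATENT, LATENT))
    def forward(self, f):
        return self.inv(f), self.spec(f)

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad

decoupler = Decoupler()
task_head = nn.Linear(LATENT, CLASSES)      # task is performed on the invariant part only
domain_head = nn.Linear(LATENT, DOMAINS)    # domain classification on the specific part only
decoder = nn.Linear(2 * LATENT, FEAT)       # reconstructs the initial feature from both parts

params = list(decoupler.parameters()) + list(task_head.parameters()) \
       + list(domain_head.parameters()) + list(decoder.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

def mi_surrogate(a, b):
    # Stand-in for the mutual information loss: squared cross-covariance between the
    # two parts, which is zero when they are linearly uncorrelated. A learned mutual
    # information estimator could be substituted here.
    a = a - a.mean(dim=0, keepdim=True)
    b = b - b.mean(dim=0, keepdim=True)
    return (a.t() @ b / a.shape[0]).pow(2).mean()

def training_step(initial_feat, task_label, domain_label):
    z_inv, z_spec = decoupler(initial_feat)
    task_loss = F.cross_entropy(task_head(z_inv), task_label)
    mi_loss = mi_surrogate(z_inv, z_spec)
    dom_loss = F.cross_entropy(domain_head(GradReverse.apply(z_spec)), domain_label)
    recon = decoder(torch.cat([z_inv, z_spec], dim=1))
    recon_loss = F.mse_loss(recon, initial_feat)
    # Claims 6 and 18 describe staged training; the losses are simply summed here.
    loss = task_loss + mi_loss + dom_loss + recon_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example call with random stand-in features:
feats = torch.randn(32, FEAT)
print(training_step(feats, torch.randint(CLASSES, (32,)), torch.randint(DOMAINS, (32,))))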
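
A minimal sketch of the layered structure in claims 7, 19 and 25: two feature extraction stages, each followed by a decoupling stage, with the preliminary domain-invariant feature fused back into the first feature before the second extraction. Layer widths and the concatenation-based fusion are illustrative assumptions.

import torch
import torch.nn as nn

class TwoStageDecouplingNet(nn.Module):
    def __init__(self, in_dim=256, feat=128, latent=64):
        super().__init__()
        self.extract1 = nn.Sequential(nn.Linear(in_dim, feat), nn.ReLU())  # first feature
        self.inv1 = nn.Linear(feat, latent)    # preliminary domain-invariant feature
        self.spec1 = nn.Linear(feat, latent)   # preliminary domain-specific feature
        self.fuse = nn.Linear(feat + latent, feat)                          # second feature
        self.extract2 = nn.Sequential(nn.Linear(feat, feat), nn.ReLU())     # third feature
        self.inv2 = nn.Linear(feat, latent)    # final domain-invariant feature
        self.spec2 = nn.Linear(feat, latent)   # final domain-specific feature

    def forward(self, x):
        f1 = self.extract1(x)
        z_inv1, z_spec1 = self.inv1(f1), self.spec1(f1)
        f2 = self.fuse(torch.cat([f1, z_inv1], dim=1))
        f3 = self.extract2(f2)
        return self.inv2(f3), self.spec2(f3), (z_inv1, z_spec1, f3)

net = TwoStageDecouplingNet()
z_inv, z_spec, _ = net(torch.randn(4, 256))
print(z_inv.shape, z_spec.shape)  # torch.Size([4, 64]) torch.Size([4, 64])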
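
A minimal sketch of the intermediate-domain data generation in claims 10 to 12: samples are passed through a trained network, the gradient of a loss with respect to the inputs is taken, and the inputs are nudged along that gradient so the perturbed samples sit between the two domains. The sign-of-gradient step, the step size eps and the specific virtual-adversarial surrogate are illustrative assumptions, not prescribed by the claims.

import torch
import torch.nn.functional as F

def intermediate_domain_batch(model, x, y=None, eps=0.03):
    """Perturb inputs x along the gradient of the model's loss.

    If labels y are given (labeled source data, claim 11), an ordinary cross-entropy
    loss is used. If y is None (unlabeled target data, claim 12), a simple
    virtual-adversarial-style surrogate is used: the KL divergence between the model's
    prediction on x and on a randomly jittered copy of x.
    """
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    if y is not None:
        loss = F.cross_entropy(logits, y)
    else:
        noisy = x + 1e-2 * torch.randn_like(x)
        loss = F.kl_div(F.log_softmax(model(noisy), dim=1),
                        F.softmax(logits.detach(), dim=1), reduction="batchmean")
    grad = torch.autograd.grad(loss, x)[0]
    # Step along the gradient sign; the perturbed batch plays the role of the
    # intermediate-domain data whose features lie between the two domains.
    return (x + eps * grad.sign()).detach()

The resulting batch can then be added to the training set of the feature decoupling network, as in the data processing system of claim 26.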
PCT/CN2021/096019 2020-06-24 2021-05-26 Neural network training method and device, and data acquisition method and device WO2021258967A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010594053.6 2020-06-24
CN202010594053.6A CN111898635A (en) 2020-06-24 2020-06-24 Neural network training method, data acquisition method and device

Publications (1)

Publication Number Publication Date
WO2021258967A1 WO2021258967A1 (en)

Family

ID=73207101

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096019 WO2021258967A1 (en) 2020-06-24 2021-05-26 Neural network training method and device, and data acquisition method and device

Country Status (2)

Country Link
CN (1) CN111898635A (en)
WO (1) WO2021258967A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898635A (en) * 2020-06-24 2020-11-06 华为技术有限公司 Neural network training method, data acquisition method and device
CN112532746B (en) * 2020-12-21 2021-10-26 北京邮电大学 Cloud edge cooperative sensing method and system
GB2608344A (en) 2021-01-12 2022-12-28 Zhejiang Lab Domain-invariant feature-based meta-knowledge fine-tuning method and platform
CN112364945B (en) * 2021-01-12 2021-04-16 之江实验室 Meta-knowledge fine adjustment method and platform based on domain-invariant features
CN113065633A (en) * 2021-02-26 2021-07-02 华为技术有限公司 Model training method and associated equipment
CN112883988B (en) * 2021-03-19 2022-07-01 苏州科达科技股份有限公司 Training and feature extraction method of feature extraction network based on multiple data sets
CN113313233A (en) * 2021-05-17 2021-08-27 成都时识科技有限公司 Neural network configuration parameter training and deploying method and device for dealing with device mismatch
CN113255757B (en) * 2021-05-20 2022-10-11 西华大学 Antagonistic sample detection method and system based on activation value distribution difference
CN113807183A (en) * 2021-08-17 2021-12-17 华为技术有限公司 Model training method and related equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710896B (en) * 2018-04-24 2021-10-29 浙江工业大学 Domain learning method based on generative confrontation learning network
US20190370651A1 (en) * 2018-06-01 2019-12-05 Nec Laboratories America, Inc. Deep Co-Clustering
CN109359623B (en) * 2018-11-13 2021-05-11 西北工业大学 Hyperspectral image migration classification method based on depth joint distribution adaptive network
CN111291274A (en) * 2020-03-02 2020-06-16 苏州大学 Article recommendation method, device, equipment and computer-readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858505A (en) * 2017-11-30 2019-06-07 厦门大学 Classifying identification method, device and equipment
CN108898218A (en) * 2018-05-24 2018-11-27 阿里巴巴集团控股有限公司 A kind of training method of neural network model, device and computer equipment
US20200134444A1 (en) * 2018-10-31 2020-04-30 Sony Interactive Entertainment Inc. Systems and methods for domain adaptation in neural networks
CN111292384A (en) * 2020-01-16 2020-06-16 西安交通大学 Cross-domain diversity image generation method and system based on generation type countermeasure network
CN111898635A (en) * 2020-06-24 2020-11-06 华为技术有限公司 Neural network training method, data acquisition method and device

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114912516A (en) * 2022-04-25 2022-08-16 湖南大学无锡智能控制研究院 Cross-domain target detection method and system for coordinating feature consistency and specificity
WO2023237042A1 (en) * 2022-06-09 2023-12-14 上海睿途新材料科技有限公司 Production line internet-of-things system for aluminized transfer paper, and internet-of-things communication method therefor
CN115310361A (en) * 2022-08-16 2022-11-08 中国矿业大学 Method and system for predicting underground dust concentration of coal mine based on WGAN-CNN
CN115310361B (en) * 2022-08-16 2023-09-15 中国矿业大学 Underground coal mine dust concentration prediction method and system based on WGAN-CNN
CN115496916B (en) * 2022-09-30 2023-08-22 北京百度网讯科技有限公司 Training method of image recognition model, image recognition method and related device
CN115496916A (en) * 2022-09-30 2022-12-20 北京百度网讯科技有限公司 Training method of image recognition model, image recognition method and related device
CN116363421A (en) * 2023-03-15 2023-06-30 北京邮电大学 Image feature classification method and device, electronic equipment and medium
CN116010805A (en) * 2023-03-24 2023-04-25 昆明理工大学 Rolling bearing fault feature extraction method and device based on convolutional neural network
CN116010805B (en) * 2023-03-24 2023-06-16 昆明理工大学 Rolling bearing fault feature extraction method and device based on convolutional neural network
CN116229080B (en) * 2023-05-08 2023-08-29 中国科学技术大学 Semi-supervised domain adaptive image semantic segmentation method, system, equipment and storage medium
CN116229080A (en) * 2023-05-08 2023-06-06 中国科学技术大学 Semi-supervised domain adaptive image semantic segmentation method, system, equipment and storage medium
CN116485792A (en) * 2023-06-16 2023-07-25 中南大学 Histopathological subtype prediction method and imaging method
CN116485792B (en) * 2023-06-16 2023-09-15 中南大学 Histopathological subtype prediction method and imaging method
CN117194983A (en) * 2023-09-08 2023-12-08 北京理工大学 Bearing fault diagnosis method based on progressive condition domain countermeasure network
CN117194983B (en) * 2023-09-08 2024-04-19 北京理工大学 Bearing fault diagnosis method based on progressive condition domain countermeasure network

Also Published As

Publication number Publication date
CN111898635A (en) 2020-11-06

Similar Documents

Publication Publication Date Title
WO2021258967A1 (en) Neural network training method and device, and data acquisition method and device
WO2021190451A1 (en) Method and apparatus for training image processing model
US20210012198A1 (en) Method for training deep neural network and apparatus
WO2020221200A1 (en) Neural network construction method, image processing method and devices
WO2019228317A1 (en) Face recognition method and device, and computer readable medium
US20220375213A1 (en) Processing Apparatus and Method and Storage Medium
WO2021164750A1 (en) Method and apparatus for convolutional layer quantization
WO2021147325A1 (en) Object detection method and apparatus, and storage medium
WO2021022521A1 (en) Method for processing data, and method and device for training neural network model
WO2021164751A1 (en) Perception network architecture search method and device
WO2021238333A1 (en) Text processing network, neural network training method, and related device
WO2022156561A1 (en) Method and device for natural language processing
CN110222718B (en) Image processing method and device
CN113516227B (en) Neural network training method and device based on federal learning
WO2021129668A1 (en) Neural network training method and device
CN113011568A (en) Model training method, data processing method and equipment
WO2021190433A1 (en) Method and device for updating object recognition model
CN114418030A (en) Image classification method, and training method and device of image classification model
US20240046067A1 (en) Data processing method and related device
CN113361549A (en) Model updating method and related device
WO2022156475A1 (en) Neural network model training method and apparatus, and data processing method and apparatus
WO2021136058A1 (en) Video processing method and device
CN116258176A (en) Data processing method and device
WO2023122854A1 (en) Data processing method and apparatus
WO2022227024A1 (en) Operational method and apparatus for neural network model and training method and apparatus for neural network model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21828188; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21828188; Country of ref document: EP; Kind code of ref document: A1)