WO2021258967A1 - Neural network training method and device, and data acquisition method and device

Neural network training method and device, and data acquisition method and device

Info

Publication number
WO2021258967A1
WO2021258967A1; PCT/CN2021/096019; CN2021096019W
Authority
WO
WIPO (PCT)
Prior art keywords
domain
feature
data
training
neural network
Prior art date
Application number
PCT/CN2021/096019
Other languages
French (fr)
Chinese (zh)
Inventor
韩亚洪
姜品
武阿明
邵云峰
齐美玉
李秉帅
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021258967A1 publication Critical patent/WO2021258967A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Definitions

  • This application relates to the field of artificial intelligence, in particular to a neural network training method, data acquisition method and device.
  • Artificial intelligence (AI) is a theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theories.
  • Neural networks trained by machine learning can be used to complete multiple tasks such as target classification/detection/recognition/segmentation/prediction.
  • In practice, training samples and test samples are likely to come from different domains, which causes problems for the practical application of neural networks.
  • For example, the source domain data may be traffic scene images taken on a sunny day,
  • while the target domain data may be traffic scene images taken on a foggy day.
  • A target detection model trained on source domain data therefore has difficulty achieving good results on target domain data.
  • As a result, domain adaptation (DA) learning, an important research field of machine learning, has received extensive attention in recent years.
  • Domain adaptive learning usually uses a distribution alignment method to align the probability distributions of the source domain data and the target domain data, so as to alleviate the adverse effects of domain deviation on the domain adaptive learning task. Since this distribution alignment is performed only at the level of the overall feature representation, the domain adaptive learning task is inevitably affected by domain-specific features. Therefore, the trained neural network still suffers from poor migration performance.
  • This application provides a neural network training method, a data acquisition method, and corresponding devices, which can improve the migration performance of a neural network between different domains.
  • In a first aspect, a neural network training method is provided, including: obtaining training data; and training the neural network using the training data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data; wherein the domain-specific features are features that characterize the domain to which the training data belongs, and the domain-invariant features are features that are unrelated to the domain to which the training data belongs.
  • By decomposing domain-invariant features and domain-specific features from the training data, the domain-invariant features can be decoupled from the domain-specific features. Since the neural network obtained by the training method of this application uses the domain-invariant features to perform tasks, the influence of domain-specific features on the neural network is avoided, thereby improving the migration performance of the neural network between different domains.
  • In some implementations, training the neural network using the training data includes: decomposing the domain-invariant features and the domain-specific features from the training data; performing a task with the domain-invariant feature to obtain a task loss, and calculating a mutual information loss between the domain-invariant feature and the domain-specific feature, where the task loss is used to characterize the difference between the result obtained by performing the task with the domain-invariant feature and the task label, and the mutual information loss is used to represent the difference between the domain-invariant feature and the domain-specific feature; and training the neural network according to the task loss and the mutual information loss.
  • In some implementations, the method further includes: performing domain classification using the domain-specific feature to obtain a domain classification loss; and training the neural network according to the task loss and the mutual information loss includes: training the neural network according to the task loss, the mutual information loss, and the domain classification loss.
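  • As an illustrative sketch only (the module interfaces, loss weights and the cross-entropy task loss are assumptions for a classification task, not the claimed implementation), the combination of task loss, mutual information loss and domain classification loss described above could be assembled in a PyTorch-style training step as follows:

```python
import torch.nn.functional as F

def training_step(batch, task_labels, domain_labels, model, mi_estimator, optimizer,
                  lam_mi=0.1, lam_dom=0.1):
    """Hedged sketch of one training step. `model` is assumed to return
    (task_logits, domain_logits, dir_feat, dsr_feat); `mi_estimator` is assumed to
    return an estimate of the mutual information between the two feature sets."""
    task_logits, domain_logits, dir_feat, dsr_feat = model(batch)

    # Task loss: difference between the result obtained with the domain-invariant
    # feature and the task label.
    task_loss = F.cross_entropy(task_logits, task_labels)

    # Mutual information loss between domain-invariant and domain-specific features.
    mi_loss = mi_estimator(dir_feat, dsr_feat)

    # Domain classification loss obtained from the domain-specific feature.
    dom_loss = F.cross_entropy(domain_logits, domain_labels)

    total_loss = task_loss + lam_mi * mi_loss + lam_dom * dom_loss
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```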
  • In some implementations, decomposing the domain-invariant features and the domain-specific features from the training data includes: extracting an initial feature from the training data; and decomposing the initial feature into the domain-invariant feature and the domain-specific feature. The method further includes: training the neural network to reduce the difference between the information contained in the initial feature and the information jointly contained in the domain-invariant feature and the domain-specific feature.
  • In this way, the decoupled domain-invariant features and domain-specific features can contain all the feature information of the training data, improving the completeness and rationality of the feature decoupling.
  • In some implementations, the method further includes: reconstructing the initial feature using the domain-invariant feature and the domain-specific feature to obtain a reconstructed feature; and comparing the initial feature with the reconstructed feature to determine the difference between the information contained in the initial feature and the information jointly contained in the domain-invariant feature and the domain-specific feature.
  • Using the reconstruction loss to train the neural network can make the decoupled domain-invariant features and domain-specific features contain all the feature information of the training data, so as to improve the completeness and rationality of the feature decoupling.
  • In some implementations, the method further includes: reconstructing the initial feature using the domain-invariant feature and the domain-specific feature to obtain a reconstructed feature, where the domain-invariant feature and the domain-specific feature are features decomposed from the initial feature; and comparing the initial feature with the reconstructed feature to obtain a reconstruction loss, where the reconstruction loss is used to characterize the difference between the information contained in the initial feature and the information jointly contained in the domain-invariant feature and the domain-specific feature. Training the neural network according to the task loss and the mutual information loss then includes: training the neural network in a first stage according to the task loss; and training the neural network in a second stage according to the mutual information loss; and the method further includes: training the neural network in a third stage according to the reconstruction loss.
  • Carrying out the training process of the neural network in stages can simplify the amount of training in each stage and speed up the convergence speed of the parameters of the neural network.
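  • For illustration only (the decoder module and the mean-squared-error distance are assumptions, not specified by the application), the reconstruction loss described above could be computed as in the following sketch:

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(initial_feat, dir_feat, dsr_feat, decoder):
    """Sketch: rebuild the initial feature from the decoupled features and compare.
    `decoder` is an assumed module mapping the concatenated features back to the
    shape of the initial feature."""
    reconstructed = decoder(torch.cat([dir_feat, dsr_feat], dim=1))
    # Characterizes the difference between the information in the initial feature and
    # the information jointly contained in the two decoupled features.
    return F.mse_loss(reconstructed, initial_feat)
```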
  • In some implementations, the neural network includes a first decoupler and a second decoupler, and decomposing the domain-invariant features and the domain-specific features from the training data includes: extracting a first feature of the training data from the training data; using the first decoupler to extract a preliminary domain-invariant feature and a preliminary domain-specific feature from the first feature; fusing the preliminary domain-invariant feature with the first feature to obtain a second feature; extracting a third feature of the training data from the second feature; and using the second decoupler to extract the domain-invariant feature and the domain-specific feature from the third feature.
  • Fusing the preliminary domain-invariant feature with the first feature to obtain the second feature enhances the domain-invariant feature information at the level of the first feature.
  • Decoupling the domain-invariant feature with the second decoupler on the basis of this second feature therefore further improves the decoupling accuracy of the domain-invariant feature, which allows the trained neural network to perform better in both task execution and domain adaptability.
  • In some implementations, the method further includes: training the neural network to reduce the difference between the information contained in the third feature and the information jointly contained in the domain-invariant feature and the domain-specific feature.
  • This further encourages the decoupled domain-invariant features and domain-specific features to contain all the feature information of the training data, improving the completeness and rationality of the feature decoupling.
  • In some implementations, the neural network is used for domain adaptive learning, and the training data includes image data from different domains.
  • In this case the domain-invariant features can be decoupled from the domain-specific features. Because the domain-invariant features are used to perform the tasks, the neural network obtained by the training method of this application can, through domain adaptive learning, adapt to processing tasks for images from a variety of different domains, thereby realizing adaptive processing of image data in different domains.
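  • The two-stage decoupling pipeline described above can be sketched as follows (module names and the element-wise-sum fusion are illustrative assumptions; the application does not fix these choices):

```python
import torch.nn as nn

class TwoStageDecouplingNet(nn.Module):
    """Sketch: extract a first feature, decouple preliminary features, fuse the
    preliminary domain-invariant feature back into the first feature, extract a third
    feature, then decouple the final domain-invariant and domain-specific features."""
    def __init__(self, extractor1, decoupler1, extractor2, decoupler2):
        super().__init__()
        self.extractor1, self.decoupler1 = extractor1, decoupler1
        self.extractor2, self.decoupler2 = extractor2, decoupler2

    def forward(self, x):
        feat1 = self.extractor1(x)                   # first feature
        pre_dir, pre_dsr = self.decoupler1(feat1)    # preliminary DIR / DSR
        feat2 = feat1 + pre_dir                      # fusion (element-wise sum assumed)
        feat3 = self.extractor2(feat2)               # third feature
        dir_feat, dsr_feat = self.decoupler2(feat3)  # final DIR / DSR
        return dir_feat, dsr_feat, feat3
```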
  • In a second aspect, a data acquisition method is provided, including: acquiring data of a source domain and/or data of a target domain; inputting the data of the source domain and/or the data of the target domain into a neural network for training, so as to obtain gradient information of a loss function; and perturbing the data of the source domain and/or the data of the target domain according to the gradient information to obtain data of an intermediate domain; wherein the source domain and the target domain are two domains with different data characteristics, and the difference in data characteristics between the intermediate domain and either of the source domain and the target domain is smaller than the difference in data characteristics between the source domain and the target domain.
  • Introducing the direction information between the source domain and the target domain makes the perturbation of the training data more targeted.
  • The intermediate-domain training data obtained through the perturbation can fill the "domain gap" between the source domain and the target domain and alleviate the large difference between the distribution of the source-domain training data and that of the target-domain training data.
  • In some implementations, inputting the data of the source domain and/or the data of the target domain into a neural network for training, so as to obtain the gradient information of the loss function, includes: inputting the labeled data of the source domain into a first neural network and performing training to obtain first gradient information, where the first neural network is generated by training on the labeled data of the target domain.
  • Because the first neural network is generated by training on the labeled data of the target domain, the first gradient information obtained after inputting the labeled data of the source domain into the first neural network is a good measure of the direction from the source domain to the target domain.
  • In some implementations, inputting the data of the source domain and/or the data of the target domain into a neural network for training, so as to obtain the gradient information of the loss function, includes: inputting the unlabeled data of the target domain into a second neural network and performing training in the manner of virtual adversarial training to obtain second gradient information, where the second neural network is generated by training on the labeled data of the source domain.
  • Because the second neural network is generated by training on the labeled data of the source domain, the second gradient information obtained through virtual adversarial training after inputting the unlabeled data of the target domain into the second neural network is a good measure of the direction from the target domain to the source domain.
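  • As a hedged sketch of the intermediate-domain data acquisition described above (the signed-gradient step and the step size epsilon are illustrative assumptions; the application only specifies perturbing the data according to the gradient information):

```python
import torch
import torch.nn.functional as F

def perturb_towards_other_domain(x, labels, model, epsilon=0.01):
    """Sketch: feed labeled source-domain data into a network trained on the target
    domain (or unlabeled target-domain data into a source-trained network), take the
    gradient of the loss with respect to the input, and apply a small perturbation
    along that gradient to obtain intermediate-domain data."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), labels)
    grad = torch.autograd.grad(loss, x)[0]
    # Small step along the gradient direction; the result lies between the two domains.
    return (x + epsilon * grad.sign()).detach()
```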
  • In another aspect, a neural network training device is provided, including modules for executing the method described in the first aspect.
  • In another aspect, a data acquisition device is provided, including modules for executing the method described in the second aspect.
  • In another aspect, a neural network training device is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to perform the method described in the first aspect or the second aspect.
  • In a sixth aspect, a neural network is provided, including: a first feature extraction layer for extracting a first feature based on input data; a first domain-invariant feature decoupling layer for extracting a first domain-invariant feature based on the first feature; a feature fusion layer for fusing the first feature and the first domain-invariant feature to obtain a second feature; a second feature extraction layer for extracting a third feature based on the second feature; and a second domain-invariant feature decoupling layer for extracting a second domain-invariant feature based on the third feature; wherein the first domain-invariant feature and the second domain-invariant feature are each features unrelated to the domain to which the input data belongs, and the first domain-specific feature and the second domain-specific feature are each features that characterize the domain to which the input data belongs.
  • In another aspect, a data processing system is provided, including: a data acquisition network for acquiring gradient information of a loss function based on first data, and perturbing the first data according to the gradient information to acquire second data;
  • and a feature decoupling network for training the neural network using training data that includes the second data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data; wherein the domain-specific features are features that characterize the domain to which the training data belongs, and the domain-invariant features are features unrelated to the domain to which the training data belongs.
  • In some implementations, the feature decoupling network includes: a first feature extraction layer for extracting a first feature based on the training data; a first domain-invariant feature extraction layer for extracting a first domain-invariant feature based on the first feature; a first domain-specific feature extraction layer for extracting a first domain-specific feature based on the first feature; a first mutual information loss acquisition layer for obtaining a first mutual information loss based on the first domain-invariant feature and the first domain-specific feature; a feature fusion layer for fusing the first feature and the first domain-invariant feature to obtain a second feature; a second feature extraction layer for extracting a third feature based on the second feature; a second domain-invariant feature decoupling layer for extracting a second domain-invariant feature based on the third feature; a second domain-specific feature extraction layer for extracting a second domain-specific feature based on the third feature; and a second mutual information loss acquisition layer for obtaining a second mutual information loss based on the second domain-invariant feature and the second domain-specific feature.
  • In some implementations, the data processing system further includes: a first domain classifier, configured to perform a classification task based on the first domain-specific feature to obtain a first classification loss;
  • a first gradient reversal layer, configured to invert the gradient information of the first classification loss;
  • a second domain classifier, configured to perform a classification task based on the second domain-specific feature to obtain a second classification loss;
  • and a second gradient reversal layer, configured to invert the gradient information of the second classification loss.
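  • A gradient reversal layer of this kind is commonly realized as a custom autograd function that is the identity in the forward pass and flips the sign of the gradient in the backward pass; the following is an assumed sketch only, not the implementation specified by the application:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the incoming gradient by -lam in the
    backward pass, so the layers before it are trained adversarially against the
    domain classifier behind it."""
    @staticmethod
    def forward(ctx, x, lam=1.0):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)
```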
  • In some implementations, the data processing system further includes: a reconstruction loss acquisition layer, configured to reconstruct the third feature using the second domain-invariant feature and the second domain-specific feature to obtain a reconstructed feature, and to compare the third feature with the reconstructed feature to obtain a reconstruction loss.
  • In some implementations, the first data includes data of a source domain and/or data of a target domain,
  • and the data acquisition network includes: a first training network generated by training on the labeled data of the target domain; and/or a second training network generated by training on the labeled data of the source domain.
  • In another aspect, a security device is provided, including the neural network described in the sixth aspect.
  • In another aspect, a computer-readable storage medium is provided, which stores computer program instructions; when the computer program instructions are executed by a processor, the processor performs the method described in the first aspect or the second aspect.
  • In another aspect, a computer program product is provided, including computer program instructions that, when run by a processor, cause the processor to execute the method described in the first aspect or the second aspect.
  • In an eleventh aspect, a chip is provided, including a processor and a data interface.
  • The processor reads instructions stored in a memory through the data interface and executes the method described in the first aspect or the second aspect.
  • Optionally, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory.
  • The processor is configured to execute the method described in the first aspect or the second aspect.
  • Figure 1 is a schematic diagram of an artificial intelligence main frame.
  • Fig. 2 is a system architecture provided by an embodiment of the application.
  • FIG. 3 is a diagram of the chip hardware structure provided by an embodiment of the application.
  • Fig. 4 is a system architecture provided by an embodiment of the application.
  • FIG. 5 is a schematic flowchart of a neural network training method provided by an embodiment of this application.
  • FIG. 6 is a schematic flowchart of a neural network training method provided by an embodiment of this application.
  • FIG. 7 is a schematic structural diagram of a neural network provided by an embodiment of this application.
  • FIG. 8 is a schematic diagram of the principle of feature decoupling provided by an embodiment of this application.
  • Fig. 9 is a schematic structural diagram of a neural network provided by another embodiment of the application.
  • FIG. 10 is a schematic structural diagram of a neural network provided by an embodiment of this application.
  • FIG. 11 is a schematic diagram of the process of extracting domain invariant features and domain specific features based on the neural network architecture shown in FIG. 10.
  • FIG. 12 is a schematic diagram of the principle of the training process provided by an embodiment of the application.
  • FIG. 13 is a schematic structural diagram of a neural network provided by an embodiment of this application.
  • FIG. 14 is a schematic diagram of a process for obtaining data of an intermediate domain according to an embodiment of the application.
  • FIG. 15 is a schematic structural diagram of a neural network provided by an embodiment of this application.
  • FIG. 16 is a schematic diagram of two-way confrontation training provided by another embodiment of this application.
  • FIG. 17 is a schematic structural diagram of a data processing system provided by an embodiment of this application.
  • FIG. 18 is a schematic structural diagram of a neural network training device provided by an embodiment of the application.
  • FIG. 19 is a schematic structural diagram of a data acquisition device provided by another embodiment of this application.
  • FIG. 20 is a schematic diagram of the hardware structure of a neural network training device provided by an embodiment of the application.
  • Figure 1 is a schematic diagram of an artificial intelligence main frame.
  • the main framework describes the overall workflow of the artificial intelligence system, which is suitable for general artificial intelligence field requirements.
  • Intelligent Information Chain reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensing process of "data-information-knowledge-wisdom".
  • The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of intelligence and information (the technologies for providing and processing information) to the industrial ecology of the system.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the basic platform.
  • The basic platform includes distributed computing frameworks, networks, and related platform guarantees and support, which can include cloud storage and computing, interconnection networks, and so on.
  • sensors communicate with the outside to obtain data, and these data are provided to the smart chip in the distributed computing system provided by the basic platform for calculation.
  • Data at the layer above the infrastructure represents the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, text, and IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, training, etc.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formal information to conduct machine thinking and solving problems based on reasoning control strategies.
  • the typical function is search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, and usually provides functions such as classification, ranking, and prediction.
  • Based on the results of data processing, some general capabilities can be formed, such as an algorithm or a general system, for example translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are an encapsulation of the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent manufacturing, intelligent transportation, smart home, smart healthcare, smart security, autonomous driving, smart city, smart terminal, and so on.
  • A model that performs well in the source domain will suffer performance limitations if it is directly applied to the target domain.
  • In the conventional approach, a distribution alignment strategy is adopted, that is, the data of the source domain and the data of the target domain are aligned at the level of feature representation. Since this distribution alignment is performed only at the level of the overall feature representation, the domain adaptive learning task is inevitably affected by domain-specific features. Therefore, the trained neural network model still suffers from poor migration performance.
  • For this reason, this application proposes a method for training a neural network model, which decouples domain-invariant features (features at the instance level that are unrelated to the domain) from the features of the data during the training process, so that the domain adaptive learning task is not affected by the specific features of different domains, thereby improving the migration performance of the neural network model.
  • The neural network model trained in the embodiments of this application can be applied to various application scenarios, and the neural network model can also have different structures depending on the specific application scenario.
  • For example, in image classification application scenarios (such as vehicle recognition, face recognition, etc.) the neural network model can be a convolutional neural network model, while in regression prediction application scenarios (such as energy consumption prediction for industrial production lines, weather prediction, landslide prediction, etc.) the neural network model can include a multilayer perceptron architecture.
  • The embodiments of this application do not limit the specific application scenario or structure of the trained neural network model.
  • Domain adaptive learning is a machine learning method used to solve the problem of inconsistent probability distributions of training samples and test samples. It aims to overcome the difference between the probability distribution of the source domain samples and that of the target domain samples during training, so as to accomplish the learning task in the target domain.
  • A neural network can be composed of neural units. A neural unit can refer to an operation unit that takes x_s and an intercept of 1 as inputs.
  • The output of this operation unit can be expressed by the following formula (1): h_{W,b}(x) = f(W^T x + b) = f(∑_{s=1}^{n} W_s x_s + b), where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network,
  • can be understood as a neural network with many hidden layers; there is no special metric for "many" here. According to the positions of the different layers, the layers inside a DNN can be divided into three categories: input layer, hidden layers, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.
  • Although a DNN looks complicated, the work of each layer can be expressed by the linear relationship in the following formula (2): y = a(W·x + b), where x is the input vector, y is the output vector, b is the bias vector, W is the weight matrix (the coefficients), and a() is the activation function.
  • For example, the linear coefficient from the 4th neuron in the 2nd layer to the 2nd neuron in the 3rd layer is defined as W^3_{24}:
  • the superscript 3 represents the layer in which the coefficient W is located, and the subscript corresponds to the output index 2 of the 3rd layer and the input index 4 of the 2nd layer.
  • In summary, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as W^L_{jk}. It should be noted that the input layer has no W parameters.
  • more hidden layers make the network more capable of portraying complex situations in the real world. In theory, a model with more parameters is more complex and has a greater "capacity", which means that it can complete more complex learning tasks.
  • the process of training the deep neural network is also the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vector W of many layers).
  • Taking the loss function as an example: the higher the output value (loss) of the loss function, the greater the difference; training the deep neural network then becomes a process of reducing this loss as much as possible.
  • A convolutional neural network can use the backpropagation (BP) algorithm to adjust the values of the parameters of the initial neural network during training, so that the reconstruction error loss of the neural network becomes smaller and smaller. Specifically, forward-propagating the input signal to the output produces an error loss, and the parameters of the initial neural network are updated by backpropagating the error loss information, so that the error loss converges.
  • The backpropagation algorithm is a backpropagation movement dominated by the error loss, aiming to obtain the optimal parameters of the neural network, such as the weight matrices.
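  • As a generic illustration of this training process (the small network and random data below are placeholders, not the networks of this application), one backpropagation update in PyTorch looks like:

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(4, 8), torch.tensor([0, 1, 0, 1])
loss = criterion(net(x), y)   # forward pass produces the error loss
optimizer.zero_grad()
loss.backward()               # backpropagate the error loss information
optimizer.step()              # update the parameters so the loss converges
```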
  • An adversarial sample refers to an input sample formed by adding a perturbation to data in the data set, which causes the neural network to give an incorrect output with high confidence. Since the ultimate goal of the neural network is to produce correct outputs, adversarial samples are used to train the neural network under this adversarial training strategy, so that the neural network adapts to the perturbation and thereby becomes robust to adversarial samples.
  • Virtual adversarial training refers to an adversarial training method that does not rely on training labels. Virtual adversarial training generates a perturbation based on a first output of the neural network; this perturbation makes the second output, obtained by feeding the generated adversarial sample into the neural network, differ from the previous first output, thereby realizing the adversarial training strategy.
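  • A simplified sketch of virtual adversarial training (one power-iteration step; the hyper-parameters xi and eps and the KL-divergence measure are common choices assumed here for illustration, not details taken from the application):

```python
import torch
import torch.nn.functional as F

def virtual_adversarial_loss(model, x, xi=1e-6, eps=1.0):
    """Sketch: without using labels, find a small perturbation that changes the model's
    prediction the most, then penalize the change in prediction under that perturbation."""
    with torch.no_grad():
        pred = F.softmax(model(x), dim=1)                 # first output of the network

    d = torch.randn_like(x)
    d = (xi * F.normalize(d.flatten(1), dim=1).view_as(x)).requires_grad_(True)
    adv_dist = F.kl_div(F.log_softmax(model(x + d), dim=1), pred, reduction="batchmean")
    grad = torch.autograd.grad(adv_dist, d)[0]

    r_adv = eps * F.normalize(grad.flatten(1), dim=1).view_as(x)   # adversarial perturbation
    pred_adv = F.log_softmax(model(x + r_adv.detach()), dim=1)     # second output
    return F.kl_div(pred_adv, pred, reduction="batchmean")
```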
  • FIG. 2 is a system architecture 200 provided by an embodiment of the application.
  • the system architecture 200 includes an execution device 210, a training device 220, a database 230, a client device 240, a data storage system 250, and a data collection system 260.
  • the execution device 210 includes a calculation module 211, an I/O interface 212, a preprocessing module 213, and a preprocessing module 214.
  • the calculation module 211 may include the target model/rule 201, and the preprocessing module 213 and the preprocessing module 214 are optional.
  • the data collection device 260 is used to collect training data (or sample data for training) and store it in the database 230.
  • the training data in the embodiment of the present application may include training data in different fields, such as training data in the source domain and the target domain.
  • the training device 220 trains the target model/rule 201 based on the training data maintained in the database 230, so that the target model/rule 201 has the function of decoupling domain invariant features and domain specific features from the input data, and uses the Domain-invariant features can complete tasks required by actual application scenarios, such as the ability to complete tasks such as target classification/detection/recognition/segmentation.
  • the target model/rule 201 may be a neural network model.
  • The work of each layer in the neural network model can be described by the mathematical expression y = a(W·x + b). From a physical perspective, the work of each layer can be understood as completing the transformation from the input space to the output space (that is, from the row space to the column space of the matrix) through five operations on the input space (the set of input vectors): 1. raising/lowering the dimension; 2. enlarging/shrinking; 3. rotation; 4. translation; and 5. "bending". Operations 1, 2 and 3 are completed by W·x, operation 4 is completed by +b, and operation 5 is realized by a().
  • W is a weight vector, and each value in the vector represents the weight value of a neuron in the layer of neural network.
  • This vector W determines the spatial transformation from the input space to the output space described above, that is, the weight W of each layer controls how the space is transformed.
  • The purpose of training the neural network model is to finally obtain the weight matrices of all layers of the trained neural network (the weight matrices formed by the vectors W of many layers). Therefore, the training process of the neural network is essentially a way of learning how to control the space transformation, and more specifically of learning the weight matrices.
  • To make the output of the neural network model as close as possible to the value that is really desired, the current predicted value of the network can be compared with the really desired target value, and the weight vector of each layer of the neural network can then be updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the neural network model). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the neural network can predict the really desired target value. It is therefore necessary to predefine "how to compare the difference between the predicted value and the target value"; this is the loss function or objective function, an important equation used to measure the difference between the predicted value and the target value. Taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference, and training the neural network model becomes a process of reducing this loss as much as possible.
  • the target model/rule obtained by the training device 220 can be applied to different systems or devices.
  • the execution device 210 is configured with an I/O interface 212 to perform data interaction with external devices.
  • the "user" can input data to the I/O interface 212 through the client device 240.
  • the execution device 210 can call data, codes, etc. in the data storage system 250, and can also store data, instructions, etc. in the data storage system 250.
  • the calculation module 211 uses the target model/rule 201 to process the input data.
  • the specific input data of the calculation module 211 is related to the specific application scenario.
  • the input data of the calculation module 211 may be image data including a face image. Since the calculation module 211 uses the target model/rule 201 to process the input data, the calculation module actually obtains instance-level features based on the input data, and then uses the instance-level features to perform specific tasks.
  • the system architecture 200 may also include some management function modules connected to the calculation module 211 to complete more flexible subdivision tasks based on the output result of the calculation module 211.
  • For example, one associated function module may be configured to further identify, based on the vehicle features output by the calculation module 211, information such as the license plate number and model of the vehicle, and another associated function module 214 may be configured to further identify the gender, height and age of a pedestrian based on the pedestrian features output by the calculation module 211.
  • this application does not limit whether the system architecture includes these associated function modules, and the specific functions performed by these associated function modules.
  • the I/O interface 212 returns the processing result to the client device 240 and provides it to the user.
  • the training device 220 can generate corresponding target models/rules 201 based on different data for different targets, so as to provide users with better results.
  • the user can manually specify to input data in the execution device 210, for example, to operate in the interface provided by the I/O interface 212.
  • the client device 240 can automatically input data to the I/O interface 212 and obtain the result. If the client device 240 automatically inputs data and needs the user's authorization, the user can set the corresponding authority in the client device 240.
  • the user can view the result output by the execution device 210 on the client device 240, and the specific presentation form may be a specific manner such as display, sound, and action.
  • the client device 240 can also serve as a data collection terminal to store the collected sample data in the database 230.
  • Figure 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • the data storage system 250 is an external memory relative to the execution device 210. In other cases, the data storage system 250 may also be placed in the execution device 210.
  • FIG. 3 is a diagram of the chip hardware structure provided by an embodiment of the application.
  • the chip includes a neural-network processing unit (NPU) 300.
  • the chip can be set in the execution device 210 as shown in FIG. 2 to complete the calculation work of the calculation module 211.
  • the chip can also be set in the training device 220 shown in FIG. 2 to complete the training work of the training device 220 and output the target model/rule 201.
  • the following neural network training methods shown in FIG. 4, FIG. 9 and FIG. 11 can all be implemented in the chip shown in FIG. 3.
  • the neural network processor 300 is mounted on a main central processing unit (host central processing unit, host CPU) as a coprocessor, and the main CPU distributes tasks.
  • the core part of the neural network processor 300 is the arithmetic circuit 303, and the controller 304 controls the arithmetic circuit 303 to extract data from the memory (weight memory 302 or input memory 301) and perform calculations.
  • the arithmetic circuit 303 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 303 is a two-dimensional systolic array. The arithmetic circuit 303 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 303 is a general-purpose matrix processor.
  • the arithmetic circuit 303 fetches the data corresponding to the weight matrix B from the weight memory 302 and caches it on each PE in the arithmetic circuit 303.
  • the arithmetic circuit 303 fetches the input matrix A and the weight matrix B from the input memory 301 to perform matrix operations to obtain partial results or final results of the matrix, and store them in an accumulator 308.
  • the vector calculation unit 307 can perform further processing on the output of the operation circuit 303, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • The vector calculation unit 307 can be used for network calculations in the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, local response normalization, and so on.
  • the vector calculation unit 307 can store the processed output vector to the unified memory 306.
  • the vector calculation unit 307 may apply a nonlinear function to the output of the arithmetic circuit 303, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 307 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 303, for example for use in a subsequent layer in a neural network.
  • the unified memory 306 is used to store input data and output data.
  • A direct memory access controller (DMAC) 305 transfers the input data in the external memory to the input memory 301 and/or the unified memory 306, stores the weight data in the external memory into the weight memory 302, and stores the data in the unified memory 306 into the external memory.
  • the bus interface unit (BIU) 310 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 309 through the bus.
  • An instruction fetch buffer 309 connected to the controller 304 is used to store instructions used by the controller 304.
  • the controller 304 is used to call the instructions cached in the instruction fetch memory 309 to control the working process of the computing accelerator.
  • the unified memory 306, the input memory 301, the weight memory 302, and the fetch memory 309 are all on-chip memories.
  • the external memory is a memory external to the NPU.
  • The external memory can be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
  • FIG. 4 is a system architecture 400 provided by an embodiment of this application.
  • the execution device 410 is implemented by one or more servers set in the cloud.
  • The servers can also cooperate with other computing devices, such as data storage devices, routers, load balancers and the like; the execution device 410 can be arranged on one physical site or distributed across multiple physical sites.
  • The execution device 410 may use the data in the data storage system 420, or call the program code in the data storage system 420, to implement the neural network training method provided by the embodiments of this application; specifically, the execution device 410 can train the neural network according to the training data in the data storage system 420 using the method provided in the embodiments of this application, and complete the corresponding intelligent task according to requests from the local device 401/402.
  • Alternatively, the execution device 410 may not itself have the function of training a neural network, but a neural network trained according to the neural network training method provided by the embodiments of this application can be deployed on it to complete the corresponding intelligent task; specifically, after the execution device 410 is configured with a neural network trained by the method provided in the embodiments of this application, it can complete the corresponding intelligent task upon receiving a request from the local device 401/402 and feed the result back to the local device 401/402.
  • the user can operate respective user devices (for example, the local device 401 and the local device 402) to interact with the execution device 410.
  • Each local device can represent any computing device, such as personal computers, computer workstations, smart phones, tablets, smart cameras, smart cars or other types of cellular phones, media consumption devices, wearable devices, set-top boxes, game consoles, etc.
  • the local device may be a security device, such as a surveillance camera device, a smoke alarm device, or a fire extinguishing device.
  • the local device of each user can interact with the execution device 410 through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • one or more aspects of the execution device 410 may be implemented by each local device.
  • the local device 401 may provide the execution device 410 with local data or feed back calculation results.
  • all the functions of the foregoing execution device 410 may also be implemented by a local device.
  • the local device 401 executes the neural network training method provided in the embodiments of the present application, and uses the trained neural network to provide services to users.
  • FIG. 5 is a schematic flowchart of a neural network training method provided by an embodiment of the application.
  • the training method of the neural network shown in FIG. 5 can be executed by the training device 220 shown in FIG. 2, and the target model/rule 201 trained by the training device 220 is the neural network.
  • the neural network training method includes the following steps:
  • Step 501 Obtain training data.
  • the training data is the input data of the training process.
  • the training data can be collected by the user, or an existing training database can be used. It should be understood that the training data may have different formats and forms according to different requirements of actual scenarios. For example, in a target detection or target recognition scenario, the training data may be image data. In the scenario of regression prediction, the training data can be the collected historical housing price data.
  • the training data input to the neural network may include training data of different domains.
  • different domains can include target domains and source domains.
  • the training data may include the training data of the source domain and the training data of the target domain.
  • the difference in domains may be embodied as the difference in scenarios.
  • the training data of the source domain may be a large number of traffic scene images in a sunny scene
  • the training data of the target domain may be a large number of traffic scene images in a foggy scene.
  • the training data of the source domain and the training data of the target domain may also reflect this domain difference in other aspects.
  • For example, the training data of the source domain may be the energy consumption data of a production line collected last year,
  • and the training data of the target domain may be the energy consumption data of the production line collected this year.
  • In this case, the domain difference is reflected in the inconsistent value distributions of the energy consumption data caused by the change in time.
  • Step 502 Use the training data to train the neural network so that the neural network learns to decompose domain invariant features and domain specific features from the training data.
  • the neural network can use any of the methods of supervised learning, semi-supervised learning or unsupervised learning to learn from the training data.
  • For example, the training data can include source-domain training data with sufficient labels and target-domain training data with a small number of labels, in which case the neural network can learn from the training data using semi-supervised learning;
  • alternatively, the training data can include source-domain training data with sufficient labels and target-domain training data without labels, in which case the neural network can learn from the training data using unsupervised learning.
  • A domain-invariant representation (domain-invariant feature) is a feature that is unrelated to the domain to which the training data belongs, and is a feature that does not change with domain differences. Domain-invariant features can sometimes be referred to as task-related instance-level features.
  • For example, the domain difference between a traffic scene image taken on a sunny day and a traffic scene image taken on a foggy day is reflected in the image differences caused by the change in weather, while the characteristics of the vehicles in the traffic scene do not change with the weather.
  • The target object (i.e. the instance) of the target detection task is the vehicle in the image, so the vehicle features are the domain-invariant features to be extracted.
  • A neural network trained in this way can accurately extract the vehicle features to complete the target detection task.
  • A domain-specific representation (domain-specific feature) is a feature that characterizes the domain to which the training data belongs. It is a feature unique to that domain and changes with domain differences; at the same time, domain-specific features are irrelevant to the instance and to the goal of the task in the actual task execution process. For example, in the aforementioned vehicle detection application scenario, the features of the surroundings of the vehicle in the traffic scene image (trees, sky, street scene, etc.) are not related to the vehicle features, because recognizing or detecting the vehicle does not require knowledge of the surrounding environment, and the feature information of the surrounding environment (such as the sky) changes with the domain difference (the weather change).
  • the embodiment of the present application decomposes the domain invariant feature and the domain specific feature from the training data, so that the domain invariant feature can be decoupled from the domain specific feature. Since the neural network obtained by the training method of the present application uses domain invariant features to perform tasks, the influence of domain-specific features on the neural network is avoided, thereby improving the migration performance of the neural network between different domains.
  • FIG. 6 is a schematic flowchart of a neural network training method provided by an embodiment of the application.
  • Fig. 7 is a schematic diagram of the structure of the neural network trained in Fig. 6.
  • the training method of the neural network shown in FIG. 6 can be executed by the training device 220 shown in FIG. 2, and the target model/rule 201 trained by the training device 220 is the neural network.
  • the training method of the neural network includes the following steps:
  • Step 601 Decompose domain-invariant features and domain-specific features from the training data.
  • The processes of extracting the domain-invariant feature (DIR) and the domain-specific feature (DSR) can be completed, respectively, by the domain-invariant feature extractor E_DIR and the domain-specific feature extractor E_DSR in the neural network.
  • the goal of the task is to detect objects (including people and vehicles) in the image data.
  • the training data of the source domain on the left side of Fig. 8 is a photo image
  • the training data of the target domain on the right side is a cartoon image.
  • the domain invariant features extracted from the training data of the source domain are the characters and vehicles in the photo image
  • the domain invariant features extracted from the training data of the target domain are the characters and vehicles in the cartoon image.
  • The line C_1 represents the classification boundary between the domain-invariant features of the persons and the domain-invariant features of the vehicles in the domain-invariant space.
  • The domain-specific features extracted from the training data of the source domain are the features of the photo image other than the persons and vehicles,
  • and the domain-specific features extracted from the training data of the target domain are the features of the cartoon image other than the persons and vehicles.
  • The line C_2 represents the distribution boundary between the domain-specific features from the source domain and the domain-specific features from the target domain in the domain-specific space.
  • Step 602 Use the domain-invariant feature to perform the task to obtain the task loss, and calculate the mutual information loss between the domain-invariant feature and the domain-specific feature.
  • The mutual information loss is used to represent the difference between the domain-invariant feature and the domain-specific feature.
  • Domain-invariant features are used to characterize feature information at the instance level. Therefore, performing the task with the domain-invariant features and obtaining the task loss can improve the accuracy and completeness with which the domain-invariant features characterize the task-related instances.
  • The task loss is used to characterize the gap between the result of performing the task with the domain-invariant feature and the task label. For example, when the domain-invariant feature is used to perform a target detection task, the task result can include the attribute features of the detected target object, and the task label corresponds to the standard attribute features of the target object to which the domain-invariant feature actually corresponds; in this way, the difference between the detected attribute features and the standard attribute features can be characterized by the task loss.
  • Mutual information (MI) characterizes the interdependence between two variables.
  • The mutual information I of two random variables X and Z can be defined by the following formula (3): I(X; Z) = H(X) - H(X|Z), where H(X) is the marginal entropy of X and H(X|Z) is the conditional entropy of X given Z.
  • The mutual information loss is used to represent the difference between domain-invariant features and domain-specific features. Calculating the mutual information loss between the domain-invariant features and the domain-specific features and training the neural network based on it helps to further distinguish the domain-invariant features from the domain-specific features, thereby forcing the decoupling of the two. It should be understood that the calculation method of the mutual information loss can be selected according to the requirements of the actual scenario; for example, a mutual information neural estimator (MINE) can be used to obtain the mutual information loss. This application does not strictly limit the specific calculation method of the mutual information loss.
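  • As a minimal sketch of such a MINE-style estimate (the statistics network architecture and the use of shuffled pairs to sample the product of marginals are standard MINE choices, assumed here for illustration rather than taken from the application):

```python
import torch
import torch.nn as nn

class MINE(nn.Module):
    """Statistics network T(x, z); the Donsker-Varadhan bound
    I(X; Z) >= E_joint[T] - log(E_marginal[exp(T)]) is used as the MI estimate."""
    def __init__(self, dim_dir, dim_dsr, hidden=128):
        super().__init__()
        self.T = nn.Sequential(nn.Linear(dim_dir + dim_dsr, hidden), nn.ReLU(),
                               nn.Linear(hidden, 1))

    def forward(self, dir_feat, dsr_feat):
        joint = self.T(torch.cat([dir_feat, dsr_feat], dim=1))
        shuffled = dsr_feat[torch.randperm(dsr_feat.size(0))]        # break the pairing
        marginal = self.T(torch.cat([dir_feat, shuffled], dim=1))
        # Estimated mutual information, used as the mutual information loss to minimize.
        return joint.mean() - torch.log(torch.exp(marginal).mean() + 1e-8)
```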
  • Step 603 Train the neural network according to the task loss and the mutual information loss.
  • the training process of the neural network is actually the process of adjusting the weight vector according to the value of the loss function.
  • the task loss here characterizes the ability to complete the task based on the domain invariant features extracted from the training data. If the domain invariant feature cannot correspond to the instance accurately enough, the value of the task loss will be relatively large; at this time, the weight vector in the neural network needs to be adjusted so that a lower task loss can be obtained for the domain invariant feature in the next prediction pass.
  • As training iterates, the domain invariant features extracted by the domain invariant feature extractor will correspond to the instances more and more accurately.
  • the process of training the neural network based on the mutual information loss may be a process of training the neural network to reduce the mutual information loss between the domain invariant feature and the domain specific feature, for example, to minimize the mutual information loss.
  • the mutual information loss between the domain invariant feature and the domain specific feature can be calculated, and the mutual information loss can be used to further improve the accuracy of the domain invariant feature extraction.
  • the mutual information loss characterizes the correlation between domain invariant features and domain-specific features.
  • Therefore, adjusting the weight vector of the neural network according to the mutual information loss can make the extracted domain invariant features better distinguished from the domain-specific features, which plays a role in forcing the decoupling of features. If the mutual information loss is large, it means that the current domain-invariant features and domain-specific features are still relatively correlated, that is, the features extracted by the current domain-invariant feature extractor may still include domain-specific information; at this time, the weight vector of the neural network needs to be adjusted to reduce the mutual information loss.
  • Considering that, at the start of training, the features extracted by the domain-specific feature extractor may still have some relevance to the instance, the training process based on the mutual information loss can also be regarded as a process of "removing" domain-specific information from the domain invariant features, so that the features extracted by the domain invariant feature extractor become more and more consistent with the instance as training iterates, while the features extracted by the domain-specific feature extractor become more and more irrelevant to the instance, thereby realizing the decoupling of domain-invariant features and domain-specific features.
  • the domain-specific feature extractor is also trained in this training process based on the mutual information loss.
  • the foregoing training process based on the task loss and the training process based on the mutual information loss are not necessarily performed at the same time.
  • the training process based on the mutual information loss may also be performed after the training process based on the task loss starts; this application does not strictly limit the specific execution sequence of the two training processes.
  • This application trains the neural network based on the task loss and the mutual information loss, which not only makes the decomposed domain invariant features correspond to the instance more accurately, but also reduces the correlation between the domain invariant features and the domain specific features during training, so as to promote the complete decoupling of domain-invariant features and domain-specific features and further reduce the influence of domain-specific features on domain-invariant features.
  • one or more combinations of the following loss information between domain invariant features and domain specific features can be calculated: mutual information loss, metric loss (for example, L1 distance or L2 distance), a loss measuring the difference in data distribution (such as KL (Kullback-Leibler) divergence), and the Wasserstein distance.
  • This application does not strictly limit the form of loss information used to characterize the correlation between domain invariant features and domain specific features.
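  • As an illustrative sketch only, the alternative correlation measures mentioned above could be computed roughly as follows, assuming the two feature tensors have the same shape; normalizing the features into distributions for the KL term is an assumption made here for illustration.

```python
import torch.nn.functional as F

def correlation_losses(dir_feat, dsr_feat):
    l1 = (dir_feat - dsr_feat).abs().mean()       # L1 metric loss
    l2 = (dir_feat - dsr_feat).pow(2).mean()      # L2 metric loss
    # KL divergence between the two features viewed as softmax-normalized distributions
    kl = F.kl_div(F.log_softmax(dir_feat, dim=1), F.softmax(dsr_feat, dim=1),
                  reduction="batchmean")
    return l1, l2, kl
```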
  • the neural network can be used for domain adaptive learning, and the training data can come from image data of different domains (for example, different styles), such as photorealistic style, comic style, etc.
  • the domain-invariant features can be decoupled from the domain-specific features. Since the domain invariant features are used to perform tasks, the neural network obtained by the training method of the present application can adapt, through domain adaptive learning, to various image processing tasks in different fields, such as target detection/recognition/segmentation, so as to achieve adaptive processing of image data from different fields.
  • the domain-specific feature extractor will also be trained in the training process based on the mutual information loss. Considering that, when the extraction accuracy of the domain-specific feature extractor for domain-specific features is improved, the training process based on the mutual information loss can distinguish domain-specific features from domain invariant features more effectively, the extraction accuracy of domain invariant features will be further improved indirectly. Therefore, it is worthwhile to further improve the extraction accuracy of domain-specific features, so as to indirectly improve the extraction accuracy of the domain invariant feature extractor through the training process based on the mutual information loss.
  • the domain-specific features extracted by the domain-specific feature extractor may be subjected to domain classification to obtain the domain classification loss, and then the neural network can be trained according to the task loss, mutual information loss, and domain classification loss.
  • the domain-specific feature extractor can be connected to a domain classifier, and a gradient reversal layer (GRL) can be set between the feature extractor and the domain classifier.
  • the domain-specific features extracted by the domain-specific feature extractor are input into the domain classifier to distinguish whether the domain-specific features are really domain-specific features to obtain the domain classification loss.
  • the domain classification loss actually reflects the accuracy of the extraction result of the domain-specific feature extractor; during back propagation, the domain classification loss passes through the gradient reversal layer before reaching the domain specific feature extractor, so that the gradient direction of the domain classification loss is automatically reversed in order to "confuse" the domain specific feature extractor.
  • the goal of the domain classifier is actually to confuse the domain-specific feature extractor, while the goal of the domain-specific feature extractor is to ensure that the extracted features are indeed domain-specific features; through this adversarial strategy between the domain classifier and the domain-specific feature extractor, the accuracy with which the domain-specific feature extractor extracts domain-specific features is finally improved.
  • This application introduces the domain classification loss, which indirectly helps to extract the domain invariant features from the features of the training data more accurately.
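  • A commonly used implementation of a gradient reversal layer is an autograd function that is the identity in the forward pass and negates (and scales) the gradient in the backward pass; the sketch below follows that standard pattern, with layer sizes chosen arbitrarily, and is not a verbatim description of this application's network.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lambda in the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

class DomainClassifierWithGRL(nn.Module):
    """Domain classifier preceded by a gradient reversal layer (GRL)."""
    def __init__(self, feat_dim, num_domains=2, lamb=1.0):
        super().__init__()
        self.lamb = lamb
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_domains),
        )

    def forward(self, dsr_feat):
        reversed_feat = GradReverse.apply(dsr_feat, self.lamb)
        return self.classifier(reversed_feat)  # logits used for the domain classification loss
```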
  • In order to further promote that the decoupled domain-invariant features and domain-specific features contain all the feature information of the training data, so as to improve the completeness and rationality of the feature decoupling, the initial features can first be extracted from the training data, the initial features can be decomposed into domain-invariant features and domain-specific features, and the neural network can then be trained to reduce the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
  • Specifically, the initial features can be reconstructed using the domain-invariant features and the domain-specific features to obtain reconstructed features, and the initial features can then be compared with the reconstructed features to determine the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features, that is, the reconstruction loss; the reconstruction loss is then used to train the neural network, so that the domain invariant features extracted by the domain invariant feature extractor and the domain-specific features extracted by the domain-specific feature extractor can better cover the feature information of the training data.
  • This application reduces the difference between the information contained in the initial features and the information jointly contained in the domain invariant features and domain specific features, so that the decoupled domain invariant features and domain specific features can contain all the feature information of the training data, thereby improving the completeness and rationality of the feature decoupling.
  • FIG. 10 is a schematic structural diagram of a neural network provided by an embodiment of this application.
  • the neural network includes a first decoupler U1 and a second decoupler U2, and the extraction of the domain invariant features and the domain specific features is completed through the joint action of the first decoupler U1 and the second decoupler U2.
  • FIG. 11 is a schematic diagram of the extraction process of domain invariant features and domain specific features based on the neural network architecture shown in FIG. 10 according to an embodiment of the application. As shown in Figure 11, the extraction process of the domain invariant features and domain specific features may include the following steps:
  • Step 1101 Extract the first feature of the training data from the training data.
  • the neural network includes a feature extractor, which is used to extract the first feature from the training data; the first feature is the feature basis for the subsequent domain invariant feature enhancement.
  • the qualifier "first" in the first feature means that it is the result of "preliminary" feature extraction performed by this feature extractor on the training data. For example, when the training data is image data, the first feature is the result of feature extraction at the image texture level.
  • Step 1102 Use the first decoupler U1 to extract preliminary domain invariant features and preliminary domain specific features from the first feature.
  • the first decoupler U1 includes a domain invariant feature extractor and a domain specific feature extractor, which are respectively used to extract the preliminary domain invariant features and the preliminary domain-specific features from the first feature.
  • the respective extraction processes can be expressed by formula (4).
  • the first decoupler U1 can be trained using mutual information loss to ensure the extraction accuracy of the preliminary domain invariant features and the preliminary domain specific features.
  • the mutual information (MI) loss characterizes the interdependence of two variables
  • the mutual information loss here characterizes the difference between the preliminary domain invariant features and the preliminary domain-specific features.
  • Therefore, adjusting the weight vector of the network structure in the first decoupler U1 according to the mutual information loss can make the extracted preliminary domain invariant features better distinguished from the preliminary domain-specific features, playing a role in forcing the decoupling of features.
  • If the mutual information loss is large, the weight vector of the network structure of the first decoupler U1 needs to be adjusted to reduce the mutual information loss.
  • In the first decoupler U1, the domain classifier and the gradient reversal layer (GRL) can also be used to improve the extraction accuracy of the preliminary domain-specific features.
  • Step 1103 The preliminary domain invariant feature is merged with the first feature to obtain the second feature.
  • the fusion process that produces the second feature F_1 can be expressed by formula (5).
  • the specific method of feature fusion can be selected according to the requirements of actual application scenarios. For example, the preliminary domain invariant feature can be superimposed on the first feature while keeping the number of channels unchanged, forming a second feature with the same number of channels; the preliminary domain invariant feature can also be "spliced" with the first feature in a concatenated manner, forming a second feature with an increased number of channels. This application does not strictly limit the specific implementation of the fusion process.
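  • Both fusion options can be sketched in a few lines, assuming the preliminary domain invariant feature has the same spatial size and channel count as the first feature (NCHW layout); this is only an illustration of the two options mentioned above, not the specific fusion prescribed here.

```python
import torch

def fuse_by_addition(first_feature, preliminary_dir):
    # Superimpose: element-wise addition keeps the number of channels unchanged.
    return first_feature + preliminary_dir

def fuse_by_concatenation(first_feature, preliminary_dir):
    # Splice: concatenation along the channel dimension increases the number of channels.
    return torch.cat([first_feature, preliminary_dir], dim=1)
```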
  • Step 1104 Extract the third feature of the training data from the second feature.
  • the neural network also includes a second feature extractor, which is used to extract the third feature from the second feature F_1; the third feature serves as the basis for the extraction of the subsequent domain invariant features and domain-specific features. It should be understood that the qualifier "third" in the third feature means that the third feature is extracted based on the second feature, which includes the first feature, so the extraction process is more refined; for example, when the training data is image data, the third feature may be an extracted feature map that represents the semantic level of the image.
  • the feature extraction process can be expressed by formula (6).
  • Step 1105 Use the second decoupler U2 to extract domain-invariant features and domain-specific features from the third feature.
  • the second decoupler U2 includes a domain invariant feature extractor and a domain specific feature extractor, which are respectively used to extract the domain invariant features and the domain-specific features from the third feature.
  • the respective extraction processes can be expressed by formula (7).
  • the domain invariant features are used to perform tasks to obtain the task loss, and the mutual information (MI) loss between the domain-invariant features and the domain-specific features is calculated.
  • using the domain invariant features to perform tasks and obtain the task loss can improve the accuracy and completeness with which the domain invariant features characterize the instances related to the task.
  • In order to ensure that the domain invariant features correspond to the instances more accurately, the mutual information loss between the domain invariant features and the domain-specific features can also be calculated during training, and the mutual information loss can be used to further improve the accuracy of domain invariant feature extraction.
  • In the training process of the neural network based on the task loss and the mutual information loss, the domain invariant feature extractor used to extract the preliminary domain invariant features and/or the domain specific feature extractor used to extract the preliminary domain-specific features in the first decoupler U1 can also participate in the parameter tuning process, so as to ensure the extraction accuracy of the first decoupler U1 for the preliminary domain invariant features, thereby further improving the domain invariant feature enhancement effect realized by the first decoupler U1.
  • the domain classifier and the gradient reversal layer can also be used in the second decoupler U2: through the adversarial strategy between the domain classifier and the domain specific feature extractor, the extraction accuracy of the domain specific feature extractor for the domain-specific features is improved, which, combined with the training process based on the mutual information loss, indirectly improves the extraction accuracy of the domain invariant features.
  • In order to further promote that the decoupled domain invariant features and domain-specific features contain all the feature information of the training data, so as to improve the completeness and rationality of the feature decoupling, the neural network can be trained to reduce the difference between the information contained in the third feature and the information jointly contained in the domain invariant features and the domain-specific features.
  • After the domain invariant features and the domain-specific features are extracted, they can be used to reconstruct the third feature; the difference between the information contained in the third feature and the information jointly contained in the two is the reconstruction loss.
  • the calculation process of the reconstruction loss can be expressed by formula (8), where R represents the reconstruction network, F_r is the reconstructed feature obtained by feeding the domain invariant features and the domain-specific features into R, and L_recon is the reconstruction loss, reflected as the L2 distance between the reconstructed feature F_r and the third feature.
  • the reconstruction loss is used to train the neural network so that the domain invariant features and the domain-specific features can better cover the feature information of the training data.
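  • A rough sketch of the reconstruction loss could look as follows; the structure of the reconstruction network R (here, two convolutions over concatenated feature maps) and the exact form of the L2 term are assumptions for illustration, not the specific design of this application.

```python
import torch
import torch.nn as nn

class ReconstructionNetwork(nn.Module):
    """Hypothetical reconstruction network R mapping (DIR, DSR) back to the feature space."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, dir_feat, dsr_feat):
        return self.net(torch.cat([dir_feat, dsr_feat], dim=1))

def reconstruction_loss(recon_net, dir_feat, dsr_feat, third_feature):
    f_r = recon_net(dir_feat, dsr_feat)          # reconstructed feature F_r
    return ((f_r - third_feature) ** 2).mean()   # mean squared L2 distance to the third feature
```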
  • the concept of "two-layer domain invariant feature decoupling” can be used to train a neural network to extract domain invariant features.
  • the preliminary domain invariant feature is merged with the first feature to obtain the second feature, so that the domain invariant feature information is enhanced at the first feature level.
  • the second feature is then used to decouple the domain invariant features through the second decoupler U2, which further enhances the decoupling accuracy of the domain invariant features, making the task execution performance of the trained neural network stronger and its domain adaptation ability better.
  • the training process related to the neural network is described in detail.
  • the training process of the neural network can include: (1) training related to the task loss and the domain classification loss; (2) training related to the mutual information loss; (3) training related to the reconstruction loss.
  • the above-mentioned three kinds of training can be carried out at the same time or carried out in stages, which is not limited in the embodiment of the present application.
  • the training sequence of the above three types of training will be illustrated with examples in conjunction with FIG. 12.
  • the training process of the neural network can be divided into the following three stages in sequence.
  • In the first stage, the neural network is controlled to perform training related to the task loss and the domain classification loss.
  • This training stage aims to allow the neural network to learn the ability to decompose domain-invariant features and domain-specific features from training data; therefore, the first stage can also be called the feature decomposition stage (referred to as stage-fd, where fd stands for feature decomposition).
  • In the second stage, the neural network is controlled to perform training related to the mutual information loss.
  • This training stage aims to allow the neural network to learn the ability to increase the difference between domain-invariant features and domain-specific features; therefore, the second stage can also be called the feature separation stage (referred to as stage-fs, where fs stands for feature separation).
  • In the third stage, the neural network is controlled to perform training related to the reconstruction loss.
  • This training stage aims to make the domain-invariant features and domain-specific features decomposed by the neural network contain as much of the information in the initial features as possible; therefore, the third stage can also be called the feature reconstruction stage (referred to as stage-fr, where fr stands for feature reconstruction).
  • Carrying out the training process of the neural network in stages can simplify the amount of training in each stage and speed up the convergence speed of the parameters of the neural network.
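  • A schematic three-stage training loop might look like the sketch below; the model is assumed to return its individual loss terms in a dictionary, and the stage lengths and loss names are placeholders rather than values taken from this application.

```python
def train_in_stages(model, data_loader, optimizer, epochs_per_stage=(10, 10, 10)):
    stages = [
        ("stage-fd", lambda out: out["task_loss"] + out["domain_cls_loss"]),
        ("stage-fs", lambda out: out["mutual_info_loss"]),
        ("stage-fr", lambda out: out["reconstruction_loss"]),
    ]
    for (name, stage_loss), epochs in zip(stages, epochs_per_stage):
        for _ in range(epochs):
            for batch in data_loader:
                out = model(batch)          # assumed to return a dict of loss terms
                loss = stage_loss(out)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```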
  • FIG. 13 is a schematic structural diagram of a neural network provided by an embodiment of this application.
  • the neural network is trained by using the training method provided in the above-mentioned embodiment of the present application.
  • the neural network 130 includes:
  • the first feature extraction layer 1301 is used to extract first features based on input data.
  • the first domain invariant feature decoupling layer 1302 is used to extract the first domain invariant feature based on the first feature.
  • the feature fusion layer 1303 is used to fuse the first feature and the invariant feature of the first domain to obtain the second feature.
  • the second feature extraction layer 1304 is used to extract a third feature based on the second feature.
  • the second domain invariant feature decoupling layer 1305 is used to extract the second domain invariant feature based on the third feature.
  • the first domain invariant feature and the second domain invariant feature are respectively features that are irrelevant to the field to which the input data belongs, and the first domain specific feature and the second domain specific feature are respectively features that characterize the field to which the input data belongs.
  • Although the neural network has the ability to decompose domain invariant features and domain specific features, the trained neural network shown in Figure 13 does not actually need to extract domain-specific features when performing tasks.
  • the first feature is extracted by the first feature extraction layer 1301;
  • the first domain invariant feature is then extracted based on the first feature, and domain invariant feature enhancement is realized by fusing it with the first feature;
  • the second domain invariant feature is further extracted on the basis of the second feature, and the extracted second domain invariant feature can accurately correspond to the instances, so that the neural network has stronger performance when performing specific tasks and better domain adaptability.
  • domain adaptive learning actually solves the cross-domain migration capability of neural networks.
  • To improve the domain generalization ability of the neural network, it is necessary to train not only on the feature information of the source domain but also on the feature information of the target domain. Therefore, when training the neural network, training data of the intermediate domain between the source domain and the target domain can be added. By generating training data located in the intermediate domain, the "domain gap" between the source domain and the target domain is filled, and the problem of a large distribution difference between the training data of the source domain and the training data of the target domain is alleviated.
  • FIG. 14 is a schematic diagram of a process for obtaining data of an intermediate domain according to an embodiment of the application.
  • FIG. 15 is a schematic diagram of a principle for obtaining data of an intermediate domain provided by an embodiment of this application. As shown in FIG. 14 and FIG. 15, the process of obtaining data of the intermediate domain may include the following steps:
  • Step 1401 Obtain data of the source domain and/or data of the target domain.
  • the source domain and the target domain are two domains with differences in data characteristics, and the difference in data characteristics between the intermediate domain and any one of the source domain and the target domain is smaller than the difference in data characteristics between the source domain and the target domain.
  • the data of the intermediate domain is actually generated by adding disturbances on the basis of the data of the source domain and/or the data of the target domain. Therefore, the data of the source domain and/or the data of the target domain must be obtained first.
  • Step 1402 Input the data of the source domain and/or the data of the target domain into the neural network for training, so as to obtain gradient information of the loss function.
  • Since the data to be generated belongs to the intermediate domain between the source domain and the target domain, it is necessary to obtain the gradient information of the loss function to guide the subsequent perturbation process that generates the intermediate domain data.
  • Step 1403 Perturb the data of the source domain and/or the data of the target domain according to the gradient information to obtain the data of the intermediate domain.
  • Perturb the data in the source domain or the data in the target domain to generate new data, and these newly generated data can be used as the data in the intermediate domain.
  • the gradient information introduces directional information between the source domain and the target domain, which makes the perturbation of the data more targeted.
  • the data of the intermediate domain obtained through the perturbation can fill the "domain gap" between the source domain and the target domain and alleviate the problem of a large distribution difference between the data of the source domain and the data of the target domain.
  • the data of the source domain, the data of the target domain, and the data of the intermediate domain can be used as training data to train the neural network, so that the trained neural network can have better domain adaptability.
  • the labeled data X s of the source domain can be input to the neural network TNet for training to obtain the gradient information of the loss function.
  • the neural network TNet is generated by training based on the labeled data X_l of the target domain, and may include a feature extractor F_T and a classifier C_T.
  • During training, the feature information P_T extracted by the feature extractor F_T is input into the classifier C_T to obtain the cross-entropy loss L_ce of the classification task, which guides the parameter tuning process of TNet.
  • Since the neural network TNet calculates the task loss and adjusts its network parameters based on the input X_l, TNet is actually better suited to the target domain; in this way, inputting X_s into the neural network TNet will generate the first gradient information, which points from the source domain to the target domain.
  • X_s is regarded as an object that can be optimized, and a gradient perturbation of a certain magnitude is superimposed on X_s according to the first gradient information back-propagated from the task loss;
  • the new samples obtained after superimposing this perturbation, which point from the source domain toward the target domain, can be used as intermediate domain data, as shown by AAT in Figure 15.
  • Since the neural network TNet is generated by training on the labeled data of the target domain, the first gradient information obtained after inputting the labeled data of the source domain into the neural network is a good measure of the direction from the source domain to the target domain.
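  • The source-to-target perturbation could be sketched roughly as below, where tnet is assumed to be a classifier already trained on labeled target-domain data; the sign and magnitude of the gradient step are design choices not fixed by the text, so the values used here are only illustrative.

```python
import torch
import torch.nn.functional as F

def perturb_source_towards_target(tnet, x_s, y_s, epsilon=0.01):
    x_adv = x_s.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(tnet(x_adv), y_s)   # task loss under the target-adapted network TNet
    loss.backward()
    # Step along the back-propagated gradient direction; the perturbed samples lie between
    # the source domain and the target domain and can serve as intermediate-domain data.
    intermediate = x_adv - epsilon * x_adv.grad.sign()
    return intermediate.detach()
```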
  • unlabeled data X_u of the target domain can be input into the neural network HNet; since X_u is unlabeled, virtual adversarial training can be used to obtain the gradient information.
  • the neural network HNet may be generated by training based on the labeled data X s of the source domain.
  • HNet may include a feature extractor F_H and a classifier C_H. During training, the feature information P_H extracted by the feature extractor F_H is input into the classifier C_H to obtain the cross-entropy loss L_ce of the classification task, which is used to guide the parameter tuning process of HNet.
  • X s is input to HNet to calculate the task loss, and the network parameters of HNet are updated according to the task loss.
  • the labeled data X l of the target domain can also be used to train the neural network HNet together with the labeled data X s of the source domain to further improve the accuracy of the neural network HNet executing tasks.
  • the virtual adversarial training method is used to generate a predicted virtual label for X_u;
  • the task loss is calculated based on the virtual label;
  • the second gradient information back-propagated from the task loss is generated on X_u, and a perturbation is superimposed on X_u accordingly;
  • the new samples obtained after superimposing this perturbation, which point from the target domain toward the source domain, can be used as intermediate domain data, as shown by E-VAT in Figure 15.
  • Since the neural network HNet is generated from the labeled data of the source domain and the labeled data of the target domain, the back-propagated second gradient information obtained after the unlabeled data of the target domain is input into the neural network and virtual adversarial training is performed is a good measure of the direction from the target domain to the source domain.
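  • A simplified virtual adversarial perturbation for unlabeled target samples (a single power-iteration step) might look as follows; hnet is assumed to be a classifier trained on labeled source-domain (and optionally target-domain) data, and the step sizes are illustrative rather than values specified by this application.

```python
import torch
import torch.nn.functional as F

def vat_perturbation(hnet, x_u, xi=1e-6, epsilon=0.01):
    with torch.no_grad():
        virtual_label = F.softmax(hnet(x_u), dim=1)          # predicted "virtual" label for X_u

    dims = tuple(range(1, x_u.dim()))
    d = torch.randn_like(x_u)
    d = xi * d / d.norm(p=2, dim=dims, keepdim=True)         # small random probe direction
    d.requires_grad_(True)

    adv_loss = F.kl_div(F.log_softmax(hnet(x_u + d), dim=1), virtual_label,
                        reduction="batchmean")
    adv_loss.backward()

    # The gradient of the divergence gives the (target-to-source) perturbation direction.
    r_adv = epsilon * d.grad / d.grad.norm(p=2, dim=dims, keepdim=True)
    return (x_u + r_adv).detach()                            # candidate intermediate-domain data
```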
  • In addition, the labeled data X_l of the target domain may also be input into an auxiliary neural network to obtain the gradient information of the loss function.
  • the auxiliary neural network is generated based on the labeled data X_s of the source domain. Since the auxiliary neural network calculates the task loss and adjusts its network parameters based on the input X_s, it is actually better suited to the source domain; inputting X_l into the auxiliary neural network will therefore generate gradient information pointing from the target domain to the source domain. At this time, X_l is regarded as an object that can be optimized, and a gradient perturbation of a certain magnitude is superimposed on X_l according to the gradient information back-propagated from the task loss. The new samples obtained after superimposing this perturbation, which point from the target domain toward the source domain, can also be used as intermediate domain data.
  • the embodiment shown in Figure 15 actually proposes a "two-way adversarial training" method to generate data of the intermediate domain, that is, the gradient information of the network is used to guide the perturbation direction of the sample, and the sample generated after superimposing the perturbation is used as the data of the intermediate domain.
  • In Figure 16, circles and triangles represent different sample categories.
  • the gradient information can be used to obtain the perturbation direction from the source domain to the target domain (as shown by the arrow from left to right in Figure 16), and a perturbation is then added to the data of the source domain to generate data of the intermediate domain; at the same time, the gradient information can also be used to obtain the perturbation direction from the target domain to the source domain (as shown by the arrow from right to left in Figure 16), and a perturbation is then added to the data of the target domain to generate data of the intermediate domain.
  • the auxiliary network obtained through training can give the gradient direction from the source domain to the target domain or from the target domain to the source domain, and this gradient direction is used to perturb the data of the source domain or the data of the target domain to generate adversarial samples; virtual adversarial training can also be used to generate adversarial samples from the target domain toward the source domain, so that adversarial samples are generated in both directions within the "domain gap" between the source domain and the target domain to construct the intermediate domain.
  • the acquired data of the intermediate domain, together with the data of the source domain and the data of the target domain, can be input into the neural network shown in FIG. 9, and the neural network can be trained with the training method provided by the embodiment of this application, so as to realize the combination of "two-way adversarial training" and "two-layer domain invariant feature decoupling".
  • Since the training data used for feature decoupling includes the data of the intermediate domain, the data of the source domain and the data of the target domain are effectively supplemented and the difference between the source domain and the target domain is reduced; using the data of the intermediate domain as training data for feature decoupling can greatly improve the domain invariant feature decoupling ability, so that the domain generalization performance and cross-domain migration ability of the trained neural network are more significantly improved.
  • After the neural network HNet is generated by training based on the labeled data X_s of the source domain, random noise perturbations can also be generated near X_s, and these noise perturbations are correspondingly superimposed on X_s to generate adversarial samples in the neighborhood; the adversarial samples of the neighborhood are also input into the neural network for training as part of the training data.
  • the adversarial samples in the neighborhood can be input into HNet; the feature map extracted by the feature extractor F_H of HNet from these neighborhood adversarial samples is input into the classifier C_H to obtain the cross-entropy loss L_at of the classification task, which guides the adjustment of HNet's network parameters so that HNet can be further trained.
  • random noise disturbances can also be generated near X l , and these noise disturbances are correspondingly superimposed on X l to supplement the adversarial samples in the neighborhood.
  • the embodiment of the present application can also generate adversarial samples in the neighborhood based on the data of the source domain and the target domain, so as to effectively supplement the data of the source domain and the target domain and reduce the difference between the source domain and the target domain, so that the domain generalization performance and cross-domain migration ability of the trained neural network are further improved.
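  • Generating a neighborhood adversarial sample and using it to further train HNet can be sketched as follows; the noise scale is arbitrary, and the cross-entropy term merely stands in for the L_at loss described above.

```python
import torch
import torch.nn.functional as F

def neighborhood_adversarial_step(hnet, x, y, noise_scale=0.05):
    # Superimpose random noise near x to obtain a neighborhood adversarial sample.
    x_neigh = x + noise_scale * torch.randn_like(x)
    l_at = F.cross_entropy(hnet(x_neigh), y)   # loss used to further train HNet
    return x_neigh, l_at
```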
  • FIG. 17 is a schematic structural diagram of a data processing system provided by an embodiment of this application. As shown in FIG. 17, the data processing system 170 is used to train a neural network, and includes: a data acquisition network 1701 and a feature decoupling network 1702.
  • the data acquisition network 1701 is used to acquire the gradient information of the loss function based on the first data, and perturb the input data according to the gradient information to acquire the second data.
  • the second data is an adversarial sample that fills the "domain gap" of the first data, so that the training process can have better domain adaptability.
  • the feature decoupling network 1702 is used to train a neural network according to the training data including the second data, so that the neural network learns to decompose domain invariant features and domain specific features from the training data.
  • the feature decoupling network 1702 includes: a first feature extraction layer 17021, used to extract a first feature based on the training data; a first domain invariant feature extraction layer 17022, used to extract a first domain invariant feature based on the first feature; a first domain specific feature extraction layer 17023, used to extract a first domain specific feature based on the first feature; a first mutual information loss acquisition layer 17024, used to obtain a first mutual information loss based on the first domain invariant feature and the first domain specific feature; a feature fusion layer 17025, used to fuse the first feature and the first domain invariant feature to obtain a second feature; a second feature extraction layer 17026, used to extract a third feature based on the second feature; a second domain invariant feature decoupling layer 17027, used to extract a second domain invariant feature based on the third feature; a second domain specific feature extraction layer 17028, used to extract a second domain specific feature based on the third feature; and a second mutual information loss acquisition layer 17029, used to acquire a second mutual information loss based on the second domain invariant feature and the second domain specific feature.
  • the data processing system 170 may further include: a first domain classifier 17031, configured to perform a classification task based on the first domain specific feature to obtain a first classification loss; and a first gradient reversal layer 17032, configured to reverse the gradient information of the first classification loss.
  • the data processing system 170 may further include: a second domain classifier 17033, configured to perform a classification task based on the second domain specific feature to obtain a second classification loss; and a second gradient reversal layer 17034, configured to reverse the gradient information of the second classification loss.
  • the data processing system 170 may further include: a reconstruction loss acquisition layer 17035, used to reconstruct the third feature using the second domain invariant feature and the second domain specific feature to obtain a reconstructed feature, and to compare the third feature with the reconstructed feature to obtain the reconstruction loss.
  • the first data includes data of the source domain and/or data of the target domain.
  • the data acquisition network 1701 includes: a first training network generated by training based on the labeled data of the target domain; and/or a second training network generated by training based on the labeled data of the source domain.
  • the first training network or the second training network may include a feature extractor and a classifier. In the training process, the feature information extracted by the feature extractor is input into the classifier to obtain the cross-entropy loss of the classification task to guide the parameter adjustment process of the first training network or the second training network.
  • the data processing system 170 shown in FIG. 17 realizes the combination of "adversarial training to fill the domain gap" and "two-layer domain invariant feature decoupling". Since the training data used for feature decoupling includes data that can fill the domain gap of the first data, the original training data is effectively supplemented and the difference between training data from different fields is reduced; using the data output by the data acquisition network for feature-decoupling training can greatly improve the domain invariant feature decoupling ability, so that the domain generalization performance and cross-domain migration ability of the trained neural network are more significantly improved.
  • FIG. 18 is a schematic structural diagram of a neural network training device provided by an embodiment of the application. As shown in Fig. 18, the neural network training device 180 includes:
  • the obtaining module 1801 is configured to obtain training data
  • the training module 1802 is configured to use training data to train the neural network, so that the neural network learns to decompose domain invariant features and domain specific features from the training data.
  • the neural network training device 180 provided by the embodiment of this application decomposes domain invariant features and domain specific features from the training data. Since the neural network obtained by the training method of this application uses domain invariant features to perform tasks, the influence of domain-specific features on the neural network is avoided, and the migration performance of the neural network between different fields is improved.
  • the training module 1802 is configured to decompose domain-invariant features and domain-specific features from the training data; use the domain-invariant features to perform tasks, obtain the task loss, and calculate the mutual information loss between the domain-invariant features and the domain-specific features, where the mutual information loss is used to represent the difference between the domain-invariant features and the domain-specific features; and train the neural network according to the task loss and the mutual information loss.
  • the training module 1802 is further configured to perform domain classification using domain-specific features to obtain domain classification loss; and train a neural network based on task loss, mutual information loss, and domain classification loss.
  • the training module 1802 is further configured to extract initial features from the training data; decompose the initial features into domain-invariant features and domain-specific features; and train the neural network to reduce the difference between the information contained in the initial features and the information jointly contained in the domain invariant features and the domain specific features.
  • the training module 1802 is configured to reconstruct the initial features using the domain-invariant features and the domain-specific features to obtain reconstructed features; and compare the initial features with the reconstructed features to determine the difference between the information contained in the initial features and the information jointly contained in the domain invariant features and the domain specific features.
  • the training module 1802 is further configured to reconstruct the initial features using the domain invariant features and the domain specific features to obtain reconstructed features, where the domain invariant features and the domain specific features are features decomposed from the initial features; and compare the initial features with the reconstructed features to obtain the reconstruction loss.
  • the reconstruction loss is used to characterize the difference between the information contained in the initial feature and the information contained in the domain invariant feature and the domain specific feature.
  • the training module is configured to train the neural network in the first stage according to the task loss; train the neural network in the second stage according to the mutual information loss; and train the neural network in the third stage according to the reconstruction loss.
  • the neural network includes a first decoupler and a second decoupler
  • the training module 1802 is configured to extract the first feature of the training data from the training data; use the first decoupler to extract the preliminary domain invariant features and preliminary domain specific features from the first feature; fuse the preliminary domain invariant features with the first feature to obtain the second feature; extract the third feature of the training data from the second feature; and use the second decoupler to extract domain-invariant features and domain-specific features from the third feature.
  • the training module 1802 is further configured to train a neural network to reduce the difference between the information contained in the third feature and the information jointly contained in the domain invariant feature and the domain specific feature.
  • FIG. 19 is a schematic structural diagram of a data acquisition device provided by an embodiment of this application. As shown in FIG. 19, the data acquisition device 190 includes:
  • the data acquisition module 1901 is configured to acquire the data of the source domain and/or the data of the target domain; wherein the source domain and the target domain are two domains with different data characteristics, and the difference in data characteristics between the intermediate domain and either the source domain or the target domain is smaller than the difference in data characteristics between the source domain and the target domain;
  • the gradient information acquisition module 1902 is configured to input data of the source domain and/or data of the target domain into the neural network for training, so as to acquire gradient information of the loss function;
  • the intermediate domain data generating module 1903 is configured to perturb the data of the source domain and/or the data of the target domain according to the gradient information to obtain the data of the intermediate domain.
  • the gradient information acquisition module 1902 is configured to input the labeled data of the source domain into the first neural network for training to obtain the first gradient information, where the first neural network is generated by training based on the labeled data of the target domain.
  • the gradient information acquisition module 1902 is configured to input unlabeled data of the target domain into the second neural network, and perform training in a virtual adversarial training manner to obtain the second gradient information, where the second neural network is generated by training based on labeled data of the source domain.
  • each module in the neural network training device 180 and the data acquisition device 190 has been described in detail in the corresponding methods above; therefore, repeated descriptions are omitted here.
  • FIG. 20 is a schematic diagram of the hardware structure of a neural network training device provided by an embodiment of the application.
  • the neural network training device 2000 shown in FIG. 20 includes a memory 2001, a processor 2002, a communication interface 2003, and a bus 2004.
  • the memory 2001, the processor 2002, and the communication interface 2003 realize the communication connection between each other through the bus 2004.
  • the memory 2001 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 2001 may store a program.
  • the processor 2002 and the communication interface 2003 are used to execute each step of the neural network training method of the embodiment of the present application.
  • the processor 2002 may adopt a general-purpose central processing unit (central processing unit, CPU), a microprocessor, an application specific integrated circuit (application specific integrated circuit, ASIC), a graphics processing unit (GPU), or one or more integrated circuits, which are used to execute related programs to realize the functions required by the units in the neural network training device of the embodiment of the present application, or to execute the neural network training method of the method embodiment of the present application.
  • the processor 2002 may also be an integrated circuit chip with signal processing capability. In the implementation process, each step of the neural network training method of the present application can be completed by the integrated logic circuit of hardware in the processor 2002 or instructions in the form of software.
  • the aforementioned processor 2002 may also be a general-purpose processor, a digital signal processor (digital signal processing, DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 2001, and the processor 2002 reads the information in the memory 2001 and, in combination with its hardware, completes the functions required by the units included in the neural network training device of the embodiment of the present application, or executes the neural network training method of the method embodiment of the present application.
  • the communication interface 2003 uses a transceiver device such as but not limited to a transceiver to implement communication between the device 2000 and other devices or a communication network.
  • the training data can be obtained through the communication interface 2003.
  • the bus 2004 may include a path for transferring information between various components of the device 2000 (for example, the memory 2001, the processor 2002, and the communication interface 2003).
  • the acquisition module 1801 and the training module 1802 in the neural network training device 180 can be equivalent to the processor 2002.
  • Although the device 2000 shown in FIG. 20 only shows a memory, a processor, and a communication interface, in the specific implementation process those skilled in the art should understand that the device 2000 also includes other devices necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the device 2000 may also include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the device 2000 may also include only the components necessary to implement the embodiments of the present application, and does not necessarily include all the components shown in FIG. 20.
  • the apparatus 2000 is equivalent to the training device 220 in FIG. 2.
  • a person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered as going beyond the scope of this application.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • the technical solution of this application essentially or the part that contributes to the existing technology or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory, a random access memory, a magnetic disk, an optical disc, and other media that can store program code.

Abstract

The present application discloses a neural network training method and device, and a data acquisition method and device in the field of artificial intelligence. The neural network training method comprises: acquiring training data; and training the neural network by using the training data, so that the neural network learns to decompose domain-invariant representation and domain-specific representation from the training data. By decomposing the domain-invariant representation and the domain-specific representation from the training data, the domain-invariant representation can be decoupled from the domain-specific representation, wherein the domain-specific representation refers to features characterizing a domain to which the training data belongs, and the domain-invariant representation refers to features irrelevant to the domain to which the training data belongs. As the neural network trained by the method of the present application uses domain-invariant representation obtained by feature decoupling to execute a task, the influence of domain-specific representation on the neural network is avoided, thereby improving the migration performance of the neural network between different domains.

Description

Neural network training method, data acquisition method and device
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on June 24, 2020, with application number 202010594053.6 and the invention title "Neural Network Training Method, Data Acquisition Method and Device", the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the field of artificial intelligence, in particular to a neural network training method, data acquisition method and device.
Background
Artificial intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence. Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theories.
例如,在计算机视觉相关的应用场景中,以机器学习方式训练的神经网络可用于完成目标分类/检测/识别/分割/预测等多种任务。在很多应用场景下,训练样本和测试样本很可能来自不同的域,这会为神经网络的实际应用带来问题。例如,在车辆检测的应用场景下,源域数据可能是晴天拍摄的交通场景图像,而目标域数据却是雾天拍摄的交通场景图像。此时使用源域数据训练得到的目标检测模型就很难在目标域数据场景下取得好的效果。为了解决这种由于训练样本和测试样本之间的域偏差所带来的模型应用问题,域自适应(domain adaptation,DA)学习作为机器学习的重要研究领域在近几年受到了广泛的关注。For example, in computer vision-related application scenarios, neural networks trained by machine learning can be used to complete multiple tasks such as target classification/detection/recognition/segmentation/prediction. In many application scenarios, training samples and test samples are likely to come from different domains, which will cause problems for the practical application of neural networks. For example, in the application scenario of vehicle detection, the source domain data may be a traffic scene image taken on a sunny day, while the target domain data may be a traffic scene image taken on a foggy day. At this time, the target detection model trained with source domain data is difficult to achieve good results in the target domain data scenario. In order to solve this model application problem caused by the domain deviation between training samples and test samples, domain adaptation (DA) learning as an important research field of machine learning has received extensive attention in recent years.
域自适应学习通常使用分布对齐的方法来对齐源域和目标域的数据之间的概率分布,以缓解域偏差对域自适应学习任务带来的不利影响。由于这种分布对齐的过程只是在整体特征表示层面进行的,使得域自适应学习任务不可避免受到不同领域的特定特征的影响,因此,训练出的神经网络仍然存在迁移性能差的问题。Domain adaptive learning usually uses a distribution alignment method to align the probability distribution between the source domain and target domain data, so as to alleviate the adverse effects of domain deviation on the domain adaptive learning task. Since this distribution alignment process is only performed at the overall feature representation level, the domain adaptive learning task is inevitably affected by specific features in different fields. Therefore, the trained neural network still has the problem of poor migration performance.
Summary of the invention
This application provides a neural network training method, a data acquisition method, and corresponding devices, which can better improve the migration performance of a neural network between different domains.
According to a first aspect, a neural network training method is provided, including: obtaining training data; and training a neural network by using the training data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data, where the domain-specific features are features that characterize the domain to which the training data belongs, and the domain-invariant features are features that are irrelevant to the domain to which the training data belongs.
By decomposing domain-invariant features and domain-specific features from the training data, the domain-invariant features can be decoupled from the domain-specific features. Because the neural network obtained by the training method of this application uses the domain-invariant features to perform a task, the influence of the domain-specific features on the neural network is avoided, thereby improving the migration performance of the neural network between different domains.
With reference to the first aspect, in a possible implementation, the training of the neural network by using the training data includes: decomposing domain-invariant features and domain-specific features from the training data; performing a task by using the domain-invariant features to obtain a task loss, and calculating a mutual information loss between the domain-invariant features and the domain-specific features, where the task loss characterizes the gap between the result obtained by performing the task with the domain-invariant features and the task label, and the mutual information loss represents the difference between the domain-invariant features and the domain-specific features; and training the neural network according to the task loss and the mutual information loss.
By training the neural network according to the task loss and the mutual information loss, not only can the decomposed domain-invariant features correspond to instances more accurately, but the mutual information loss between the domain-invariant features and the domain-specific features can also be reduced during training, which promotes complete decoupling of the domain-invariant features and the domain-specific features and further reduces the influence of the domain-specific features on the domain-invariant features.
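A minimal PyTorch-style sketch of how such a training step might combine a task loss with a mutual information penalty is given below. The module and variable names (MINEEstimator, decoupler, task_head, z_di, z_ds, lam) are illustrative assumptions rather than the names used in this application, and the mutual information term is approximated here with a MINE-style Donsker-Varadhan lower bound, which is only one possible estimator; in full MINE the statistics network is additionally trained to maximize the bound, a detail omitted here for brevity.

    import math
    import torch
    import torch.nn as nn

    class MINEEstimator(nn.Module):
        # Statistics network giving a Donsker-Varadhan lower bound on I(a; b).
        def __init__(self, dim_a, dim_b, hidden=128):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim_a + dim_b, hidden),
                                     nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, a, b):
            joint = self.net(torch.cat([a, b], dim=1)).mean()
            b_perm = b[torch.randperm(b.size(0))]            # break the pairing
            marginal = torch.logsumexp(self.net(torch.cat([a, b_perm], dim=1)),
                                       dim=0) - math.log(b.size(0))
            return (joint - marginal).squeeze()              # estimated I(a; b)

    def training_step(decoupler, task_head, mi_estimator, optimizer, x, y, lam=0.1):
        # decoupler(x) is assumed to return (domain_invariant, domain_specific).
        z_di, z_ds = decoupler(x)
        task_loss = nn.functional.cross_entropy(task_head(z_di), y)
        mi_loss = mi_estimator(z_di, z_ds)   # minimized so the two parts share little information
        loss = task_loss + lam * mi_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return task_loss.item(), mi_loss.item()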
With reference to the first aspect, in a possible implementation, the method further includes: performing domain classification by using the domain-specific features to obtain a domain classification loss, where the training of the neural network according to the task loss and the mutual information loss includes: training the neural network according to the task loss, the mutual information loss, and the domain classification loss.
Introducing the domain classification loss helps to extract the domain-invariant features from the features of the training data.
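One common way to wire a domain classifier into such training is with a gradient reversal layer, which the data processing system described later in this application also uses. The sketch below is an illustrative PyTorch implementation of that mechanism, not the exact implementation of this application; the network sizes and the lambda coefficient are arbitrary assumptions.

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        # Identity in the forward pass; multiplies the gradient by -lambda in the
        # backward pass, so upstream layers are trained adversarially against
        # the domain classifier.
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lam * grad_output, None

    class DomainClassifier(nn.Module):
        def __init__(self, feat_dim, n_domains=2, lam=1.0):
            super().__init__()
            self.lam = lam
            self.head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                      nn.Linear(128, n_domains))

        def forward(self, features, reverse=False):
            if reverse:
                features = GradReverse.apply(features, self.lam)
            return self.head(features)

    # Example use: classify the domain from the domain-specific features and
    # obtain the domain classification loss.
    # logits = domain_classifier(z_ds, reverse=True)
    # domain_loss = nn.functional.cross_entropy(logits, domain_labels)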
With reference to the first aspect, in a possible implementation, the decomposing of domain-invariant features and domain-specific features from the training data includes: extracting initial features from the training data; and decomposing the initial features into the domain-invariant features and the domain-specific features, where the method further includes: training the neural network to reduce the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
By reducing the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features, the decoupled domain-invariant features and domain-specific features can contain all the feature information of the training data, which improves the completeness and rationality of the feature decoupling.
With reference to the first aspect, in a possible implementation, before the training of the neural network to reduce the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features, the method further includes: reconstructing the initial features by using the domain-invariant features and the domain-specific features to obtain reconstructed features; and comparing the initial features with the reconstructed features to determine the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
Using this reconstruction loss to train the neural network enables the decoupled domain-invariant features and domain-specific features to contain all the feature information of the training data, which improves the completeness and rationality of the feature decoupling.
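A minimal sketch of such a reconstruction constraint is shown below. The decoder architecture, the use of feature concatenation, and the L2 distance are illustrative assumptions rather than details taken from this application.

    import torch
    import torch.nn as nn

    class FeatureReconstructor(nn.Module):
        # Rebuilds the initial feature from the concatenation of the
        # domain-invariant part and the domain-specific part.
        def __init__(self, feat_dim):
            super().__init__()
            self.decoder = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim),
                                         nn.ReLU(), nn.Linear(feat_dim, feat_dim))

        def forward(self, z_di, z_ds):
            return self.decoder(torch.cat([z_di, z_ds], dim=1))

    def reconstruction_loss(reconstructor, initial_feat, z_di, z_ds):
        recon = reconstructor(z_di, z_ds)
        # Penalize any information in the initial feature that is not captured
        # jointly by the two decoupled parts.
        return nn.functional.mse_loss(recon, initial_feat)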
With reference to the first aspect, in a possible implementation, the method further includes: reconstructing initial features by using the domain-invariant features and the domain-specific features to obtain reconstructed features, where the domain-invariant features and the domain-specific features are features decomposed from the initial features; and comparing the initial features with the reconstructed features to obtain a reconstruction loss, where the reconstruction loss characterizes the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features. The training of the neural network according to the task loss and the mutual information loss includes: performing a first stage of training of the neural network according to the task loss; and performing a second stage of training of the neural network according to the mutual information loss, where the method further includes: performing a third stage of training of the neural network according to the reconstruction loss.
Performing the training process of the neural network in stages simplifies the amount of training in each stage and speeds up the convergence of the parameters of the neural network.
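An illustrative way to organize such staged training is sketched below. The stage ordering within each iteration, the loss weights, and the helper methods decouple, task_loss, mi_loss, and reconstruction_loss on the model are assumptions made only for illustration.

    def train_in_stages(model, optimizer, loader, epochs=10, lam_mi=0.1, lam_rec=0.1):
        # Each iteration runs three short stages on the same batch:
        # stage 1 uses the task loss, stage 2 the mutual information loss,
        # and stage 3 the reconstruction loss.
        for _ in range(epochs):
            for x, y in loader:
                for stage in ("task", "mi", "rec"):
                    z_di, z_ds, initial_feat = model.decouple(x)
                    if stage == "task":
                        loss = model.task_loss(z_di, y)
                    elif stage == "mi":
                        loss = lam_mi * model.mi_loss(z_di, z_ds)
                    else:
                        loss = lam_rec * model.reconstruction_loss(initial_feat, z_di, z_ds)
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()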
With reference to the first aspect, in a possible implementation, the neural network includes a first decoupler and a second decoupler, and the decomposing of domain-invariant features and domain-specific features from the training data includes: extracting a first feature of the training data from the training data; extracting a preliminary domain-invariant feature and a preliminary domain-specific feature from the first feature by using the first decoupler; fusing the preliminary domain-invariant feature with the first feature to obtain a second feature; extracting a third feature of the training data from the second feature; and extracting the domain-invariant features and the domain-specific features from the third feature by using the second decoupler.
By first obtaining the first feature, decoupling a preliminary domain-invariant feature from it with the first decoupler, and fusing the preliminary domain-invariant feature with the first feature to obtain the second feature, the domain-invariant feature information is enhanced at the level of the first feature. The second feature is then used to decouple the domain-invariant features with the second decoupler, so that the decoupling accuracy of the domain-invariant features is further improved, which gives the trained neural network stronger task execution performance and better domain adaptation capability.
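The two-decoupler pipeline can be sketched as follows. All sub-modules are placeholders, and the additive fusion of the first feature with the preliminary domain-invariant feature is an assumption; the application itself does not specify the fusion operation here.

    import torch.nn as nn

    class TwoStageDecouplingNet(nn.Module):
        # Illustrative two-decoupler pipeline: extract -> decouple -> fuse ->
        # extract again -> decouple again.
        def __init__(self, backbone1, decoupler1, backbone2, decoupler2):
            super().__init__()
            self.backbone1, self.decoupler1 = backbone1, decoupler1
            self.backbone2, self.decoupler2 = backbone2, decoupler2

        def forward(self, x):
            feat1 = self.backbone1(x)                # first feature
            pre_di, pre_ds = self.decoupler1(feat1)  # preliminary decoupling
            feat2 = feat1 + pre_di                   # fusion (addition assumed)
            feat3 = self.backbone2(feat2)            # third feature
            z_di, z_ds = self.decoupler2(feat3)      # final decoupling
            return z_di, z_ds, feat3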
With reference to the first aspect, in a possible implementation, the method further includes: training the neural network to reduce the difference between the information contained in the third feature and the information jointly contained in the domain-invariant features and the domain-specific features.
By reducing the difference between the information contained in the third feature and the information jointly contained in the domain-invariant features and the domain-specific features, the decoupled domain-invariant features and domain-specific features are further encouraged to contain all the feature information of the training data, which improves the completeness and rationality of the feature decoupling.
With reference to the first aspect, in a possible implementation, the neural network is used for domain adaptation learning, and the training data includes image data of different domains.
By extracting the domain-invariant features and the domain-specific features of image data of different domains, and training the neural network based on the task loss obtained by performing the task with the domain-invariant features, the domain-invariant features can be decoupled from the domain-specific features. Because the domain-invariant features are used to perform the task, the neural network obtained by the training method of this application can adapt itself, through domain adaptation learning, to processing tasks on images of multiple different domains, thereby realizing adaptive processing of image data of different domains.
According to a second aspect, a data acquisition method is provided, including: obtaining data of a source domain and/or data of a target domain; inputting the data of the source domain and/or the data of the target domain into a neural network for training, to obtain gradient information of a loss function; and perturbing the data of the source domain and/or the data of the target domain according to the gradient information to obtain data of an intermediate domain, where the source domain and the target domain are two domains whose data characteristics differ, and the difference in data characteristics between the intermediate domain and either of the source domain and the target domain is smaller than the difference in data characteristics between the source domain and the target domain.
The introduction of the direction information between the source domain and the target domain makes the perturbation of the training data more targeted. The intermediate-domain training data obtained through the perturbation can fill the "domain gap" between the source domain and the target domain, alleviating the problem that the distributions of the source-domain training data and the target-domain training data differ greatly.
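A simplified sketch of gradient-based perturbation of this kind is given below. The descent direction (reducing the loss of a model trained on the other domain), the signed-gradient step, and the step size eps are illustrative assumptions, not values or choices stated in this application.

    import torch
    import torch.nn as nn

    def perturb_towards_other_domain(other_domain_model, x, y, eps=0.03):
        # Take labeled samples of one domain and a model trained on the other
        # domain, and nudge the samples along the gradient of that model's loss
        # so that the perturbed samples lie between the two domains.
        x = x.clone().detach().requires_grad_(True)
        loss = nn.functional.cross_entropy(other_domain_model(x), y)
        grad = torch.autograd.grad(loss, x)[0]
        x_intermediate = (x - eps * grad.sign()).detach()   # one small signed step
        return x_intermediate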
With reference to the second aspect, in a possible implementation, the inputting of the data of the source domain and/or the data of the target domain into a neural network for training to obtain gradient information of a loss function includes: inputting labeled data of the source domain into a first neural network for training to obtain first gradient information, where the first neural network is generated by training based on labeled data of the target domain.
Because the first neural network is generated by training on labeled data of the target domain, the first gradient information obtained after the labeled data of the source domain is input into the first neural network is a good measure of the direction from the source domain to the target domain.
With reference to the second aspect, in a possible implementation, the inputting of the data of the source domain and/or the data of the target domain into a neural network for training to obtain gradient information of a loss function includes: inputting unlabeled data of the target domain into a second neural network and training it in a virtual adversarial training manner to obtain second gradient information, where the second neural network is generated by training based on labeled data.
Because the second neural network is generated by training on labeled data of the source domain, the second gradient information obtained through virtual adversarial training after the unlabeled data of the target domain is input into the second neural network is a good measure of the direction from the target domain to the source domain.
According to a third aspect, a neural network training apparatus is provided, including modules configured to perform the method of the first aspect.
According to a fourth aspect, a data acquisition apparatus is provided, including modules configured to perform the method described in the second aspect.
According to a fifth aspect, a neural network training apparatus is provided, including: a memory, configured to store a program; and a processor, configured to execute the program stored in the memory, where when the program stored in the memory is executed, the processor is configured to perform the method described in the first aspect or the second aspect.
According to a sixth aspect, a neural network is provided, including: a first feature extraction layer, configured to extract a first feature based on input data; a first domain-invariant feature decoupling layer, configured to extract a first domain-invariant feature based on the first feature; a feature fusion layer, configured to fuse the first feature and the first domain-invariant feature to obtain a second feature; a second feature extraction layer, configured to extract a third feature based on the second feature; and a second domain-invariant feature decoupling layer, configured to extract a second domain-invariant feature based on the third feature, where the first domain-invariant feature and the second domain-invariant feature are features irrelevant to the domain to which the input data belongs, and the first domain-specific feature and the second domain-specific feature are features that characterize the domain to which the input data belongs.
According to a seventh aspect, a data processing system is provided, including: a data acquisition network, configured to obtain gradient information of a loss function based on first data and perturb the first data according to the gradient information to obtain second data; and a feature decoupling network, configured to train a neural network by using training data that includes the second data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data, where the domain-specific features are features that characterize the domain to which the training data belongs, and the domain-invariant features are features irrelevant to the domain to which the training data belongs.
With reference to the seventh aspect, in a possible implementation, the feature decoupling network includes: a first feature extraction layer, configured to extract a first feature based on the training data; a first domain-invariant feature extraction layer, configured to extract a first domain-invariant feature based on the first feature; a first domain-specific feature extraction layer, configured to extract a first domain-specific feature based on the first feature; a first mutual information loss acquisition layer, configured to obtain a first mutual information loss based on the first domain-invariant feature and the first domain-specific feature; a feature fusion layer, configured to fuse the first feature and the first domain-invariant feature to obtain a second feature; a second feature extraction layer, configured to extract a third feature based on the second feature; a second domain-invariant feature decoupling layer, configured to extract a second domain-invariant feature based on the third feature; a second domain-specific feature extraction layer, configured to extract a second domain-specific feature based on the third feature; a second mutual information loss acquisition layer, configured to obtain a second mutual information loss based on the second domain-invariant feature and the second domain-specific feature; and a task loss acquisition layer, configured to perform a task by using the second domain-invariant feature to obtain a task loss.
With reference to the seventh aspect, in a possible implementation, the data processing system further includes: a first domain classifier, configured to perform a classification task based on the first domain-specific feature to obtain a first classification loss, and a first gradient reversal layer, configured to invert the gradient information of the first classification loss; and/or a second domain classifier, configured to perform a classification task based on the second domain-specific feature to obtain a second classification loss, and a second gradient reversal layer, configured to invert the gradient information of the second classification loss.
With reference to the seventh aspect, in a possible implementation, the data processing system further includes a reconstruction loss acquisition layer, configured to reconstruct the third feature by using the second domain-invariant feature and the second domain-specific feature to obtain a reconstructed feature, and to compare the third feature with the reconstructed feature to obtain a reconstruction loss.
With reference to the seventh aspect, in a possible implementation, the first data includes data of a source domain and/or data of a target domain, and the data acquisition network includes: a first training network generated by training based on labeled data of the target domain; and/or a second training network generated by training based on labeled data.
According to an eighth aspect, a security device is provided, including the neural network described in the sixth aspect.
According to a ninth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores computer program instructions, and when the computer program instructions are run by a processor, the processor is caused to perform the method described in the first aspect or the second aspect.
According to a tenth aspect, a computer program product is provided, including computer program instructions that, when run by a processor, cause the processor to perform the method described in the first aspect or the second aspect.
According to an eleventh aspect, a chip is provided. The chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to perform the method described in the first aspect or the second aspect.
Optionally, as an implementation, the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory. When the instructions are executed, the processor is configured to perform the method described in the first aspect or the second aspect.
Description of the drawings
Fig. 1 is a schematic diagram of an artificial intelligence main framework.
Fig. 2 is a system architecture provided by an embodiment of this application.
Fig. 3 is a diagram of a chip hardware structure provided by an embodiment of this application.
Fig. 4 is a system architecture provided by an embodiment of this application.
Fig. 5 is a schematic flowchart of a neural network training method provided by an embodiment of this application.
Fig. 6 is a schematic flowchart of a neural network training method provided by an embodiment of this application.
Fig. 7 is a schematic structural diagram of a neural network provided by an embodiment of this application.
Fig. 8 is a schematic diagram of the feature decoupling principle provided by an embodiment of this application.
Fig. 9 is a schematic structural diagram of a neural network provided by another embodiment of this application.
Fig. 10 is a schematic structural diagram of a neural network provided by an embodiment of this application.
Fig. 11 is a schematic flowchart of extracting domain-invariant features and domain-specific features based on the neural network architecture shown in Fig. 10.
Fig. 12 is a schematic diagram of the principle of a training process provided by an embodiment of this application.
Fig. 13 is a schematic structural diagram of a neural network provided by an embodiment of this application.
Fig. 14 is a schematic flowchart of obtaining data of an intermediate domain provided by an embodiment of this application.
Fig. 15 is a schematic structural diagram of a neural network provided by an embodiment of this application.
Fig. 16 is a schematic diagram of bidirectional adversarial training provided by another embodiment of this application.
Fig. 17 is a schematic structural diagram of a data processing system provided by an embodiment of this application.
Fig. 18 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of this application.
Fig. 19 is a schematic structural diagram of a data acquisition apparatus provided by another embodiment of this application.
Fig. 20 is a schematic diagram of the hardware structure of a neural network training apparatus provided by an embodiment of this application.
Detailed description
The technical solutions in this application are described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an artificial intelligence main framework. This main framework describes the overall workflow of an artificial intelligence system and is applicable to general requirements in the field of artificial intelligence.
The above artificial intelligence framework is described below from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
The "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a condensation process of "data - information - knowledge - wisdom".
The "IT value chain", from the underlying infrastructure of artificial intelligence and information (technologies for providing and processing information) up to the industrial ecology of the system, reflects the value that artificial intelligence brings to the information technology industry.
(1) Infrastructure
The infrastructure provides computing capability support for the artificial intelligence system, implements communication with the external world, and provides support through a basic platform. The infrastructure communicates with the outside through sensors; the computing capability is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); and the basic platform includes related platform assurance and support such as a distributed computing framework and a network, and may include cloud storage and computing, an interconnection network, and the like. For example, the sensors communicate with the outside to obtain data, and the data is provided to intelligent chips in the distributed computing system provided by the basic platform for computation.
(2) Data
Data at the layer above the infrastructure is used to represent the data sources in the field of artificial intelligence. The data involves graphics, images, speech, and text, and also involves Internet-of-Things data of conventional devices, including service data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and the like.
Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to perform machine thinking and problem solving according to a reasoning control strategy; typical functions are searching and matching.
Decision-making refers to the process of making decisions after reasoning on intelligent information, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data processing mentioned above is performed on the data, some general capabilities can be further formed based on the data processing results, for example, an algorithm or a general system such as translation, text analysis, computer vision processing, speech recognition, or image recognition.
(5) Intelligent products and industry applications
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields, and are an encapsulation of the overall artificial intelligence solution, productizing intelligent information decision-making and realizing its application. The application fields mainly include intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, autonomous driving, smart cities, intelligent terminals, and the like.
As described above, for a domain adaptation learning task, because of the distribution difference between the source domain and the target domain, a model that performs well on the source domain suffers limited performance if it is directly applied in the target-domain scenario. When a neural network model is trained for domain adaptation learning, a distribution alignment strategy is used, that is, the data of the source domain and the data of the target domain are aligned at the level of feature representation. Because this distribution alignment is performed only at the level of the overall feature representation, the domain adaptation learning task is inevitably affected by the specific features of different domains, and therefore the trained neural network model still suffers from poor migration performance.
In view of the foregoing technical problem, this application proposes a way of training a neural network model that can, during training, decouple domain-invariant features (domain-invariant features can be understood as instance-level features that are independent of the domain) from the features of the data, so that the domain adaptation learning task is not affected by the specific features of different domains, thereby improving the migration performance of the neural network model.
It should be understood that the neural network model trained in the embodiments of this application can be applied in a variety of application scenarios, and depending on the specific application scenario, the neural network model may also have a different structure. For example, in image classification application scenarios (such as vehicle recognition and face recognition), the neural network model may be a convolutional neural network model, whereas in regression prediction application scenarios (such as energy consumption prediction for industrial production lines, weather prediction, and landslide prediction), the neural network model may include a multilayer perceptron architecture. The embodiments of this application do not limit the specific application scenario or structure of the trained neural network model.
Because the embodiments of this application involve applications of domain adaptation learning and neural networks, for ease of understanding, related terms and concepts such as neural networks that may be involved in the embodiments of this application are briefly introduced below.
(1) Domain adaptation learning
Domain adaptation learning is a machine learning approach for solving the problem of inconsistent probability distributions of training samples and test samples. It aims to overcome the difference between the probability distribution of source-domain samples and the probability distribution of target-domain samples during training, so as to accomplish the learning task on the target domain.
(2) Neural network
A neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes x_s and an intercept of 1 as inputs. The output of the arithmetic unit can be expressed by the following formula (1):
h_{W,b}(x) = f(W^T x) = f( \sum_{s=1}^{n} W_s x_s + b )    (1)
In formula (1), s = 1, 2, ..., n, where n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by connecting many of the foregoing single neural units, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be a region composed of several neural units.
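As a quick illustration of formula (1), the following sketch computes the output of a single neural unit with a sigmoid activation; the concrete numbers are arbitrary examples.

    import numpy as np

    def neural_unit(x, w, b):
        # Formula (1): f(sum_s W_s * x_s + b) with a sigmoid activation f.
        z = np.dot(w, x) + b
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.2, 3.0])   # inputs x_s
    w = np.array([0.8, 0.1, -0.4])   # weights W_s
    print(neural_unit(x, w, b=0.2))  # scalar output of the unit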
(3) Deep neural network
A deep neural network (DNN), also called a multilayer neural network, can be understood as a neural network with many hidden layers; there is no particular metric for "many" here. Based on the positions of the layers, the neural network inside the DNN can be divided into three types: an input layer, hidden layers, and an output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to any neuron in the (i+1)-th layer. Although the DNN looks complicated, the work of each layer can be expressed by the linear relation described in the following formula (2):
y = α(W x + b)    (2)
In formula (2), x denotes the input vector, y denotes the output vector, b denotes the offset vector, W denotes the weight matrix (also called coefficients), and α(·) denotes the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has many layers, there are also many coefficients W and offset vectors b. These parameters are defined in the DNN as follows.
Take the coefficient W as an example. Suppose that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}, where the superscript 3 represents the layer in which the coefficient W is located, and the subscript 24 corresponds to the output index 2 of the third layer and the input index 4 of the second layer. In summary, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as W^L_{jk}. It should be noted that the input layer has no W parameters. In a deep neural network, more hidden layers enable the network to better characterize complex situations in the real world. In theory, a model with more parameters has higher complexity and a larger "capacity", which means that it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
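As a concrete check of formula (2) and the W^L_{jk} indexing convention, the following sketch runs a small layered forward pass; the layer sizes and the ReLU activation are arbitrary illustrative choices.

    import numpy as np

    def dnn_forward(x, weights, biases, act=lambda z: np.maximum(z, 0.0)):
        # Applies y = act(W x + b) for each layer in turn (formula (2)).
        for W, b in zip(weights, biases):
            x = act(W @ x + b)
        return x

    # Three-layer example with 4 -> 3 -> 2 neurons; entry W[j, k] plays the role
    # of the coefficient from neuron k of the previous layer to neuron j.
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(3, 4)), rng.normal(size=(2, 3))]
    biases = [np.zeros(3), np.zeros(2)]
    print(dnn_forward(rng.normal(size=4), weights, biases))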
(4) Loss function
In the process of training a deep neural network, because it is hoped that the output of the deep neural network is as close as possible to the value that is actually to be predicted, the predicted value of the current network can be compared with the actually desired target value, and the weight vector of each layer of the neural network is then updated according to the difference between the two (certainly, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to lower the prediction, and adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, it is necessary to predefine "how to compare the difference between the predicted value and the target value". This is the loss function or objective function, which is an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
(5) Backpropagation algorithm
A convolutional neural network may use an error backpropagation (BP) algorithm to correct the values of the parameters in the initial neural network during training, so that the reconstruction error loss of the neural network becomes smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters of the initial neural network are updated by backpropagating the error loss information, so that the error loss converges. The backpropagation algorithm is a backpropagation movement dominated by the error loss, aiming to obtain optimal parameters of the neural network, for example, the weight matrices.
(6) Adversarial samples
An adversarial sample is an input sample formed by adding a perturbation to a data set, which causes the neural network to give a wrong output with high confidence. Because the ultimate goal of the neural network is to obtain correct output results, adversarial samples are used to train the neural network with this adversarial training strategy, so that the neural network adapts to the perturbation and is thereby robust to adversarial samples.
(7) Virtual adversarial training
Virtual adversarial training is an adversarial training method that does not rely on training labels. Virtual adversarial training generates a perturbation based on a first output of the neural network, such that when the generated adversarial sample is input into the neural network, the obtained second output differs from the previous first output, thereby implementing the adversarial training strategy.
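A simplified PyTorch-style sketch of how a virtual adversarial perturbation can be computed without labels is given below. The single power-iteration step, the KL divergence measure, and the hyperparameters xi and eps are common choices in the virtual adversarial training literature and are assumptions here, not details of this application.

    import torch
    import torch.nn.functional as F

    def virtual_adversarial_perturbation(model, x, xi=1e-6, eps=1.0):
        # Find a small input perturbation that changes the model's output
        # distribution the most, without using any labels.
        with torch.no_grad():
            p = F.softmax(model(x), dim=1)          # first output
        d = torch.randn_like(x)
        shape = (-1,) + (1,) * (x.dim() - 1)
        d = xi * d / (d.flatten(1).norm(dim=1).view(shape) + 1e-12)
        d.requires_grad_(True)
        p_hat = F.log_softmax(model(x + d), dim=1)
        adv_dist = F.kl_div(p_hat, p, reduction="batchmean")
        grad = torch.autograd.grad(adv_dist, d)[0]  # gradient w.r.t. the noise
        r_adv = eps * grad / (grad.flatten(1).norm(dim=1).view(shape) + 1e-12)
        return r_adv.detach()   # second output is then model(x + r_adv)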
The system architecture provided by an embodiment of this application is described in detail below with reference to Fig. 2.
Fig. 2 shows a system architecture 200 provided by an embodiment of this application. As shown in Fig. 2, the system architecture 200 includes an execution device 210, a training device 220, a database 230, a client device 240, a data storage system 250, and a data collection system 260. The execution device 210 includes a calculation module 211, an I/O interface 212, a preprocessing module 213, and a preprocessing module 214. The calculation module 211 may include a target model/rule 201, and the preprocessing module 213 and the preprocessing module 214 are optional.
The data collection device 260 is configured to collect training data (or sample data for training) and store it in the database 230. The training data in this embodiment of this application may include training data of different domains, such as training data of the source domain and training data of the target domain. The training device 220 trains the target model/rule 201 based on the training data maintained in the database 230, so that the target model/rule 201 has the function of decoupling domain-invariant features and domain-specific features from input data, and of using the domain-invariant features to complete the tasks required by actual application scenarios, such as target classification, detection, recognition, and segmentation.
The target model/rule 201 may be a neural network model. The work of each layer in the neural network model can be described by the mathematical expression y = a(W·x + b). From a physical perspective, the work of each layer in the neural network model can be understood as completing a transformation from the input space to the output space (that is, from the row space to the column space of a matrix) through five operations on the input space (the set of input vectors). The five operations are: 1. raising/reducing the dimension; 2. enlargement/reduction; 3. rotation; 4. translation; 5. "bending". Operations 1, 2, and 3 are performed by W·x, operation 4 is performed by +b, and operation 5 is implemented by a(). The word "space" is used here because the object to be classified is not a single thing but a class of things, and space refers to the set of all individuals of this class of things. W is a weight vector, and each value in the vector represents the weight value of one neuron in this layer of the neural network. The vector W determines the spatial transformation from the input space to the output space described above, that is, the weight W of each layer controls how the space is transformed. The purpose of training the neural network model is to finally obtain the weight matrices of all layers of the trained neural network (the weight matrices formed by the vectors W of many layers). Therefore, the training process of the neural network is essentially learning how to control the spatial transformation, and more specifically, learning the weight matrices.
因为希望神经网络模型的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为神经网络模型中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断的调整,直到神经网络能够预测出真正想要的目标值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数 (objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么神经网络模型的训练就变成了尽可能缩小这个loss的过程。Because it is hoped that the output of the neural network model is as close as possible to the value that you really want to predict, you can compare the current network's predicted value with the really desired target value, and then update each layer of neural network according to the difference between the two. The weight vector of the network (of course, there is usually an initialization process before the first update, which is to pre-configure parameters for each layer in the neural network model). For example, if the predicted value of the network is high, adjust the weight vector to make it The prediction is lower and keep adjusting until the neural network can predict the target value you really want. Therefore, it is necessary to predefine "how to compare the difference between the predicted value and the target value". This is the loss function or objective function, which is used to measure the difference between the predicted value and the target value. Important equation. Among them, taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference, and the training of the neural network model becomes a process of reducing this loss as much as possible.
训练设备220得到的目标模型/规则可以应用在不同的系统或设备中。在附图2中,执行设备210配置有I/O接口212,与外部设备进行数据交互,“用户”可以通过客户设备240向I/O接口212输入数据。The target model/rule obtained by the training device 220 can be applied to different systems or devices. In FIG. 2, the execution device 210 is configured with an I/O interface 212 to perform data interaction with external devices. The "user" can input data to the I/O interface 212 through the client device 240.
执行设备210可以调用数据存储系统250中的数据、代码等,也可以将数据、指令等存入数据存储系统250中。The execution device 210 can call data, codes, etc. in the data storage system 250, and can also store data, instructions, etc. in the data storage system 250.
计算模块211使用目标模型/规则201对输入的数据进行处理。在实际应用场景的工作过程中,计算模块211的具体输入数据与具体的应用场景相关。例如,在人脸识别的应用场景,计算模块211的输入数据就可能是包括人脸图像的图像数据。由于计算模块211是使用目标模型/规则201对输入的数据进行处理,因此计算模块其实也是基于输入的数据获取实例层面的特征,然后将该实例层面的特征用于执行具体的任务。The calculation module 211 uses the target model/rule 201 to process the input data. In the working process of the actual application scenario, the specific input data of the calculation module 211 is related to the specific application scenario. For example, in an application scenario of face recognition, the input data of the calculation module 211 may be image data including a face image. Since the calculation module 211 uses the target model/rule 201 to process the input data, the calculation module actually obtains instance-level features based on the input data, and then uses the instance-level features to perform specific tasks.
在本申请一实施例中,该系统架构200还可能包括一些与计算模块211连接的管理功能模块,以基于计算模块211的输出结果完成更灵活的细分任务。例如,当“用户”可以通过客户设备240向I/O接口212输入的数据是交通场景的图像数据时,图2所示的关联功能模块213就可配置为根据计算模块211所输出车辆对象的特征信息进一步识别车辆的车牌号和型号等信息;而关联功能模块214可配置为根据计算模块211所输出的行人的特征进一步识别行人的性别、身高和年龄等信息。然而,本申请对该系统架构是否包括这些关联功能模块,以及这些关联功能模块具体所执行的功能并不做限定。In an embodiment of the present application, the system architecture 200 may also include some management function modules connected to the calculation module 211 to complete more flexible subdivision tasks based on the output result of the calculation module 211. For example, when the data that the "user" can input to the I/O interface 212 through the client device 240 is image data of a traffic scene, the associated function module 213 shown in FIG. The characteristic information further identifies information such as the license plate number and model of the vehicle; and the correlation function module 214 may be configured to further identify the gender, height, and age of the pedestrian based on the characteristics of the pedestrian output by the calculation module 211. However, this application does not limit whether the system architecture includes these associated function modules, and the specific functions performed by these associated function modules.
最后,I/O接口212将处理结果返回给客户设备240,提供给用户。Finally, the I/O interface 212 returns the processing result to the client device 240 and provides it to the user.
更深层地,训练设备220可以针对不同的目标,基于不同的数据生成相应的目标模型/规则201,以给用户提供更佳的结果。At a deeper level, the training device 220 can generate corresponding target models/rules 201 based on different data for different targets, so as to provide users with better results.
在附图2中所示情况下,用户可以手动指定输入执行设备210中的数据,例如,在I/O接口212提供的界面中操作。另一种情况下,客户设备240可以自动地向I/O接口212输入数据并获得结果,如果客户设备240自动输入数据需要获得用户的授权,用户可以在客户设备240中设置相应权限。用户可以在客户设备240查看执行设备210输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备240也可以作为数据采集端将采集到样本数据存入数据库230。In the case shown in FIG. 2, the user can manually specify to input data in the execution device 210, for example, to operate in the interface provided by the I/O interface 212. In another case, the client device 240 can automatically input data to the I/O interface 212 and obtain the result. If the client device 240 automatically inputs data and needs the user's authorization, the user can set the corresponding authority in the client device 240. The user can view the result output by the execution device 210 on the client device 240, and the specific presentation form may be a specific manner such as display, sound, and action. The client device 240 can also serve as a data collection terminal to store the collected sample data in the database 230.
值得注意的是,附图2仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在附图2中,数据存储系统250相对执行设备210是外部存储器,在其它情况下,也可以将数据存储系统250置于执行设备210中。It is worth noting that Figure 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation. For example, in Figure 2 The data storage system 250 is an external memory relative to the execution device 210. In other cases, the data storage system 250 may also be placed in the execution device 210.
下面结合图3介绍本申请实施例提供的一种芯片硬件结构。The following describes a chip hardware structure provided by an embodiment of the present application in conjunction with FIG. 3.
图3为本申请一实施例提供的芯片硬件结构图。如图3所示,该芯片包括神经网络处理器(neural-network processing unit,NPU)300。该芯片可以被设置在如图2所示的执行设备210中,用以完成计算模块211的计算工作。该芯片也可以被设置在如图2所示的训练设备220中,用以完成训练设备220的训练工作并输出目标模型/规则201。此外,下述图4、图9和图11所示的神经网络的训练方法均可在如图3所示的芯片中得以实现。FIG. 3 is a diagram of the chip hardware structure provided by an embodiment of the application. As shown in FIG. 3, the chip includes a neural-network processing unit (NPU) 300. The chip can be set in the execution device 210 as shown in FIG. 2 to complete the calculation work of the calculation module 211. The chip can also be set in the training device 220 shown in FIG. 2 to complete the training work of the training device 220 and output the target model/rule 201. In addition, the following neural network training methods shown in FIG. 4, FIG. 9 and FIG. 11 can all be implemented in the chip shown in FIG. 3.
神经网络处理器300作为协处理器挂载到主中央处理单元(host central processing unit,host CPU)上,由主CPU分配任务。神经网络处理器300的核心部分为运算电路303,控制器304控制运算电路303提取存储器(权重存储器302或输入存储器301)中的数据并进行运算。The neural network processor 300 is mounted on a main central processing unit (host central processing unit, host CPU) as a coprocessor, and the main CPU distributes tasks. The core part of the neural network processor 300 is the arithmetic circuit 303, and the controller 304 controls the arithmetic circuit 303 to extract data from the memory (weight memory 302 or input memory 301) and perform calculations.
在一些实现中,运算电路303内部包括多个处理单元(process engine,PE)。在一些实现中,运算电路303是二维脉动阵列。运算电路303还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路303是通用的矩阵处理器。In some implementations, the arithmetic circuit 303 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 303 is a two-dimensional systolic array. The arithmetic circuit 303 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 303 is a general-purpose matrix processor.
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路303从权重存储器302中取权重矩阵B相应的数据,并缓存在运算电路303中每一个PE上。运算电路303从输入存储器301中取输入矩阵A与权重矩阵B进行矩阵运算,以得到矩阵的部分结果或最终结果,并保存在累加器(accumulator)308中。For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit 303 fetches the data corresponding to the weight matrix B from the weight memory 302 and caches it on each PE in the arithmetic circuit 303. The arithmetic circuit 303 fetches the input matrix A and the weight matrix B from the input memory 301 to perform matrix operations to obtain partial results or final results of the matrix, and store them in an accumulator 308.
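As a purely illustrative model of the accumulation described above, the following Python/NumPy sketch computes the product of an input matrix A and a weight matrix B by summing partial products into an accumulator buffer; it is only a schematic of the data flow, not a description of the actual arithmetic circuit 303 or accumulator 308.

import numpy as np

A = np.random.randn(4, 8)    # input matrix A
B = np.random.randn(8, 16)   # weight matrix B
acc = np.zeros((4, 16))      # accumulator buffer, playing the role of accumulator 308

# Accumulate one rank-1 partial product per step of the inner dimension.
for k in range(A.shape[1]):
    acc += np.outer(A[:, k], B[k, :])

assert np.allclose(acc, A @ B)   # the accumulated partial results equal the full matrix product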
向量计算单元307可以对运算电路303的输出做进一步处理，如向量乘，向量加，指数运算，对数运算，大小比较等等。例如，向量计算单元307可以用于神经网络中非卷积/非FC层的网络计算，如池化(pooling)，批归一化(batch normalization)，局部响应归一化(local response normalization)等。The vector calculation unit 307 can perform further processing on the output of the arithmetic circuit 303, such as vector multiplication, vector addition, exponential operations, logarithmic operations, magnitude comparison, and so on. For example, the vector calculation unit 307 can be used for network calculations in the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, and local response normalization.
在一些实现中,向量计算单元307能将经处理的输出的向量存储到统一存储器306。例如,向量计算单元307可以将非线性函数应用到运算电路303的输出,例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元307生成归一化的值、合并值,或二者均有。在一些实现中,处理过的输出的向量能够用作到运算电路303的激活输入,例如用于在神经网络中的后续层中的使用。In some implementations, the vector calculation unit 307 can store the processed output vector to the unified memory 306. For example, the vector calculation unit 307 may apply a nonlinear function to the output of the arithmetic circuit 303, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit 307 generates a normalized value, a combined value, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 303, for example for use in a subsequent layer in a neural network.
统一存储器306用于存放输入数据以及输出数据。The unified memory 306 is used to store input data and output data.
存储单元访问控制器(direct memory access controller,DMAC)305用于将外部存储器中的输入数据搬运到输入存储器301和/或统一存储器306、将外部存储器中的权重数据存入权重存储器302，以及将统一存储器306中的数据存入外部存储器。A direct memory access controller (DMAC) 305 is used to transfer input data in the external memory to the input memory 301 and/or the unified memory 306, to store weight data in the external memory into the weight memory 302, and to store data in the unified memory 306 into the external memory.
总线接口单元(bus interface unit,BIU)310,用于通过总线实现主CPU、DMAC和取指存储器309之间进行交互。The bus interface unit (BIU) 310 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 309 through the bus.
与控制器304连接的取指存储器(instruction fetch buffer)309,用于存储控制器304使用的指令。An instruction fetch buffer 309 connected to the controller 304 is used to store instructions used by the controller 304.
控制器304,用于调用取指存储器309中缓存的指令,实现控制该运算加速器的工作过程。The controller 304 is used to call the instructions cached in the instruction fetch memory 309 to control the working process of the computing accelerator.
一般地,统一存储器306、输入存储器301、权重存储器302以及取指存储器309均为片上(on-chip)存储器,外部存储器为该NPU外部的存储器,该外部存储器可以为双倍数据率同步动态随机存储器(double data rate synchronous dynamic random access memory,DDR SDRAM)、高带宽存储器(high bandwidth memory,HBM)或其他可读可写的存储器。Generally, the unified memory 306, the input memory 301, the weight memory 302, and the fetch memory 309 are all on-chip memories. The external memory is a memory external to the NPU. The external memory can be a double data rate synchronous dynamic random access memory. Memory (double data rate synchronous dynamic random access memory, DDR SDRAM), high bandwidth memory (HBM) or other readable and writable memory.
图4为本申请一实施例提供的一种系统架构400。执行设备410由一个或多个设置在云端的服务器实现,服务器也可与其它计算设备配合,例如:数据存储、路由器、负载均衡器等设备;执行设备410可以布置在一个物理站点上,或者分布在多个物理站点 上。在本申请一实施例中,执行设备410可以使用数据存储系统420中的数据,或者调用数据存储系统420中的程序代码实现本申请的实施例所提供的神经网络的训练方法;具体地,执行设备410可以根据数据存储系统420中的训练数据以本申请的实施例所提供的方法训练神经网络,以及根据本地设备401/402的请求完成对应的智能任务。在本申请另一实施例中,执行设备410也可并不具备训练神经网络的功能,但是可以根据本申请的实施例所提供的神经网络的训练方法训练好的神经网络完成对应的智能任务;具体地,执行设备410配置有本申请的实施例所提供的神经网络的训练方法训练好的神经网络后,在接收到本地设备401/402的请求后即可完成对应的智能任务,并反馈结果给本地设备401/402。FIG. 4 is a system architecture 400 provided by an embodiment of this application. The execution device 410 is implemented by one or more servers set in the cloud. The server can also cooperate with other computing devices, such as data storage, routers, load balancers and other devices; the execution device 410 can be arranged on a physical site or distributed On multiple physical sites. In an embodiment of the present application, the execution device 410 may use the data in the data storage system 420 or call the program code in the data storage system 420 to implement the neural network training method provided by the embodiment of the present application; specifically, execute The device 410 can train the neural network according to the training data in the data storage system 420 in the method provided in the embodiment of the present application, and complete the corresponding intelligent task according to the request of the local device 401/402. In another embodiment of the present application, the execution device 410 may not have the function of training a neural network, but the neural network trained according to the neural network training method provided by the embodiment of the present application can complete the corresponding intelligent task; Specifically, after the execution device 410 is configured with the neural network training method provided by the embodiment of the present application, after the neural network is trained, the corresponding intelligent task can be completed after receiving the request of the local device 401/402, and the result will be fed back. To the local device 401/402.
用户可以操作各自的用户设备(例如本地设备401和本地设备402)与执行设备410进行交互。每个本地设备可以表示任何计算设备,例如个人计算机、计算机工作站、智能手机、平板电脑、智能摄像头、智能汽车或其他类型蜂窝电话、媒体消费设备、可穿戴设备、机顶盒、游戏机等。在本申请一实施例中,本地设备可以是一种安防设备,例如监控摄像设备、烟雾报警设备或灭火设备等。The user can operate respective user devices (for example, the local device 401 and the local device 402) to interact with the execution device 410. Each local device can represent any computing device, such as personal computers, computer workstations, smart phones, tablets, smart cameras, smart cars or other types of cellular phones, media consumption devices, wearable devices, set-top boxes, game consoles, etc. In an embodiment of the present application, the local device may be a security device, such as a surveillance camera device, a smoke alarm device, or a fire extinguishing device.
每个用户的本地设备可以通过任何通信机制/通信标准的通信网络与执行设备410进行交互,通信网络可以是广域网、局域网、点对点连接等方式,或它们的任意组合。The local device of each user can interact with the execution device 410 through a communication network of any communication mechanism/communication standard. The communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
在另一种实现中,执行设备410的一个方面或多个方面可以由每个本地设备实现,例如,本地设备401可以为执行设备410提供本地数据或反馈计算结果。In another implementation, one or more aspects of the execution device 410 may be implemented by each local device. For example, the local device 401 may provide the execution device 410 with local data or feed back calculation results.
在另一种实现中,上述执行设备410的所有功能也可以由本地设备实现。本地设备401执行本申请的实施例所提供的神经网络的训练方法,并使用训练好的神经网络为用户提供服务。In another implementation, all the functions of the foregoing execution device 410 may also be implemented by a local device. The local device 401 executes the neural network training method provided in the embodiments of the present application, and uses the trained neural network to provide services to users.
图5为本申请一实施例提供的神经网络的训练方法的流程示意图。图5所示的神经网络的训练方法可由图2所示的训练设备220执行,该训练设备220训练得到的目标模型/规则201即为该神经网络。如图5所示,该神经网络的训练方法包括如下步骤:FIG. 5 is a schematic flowchart of a neural network training method provided by an embodiment of the application. The training method of the neural network shown in FIG. 5 can be executed by the training device 220 shown in FIG. 2, and the target model/rule 201 trained by the training device 220 is the neural network. As shown in Figure 5, the neural network training method includes the following steps:
步骤501:获取训练数据。Step 501: Obtain training data.
训练数据为训练过程的输入数据,训练数据可以由用户采集获取,也可采用现有的训练数据库。应当理解,根据实际场景的需求不同,训练数据可具备不同的格式和形式。例如在目标检测或目标识别场景下,训练数据可以是图像数据。而在回归预测场景下,训练数据就可为采集的过往的房价历史数据。The training data is the input data of the training process. The training data can be collected by the user, or an existing training database can be used. It should be understood that the training data may have different formats and forms according to different requirements of actual scenarios. For example, in a target detection or target recognition scenario, the training data may be image data. In the scenario of regression prediction, the training data can be the collected historical housing price data.
在本申请一实施例中,输入神经网络的训练数据可以包括不同领域(domain)的训练数据。例如,不同领域可以包括目标域和源域。以域自适应学习任务为例,训练数据可以包括源域的训练数据和目标域的训练数据。在本申请一实施例中,领域(domain)的差异可体现为场景(scenario)的差异。以车辆检测的应用场景为例,源域的训练数据就可能是大量的晴天场景下的交通场景图像,目标域的训练数据就可能是大量的雾天场景下的交通场景图像。然而应当理解,根据应用场景的不同,源域的训练数据和目标域的训练数据也可能在其他方面体现这种领域差异,例如,在回归预测场景下,源域的训练数据可能是采集于去年的生产线的能耗数据,而目标域的训练数据可能是采集于今年的生产线的能耗数据,此时领域差异体现在由于时间变换而出现的能耗数据的数值分布不一致。In an embodiment of the present application, the training data input to the neural network may include training data of different domains. For example, different domains can include target domains and source domains. Taking the domain adaptive learning task as an example, the training data may include the training data of the source domain and the training data of the target domain. In an embodiment of the present application, the difference in domains may be embodied as the difference in scenarios. Taking the application scenario of vehicle detection as an example, the training data of the source domain may be a large number of traffic scene images in a sunny scene, and the training data of the target domain may be a large number of traffic scene images in a foggy scene. However, it should be understood that depending on the application scenario, the training data of the source domain and the training data of the target domain may also reflect this domain difference in other aspects. For example, in a regression prediction scenario, the training data of the source domain may be collected last year. The energy consumption data of the production line, and the training data of the target domain may be the energy consumption data of the production line collected this year. At this time, the domain difference is reflected in the inconsistent value distribution of the energy consumption data due to time changes.
步骤502:使用训练数据对神经网络进行训练,使得神经网络从训练数据中学习分解域不变特征和域特定特征。Step 502: Use the training data to train the neural network so that the neural network learns to decompose domain invariant features and domain specific features from the training data.
在训练神经网络的过程中,神经网络可以采用监督学习、半监督学习或无监督学习等方式中的任意一种对训练数据进行学习。以域自适应学习任务为例,训练数据可以包括带有足量标签的源域的训练数据,以及带有少量标签的目标域的训练数据,此时,神经网络可以采用半监督学习的方式对训练数据进行学习;或者,训练数据可以包括带有足量标签的源域的训练数据,以及不带标签的目标域的训练数据,此时,神经网络可以采用无监督学习的方式对训练数据进行学习。In the process of training the neural network, the neural network can use any of the methods of supervised learning, semi-supervised learning or unsupervised learning to learn from the training data. Taking the domain adaptive learning task as an example, the training data can include the training data of the source domain with sufficient labels and the training data of the target domain with a small number of labels. In this case, the neural network can use semi-supervised learning to perform Training data for learning; alternatively, the training data can include the training data of the source domain with sufficient labels and the training data of the target domain without labels. At this time, the neural network can use unsupervised learning on the training data. learn.
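For illustration, a minimal PyTorch-style sketch of how labeled source-domain data and unlabeled target-domain data might be batched together in the unsupervised setting is given below; the tensor shapes, label meanings and batch sizes are placeholders and not part of this application.

import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical toy data: labeled source-domain images and unlabeled target-domain images.
source_images = torch.randn(64, 3, 32, 32)
source_labels = torch.randint(0, 2, (64,))      # e.g. 0 = person, 1 = vehicle
target_images = torch.randn(64, 3, 32, 32)      # no labels in the unsupervised case

source_loader = DataLoader(TensorDataset(source_images, source_labels), batch_size=8, shuffle=True)
target_loader = DataLoader(TensorDataset(target_images), batch_size=8, shuffle=True)

# Each training step sees one labeled source batch and one unlabeled target batch.
for (xs, ys), (xt,) in zip(source_loader, target_loader):
    pass  # forward both batches through the network; only (xs, ys) contributes to the task loss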
域不变特征(domain-invariant representation,DIR)为与训练数据所属领域无关的特征,为不会因为领域差异而产生变化的特征。域不变特征有时也可称为与任务相关的实例层面的特征。以车辆检测的应用场景为例,领域差异体现在晴天拍摄的交通场景图像和雾天拍摄的交通场景图像的天气变化所呈现出来的图像差异,而交通场景中的车辆的特征是不会随着天气的变化而变化的,同时,目标检测任务的目标对象(即实例)就是图像中的车辆,因此车辆的特征就是要提取出的域不变特征。在目标检测的域自适应学习场景中,所要实现的就是无论在实际工作中获取到的图像是拍摄于晴天还是雾天,这个训练好的神经网络都能够准确地提取出车辆的特征以完成目标检测任务。Domain-invariant representation (DIR) is a feature that has nothing to do with the domain to which the training data belongs, and is a feature that does not change due to domain differences. Domain-invariant features can sometimes be referred to as task-related instance-level features. Taking the application scenario of vehicle detection as an example, the domain difference is reflected in the image difference between the weather changes between the traffic scene image taken on a sunny day and the traffic scene image taken on a foggy day, and the characteristics of the vehicle in the traffic scene will not follow The weather changes. At the same time, the target object (ie, instance) of the target detection task is the vehicle in the image, so the characteristics of the vehicle are the domain invariant features to be extracted. In the domain adaptive learning scene of target detection, what we want to achieve is that regardless of whether the image obtained in actual work is taken on a sunny day or a foggy day, this trained neural network can accurately extract the characteristics of the vehicle to complete the target. Inspection task.
域特定特征(domain-specific representation,DSR)为表征训练数据所属的领域的特征,是训练数据所属的域所特有的特征,会因为领域差异而产生变化;同时域特定特征也是与实例不相关的特征,在实际任务执行过程中也是与任务的目标不相关的。例如在前述车辆检测的应用场景下,交通场景图像中车辆周遭的环境(树木、天空、街景等)的特征就是与车辆的特征不相关的,因为车辆的识别或检测并不需要了解周遭环境的特征,且这些周遭的环境(例如天空)的特征信息会随着领域差异(天气变化)而变化。Domain-specific representation (DSR) is a feature that characterizes the domain to which the training data belongs. It is a feature unique to the domain to which the training data belongs, and will change due to domain differences; at the same time, domain-specific features are also irrelevant to the instance The characteristics are also irrelevant to the goal of the task in the actual task execution process. For example, in the aforementioned vehicle detection application scenario, the characteristics of the surrounding environment (trees, sky, street scene, etc.) of the vehicle in the traffic scene image are not related to the characteristics of the vehicle, because the recognition or detection of the vehicle does not require knowledge of the surrounding environment. Features, and the feature information of the surrounding environment (such as the sky) will change with the domain difference (weather change).
由此可见,本申请实施例通过从训练数据中分解域不变特征和域特定特征,使得域不变特征能够与域特定特征解耦。由于本申请的训练方法得到的神经网络使用域不变特征来执行任务,这样避免了域特定特征对于神经网络的影响,从而提升了神经网络在不同领域之间的迁移性能。It can be seen that the embodiment of the present application decomposes the domain invariant feature and the domain specific feature from the training data, so that the domain invariant feature can be decoupled from the domain specific feature. Since the neural network obtained by the training method of the present application uses domain invariant features to perform tasks, the influence of domain-specific features on the neural network is avoided, thereby improving the migration performance of the neural network between different domains.
图6为本申请一实施例提供的神经网络的训练方法的流程示意图。图7为图6训练得到的神经网络的结构示意图。图6所示的神经网络的训练方法可由图2所示的训练设备220执行,该训练设备220训练得到的目标模型/规则201即为该神经网络。如图6和图7所示,该神经网络的训练方法包括如下步骤:FIG. 6 is a schematic flowchart of a neural network training method provided by an embodiment of the application. Fig. 7 is a schematic diagram of the structure of the neural network trained in Fig. 6. The training method of the neural network shown in FIG. 6 can be executed by the training device 220 shown in FIG. 2, and the target model/rule 201 trained by the training device 220 is the neural network. As shown in Fig. 6 and Fig. 7, the training method of the neural network includes the following steps:
步骤601:从训练数据中分解出域不变特征和域特定特征。Step 601: Decompose domain-invariant features and domain-specific features from the training data.
如图7所示,该提取域不变特征DIR和域特定特征DSR的过程可分别由神经网络中的域不变特征提取器E DIR和域特定特征提取器E DSR完成。训练数据输入到神经网络中后,可以使用该域不变特征提取器E DIR和该域特定特征提取器E DSR完成域不变特征DIR和域特定特征DSR的提取过程。 As shown in Figure 7, the process of extracting the domain invariant feature DIR and the domain specific feature DSR can be respectively completed by the domain invariant feature extractor E DIR and the domain specific feature extractor E DSR in the neural network. After the training data is input into the neural network, the domain invariant feature extractor E DIR and the domain specific feature extractor E DSR can be used to complete the extraction process of the domain invariant feature DIR and the domain specific feature DSR.
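As an illustration of the two extractors E DIR and E DSR operating on the same input, the following PyTorch sketch splits a backbone feature map into a domain-invariant branch and a domain-specific branch; the layer types and channel sizes are placeholders chosen for the example, not values specified by this application.

import torch
import torch.nn as nn

class FeatureDecoupler(nn.Module):
    # Two parallel heads produce the domain-invariant feature (DIR) and the domain-specific feature (DSR).
    def __init__(self, in_ch=256, out_ch=256):
        super().__init__()
        self.e_dir = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())
        self.e_dsr = nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU())
    def forward(self, feat):
        return self.e_dir(feat), self.e_dsr(feat)

feat = torch.randn(2, 256, 32, 32)              # feature map of a batch of training images
dir_feat, dsr_feat = FeatureDecoupler()(feat)   # decomposed DIR and DSR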
下面结合图8来说明目标检测场景下的域不变特征和域特定特征的解耦。在如图8所示的目标检测场景下,任务的目标是要检测出图像数据中的对象(包括人物和车辆)。图8左侧的源域的训练数据是照片图像,右侧的目标域的训练数据是卡通图像。在域不变空间中,源域的训练数据中提取出的域不变特征就是照片图像中的人物和车辆,目标 域的训练数据中提取出的域不变特征就是卡通图像中的人物和车辆,线条C 1表征的是在域不变空间中的人物的域不变特征和车辆的域不变特征的分类界限。在域特定空间中,源域的训练数据中提取出的域特定特征就是照片图像中除去人物和车辆外的其他特征,目标域的训练数据中提取出的域特定特征就是卡通图像中除去人物和车辆外的其他特征,线条C 2表征的是来自源域的域特定特征和来自目标域的域特定特征在域特定空间中的分布界限。 The decoupling of domain-invariant features and domain-specific features in the target detection scene will be explained below in conjunction with FIG. 8. In the target detection scene shown in Figure 8, the goal of the task is to detect objects (including people and vehicles) in the image data. The training data of the source domain on the left side of Fig. 8 is a photo image, and the training data of the target domain on the right side is a cartoon image. In the domain invariant space, the domain invariant features extracted from the training data of the source domain are the characters and vehicles in the photo image, and the domain invariant features extracted from the training data of the target domain are the characters and vehicles in the cartoon image. , The line C 1 represents the classification boundary between the domain invariant feature of the person and the domain invariant feature of the vehicle in the domain invariant space. In the domain-specific space, the domain-specific features extracted from the training data of the source domain are other features other than the characters and vehicles in the photo image, and the domain-specific features extracted from the training data of the target domain are the cartoon images that remove the characters and For other features outside the vehicle, the line C 2 represents the distribution boundary of the domain-specific features from the source domain and the domain-specific features from the target domain in the domain-specific space.
步骤602:使用域不变特征执行任务,得到任务损失,并计算域不变特征和域特定特征之间的互信息损失,互信息损失用于表示域不变特征和域特定特征之间的差异。Step 602: Use the domain invariant feature to perform the task, obtain the task loss, and calculate the mutual information loss between the domain invariant feature and the domain specific feature. The mutual information loss is used to represent the difference between the domain invariant feature and the domain specific feature .
如前所述,域不变特征用于表征的是实例层面的特征信息,因此将域不变特征用于执行任务并获得任务损失(task loss),可提高域不变特征对于与任务相关的实例表征的准确性和完整性。任务损失用于表征使用域不变特征执行任务所得到的结果与任务标签之间的差距。例如,当域不变特征用于执行目标检测任务时,执行任务所得到的结果就可包括检测出的目标对象的属性特征,而任务标签则对应该域不变特征实际上所对应的目标对象的标准属性特征,这样检测出的属性特征和标准属性特征之间的差异可通过任务损失来表征。As mentioned earlier, domain invariant features are used to characterize feature information at the instance level. Therefore, domain invariant features are used to perform tasks and obtain task loss, which can improve the effect of domain invariant features on tasks related to tasks. The accuracy and completeness of the instance characterization. The task loss is used to characterize the gap between the result of using domain invariant features to perform the task and the task label. For example, when the domain invariant feature is used to perform a target detection task, the results obtained from the task can include the attribute features of the detected target object, and the task label corresponds to the target object to which the domain invariant feature actually corresponds In this way, the difference between the detected attribute feature and the standard attribute feature can be characterized by the task loss.
互信息(mutual information,MI)损失表征的是两个变量的相互依赖性。两个随机变量X和Z的互信息损失I可通过如下公式(3)定义,其中H(X)是边缘熵,H(X|Z)是条件熵:Mutual information (MI) loss characterizes the interdependence of two variables. The mutual information loss I of two random variables X and Z can be defined by the following formula (3), where H(X) is the edge entropy and H(X|Z) is the conditional entropy:
I(X,Z)=H(X)-H(X|Z)           (3)
互信息损失用于表示域不变特征和域特定特征之间的差异。通过计算域不变特征和域特定特征之间的互信息损失,并基于该互信息损失对该神经网络进行训练,可有助于进一步将域不变特征和域特定特征区分开,起到迫使域不变特征和域特定特征解耦的作用。应当理解,互信息损失的计算方法可根据实际场景需求选择,例如,可选择使用互信息神经估计器(mutual information neural estimator,MINE)来获取该互信息损失,本申请对互信息损失的具体计算方法不做严格限定。Mutual information loss is used to represent the difference between domain-invariant features and domain-specific features. By calculating the mutual information loss between domain-invariant features and domain-specific features, and training the neural network based on the mutual information loss, it can help to further distinguish between domain-invariant features and domain-specific features to force The role of domain-invariant features and domain-specific features decoupling. It should be understood that the calculation method of mutual information loss can be selected according to actual scenario requirements. For example, mutual information neural estimator (MINE) can be selected to obtain the mutual information loss. The specific calculation of mutual information loss in this application The method is not strictly limited.
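For reference, one possible MINE-style estimator is sketched below: a small statistics network T(x, z) is evaluated on paired (joint) and shuffled (marginal) feature batches, and the Donsker-Varadhan bound gives an estimate that can serve as the mutual information loss. The network sizes are placeholders, and this sketch is not necessarily the estimator used in this application.

import math
import torch
import torch.nn as nn

class MineEstimator(nn.Module):
    # Statistics network T(x, z); the layer sizes here are illustrative only.
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def mi_lower_bound(self, x, z):
        n = x.size(0)
        joint = self.net(torch.cat([x, z], dim=1)).mean()
        z_perm = z[torch.randperm(n)]                      # shuffled pairs approximate p(x)p(z)
        marg = self.net(torch.cat([x, z_perm], dim=1))
        # Donsker-Varadhan bound: I(X;Z) >= E_joint[T] - log E_marginal[exp(T)]
        return joint - (torch.logsumexp(marg, dim=0) - math.log(n)).squeeze()

dir_vec = torch.randn(16, 64)   # flattened domain-invariant features (placeholder)
dsr_vec = torch.randn(16, 64)   # flattened domain-specific features (placeholder)
mi_loss = MineEstimator(64).mi_lower_bound(dir_vec, dsr_vec)

In practice the statistics network is usually updated to tighten the bound while the feature extractors are updated to reduce it; that training schedule is an implementation choice outside the scope of this sketch.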
步骤603:根据任务损失和互信息损失,训练神经网络。Step 603: Train the neural network according to the task loss and the mutual information loss.
如前所述,神经网络的训练过程其实是根据损失函数的值调整权重向量的过程。这里的任务损失所表征的就是基于训练数据提取出的域不变特征能够完成任务的能力。如果该域不变特征还不能足够精准地与实例对应,那么任务损失的值就会比较大,此时就需要调整该神经网络中的权重向量以使得域不变特征能够在下一次预测过程中获得更低的任务损失。通过训练的迭代,域不变特征提取器所提取到的域不变特征也就会越来越精准地与实例对应。As mentioned earlier, the training process of the neural network is actually the process of adjusting the weight vector according to the value of the loss function. The task loss here characterizes the ability to complete the task based on the domain invariant features extracted from the training data. If the domain invariant feature cannot correspond to the instance accurately enough, then the value of the task loss will be relatively large. At this time, the weight vector in the neural network needs to be adjusted so that the domain invariant feature can be obtained in the next prediction process Lower mission loss. Through the iteration of training, the domain invariant features extracted by the domain invariant feature extractor will correspond to the examples more and more accurately.
根据互信息损失训练神经网络的过程可以是训练神经网络以减小域不变特征和域特定特征之间互信息损失的过程,例如,最小化该互信息损失。为了保证域不变特征能够更精准地与实例对应,可以计算域不变特征和域特定特征之间的互信息损失,并使用该互信息损失来进一步提高域不变特征提取的精确性。The process of training the neural network based on the mutual information loss may be a process of training the neural network to reduce the mutual information loss between the domain invariant feature and the domain specific feature, for example, to minimize the mutual information loss. In order to ensure that the domain invariant feature can correspond to the instance more accurately, the mutual information loss between the domain invariant feature and the domain specific feature can be calculated, and the mutual information loss can be used to further improve the accuracy of the domain invariant feature extraction.
由于互信息损失表征的是域不变特征和域特定特征的相关性,因此根据该互信息损失来调整该神经网络的权重向量,可使得提取到的域不变特征能够更好的与域特定特征区分开,起到迫使特征解耦的作用。如果该互信息损失较大,则说明目前域不变特征和 域特定特征之间是较为相关的,即,目前域不变特征提取器所提取到的特征中很可能还是包括了域特定特征的信息内容,此时则需要调整该神经网络的权重向量以减小该互信息损失。Since the mutual information loss characterizes the correlation between domain invariant features and domain-specific features, adjusting the weight vector of the neural network according to the mutual information loss can make the extracted domain invariant features better and domain-specific The features are distinguished, and they play a role in forcing the decoupling of features. If the mutual information loss is large, it means that the current domain-invariant features and domain-specific features are relatively related, that is, the current domain-invariant feature extractor may still include domain-specific features in the features extracted Information content, at this time, the weight vector of the neural network needs to be adjusted to reduce the mutual information loss.
由于域不变特征的提取在基于任务损失的训练过程中会得到训练，域不变特征提取器所提取出的特征可能会与任务有一定相关性，因此基于互信息损失的训练过程也可被视作将域特定特征从域不变特征中“剔除”的过程，使得域不变特征提取器所提取到的特征随着训练的迭代越来越与实例一致，同时也使得域特定特征提取器所提取到的特征随着训练的迭代越来越与实例不相关，从而实现了对于域不变特征和域特定特征的特征解耦。由此可见，由于随着训练过程的迭代，域特定特征提取器所提取到的特征也会越来越与实例不相关，即越来越贴近于表征领域本身特有的特征，因此域特定特征提取器在这个基于互信息损失的训练过程中也是得到了训练的。Since the extraction of the domain-invariant feature is trained in the training process based on the task loss, the features extracted by the domain-invariant feature extractor may have some correlation with the task. The training process based on the mutual information loss can therefore also be regarded as a process of "removing" the domain-specific feature from the domain-invariant feature, so that the features extracted by the domain-invariant feature extractor become more and more consistent with the instances as training iterates, while the features extracted by the domain-specific feature extractor become more and more irrelevant to the instances, thereby realizing the decoupling of the domain-invariant feature and the domain-specific feature. It can be seen that, as the training process iterates, the features extracted by the domain-specific feature extractor become increasingly irrelevant to the instances, that is, increasingly close to the features that characterize the domain itself, so the domain-specific feature extractor is also trained in this training process based on the mutual information loss.
应当理解,上述基于任务损失的训练过程和基于互信息损失的训练过程并不一定是同时进行的,在本申请一实施例中,基于互信息损失的训练过程也可以是在基于任务损失的训练过程开始之后进行的,本申请对这两个训练过程的具体执行顺序不做严格限定。It should be understood that the foregoing training process based on task loss and the training process based on mutual information loss are not necessarily performed at the same time. In an embodiment of the present application, the training process based on mutual information loss may also be performed during training based on task loss. After the process starts, this application does not strictly limit the specific execution sequence of the two training processes.
本申请通过根据任务损失和互信息损失来训练神经网络,不仅可使得分解出的域不变特征更加精准的与实例对应,还可在训练的过程中减少域不变特征和域特定特征之间的互信息损失,以促进域不变特征和域特定特征的完全解耦,进一步降低域特定特征对域不变特征的影响。This application trains the neural network based on the task loss and the mutual information loss, which not only makes the decomposed domain invariant features more accurately correspond to the instance, but also reduces the gap between the domain invariant features and the domain specific features during the training process. In order to promote the complete decoupling of domain-invariant features and domain-specific features, the influence of domain-specific features on domain-invariant features is further reduced.
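A schematic training step that combines the two loss terms is sketched below; the task head, the statistics network, the 0.1 weighting coefficient and all tensor shapes are illustrative placeholders rather than values from this application.

import torch
import torch.nn as nn
import torch.nn.functional as F

dir_head = nn.Linear(64, 2)                                     # task head applied to the domain-invariant feature
mine_net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(list(dir_head.parameters()) + list(mine_net.parameters()), lr=0.01)

dir_feat = torch.randn(8, 64, requires_grad=True)               # decoupled features (placeholders)
dsr_feat = torch.randn(8, 64, requires_grad=True)
labels = torch.randint(0, 2, (8,))

task_loss = F.cross_entropy(dir_head(dir_feat), labels)         # task loss from the DIR-based prediction
joint = mine_net(torch.cat([dir_feat, dsr_feat], dim=1)).mean()
marg = mine_net(torch.cat([dir_feat, dsr_feat[torch.randperm(8)]], dim=1))
mi_loss = joint - (torch.logsumexp(marg, dim=0) - torch.log(torch.tensor(8.0))).squeeze()

loss = task_loss + 0.1 * mi_loss                                # 0.1 is an arbitrary illustrative weight
optimizer.zero_grad()
loss.backward()
optimizer.step()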
应当理解,虽然上面以互信息损失为例表征了域不变特征和域特定特征的相关性,在本申请的其他实施例中,也可采用其他形式的损失信息来表征域不变特征和域特定特征的相关性;然后基于根据任务损失和该其他形式的损失信息训练神经网络,以使得提取到的域不变特征能够更好的与域特定特征区分开,起到迫使特征解耦的作用。在本申请一实施例中,可以计算域不变特征和域特定特征之间的如下损失信息中的一种或多种组合:互信息损失、度量损失(例如,L1距离或L2距离)、衡量数据分布的损失(例如KL(kullback-leibler)散度)和瓦瑟斯坦(wasserstein)距离。本申请对该用于表征域不变特征和域特定特征之间的相关性的损失信息的形式不做严格限定。It should be understood that although the mutual information loss is taken as an example above to characterize the correlation between domain invariant features and domain specific features, in other embodiments of the present application, other forms of loss information may also be used to represent domain invariant features and domains. The correlation of specific features; then based on the task loss and the other forms of loss information training neural network, so that the extracted domain invariant features can be better distinguished from the domain specific features, play a role in forcing feature decoupling . In an embodiment of the present application, one or more combinations of the following loss information between domain invariant features and domain specific features can be calculated: mutual information loss, metric loss (for example, L1 distance or L2 distance), measurement Loss of data distribution (such as KL (kullback-leibler) divergence) and wasserstein distance. This application does not strictly limit the form of loss information used to characterize the correlation between domain invariant features and domain specific features.
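For reference, the simpler candidates mentioned above can be computed directly on feature tensors, for example as follows; the normalization of the features into probability-like distributions for the KL term, and the per-dimension one-dimensional approximation of the Wasserstein distance, are assumptions made only for this illustration.

import torch
import torch.nn.functional as F

dir_feat = torch.randn(8, 64)   # placeholder domain-invariant features
dsr_feat = torch.randn(8, 64)   # placeholder domain-specific features

l1 = (dir_feat - dsr_feat).abs().mean()                        # L1 (metric) distance
l2 = (dir_feat - dsr_feat).pow(2).mean()                       # squared L2 distance
log_p = F.log_softmax(dir_feat, dim=1)                         # features turned into log-probabilities
q = F.softmax(dsr_feat, dim=1)                                 # features turned into probabilities
kl = F.kl_div(log_p, q, reduction='batchmean')                 # KL divergence between the induced distributions
w1 = (dir_feat.sort(dim=0).values - dsr_feat.sort(dim=0).values).abs().mean()  # per-dimension 1-D Wasserstein-1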
在本申请一实施例中,该神经网络可用于进行域自适应学习,训练数据可来自不同领域(例如,不同风格)的图像数据,例如照片写实风格、漫画风格等。通过提取不同风格的图像数据的域不变特征和域特定特征,并基于使用域不变特征执行任务得到的任务损失对神经网络进行训练,使得域不变特征能够与域特定特征解耦。由于使用域不变特征来执行任务,本申请的训练方法得到的神经网络可通过域自适应学习以自行适应对于多种不同领域图像的处理任务,例如目标检测/识别/分割等,从而实现对不同领域的图像数据的自适应处理。In an embodiment of the present application, the neural network can be used for domain adaptive learning, and the training data can come from image data of different domains (for example, different styles), such as photorealistic style, comic style, etc. By extracting domain-invariant features and domain-specific features of image data of different styles, and training the neural network based on the task loss obtained by using domain-invariant features to perform tasks, the domain-invariant features can be decoupled from domain-specific features. Since domain invariant features are used to perform tasks, the neural network obtained by the training method of the present application can adapt to various image processing tasks in different fields through domain adaptive learning, such as target detection/recognition/segmentation, etc., so as to achieve Adaptive processing of image data in different fields.
虽然在上面的描述中提到，域特定特征提取器在基于互信息损失的训练过程中也会得到训练，但考虑到当域特定特征提取器对于域特定特征的提取精度被提高时，基于互信息损失的训练过程可以有效地将域特定特征和域不变特征区分开，域不变特征的提取精度也会间接地得到进一步提升。因此有必要进一步提升域特定特征的提取精度，以通过基于互信息损失的训练过程来间接地提高域不变特征提取器的提取精度。Although it is mentioned in the above description that the domain-specific feature extractor is also trained in the training process based on the mutual information loss, when the extraction accuracy of the domain-specific feature extractor for the domain-specific feature is improved, the training process based on the mutual information loss can more effectively distinguish the domain-specific feature from the domain-invariant feature, and the extraction accuracy of the domain-invariant feature is in turn further improved indirectly. Therefore, it is necessary to further improve the extraction accuracy of the domain-specific feature, so as to indirectly improve the extraction accuracy of the domain-invariant feature extractor through the training process based on the mutual information loss.
在本申请的一些实施例中,可将域特定特征提取器提取出的域特定特征进行域分类, 得到域分类损失,然后根据任务损失、互信息损失和域分类损失来训练神经网络。In some embodiments of the present application, the domain-specific features extracted by the domain-specific feature extractor may be subjected to domain classification to obtain the domain classification loss, and then the neural network can be trained according to the task loss, mutual information loss, and domain classification loss.
例如图9所示，域特定特征提取器后面可连接域分类器(domain classifier)，在特征提取器和域分类器之间还设置梯度反转层(GRL)。域特定特征提取器提取出的域特定特征输入到域分类器中分辨该域特定特征是否真的是域特有的特征，以获得域分类损失，该域分类损失表征的其实是域特定特征提取器提取结果的准确程度；然后该域分类损失在向域特定特征提取器的反向传播过程中会经过梯度反转层，以使得域分类损失在反向传播过程中的梯度方向自动取反，以“混淆”域特定特征提取器。由于域分类损失在反向传播的过程中会被自动取反，因此域分类器的目标其实在于混淆域特定特征提取器；而域特定特征提取器的目标则是确保提取出的特征是域特有的特征，通过域分类器和域特定特征提取器之间的这种对抗策略，以最终达到提高域特定特征提取器提取域特定特征的精度的目的。For example, as shown in FIG. 9, a domain classifier can be connected after the domain-specific feature extractor, and a gradient reversal layer (GRL) is further arranged between the feature extractor and the domain classifier. The domain-specific feature extracted by the domain-specific feature extractor is input into the domain classifier to judge whether it is really a domain-specific feature, so as to obtain a domain classification loss; this domain classification loss actually characterizes the accuracy of the extraction result of the domain-specific feature extractor. The domain classification loss then passes through the gradient reversal layer during back-propagation towards the domain-specific feature extractor, so that the gradient direction of the domain classification loss is automatically reversed during back-propagation, thereby "confusing" the domain-specific feature extractor. Since the domain classification loss is automatically reversed during back-propagation, the goal of the domain classifier is in effect to confuse the domain-specific feature extractor, while the goal of the domain-specific feature extractor is to ensure that the extracted features are indeed domain-specific. Through this adversarial strategy between the domain classifier and the domain-specific feature extractor, the accuracy with which the domain-specific feature extractor extracts the domain-specific feature is ultimately improved.
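A standard construction of such a gradient reversal layer and domain classifier is sketched below in PyTorch; it is a generic sketch rather than the exact structure of FIG. 9, and the classifier sizes are placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; the gradient is negated (and scaled) in the backward pass.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

domain_classifier = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))  # two domains: source / target

dsr_feat = torch.randn(8, 64, requires_grad=True)     # domain-specific features (placeholder)
domain_labels = torch.randint(0, 2, (8,))
logits = domain_classifier(GradReverse.apply(dsr_feat, 1.0))
domain_loss = F.cross_entropy(logits, domain_labels)
domain_loss.backward()   # the gradient reaching dsr_feat has its sign flipped by the GRL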
本申请通过引入域分类损失,有助于从训练数据的特征中提取域不变特征。This application introduces domain classification loss, which helps to extract domain invariant features from the features of training data.
在本申请一实施例中,为了进一步促使解耦出的域不变特征和域特定特征能够包含训练数据的全部特征信息,以提高特征解耦的完整性和合理性,可以先从训练数据中提取初始特征,将该初始特征分解成域不变特征和域特定特征,然后训练神经网络,以减小初始特征所包含的信息与域不变特征和域特定特征共同包含的信息之间的差异。In an embodiment of the present application, in order to further promote the decoupling of domain-invariant features and domain-specific features to contain all the feature information of the training data, so as to improve the completeness and rationality of the feature decoupling, the training data can be obtained first Extract the initial features, decompose the initial features into domain-invariant features and domain-specific features, and then train the neural network to reduce the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and domain-specific features .
具体而言，如图9所示，域特定特征和域不变特征在被提取出来后，可使用域不变特征和域特定特征对初始特征进行重建，得到重建特征，然后比较初始特征和该重建特征，以确定初始特征所包含的信息与域不变特征和域特定特征共同包含的信息之间的差异，即重建损失(reconstruction loss)；然后使用该重建损失来训练神经网络，以使得域不变特征提取器所提取出的域不变特征和域特定特征提取器所提取出的域特定特征能够更好地覆盖训练数据的特征信息。Specifically, as shown in FIG. 9, after the domain-specific feature and the domain-invariant feature are extracted, the initial feature can be reconstructed from the domain-invariant feature and the domain-specific feature to obtain a reconstructed feature, and the initial feature is then compared with the reconstructed feature to determine the difference between the information contained in the initial feature and the information jointly contained in the domain-invariant feature and the domain-specific feature, namely the reconstruction loss. The reconstruction loss is then used to train the neural network, so that the domain-invariant feature extracted by the domain-invariant feature extractor and the domain-specific feature extracted by the domain-specific feature extractor can better cover the feature information of the training data.
本申请通过减小初始特征所包含的信息与域不变特征和域特定特征共同包含的信息之间的差异,可使得解耦出的域不变特征和域特定特征能够包含训练数据的全部特征信息,以提高特征解耦的完整性和合理性。This application reduces the difference between the information contained in the initial features and the information jointly contained in the domain invariant features and domain specific features, so that the decoupled domain invariant features and domain specific features can contain all the features of the training data Information to improve the completeness and rationality of feature decoupling.
下面结合图10和图11来进一步描述本申请实施例的域不变特征和域特定特征的提取过程。The following further describes the extraction process of domain invariant features and domain specific features in the embodiments of the present application in conjunction with FIG. 10 and FIG. 11.
图10为本申请一实施例提供的神经网络的结构示意图。如图10所示,该神经网络包括第一解耦器U1和第二解耦器U2,通过第一解耦器U1和第二解耦器U2的共同作用来完成域不变特征和域特定特征的提取过程。图11为本申请一实施例提供的基于图10所示神经网络架构的域不变特征和域特定特征的提取流程示意图。如图11所示,该域不变特征和域特定特征的提取过程可包括如下步骤:FIG. 10 is a schematic structural diagram of a neural network provided by an embodiment of this application. As shown in Figure 10, the neural network includes a first decoupler U1 and a second decoupler U2, through the joint action of the first decoupler U1 and the second decoupler U2 to complete the domain invariant features and domain specific Feature extraction process. FIG. 11 is a schematic diagram of the extraction process of domain invariant features and domain specific features based on the neural network architecture shown in FIG. 10 according to an embodiment of the application. As shown in Figure 11, the extraction process of the domain invariant features and domain specific features may include the following steps:
步骤1101:从训练数据中提取训练数据的第一特征。Step 1101: Extract the first feature of the training data from the training data.
如图10所示，神经网络中包括特征提取器，该特征提取器用于具体完成从训练数据中提取第一特征，第一特征为后续进行域不变特征增强的特征基础。应当理解，第一特征中的限定词“第一”意味着该特征提取器对训练数据进行的是“初步”的特征提取，例如当训练数据是图像数据时，该第一特征其实是对图像纹理层面进行特征提取得到的结果。As shown in FIG. 10, the neural network includes a feature extractor, which is used to extract the first feature from the training data; the first feature serves as the feature basis for the subsequent domain-invariant feature enhancement. It should be understood that the qualifier "first" in the first feature means that this feature extractor performs a "preliminary" feature extraction on the training data; for example, when the training data is image data, the first feature is actually the result of feature extraction at the image texture level.
步骤1102：采用第一解耦器U1从第一特征中提取初步域不变特征和初步域特定特征。Step 1102: Use the first decoupler U1 to extract preliminary domain invariant features and preliminary domain specific features from the first feature.
第一解耦器U1中包括域不变特征提取器和域特定特征提取器，分别用于从第一特征中提取初步域不变特征和初步域特定特征。初步域不变特征和初步域特定特征各自的提取过程可通过公式(4)表示，即分别以第一特征作为输入，由域不变特征提取器输出初步域不变特征，由域特定特征提取器输出初步域特定特征。The first decoupler U1 includes a domain-invariant feature extractor and a domain-specific feature extractor, which are respectively used to extract the preliminary domain-invariant feature and the preliminary domain-specific feature from the first feature. The respective extraction processes of the preliminary domain-invariant feature and the preliminary domain-specific feature can be expressed by formula (4): each takes the first feature as input, with the domain-invariant feature extractor outputting the preliminary domain-invariant feature and the domain-specific feature extractor outputting the preliminary domain-specific feature.
在本申请一实施例中，如图10所示，可使用互信息损失训练该第一解耦器U1，以保证初步域不变特征和初步域特定特征的提取精度。如前所述，互信息(mutual information,MI)损失表征的是两个变量的相互依赖性，这里的互信息损失表征的是初步域不变特征和初步域特定特征之间的差异。因此根据该互信息损失来调整该第一解耦器U1中网络结构的权重向量，可使得提取到的初步域不变特征能够更好地与初步域特定特征区分开，起到迫使特征解耦的作用。如果该互信息损失较大，则说明目前初步域不变特征和初步域特定特征是较为相关的，即，目前域不变特征提取器所提取到的特征中很可能还包括了初步域特定特征的信息内容，此时则需要调整该第一解耦器U1的网络结构的权重向量以减小该互信息损失。In an embodiment of the present application, as shown in FIG. 10, the first decoupler U1 can be trained using a mutual information loss to ensure the extraction accuracy of the preliminary domain-invariant feature and the preliminary domain-specific feature. As mentioned above, the mutual information (MI) loss characterizes the interdependence of two variables; here, the mutual information loss characterizes the difference between the preliminary domain-invariant feature and the preliminary domain-specific feature. Therefore, adjusting the weight vectors of the network structure in the first decoupler U1 according to this mutual information loss can make the extracted preliminary domain-invariant feature better distinguishable from the preliminary domain-specific feature, thereby forcing the features to decouple. If the mutual information loss is large, it means that the preliminary domain-invariant feature and the preliminary domain-specific feature are still relatively correlated, that is, the features currently extracted by the domain-invariant feature extractor probably still contain information of the preliminary domain-specific feature; in this case, the weight vectors of the network structure of the first decoupler U1 need to be adjusted to reduce the mutual information loss.
在本申请一实施例中，如图10所示，为了进一步提高初步域不变特征的提取精度，在第一解耦器U1中也可使用域分类器(domain classifier)和梯度反转层(GRL)。通过域分类器和域特定特征提取器之间的对抗策略，可以提高域特定特征提取器对于初步域特定特征的提取精度，从而结合互信息损失的训练过程来间接地达到提高初步域不变特征的提取精度的目的。In an embodiment of the present application, as shown in FIG. 10, in order to further improve the extraction accuracy of the preliminary domain-invariant feature, a domain classifier and a gradient reversal layer (GRL) can also be used in the first decoupler U1. Through the adversarial strategy between the domain classifier and the domain-specific feature extractor, the extraction accuracy of the domain-specific feature extractor for the preliminary domain-specific feature can be improved, so that, combined with the training process based on the mutual information loss, the extraction accuracy of the preliminary domain-invariant feature is indirectly improved.
步骤1103:将初步域不变特征与第一特征融合,得到第二特征。Step 1103: The preliminary domain invariant feature is merged with the first feature to obtain the second feature.
将初步域不变特征与第一特征融合得到第二特征F 1的融合过程可通过公式(5)表示，即以初步域不变特征和第一特征作为融合操作的输入，输出第二特征F 1。The fusion process in which the preliminary domain-invariant feature is fused with the first feature to obtain the second feature F 1 can be expressed by formula (5): the preliminary domain-invariant feature and the first feature are taken as the inputs of the fusion operation, and the second feature F 1 is output.
应当理解，特征融合的具体方式可根据实际应用场景的需求进行选择。例如，可以在保持通道数不变的基础上，将初步域不变特征与第一特征进行叠加，以形成通道数不变的第二特征；也可将初步域不变特征与第一特征以连接的方式“拼接”，形成通道数增加的第二特征。本申请对该融合过程的具体实现方式并不做严格限定。It should be understood that the specific manner of feature fusion can be selected according to the requirements of the actual application scenario. For example, the preliminary domain-invariant feature can be superimposed on the first feature while keeping the number of channels unchanged, so as to form a second feature with an unchanged number of channels; alternatively, the preliminary domain-invariant feature and the first feature can be "spliced" by concatenation to form a second feature with an increased number of channels. This application does not strictly limit the specific implementation of the fusion process.
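Both fusion options described above can be written in one line each; the feature shapes below are placeholders used only to make the channel counts concrete.

import torch

first_feat = torch.randn(2, 256, 32, 32)    # first feature (placeholder shape)
prelim_dir = torch.randn(2, 256, 32, 32)    # preliminary domain-invariant feature

fused_add = first_feat + prelim_dir                        # superposition: channel count unchanged (256)
fused_cat = torch.cat([first_feat, prelim_dir], dim=1)     # concatenation: channel count increased (512)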
由于初步域不变特征中包括了与实例对应的域不变特征信息，因此将该初步域不变特征与第一特征融合，实现了在第一特征层面的域不变特征的数据增强，从而使得第一特征中能够包括更多的域不变特征信息，因而使得训练完毕的神经网络能够更好地适应实际应用场景中的领域差异。Since the preliminary domain-invariant feature contains the domain-invariant feature information corresponding to the instance, fusing the preliminary domain-invariant feature with the first feature realizes data enhancement of the domain-invariant feature at the first-feature level, so that the first feature can contain more domain-invariant feature information, which in turn enables the trained neural network to better adapt to domain differences in actual application scenarios.
步骤1104:从第二特征中提取训练数据的第三特征。Step 1104: Extract the third feature of the training data from the second feature.
如图10所示，神经网络中还包括另一特征提取器，该特征提取器用于具体完成从第二特征F 1中提取第三特征，第三特征则作为后续域不变特征和域特定特征的提取基础。应当理解，第三特征中的限定词“第三”意味着第三特征是基于包括了第一特征的第二特征提取的，同时这种提取过程也会更为精细，例如当训练数据是图像数据时，第三特征则可能是提取的表征图像语义层面的特征图。该特征提取过程可通过公式(6)表示，即以第二特征F 1作为该特征提取器的输入，输出第三特征。As shown in FIG. 10, the neural network further includes another feature extractor, which is used to extract the third feature from the second feature F 1; the third feature then serves as the basis for the subsequent extraction of the domain-invariant feature and the domain-specific feature. It should be understood that the qualifier "third" in the third feature means that the third feature is extracted based on the second feature, which includes the first feature, and that this extraction process is also finer; for example, when the training data is image data, the third feature may be an extracted feature map representing the semantic level of the image. This feature extraction process can be expressed by formula (6): the second feature F 1 is taken as the input of this feature extractor, and the third feature is output.
步骤1105:采用第二解耦器U2从第三特征中提取域不变特征和域特定特征。Step 1105: Use the second decoupler U2 to extract domain-invariant features and domain-specific features from the third feature.
第二解耦器U2中包括域不变特征提取器和域特定特征提取器，分别用于从第三特征中提取域不变特征和域特定特征。域不变特征和域特定特征各自的提取过程可通过公式(7)表示，即分别以第三特征作为输入，由域不变特征提取器输出域不变特征，由域特定特征提取器输出域特定特征。The second decoupler U2 includes a domain-invariant feature extractor and a domain-specific feature extractor, which are respectively used to extract the domain-invariant feature and the domain-specific feature from the third feature. The respective extraction processes of the domain-invariant feature and the domain-specific feature can be expressed by formula (7): each takes the third feature as input, with the domain-invariant feature extractor outputting the domain-invariant feature and the domain-specific feature extractor outputting the domain-specific feature.
如图10所示，在获得了域不变特征后，便可使用域不变特征执行任务以得到任务损失(task loss)，并计算域不变特征和域特定特征之间的互信息(mutual information,MI)损失。如前所述，将域不变特征用于执行任务并获得任务损失，可提高域不变特征对于与任务相关的实例表征的准确性和完整性。同时，为了保证域不变特征能够更精准地与实例对应，在训练的过程中还可以计算域不变特征和域特定特征之间的互信息损失，并使用该互信息损失来进一步提高域不变特征提取的精确性。在本申请一实施例中，在该基于任务损失和互信息损失对于神经网络的训练过程中，第一解耦器U1中用于提取初步域不变特征的域不变特征提取器和/或用于提取初步域特定特征的域特定特征提取器也可参与该训练过程中的调参过程，以此来保证第一解耦器U1对于初步域不变特征的提取精度，从而进一步改进第一解耦器U1所实现的域不变特征的数据增强效果。As shown in FIG. 10, after the domain-invariant feature is obtained, it can be used to perform the task to obtain a task loss, and the mutual information (MI) loss between the domain-invariant feature and the domain-specific feature can be calculated. As mentioned above, using the domain-invariant feature to perform the task and obtain the task loss can improve the accuracy and completeness with which the domain-invariant feature represents the task-related instances. At the same time, in order to ensure that the domain-invariant feature corresponds to the instances more accurately, the mutual information loss between the domain-invariant feature and the domain-specific feature can also be calculated during training and used to further improve the accuracy of domain-invariant feature extraction. In an embodiment of the present application, during this training process of the neural network based on the task loss and the mutual information loss, the domain-invariant feature extractor used in the first decoupler U1 to extract the preliminary domain-invariant feature and/or the domain-specific feature extractor used in the first decoupler U1 to extract the preliminary domain-specific feature can also participate in the parameter tuning of this training process, so as to ensure the extraction accuracy of the first decoupler U1 for the preliminary domain-invariant feature, thereby further improving the data enhancement effect of the domain-invariant feature realized by the first decoupler U1.
在本申请一实施例中，如图10所示，为了进一步提高域不变特征的提取精度，在第二解耦器U2中也可使用域分类器和梯度反转层。通过域分类器和域特定特征提取器之间的对抗策略，可以提高域特定特征提取器对于域特定特征的提取精度，从而结合基于互信息损失的训练过程来间接达到提高域不变特征的提取精度的目的。In an embodiment of the present application, as shown in FIG. 10, in order to further improve the extraction accuracy of the domain-invariant feature, a domain classifier and a gradient reversal layer can also be used in the second decoupler U2. Through the adversarial strategy between the domain classifier and the domain-specific feature extractor, the extraction accuracy of the domain-specific feature extractor for the domain-specific feature can be improved, so that, combined with the training process based on the mutual information loss, the extraction accuracy of the domain-invariant feature is indirectly improved.
在本申请一实施例中，为了进一步促使解耦出的域不变特征和域特定特征能够包含训练数据的全部特征信息，以提高特征解耦的完整性和合理性，可训练神经网络，以减小第三特征所包含的信息与域不变特征和域特定特征共同包含的信息之间的差异。具体而言，如图10所示，域不变特征和域特定特征在被提取出来后，可使用域不变特征和域特定特征对第三特征进行重建，得到重建特征，然后比较第三特征和该重建特征，以确定第三特征所包含的信息与域不变特征和域特定特征共同包含的信息之间的差异，即重建损失(reconstruction loss)。在本申请一实施例中，重建损失的计算过程可通过公式(8)表示：以域不变特征和域特定特征作为重建网络R的输入得到重建之后的特征F r，重建损失L recon体现为重建特征F r与第三特征的L2距离。该重建损失被用来训练神经网络，以使得域不变特征和域特定特征能够更好地覆盖训练数据的特征信息。In an embodiment of the present application, in order to further ensure that the decoupled domain-invariant feature and domain-specific feature contain all the feature information of the training data, thereby improving the completeness and rationality of the feature decoupling, the neural network can be trained to reduce the difference between the information contained in the third feature and the information jointly contained in the domain-invariant feature and the domain-specific feature. Specifically, as shown in FIG. 10, after the domain-invariant feature and the domain-specific feature are extracted, the third feature can be reconstructed from the domain-invariant feature and the domain-specific feature to obtain a reconstructed feature, and the third feature is then compared with the reconstructed feature to determine the difference between the information contained in the third feature and the information jointly contained in the domain-invariant feature and the domain-specific feature, namely the reconstruction loss. In an embodiment of the present application, the calculation of the reconstruction loss can be expressed by formula (8): the domain-invariant feature and the domain-specific feature are taken as the inputs of a reconstruction network R to obtain the reconstructed feature F r, and the reconstruction loss L recon is embodied as the L2 distance between the reconstructed feature F r and the third feature. The reconstruction loss is used to train the neural network, so that the domain-invariant feature and the domain-specific feature can better cover the feature information of the training data.
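A minimal sketch of formula (8) in PyTorch is given below; the reconstruction network R is reduced to a single 1x1 convolution, and the use of a mean-squared (L2-style) error as well as all channel sizes are assumptions made only for the illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

R = nn.Conv2d(512, 256, kernel_size=1)      # hypothetical reconstruction network R

third_feat = torch.randn(2, 256, 16, 16)    # third feature (placeholder)
dir_feat = torch.randn(2, 256, 16, 16)      # domain-invariant feature
dsr_feat = torch.randn(2, 256, 16, 16)      # domain-specific feature

f_r = R(torch.cat([dir_feat, dsr_feat], dim=1))   # reconstructed feature F_r
recon_loss = F.mse_loss(f_r, third_feat)          # L2-style distance between F_r and the third feature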
It should be understood that although qualifiers such as "first", "second" and "third" are used in the foregoing description, these qualifiers are only used to set out the technical solution more clearly and to distinguish similar concepts; they cannot in themselves be used to limit the scope of protection of this application.
It can thus be seen that the embodiments of this application may adopt the idea of "two-layer domain-invariant feature decoupling" to train a neural network to extract domain-invariant features. A first feature is obtained, and a preliminary domain-invariant feature is decoupled from it by the first decoupler U1; the preliminary domain-invariant feature is then fused with the first feature to obtain a second feature, so that the domain-invariant feature information is enhanced at the level of the first feature. The second feature is then used to decouple the domain-invariant feature by means of the second decoupler U2, which further improves the decoupling accuracy of the domain-invariant feature, so that the trained neural network has stronger task performance and better domain adaptation capability.
The training process related to the neural network has been described in detail above with reference to Figure 10 and Figure 11. As can be seen from that description, in some implementations the training process of the neural network may include: (1) training related to the task loss and the domain classification loss; (2) training related to the mutual information loss; and (3) training related to the reconstruction loss.
The above three kinds of training may be carried out simultaneously or in stages; the embodiments of this application do not limit this. The training order of the three kinds of training is illustrated below with reference to Figure 12.
As shown in Figure 12, the training process of the neural network can be divided into the following three stages in sequence.
First stage: the neural network is controlled to perform the training related to the task loss and the domain classification loss. This stage aims to let the neural network learn the ability to decompose domain-invariant features and domain-specific features from the training data; the first stage may therefore also be called the feature decomposition stage (stage-fd for short, where fd stands for feature decomposition).
Second stage: the neural network is controlled to perform the training related to the mutual information loss. This stage aims to let the neural network learn the ability to enlarge the difference between domain-invariant features and domain-specific features; the second stage may therefore also be called the feature separation stage (stage-fs for short, where fs stands for feature separation). In the second stage, the parameters of one sub-network shown in Figure 10 may be fixed, or the parameters of two sub-networks shown in Figure 10 may be fixed.
Third stage: the neural network is controlled to perform the training related to the reconstruction loss. This stage aims to make the domain-invariant features and domain-specific features decomposed by the neural network contain as much as possible of the information in the initial features; the third stage may therefore also be called the feature reconstruction stage (stage-fr for short, where fr stands for feature reconstruction).
Carrying out the training of the neural network in stages reduces the amount of training in each stage and speeds up the convergence of the parameters of the neural network.
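A schematic outline of such a staged schedule is sketched below. The loop structure, the per-stage loss callables and the epoch counts are assumptions made for illustration; the losses of each stage are those described above for stage-fd, stage-fs and stage-fr.

```python
def train_in_stages(model, loader, optimizer, stage_losses,
                    epochs_per_stage=(1, 1, 1)):
    """Run the three stages in sequence.

    stage_losses: dict mapping 'fd', 'fs' and 'fr' to a callable
    (model, batch) -> scalar loss implementing, respectively, the task +
    domain classification losses, the mutual information loss and the
    reconstruction loss."""
    for stage, n_epochs in zip(("fd", "fs", "fr"), epochs_per_stage):
        # In stage 'fs', some sub-networks may additionally be frozen
        # (e.g. via requires_grad_(False)), as described above for Figure 10.
        for _ in range(n_epochs):
            for batch in loader:
                loss = stage_losses[stage](model, batch)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
```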
Figure 13 is a schematic structural diagram of a neural network provided by an embodiment of this application. The neural network is trained using the training method provided by the above embodiments of this application. As shown in Figure 13, the neural network 130 includes:
a first feature extraction layer 1301, configured to extract a first feature based on the input data;
a first domain-invariant feature decoupling layer 1302, configured to extract a first domain-invariant feature based on the first feature;
a feature fusion layer 1303, configured to fuse the first feature and the first domain-invariant feature to obtain a second feature;
a second feature extraction layer 1304, configured to extract a third feature based on the second feature;
a second domain-invariant feature decoupling layer 1305, configured to extract a second domain-invariant feature based on the third feature.
The first domain-invariant feature and the second domain-invariant feature are each features that are independent of the domain to which the input data belongs, while the first domain-specific feature and the second domain-specific feature are each features that characterize the domain to which the input data belongs.
It can thus be seen that although, in the training process shown in Figure 10 and Figure 11, domain-specific features are extracted in order to compute the mutual information loss and the domain classification loss, which gives the neural network the ability to decompose domain-invariant features and domain-specific features, the trained neural network shown in Figure 13 does not actually need to extract domain-specific features in actual use. After the first feature extraction layer 1301 extracts the first feature, the first domain-invariant feature is extracted based on the first feature and fused with the first feature to achieve domain-invariant feature enhancement; the second domain-invariant feature is then further extracted based on the second feature. The extracted second domain-invariant feature can accurately correspond to the instances, so that the neural network performs better when executing specific tasks and has better domain adaptation capability.
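The following sketch illustrates what the inference-time forward pass of a network organized as in Figure 13 might look like. It assumes PyTorch modules as stand-ins for layers 1301 to 1305 and element-wise addition as the fusion operation of layer 1303; the actual internal architectures and fusion operation are not restated here.

```python
import torch
import torch.nn as nn

class DomainInvariantNet(nn.Module):
    """Inference-time structure sketched after Figure 13; the five layers are
    stand-in modules whose internal architectures are not specified here."""
    def __init__(self, extractor1: nn.Module, di_decoupler1: nn.Module,
                 extractor2: nn.Module, di_decoupler2: nn.Module):
        super().__init__()
        self.first_feature_extraction = extractor1     # layer 1301
        self.first_di_decoupling = di_decoupler1       # layer 1302
        self.second_feature_extraction = extractor2    # layer 1304
        self.second_di_decoupling = di_decoupler2      # layer 1305

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f1 = self.first_feature_extraction(x)          # first feature
        di1 = self.first_di_decoupling(f1)             # first domain-invariant feature
        f2 = f1 + di1                                   # layer 1303: fusion (element-wise addition assumed)
        f3 = self.second_feature_extraction(f2)        # third feature
        return self.second_di_decoupling(f3)           # second domain-invariant feature
```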
In an embodiment of this application, it is considered that in domain-adaptive learning scenarios, conventional training data usually comes from the source domain and/or the target domain, whereas what domain-adaptive learning actually addresses is the cross-domain transfer capability of the neural network. To improve the domain generalization capability of the neural network, training must be based not only on the feature information of the source domain but also on the feature information of the target domain. Therefore, when training the neural network, training data of an intermediate domain lying between the source domain and the target domain may be added to the training data. By generating training data located in the intermediate domain, the "domain gap" between the source domain and the target domain is filled, alleviating the problem of a large distribution difference between the training data of the source domain and that of the target domain.
Figure 14 is a schematic flowchart of obtaining data of the intermediate domain according to an embodiment of this application. Figure 15 is a schematic diagram of the principle of obtaining data of the intermediate domain according to an embodiment of this application. As shown in Figure 14 and Figure 15, the process of obtaining data of the intermediate domain may include the following steps.
Step 1401: obtain data of the source domain and/or data of the target domain.
The source domain and the target domain are two domains whose data characteristics differ, and the difference in data characteristics between the intermediate domain and either of the source domain and the target domain is smaller than the difference in data characteristics between the source domain and the target domain. The data of the intermediate domain is generated by adding perturbations on the basis of the data of the source domain and/or the target domain, so the data of the source domain and/or the target domain must be obtained first.
Step 1402: input the data of the source domain and/or the data of the target domain into a neural network for training, so as to obtain gradient information of the loss function.
Since the data to be generated lies in the intermediate domain between the source domain and the target domain, the gradient information of the loss function needs to be obtained to guide the subsequent perturbation process that generates the intermediate-domain data.
Step 1403: perturb the data of the source domain and/or the data of the target domain according to the gradient information to obtain data of the intermediate domain.
New data is generated by perturbing the data of the source domain or the data of the target domain, and this newly generated data can be used as the data of the intermediate domain.
In this application, the introduction of directional information between the source domain and the target domain makes the perturbation of the data more targeted. The intermediate-domain data obtained through the perturbation can fill the "domain gap" between the source domain and the target domain and alleviate the problem of a large distribution difference between the data of the source domain and that of the target domain. In an embodiment of this application, the data of the source domain, the data of the target domain and the data of the intermediate domain may be used as training data to train the neural network, so that the trained neural network has better domain adaptability.
In an embodiment of this application, as shown in Figure 15, the labeled data X_s of the source domain may be input into the neural network TNet for training so as to obtain gradient information of the loss function.
Specifically, the neural network TNet is generated by training on the labeled data X_l of the target domain and may include a feature extractor F_T and a classifier C_T. During training, the feature information P_T extracted by the feature extractor F_T is input into the classifier C_T to obtain the cross-entropy loss L_ce of the classification task, which guides the parameter tuning of TNet. Since TNet is obtained by computing the task loss on the input X_l and adjusting the network parameters accordingly, it is in fact better suited to the target domain; inputting X_s into TNet therefore produces first gradient information pointing from the source domain to the target domain. At this point, X_s is treated as an optimizable object: according to the first gradient information back-propagated from the task loss, a gradient perturbation of a certain magnitude is superimposed on X_s, and the new samples obtained after superimposing this source-to-target perturbation can be used as intermediate-domain data, shown as AAT in Figure 15.
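A minimal sketch of this source-to-target perturbation is given below. It assumes a PyTorch classifier for TNet and a sign-based gradient step; the application only states that a gradient perturbation of a certain magnitude is superimposed, so the step form and the step size are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def aat_samples(tnet, x_s, y_s, step_size=0.01):
    """Source-to-target perturbation (AAT in Figure 15), as a sketch.
    tnet maps an input batch to classification logits and is assumed to have been
    trained on labeled target-domain data; x_s, y_s are labeled source-domain samples."""
    x = x_s.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(tnet(x), y_s)       # task loss of the source samples under TNet
    loss.backward()                             # first gradient information (source -> target)
    with torch.no_grad():
        # Superimpose a gradient perturbation of a certain magnitude on X_s.
        # The sign()-based ascent step follows the usual adversarial-example
        # convention and is an assumption.
        x_mid = x + step_size * x.grad.sign()
    return x_mid.detach()                       # candidate intermediate-domain data
```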
In this application, the neural network TNet is generated by training on the labeled data of the target domain. Therefore, the back-propagated first gradient information obtained after inputting the labeled data of the source domain into this neural network is a good measure of the direction from the source domain to the target domain.
In another embodiment of this application, as shown in Figure 15, the unlabeled data X_u of the target domain may be input into the neural network HNet. Since X_u carries no labels, virtual adversarial training may be used to obtain the gradient information.
Specifically, the neural network HNet may be generated by training on the labeled data X_s of the source domain. Similar to the architecture of TNet, HNet may include a feature extractor F_H and a classifier C_H. During training, the feature information P_H extracted by the feature extractor F_H is input into the classifier C_H to obtain the cross-entropy loss L_ce of the classification task, which guides the parameter tuning of HNet: X_s is input into HNet to compute the task loss, and the network parameters of HNet are updated according to the task loss. In a further embodiment, the labeled data X_l of the target domain may also be used, together with the labeled data X_s of the source domain, to train the neural network HNet, so as to further improve the accuracy with which HNet performs the task.
After the unlabeled target-domain data X_u is input into HNet, the virtual adversarial training method is used to generate predicted virtual labels, the task loss is computed based on the virtual labels, and a gradient perturbation of a certain magnitude is produced on X_u according to the second gradient information back-propagated from the task loss. The new samples obtained after superimposing this target-to-source perturbation can be used as intermediate-domain data, shown as E-VAT in Figure 15.
In this application, the neural network HNet is generated by training on the labeled data of the source domain and the labeled data of the target domain. Therefore, the back-propagated second gradient information obtained through virtual adversarial training after inputting the unlabeled data of the target domain into this neural network is a good measure of the direction from the target domain to the source domain.
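A corresponding sketch for the target-to-source perturbation is given below. The hard virtual labels and the sign-based step follow the common virtual adversarial training recipe and are assumptions; the application does not restate these details here.

```python
import torch
import torch.nn.functional as F

def evat_samples(hnet, x_u, step_size=0.01):
    """Target-to-source perturbation (E-VAT in Figure 15), as a sketch.
    hnet maps an input batch to classification logits and is assumed to have been
    trained on labeled source-domain (and optionally target-domain) data;
    x_u are unlabeled target-domain samples."""
    with torch.no_grad():
        virtual_labels = hnet(x_u).argmax(dim=1)      # predicted virtual labels
    x = x_u.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(hnet(x), virtual_labels)   # task loss against the virtual labels
    loss.backward()                                    # second gradient information (target -> source)
    with torch.no_grad():
        x_mid = x + step_size * x.grad.sign()         # superimpose a gradient perturbation on X_u
    return x_mid.detach()                              # candidate intermediate-domain data
```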
In another embodiment of this application, when both the data of the source domain and the data of the target domain carry labels, the labeled data X_l of the target domain may also be input into an auxiliary neural network to obtain gradient information of the loss function.
The auxiliary neural network is generated by training on the labeled data X_s of the source domain. Since the auxiliary neural network is obtained by computing the task loss on the input X_s and adjusting the network parameters accordingly, it is in fact better suited to the source domain, and inputting X_l into it produces gradient information pointing from the target domain to the source domain. At this point, X_l is treated as an optimizable object: according to the gradient information back-propagated from the task loss, a gradient perturbation of a certain magnitude is superimposed on X_l, and the new samples obtained after superimposing this target-to-source perturbation can also be used as intermediate-domain data.
It can thus be seen that the embodiment shown in Figure 15 in fact proposes a "bidirectional adversarial training" scheme for generating intermediate-domain data: the gradient information of a network is used to guide the perturbation direction of the samples, and the samples generated after superimposing the perturbation are used as intermediate-domain data. For example, in Figure 16, circles and triangles represent different sample categories. Gradient information can be used to obtain the perturbation direction from the source domain to the target domain (the left-to-right arrows in Figure 16), and perturbations are added to the source-domain data to generate intermediate-domain data; at the same time, gradient information can also be used to obtain the perturbation direction from the target domain to the source domain (the right-to-left arrows in Figure 16), and perturbations are then added to the target-domain data. Specifically, an auxiliary network obtained through training can give the gradient direction from the source domain to the target domain or from the target domain to the source domain, and this gradient direction is used to perturb the source-domain data or the target-domain data to generate adversarial samples; virtual adversarial training can also be used to generate adversarial samples from the target domain towards the source domain. In this way, adversarial samples are generated bidirectionally in the "domain gap" between the source domain and the target domain, constructing the intermediate domain.
It should be understood, however, that depending on the domain-adaptive learning scenario, it is also possible to obtain only data superimposed with the source-to-target perturbation as the intermediate-domain data, or only data superimposed with the target-to-source perturbation as the intermediate-domain data. For example, in an unsupervised learning scenario the data of the target domain carries no labels, so the neural network TNet cannot be trained on labeled target-domain data X_l; in this case only data superimposed with the target-to-source perturbation is obtained as the intermediate-domain data.
In an embodiment of this application, the obtained intermediate-domain data, together with the source-domain data and the target-domain data, can be input into the neural network shown in Figure 9 and trained with the neural network training method provided by the embodiments of this application, thereby combining "bidirectional adversarial training" with "two-layer domain-invariant feature decoupling". Since the data used for feature decoupling includes the intermediate-domain data, the source-domain data and the target-domain data are effectively supplemented and the difference between the source domain and the target domain is reduced; using the intermediate-domain data as training data for the feature decoupling training substantially improves the domain-invariant feature decoupling capability, so that the domain generalization performance and cross-domain transfer capability of the trained neural network are improved even more significantly.
In an embodiment of this application, in order to further improve the robustness of the trained neural network when performing its task, as shown in Figure 15, after the neural network HNet is generated by training on the labeled source-domain data X_s, random noise perturbations may also be generated in the vicinity of X_s and superimposed on X_s to generate adversarial samples within the neighborhood. These neighborhood adversarial samples are also input into the neural network for training as part of the training data. In an embodiment of this application, the adversarial samples in the neighborhood may be input into HNet; the feature maps extracted from them by the feature extractor F_H in HNet are input into the classifier C_H to obtain the cross-entropy loss L_at of the classification task, which guides the adjustment of HNet's network parameters, so that HNet is further trained. In a further embodiment, when the labeled target-domain data X_l also participates in the training of HNet, random noise perturbations may likewise be generated in the vicinity of X_l and superimposed on X_l to supplement the adversarial samples in the neighborhood.
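A sketch of this neighborhood-sample training step is shown below; the Gaussian noise model, its magnitude and the optimizer interface are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def neighborhood_step(hnet, optimizer, x, y, noise_std=0.05):
    """One training step of HNet on neighborhood samples, as a sketch.
    Random noise is superimposed on the labeled samples x (X_s, and optionally X_l)
    to form samples in their neighborhood; the cross-entropy loss L_at on these
    samples then guides the adjustment of HNet's parameters."""
    x_neigh = x + noise_std * torch.randn_like(x)   # random noise perturbation near x
    loss_at = F.cross_entropy(hnet(x_neigh), y)     # L_at
    optimizer.zero_grad()
    loss_at.backward()
    optimizer.step()
    return loss_at.item()
```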
It can thus be seen that the embodiments of this application can also generate adversarial samples in the neighborhood based on the data of the source domain and the target domain, so as to effectively supplement the source-domain and target-domain data and reduce the difference between the source domain and the target domain, further improving the domain generalization performance and cross-domain transfer capability of the trained neural network.
Figure 17 is a schematic structural diagram of a data processing system provided by an embodiment of this application. As shown in Figure 17, the data processing system 170 is used to train a neural network and includes a data acquisition network 1701 and a feature decoupling network 1702.
The data acquisition network 1701 is configured to obtain gradient information of the loss function based on first data, and to perturb the input data according to the gradient information to obtain second data. By obtaining adversarial samples that fill the "domain gap" of the first data as the new second data, the training process can achieve better domain adaptability.
The feature decoupling network 1702 is configured to train the neural network with training data that includes the second data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data.
In an embodiment of this application, the feature decoupling network 1702 includes: a first feature extraction layer 17021, configured to extract a first feature based on the training data; a first domain-invariant feature extraction layer 17022, configured to extract a first domain-invariant feature based on the first feature; a first domain-specific feature extraction layer 17023, configured to extract a first domain-specific feature based on the first feature; a first mutual information loss acquisition layer 17024, configured to obtain a first mutual information loss based on the first domain-invariant feature and the first domain-specific feature; a feature fusion layer 17025, configured to fuse the first feature and the first domain-invariant feature to obtain a second feature; a second feature extraction layer 17026, configured to extract a third feature based on the second feature; a second domain-invariant feature decoupling layer 17027, configured to extract a second domain-invariant feature based on the third feature; a second domain-specific feature extraction layer 17028, configured to extract a second domain-specific feature based on the third feature; a second mutual information loss acquisition layer 17029, configured to obtain a second mutual information loss based on the second domain-invariant feature and the second domain-specific feature; and a task loss acquisition layer 17030, configured to perform a task using the second domain-invariant feature to obtain a task loss.
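The following sketch shows how one forward pass through these layers could produce the two mutual information losses and the task loss. The dictionary-based wiring, the module names and the callable loss functions are assumptions made for illustration; the exact forms of the mutual information loss and the task loss are those defined elsewhere in this application.

```python
def decoupling_losses(layers, x, y, mutual_info_loss, task_loss):
    """One forward pass through the feature decoupling network 1702, as a sketch.
    `layers` is a dict of stand-in modules keyed by short names for layers
    17021-17030; `mutual_info_loss` and `task_loss` are callables supplied by
    the caller."""
    f1 = layers["feat1"](x)             # 17021: first feature
    di1 = layers["di1"](f1)             # 17022: first domain-invariant feature
    ds1 = layers["ds1"](f1)             # 17023: first domain-specific feature
    l_mi1 = mutual_info_loss(di1, ds1)  # 17024: first mutual information loss
    f2 = layers["fuse"](f1, di1)        # 17025: second feature (fusion)
    f3 = layers["feat2"](f2)            # 17026: third feature
    di2 = layers["di2"](f3)             # 17027: second domain-invariant feature
    ds2 = layers["ds2"](f3)             # 17028: second domain-specific feature
    l_mi2 = mutual_info_loss(di2, ds2)  # 17029: second mutual information loss
    l_task = task_loss(di2, y)          # 17030: task loss from the second domain-invariant feature
    return l_task, l_mi1, l_mi2
```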
In an embodiment of this application, the data processing system 170 may further include: a first domain classifier 17031, configured to perform a classification task based on the first domain-specific feature to obtain a first classification loss; and a first gradient reversal layer 17032, configured to invert the gradient information of the first classification loss;
and/or, the data processing system 170 may further include: a second domain classifier 17033, configured to perform a classification task based on the second domain-specific feature to obtain a second classification loss; and a second gradient reversal layer 17034, configured to invert the gradient information of the second classification loss.
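For illustration, a standard gradient reversal layer of the kind described here can be sketched as follows; a PyTorch autograd function is assumed, and the scaling factor lambd is an assumed hyper-parameter rather than a value specified in this application.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Standard gradient reversal: identity in the forward pass, negated
    (and optionally scaled) gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd=1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    # Placed between the domain-specific feature and the domain classifier, so that
    # the gradient of the domain classification loss is inverted before it reaches
    # the preceding feature extraction layers.
    return GradReverse.apply(x, lambd)
```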
In an embodiment of this application, the data processing system 170 may further include a reconstruction loss acquisition layer 17035, configured to reconstruct the third feature using the second domain-invariant feature and the second domain-specific feature to obtain a reconstructed feature, and to compare the third feature with the reconstructed feature to obtain a reconstruction loss.
In an embodiment of this application, the first data includes data of the source domain and/or data of the target domain. The data acquisition network 1701 includes: a first training network generated by training on labeled data of the target domain; and/or a second training network generated by training on labeled data. In an embodiment of this application, the first training network or the second training network may include a feature extractor and a classifier. During training, the feature information extracted by the feature extractor is input into the classifier to obtain the cross-entropy loss of the classification task, which guides the parameter tuning of the first training network or the second training network.
The specific functions and operations of the modules of the above data processing system 170 have already been described in detail in the neural network training method above, so their repeated description is omitted here.
It can thus be seen that the data processing system 170 shown in Figure 17 combines "adversarial training to fill the domain gap" with "two-layer domain-invariant feature decoupling". Since the training data used for feature decoupling includes data that can fill the domain gap of the first data, the original training data is effectively supplemented and the difference between training data of different domains is reduced; performing the feature decoupling training on the data output by the data acquisition network substantially improves the domain-invariant feature decoupling capability, so that the domain generalization performance and cross-domain transfer capability of the trained neural network are improved even more significantly.
Figure 18 is a schematic structural diagram of a neural network training apparatus provided by an embodiment of this application. As shown in Figure 18, the neural network training apparatus 180 includes:
an acquisition module 1801, configured to acquire training data;
a training module 1802, configured to train a neural network using the training data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data.
The neural network training apparatus 180 provided by the embodiments of this application decomposes domain-invariant features and domain-specific features from the training data, and the neural network obtained by the training method of this application performs its task using the domain-invariant features. This avoids the influence of domain-specific features on the neural network and thus improves the transfer performance of the neural network between different domains.
In an embodiment of this application, the training module 1802 is configured to decompose domain-invariant features and domain-specific features from the training data; perform a task using the domain-invariant features to obtain a task loss, and compute a mutual information loss between the domain-invariant features and the domain-specific features, the mutual information loss representing the difference between the domain-invariant features and the domain-specific features; and train the neural network according to the task loss and the mutual information loss.
In an embodiment of this application, the training module 1802 is further configured to perform domain classification using the domain-specific features to obtain a domain classification loss, and to train the neural network according to the task loss, the mutual information loss and the domain classification loss.
In an embodiment of this application, the training module 1802 is further configured to extract initial features from the training data; decompose the initial features into the domain-invariant features and the domain-specific features; and train the neural network to reduce the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
In an embodiment of this application, the training module 1802 is configured to reconstruct the initial features using the domain-invariant features and the domain-specific features to obtain reconstructed features, and to compare the initial features with the reconstructed features to determine the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
In an embodiment of this application, the training module 1802 is further configured to reconstruct the initial features using the domain-invariant features and the domain-specific features to obtain reconstructed features, where the domain-invariant features and the domain-specific features are features decomposed from the initial features; and to compare the initial features with the reconstructed features to obtain a reconstruction loss, the reconstruction loss characterizing the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
The training module is configured to train the neural network in a first stage according to the task loss, in a second stage according to the mutual information loss, and in a third stage according to the reconstruction loss.
In an embodiment of this application, the neural network includes a first decoupler and a second decoupler, and the training module 1802 is configured to extract a first feature of the training data from the training data; extract a preliminary domain-invariant feature and a preliminary domain-specific feature from the first feature using the first decoupler; fuse the preliminary domain-invariant feature with the first feature to obtain a second feature; extract a third feature of the training data from the second feature; and extract the domain-invariant features and the domain-specific features from the third feature using the second decoupler.
In an embodiment of this application, the training module 1802 is further configured to train the neural network to reduce the difference between the information contained in the third feature and the information jointly contained in the domain-invariant features and the domain-specific features.
Figure 19 is a schematic structural diagram of a data acquisition apparatus provided by an embodiment of this application. As shown in Figure 19, the data acquisition apparatus 190 includes:
a data acquisition module 1901, configured to acquire data of the source domain and/or data of the target domain, where the source domain and the target domain are two domains whose data characteristics differ, and the difference in data characteristics between the intermediate domain and either of the source domain and the target domain is smaller than the difference in data characteristics between the source domain and the target domain;
a gradient information acquisition module 1902, configured to input the data of the source domain and/or the data of the target domain into a neural network for training, so as to obtain gradient information of the loss function;
an intermediate-domain data generation module 1903, configured to perturb the data of the source domain and/or the data of the target domain according to the gradient information to obtain data of the intermediate domain.
In an embodiment of this application, the gradient information acquisition module 1902 is configured to input labeled data of the source domain into a first neural network for training to obtain first gradient information, where the first neural network is generated by training on labeled data of the target domain.
In an embodiment of this application, the gradient information acquisition module 1902 is configured to input unlabeled data of the target domain into a second neural network and train it by way of virtual adversarial training to obtain second gradient information, where the second neural network is generated by training on labeled data.
The specific functions and operations of the modules in the above training apparatus 180 and data acquisition apparatus 190 have already been described in detail in the methods described above, so their repeated description is omitted here.
Figure 20 is a schematic diagram of the hardware structure of a neural network training apparatus provided by an embodiment of this application. The neural network training apparatus 2000 shown in Figure 20 (which may specifically be a computer device) includes a memory 2001, a processor 2002, a communication interface 2003 and a bus 2004. The memory 2001, the processor 2002 and the communication interface 2003 are communicatively connected to one another through the bus 2004.
The memory 2001 may be a read-only memory (ROM), a static storage device, a dynamic storage device or a random access memory (RAM). The memory 2001 may store a program; when the program stored in the memory 2001 is executed by the processor 2002, the processor 2002 and the communication interface 2003 are used to execute the steps of the neural network training method of the embodiments of this application.
The processor 2002 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU) or one or more integrated circuits, and is used to execute related programs so as to implement the functions to be performed by the units in the neural network training apparatus of the embodiments of this application, or to execute the neural network training method of the method embodiments of this application.
The processor 2002 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the neural network training method of this application may be completed by integrated logic circuits of hardware in the processor 2002 or by instructions in the form of software. The processor 2002 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory or a register. The storage medium is located in the memory 2001; the processor 2002 reads the information in the memory 2001 and, in combination with its hardware, completes the functions to be performed by the units included in the neural network training apparatus of the embodiments of this application, or executes the neural network training method of the method embodiments of this application.
The communication interface 2003 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 2000 and other devices or a communication network. For example, the training data may be obtained through the communication interface 2003.
The bus 2004 may include a path for transferring information between the components of the apparatus 2000 (for example, the memory 2001, the processor 2002 and the communication interface 2003).
It should be understood that the acquisition module 1801 and the training module 1802 in the neural network training apparatus 180, or the data acquisition module 1901, the gradient information acquisition module 1902, the intermediate-domain data generation module 1903 and the training execution module 1904 in the data acquisition apparatus 190, may correspond to the processor 2002.
It should be noted that although the apparatus 2000 shown in Figure 20 only shows a memory, a processor and a communication interface, those skilled in the art should understand that in a specific implementation the apparatus 2000 also includes other components necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the apparatus 2000 may also include hardware components implementing other additional functions. In addition, those skilled in the art should understand that the apparatus 2000 may also include only the components necessary to implement the embodiments of this application, and need not include all the components shown in Figure 20.
It can be understood that the apparatus 2000 corresponds to the training device 220 in Figure 2. Those of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods for each specific application to implement the described functions, but such implementations should not be considered to go beyond the scope of this application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; for example, the division of the units is only a logical functional division, and there may be other divisions in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk.
The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (33)

1. A neural network training method, characterized in that the method comprises:
    obtaining training data;
    training a neural network using the training data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data;
    wherein the domain-specific features are features that characterize the domain to which the training data belongs, and the domain-invariant features are features that are independent of the domain to which the training data belongs.
2. The method according to claim 1, characterized in that training the neural network using the training data comprises:
    decomposing domain-invariant features and domain-specific features from features of the training data;
    performing a task using the domain-invariant features to obtain a task loss, and computing a mutual information loss between the domain-invariant features and the domain-specific features, wherein the task loss characterizes the gap between the result obtained by performing the task using the domain-invariant features and the task label, and the mutual information loss represents the difference between the domain-invariant features and the domain-specific features;
    training the neural network according to the task loss and the mutual information loss.
3. The method according to claim 2, characterized in that the method further comprises:
    performing domain classification using the domain-specific features to obtain a domain classification loss;
    wherein training the neural network according to the task loss and the mutual information loss comprises:
    training the neural network according to the task loss, the mutual information loss and the domain classification loss.
4. The method according to claim 2 or 3, characterized in that decomposing domain-invariant features and domain-specific features from the features of the training data comprises:
    extracting initial features from the training data;
    decomposing the initial features into the domain-invariant features and the domain-specific features,
    wherein the method further comprises:
    training the neural network to reduce the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
5. The method according to claim 4, characterized in that, before training the neural network to reduce the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features, the method further comprises:
    reconstructing the initial features using the domain-invariant features and the domain-specific features to obtain reconstructed features;
    comparing the initial features with the reconstructed features to determine the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features.
6. The method according to claim 2, characterized in that the method further comprises:
    reconstructing initial features using the domain-invariant features and the domain-specific features to obtain reconstructed features, wherein the domain-invariant features and the domain-specific features are features decomposed from the initial features;
    comparing the initial features with the reconstructed features to obtain a reconstruction loss, the reconstruction loss characterizing the difference between the information contained in the initial features and the information jointly contained in the domain-invariant features and the domain-specific features,
    wherein training the neural network according to the task loss and the mutual information loss comprises:
    training the neural network in a first stage according to the task loss;
    training the neural network in a second stage according to the mutual information loss,
    wherein the method further comprises:
    training the neural network in a third stage according to the reconstruction loss.
7. The method according to claim 2 or 3, characterized in that the neural network comprises a first decoupler and a second decoupler, and decomposing domain-invariant features and domain-specific features from the features of the training data comprises:
    extracting a first feature of the training data from the training data;
    extracting a preliminary domain-invariant feature and a preliminary domain-specific feature from the first feature using the first decoupler;
    fusing the preliminary domain-invariant feature with the first feature to obtain a second feature;
    extracting a third feature of the training data from the second feature;
    extracting the domain-invariant features and the domain-specific features from the third feature using the second decoupler.
8. The method according to claim 7, characterized in that the method further comprises:
    training the neural network to reduce the difference between the information contained in the third feature and the information jointly contained in the domain-invariant features and the domain-specific features.
9. The method according to any one of claims 1 to 8, characterized in that the neural network is used for domain-adaptive learning, and the training data comprises image data of different domains.
  10. A data acquisition method, comprising:
    acquiring data of a source domain and/or data of a target domain;
    inputting the data of the source domain and/or the data of the target domain into a neural network for training, so as to obtain gradient information of a loss function;
    perturbing the data of the source domain and/or the data of the target domain according to the gradient information to obtain data of an intermediate domain;
    wherein the source domain and the target domain are two domains whose data features differ, and the difference in data features between the intermediate domain and either of the source domain and the target domain is smaller than the difference in data features between the source domain and the target domain.
  11. The method according to claim 10, wherein the inputting of the data of the source domain and/or the data of the target domain into a neural network for training to obtain gradient information of a loss function comprises:
    inputting labeled data of the source domain into a first neural network for training to obtain first gradient information, wherein the first neural network is generated by training on labeled data of the target domain.
  12. The method according to claim 10, wherein the inputting of the data of the source domain and/or the data of the target domain into a neural network for training to obtain gradient information of a loss function comprises:
    inputting unlabeled data of the target domain into a second neural network and training it by means of virtual adversarial training to obtain second gradient information, wherein the second neural network is generated by training on labeled data.
  13. A neural network training apparatus, comprising:
    an acquisition module configured to acquire training data;
    a training module configured to train a neural network by using the training data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data;
    wherein a domain-specific feature is a feature that characterizes the domain to which the training data belongs, and a domain-invariant feature is a feature unrelated to the domain to which the training data belongs.
  14. The apparatus according to claim 13, wherein the training module is configured to: decompose a domain-invariant feature and a domain-specific feature from features of the training data; perform a task by using the domain-invariant feature to obtain a task loss, and calculate a mutual information loss between the domain-invariant feature and the domain-specific feature, the task loss being used to characterize the gap between the result obtained by performing the task using the domain-invariant feature and a task label, and the mutual information loss being used to represent the difference between the domain-invariant feature and the domain-specific feature; and train the neural network according to the task loss and the mutual information loss.
  15. The apparatus according to claim 14, wherein the training module is further configured to: perform domain classification by using the domain-specific feature to obtain a domain classification loss; and train the neural network according to the task loss, the mutual information loss and the domain classification loss.
  16. The apparatus according to claim 14 or 15, wherein the training module is further configured to: extract an initial feature from the training data; decompose the initial feature into the domain-invariant feature and the domain-specific feature; and train the neural network to reduce the difference between the information contained in the initial feature and the information jointly contained in the domain-invariant feature and the domain-specific feature.
  17. The apparatus according to claim 16, wherein the training module is configured to: reconstruct the initial feature by using the domain-invariant feature and the domain-specific feature to obtain a reconstructed feature; and compare the initial feature with the reconstructed feature to determine the difference between the information contained in the initial feature and the information jointly contained in the domain-invariant feature and the domain-specific feature.
  18. The apparatus according to claim 17, wherein the training module is further configured to: reconstruct an initial feature by using the domain-invariant feature and the domain-specific feature to obtain a reconstructed feature, wherein the domain-invariant feature and the domain-specific feature are features decomposed from the initial feature; and compare the initial feature with the reconstructed feature to obtain a reconstruction loss, the reconstruction loss being used to characterize the difference between the information contained in the initial feature and the information jointly contained in the domain-invariant feature and the domain-specific feature,
    wherein the training module is configured to: perform a first stage of training on the neural network according to the task loss; perform a second stage of training on the neural network according to the mutual information loss; and perform a third stage of training on the neural network according to the reconstruction loss.
  19. The apparatus according to claim 14 or 15, wherein the neural network comprises a first decoupler and a second decoupler, and the training module is configured to: extract a first feature of the training data from the training data; extract a preliminary domain-invariant feature and a preliminary domain-specific feature from the first feature by using the first decoupler; fuse the preliminary domain-invariant feature with the first feature to obtain a second feature; extract a third feature of the training data from the second feature; and extract the domain-invariant feature and the domain-specific feature from the third feature by using the second decoupler.
  20. The apparatus according to claim 19, wherein the training module is further configured to train the neural network to reduce the difference between the information contained in the third feature and the information jointly contained in the domain-invariant feature and the domain-specific feature.
  21. A data acquisition apparatus, comprising:
    a data acquisition module configured to acquire data of a source domain and/or data of a target domain;
    a gradient information acquisition module configured to input the data of the source domain and/or the data of the target domain into a neural network for training, so as to obtain gradient information of a loss function;
    an intermediate domain data generation module configured to perturb the data of the source domain and/or the data of the target domain according to the gradient information to obtain data of an intermediate domain;
    wherein the source domain and the target domain are two domains whose data features differ, and the difference in data features between the intermediate domain and either of the source domain and the target domain is smaller than the difference in data features between the source domain and the target domain.
  22. The apparatus according to claim 21, wherein the gradient information acquisition module is configured to input labeled data of the source domain into a first neural network for training to obtain first gradient information, wherein the first neural network is generated by training on labeled data of the target domain.
  23. The apparatus according to claim 21, wherein the gradient information acquisition module is configured to input unlabeled data of the target domain into a second neural network and train it by means of virtual adversarial training to obtain second gradient information, wherein the second neural network is generated by training on labeled data.
  24. A neural network training apparatus, comprising:
    a memory for storing a program; and
    a processor for executing the program stored in the memory, wherein when the program stored in the memory is executed, the processor is configured to perform the neural network training method according to any one of claims 1 to 9 or the data acquisition method according to any one of claims 10 to 12.
  25. A neural network, comprising:
    a first feature extraction layer for extracting a first feature based on input data;
    a first domain-invariant feature decoupling layer for extracting a first domain-invariant feature based on the first feature;
    a feature fusion layer for fusing the first feature and the first domain-invariant feature to obtain a second feature;
    a second feature extraction layer for extracting a third feature based on the second feature;
    a second domain-invariant feature decoupling layer for extracting a second domain-invariant feature based on the third feature;
    wherein the first domain-invariant feature and the second domain-invariant feature are each features unrelated to the domain to which the input data belongs, and the first domain-specific feature and the second domain-specific feature are each features that characterize the domain to which the input data belongs.
  26. A data processing system, comprising:
    a data acquisition network for acquiring gradient information of a loss function based on first data, and perturbing the first data according to the gradient information to acquire second data;
    a feature decoupling network for training a neural network by using training data that includes the second data, so that the neural network learns to decompose domain-invariant features and domain-specific features from the training data;
    wherein a domain-specific feature is a feature that characterizes the domain to which the training data belongs, and a domain-invariant feature is a feature unrelated to the domain to which the training data belongs.
  27. The data processing system according to claim 26, wherein the feature decoupling network comprises:
    a first feature extraction layer for extracting a first feature based on the training data;
    a first domain-invariant feature extraction layer for extracting a first domain-invariant feature based on the first feature;
    a first domain-specific feature extraction layer for extracting a first domain-specific feature based on the first feature;
    a first mutual information loss acquisition layer for acquiring a first mutual information loss based on the first domain-invariant feature and the first domain-specific feature;
    a feature fusion layer for fusing the first feature and the first domain-invariant feature to obtain a second feature;
    a second feature extraction layer for extracting a third feature based on the second feature;
    a second domain-invariant feature decoupling layer for extracting a second domain-invariant feature based on the third feature;
    a second domain-specific feature extraction layer for extracting a second domain-specific feature based on the third feature;
    a second mutual information loss acquisition layer for acquiring a second mutual information loss based on the second domain-invariant feature and the second domain-specific feature;
    a task loss acquisition layer for performing a task using the second domain-invariant feature to acquire a task loss.
  28. The data processing system according to claim 27, further comprising:
    a first domain classifier for performing a classification task based on the first domain-specific feature to acquire a first classification loss;
    a first gradient reversal layer for reversing the gradient information of the first classification loss;
    and/or,
    a second domain classifier for performing a classification task based on the second domain-specific feature to acquire a second classification loss;
    a second gradient reversal layer for reversing the gradient information of the second classification loss.
  29. The data processing system according to claim 27 or 28, further comprising:
    a reconstruction loss acquisition layer for reconstructing the third feature by using the second domain-invariant feature and the second domain-specific feature to obtain a reconstructed feature, and comparing the third feature with the reconstructed feature to acquire a reconstruction loss.
  30. The data processing system according to any one of claims 26 to 29, wherein the first data comprises data of a source domain and/or data of a target domain, and the data acquisition network comprises:
    a first training network generated by training on labeled data of the target domain;
    and/or,
    a second training network generated by training on labeled data.
  31. A security device, comprising the neural network according to claim 25.
  32. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 12.
  33. A computer program product containing instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 12.
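
A minimal sketch (not the applicant's reference implementation) of the loss structure described in claims 14 to 18 and 27 to 29: an initial feature is decomposed into a domain-invariant part and a domain-specific part, the invariant part is supervised by a task loss, the two parts are pushed apart by a mutual-information term, the specific part feeds a domain classifier through a gradient reversal layer, and a decoder checks that the two parts together still reconstruct the initial feature. Module names, layer sizes, the joint single-step optimization and the cross-covariance surrogate for the mutual information loss are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT, LATENT, CLASSES, DOMAINS = 256, 64, 10, 2

class Decoupler(nn.Module):
    """Splits an initial feature into a domain-invariant part and a domain-specific part."""
    def __init__(self):
        super().__init__()
        self.inv = nn.Sequential(nn.Linear(FEAT, LATENT), nn.ReLU(), nn.Linear(LATENT, LATENT))
        self.spec = nn.Sequential(nn.Linear(FEAT, LATENT), nn.ReLU(), nn.Linear(LATENT, LATENT))
    def forward(self, f):
        return self.inv(f), self.spec(f)

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad

decoupler = Decoupler()
task_head = nn.Linear(LATENT, CLASSES)      # task is performed on the invariant part only
domain_head = nn.Linear(LATENT, DOMAINS)    # domain classification on the specific part only
decoder = nn.Linear(2 * LATENT, FEAT)       # reconstructs the initial feature from both parts

params = list(decoupler.parameters()) + list(task_head.parameters()) \
       + list(domain_head.parameters()) + list(decoder.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

def mi_surrogate(a, b):
    # Stand-in for the mutual information loss: squared cross-covariance between the
    # two parts, which is zero when they are linearly uncorrelated. A learned mutual
    # information estimator could be substituted here.
    a = a - a.mean(dim=0, keepdim=True)
    b = b - b.mean(dim=0, keepdim=True)
    return (a.t() @ b / a.shape[0]).pow(2).mean()

def training_step(initial_feat, task_label, domain_label):
    z_inv, z_spec = decoupler(initial_feat)
    task_loss = F.cross_entropy(task_head(z_inv), task_label)
    mi_loss = mi_surrogate(z_inv, z_spec)
    dom_loss = F.cross_entropy(domain_head(GradReverse.apply(z_spec)), domain_label)
    recon = decoder(torch.cat([z_inv, z_spec], dim=1))
    recon_loss = F.mse_loss(recon, initial_feat)
    # Claims 6 and 18 describe staged training; the losses are simply summed here.
    loss = task_loss + mi_loss + dom_loss + recon_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Example call with random stand-in features:
feats = torch.randn(32, FEAT)
print(training_step(feats, torch.randint(CLASSES, (32,)), torch.randint(DOMAINS, (32,))))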
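
A minimal sketch of the layered structure in claims 7, 19 and 25: two feature extraction stages, each followed by a decoupling stage, with the preliminary domain-invariant feature fused back into the first feature before the second extraction. Layer widths and the concatenation-based fusion are illustrative assumptions.

import torch
import torch.nn as nn

class TwoStageDecouplingNet(nn.Module):
    def __init__(self, in_dim=256, feat=128, latent=64):
        super().__init__()
        self.extract1 = nn.Sequential(nn.Linear(in_dim, feat), nn.ReLU())  # first feature
        self.inv1 = nn.Linear(feat, latent)    # preliminary domain-invariant feature
        self.spec1 = nn.Linear(feat, latent)   # preliminary domain-specific feature
        self.fuse = nn.Linear(feat + latent, feat)                          # second feature
        self.extract2 = nn.Sequential(nn.Linear(feat, feat), nn.ReLU())     # third feature
        self.inv2 = nn.Linear(feat, latent)    # final domain-invariant feature
        self.spec2 = nn.Linear(feat, latent)   # final domain-specific feature

    def forward(self, x):
        f1 = self.extract1(x)
        z_inv1, z_spec1 = self.inv1(f1), self.spec1(f1)
        f2 = self.fuse(torch.cat([f1, z_inv1], dim=1))
        f3 = self.extract2(f2)
        return self.inv2(f3), self.spec2(f3), (z_inv1, z_spec1, f3)

net = TwoStageDecouplingNet()
z_inv, z_spec, _ = net(torch.randn(4, 256))
print(z_inv.shape, z_spec.shape)  # torch.Size([4, 64]) torch.Size([4, 64])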
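
A minimal sketch of the intermediate-domain data generation in claims 10 to 12: samples are passed through a trained network, the gradient of a loss with respect to the inputs is taken, and the inputs are nudged along that gradient so the perturbed samples sit between the two domains. The sign-of-gradient step, the step size eps and the specific virtual-adversarial surrogate are illustrative assumptions, not prescribed by the claims.

import torch
import torch.nn.functional as F

def intermediate_domain_batch(model, x, y=None, eps=0.03):
    """Perturb inputs x along the gradient of the model's loss.

    If labels y are given (labeled source data, claim 11), an ordinary cross-entropy
    loss is used. If y is None (unlabeled target data, claim 12), a simple
    virtual-adversarial-style surrogate is used: the KL divergence between the model's
    prediction on x and on a randomly jittered copy of x.
    """
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    if y is not None:
        loss = F.cross_entropy(logits, y)
    else:
        noisy = x + 1e-2 * torch.randn_like(x)
        loss = F.kl_div(F.log_softmax(model(noisy), dim=1),
                        F.softmax(logits.detach(), dim=1), reduction="batchmean")
    grad = torch.autograd.grad(loss, x)[0]
    # Step along the gradient sign; the perturbed batch plays the role of the
    # intermediate-domain data whose features lie between the two domains.
    return (x + eps * grad.sign()).detach()

The resulting batch can then be added to the training set of the feature decoupling network, as in the data processing system of claim 26.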
PCT/CN2021/096019 2020-06-24 2021-05-26 Neural network training method and device, and data acquisition method and device WO2021258967A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010594053.6 2020-06-24
CN202010594053.6A CN111898635A (en) 2020-06-24 2020-06-24 Neural network training method, data acquisition method and device

Publications (1)

Publication Number Publication Date
WO2021258967A1 WO2021258967A1 (en)

Family

ID=73207101

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096019 WO2021258967A1 (en) 2020-06-24 2021-05-26 Neural network training method and device, and data acquisition method and device

Country Status (2)

Country Link
CN (1) CN111898635A (en)
WO (1) WO2021258967A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898635A (en) * 2020-06-24 2020-11-06 华为技术有限公司 Neural network training method, data acquisition method and device
CN112532746B (en) * 2020-12-21 2021-10-26 北京邮电大学 Cloud edge cooperative sensing method and system
GB2608344A (en) 2021-01-12 2022-12-28 Zhejiang Lab Domain-invariant feature-based meta-knowledge fine-tuning method and platform
CN112364945B (en) * 2021-01-12 2021-04-16 之江实验室 Meta-knowledge fine adjustment method and platform based on domain-invariant features
CN113065633A (en) * 2021-02-26 2021-07-02 华为技术有限公司 Model training method and associated equipment
CN112883988B (en) * 2021-03-19 2022-07-01 苏州科达科技股份有限公司 Training and feature extraction method of feature extraction network based on multiple data sets
CN113313233A (en) * 2021-05-17 2021-08-27 成都时识科技有限公司 Neural network configuration parameter training and deploying method and device for dealing with device mismatch
CN113255757B (en) * 2021-05-20 2022-10-11 西华大学 Antagonistic sample detection method and system based on activation value distribution difference
CN113807183A (en) * 2021-08-17 2021-12-17 华为技术有限公司 Model training method and related equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710896B (en) * 2018-04-24 2021-10-29 浙江工业大学 Domain learning method based on generative confrontation learning network
US20190370651A1 (en) * 2018-06-01 2019-12-05 Nec Laboratories America, Inc. Deep Co-Clustering
CN109359623B (en) * 2018-11-13 2021-05-11 西北工业大学 Hyperspectral image migration classification method based on depth joint distribution adaptive network
CN111291274A (en) * 2020-03-02 2020-06-16 苏州大学 Article recommendation method, device, equipment and computer-readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858505A (en) * 2017-11-30 2019-06-07 厦门大学 Classifying identification method, device and equipment
CN108898218A (en) * 2018-05-24 2018-11-27 阿里巴巴集团控股有限公司 A kind of training method of neural network model, device and computer equipment
US20200134444A1 (en) * 2018-10-31 2020-04-30 Sony Interactive Entertainment Inc. Systems and methods for domain adaptation in neural networks
CN111292384A (en) * 2020-01-16 2020-06-16 西安交通大学 Cross-domain diversity image generation method and system based on generation type countermeasure network
CN111898635A (en) * 2020-06-24 2020-11-06 华为技术有限公司 Neural network training method, data acquisition method and device

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114912516A (en) * 2022-04-25 2022-08-16 湖南大学无锡智能控制研究院 Cross-domain target detection method and system for coordinating feature consistency and specificity
WO2023237042A1 (en) * 2022-06-09 2023-12-14 上海睿途新材料科技有限公司 Production line internet-of-things system for aluminized transfer paper, and internet-of-things communication method therefor
CN115310361A (en) * 2022-08-16 2022-11-08 中国矿业大学 Method and system for predicting underground dust concentration of coal mine based on WGAN-CNN
CN115310361B (en) * 2022-08-16 2023-09-15 中国矿业大学 Underground coal mine dust concentration prediction method and system based on WGAN-CNN
CN115496916B (en) * 2022-09-30 2023-08-22 北京百度网讯科技有限公司 Training method of image recognition model, image recognition method and related device
CN115496916A (en) * 2022-09-30 2022-12-20 北京百度网讯科技有限公司 Training method of image recognition model, image recognition method and related device
CN116363421A (en) * 2023-03-15 2023-06-30 北京邮电大学 Image feature classification method and device, electronic equipment and medium
CN116010805A (en) * 2023-03-24 2023-04-25 昆明理工大学 Rolling bearing fault feature extraction method and device based on convolutional neural network
CN116010805B (en) * 2023-03-24 2023-06-16 昆明理工大学 Rolling bearing fault feature extraction method and device based on convolutional neural network
CN116229080B (en) * 2023-05-08 2023-08-29 中国科学技术大学 Semi-supervised domain adaptive image semantic segmentation method, system, equipment and storage medium
CN116229080A (en) * 2023-05-08 2023-06-06 中国科学技术大学 Semi-supervised domain adaptive image semantic segmentation method, system, equipment and storage medium
CN116485792A (en) * 2023-06-16 2023-07-25 中南大学 Histopathological subtype prediction method and imaging method
CN116485792B (en) * 2023-06-16 2023-09-15 中南大学 Histopathological subtype prediction method and imaging method
CN117194983A (en) * 2023-09-08 2023-12-08 北京理工大学 Bearing fault diagnosis method based on progressive condition domain countermeasure network
CN117194983B (en) * 2023-09-08 2024-04-19 北京理工大学 Bearing fault diagnosis method based on progressive condition domain countermeasure network

Also Published As

Publication number Publication date
CN111898635A (en) 2020-11-06

Similar Documents

Publication Publication Date Title
WO2021258967A1 (en) Neural network training method and device, and data acquisition method and device
WO2021190451A1 (en) Method and apparatus for training image processing model
US20210012198A1 (en) Method for training deep neural network and apparatus
WO2020221200A1 (en) Neural network construction method, image processing method and devices
WO2019228317A1 (en) Face recognition method and device, and computer readable medium
US20220375213A1 (en) Processing Apparatus and Method and Storage Medium
WO2021164750A1 (en) Method and apparatus for convolutional layer quantization
WO2021147325A1 (en) Object detection method and apparatus, and storage medium
WO2021022521A1 (en) Method for processing data, and method and device for training neural network model
WO2021164751A1 (en) Perception network architecture search method and device
WO2021238333A1 (en) Text processing network, neural network training method, and related device
WO2022156561A1 (en) Method and device for natural language processing
CN110222718B (en) Image processing method and device
CN113516227B (en) Neural network training method and device based on federal learning
WO2021129668A1 (en) Neural network training method and device
CN113011568A (en) Model training method, data processing method and equipment
WO2021190433A1 (en) Method and device for updating object recognition model
CN114418030A (en) Image classification method, and training method and device of image classification model
US20240046067A1 (en) Data processing method and related device
CN113361549A (en) Model updating method and related device
WO2022156475A1 (en) Neural network model training method and apparatus, and data processing method and apparatus
WO2021136058A1 (en) Video processing method and device
CN116258176A (en) Data processing method and device
WO2023122854A1 (en) Data processing method and apparatus
WO2022227024A1 (en) Operational method and apparatus for neural network model and training method and apparatus for neural network model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21828188; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21828188; Country of ref document: EP; Kind code of ref document: A1)