CN112580720A - Model training method and device - Google Patents

Model training method and device

Info

Publication number
CN112580720A
CN112580720A
Authority
CN
China
Prior art keywords
neural network
network model
sample
training
augmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011507377.8A
Other languages
Chinese (zh)
Inventor
周峰暐
黎嘉伟
谢传龙
陈飞
洪蓝青
李震国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202011507377.8A
Publication of CN112580720A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Abstract

The application discloses a model training method applicable to the field of artificial intelligence. The method includes: acquiring a first neural network model, a second neural network model and a first training sample; acquiring a target augmentation operation, and performing augmentation transformation on the first training sample according to the target augmentation operation to obtain an augmented sample; determining, through the second neural network model and according to the augmented sample, a first weight corresponding to the augmented sample; training the first neural network model according to the augmented sample to obtain a first loss, and fusing the first loss and the first weight to obtain a second loss; and performing model training on the first neural network model according to the second loss to obtain a third neural network model. By training the second neural network model to assign a weight to each augmented sample, the embodiment realizes a sample-aware automatic data augmentation strategy and improves model training accuracy.

Description

Model training method and device
Technical Field
The application relates to the field of artificial intelligence, in particular to a model training method and device.
Background
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
In deep learning, data augmentation is mainly used to increase the amount of training data and enrich its diversity, thereby improving the generalization ability of deep models. Data augmentation is widely applied in computer vision tasks, such as picture classification, semantic segmentation and object detection, and in natural language processing tasks, such as text classification, sentiment analysis and machine translation. At present, mainstream data augmentation strategies are mainly designed by experts: for a specific task, an expert designs a suitable augmentation strategy, and the training data of the target task is augmented based on that strategy before the model is trained. Because different tasks differ greatly, a data augmentation strategy designed for one task is difficult to transfer to another, and employing expert design for every task is time-consuming and labor-intensive. To free up manpower and further improve the augmentation effect, learning the data augmentation strategy directly from data has become a trend.
Existing automatic data augmentation algorithms search for augmentation operations at the data-set level, that is, they augment all training samples in the data set with the same strategy. However, some training samples processed by an augmentation operation may adversely affect the training process. Specifically, during training the labels before and after the augmentation operation (i.e., of the training sample and of the augmented sample) are kept the same, yet in some cases the correct label of the augmented sample differs from the label of the original training sample. For example, suppose a training sample contains a target A and a translation operation moves target A outside the image. Training the neural network model on the augmented sample while keeping the original label then yields a loss with a large error: the loss still represents the difference between the processing result and target A, and updating the neural network model to be trained based on this loss reduces the processing accuracy of the model.
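The failure mode described above (a translation that moves the labeled target out of the frame) can be reproduced in a few lines. The following sketch is illustrative and not part of the patent; `translate_right` is a hypothetical helper operating on a NumPy array:

```python
import numpy as np

def translate_right(img: np.ndarray, shift: int) -> np.ndarray:
    """Translate an image to the right, zero-padding the vacated columns."""
    out = np.zeros_like(img)
    if shift < img.shape[1]:
        out[:, shift:] = img[:, : img.shape[1] - shift]
    return out

# A 4x4 "image" whose only object occupies the rightmost column.
img = np.zeros((4, 4))
img[:, 3] = 1.0

# A large translation moves the object entirely out of the frame, yet a
# dataset-level policy would keep the original label unchanged.
augmented = translate_right(img, 2)
```

Here `augmented` contains no object pixels at all, so any loss computed against the original label of `img` penalizes the model for the wrong reason.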
Disclosure of Invention
In a first aspect, the present application provides a model training method, including:
acquiring a first neural network model, a second neural network model and a first training sample;
the first neural network model may be a task model to be trained, which is input by a user, and the first neural network model may be used to implement a target task, where the target task may be picture classification, object detection, semantic segmentation, indoor layout (room layout), picture completion or automatic coding, and the like, and the embodiment of the present application is not limited.
If the second neural network model is a pre-trained network, it may take the augmented sample and the corresponding augmentation operation as inputs and output a weight value for the loss function value of the augmented sample. The weight value may be used to gauge the validity of the augmentation operation on the original first training sample, where validity means that a neural network trained using the first training sample after the augmentation operation performs better than one trained using the first training sample before the augmentation operation; the larger the improvement, the higher the validity and, correspondingly, the larger the weight value. The input of the augmentation policy network model may be a feature extracted from the augmented sample, or a category feature or data-set feature of the augmented sample, which is not limited in the embodiments of the present application. If the second neural network model is an initialized network to be trained, then after training it has the capability of taking the augmented sample and the corresponding augmentation operation as input and outputting the weight value of the loss function value of the augmented sample. The second neural network model may be a deep learning network or another form of neural network model, which is not limited in the embodiments of the present application.
The first training sample may be one or more samples of a batch of first training samples.
Acquiring a target augmentation operation, and performing augmentation transformation on the first training sample according to the target augmentation operation to obtain an augmented sample. Specifically, data augmentation refers to applying one or more data transformation operations (i.e., the augmentation operations in this embodiment) to data to obtain new data. For example, picture data may be rotated by an angle; for text data, one or more words in a sentence may be deleted. For the first training sample, the corresponding label may remain unchanged after data augmentation. An augmentation policy is a policy for data augmentation, and its policy parameters may include at least one of the type of augmentation operation, the selection probability of the augmentation operation, or the intensity of the augmentation operation. The type of augmentation operation depends on the type of data. For text, it may include at least one of random word deletion, random word swapping, synonym replacement, word replacement based on TF-IDF (Term Frequency-Inverse Document Frequency), word insertion based on TF-IDF, back-translation, rewriting based on the GPT-2 (Generative Pre-Training) language model, or word replacement based on WordNet (WordNet Substitute). For images, it may include at least one of a left-right translation transform, an up-down translation transform, a rotation transform, a left-right shear (parallelogram) transform, an up-down shear (parallelogram) transform, a sharpness transform, a brightness transform, a contrast transform, a color transform, an inverse-color transform, a pixel-bit transform, histogram equalization, histogram de-polarization, a picture blending transform, a picture block cutout transform, or a region cropping transform.
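As a rough illustration of applying a single sampled augmentation operation while keeping the label fixed, the snippet below sketches a tiny operation registry for float images in [0, 1]. The operation names and the magnitude convention are assumptions made for illustration, not the patent's definitions:

```python
import numpy as np

# Hypothetical registry mapping operation names to transforms on a float image.
# Each transform takes (image, magnitude) and returns a new image.
AUGMENT_OPS = {
    "identity":   lambda x, m: x,
    "brightness": lambda x, m: np.clip(x + m, 0.0, 1.0),
    "contrast":   lambda x, m: np.clip((x - 0.5) * (1.0 + m) + 0.5, 0.0, 1.0),
    "flip_lr":    lambda x, m: x[:, ::-1],
}

def apply_augmentation(img: np.ndarray, op_name: str, magnitude: float) -> np.ndarray:
    """Apply one augmentation operation; the sample's label stays unchanged."""
    return AUGMENT_OPS[op_name](img, magnitude)

img = np.full((2, 2), 0.5)
aug = apply_augmentation(img, "brightness", 0.3)  # every pixel shifted to ~0.8
```

The policy parameters mentioned in the text (operation type, selection probability, intensity) would correspond here to the dictionary key, a sampling distribution over the keys, and the `magnitude` argument.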
Determining, through the second neural network model and according to the augmented sample, a first weight corresponding to the augmented sample, where the first weight represents the improvement in the performance of the neural network model when it is trained using the augmented sample compared with when it is trained using the first training sample; the larger the improvement, the larger the first weight.
As noted above, if the second neural network model is a pre-trained network, it may take the augmented sample and the corresponding augmentation operation as inputs and output a weight value for the loss function value of the augmented sample; the weight value reflects the validity of the augmentation operation on the original first training sample. The input of the augmentation policy network model may be a feature extracted from the augmented sample, or a category feature or data-set feature of the augmented sample. If the second neural network model is an initialized network to be trained, then after training it has the capability of taking the augmented sample and the corresponding augmentation operation as input and outputting the weight value of the loss function value of the augmented sample.
In the embodiment of the present application, the labels before and after the augmentation operation (that is, of the training sample and of the augmented sample) are kept the same during training. However, in some cases the correct label of the augmented sample differs from the label of the training sample: for example, if the training sample contains a target A and a translation operation moves target A outside the image, then training the neural network model to be trained on the augmented sample with the original label yields a loss with a large error, because the loss still represents the difference between the processing result and target A, and updating the model based on this loss reduces its processing accuracy.
Therefore, the more the processing result of the sample after the augmentation operation differs, after neural network processing, from that of the sample before the augmentation operation, the larger the gap between the correct label of the augmented sample and the label of the training sample. In the embodiment of the present application, a corresponding weight is accordingly set for the loss obtained from the augmented sample, and the magnitude of the loss is controlled by the weight value: the smaller the performance improvement from training the neural network with the augmented sample compared with training the neural network model with the first training sample, the smaller the first weight, and the smaller the loss becomes after being multiplied by the first weight.
Training the first neural network model according to the augmented sample to obtain a first loss, and fusing the first loss and the first weight to obtain a second loss; and performing model training on the first neural network model according to the second loss to obtain a third neural network model.
The fusion may be a multiplication, or an operation implemented by a pre-trained neural network, as long as the first loss can be modulated by the first weight and a smaller weight value yields a smaller second loss; the specific operation rule of the fusion is not limited in the present application.
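A minimal sketch of the simplest fusion mentioned here, multiplying each augmented sample's loss by its first weight before averaging over the batch (the function name and the batch layout are illustrative assumptions, not the patent's notation):

```python
import numpy as np

def fuse_losses(per_sample_loss: np.ndarray, weights: np.ndarray) -> float:
    """Fuse first loss and first weight by multiplication, then average:
    low-weight augmented samples contribute little to the model update."""
    return float(np.mean(per_sample_loss * weights))

loss = np.array([2.0, 1.0, 4.0])   # first losses, one per augmented sample
w    = np.array([1.0, 0.8, 0.1])   # first weights from the second model
second_loss = fuse_losses(loss, w)
```

Note how the third sample, despite having the largest raw loss, is nearly silenced by its small weight, which is exactly the behavior wanted for a label-destroying augmentation.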
In this embodiment, different weights are given to different augmented samples by training the second neural network model, realizing a sample-aware automatic data augmentation strategy; the beneficial effects of different augmentation transformations are better exploited, and noise is avoided as much as possible while the diversity of the samples is greatly increased.
In one possible design, the method further includes:
obtaining a second training sample; processing the second training sample through the third neural network model to obtain a third loss; training the second neural network model according to the third loss to obtain a fourth neural network model; determining, through the fourth neural network model and according to the augmented sample, a second weight corresponding to the augmented sample, where the second weight represents the improvement in the performance of the neural network model when it is trained using the augmented sample compared with when it is trained using the first training sample, and the larger the improvement, the larger the second weight; training the first neural network model according to the augmented sample to obtain a fourth loss, and fusing the fourth loss and the second weight to obtain a fifth loss; and performing model training on the first neural network model according to the fifth loss to obtain a fifth neural network model.
The fusion may be a multiplication, or an operation implemented by a pre-trained neural network, as long as the fourth loss can be modulated by the second weight and a smaller weight value yields a smaller fifth loss; the specific operation rule of the fusion is not limited in the present application.
In this embodiment of the application, instead of simply taking the third neural network model as the updated first neural network model, the training device may obtain a new second training sample and process it with the third neural network model to obtain a third loss, where the third loss may be determined from the processing result obtained by processing the second training sample with the third neural network model and the label corresponding to the second training sample. The third loss may be regarded as a loss function of the second training sample with respect to the second neural network model, so the second neural network model can be updated based on the third loss. The updated second neural network model (the fourth neural network model) then recomputes the weights of the current batch of augmented samples (the second weights), yielding a new weighted training loss function (the fifth loss obtained by fusing the fourth loss and the second weight), and the first neural network model is updated based on this loss function to obtain a fifth neural network model.
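The alternating schedule described above can be sketched with toy scalar models. This is emphatically not the patent's implementation: both "models" are single numbers, and the second model's update uses a finite-difference gradient of the third (validation) loss as a stand-in for backpropagating through a real weight network:

```python
import numpy as np

rng = np.random.default_rng(0)

def task_update(theta, batch, w, lr=0.1):
    """One weighted training step of the task model (toy scalar model;
    the per-sample losses are w_i * (theta - x_i)^2)."""
    return theta - lr * np.mean(2.0 * w * (theta - batch))

def val_loss(theta, val_batch):
    """Third loss: the updated task model's loss on clean held-out data."""
    return float(np.mean((theta - val_batch) ** 2))

# Clean data centered at 1.0; one augmentation preserves it, one destroys it.
clean = rng.normal(1.0, 0.05, 64)      # second training sample
good_aug = rng.normal(1.0, 0.05, 32)   # label-preserving augmented samples
bad_aug = rng.normal(-3.0, 0.05, 32)   # label-destroying augmented samples

theta, w = 0.0, 1.0                    # task model; weight of the bad half
batch = np.concatenate([good_aug, bad_aug])
for _ in range(30):
    wvec = np.concatenate([np.ones(32), np.full(32, w)])
    # step 1: weighted task update (yields the "third model")
    theta = task_update(theta, batch, wvec)
    # step 2-3: probe the third loss and move the weight accordingly
    # (finite differences stand in for backprop through the weight network)
    eps = 1e-2
    wv_p = np.concatenate([np.ones(32), np.full(32, w + eps)])
    wv_m = np.concatenate([np.ones(32), np.full(32, w - eps)])
    g = (val_loss(task_update(theta, batch, wv_p), clean)
         - val_loss(task_update(theta, batch, wv_m), clean)) / (2 * eps)
    w = float(np.clip(w - 5.0 * g, 0.0, 1.0))  # step 4 reuses the new weight
```

Under these assumptions the harmful augmentation's weight collapses toward zero and the task model converges near the clean-data mean, mirroring the third-loss-driven update of the second neural network model described in the text.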
In one possible design, the first neural network model includes a feature extraction network, and the determining, through the second neural network model and according to the augmented sample, of the first weight corresponding to the augmented sample includes: performing feature extraction on the augmented sample through the feature extraction network to obtain a first feature vector; acquiring a second feature vector used for indicating the target augmentation operation; and determining, through the second neural network model and according to the first feature vector and the second feature vector, the first weight corresponding to the augmented sample.
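One way to realize this design is to concatenate the two feature vectors and pass them through a small network ending in a sigmoid, so the weight lies in (0, 1). The sketch below is a hand-rolled toy with assumed dimensions (8-d sample features, 4 one-hot operation codes), not the patent's network architecture:

```python
import numpy as np

rng = np.random.default_rng(42)

def weight_from_features(sample_feat, op_feat, W1, b1, w2, b2):
    """Second-network sketch: concatenate the augmented sample's first
    feature vector with the second feature vector encoding the augmentation
    operation, apply a tiny one-hidden-layer MLP, and squash to (0, 1)."""
    h = np.maximum(0.0, W1 @ np.concatenate([sample_feat, op_feat]) + b1)  # ReLU
    logit = float(w2 @ h + b2)
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid keeps the weight in (0, 1)

# Randomly initialized toy parameters (in practice these would be trained).
W1 = rng.normal(0.0, 0.5, (16, 12)); b1 = np.zeros(16)
w2 = rng.normal(0.0, 0.5, 16);       b2 = 0.0

sample_feat = rng.normal(size=8)   # first feature vector (from extraction net)
op_feat = np.eye(4)[1]             # second feature vector: one-hot op code
w = weight_from_features(sample_feat, op_feat, W1, b1, w2, b2)
```

Conditioning on both vectors is what makes the policy sample-aware: the same operation can receive a high weight on one sample and a low weight on another.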
In one possible design, the acquiring of the target augmentation operation includes:
sampling the target augmentation operation from a plurality of augmentation operations based on a preset probability distribution, where the preset probability distribution includes a probability corresponding to each augmentation operation; the target augmentation operation corresponds to a first probability in the preset probability distribution, the first probability is obtained based on historical weights, and the historical weights are weights determined by the second neural network model according to the second feature vector.
In the embodiment of the application, the different weight values that the second neural network model outputs for the same augmentation operation are averaged to obtain an evaluation of each augmentation operation. Based on this evaluation, a probability distribution over the transformation space is dynamically generated, from which augmentation operations are sampled during training. Sampling augmentation operations from a probability distribution fitted to the weights output by the second neural network model generates more high-quality augmented samples, effectively avoids repeatedly sampling augmentation operations that are unsuitable for the training data, and greatly improves training efficiency.
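The averaging-and-sampling scheme can be sketched as follows. The history values, the softmax normalization and the temperature are illustrative assumptions; the patent does not fix how the averaged evaluations are turned into probabilities:

```python
import numpy as np

def op_distribution(history, temperature=1.0):
    """Average the historically predicted weights per augmentation operation
    and turn the averages into a sampling distribution via a softmax."""
    ops = sorted(history)
    scores = np.array([np.mean(history[op]) for op in ops])
    logits = scores / temperature
    probs = np.exp(logits - logits.max())   # shift for numerical stability
    return ops, probs / probs.sum()

# Hypothetical weight history: "rotate" has consistently earned high weights,
# while "translate" has often pushed targets out of frame and earned low ones.
history = {"rotate": [0.9, 0.8, 0.95], "translate": [0.1, 0.2], "flip": [0.5]}
ops, probs = op_distribution(history)

rng = np.random.default_rng(0)
sampled_op = ops[rng.choice(len(ops), p=probs)]  # target augmentation operation
```

Operations that have historically produced useful augmented samples are sampled more often, while consistently harmful ones are rarely drawn again, which is the behavior the paragraph above describes.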
In one possible design, the acquiring of the target augmentation operation includes: randomly sampling the target augmentation operation from a plurality of augmentation operations.
In one possible design, the target augmentation operation includes at least one of: translation operation, rotation operation and affine transformation operation.
In a second aspect, the present application provides a model training apparatus, the apparatus comprising:
an acquisition module, configured to acquire a first neural network model, a second neural network model and a first training sample, acquire a target augmentation operation, and perform augmentation transformation on the first training sample according to the target augmentation operation to obtain an augmented sample;
a weight determination module, configured to determine, through the second neural network model and according to the augmented sample, a first weight corresponding to the augmented sample, where the first weight represents the improvement in the performance of the neural network model when it is trained using the augmented sample compared with when it is trained using the first training sample, and the larger the improvement, the larger the first weight;
the loss determining module is used for training the first neural network model according to the augmentation sample to obtain a first loss, and fusing the first loss and the first weight to obtain a second loss;
and the model training module is used for carrying out model training on the first neural network model according to the second loss so as to obtain a third neural network model.
In one possible design, the obtaining module is configured to obtain a second training sample;
the model training module is configured to process the second training sample through the third neural network model to obtain a third loss; train the second neural network model according to the third loss to obtain a fourth neural network model; determine, through the fourth neural network model and according to the augmented sample, a second weight corresponding to the augmented sample, where the second weight represents the improvement in the performance of the neural network model when it is trained using the augmented sample compared with when it is trained using the first training sample, and the larger the improvement, the larger the second weight; train the first neural network model according to the augmented sample to obtain a fourth loss, and fuse the fourth loss and the second weight to obtain a fifth loss; and perform model training on the first neural network model according to the fifth loss to obtain a fifth neural network model.
In one possible design, the first neural network model includes a feature extraction network, and the weight determination module is configured to perform feature extraction on the augmented sample according to the feature extraction network to obtain a first feature vector; acquiring a second feature vector used for indicating the target augmentation operation; and determining a first weight corresponding to the augmented sample through the second neural network model according to the first feature vector and the second feature vector.
In a possible design, the obtaining module is configured to sample the target augmentation operation from a plurality of augmentation operations based on a preset probability distribution, where the preset probability distribution includes a probability corresponding to each augmentation operation, and the target augmentation operation corresponds to a first probability in the preset probability distribution, where the first probability is obtained based on a historical weight, and the historical weight is a weight determined by the second neural network model according to the second feature vector.
In one possible design, the obtaining module is configured to randomly sample the target augmentation operation from a plurality of augmentation operations.
In one possible design, the target augmentation operation includes at least one of: translation operation, rotation operation and affine transformation operation.
In one possible design, the loss determination module is specifically configured to:
and multiplying the first loss and the first weight.
In a third aspect, an embodiment of the present application provides a model training apparatus, which may include a memory, a processor, and a bus system, where the memory is used to store a program, and the processor is used to execute the program in the memory to perform the method according to the first aspect and any optional method thereof.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the first aspect and any optional method thereof.
In a fifth aspect, embodiments of the present application provide a computer program comprising code for implementing the first aspect and any optional method thereof when the code is executed.
In a sixth aspect, the present application provides a chip system, which includes a processor configured to support an execution device or a training device in implementing the functions recited in the above aspects, for example, transmitting or processing the data or information recited in the above methods. In one possible design, the chip system further includes a memory for storing the program instructions and data necessary for the execution device or the training device. The chip system may consist of a chip, or may include a chip and other discrete devices.
The embodiment of the application provides a model training method, which includes: acquiring a first neural network model, a second neural network model and a first training sample; acquiring a target augmentation operation, and performing augmentation transformation on the first training sample according to the target augmentation operation to obtain an augmented sample; determining, through the second neural network model and according to the augmented sample, a first weight corresponding to the augmented sample, where the first weight represents the improvement in the performance of the neural network model when it is trained using the augmented sample compared with when it is trained using the first training sample, and the larger the improvement, the larger the first weight; training the first neural network model according to the augmented sample to obtain a first loss, and multiplying the first loss by the first weight to obtain a second loss; and performing model training on the first neural network model according to the second loss to obtain a third neural network model.
The more the processing result of the sample after the augmentation operation differs, after neural network processing, from that of the sample before the augmentation operation, the larger the gap between the correct label of the augmented sample and the label of the training sample. In the embodiment of the present application, a corresponding weight is set for the loss obtained from the augmented sample, and the magnitude of the loss is controlled by the weight value: the smaller the performance improvement from training the neural network with the augmented sample compared with training the neural network model with the first training sample, the smaller the first weight, and multiplying the loss by the first weight reduces its magnitude. By training the second neural network model to give different weights to different augmented samples, this embodiment realizes a sample-aware automatic data augmentation strategy, better exploits the beneficial effects of different augmentation transformations, and avoids noise as much as possible while greatly increasing the diversity of samples.
Drawings
FIG. 1 is a schematic structural diagram of an artificial intelligence main framework;
FIG. 2 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an embodiment of a model training method provided in an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating an effect of a model training method provided in an embodiment of the present application;
FIG. 5a is a schematic diagram illustrating an effect of a model training method provided in an embodiment of the present application;
FIG. 5b is a schematic diagram of a framework architecture of the model training method provided in an embodiment of the present application;
FIG. 5c is a schematic flowchart of a model training method provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of a model training apparatus according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an execution device according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a training apparatus provided in an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
The embodiments of the present invention will be described below with reference to the drawings. The terminology used in the description of the embodiments of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The general workflow of an artificial intelligence system is described first. Please refer to fig. 1, which shows a schematic structural diagram of an artificial intelligence main framework. The framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes starting from data acquisition, generally: intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. In this process, the data undergoes a "data-information-knowledge-wisdom" refinement process. The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (technologies for providing and processing information) up to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and provides support through a base platform. Communication with the outside is performed through sensors; the computing power is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); the base platform includes related platform guarantees and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to acquire data, and the data is provided to intelligent chips in a distributed computing system provided by the base platform for computation.
(2) Data
Data at the layer above the infrastructure represents the data sources of the artificial intelligence field. The data relates to graphs, images, voice, and text, as well as Internet-of-Things data from traditional devices, including service data of existing systems and sensing data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
The machine learning and the deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Inference refers to the process of simulating an intelligent human reasoning mode in a computer or intelligent system, in which the machine uses formalized information to think about and solve problems according to an inference control strategy; typical functions are searching and matching.
The decision-making refers to a process of making a decision after reasoning intelligent information, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capabilities
After the above-mentioned data processing, further based on the result of the data processing, some general capabilities may be formed, such as algorithms or a general system, e.g. translation, analysis of text, computer vision processing, speech recognition, recognition of images, etc.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of an artificial intelligence system in various fields; they are the encapsulation of the overall artificial intelligence solution, which commercializes intelligent information decision-making and realizes practical applications. The application fields mainly include: intelligent terminals, intelligent transportation, intelligent medical treatment, automatic driving, safe cities, and the like.
The embodiment of the application can be applied to scenes such as picture classification, object detection, semantic segmentation, indoor layout (room layout), picture completion or automatic coding and the like.
The application scenes of the application are briefly introduced below by taking two application scenes of the ADAS/ADS visual perception system and mobile phone beauty as examples.
Application scenario 1: ADAS/ADS visual perception system
In ADAS and ADS, multiple types of 2D target detection need to be performed in real time, including: dynamic obstacles (pedestrians (Pedestrian), riders (Cyclist), tricycles (Tricycle), cars (Car), trucks (Truck), buses (Bus)), static obstacles (traffic cones (TrafficCone), traffic sticks (TrafficStick), fire hydrants (FireHydrant), motorcycles (Motorcycle), bicycles (Bicycle)), traffic signs (TrafficSign), guide signs (GuideSign), billboards (Billboard), red traffic lights (TrafficLight_Red)/yellow traffic lights (TrafficLight_Yellow)/green traffic lights (TrafficLight_Green)/black traffic lights (TrafficLight_Black), and road signs (RoadSign). In addition, in order to accurately acquire the region occupied by a dynamic obstacle in 3-dimensional space, it is also necessary to perform 3D estimation on the dynamic obstacle and output a 3D frame. In order to fuse with laser radar data, the Mask of a dynamic obstacle needs to be acquired so that the laser point clouds hitting the dynamic obstacle can be screened out; in order to accurately park into a parking space, the 4 key points of the parking space need to be detected simultaneously; in order to perform composition positioning, the key points of static objects need to be detected. The neural network model obtained by training with the technical solution provided in the embodiments of the present application can complete all or part of the functions of the ADAS/ADS visual perception system.
Application scenario 2: mobile phone beauty function
In a mobile phone, Mask and key points of a human body can be detected by a neural network model (for example, a trained first neural network model, a trained second neural network model and a trained third neural network model) obtained by training through the technical scheme provided by the embodiment of the application, and corresponding parts of the human body can be enlarged and reduced, such as waist-closing and hip-beautifying operations, so that a beautifying image is output.
Application scenario 3: image classification scene:
After an image to be classified is obtained, the class of the object in the image can be obtained based on a neural network, and the image can then be classified according to the class of that object. A photographer, for example, takes many photographs every day, containing animals, people, and plants. The method can quickly classify the photographs according to their content, for example into photographs containing animals, photographs containing people, and photographs containing plants.
When the number of images is large, manual classification is inefficient, a person handling the same task for a long time is prone to fatigue, and the classification results then contain large errors. The neural network model obtained by training with the technical solution provided in the embodiments of the present application can classify the images rapidly.
The embodiment of the application can be used for training the neural network, and the obtained trained neural network can be used for carrying out task processing in the above scenes.
Since the embodiments of the present application relate to the application of a large number of neural networks, for the convenience of understanding, the related terms and related concepts such as neural networks related to the embodiments of the present application will be described below.
(1) Neural network
The neural network may be composed of neural units, where a neural unit may refer to an operation unit that takes x_s (i.e., input data) and an intercept of 1 as inputs, and the output of the operation unit may be:
h_{W,b}(x) = f(W^T x + b) = f(Σ_{s=1}^{n} W_s x_s + b)
where s = 1, 2, …, n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining together a plurality of such single neural units, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of that local receptive field, and the local receptive field may be a region composed of a plurality of neural units.
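For illustration, a minimal sketch of the neural unit above, assuming a sigmoid activation function f (the particular inputs, weights, and bias are arbitrary example values, not part of the disclosure):

```python
import math

def neural_unit(xs, ws, b):
    """Single neural unit: weighted sum of the inputs x_s with weights W_s,
    plus the bias b, passed through a sigmoid activation function f."""
    s = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-s))  # sigmoid: f(s) = 1 / (1 + e^-s)

# With zero weights and zero bias the unit outputs sigmoid(0) = 0.5.
print(neural_unit([1.0, 2.0], [0.0, 0.0], 0.0))  # 0.5
```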
(2) Deep neural network
Deep Neural Networks (DNN) can be understood as neural networks with many hidden layers; "many" here has no special metric, and what we often call multilayer neural networks and deep neural networks are essentially the same thing. Dividing a DNN by the positions of its different layers, the neural network inside the DNN can be divided into three categories: input layer, hidden layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers. The layers are fully connected, that is, any neuron of the i-th layer is necessarily connected with any neuron of the (i+1)-th layer. Although a DNN appears complex, the work of each layer is not really complex; it is simply the following linear relational expression:
y = α(W·x + b)
where x is the input vector, y is the output vector, b is an offset vector, W is a weight matrix (also called coefficients), and α() is an activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Due to the large number of DNN layers, the number of coefficients W and offset vectors b is also large. How, then, are these parameters defined in a DNN? First, consider the definition of the coefficient W. Taking a three-layer DNN as an example, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}: the superscript 3 represents the layer in which the coefficient W is located, while the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer. In summary, the coefficient from the k-th neuron at layer (L-1) to the j-th neuron at layer L is defined as W^L_{jk}. Note that the input layer has no W parameter. In a deep neural network, more hidden layers make the network more capable of depicting complex situations in the real world. In theory, the more parameters, the higher the model complexity and the larger the "capacity", which means the model can complete more complicated learning tasks.
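The per-layer expression y = α(W·x + b) and the W^L_{jk} indexing convention can be sketched as follows; the layer sizes, ReLU activation, and random parameters are illustrative assumptions, not part of the disclosed method:

```python
import numpy as np

def layer_forward(W, b, x):
    """One DNN layer: y = alpha(W @ x + b), with alpha = ReLU here.
    W[j, k] is the coefficient from the k-th neuron of the previous
    layer to the j-th neuron of this layer (the W^L_jk in the text)."""
    return np.maximum(0.0, W @ x + b)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)          # input vector (input layer, no W)
W1 = rng.standard_normal((5, 4))    # hidden layer: 4 inputs -> 5 neurons
b1 = np.zeros(5)
W2 = rng.standard_normal((3, 5))    # output layer: 5 inputs -> 3 neurons
b2 = np.zeros(3)
y = layer_forward(W2, b2, layer_forward(W1, b1, x))
print(y.shape)  # (3,)
```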
(3) Back propagation algorithm
A convolutional neural network can adopt the Back Propagation (BP) algorithm to correct the values of the parameters in the initial super-resolution model during training, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, an input signal is transmitted forward until the output produces an error loss, and the parameters in the initial super-resolution model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back-propagation motion dominated by the error loss, aiming at obtaining the optimal parameters of the super-resolution model, such as the weight matrix.
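A minimal numerical sketch of back propagation on a single linear unit with squared-error loss; the unit, learning rate, and target value are illustrative assumptions (the disclosure applies the same idea to the model's full weight matrices):

```python
# One linear unit y = w*x + b trained with squared-error loss; the loss
# gradient is propagated backward to update w and b until the loss converges.
def train_step(w, b, x, t, lr=0.1):
    y = w * x + b            # forward pass
    loss = (y - t) ** 2      # error loss between prediction and target
    dy = 2 * (y - t)         # back-propagate: dL/dy
    w -= lr * dy * x         # dL/dw = dL/dy * dy/dw = dL/dy * x
    b -= lr * dy             # dL/db = dL/dy * dy/db = dL/dy * 1
    return w, b, loss

w, b = 0.0, 0.0
for _ in range(100):
    w, b, loss = train_step(w, b, x=1.0, t=2.0)
print(round(w + b, 3))  # 2.0 (the prediction has converged to the target)
```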
(4) Data augmentation (data augmentation)
Data augmentation is one of the common techniques in deep learning: the training data is transformed in specific ways to generate new data, or simulated data is generated directly, in order to increase the amount of training data and enrich its diversity.
(5) Data augmentation policy
The data augmentation strategy is a probability distribution on a data augmentation transformation space, and the training data is augmented by sampling augmentation transformation based on the probability distribution.
(6) Loss function (loss function)
The loss function is used for evaluating the degree of difference between the predicted value and the actual value of the model, and the smaller the loss function value is, the better the performance of the model is. The loss functions for different models are typically different.
Fig. 2 is a schematic diagram of a system architecture 100 according to an embodiment of the present application, in fig. 2, an execution device 110 configures an input/output (I/O) interface 112 for data interaction with an external device, and a user may input data to the I/O interface 112 through a client device 140.
During the process that the execution device 110 preprocesses the input data or during the process that the calculation module 111 of the execution device 110 performs the calculation (for example, performs the function implementation of the neural network in the present application), the execution device 110 may call the data, the code, and the like in the data storage system 150 for corresponding processing, and may store the data, the instruction, and the like obtained by corresponding processing into the data storage system 150.
Finally, the I/O interface 112 returns the processing results to the client device 140 for presentation to the user.
Alternatively, the client device 140 may be, for example, a control unit in an automatic driving system or a functional algorithm module in a mobile phone terminal, where the functional algorithm module may be used to implement related tasks.
It should be noted that the training device 120 may generate corresponding target models/rules based on different training data for different targets or different tasks, and the corresponding target models/rules may be used to achieve the targets or complete the tasks, so as to provide the user with the required results.
In the case shown in fig. 2, the user may manually give the input data, which may be operated through an interface provided by the I/O interface 112. Alternatively, the client device 140 may automatically send the input data to the I/O interface 112, and if the client device 140 is required to automatically send the input data to obtain authorization from the user, the user may set the corresponding permissions in the client device 140. The user may view the results output by the execution device 110 at the client device 140, and the specific form may be a display, a sound, an action, and the like. The client device 140 may also serve as a data collection terminal, collecting input data of the input I/O interface 112 and output results of the output I/O interface 112 as new sample data, and storing the new sample data in the database 130. Of course, the input data input to the I/O interface 112 and the output result output from the I/O interface 112 as shown in the figure may be directly stored in the database 130 as new sample data by the I/O interface 112 without being collected by the client device 140, or the sample data collected by the data collecting device 160 may be stored in the database 130.
It should be noted that fig. 2 is only a schematic diagram of a system architecture provided in an embodiment of the present application, and the position relationship between the devices, modules, and the like shown in the diagram does not constitute any limitation, for example, in fig. 2, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may also be disposed in the execution device 110.
In deep learning, data augmentation is mainly used to increase the amount of training data and enrich the diversity of the data, so as to improve the generalization capability of a deep model. Currently, data augmentation is widely applied to computer vision tasks such as picture classification, semantic segmentation, and object detection, and to natural language processing tasks such as text classification, sentiment analysis, and machine translation, so using a better data augmentation strategy to improve the performance of a model has very high commercial value. At present, mainstream data augmentation strategies are mainly designed by experts: an expert designs a data augmentation strategy suitable for a specific task, and the training data of the target task is augmented based on that strategy in order to train the model. Because the differences between tasks are large, a data augmentation strategy designed for one particular task is difficult to migrate to another task, and adopting expert design for every task is time-consuming and labor-intensive. To free up manpower and further improve the data augmentation effect, learning a data augmentation strategy directly from data has become a trend. Existing automatic data augmentation algorithms search for augmentation strategies at the data set level; that is, the same strategy is used to augment different samples in the data set. However, the differences between samples are large, and using the same augmentation strategy introduces a lot of noise.
For example, for a picture classification task, if an object in a picture associated with a classification category is to the right of the picture, then the left translation transformation is more appropriate for the picture. Conversely, if the object is at a position very to the left in the picture, the left translation may shift the object out of the picture, and the transformed picture may negatively affect the model training. Therefore, the application provides a sample perception model training method, different weights are given to different augmentation samples, automatic data augmentation searching of sample perception is achieved, sample diversity is greatly increased, and noise is avoided as far as possible.
First, a model training method provided in the embodiments of the present application is described with a model training phase as an example.
Referring to fig. 3, fig. 3 is a schematic diagram of an embodiment of a model training method provided in an embodiment of the present application, and as shown in fig. 3, the model training method provided in the embodiment of the present application includes:
301. a first neural network model, a second neural network model and a first training sample are obtained.
In this embodiment of the present application, the training device may obtain a first neural network model, where the first neural network model may be a task model to be trained, which is input by a user, and the first neural network model may be used to implement a target task, where the target task may be picture classification, object detection, semantic segmentation, indoor layout (room layout), picture completion, or automatic coding, and the like, and this embodiment of the present application is not limited.
In the embodiment of the present application, the training device may obtain a second neural network model, which may also be referred to as an augmentation policy network model. The second neural network model may be a pre-trained network or an initialized network to be trained. If the second neural network model is a pre-trained network, it can take an augmented sample and the corresponding augmentation operation as input and output a weight value for the loss function value of the augmented sample. The weight value can be used to confirm the effectiveness of the augmentation operation on the original first training sample, where effectiveness refers to the improvement in model performance when the neural network is trained with the first training sample after the augmentation operation compared with training with the first training sample before the augmentation operation: the larger the improvement, the higher the effectiveness, and correspondingly, the larger the weight value. The input of the augmentation policy network model may be a feature obtained by extracting features of the augmented sample, or a category feature or a data set feature of the augmented sample; the embodiment of the present application is not limited in this respect. If the second neural network model is an initialized network to be trained, then after training the second neural network model has the capability of taking the augmented sample and the corresponding augmentation operation as input and outputting the weight value of the loss function value of the augmented sample.
In this embodiment of the application, a first training sample may be obtained, where the first training sample may be one or more of a batch of first training samples. Taking pictures as an example, the training device may obtain a batch of training pictures {(x_i, y_i)}, i = 1, …, n, where x_i is a training picture, y_i is its corresponding category label, and n is the number of training pictures in the current batch.
The training pictures may come, for example but not limited to, from the following data sets. CIFAR-10: the data set comprises natural pictures of 10 categories, including 49,000 training set pictures, 1,000 verification set pictures, and 10,000 test set pictures. CIFAR-100: the data set comprises natural pictures of 100 categories, including 49,000 training set pictures, 1,000 verification set pictures, and 10,000 test set pictures. Omniglot: the data set comprises handwritten character pictures of 1623 categories, including 24,000 training set pictures, 3,000 verification set pictures, and 5,000 test set pictures. ImageNet: the data set comprises natural pictures of 1000 classes, including 1,260,000 training set pictures, 25,000 verification set pictures, and 50,000 test set pictures.
302. And acquiring a target augmentation operation, and performing augmentation transformation on the first training sample according to the target augmentation operation to obtain an augmentation sample.
It should be understood that the present application does not limit the order between step 301 of obtaining the first neural network model and step 302. In one implementation, the training device may obtain the first neural network model, obtain the second neural network model, then obtain the target augmentation operation, and perform the augmentation transformation on the first training sample according to the target augmentation operation to obtain an augmented sample. In another implementation, the training device may obtain the second neural network model, then obtain the first neural network model, then obtain the target augmentation operation, and perform the augmentation transformation on the first training sample according to the target augmentation operation.
In an embodiment of the application, the training apparatus may obtain a target augmentation operation to be applied to the first training sample.
Specifically, data augmentation refers to applying one or more data transformation operations (i.e., the augmentation operations in the present embodiment) to data to obtain new data. For example, picture data can be rotated by an angle; for text data, one or more words in a sentence may be deleted. For the first training sample, the corresponding label may remain unchanged after data augmentation. An augmentation policy is a policy for data augmentation, and the policy parameters corresponding to the augmentation policy may include at least one of the type of augmentation operation, the selection probability of the augmentation operation, or the intensity of the augmentation operation. The type of augmentation operation is determined according to the type of data. For text, the operations may include at least one of random word deletion, random word swapping, synonym replacement, word replacement based on TF-IDF (Term Frequency-Inverse Document Frequency), word insertion based on TF-IDF, back-translation, rewriting based on the GPT-2 (Generative Pre-Training) language model, or word replacement based on WordNet (WordNet Substitute). For images, the operations may include at least one of a left-right translation transform, an up-down translation transform, a rotation transform, a left-right parallelogram (shear) transform, an up-down parallelogram (shear) transform, a sharpness transform, a brightness transform, a contrast transform, a color transform, an inverse-color transform, a pixel bit transform, histogram equalization, histogram de-polarization, a picture mixing transform, a picture block cut-out transform, and a region cropping transform.
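As an illustrative sketch of one of the text augmentation operations listed above (random word deletion), where the function name and the rule of always keeping at least one word are assumptions made for the example:

```python
import random

def random_word_deletion(sentence, strength, seed=None):
    """Text augmentation sketch: delete `strength` randomly chosen words
    from the sentence. The label of the training sample stays unchanged,
    as described in the text."""
    rng = random.Random(seed)
    words = sentence.split()
    strength = min(strength, len(words) - 1)  # keep at least one word
    for _ in range(strength):
        words.pop(rng.randrange(len(words)))
    return " ".join(words)

# Intensity 2 means 2 words in the sentence are deleted.
print(random_word_deletion("the quick brown fox jumps", strength=2, seed=0))
```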
In this embodiment, the target augmentation operation may be obtained by randomly sampling from a plurality of augmentation operations in an initial stage of the first neural network model training. If the first neural network model has been updated through a certain number of iterations, the target augmentation operation may be obtained by sampling from a plurality of augmentation operations based on a preset probability distribution, where the preset probability distribution includes a probability corresponding to each augmentation operation.
In this embodiment, the preset probability distribution may include a plurality of augmentation operations and the selection probability of each augmentation operation, where the selection probability refers to the probability that an augmentation operation is executed in the augmentation policy. For example, if the selection probability of an augmentation operation is 0.2, the probability of that operation being selected is 0.2. The intensity of an augmentation operation refers to the strength employed when the augmentation operation is performed on the data. For text, the intensity may be determined by the number or proportion of word transformations. For example, if the augmentation operation is word deletion with an intensity of 2 for a sentence, 2 words in the sentence need to be deleted. For images, the intensity may be determined by the size of the rotation angle, the size of the cropped area, or the size of the translation. For example, if the rotation intensity is 60 degrees, the image needs to be rotated by 60 degrees when the image rotation operation is selected.
In an alternative implementation, the obtaining of the target augmentation operation may be performed by an augmentation transform sampler. Taking the first training sample as a picture and a target augmentation transform consisting of two picture processing operations as an example, the augmentation transform sampler may sample one target augmentation transform for each training picture x_i to obtain an augmented picture. Each augmentation transform consists of two picture processing operations, e.g., left translation, right translation, rotation, etc.; j_i and k_i are the indices of the two selected operations after all the picture processing operations are ordered, and each picture processing operation includes an intensity parameter (such as a left shift by a certain distance), with m_1 and m_2 being the intensity parameters corresponding to the selected picture processing operations.
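The augmentation transform sampler described above can be sketched as follows; the operation names, their selection probabilities, and the uniform intensity range are hypothetical placeholders, not values from the disclosure:

```python
import random

# Candidate picture processing operations and a hypothetical preset
# probability distribution over them (the selection probabilities).
OPS = ["left_translate", "right_translate", "rotate", "brightness"]
PROBS = [0.4, 0.3, 0.2, 0.1]

def sample_augmentation(rng):
    """Sample one target augmentation transform: two picture processing
    operations (indices j_i, k_i after ordering all operations), each
    paired with an intensity parameter m_1, m_2."""
    op_j, op_k = rng.choices(OPS, weights=PROBS, k=2)
    m1, m2 = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)  # intensities
    return (op_j, m1), (op_k, m2)

rng = random.Random(0)
(op_j, m1), (op_k, m2) = sample_augmentation(rng)
print(op_j, op_k)
```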
In an embodiment of the application, the target augmentation operation corresponds to a first probability in the preset probability distribution, the first probability is obtained based on a historical weight, and the historical weight is a weight determined by the second neural network model according to the second feature vector. How to obtain the historical weight and how to determine the first probability corresponding to the target augmentation operation in the preset probability distribution will be described in the following embodiments, and details are not repeated here.
303. According to the augmented samples, determining a first weight corresponding to the augmented samples through the second neural network model, wherein the first weight is used for representing the improvement of the performance of the neural network model when the neural network is trained by using the augmented samples compared with the performance of the neural network model when the neural network model is trained by using the first training samples, and the first weight is larger when the improvement is larger.
In this embodiment of the application, after obtaining the target augmentation operation, the training device may perform the augmentation transformation on the first training sample according to the target augmentation operation to obtain an augmented sample, and may then determine, through the second neural network model and according to the augmented sample, the first weight corresponding to the augmented sample. The first weight is used to represent the improvement in the performance of the neural network model when the neural network is trained with the augmented sample compared with training with the first training sample; the larger the improvement, the larger the first weight.
If the second neural network model is the pre-trained network, the second neural network model may be used to obtain the augmentation sample and the corresponding augmentation operation as inputs, and output a weight value of a loss function value of the augmentation sample, where the weight value may be used to confirm validity of the augmentation operation on the original first training sample, and the validity refers to that the neural network trained using the first training sample after the augmentation operation is improved in performance compared with the neural network trained using the first training sample before the augmentation operation, and the improvement is larger, the higher the validity is, and correspondingly, the larger the weight value is.
In the embodiment of the present application, the labels corresponding to the training sample before and after the augmentation operation (that is, the training sample and the augmented sample) are the same during training. In some cases, however, the correct label for the augmented sample differs from the label of the original training sample. For example, if the training sample contains a target A, and in the augmented sample obtained after a translation operation the target A lies outside the image, then training the neural network model to be trained on this augmented sample with the label of the original training sample yields a loss with a large error: the loss represents the difference between the processing result and the target A, and if the neural network model to be trained is updated based on this loss, the processing accuracy of the model will be reduced.
Therefore, the larger the difference between the processing result obtained by the neural network for the training sample after the augmentation operation and the processing result obtained before the augmentation operation, the larger the difference between the correct label of the augmented sample and the label of the original training sample. In the embodiment of the present application, a corresponding weight is therefore set for the loss obtained from the augmented sample, and the size of the loss is controlled by the size of the weight value: if training the neural network with the augmented sample yields a smaller improvement in performance than training the neural network model with the first training sample, the first weight is smaller, and the loss multiplied by the first weight is correspondingly smaller.
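A minimal sketch of how the first weight can scale each augmented sample's loss, assuming a cross-entropy loss and hypothetical probabilities and weights (the disclosure does not fix a particular loss function):

```python
import math

def cross_entropy(probs, label):
    """Cross-entropy loss of one sample given predicted class probabilities."""
    return -math.log(probs[label])

def weighted_batch_loss(batch_probs, labels, weights):
    """Sample-aware training loss: each augmented sample's loss is scaled
    by the first weight produced by the second neural network model, so a
    noisy augmentation (small weight) contributes little to the update."""
    losses = [w * cross_entropy(p, y)
              for p, y, w in zip(batch_probs, labels, weights)]
    return sum(losses) / len(losses)

probs = [[0.7, 0.3], [0.2, 0.8]]
labels = [0, 0]            # the second sample's augmentation was harmful
weights = [1.0, 0.1]       # so its loss receives a small first weight
print(round(weighted_batch_loss(probs, labels, weights), 4))  # 0.2588
```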
It should be understood that, in the embodiment of the present application, the input of the augmentation policy network model may be a feature obtained by performing feature extraction on the augmented sample, or may be a category feature or a dataset feature of the augmented sample, which is not limited in the embodiment of the present application. If the second neural network model is an initialization network to be trained, the trained second neural network model has the capability of taking the augmented sample and the corresponding augmentation operation as input and outputting the weight value of the loss function value of the augmented sample.
How to train the second neural network model to have the capability of outputting the weight value of the loss function value of the augmented sample will be described in the following embodiments, and details thereof will not be repeated.
Specifically, the first neural network model includes a feature extraction network, and the training device may perform feature extraction on the augmented sample according to the feature extraction network to obtain a first feature vector; acquire a second feature vector used to indicate the target augmentation operation; and determine a first weight corresponding to the augmented sample through the second neural network model according to the first feature vector and the second feature vector.
In this embodiment, the training device may extract features from the augmented sample using the feature extraction network to obtain a first feature vector

$f_i = f(\tau(x_i); w)$

where $w$ is a parameter of the first neural network model, $x_i$ is the first training sample, and $\tau$ is the target augmentation operation. The first feature vector and the second feature vector are then processed using the second neural network model to obtain a first weight

$\omega_i = g(f_i, e_\tau; \theta)$

where $\theta$ is a parameter of the second neural network model and $e_\tau$ is the second feature vector, i.e., the vector representation of the target augmentation operation.
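As an illustrative sketch of how such a weight might be computed (a tiny NumPy stand-in for the feature extraction network and the second neural network model; all dimensions, values, and function names are hypothetical, not the patented implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(aug_sample, w):
    # Stand-in for the first neural network model's feature extraction
    # network applied to the augmented sample.
    return np.tanh(aug_sample @ w)

def policy_weight(feature, op_embedding, theta1, theta2):
    # Stand-in for the second neural network model: a small MLP that
    # maps the concatenation of the first feature vector and the second
    # feature vector (the augmentation-operation embedding) to a scalar
    # weight in (0, 1) via a sigmoid.
    h = np.maximum(0.0, np.concatenate([feature, op_embedding]) @ theta1)
    return float(1.0 / (1.0 + np.exp(-(h @ theta2))))

# Toy dimensions, all hypothetical: 8-d sample, 4-d feature, 3-d embedding.
w = rng.normal(size=(8, 4))
theta1 = rng.normal(size=(4 + 3, 16))
theta2 = rng.normal(size=(16,))

aug_sample = rng.normal(size=(8,))
op_embedding = np.array([1.0, 0.0, 0.0])  # one-hot code of the sampled operation

first_weight = policy_weight(extract_features(aug_sample, w), op_embedding, theta1, theta2)
assert 0.0 < first_weight < 1.0
```

The sigmoid output keeps the weight in (0, 1), so multiplying a loss by it can only shrink the loss, matching the role the first weight plays in the fusion step described below.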
304. Train the first neural network model according to the augmentation sample to obtain a first loss, and fuse the first loss and the first weight to obtain a second loss.
The fusion may be multiplication or an operation implemented by a pre-trained neural network, as long as the first loss can be changed by the first weight and a smaller weight value yields a smaller second loss; the specific operation rule of the fusion is not limited in the present application.
305. Perform model training on the first neural network model according to the second loss to obtain a third neural network model.
In this embodiment, after the training device determines the first weight corresponding to the augmented sample through the second neural network model according to the augmented sample, the training device may train the first neural network model according to the augmented sample to obtain a first loss, and fuse the first loss and the first weight to obtain a second loss.
In this embodiment of the application, the first loss may be determined based on a processing result obtained by processing the augmented sample by the first neural network model and a label corresponding to the augmented sample, where the label corresponding to the augmented sample is consistent with the label corresponding to the training sample.
Taking the fusion as a multiplication operation: specifically, the training device may determine the first weight of each augmented sample of the current batch through the second neural network model, multiply the first weight of each augmented sample by the first loss of the corresponding augmented sample, and sum (and average) the products to obtain the weighted training loss function

$\mathcal{L}_{\mathrm{train}}(w) = \frac{1}{n}\sum_{i=1}^{n} \omega_i L_i(w)$

and update the parameters of the first neural network model based on the gradient. For example, the parameters of the first neural network model may be updated according to the following formula:

$w' = w - \eta\, \nabla_w \frac{1}{n}\sum_{i=1}^{n} \omega_i L_i(w)$

where $\eta$ is the learning rate, $L_i(w)$ is the first loss of the $i$-th augmented sample, and $\omega_i$ is its first weight.
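A minimal numerical sketch of this weighted training step (plain NumPy, a linear task model; all values illustrative, not the patented implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

n, d = 4, 3                              # n augmented samples, d features
X = rng.normal(size=(n, d))              # augmented samples
y = rng.normal(size=(n,))                # labels shared with the original samples
w = np.zeros(d)                          # task-network parameters (linear model)
omega = np.array([0.9, 0.1, 0.7, 0.3])   # first weights from the policy network
eta = 0.1                                # learning rate

def weighted_loss(w):
    # Per-sample first losses, fused with the first weights by
    # multiplication, then averaged over the batch.
    per_sample = 0.5 * (X @ w - y) ** 2
    return float(np.mean(omega * per_sample))

# One gradient step on the weighted training loss.
grad = X.T @ (omega * (X @ w - y)) / n
w_updated = w - eta * grad

assert weighted_loss(w_updated) < weighted_loss(w)
```

Because each per-sample loss is scaled by its weight before the gradient is taken, samples with small weights (e.g. augmentations that destroyed the target) contribute little to the update.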
In this embodiment, the training device may perform model training on the first neural network model according to the second loss to obtain a third neural network model, where the third neural network model may serve as a meta-learning task network. The training device then processes a second training sample based on the third neural network model: specifically, it may obtain the second training sample, process the second training sample through the third neural network model to obtain a third loss, and train the second neural network model according to the third loss to obtain a fourth neural network model. Next, the training device determines, according to the augmented sample, a second weight corresponding to the augmented sample through the fourth neural network model, where the second weight is used to represent the improvement in the performance of the neural network model when the neural network is trained using the augmented sample compared with when the neural network model is trained using the first training sample; the larger the improvement, the larger the second weight. Finally, the training device trains the first neural network model according to the augmented sample to obtain a fourth loss, fuses the fourth loss and the second weight to obtain a fifth loss, and performs model training on the first neural network model according to the fifth loss to obtain a fifth neural network model.
The fusion may be multiplication or an operation implemented by a pre-trained neural network, as long as the fourth loss can be changed by the second weight and a smaller weight value yields a smaller fifth loss; the specific operation rule of the fusion is not limited in the present application.
In this embodiment of the application, the training device may not use the third neural network model as the updated first neural network model, but instead obtain a new second training sample and process it with the third neural network model to obtain a third loss. The second training sample may be a verification sample, and the third loss may be determined based on the processing result obtained by processing the second training sample with the third neural network model and the label corresponding to the second training sample. The third loss is a loss function of the second training sample and, through the meta-learning construction, also a function of the parameters of the second neural network model, so the second neural network model may be updated based on the third loss. The second weights of the current batch of augmentation samples are then recalculated using the updated second neural network model (the fourth neural network model) to obtain a new weighted training loss function (a fifth loss obtained by fusing the fourth loss and the second weight), and the first neural network model is updated based on this loss function to obtain a fifth neural network model.
Taking the fusion specifically as multiplication and the second training sample as a verification picture as an example, the training device may sample a batch of verification pictures

$\{(x_j^{\mathrm{val}}, y_j^{\mathrm{val}})\}_{j=1}^{m}$

where $x_j^{\mathrm{val}}$ is a verification picture, $y_j^{\mathrm{val}}$ is its corresponding category label, and $m$ is the number of verification pictures in the current batch. The loss function $L_j^{\mathrm{val}}(w')$ of each verification picture is obtained using the third neural network model, and these losses are added and averaged to obtain the loss function of the verification pictures of the current batch

$\mathcal{L}_{\mathrm{val}}(w') = \frac{1}{m}\sum_{j=1}^{m} L_j^{\mathrm{val}}(w')$

The parameters of the second neural network model are updated based on the gradient:

$\theta' = \theta - \beta\, \nabla_\theta\, \mathcal{L}_{\mathrm{val}}(w'(\theta))$

where $\beta$ is the learning rate. The updated second neural network model is then used to recalculate the weights of the current batch of augmented pictures, thereby obtaining a new weighted training loss function

$\frac{1}{n}\sum_{i=1}^{n} \omega_i'\, L_i(w)$

and the parameters of the first neural network model are updated based on the gradient:

$w \leftarrow w - \eta\, \nabla_w \frac{1}{n}\sum_{i=1}^{n} \omega_i'\, L_i(w)$

where $\eta$ is the learning rate.
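This bi-level update can be sketched with scalar toy parameters, so the derivative of the virtually updated task parameters with respect to the policy parameters is available in closed form (purely illustrative values and a hand-derived chain rule; not the patented implementation):

```python
import numpy as np

# Scalar toy version of one meta-iteration: an inner (virtual) task
# update using the policy weight, an outer policy update through the
# verification loss, and a final real task update.

w, theta = 0.0, 0.0            # task-network / policy-network parameters
eta, beta = 0.1, 0.05          # inner and outer learning rates
x_tr, y_tr = 2.0, 1.0          # one augmented training sample with its label
x_val, y_val = 1.5, 0.8        # one verification sample with its label

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Inner step: w'(theta) = w - eta * omega(theta) * dL_train/dw,
# with L_train = 0.5 * (w*x_tr - y_tr)^2 and omega = sigmoid(theta).
omega = sigmoid(theta)
g_train = (w * x_tr - y_tr) * x_tr
w_virtual = w - eta * omega * g_train

# Outer step: the verification loss depends on theta through w'(theta).
g_val = (w_virtual * x_val - y_val) * x_val          # dL_val/dw'
dw_dtheta = -eta * g_train * omega * (1.0 - omega)   # dw'/dtheta
theta = theta - beta * g_val * dw_dtheta             # chain-rule update

# Real task step with the weight recomputed from the updated policy.
w = w - eta * sigmoid(theta) * g_train
```

With these values, decreasing the verification loss requires a larger task update, so the policy parameter moves in the direction that increases the sample's weight before the real task step is taken.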
In an embodiment of the application, the target augmentation operation corresponds to a first probability in the preset probability distribution, the first probability is obtained based on a historical weight, and the historical weight is a weight determined by the second neural network model according to the second feature vector. The training device may average the different weight values corresponding to the same augmentation operation, based on the weights output by the second neural network model over a recent number of training batches, to obtain an evaluation of each augmentation operation. For example, taking the training sample as a training picture, the evaluation value of the augmentation operation formed by the $j$-th and $k$-th picture processing operations may be

$s_{j,k} = \frac{1}{c_{j,k}} \sum \omega$

where $c_{j,k}$ is the number of terms in the summation, that is, the number of historical weights $\omega$ observed for that augmentation operation. Based on this evaluation, a probability distribution over the transform space is dynamically generated, for example

$p_{j,k} = (1-\epsilon)\,\frac{\exp(s_{j,k})}{\sum_{j',k'} \exp(s_{j',k'})} + \frac{\epsilon}{|\mathcal{T}|}$

where $\epsilon$ is a hyper-parameter that balances sampling from the fitted probability distribution against random sampling, and $|\mathcal{T}|$ is the number of augmentation operations in the transform space. The training device may update the probability distribution of the augmentation operations every certain number of batches of training pictures and, for each batch of training pictures, sample augmentation operations according to the probability distribution to perform data augmentation on the training pictures.
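A small sketch of this evaluation-and-sampling scheme (NumPy; the history values, the softmax-plus-uniform mixture form, and the name `epsilon` are assumptions for illustration, not the patented formula):

```python
import numpy as np

rng = np.random.default_rng(2)

# Historical weights output by the second neural network model, grouped
# by the augmentation operation (j, k) that produced each augmented
# sample (hypothetical values).
history = {
    (0, 1): [0.90, 0.80, 0.85],  # operation that tends to keep semantics
    (1, 2): [0.20, 0.10, 0.15],  # operation that tends to destroy semantics
    (2, 0): [0.50, 0.60, 0.55],
}

# Evaluation of each operation: average of its historical weights.
ops = sorted(history)
scores = np.array([np.mean(history[op]) for op in ops])

# Dynamically generated distribution over the transform space: a softmax
# of the evaluations blended with uniform sampling; epsilon balances
# fitted sampling against random sampling (exploration).
epsilon = 0.2
softmax = np.exp(scores) / np.exp(scores).sum()
probs = (1.0 - epsilon) * softmax + epsilon / len(ops)

assert np.isclose(probs.sum(), 1.0)
assert probs[ops.index((0, 1))] > probs[ops.index((1, 2))]

# Sample the target augmentation operation for the next training batch.
target_op = ops[rng.choice(len(ops), p=probs)]
```

The uniform component keeps every operation sampleable, so an operation with a temporarily low evaluation can still be re-tried as training progresses.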
In the embodiment of the application, the different weight values corresponding to the same augmentation operation are averaged based on the weights output by the second neural network model, so as to obtain an evaluation of each augmentation operation. Based on this evaluation, a probability distribution over the transform space is dynamically generated, from which augmentation operations are sampled during the training process. Sampling augmentation operations from a probability distribution fitted to the weights output by the second neural network model can generate more high-quality augmented samples, effectively avoids repeatedly sampling augmentation operations that are unsuitable for the training data, and greatly improves training efficiency.
Taking the training sample as a picture and the first neural network model as a model for picture classification as an example, the method can significantly improve the prediction accuracy of various picture classification tasks, as shown in the following tables 1 and 2, where table 1 compares MetaAugment with other automatic data augmentation methods on the CIFAR-10, CIFAR-100, and Omniglot datasets using different task networks, and table 2 compares MetaAugment with other automatic data augmentation methods on the ImageNet dataset using different task networks:
TABLE 1

[table provided as an image in the original document]

TABLE 2

[table provided as an image in the original document]
As can be seen from tables 1 and 2, the present embodiment achieves the best accuracy for different task networks on these datasets. As can be seen from fig. 4 and fig. 5a, the embodiment gives higher weights to pictures that still contain rich semantic information after the augmentation operation and lower weights to pictures whose semantic information is lost after the augmentation operation. For example, as shown in the red box, for the same translation transformation, the target object is still in the picture after the two pictures in fig. 4 are translated, and the second neural network model gives them higher weights, whereas the target object has moved out of the picture after the two pictures in fig. 5a are translated, and the second neural network model gives them lower weights. Compared with other automatic data augmentation methods, the embodiment gives different weights to different augmented samples by training the second neural network model, realizing a sample-aware automatic data augmentation strategy that better exploits the beneficial effects of different augmentation transformations and, while greatly increasing sample diversity, avoids generating noise as much as possible.
The embodiment of the application provides a model training method, which comprises the following steps: acquiring a first neural network model, a second neural network model and a first training sample; acquiring a target augmentation operation, and performing augmentation transformation on the first training sample according to the target augmentation operation to obtain an augmentation sample; determining a first weight corresponding to the augmented sample through the second neural network model according to the augmented sample, wherein the first weight is used for representing the improvement of the performance of the neural network model when the neural network is trained by using the augmented sample compared with the performance of the neural network model when the neural network model is trained by using the first training sample, and the first weight is larger when the improvement is larger; training the first neural network model according to the augmentation sample to obtain a first loss, and fusing the first loss and the first weight to obtain a second loss; and performing model training on the first neural network model according to the second loss to obtain a third neural network model. 
The larger the difference between the processing results obtained by the neural network for the training sample after the augmentation operation and before it, the larger the difference between the correct label for the augmentation sample and the label of the training sample. In the embodiment of the present application, a corresponding weight is therefore set for the loss obtained from the augmentation sample, and the size of the loss is controlled by the size of the weight value: if training the neural network with the augmented sample yields a smaller performance improvement than training the neural network model with the first training sample, the first weight is smaller, and multiplying the loss by the first weight reduces the size of the loss. By training the second neural network model, this embodiment gives different weights to different augmented samples, realizing a sample-aware automatic data augmentation strategy that better exploits the beneficial effects of different augmentation transformations and, while greatly increasing sample diversity, avoids noise as much as possible.
Referring to fig. 5b, fig. 5b is a schematic diagram of a framework architecture of the model training method provided in the embodiment of the present application, and as shown in fig. 5b, the framework mainly includes two modules: the system comprises an augmented sample weighting module and an augmented conversion sampling module;
the augmented sample weighting module can learn a sample-aware augmented policy network to evaluate the effectiveness of different augmented transformations on different training samples, and the sample-aware data augmentation policy is realized by adjusting the weights of the augmented samples. The augmentation transformation sampling module can sample augmentation transformation according to probability distribution of the augmentation transformation obtained by output fitting of the augmentation strategy network, so that invalid augmentation transformation of repeated sampling can be avoided, and training efficiency is further improved.
Specifically, for the augmented sample weighting module, during training, for each batch of training samples, augmentation transformations are sampled and applied to the training samples. The augmented samples are passed through the task network (the first neural network model in fig. 5b) to obtain features and loss function values. The augmentation strategy network receives the features of the augmented samples and the corresponding augmentation transformations as input, and outputs weights for the loss function values of the augmented samples, so as to evaluate the effectiveness of the augmentation transformations on the original training samples. The input of the augmentation strategy network can be the features of the augmented samples, or the category features or dataset features, to realize data augmentation strategies at different levels. For each batch of training samples, a meta-learning method is used to express the loss function, on the verification set, of the task network updated based on the augmented samples as a function of the parameters of the augmentation strategy network, and the parameters of the augmentation strategy network are updated based on the gradient. For each batch of training samples, the weights of the loss function values of the augmented samples are then recomputed based on the updated augmentation strategy network, and the parameters of the task network are updated based on the gradient of the weighted loss function.
For the augmentation transform sampling module, during training, the different weight values corresponding to the same augmentation transform are averaged based on the weights output by the augmentation strategy network, obtaining an evaluation of each augmentation transform. Based on this evaluation, a probability distribution over the transform space is dynamically generated to evaluate the effectiveness of different augmentation transforms on the target training dataset. The augmentation transform sampler samples augmentation transforms based on this probability distribution, fitted from the weights output by the augmentation strategy network, and augments the training samples, which effectively avoids repeatedly sampling inapplicable augmentation transforms.
For the specific description of the augmented transform sampling module and the augmented sample weighting module, reference may be made to the description in the embodiment corresponding to fig. 3 in the foregoing embodiment, and similar parts are not repeated.
Referring to fig. 5c, fig. 5c is a flow diagram of the model training method provided in this embodiment. As shown in fig. 5c, the model training method may include a task network standard training flow and an augmentation policy network meta-learning flow. In the task network standard training flow, a batch of training samples may be input; for each training sample, the augmentation transform sampler samples a certain number of augmentation transforms (random sampling at the beginning of training) and applies them to the training sample to obtain a corresponding number of augmented samples. The task network is used to extract features from the training augmented samples and to calculate their loss function values. A weight is then calculated for each training augmented sample using the augmentation policy network; the input of the augmentation policy network is the features of the training augmented sample obtained by the task network together with the augmentation transform it used, and the output is the weight. For the training augmented samples of the current batch, the weights obtained by the augmentation policy network are multiplied by the loss function values of the corresponding training augmented samples, and the products are added to obtain a weighted training loss function; the task network parameters are pre-updated based on this loss function to obtain the meta-learning task network, whose parameters are functions of the parameters of the augmentation policy network. A batch of verification samples is then input, and the loss function of the current batch of verification samples is obtained using the meta-learning task network.
Since the parameters of the meta-learning task network are functions of the parameters of the augmentation policy network, the loss function of the verification samples is also a function of the parameters of the augmentation policy network, and the parameters of the augmentation policy network are updated based on the loss function of the verification samples. The weights of the current batch of training augmented samples are then recalculated using the updated augmentation policy network to obtain a new weighted training loss function, and the task network parameters are updated based on this loss function. In addition, the different weight values corresponding to the same augmentation transform, output by the augmentation policy network over a recent fixed number of batches, are averaged to obtain an evaluation of each augmentation transform. Based on this evaluation, a probability distribution over the transform space is dynamically generated, from which the augmentation transform sampler may sample augmentation transforms.
Referring to fig. 6, an embodiment of the present application further provides a model training apparatus 600, including:
an obtaining module 601, configured to obtain a first neural network model, a second neural network model, and a first training sample; acquiring a target augmentation operation, and performing augmentation transformation on the first training sample according to the target augmentation operation to obtain an augmentation sample;
the detailed description of the obtaining module 601 may refer to the description of step 301, which is not described herein again.
A weight determining module 602, configured to determine, according to the augmented sample, a first weight corresponding to the augmented sample through the second neural network model, where the first weight is used to represent a performance improvement of the neural network model when the neural network is trained using the augmented sample compared to when the neural network model is trained using the first training sample, and the first weight is larger when the improvement is larger;
the detailed description of the weight determining module 602 may refer to the description of step 302, and is not repeated here.
A loss determining module 603, configured to train the first neural network model according to the augmented sample to obtain a first loss, and fuse the first loss and the first weight to obtain a second loss;
the detailed description of the loss determining module 603 may refer to the description of step 303, and is not repeated here.
And a model training module 604, configured to perform model training on the first neural network model according to the second loss to obtain a third neural network model.
The detailed description of the model training module 604 can refer to the description of step 304, and is not repeated here.
In one possible design, the obtaining module 601 is configured to obtain a second training sample;
the model training module is used for processing the second training sample through the third neural network model to obtain a third loss; training the second neural network model according to the third loss to obtain a fourth neural network model; according to the augmented sample, determining a second weight corresponding to the augmented sample through the fourth neural network model, wherein the second weight is used for representing the improvement of the performance of the neural network model when the neural network is trained by using the augmented sample compared with the performance when the neural network model is trained by using the first training sample, and the second weight is larger when the improvement is larger; training the first neural network model according to the augmentation sample to obtain a fourth loss, and fusing the fourth loss and the second weight to obtain a fifth loss; and performing model training on the first neural network model according to the fifth loss to obtain a fifth neural network model.
In one possible design, the first neural network model includes a feature extraction network, and the weight determination module is configured to perform feature extraction on the augmented sample according to the feature extraction network to obtain a first feature vector; acquiring a second feature vector used for indicating the target augmentation operation; and determining a first weight corresponding to the augmented sample through the second neural network model according to the first feature vector and the second feature vector.
In a possible design, the obtaining module is configured to sample the target augmentation operation from a plurality of augmentation operations based on a preset probability distribution, where the preset probability distribution includes a probability corresponding to each augmentation operation, and the target augmentation operation corresponds to a first probability in the preset probability distribution, where the first probability is obtained based on a historical weight, and the historical weight is a weight determined by the second neural network model according to the second feature vector.
In one possible design, the obtaining module is configured to randomly sample the target augmentation operation from a plurality of augmentation operations.
In one possible design, the target augmentation operation includes at least one of: translation operation, rotation operation and affine transformation operation.
In one possible design, the loss determination module is specifically configured to: and multiplying the first loss and the first weight.
Next, a product form of the embodiment of the present application is described. The embodiment may be a training apparatus for various deep learning models based on a cloud computing platform, comprising a front-end system, an active sample-aware data augmentation module, and an end-to-end deep model training module. The front-end system mainly handles uploading of user data (including the first training samples and labels) and downloading or use of the trained model. The active sample-aware data augmentation module performs sample-aware data augmentation on the training data uploaded by the user; it may use a pre-trained augmentation policy network (the second neural network in the above embodiment) for data augmentation, or may jointly train the augmentation policy network and the task network (the first neural network in the above embodiment) for data augmentation. The end-to-end deep model training module implements training of the task network jointly on the first training samples uploaded by the user and the sample-aware augmented samples.
The front-end system: the front-end system and the back-end core algorithm are relatively independent and can be implemented in various different ways. Functions such as data storage and transmission, model downloading and API calling, and the user interface are usually implemented by directly calling the corresponding services of the cloud computing platform, including but not limited to a cloud disk, a virtual machine, an API management system, a network application program, and the like.
An active sample-aware data augmentation module: it is costly for the user to collect a large amount of labeled training data. This module can be used to perform sample-aware data augmentation on the training data uploaded by the user.
An end-to-end deep model training module: this module trains the user-selected task network using the training data uploaded by the user and the training augmentation data.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an execution device provided in an embodiment of the present application, and the execution device 1200 may be embodied as a mobile phone, a tablet, a notebook computer, an intelligent wearable device, a server, and the like, which is not limited herein. Specifically, the execution apparatus 1200 includes: a receiver 1201, a transmitter 1202, a processor 1203 and a memory 1204 (wherein the number of processors 1203 in the execution device 1200 may be one or more, for example, one processor in fig. 7), wherein the processor 1203 may include an application processor 12031 and a communication processor 12032. In some embodiments of the present application, the receiver 1201, the transmitter 1202, the processor 1203, and the memory 1204 may be connected by a bus or other means.
The memory 1204 may include both read-only memory and random access memory, and provides instructions and data to the processor 1203. A portion of the memory 1204 may also include non-volatile random access memory (NVRAM). The memory 1204 stores the processor and operating instructions, executable modules or data structures, or a subset or an expanded set thereof, wherein the operating instructions may include various operating instructions for performing various operations.
The processor 1203 controls the operation of the execution device. In a particular application, the various components of the execution device are coupled together by a bus system that may include a power bus, a control bus, a status signal bus, etc., in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as a bus system.
The method disclosed in the embodiments of the present application may be applied to the processor 1203, or implemented by the processor 1203. The processor 1203 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 1203. The processor 1203 may be a general-purpose processor, a Digital Signal Processor (DSP), a microprocessor or a microcontroller, a Vision Processor (VPU), a Tensor Processing Unit (TPU), or other processors suitable for AI operation, and may further include an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, and a discrete hardware component. The processor 1203 may implement or execute the methods, steps and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 1204, and the processor 1203 reads the information in the memory 1204, and completes the steps of the above method in combination with the hardware thereof.
Receiver 1201 may be used to receive input numeric or character information and to generate signal inputs related to performing settings and function control of the device. The transmitter 1202 may be configured to output numeric or character information via the first interface; the transmitter 1202 is also operable to send instructions to the disk group via the first interface to modify data in the disk group; the transmitter 1202 may also include a display device such as a display screen.
The execution device may obtain the model obtained by training through the model training method in the embodiment corresponding to fig. 6, and perform model inference.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a training device provided in an embodiment of the present application. Specifically, the training device 1300 is implemented by one or more servers, and may vary considerably depending on configuration or performance. It may include one or more central processing units (CPUs) 1313 (e.g., one or more processors), a memory 1332, and one or more storage media 1330 (e.g., one or more mass storage devices) storing an application program 1342 or data 1344. The memory 1332 and the storage medium 1330 may be transitory or persistent storage. The program stored on the storage medium 1330 may include one or more modules (not shown), each of which may include a series of instruction operations for the training device. Still further, the central processor 1313 may be configured to communicate with the storage medium 1330 to execute the series of instruction operations in the storage medium 1330 on the training device 1300.
The training device 1300 may also include one or more power supplies 1326, one or more wired or wireless network interfaces 1350, one or more input/output interfaces 1358, and one or more operating systems 1341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
Specifically, the training apparatus may perform the model training method in the embodiment corresponding to fig. 3.
Embodiments of the present application also provide a computer program product, which when executed on a computer causes the computer to perform the steps performed by the aforementioned execution device, or causes the computer to perform the steps performed by the aforementioned training device.
An embodiment of the present application also provides a computer-readable storage medium storing a program for signal processing; when the program is run on a computer, it causes the computer to execute the steps performed by the aforementioned execution device, or to execute the steps performed by the aforementioned training device.
The execution device, the training device, or the terminal device provided in the embodiments of the present application may specifically be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute the computer-executable instructions stored in the storage unit, so that the chip in the execution device executes the data processing method described in the above embodiments, or the chip in the training device executes the model training method described in the above embodiments. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, referring to fig. 9, fig. 9 is a schematic structural diagram of a chip provided in an embodiment of the present application. The chip may be implemented as a neural network processing unit (NPU) 1400. The NPU 1400 is mounted on a host CPU as a coprocessor, and the host CPU allocates tasks to it. The core part of the NPU is the arithmetic circuit 1403, which is controlled by the controller 1404 to extract matrix data from memory and perform multiplication.
The NPU 1400 may implement the model training method provided in the embodiment described in fig. 6 through cooperation between internal devices, or perform inference on the trained model.
The arithmetic circuit 1403 in the NPU 1400 may perform the steps of obtaining a first neural network model and performing model training on the first neural network model.
More specifically, in some implementations, the arithmetic circuit 1403 in the NPU 1400 includes multiple processing elements (PEs). In some implementations, the arithmetic circuit 1403 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1403 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1402 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit then takes the matrix A data from the input memory 1401, performs a matrix operation with matrix B, and stores the obtained partial result or final result of the matrix in the accumulator 1408.
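As an illustration only, the data flow just described can be mimicked in software. The following Python sketch is a hypothetical analogue of the circuit, not the patented hardware: the values of matrix B stand in for the weights cached on each PE, rows of matrix A are streamed in, and partial products are summed in a software accumulator playing the role of accumulator 1408.

```python
def matmul_accumulate(A, B):
    """Software analogue of the matrix unit: C = A x B with explicit
    partial-result accumulation (names and structure are illustrative)."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0.0] * cols for _ in range(rows)]  # plays the role of accumulator 1408
    for i in range(rows):
        for k in range(inner):               # stream matrix A element by element
            a = A[i][k]
            for j in range(cols):            # each "PE" holds one cached B value
                C[i][j] += a * B[k][j]       # accumulate the partial result
    return C
```

For A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]] this yields [[19, 22], [43, 50]], matching an ordinary matrix product.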
The unified memory 1406 is used for storing input data and output data. The weight data is transferred to the weight memory 1402 through a direct memory access controller (DMAC) 1405. The input data is also carried into the unified memory 1406 via the DMAC.
A bus interface unit (BIU) 1410 is used for the interaction between the AXI bus and the DMAC and the instruction fetch buffer (IFB) 1409: it allows the instruction fetch buffer 1409 to obtain instructions from the external memory, and allows the DMAC 1405 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1406, or to transfer weight data to the weight memory 1402, or to transfer input data to the input memory 1401.
The vector calculation unit 1407 includes a plurality of operation processing units and, if necessary, further processes the output of the arithmetic circuit 1403, for example by vector multiplication, vector addition, exponential operation, logarithmic operation, or magnitude comparison. It is mainly used for non-convolutional/fully connected layer computation in the neural network, such as batch normalization, pixel-level summation, and up-sampling of a feature plane.
In some implementations, the vector calculation unit 1407 can store the processed output vector to the unified memory 1406. For example, the vector calculation unit 1407 may apply a linear function, or a non-linear function, to the output of the arithmetic circuit 1403, such as linear interpolation of the feature planes extracted by the convolutional layers, or a non-linear function applied to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1407 generates normalized values, pixel-level summed values, or both. In some implementations, the vector of processed outputs can be used as activation input to the arithmetic circuit 1403, for example for use in a subsequent layer of the neural network.
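For illustration, the kind of post-processing attributed to the vector calculation unit 1407 — normalization of the raw accumulated outputs followed by a non-linear activation — might be sketched as follows. The function name, the ReLU choice, and the epsilon constant are assumptions made for this example, not details of the embodiment.

```python
import math

def vector_postprocess(v, mean, var, eps=1e-5):
    """Illustrative stand-in for the vector unit: normalize the accumulated
    outputs (as in batch normalization), then apply a ReLU activation."""
    normalized = [(x - mean) / math.sqrt(var + eps) for x in v]
    return [max(0.0, x) for x in normalized]
```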
An instruction fetch buffer (IFB) 1409 connected to the controller 1404 is used for storing instructions used by the controller 1404. The unified memory 1406, the input memory 1401, the weight memory 1402, and the instruction fetch buffer 1409 are all on-chip memories; the external memory is private to the NPU hardware architecture.
The processor mentioned in any of the above may be a general purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above programs.
It should be noted that the above-described apparatus embodiments are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, in the drawings of the apparatus embodiments provided in the present application, the connection relationship between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components, and the like. Generally, functions performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structure implementing the same function may take various forms, such as an analog circuit, a digital circuit, or a dedicated circuit. However, for the present application, a software implementation is preferable in most cases. Based on such understanding, the technical solutions of the present application may be substantially embodied in the form of a software product, which is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk of a computer, and which includes several instructions for enabling a computer device (which may be a personal computer, a training device, or a network device) to execute the methods according to the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a training device or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).

Claims (17)

1. A method of model training, the method comprising:
acquiring a first neural network model, a second neural network model and a first training sample;
acquiring a target augmentation operation, and performing augmentation transformation on the first training sample according to the target augmentation operation to obtain an augmented sample;
determining, according to the augmented sample, a first weight corresponding to the augmented sample through the second neural network model, wherein the first weight is used to represent the performance improvement obtained when the first neural network model is trained using the augmented sample compared with when it is trained using the first training sample, and a larger improvement corresponds to a larger first weight;
training the first neural network model according to the augmented sample to obtain a first loss, and fusing the first loss and the first weight to obtain a second loss;
and performing model training on the first neural network model according to the second loss to obtain a third neural network model.
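Purely as an illustration of the steps recited in claim 1, the following Python sketch uses a one-parameter linear model: an augmented sample is produced, a weight for it is obtained from a stand-in for the second neural network model, the first loss is fused with the weight by multiplication, and the first model is updated from the fused loss. All names (`train_step`, `weight_net`), the scaling augmentation, and the hand-derived gradient are hypothetical choices for the sketch, not part of the claims.

```python
def train_step(w_model, weight_net, sample, lr=0.1):
    """One illustrative training step of the claimed method on a scalar
    linear model pred = w_model * x (all specifics are assumptions)."""
    x, y = sample
    aug_x = 1.1 * x                           # target augmentation operation: a simple scaling
    pred = w_model * aug_x
    first_loss = (pred - y) ** 2              # first loss on the augmented sample
    first_weight = weight_net(aug_x)          # stand-in for the second neural network model
    second_loss = first_weight * first_loss   # fuse by multiplication (cf. claim 7)
    # gradient of second_loss w.r.t. w_model, derived by hand for this model
    grad = first_weight * 2.0 * (pred - y) * aug_x
    return w_model - lr * grad, second_loss
```

A constant `weight_net` (always returning 1.0) reduces this to ordinary training on the augmented sample; a learned weighting model would up-weight augmentations whose samples improve performance.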
2. The method of claim 1, further comprising:
obtaining a second training sample;
processing the second training sample through the third neural network model to obtain a third loss;
training the second neural network model according to the third loss to obtain a fourth neural network model;
determining, according to the augmented sample, a second weight corresponding to the augmented sample through the fourth neural network model, wherein the second weight is used to represent the performance improvement obtained when the first neural network model is trained using the augmented sample compared with when it is trained using the first training sample, and a larger improvement corresponds to a larger second weight;
training the first neural network model according to the augmented sample to obtain a fourth loss, and fusing the fourth loss and the second weight to obtain a fifth loss;
and performing model training on the first neural network model according to the fifth loss to obtain a fifth neural network model.
3. The method of claim 1 or 2, wherein the first neural network model comprises a feature extraction network, and wherein determining, from the augmented samples, first weights corresponding to the augmented samples by the second neural network model comprises:
according to the feature extraction network, feature extraction is carried out on the augmented sample to obtain a first feature vector;
acquiring a second feature vector used for indicating the target augmentation operation;
and determining a first weight corresponding to the augmented sample through the second neural network model according to the first feature vector and the second feature vector.
4. The method of claim 3, wherein the obtaining a target augmentation operation comprises:
and sampling the target augmentation operation from a plurality of augmentation operations based on a preset probability distribution, wherein the preset probability distribution comprises a probability corresponding to each augmentation operation, the target augmentation operation corresponds to a first probability in the preset probability distribution, the first probability is obtained based on historical weight, and the historical weight is determined by the second neural network model according to the second feature vector.
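As a hedged illustration of claim 4, the sketch below draws a target augmentation operation from a preset probability distribution; normalizing the historical weights into per-operation probabilities is an assumption made for this example, not a requirement of the claim.

```python
import random

def sample_augmentation(ops, historical_weights, rng=random):
    """Sample one augmentation operation with probability proportional to
    its historical weight (inverse-CDF sampling; illustrative only)."""
    total = sum(historical_weights)
    probs = [w / total for w in historical_weights]  # preset probability distribution
    r = rng.random()
    cum = 0.0
    for op, p in zip(ops, probs):
        cum += p
        if r < cum:
            return op
    return ops[-1]  # guard against floating-point round-off
```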
5. The method of any of claims 1 to 3, wherein the obtaining a target augmentation operation comprises:
the target augmentation operation is randomly sampled from a plurality of augmentation operations.
6. The method of any of claims 1 to 5, wherein the target augmentation operation comprises at least one of: a translation operation, a rotation operation, an affine transformation operation, and region cropping.
7. The method of any of claims 1 to 6, wherein said fusing said first loss and said first weight comprises:
and multiplying the first loss and the first weight.
8. A model training apparatus, the apparatus comprising:
the acquisition module is used for acquiring a first neural network model, a second neural network model and a first training sample; acquiring a target augmentation operation, and performing augmentation transformation on the first training sample according to the target augmentation operation to obtain an augmented sample;
a weight determining module, configured to determine, according to the augmented sample, a first weight corresponding to the augmented sample through the second neural network model, wherein the first weight is used to represent the performance improvement obtained when the first neural network model is trained using the augmented sample compared with when it is trained using the first training sample, and a larger improvement corresponds to a larger first weight;
the loss determining module is used for training the first neural network model according to the augmented sample to obtain a first loss, and fusing the first loss and the first weight to obtain a second loss;
and the model training module is used for carrying out model training on the first neural network model according to the second loss so as to obtain a third neural network model.
9. The apparatus of claim 8, wherein the obtaining module is configured to obtain a second training sample;
the model training module is configured to process the second training sample through the third neural network model to obtain a third loss; train the second neural network model according to the third loss to obtain a fourth neural network model; determine, according to the augmented sample, a second weight corresponding to the augmented sample through the fourth neural network model, wherein the second weight is used to represent the performance improvement obtained when the first neural network model is trained using the augmented sample compared with when it is trained using the first training sample, and a larger improvement corresponds to a larger second weight; train the first neural network model according to the augmented sample to obtain a fourth loss, and fuse the fourth loss and the second weight to obtain a fifth loss; and perform model training on the first neural network model according to the fifth loss to obtain a fifth neural network model.
10. The apparatus according to claim 8 or 9, wherein the first neural network model comprises a feature extraction network, and the weight determination module is configured to perform feature extraction on the augmented samples according to the feature extraction network to obtain a first feature vector; acquiring a second feature vector used for indicating the target augmentation operation; and determining a first weight corresponding to the augmented sample through the second neural network model according to the first feature vector and the second feature vector.
11. The apparatus of claim 10, wherein the obtaining module is configured to sample the target augmentation operation from a plurality of augmentation operations based on a preset probability distribution, wherein the preset probability distribution includes a probability corresponding to each augmentation operation, and the target augmentation operation corresponds to a first probability in the preset probability distribution, the first probability is obtained based on a historical weight, and the historical weight is a weight determined by the second neural network model according to the second feature vector.
12. The apparatus of any of claims 9 to 11, wherein the obtaining module is configured to randomly sample the target augmentation operation from a plurality of augmentation operations.
13. The apparatus of any of claims 9 to 12, wherein the target augmentation operation comprises at least one of: a translation operation, a rotation operation, an affine transformation operation, and region cropping.
14. The apparatus according to any one of claims 8 to 13, wherein the loss determination module is specifically configured to:
and multiplying the first loss and the first weight.
15. A model training apparatus, the apparatus comprising a memory and a processor; the memory stores code, and the processor is configured to retrieve the code and perform the method of any of claims 1 to 7.
16. A computer storage medium, characterized in that the computer storage medium stores one or more instructions that, when executed by one or more computers, cause the one or more computers to implement the method of any one of claims 1 to 7.
17. A computer program product comprising code that, when executed, implements the method of any one of claims 1 to 7.
CN202011507377.8A 2020-12-18 2020-12-18 Model training method and device Pending CN112580720A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011507377.8A CN112580720A (en) 2020-12-18 2020-12-18 Model training method and device


Publications (1)

Publication Number Publication Date
CN112580720A true CN112580720A (en) 2021-03-30

Family

ID=75136127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011507377.8A Pending CN112580720A (en) 2020-12-18 2020-12-18 Model training method and device

Country Status (1)

Country Link
CN (1) CN112580720A (en)


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9852231B1 (en) * 2014-11-03 2017-12-26 Google Llc Scalable graph propagation for knowledge expansion
CN111062329A (en) * 2019-12-18 2020-04-24 中山大学 Unsupervised pedestrian re-identification method based on augmented network
US20200134469A1 (en) * 2018-10-30 2020-04-30 Samsung Sds Co., Ltd. Method and apparatus for determining a base model for transfer learning
CN111178149A (en) * 2019-12-09 2020-05-19 中国资源卫星应用中心 Automatic remote sensing image water body extraction method based on residual pyramid network
CN111275080A (en) * 2020-01-14 2020-06-12 腾讯科技(深圳)有限公司 Artificial intelligence-based image classification model training method, classification method and device
CN111401405A (en) * 2020-02-21 2020-07-10 江苏大学 Multi-neural-network-integrated image classification method and system
CN111860330A (en) * 2020-07-21 2020-10-30 陕西工业职业技术学院 Apple leaf disease identification method based on multi-feature fusion and convolutional neural network
CN112052772A (en) * 2020-08-31 2020-12-08 福建捷宇电脑科技有限公司 Face shielding detection algorithm
CN112070205A (en) * 2020-07-30 2020-12-11 华为技术有限公司 Multi-loss model obtaining method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chang Pei; Xia Yong; Li Yujing; Wu Tao: "CNN-based SAR vehicle target detection", Radar Science and Technology, no. 02 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240071A (en) * 2021-05-13 2021-08-10 平安科技(深圳)有限公司 Graph neural network processing method and device, computer equipment and storage medium
CN113240071B (en) * 2021-05-13 2023-07-28 平安科技(深圳)有限公司 Method and device for processing graph neural network, computer equipment and storage medium
CN113361380A (en) * 2021-06-03 2021-09-07 上海哔哩哔哩科技有限公司 Human body key point detection model training method, detection method and device
WO2023016231A1 (en) * 2021-08-13 2023-02-16 Oppo广东移动通信有限公司 Radio frequency gain control method and apparatus, and communication device
CN113591832A (en) * 2021-08-20 2021-11-02 杭州数橙科技有限公司 Training method of image processing model, document image processing method and device
CN113591832B (en) * 2021-08-20 2024-04-05 杭州数橙科技有限公司 Training method of image processing model, document image processing method and device
CN114648513A (en) * 2022-03-29 2022-06-21 华南理工大学 Motorcycle detection method based on self-labeling data augmentation
CN114648513B (en) * 2022-03-29 2022-11-29 华南理工大学 Motorcycle detection method based on self-labeling data augmentation
CN116416492A (en) * 2023-03-20 2023-07-11 湖南大学 Automatic data augmentation method based on characteristic self-adaption
CN116416492B (en) * 2023-03-20 2023-12-01 湖南大学 Automatic data augmentation method based on characteristic self-adaption

Similar Documents

Publication Publication Date Title
WO2022083536A1 (en) Neural network construction method and apparatus
CN112580720A (en) Model training method and device
CN111507378A (en) Method and apparatus for training image processing model
CN111401516A (en) Neural network channel parameter searching method and related equipment
WO2022111617A1 (en) Model training method and apparatus
CN111882031A (en) Neural network distillation method and device
CN111368972A (en) Convolution layer quantization method and device thereof
CN113191489B (en) Training method of binary neural network model, image processing method and device
CN111368656A (en) Video content description method and video content description device
EP4318313A1 (en) Data processing method, training method for neural network model, and apparatus
CN113240079A (en) Model training method and device
WO2022111387A1 (en) Data processing method and related apparatus
CN113191241A (en) Model training method and related equipment
CN111950702A (en) Neural network structure determining method and device
CN113408570A (en) Image category identification method and device based on model distillation, storage medium and terminal
CN113011568A (en) Model training method, data processing method and equipment
CN111950700A (en) Neural network optimization method and related equipment
CN114359289A (en) Image processing method and related device
CN115238909A (en) Data value evaluation method based on federal learning and related equipment thereof
CN116861850A (en) Data processing method and device
WO2022100607A1 (en) Method for determining neural network structure and apparatus thereof
CN113656563A (en) Neural network searching method and related equipment
WO2024012360A1 (en) Data processing method and related apparatus
CN113128285A (en) Method and device for processing video
CN112532251A (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination