CN114065901A - Method and device for training neural network model - Google Patents


Info

Publication number
CN114065901A
Authority
CN
China
Prior art keywords
neural network
training
feature vector
network model
training sample
Prior art date
Legal status
Pending
Application number
CN202010755919.7A
Other languages
Chinese (zh)
Inventor
魏可鑫
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010755919.7A
Publication of CN114065901A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The application relates to terminal-side artificial intelligence and provides a method and a device for training a neural network model that combine a transfer learning method with a deep learning method, so that a neural network model with higher accuracy can be trained under a given sample quantity, computing power, and training time. The method includes: obtaining a first training sample; obtaining a first feature vector of the first training sample; obtaining a first deep neural network model according to the first feature vector; obtaining a pre-trained model according to the first training sample; combining the first deep neural network model and the pre-trained model to obtain a combined neural network model; and training the combined neural network model with the first training sample to obtain a target neural network model.

Description

Method and device for training neural network model
Technical Field
The present application relates to the field of artificial intelligence, and more particularly, to a method and apparatus for training a neural network model.
Background
Conventionally, obtaining a neural network model requires an algorithm engineer to label samples using professional knowledge and to train for a long time on powerful GPUs. With a transfer learning method, an ordinary development engineer can now obtain a neural network model using ordinary GPU computing power, a small number of samples, and a short training time. Transfer learning is a learning idea and paradigm: it uses the similarity among data, tasks, or models to apply a model learned in an old domain to a new domain, and its core is finding the similarity between the new problem and the old one so that knowledge can be transferred smoothly. Deep learning enables a machine to acquire knowledge autonomously from data and apply it to new problems, whereas transfer learning focuses on transferring already-learned knowledge to solve new problems.
Create ML, an existing transfer learning tool based on an integrated development environment (IDE), enables an ordinary application development engineer to quickly create a neural network model with ordinary GPU computing power, a small number of samples, and existing development experience. However, transfer learning alone cannot further improve the accuracy of neural network model training, so a higher-quality neural network model cannot be obtained.
Disclosure of Invention
The application provides a method and a device for training a neural network model that combine a transfer learning method with a deep learning method, so that a neural network model with higher accuracy can be trained under a given sample quantity, computing power, and training time.
In a first aspect, a method for training a neural network model is provided, including: obtaining a first training sample; obtaining a first feature vector of the first training sample; obtaining a first deep neural network model according to the first feature vector; obtaining a pre-trained model according to the first training sample; combining the first deep neural network model and the pre-trained model to obtain a combined neural network model; and training the combined neural network model with the first training sample to obtain a target neural network model.
The method for training a neural network model combines a transfer learning method with a deep learning method. The transfer learning part addresses the need for a large number of training samples, high computing power, and long training time; the deep learning part addresses the insufficient fitting capability of a transfer-learned model on training samples from the target domain. Compared with existing training methods, the method in the embodiments of the application achieves a better balance among sample quantity, computing power, training time, and training accuracy.
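The six steps of the first aspect can be sketched as a small Python pipeline. This is a toy illustration under stated assumptions, not the application's implementation: the feature vector is assumed to be the per-dimension mean of the inputs, the pre-trained model is a hypothetical frozen function `pretrained_backbone`, the looked-up deep neural network is a hypothetical single logistic unit `Head`, and only the head is updated during training (a common transfer-learning choice; the application itself only says that the combined model is trained).

```python
import math
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (input vector, binary label 0 or 1)


def feature_vector(samples: Sequence[Sample]) -> List[float]:
    """Assumed dataset descriptor: the per-dimension mean of the inputs."""
    dim = len(samples[0][0])
    return [sum(x[i] for x, _ in samples) / len(samples) for i in range(dim)]


def pretrained_backbone(x: List[float]) -> List[float]:
    """Hypothetical frozen pre-trained model; its weights are never updated."""
    return [x[0] + x[1], x[0] - x[1]]


class Head:
    """Hypothetical looked-up deep neural network: a single logistic unit."""

    def __init__(self, n_in: int) -> None:
        self.w = [0.0] * n_in
        self.b = 0.0

    def __call__(self, h: List[float]) -> float:
        z = sum(wi * hi for wi, hi in zip(self.w, h)) + self.b
        return 1.0 / (1.0 + math.exp(-z))


def combine(backbone: Callable, head: Head) -> Callable[[List[float]], float]:
    """Combined model: the backbone's output is fed into the head."""
    return lambda x: head(backbone(x))


def fit_head(head: Head, backbone: Callable, samples: Sequence[Sample],
             lr: float = 0.5, epochs: int = 200) -> None:
    """SGD on binary cross-entropy, updating only the head's parameters."""
    for _ in range(epochs):
        for x, y in samples:
            h = backbone(x)
            g = head(h) - y  # dLoss/dz for a sigmoid output with cross-entropy
            head.w = [wi - lr * g * hi for wi, hi in zip(head.w, h)]
            head.b -= lr * g


def train_target_model(samples: Sequence[Sample],
                       lookup_head: Callable[[List[float]], Head]):
    fv = feature_vector(samples)        # step 2: first feature vector
    head = lookup_head(fv)              # step 3: first deep neural network model
    backbone = pretrained_backbone      # step 4: pre-trained model
    model = combine(backbone, head)     # step 5: combined model
    fit_head(head, backbone, samples)   # step 6: training updates `head` in place
    return model                        # target model


if __name__ == "__main__":
    data = [([1.0, 1.0], 1), ([2.0, 0.5], 1), ([-1.0, -1.0], 0), ([-2.0, -0.5], 0)]
    target = train_target_model(data, lambda fv: Head(2))
    print(target([1.5, 1.0]) > 0.5, target([-1.5, -1.0]) < 0.5)
```

Step 1 (obtaining the training sample) corresponds to passing `samples` in. Freezing the backbone keeps the compute cost low, which matches the motivation given above, but a real system might also fine-tune it.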
In a possible implementation of the first aspect, obtaining the first deep neural network model according to the first feature vector includes: obtaining a network structure storage table, where the table includes a plurality of deep neural network models and a plurality of feature vectors in one-to-one correspondence; determining a second feature vector according to the first feature vector, where the second feature vector is one of the feature vectors in the network structure storage table; and obtaining the deep neural network model corresponding to the second feature vector.
In this method, the deep neural network model is obtained from the training sample by table lookup. Because each deep neural network model in the table has been trained in advance on existing training samples, computing power and training time can be greatly reduced.
In a possible implementation manner of the first aspect, determining the second feature vector according to the first feature vector includes: the distance between the second feature vector and the first feature vector is smaller than a first threshold value.
Specifically, the distance between the first feature vector and each feature vector in the network structure storage table is calculated, and a feature vector whose distance from the first feature vector is less than the first threshold (which may be set manually) is taken as the second feature vector; alternatively, the feature vector closest to the first feature vector is taken as the second feature vector.
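The table lookup described above can be sketched as follows. The application names neither a distance measure nor a table format, so both are assumptions here: Euclidean distance, and a hypothetical list of (feature vector, model identifier) pairs. `select_model` returns the model paired with the nearest stored feature vector and, when a first threshold is given, only if that nearest distance is below it.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical network structure storage table: each stored feature vector is
# paired one-to-one with an identifier of a pre-trained deep neural network.
Table = Sequence[Tuple[List[float], str]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed distance measure between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def select_model(first_fv: Sequence[float], table: Table,
                 threshold: Optional[float] = None) -> Optional[str]:
    """Pick the second feature vector and return its paired model.

    Without a threshold, the nearest stored feature vector is used.
    With a threshold, the nearest one is used only if its distance is
    below the threshold; otherwise None signals that no entry matches.
    """
    distance, model_id = min(
        ((euclidean(first_fv, fv), mid) for fv, mid in table),
        key=lambda pair: pair[0],
    )
    if threshold is not None and distance >= threshold:
        return None
    return model_id


table = [([0.0, 0.0], "dnn_a"), ([3.0, 4.0], "dnn_b")]
print(select_model([0.5, 0.0], table))                   # prints dnn_a
print(select_model([10.0, 10.0], table, threshold=1.0))  # prints None
```

A linear scan is fine for a small table; a large table would more likely use a nearest-neighbor index.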
In a second aspect, an object classification method is provided, where the object includes an image and/or text. The method includes: obtaining object classification data; and processing the object classification data with an object classification neural network model to obtain an object classification result, where training the object classification neural network model includes: obtaining a first training sample; obtaining a first feature vector of the first training sample; obtaining a first deep neural network model according to the first feature vector; obtaining a pre-trained model according to the first training sample; combining the first deep neural network model and the pre-trained model to obtain a combined neural network model; and training the combined neural network model with the first training sample to obtain the object classification neural network model.
In a possible implementation of the second aspect, obtaining the first deep neural network model according to the first feature vector includes: obtaining a network structure storage table, where the table includes a plurality of deep neural network models and a plurality of feature vectors in one-to-one correspondence; determining a second feature vector according to the first feature vector, where the second feature vector is one of the feature vectors in the network structure storage table; and obtaining the deep neural network model corresponding to the second feature vector.
In a possible implementation manner of the second aspect, determining the second feature vector according to the first feature vector includes: the distance between the second feature vector and the first feature vector is smaller than a first threshold value.
In a third aspect, an apparatus for training a neural network model is provided, including: an acquisition unit, configured to obtain a first training sample; and a processing unit, configured to obtain a first feature vector of the first training sample. The processing unit is further configured to: obtain a first deep neural network model according to the first feature vector; obtain a pre-trained model according to the first training sample; combine the first deep neural network model and the pre-trained model to obtain a combined neural network model; and train the combined neural network model with the first training sample to obtain a target neural network model.
In a possible implementation of the third aspect, obtaining, by the processing unit, the first deep neural network model according to the first feature vector includes: obtaining a network structure storage table, where the table includes a plurality of deep neural network models and a plurality of feature vectors in one-to-one correspondence; determining a second feature vector according to the first feature vector, where the second feature vector is one of the feature vectors in the network structure storage table; and obtaining the deep neural network model corresponding to the second feature vector.
In a possible implementation of the third aspect, determining, by the processing unit, the second feature vector according to the first feature vector includes: the distance between the second feature vector and the first feature vector is smaller than a first threshold value.
In a fourth aspect, an object classification apparatus is provided, where the object includes an image and/or text. The apparatus includes: an acquisition unit, configured to obtain object classification data; and a processing unit, configured to process the object classification data with an object classification neural network model to obtain an object classification result, where training the object classification neural network model includes: obtaining a first training sample; obtaining a first feature vector of the first training sample; obtaining a first deep neural network model according to the first feature vector; obtaining a pre-trained model according to the first training sample; combining the first deep neural network model and the pre-trained model to obtain a combined neural network model; and training the combined neural network model with the first training sample to obtain the object classification neural network model.
In a possible implementation of the fourth aspect, obtaining the first deep neural network model according to the first feature vector includes: obtaining a network structure storage table, where the table includes a plurality of deep neural network models and a plurality of feature vectors in one-to-one correspondence; determining a second feature vector according to the first feature vector, where the second feature vector is one of the feature vectors in the network structure storage table; and obtaining the deep neural network model corresponding to the second feature vector.
In a possible implementation of the fourth aspect, determining the second feature vector according to the first feature vector includes: the distance between the second feature vector and the first feature vector is smaller than a first threshold value.
In a fifth aspect, an apparatus for training a neural network model is provided, including a processor and a memory, where the memory is configured to store program instructions and the processor is configured to call the program instructions to perform the method in the first aspect or any implementation of the first aspect, or the method in the second aspect or any implementation of the second aspect.
In a sixth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores program instructions that, when executed by a processor, implement the method in the first aspect or any implementation of the first aspect, or the method in the second aspect or any implementation of the second aspect.
In a seventh aspect, a chip is provided. The chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory to perform the method in the first aspect or any implementation of the first aspect, or the method in the second aspect or any implementation of the second aspect.
Drawings
FIG. 1 is a schematic block diagram of a system architecture for training a neural network model of an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram of a method of training a neural network model of an embodiment of the present application;
FIG. 3 is a schematic flow chart diagram of an object classification method of an embodiment of the present application;
FIG. 4 is a schematic block diagram of a specific application of the method for training a neural network model according to an embodiment of the present application;
FIG. 5 is a schematic block diagram of an apparatus for training a neural network model according to an embodiment of the present application;
FIG. 6 is a schematic block diagram of an object classification apparatus of an embodiment of the present application;
FIG. 7 is a schematic hardware structure diagram of an apparatus for training a neural network model according to an embodiment of the present application;
FIG. 8 is a schematic hardware configuration diagram of an object classification apparatus according to an embodiment of the present application.
Detailed Description
Artificial Intelligence (AI) includes Machine Learning (ML), which includes Deep Learning (DL) and Transfer Learning (TL).
Artificial intelligence is a branch of computer science that simulates the information processes of human consciousness and thinking. Applications of artificial intelligence include machine vision, fingerprint recognition, face recognition, retina recognition, iris recognition, palm print recognition, expert systems, automatic planning, intelligent search, theorem proving, game playing, automatic programming, intelligent control, robotics, language and image understanding, genetic programming, and the like.
Machine learning is the core of artificial intelligence; machine learning theory is mainly concerned with designing and analyzing algorithms that allow computers to learn automatically. A machine learning algorithm automatically analyzes data to obtain rules and uses those rules to make predictions on unknown data. The core elements of machine learning are data, algorithms (models), and computing power. Machine learning has a wide range of applications, including data mining, data classification, computer vision, natural language processing (NLP), biometric identification, search engines, medical diagnosis, credit card fraud detection, stock market analysis, DNA sequencing, speech and handwriting recognition, strategy games, and robotics. In machine learning, an algorithm model is designed to process data and output the result the user wants, and the user can keep adjusting and optimizing the model to achieve more accurate data processing.
Deep learning is a kind of machine learning whose concept derives from research on artificial neural networks. A multilayer perceptron with multiple hidden layers is a deep learning structure, so deep learning is often also referred to as deep neural networks. Compared with general machine learning, deep learning can perform feature extraction automatically: it combines simple features into more complex ones and uses these combinations for multi-layer weight learning to solve problems. The motivation for studying deep learning is to build neural networks that simulate the way the human brain analyzes and learns, imitating the brain's mechanisms to interpret data such as images, sounds, and text. Deep learning began with image recognition, but within a few years it spread to nearly every area of machine learning with excellent performance, and it is now applied in image recognition, speech recognition, audio processing, natural language recognition, robotics, bioinformatics, search engines, human-machine games, targeted online advertising, automated medical diagnosis, finance, and other fields.
Transfer learning is a machine learning method in which a pre-trained model is reused for another task. When deep learning is used to solve a problem, the most common obstacle is the large amount of data needed to train the model. So much data is needed because the model has a very large number of parameters to learn, and for a specific problem in a given field, data on the scale needed to build the model may not be available. However, the relationships a model learns from one type of data in one training task can also be applied to other problems in the same domain; this is transfer learning.
The embodiments of the present application relate to a neural network, and for the sake of understanding, the following first introduces terms and concepts related to the neural network to which the embodiments of the present application may relate.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes inputs $x_s$ and an intercept of 1, and its output may be as shown in equation (1):
$$h_{W,b}(x) = f\left(W^{T}x\right) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right) \qquad (1)$$
where s = 1, 2, ..., n, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces a nonlinearity into the neural network so that the input signal of the neural unit is converted into an output signal. The output signal of the activation function may be used as the input of the next layer (for example, a convolutional layer), and the activation function may be, for example, a sigmoid function. A neural network is formed by joining many such single neural units together, that is, the output of one neural unit may be the input of another. The input of each neural unit may be connected to a local receptive field of the previous layer to extract features of that field, where the local receptive field may be a region composed of several neural units.
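Equation (1) can be evaluated directly. The sketch below assumes a sigmoid activation, as the text suggests, and keeps the bias b as a separate argument; the second function shows the equivalent intercept form, in which a constant input of 1 carries the bias as an extra weight.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Example activation function f."""
    return 1.0 / (1.0 + math.exp(-z))


def neural_unit(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Output of one neural unit: f(sum_s W_s * x_s + b), s = 1..n."""
    if len(x) != len(w):
        raise ValueError("x and w must both have length n")
    return sigmoid(sum(w_s * x_s for w_s, x_s in zip(w, x)) + b)


def neural_unit_intercept(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Same unit with the intercept trick: append input 1 and weight b."""
    xs = list(x) + [1.0]
    ws = list(w) + [b]
    return sigmoid(sum(w_s * x_s for w_s, x_s in zip(ws, xs)))


print(neural_unit([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5, since the weighted sum is 0
```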
(2) Deep neural network
Deep neural networks (DNNs), also called multilayer neural networks, can be understood as neural networks with multiple hidden layers. Divided by the position of the layers, the layers inside a DNN fall into three categories: the input layer, hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers. Adjacent layers are fully connected, that is, every neuron in layer i is connected to every neuron in layer i+1.
Although a DNN appears complex, the work of each layer is not: it is simply the following linear relationship followed by an activation function:
$$\vec{y} = \alpha\left(W\vec{x} + \vec{b}\right)$$
where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset (bias) vector, W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer simply applies this operation to its input vector $\vec{x}$ to obtain its output vector $\vec{y}$. Because a DNN has many layers, the number of coefficients W and offset vectors $\vec{b}$ is also large. These parameters are indexed in a DNN as follows. Taking the coefficient W as an example, suppose that in a three-layer DNN the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W_{24}^{3}$. The superscript 3 is the layer in which the coefficient W is located, and the subscripts correspond to index 2 in the third (output) layer and index 4 in the second (input) layer.
In general, the coefficient from the kth neuron of layer L-1 to the jth neuron of layer L is defined as $W_{jk}^{L}$.
Note that the input layer has no W parameters. In a deep neural network, more hidden layers allow the network to better model complex real-world situations. In theory, a model with more parameters has higher complexity and a larger "capacity", which means it can accomplish more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained network (the matrices formed by the vectors W of the many layers).
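The per-layer relation and the indexing convention can be sketched in plain Python. In this hypothetical sketch, `W[j][k]` plays the role of the coefficient from the kth neuron of the previous layer to the jth neuron of the current layer; indices are 0-based in code, whereas the text counts from 1. The tanh default activation is an assumption.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def dense_layer(x: Sequence[float], W: Matrix, b: Sequence[float],
                act: Callable[[float], float]) -> List[float]:
    """y = act(W x + b); W[j][k] links input neuron k to output neuron j."""
    return [act(sum(W[j][k] * x[k] for k in range(len(x))) + b[j])
            for j in range(len(W))]


def dnn_forward(x: Sequence[float], weights: Sequence[Matrix],
                biases: Sequence[Sequence[float]],
                act: Callable[[float], float] = math.tanh) -> List[float]:
    """Apply each layer in turn. The input layer has no W of its own, so
    weights[0] is the matrix feeding the first hidden layer."""
    out = list(x)
    for W, b in zip(weights, biases):
        out = dense_layer(out, W, b, act)
    return out


def identity(z: float) -> float:
    return z


# Two layers: a 2x2 identity matrix, then a 1x2 matrix that sums its inputs.
print(dnn_forward([1.0, 2.0],
                  [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]],
                  [[0.0, 0.0], [0.5]],
                  act=identity))  # [3.5]
```

Training, in these terms, means finding good values for every matrix in `weights` (and every vector in `biases`).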
(3) Convolutional Neural Network (CNN)
A convolutional neural network is a deep neural network with a convolutional structure. It contains a feature extractor composed of convolutional layers and sub-sampling layers, and this feature extractor can be regarded as a filter. A convolutional layer is a layer of neurons in a convolutional neural network that performs convolution on the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons in the adjacent layer. A convolutional layer usually contains several feature planes, and each feature plane may be composed of neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights form the convolution kernel. Weight sharing can be understood as meaning that the way image information is extracted does not depend on location. The convolution kernel may be initialized as a matrix of random values and learns reasonable weights during training of the convolutional neural network. A direct benefit of weight sharing is that it reduces the number of connections between layers of the convolutional neural network while also reducing the risk of overfitting.
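Weight sharing and local connectivity can be seen in a direct 2D convolution. In this sketch one 2x2 kernel is reused at every output position, and each output depends only on a 2x2 patch of the input. As in most deep-learning libraries, the operation is actually cross-correlation (the kernel is not flipped); stride 1 and no padding are assumptions.

```python
from typing import List

Grid = List[List[float]]


def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """Slide one shared kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Every output position reuses the same kernel weights and only
            # looks at the kh x kw patch whose top-left corner is (i, j).
            out[i][j] = sum(kernel[u][v] * image[i + u][j + v]
                            for u in range(kh) for v in range(kw))
    return out


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
kernel = [[1.0, 0.0], [0.0, -1.0]]
print(conv2d_valid(image, kernel))  # [[-4.0, -4.0], [-4.0, -4.0]]
```

This layer has only 4 trainable weights, whereas a fully connected layer mapping the same 9 inputs to the same 4 outputs would need 36.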
(4) Recurrent Neural Network (RNN)
A recurrent neural network is used to process sequence data. In a traditional neural network model, the layers from the input layer through the hidden layers to the output layer are fully connected, while the nodes within each layer are not connected to one another. Although such ordinary neural networks have solved many problems, they remain unable to handle many others. For example, predicting the next word in a sentence usually requires the preceding words, because the words in a sentence are not independent. An RNN is called a recurrent neural network because the current output for a sequence also depends on the previous outputs. Concretely, the network memorizes earlier information and applies it when computing the current output: the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step. In theory, RNNs can process sequence data of any length. RNNs are trained in the same way as conventional CNNs or DNNs.
Given that convolutional neural networks already exist, why are recurrent neural networks needed? The reason is simple: a convolutional neural network implicitly assumes that its elements are independent of one another, as are its inputs and outputs, for example, images of cats and dogs. In the real world, however, many elements are interrelated, such as stock prices changing over time. As another example, a person says: "I like to travel; my favorite place is Yunnan, and I will definitely go to ___ when I have the chance." To fill in the blank, a human knows to write "Yunnan", because humans infer from context. But how can a machine do this? RNNs were developed for this purpose: they aim to give machines a memory like that of humans. The output of an RNN therefore depends on both the current input and the remembered history.
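The recurrence can be sketched with a scalar Elman-style RNN, h_t = tanh(w_x·x_t + w_h·h_{t-1} + b). The scalar weights and tanh activation are simplifying assumptions; practical RNNs use weight matrices. The sketch shows that, once the recurrent weight w_h is nonzero, the same current input yields different states depending on the history.

```python
import math
from typing import List, Sequence


def rnn_states(xs: Sequence[float], w_x: float, w_h: float, b: float,
               h0: float = 0.0) -> List[float]:
    """Return the hidden states h_1..h_T of a scalar recurrent network."""
    h = h0
    states = []
    for x in xs:
        # The hidden layer receives the current input and its own previous output.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# With w_h = 0 the network has no memory: equal inputs give equal states.
print(rnn_states([1.0, 1.0], w_x=1.0, w_h=0.0, b=0.0))
# With w_h != 0 the last state also depends on the earlier input.
print(rnn_states([0.0, 1.0], 1.0, 0.8, 0.0)[-1], rnn_states([1.0, 1.0], 1.0, 0.8, 0.0)[-1])
```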
(5) Residual error network
The residual network is a deep convolutional network proposed in 2015. It is easier to optimize than a conventional convolutional neural network and can improve accuracy by adding considerable depth. The core of the residual network is solving the side effect of increasing depth (the degradation problem), so that network performance can be improved simply by making the network deeper. A residual network generally consists of many sub-modules with the same structure. The name ResNet is usually followed by a number giving the depth of the network; for example, ResNet50 has 50 weighted layers.
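The idea behind a residual sub-module can be sketched as y = F(x) + x, where F is the learned residual branch and x passes through an identity shortcut. The ReLU-affine branch below is a hypothetical stand-in for the convolutional layers of a real ResNet block. Because a block whose branch outputs zero is exactly the identity, adding more blocks need not make the network worse, which is the intuition behind avoiding the degradation problem.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, branch: Callable[[Vector], Vector]) -> Vector:
    """Return F(x) + x using an identity shortcut."""
    fx = branch(x)
    if len(fx) != len(x):
        raise ValueError("identity shortcut needs F(x) and x to have the same shape")
    return [f + xi for f, xi in zip(fx, x)]


def relu_affine(w: float, b: float) -> Callable[[Vector], Vector]:
    """Hypothetical residual branch: elementwise ReLU(w * x + b)."""
    return lambda x: [max(0.0, w * xi + b) for xi in x]


x = [1.0, -2.0]
print(residual_block(x, relu_affine(1.0, 0.0)))  # [2.0, -2.0]

# Fifty blocks whose branches output zero leave the input unchanged.
y = x
for _ in range(50):
    y = residual_block(y, relu_affine(0.0, 0.0))
print(y == x)  # True
```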
(6) Classifier
Many neural network architectures end with a classifier that classifies objects in an image. The classifier generally consists of a fully connected layer and a softmax function (normalized exponential function) and outputs the probabilities of the different classes for a given input. In some cases the softmax function may be replaced by a sparsemax function (which can be understood as a sparse normalized exponential function).
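A classifier of this kind can be sketched as a fully connected layer followed by softmax, with sparsemax as a drop-in alternative. The sparsemax routine uses the standard sorting-based projection of the scores onto the probability simplex; the layer sizes and weights are illustrative assumptions. Unlike softmax, sparsemax can assign exactly zero probability to low-scoring classes.

```python
import math
from typing import List, Sequence


def fully_connected(x: Sequence[float], W: List[List[float]],
                    b: Sequence[float]) -> List[float]:
    """Class scores (logits): W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(z: Sequence[float]) -> List[float]:
    """Euclidean projection of z onto the probability simplex."""
    zs = sorted(z, reverse=True)
    cumsum, k_support, sum_support = 0.0, 0, 0.0
    for k, v in enumerate(zs, start=1):
        cumsum += v
        if 1.0 + k * v > cumsum:  # the kth largest score is still in the support
            k_support, sum_support = k, cumsum
    tau = (sum_support - 1.0) / k_support  # threshold subtracted from every score
    return [max(v - tau, 0.0) for v in z]


logits = fully_connected([1.0, 0.5], [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]], [0.0, 0.0, 0.0])
print(logits)             # [2.0, 1.0, 0.0]
print(softmax(logits))    # every class gets some probability
print(sparsemax(logits))  # [1.0, 0.0, 0.0]
```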
(7) Loss function
When training a deep neural network, the output of the network should be as close as possible to the value that is actually to be predicted. The weight vectors of each layer can therefore be updated according to the difference between the network's current predicted value and the actual target value (an initialization step, which presets parameters for every layer, usually takes place before the first update). For example, if the network's predicted value is too high, the weight vectors are adjusted to lower it, and the adjustment continues until the deep neural network can predict the target value or a value very close to it. It is therefore necessary to define in advance how to compare the predicted value with the target value. This is the role of the loss function or objective function, an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training the deep neural network becomes a process of reducing the loss as much as possible.
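Two common loss functions illustrate the rule that a higher loss means a larger difference: mean squared error for regression and cross-entropy for classification. The application does not specify which loss it uses; these are standard examples only.

```python
import math
from typing import Sequence


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average squared difference between predicted and target values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(probs: Sequence[float], label: int, eps: float = 1e-12) -> float:
    """Negative log of the probability assigned to the true class."""
    return -math.log(max(probs[label], eps))  # eps guards against log(0)


print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # 2.0
# A confident correct prediction costs less than a confident wrong one.
print(cross_entropy([0.1, 0.9], 1) < cross_entropy([0.9, 0.1], 1))  # True
```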
(8) Back propagation algorithm
During training, a neural network can use the back propagation (BP) algorithm to correct the values of the parameters in the initial neural network model, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, the input signal is propagated forward to the output, which produces an error loss, and the parameters in the initial neural network model are updated by propagating the error loss information backward, so that the error loss converges. Back propagation is a backward pass dominated by the error loss, and its aim is to obtain optimal parameters of the neural network model, such as the weight matrices.
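The forward and backward passes can be sketched on a tiny scalar network, y_hat = w2·tanh(w1·x + b1) + b2, with a squared-error loss. The network shape, loss, and learning rate are illustrative assumptions. The gradients come from the chain rule, propagating the error from the output back toward the input, and a finite-difference comparison is a common way to confirm them.

```python
import math
from typing import Dict, Tuple

Params = Dict[str, float]


def loss_and_grads(p: Params, x: float, y: float) -> Tuple[float, Params]:
    # Forward pass: the input signal travels to the output.
    z1 = p["w1"] * x + p["b1"]
    h = math.tanh(z1)
    y_hat = p["w2"] * h + p["b2"]
    loss = 0.5 * (y_hat - y) ** 2

    # Backward pass: the error is propagated from the output toward the input.
    d_yhat = y_hat - y              # dL/dy_hat
    d_h = d_yhat * p["w2"]          # dL/dh
    d_z1 = d_h * (1.0 - h * h)      # tanh'(z1) = 1 - tanh(z1)^2
    grads = {"w2": d_yhat * h, "b2": d_yhat, "w1": d_z1 * x, "b1": d_z1}
    return loss, grads


def sgd_step(p: Params, grads: Params, lr: float) -> Params:
    """Move every parameter against its gradient."""
    return {k: p[k] - lr * grads[k] for k in p}


params = {"w1": 0.5, "b1": 0.1, "w2": -0.3, "b2": 0.2}
loss, grads = loss_and_grads(params, x=1.5, y=1.0)

# Finite-difference check of the analytic gradients.
eps = 1e-6
for name in params:
    up, down = dict(params), dict(params)
    up[name] += eps
    down[name] -= eps
    numeric = (loss_and_grads(up, 1.5, 1.0)[0] - loss_and_grads(down, 1.5, 1.0)[0]) / (2 * eps)
    print(name, abs(numeric - grads[name]) < 1e-6)

new_loss, _ = loss_and_grads(sgd_step(params, grads, lr=0.1), 1.5, 1.0)
print(new_loss < loss)  # True
```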
(9) Attention mechanism (attention mechanism)
Attention mechanisms stem from the study of human vision. In cognitive science, humans selectively focus on a portion of all information while ignoring other visible information due to bottlenecks in information processing, a mechanism commonly referred to as attentiveness. In the field of artificial intelligence, the objective of attention mechanism is to find key data from a number of data, such as key things from an image, key images from a stack of images, change nodes from a stack of time series data, and so on. The attention mechanism mainly finds the data units with larger contribution by evaluating the integral contribution of each or each group of data units in the data to be processed to the data. Or may be understood as determining the critical data by comparing the impact weights of different data on the results. Note that in the conventional mechanism, the nature of the attribute function can be described as a process of mapping a query (query) to a series of key-value pairs, that is, obtaining a series of key-value pairs.
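The query/key-value view can be sketched with scaled dot-product attention, a widely used form of attention function; the application does not say which form it uses, so this choice is an assumption. Each key is scored against the query, the scores are normalized with softmax into influence weights, and the output is the weighted sum of the values.

```python
import math
from typing import List, Sequence, Tuple


def attention(query: Sequence[float], keys: Sequence[Sequence[float]],
              values: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Map a query and a set of key-value pairs to an output.

    Returns (output, weights), where weights[i] is the influence weight of
    the ith key-value pair.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim_v = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim_v)]
    return output, weights


# Identical keys get equal weight, so the output is the mean of the values.
print(attention([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], [[2.0], [4.0]]))  # ([3.0], [0.5, 0.5])
# A query aligned with the first key concentrates the weight on that pair.
print(attention([5.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [0.0]])[1])
```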
Attention mechanisms include the self-attention mechanism (SAM) and the referred-attention mechanism (RAM). SAM finds, within a single group of data, the influence weight of each data item on the result, and can thereby identify the key data in that group; it can be understood as the influence weight of data in the vertical direction (within a group). RAM finds, across different groups of data, the influence weights between groups on the result, and can likewise identify key data from them; it can be understood as the influence weight of data in the horizontal direction (between groups). In other words, SAM and RAM are an intra-group and an inter-group attention mechanism, respectively.
In the embodiments of this application, SAM is mainly used to mine the dependencies (or correlations) among the data within each dimension of the data to be processed, and RAM is used to mine the dependencies (or correlations) among the data in different dimensions.
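The source gives no formula for its attention function. As an assumption, the sketch below uses scaled dot-product attention (the Transformer formulation) to illustrate the query/key-value mapping, and then interprets SAM as attention in which queries, keys, and values come from the same group, and RAM as attention in which queries from one group attend over another group. This mapping is an illustrative reading, not the application's stated method; the groups are hypothetical data.

```python
import math

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention.

    Each query is compared with every key; the softmax of the scaled scores gives
    influence weights, and the output is the weighted sum of the values.
    """
    d = len(keys[0])
    outputs, weight_rows = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        weight_rows.append(weights)
    return outputs, weight_rows

# Two hypothetical groups of 2-D data units.
group_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
group_b = [[2.0, 0.0], [0.0, 2.0]]

# Intra-group (SAM-like): queries, keys and values all come from group_a.
self_out, self_w = attention(group_a, group_a, group_a)
# Inter-group (RAM-like): group_a queries attend over group_b's keys and values.
cross_out, cross_w = attention(group_a, group_b, group_b)
print([round(w, 3) for w in cross_w[0]])  # how the first unit of group_a weights group_b
```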
The method for training a neural network model in the embodiments of this application can be applied to an intelligent terminal that includes the system for training a neural network model of these embodiments. A user provides training samples as needed, and the system outputs a trained neural network model based on those training samples and the training target specified by the user.
For example, the intelligent terminal may be mobile or fixed: it may be a mobile phone with an image processing function, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a camcorder, a smart watch, a wearable device (WD), an autonomous vehicle, or the like. This is not limited in the embodiments of this application.
It should be understood that the above description of application scenarios is illustrative and does not limit the application scenarios of this application in any way.
Transfer learning transfers knowledge from one domain (the source domain) to another (the target domain) so that learning in the target domain achieves better results. Typically the source domain has ample data while the target domain has little, and transfer learning carries the knowledge learned with ample data over to the new, data-scarce setting. In the prior art, transfer learning continues training the source-domain model with target-domain and source-domain data to obtain the target-domain model. In practice, however, Create ML technology obtains only the transfer features of the target domain and not the original deep features, so the model's ability to fit the target domain is insufficient and the new model obtained through transfer learning may have low accuracy.
Therefore, the embodiments of this application provide a method and an apparatus for training a model. By jointly learning deep features and transfer features, they address the large sample size, high computing-power requirement, and long training time of conventional deep learning, as well as the insufficient ability of transfer learning to fit the target domain. As a result, a neural network model with higher accuracy can be trained with fewer samples, less computing power, and a shorter training time.
Fig. 1 is a schematic block diagram of a system architecture for training a neural network model according to an embodiment of this application. As shown in fig. 1, the system architecture includes a model creation part consisting of a domain adaptation algorithm module, an early-stop algorithm module, a data enhancement technique module, a sample comprehensive evaluation module, a network structure storage table module, and a model training module. The domain adaptation algorithm module handles the transfer learning part; the early-stop algorithm module prevents the model from overfitting; the data enhancement technique module increases the number of samples and improves the generalization capability of the neural network model; the sample comprehensive evaluation module and the network structure storage table module together obtain the small deep neural network model; and the model training module performs joint training of the pre-training model and the small deep neural network model.
Fig. 2 is a schematic flowchart of a method for training a neural network model according to an embodiment of this application. The method includes steps S201 to S206, which are described in detail below.
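The early-stop algorithm module is described only by its purpose. A common form of early stopping, assumed here for illustration, watches the validation loss after each epoch and stops once it has failed to improve for a fixed number of epochs (the "patience"). The function and parameter names below are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once the loss has not improved by more than min_delta for
    `patience` consecutive epochs; if that never happens, it runs to the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1

history = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.50]
print(early_stopping_epoch(history, patience=3))  # (2, 5)
```

In practice the weights saved at `best_epoch` are the ones kept, which is what guards against overfitting to the training samples.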
S201, a first training sample is obtained.
The first training sample is used to train the target neural network model. Specifically, the method for training a neural network model according to the embodiments of this application may be used in classification scenarios such as image classification, text classification, and sound classification; accordingly, the first training sample may be an image, text, or sound training sample.
S202, a first feature vector of the first training sample is obtained according to the first training sample.
Specifically, the first feature vector is obtained from the feature attributes of the first training sample, which may include the data distribution, the number of classes, the number of samples, and the like. The first feature vector corresponds to the first training sample.
Optionally, before the first feature vector is obtained, the method further includes performing data enhancement on the first training sample. Taking an image training sample as an example, data enhancement includes rotation, translation, scaling, random occlusion, and color adjustment of the image, which enriches the diversity of the training samples.
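The source lists the attributes that feed the first feature vector but not how they are encoded. As a hypothetical sketch, the vector below combines the number of classes, the log of the sample count, and a normalized entropy that measures how evenly the samples are spread over the classes (1.0 means perfectly balanced). Any real encoding may differ.

```python
import math
from collections import Counter

def sample_feature_vector(labels):
    """Hypothetical feature vector for a labelled training set:
    [number of classes, log10(number of samples), normalized class-distribution entropy].
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    balance = entropy / math.log(k) if k > 1 else 0.0  # 1.0 = perfectly balanced classes
    return [float(k), math.log10(n), balance]

labels = ["cat"] * 50 + ["dog"] * 50 + ["bird"] * 100
print(sample_feature_vector(labels))  # 3 classes, 200 samples, slightly imbalanced
```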
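A minimal sketch of three of the listed image augmentations (rotation, translation, and random occlusion), assuming a grayscale image stored as a list of pixel rows. Real pipelines would use an image library; this stdlib version only shows what each transform does to the pixel grid. The function names are hypothetical.

```python
import random

def rotate90(image):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def translate(image, dx, fill=0):
    """Shift an image horizontally by dx pixels (positive = right), padding with `fill`."""
    width = len(image[0])
    dx = max(-width, min(width, dx))  # clamp so every row keeps its width
    if dx >= 0:
        return [[fill] * dx + row[: width - dx] for row in image]
    return [row[-dx:] + [fill] * (-dx) for row in image]

def random_occlusion(image, size, rng, fill=0):
    """Blank out a randomly placed size x size square (random occlusion)."""
    h, w = len(image), len(image[0])
    top, left = rng.randrange(h - size + 1), rng.randrange(w - size + 1)
    out = [list(row) for row in image]  # copy so the original sample is untouched
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = fill
    return out

img = [[1, 2, 3],
       [4, 5, 6]]
print(rotate90(img))        # [[4, 1], [5, 2], [6, 3]]
print(translate(img, 1))    # [[0, 1, 2], [0, 4, 5]]
print(random_occlusion(img, 2, random.Random(0)))
```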
S203, a first deep neural network model is acquired according to the first feature vector.
First, a network structure storage table is obtained. The table contains a plurality of deep neural network models and a plurality of feature vectors. Each deep neural network model is trained on one of a plurality of training sets, and each feature vector is computed from the training samples of one training set. Because each training set corresponds to exactly one model and one feature vector, each deep neural network model corresponds to one feature vector.
A second feature vector, which is one of the feature vectors in the network structure storage table, is then determined from the first feature vector. Specifically, the distance between the first feature vector and each feature vector in the table is calculated, and either a feature vector whose distance from the first feature vector is smaller than a first threshold (which may be set manually) or the feature vector with the smallest distance from the first feature vector is taken as the second feature vector.
Finally, the deep neural network model corresponding to the second feature vector is obtained and used as the first deep neural network model.
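The lookup in S203 can be sketched as a nearest-neighbour search over the storage table. The source does not specify the distance metric, so Euclidean distance is an assumption, as are the table contents and model identifiers below. When a threshold is given, this sketch picks the closest of the entries that fall under it; in practice the vector components would likely be normalized first so that no single attribute dominates the distance.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_model(query_vector, storage_table, threshold=None):
    """Return (model_id, distance) of the table entry closest to query_vector.

    storage_table: list of (feature_vector, model_id) pairs, one per training set.
    threshold: if given, only entries closer than it qualify; (None, inf) if none do.
    """
    best_id, best_dist = None, math.inf
    for vector, model_id in storage_table:
        dist = euclidean(query_vector, vector)
        if threshold is not None and dist >= threshold:
            continue  # not within the first threshold
        if dist < best_dist:
            best_id, best_dist = model_id, dist
    return best_id, best_dist

# Hypothetical table: feature vector -> identifier of a stored deep network.
table = [
    ([3.0, 2.0, 0.9], "structure_1"),
    ([10.0, 3.5, 0.6], "structure_2"),
    ([5.0, 2.7, 0.95], "structure_3"),
]
print(select_model([3.0, 2.0, 1.0], table))                  # nearest entry overall
print(select_model([3.0, 2.0, 1.0], table, threshold=0.05))  # (None, inf): nothing close enough
```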
S204, a pre-training model is acquired according to the first training sample.
The pre-training model may be an existing mature model such as MobileNetV2, which is trained on millions of pictures and generalizes well. Optionally, the pre-training model may also be a multi-task learning model.
In the method of the embodiments of this application, the pre-trained neural network model may be fine-tuned on the first training sample. One fine-tuning approach is to freeze the parameters of a certain number of front layers of the network structure and update only the parameters of the remaining rear layers and the fully connected layer, where the number of frozen layers may be a preset value.
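The freezing strategy amounts to marking the first N layers as non-trainable. The framework-free sketch below uses a hypothetical `Layer` record with a `trainable` flag and hypothetical layer names; it is not any library's API.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    trainable: bool = True

def freeze_front_layers(layers, num_frozen):
    """Freeze the first num_frozen layers; the remaining rear layers and the final
    fully connected layer stay trainable and are updated during fine-tuning."""
    for index, layer in enumerate(layers):
        layer.trainable = index >= num_frozen
    return [layer.name for layer in layers if layer.trainable]

backbone = [Layer(f"block_{i}") for i in range(1, 6)] + [Layer("fc")]
print(freeze_front_layers(backbone, num_frozen=3))  # ['block_4', 'block_5', 'fc']
```

In Keras the equivalent switch is the `trainable` attribute of each layer; in PyTorch it is setting `requires_grad = False` on the parameters of the frozen layers.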
S205, the first deep neural network model and the pre-training model are combined to obtain a combined neural network model.
The two models may be combined using any model-combination approach known in the prior art.
S206, the combined neural network model is trained on the first training sample to obtain the target neural network model.
Training a neural network model with the method shown in fig. 2 combines transfer learning with deep learning. The transfer learning part addresses the large number of training samples, the high computing-power requirement, and the long training time, while the deep learning part addresses the insufficient ability of a transfer-learned model to fit the training samples of the target domain. Compared with existing training methods, a neural network model trained with the method of the embodiments of this application achieves a better balance among sample size, computing power, training time, and training accuracy.
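The source leaves the combination method open. One common option, assumed here purely for illustration, is to run both branches on the same input, concatenate their feature outputs, and feed the result to a shared classification head. The two branch functions below are hypothetical stand-ins for real networks.

```python
def pretrained_branch(x):
    """Hypothetical stand-in for the fine-tuned pre-training model's feature extractor."""
    return [sum(x) / len(x), max(x)]

def small_dnn_branch(x):
    """Hypothetical stand-in for the small deep network selected from the storage table."""
    return [min(x)]

def combined_model(x, head_weights, head_bias):
    """Concatenate both branches' features and apply a shared linear classification head."""
    features = pretrained_branch(x) + small_dnn_branch(x)  # list concatenation: 2 + 1 features
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(head_weights, head_bias)]

# Two output classes; each weight row has one entry per concatenated feature.
head_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, -1.0]]
head_bias = [0.0, 0.0]
print(combined_model([1.0, 2.0, 3.0], head_weights, head_bias))  # [2.0, 2.0]
```

During S206 the head (and any unfrozen layers in both branches) would be trained together, so the classifier learns to use the transfer features and the deep features jointly.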
Fig. 3 is a schematic flowchart of an object classification method according to an embodiment of this application. As shown in fig. 3, the method includes steps S301 and S302, which are described below.
S301, object classification data is obtained.
The object classification method of the embodiments of this application can be used in classification scenarios such as image, text, and sound classification, so the object classification data may be image, text, or sound data.
S302, processing the object classification data according to the object classification neural network model to obtain an object classification result.
The object classification neural network model is trained in advance, and the training of the object classification neural network model comprises the following steps: obtaining a first training sample; acquiring a first feature vector of a first training sample according to the first training sample; acquiring a first deep neural network model according to the first feature vector; acquiring a pre-training model according to the first training sample; combining the first deep neural network model and the pre-training model to obtain a combined neural network model; and training the combined neural network model according to the first training sample to obtain an object classification neural network model.
It should be understood that the object classification neural network model is trained with the method for training a neural network model shown in fig. 2. For details, refer to the description of fig. 2 above; for brevity, it is not repeated here.
The following describes in detail, with reference to fig. 4, how the method for training a neural network model of the embodiments of this application is applied to image classification. As shown in fig. 4, the method can be divided into two processes: a neural network structure search process, which searches for the optimal neural network structure for each image classification task; and a joint training process, which finds a suitable small neural network model for the target-domain data set and trains it jointly with the pre-trained neural network model.
1. Network structure search process
In this process, image classification tasks of many categories are trained with automated machine learning (Auto ML) to obtain the optimal network structure for each classification task, and a network structure storage table is finally produced. Auto ML automates the end-to-end process of applying machine learning to real-world problems. Machine learning models are normally designed by teams of engineers and scientists, and manual design is very difficult because the search space of model components can be huge. Auto ML aims to automate machine learning, serving users without professional machine learning knowledge while also providing new tools for professional users. Auto ML takes data and a task (classification, regression, recommendation, and so on) as input and outputs a model that can predict unknown data. Every decision in this data-driven pipeline is a hyper-parameter, and the basic idea of Auto ML is to find good hyper-parameters in a relatively short time.
The network structure search process is completed on the server side and is transparent to the end user. The image classification tasks are common tasks built from open-source Internet data sets and manual labeling, covering hundreds of classification scenarios such as animal classification (further divided into poultry, wild animal, and bird classification), plant classification (further divided into flower, crop, and succulent classification), and clothing classification (further divided into hat, coat, and trouser classification). These are merely examples of image classification tasks and do not limit the embodiments of this application, which may include other image classification tasks in addition to those listed.
For each image classification task, a feature vector is calculated from the feature distribution, the number of classes, the number of samples, and the like, and is recorded as a standard sample vector. Each standard sample vector corresponds to the network structure obtained by Auto ML training on that image classification task. This yields the network structure storage table shown below.
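Auto ML covers many search strategies (neural architecture search, Bayesian optimization, and others), and the source does not say which one is used. As an assumed, deliberately simple illustration of "finding good hyper-parameters in a relatively short time", the sketch below performs random search over a small hypothetical space, with a toy scoring function standing in for "train the candidate network and measure validation accuracy".

```python
import random

def random_search(search_space, evaluate, num_trials, seed=0):
    """Sample num_trials configurations uniformly from search_space and keep the best.

    search_space: dict mapping hyper-parameter name -> list of candidate values.
    evaluate: callable returning a validation score (higher is better) for a config.
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(num_trials):
        config = {name: rng.choice(values) for name, values in search_space.items()}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Hypothetical search space for a small classification network.
space = {"num_layers": [2, 4, 8], "width": [16, 32, 64], "learning_rate": [0.1, 0.01, 0.001]}

def toy_score(cfg):
    # Hypothetical stand-in for training a candidate and measuring validation accuracy.
    return -abs(cfg["num_layers"] - 4) - abs(cfg["width"] - 32) / 16 - abs(cfg["learning_rate"] - 0.01)

best, score = random_search(space, toy_score, num_trials=50)
print(best, score)
```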
TABLE 1

| Training target | Standard sample vector | Network architecture | Version |
|---|---|---|---|
| Animal classification | Vector A | Structure 1: conv + Relu + Conv | V1 |
| Plant classification | Vector B | Structure 2: FireModule 3 | V2 |
| Clothes classification | Vector C | Structure 3: … Block + … | V1 |
2. Joint training process
The joint training process of the method for training a neural network model in the embodiments of this application is performed in an integrated development environment (IDE). An IDE is an application that provides a program development environment and generally includes tools such as a code editor, a compiler, a debugger, and a graphical user interface. It integrates code writing, analysis, debugging, and other development services into one software suite; any software or software suite with these features can be called an integrated development environment. An IDE can run independently or together with other programs.
First, an image classification data set is selected. Taking garbage classification as an example, the user selects a garbage-classification training data set. The system performs data enhancement on the training samples in that data set, for example rotating, translating, scaling, randomly occluding, and color-adjusting the images.
Next, the training target and the training samples are modeled according to their data distribution, number of classes, number of samples, and the like to obtain a sample evaluation vector. The network structure storage table is queried, the distance between the sample evaluation vector and each standard sample vector in the table is calculated, and the network structure corresponding to the standard sample vector with the smallest distance is used as the small deep network structure.
The pre-training model and the small deep network structure are then combined for joint learning. The pre-training model is the neural network model for the transfer learning part; it may be a mature pre-trained model such as MobileNetV2, which generalizes well after training on millions of pictures. Optionally, the pre-training model may also be a multi-task learning model; this is not specifically limited in the embodiments of this application. In the embodiments of this application, the pre-trained neural network model can be fine-tuned on the image classification data set (for example, the garbage-classification data set): the parameters of a certain number of front layers of the network structure are frozen, and only the parameters of the remaining rear layers and the fully connected layer are updated, where the number of frozen layers may be preset manually. The system feeds the data into the pre-training model and the small deep network structure simultaneously (a dual input). Because the two models require different input picture shapes, each picture must be resized before it is input. After training, the jointly learned neural network model is obtained from the pre-training model and the small deep network structure.
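The dual input step can be sketched as resizing one picture to two different shapes, one per branch. The sizes are assumptions: 224 x 224 is a common input size for MobileNetV2, and 64 x 64 for the small network is purely hypothetical. Nearest-neighbour resizing on a list-of-rows image keeps the example dependency-free; a real pipeline would use an image library's resize with interpolation.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D image given as a list of rows."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def make_dual_input(image, pretrained_size=(224, 224), small_net_size=(64, 64)):
    """Resize one picture to the input shape each branch expects (sizes are assumptions)."""
    return resize_nearest(image, *pretrained_size), resize_nearest(image, *small_net_size)

picture = [[(r + c) % 256 for c in range(100)] for r in range(80)]  # hypothetical 80 x 100 image
big, small = make_dual_input(picture)
print(len(big), len(big[0]), len(small), len(small[0]))  # 224 224 64 64
```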
The following describes how the method for training a neural network model of the embodiments of this application is applied to text classification. As with image classification, the application can be divided into two processes: a neural network structure search process, which searches for the optimal neural network structure for text classification tasks of different languages and lengths; and a joint training process, which finds a suitable small neural network model for the target-domain data set and trains it jointly with the pre-trained neural network model.
1. Network structure search process
The network structure search process is completed on the server side and is transparent to the end user. The text classification tasks are common tasks built from open-source Internet data sets and manual labeling. Unlike image classification tasks, text classification tasks are distinguished mainly by language and text length, for example Chinese classification with an average length below 100 characters, mixed Chinese-English classification with an average length above 500 characters, and English classification with an average length between 100 and 500 characters. These are merely examples of text classification tasks and do not limit the embodiments of this application, which may include other text classification tasks in addition to those listed.
For each text classification task, a feature vector is calculated from the feature distribution, the number of classes, the number of samples, and the like, and is recorded as a standard sample vector. Each standard sample vector corresponds to the network structure obtained by Auto ML training on that text classification task. This yields the network structure storage table shown below.
TABLE 2
[Table 2 is available only as an image in the original publication.]
2. Joint training process
The joint training process of the method for training a neural network model is carried out in the IDE. First, a text classification data set is selected; taking a top-news data set as an example, the user selects a top-news training data set. The system performs data enhancement on the training samples in that data set, for example by increasing or reducing the number of characters.
Next, the training target and the training samples are modeled according to their data distribution, number of classes, number of samples, and the like to obtain a sample evaluation vector. The network structure storage table is queried, the distance between the sample evaluation vector and each standard sample vector in the table is calculated, and the network structure corresponding to the standard sample vector with the smallest distance is used as the small deep network structure.
The pre-training model and the small deep network structure are then combined for joint training. The system feeds the data into the pre-training model and the small deep network structure simultaneously (a dual input). After training, the jointly learned neural network model is obtained from the pre-training model and the small deep network structure.
Existing IDE-based Create ML technology supports only transfer learning. The method for training a neural network model of the embodiments of this application can automatically design a small deep network structure and, by jointly training it with the pre-training model, obtain a neural network model with higher accuracy.
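Reading the text augmentation as "increasing or reducing the number of characters", a minimal sketch is random character deletion plus random character insertion. Both operations, their parameters, and the sample headline are assumptions for illustration; the text must be non-empty.

```python
import random

def random_char_deletion(text, p, rng):
    """Drop each character independently with probability p; never return an empty string."""
    kept = [ch for ch in text if rng.random() >= p]
    return "".join(kept) if kept else rng.choice(text)

def random_char_insertion(text, n, rng):
    """Insert n characters, each copied from a random position in the text, at random positions."""
    chars = list(text)
    for _ in range(n):
        chars.insert(rng.randrange(len(chars) + 1), rng.choice(text))
    return "".join(chars)

rng = random.Random(42)
headline = "Local team wins the championship"  # hypothetical training sample
print(random_char_deletion(headline, 0.1, rng))   # shorter variant
print(random_char_insertion(headline, 3, rng))    # longer variant
```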
The method for training a neural network model and the object classification method of the embodiments of this application have been described in detail above with reference to the accompanying drawings. The apparatus for training a neural network model and the object classification apparatus of the embodiments are described in detail below with reference to the accompanying drawings. It should be understood that these apparatuses can perform the steps of the corresponding methods; to avoid unnecessary repetition, repeated descriptions are omitted where appropriate when the apparatuses are introduced.
Fig. 5 is a schematic block diagram of an apparatus for training a neural network model according to an embodiment of this application. As shown in fig. 5, the apparatus includes an obtaining unit 510 and a processing unit 520, which are described below.
The obtaining unit 510 is configured to obtain a first training sample.
The processing unit 520 is configured to obtain a first feature vector of the first training sample according to the first training sample.
The processing unit 520 is further configured to obtain a first deep neural network model according to the first feature vector.
The processing unit 520 is further configured to obtain a pre-training model according to the first training sample.
The processing unit 520 is further configured to combine the first deep neural network model and the pre-training model to obtain a combined neural network model.
The processing unit 520 is further configured to train the combined neural network model according to the first training sample to obtain a target neural network model.
Optionally, the processing unit 520 obtaining the first deep neural network model according to the first feature vector includes: obtaining a network structure storage table, where the table includes a plurality of deep neural network models and a plurality of feature vectors in one-to-one correspondence; determining a second feature vector according to the first feature vector, where the second feature vector is one of the feature vectors in the network structure storage table; and obtaining the deep neural network model corresponding to the second feature vector.
Optionally, when the processing unit 520 determines the second feature vector according to the first feature vector, the distance between the second feature vector and the first feature vector is smaller than a first threshold.
Fig. 6 shows a schematic block diagram of an object classification apparatus according to an embodiment of the present application, and as shown in fig. 6, the object classification apparatus includes an obtaining unit 610 and a processing unit 620, which are introduced below.
The obtaining unit 610 is configured to obtain object classification data.
The processing unit 620 is configured to process the object classification data with the object classification neural network model to obtain an object classification result.
The object classification neural network model is trained as follows: obtaining a first training sample; obtaining a first feature vector of the first training sample according to the first training sample; obtaining a first deep neural network model according to the first feature vector; obtaining a pre-training model according to the first training sample; combining the first deep neural network model and the pre-training model to obtain a combined neural network model; and training the combined neural network model according to the first training sample to obtain the object classification neural network model.
Optionally, obtaining the first deep neural network model according to the first feature vector includes: obtaining a network structure storage table, where the table includes a plurality of deep neural network models and a plurality of feature vectors in one-to-one correspondence; determining a second feature vector according to the first feature vector, where the second feature vector is one of the feature vectors in the network structure storage table; and obtaining the deep neural network model corresponding to the second feature vector.
Optionally, when the second feature vector is determined according to the first feature vector, the distance between the second feature vector and the first feature vector is smaller than a first threshold.
Fig. 7 is a hardware structural diagram of an apparatus for training a neural network model according to an embodiment of the present application. The apparatus 700 shown in fig. 7 (which apparatus 700 may specifically be a computer device) includes a memory 710, a processor 720, a communication interface 730, and a bus 740. The memory 710, the processor 720 and the communication interface 730 are connected to each other through a bus 740.
The memory 710 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 710 may store a program, and the processor 720 is configured to perform the steps of the method for training a neural network model according to the embodiment of the present application when the program stored in the memory 710 is executed by the processor 720; for example, the various steps shown in fig. 2 are performed.
It should be understood that the apparatus for training the neural network model shown in the embodiment of the present application may be a server, for example, a server in a cloud, or may also be a chip configured in the server in the cloud.
The processor 720 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the method for training a neural network model of the embodiments of this application.
The processor 720 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the method for training a neural network model of this application may be performed by integrated logic circuits of hardware in the processor 720 or by instructions in the form of software.
The processor 720 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may reside in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, erasable programmable read-only memory, or registers. The storage medium is located in the memory 710; the processor 720 reads the information in the memory 710 and, in combination with its hardware, performs the functions of the units included in the apparatus shown in fig. 5 of the apparatus embodiments, or performs the method for training a neural network model shown in fig. 2 of the method embodiments.
The communication interface 730 uses a transceiver apparatus, for example but not limited to a transceiver, to implement communication between the apparatus 700 and other devices or communication networks.
Bus 740 may include a pathway to transfer information between various components of apparatus 700, such as memory 710, processor 720, and communication interface 730.
Fig. 8 is a schematic hardware structure diagram of an object classification apparatus according to an embodiment of the present application. The object classification apparatus 800 shown in fig. 8 (the object classification apparatus 800 may be specifically a computer device) includes a memory 810, a processor 820, a communication interface 830, and a bus 840. The memory 810, the processor 820 and the communication interface 830 are connected to each other through a bus 840.
The memory 810 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 810 may store a program; when the program stored in the memory 810 is executed by the processor 820, the processor 820 performs the steps of the object classification method according to the embodiments of the present application, for example, the steps shown in fig. 3.
It should be understood that the object classification device shown in the embodiment of the present application may be a server, for example, a server in a cloud, or may also be a chip configured in the server in the cloud.
For example, the processor 820 may employ a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU) or one or more integrated circuits, and execute related programs to implement the object classification method of the embodiment of the present application.
Illustratively, the processor 820 may also be an integrated circuit chip having signal processing capabilities. In an implementation process, the steps of the object classification method of the present application may be completed by an integrated logic circuit of hardware in the processor 820 or by instructions in the form of software.
The processor 820 may alternatively be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed with reference to the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), or a register. The storage medium is located in the memory 810; the processor 820 reads the information in the memory 810 and, in combination with its hardware, performs the functions required of the units included in the apparatus shown in fig. 6 in the embodiments of the present application, or performs the object classification method shown in fig. 3 in the method embodiments of the present application.
The communication interface 830 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 800 and other devices or communication networks.
The bus 840 may include a path for transferring information between the components of the apparatus 800 (for example, the memory 810, the processor 820, and the communication interface 830).
It should be noted that although the above-described apparatus 700 and apparatus 800 show only memories, processors, and communication interfaces, in particular implementations, those skilled in the art will appreciate that the apparatus 700 and apparatus 800 may also include other devices necessary to achieve proper operation. Also, those skilled in the art will appreciate that the apparatus 700 and apparatus 800 described above may also include hardware components to implement other additional functions, according to particular needs. Furthermore, those skilled in the art will appreciate that the apparatus 700 and apparatus 800 described above may also include only those components necessary to implement the embodiments of the present application, and need not include all of the components shown in fig. 7 or fig. 8.
Illustratively, an embodiment of the present application further provides a chip, which includes a transceiver unit and a processing unit. The transceiver unit may be an input/output circuit or a communication interface; the processing unit is a processor, a microprocessor, or an integrated circuit integrated on the chip. The chip can perform the method for training the neural network model in the foregoing method embodiments.
Illustratively, an embodiment of the present application further provides a chip, which includes a transceiver unit and a processing unit. The transceiver unit may be an input/output circuit or a communication interface; the processing unit is a processor, a microprocessor, or an integrated circuit integrated on the chip. The chip can perform the object classification method in the foregoing method embodiments.
Illustratively, the present application further provides a computer-readable storage medium, on which instructions are stored, and the instructions, when executed, perform the method for training a neural network model in the above method embodiments.
Illustratively, the present application further provides a computer-readable storage medium storing instructions that, when executed, perform the object classification method in the foregoing method embodiments.
Illustratively, the present application further provides a computer program product containing instructions, which when executed, perform the method for training a neural network model in the above method embodiments.
Illustratively, the present application further provides a computer program product containing instructions, which when executed, perform the object classification method in the above method embodiments.
It should be understood that the processor in the embodiments of the present application may be a Central Processing Unit (CPU), and the processor may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It should also be understood that the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
The foregoing embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the procedures or functions according to the embodiments of the present application are wholly or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (e.g., infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium, such as a solid-state drive.
It should be understood that the term "and/or" in this document describes only an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate that only A exists, that both A and B exist, or that only B exists, where A and B may be singular or plural. In addition, the character "/" in this document generally indicates an "or" relationship between the associated objects before and after it, but may also indicate an "and/or" relationship; the intended meaning can be determined from the context.
In the present application, "at least one" means one or more, and "a plurality of" means two or more. "At least one of the following" or a similar expression refers to any combination of the listed items, including a single item or any combination of multiple items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple.
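The counting in the two definitions above can be checked mechanically: "A and/or B" covers the three non-empty combinations of two items, and "at least one of a, b, or c" covers the seven non-empty combinations of three items. The following Python sketch is only an illustration and is not part of the application; the helper name `non_empty_combinations` is hypothetical, and it enumerates the combinations with the standard `itertools.combinations`.

```python
from itertools import combinations


def non_empty_combinations(items):
    """Return every non-empty combination of `items`, smallest first.

    "At least one of" the items corresponds to choosing any non-empty
    subset; within each combination, items keep their input order.
    """
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            result.append("-".join(combo))
    return result


# "A and/or B": only A, only B, or both A and B.
print(non_empty_combinations(["A", "B"]))       # ['A', 'B', 'A-B']
# "at least one of a, b, or c": seven cases.
print(non_empty_combinations(["a", "b", "c"]))  # ['a', 'b', 'c', 'a-b', 'a-c', 'b-c', 'a-b-c']
```

For n items the list always has 2^n - 1 entries, which is why the "at least one of a, b, or c" example above lists exactly seven possibilities.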
It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an execution order. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or some of the steps of the methods according to the embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A method of training a neural network model, comprising:
obtaining a first training sample;
acquiring a first feature vector of the first training sample according to the first training sample;
acquiring a first deep neural network model according to the first feature vector;
acquiring a pre-training model according to the first training sample;
combining the first deep neural network model and the pre-training model to obtain a combined neural network model;
and training the combined neural network model according to the first training sample to obtain a target neural network model.
2. The method of claim 1, wherein acquiring the first deep neural network model according to the first feature vector comprises:
acquiring a network structure storage table, wherein the network structure storage table comprises a plurality of deep neural network models and a plurality of feature vectors, and the deep neural network models correspond to the feature vectors one to one;
determining a second feature vector according to the first feature vector, the second feature vector being one of the plurality of feature vectors in the network structure storage table;
and obtaining a deep neural network model corresponding to the second feature vector according to the second feature vector.
3. The method of claim 2, wherein determining the second feature vector according to the first feature vector comprises:
the distance between the second feature vector and the first feature vector is smaller than a first threshold value.
4. An object classification method, wherein the object includes an image and/or text, comprising:
acquiring object classification data;
processing the object classification data according to an object classification neural network model to obtain an object classification result,
wherein the training of the object classification neural network model comprises:
obtaining a first training sample;
acquiring a first feature vector of the first training sample according to the first training sample;
acquiring a first deep neural network model according to the first feature vector;
acquiring a pre-training model according to the first training sample;
combining the first deep neural network model and the pre-training model to obtain a combined neural network model;
and training the combined neural network model according to the first training sample to obtain an object classification neural network model.
5. The method of claim 4, wherein acquiring the first deep neural network model according to the first feature vector comprises:
acquiring a network structure storage table, wherein the network structure storage table comprises a plurality of deep neural network models and a plurality of feature vectors, and the deep neural network models correspond to the feature vectors one to one;
determining a second feature vector according to the first feature vector, the second feature vector being one of the plurality of feature vectors in the network structure storage table;
and obtaining a deep neural network model corresponding to the second feature vector according to the second feature vector.
6. The method of claim 5, wherein determining the second feature vector according to the first feature vector comprises:
the distance between the second feature vector and the first feature vector is smaller than a first threshold value.
7. An apparatus for training a neural network model, comprising:
an acquisition unit for acquiring a first training sample;
a processing unit, configured to acquire a first feature vector of the first training sample according to the first training sample;
the processing unit is further configured to obtain a first deep neural network model according to the first feature vector;
the processing unit is further configured to obtain a pre-training model according to the first training sample;
the processing unit is further configured to combine the first deep neural network model and the pre-training model to obtain a combined neural network model;
the processing unit is further configured to train the combined neural network model according to the first training sample to obtain a target neural network model.
8. The apparatus of claim 7, wherein obtaining, by the processing unit, the first deep neural network model according to the first feature vector comprises:
acquiring a network structure storage table, wherein the network structure storage table comprises a plurality of deep neural network models and a plurality of feature vectors, and the deep neural network models correspond to the feature vectors one to one;
determining a second feature vector according to the first feature vector, the second feature vector being one of the plurality of feature vectors in the network structure storage table;
and obtaining a deep neural network model corresponding to the second feature vector according to the second feature vector.
9. The apparatus of claim 8, wherein determining, by the processing unit, the second feature vector according to the first feature vector comprises:
the distance between the second feature vector and the first feature vector is smaller than a first threshold value.
10. An object classification apparatus, wherein the object includes an image and/or text, comprising:
an acquisition unit configured to acquire object classification data;
a processing unit for processing the object classification data according to an object classification neural network model to obtain an object classification result,
wherein the training of the object classification neural network model comprises:
obtaining a first training sample;
acquiring a first feature vector of the first training sample according to the first training sample;
acquiring a first deep neural network model according to the first feature vector;
acquiring a pre-training model according to the first training sample;
combining the first deep neural network model and the pre-training model to obtain a combined neural network model;
and training the combined neural network model according to the first training sample to obtain an object classification neural network model.
11. The apparatus of claim 10, wherein acquiring the first deep neural network model according to the first feature vector comprises:
acquiring a network structure storage table, wherein the network structure storage table comprises a plurality of deep neural network models and a plurality of feature vectors, and the deep neural network models correspond to the feature vectors one to one;
determining a second feature vector according to the first feature vector, the second feature vector being one of the plurality of feature vectors in the network structure storage table;
and obtaining a deep neural network model corresponding to the second feature vector according to the second feature vector.
12. The apparatus of claim 11, wherein determining the second feature vector according to the first feature vector comprises:
the distance between the second feature vector and the first feature vector is smaller than a first threshold value.
13. An apparatus for training a neural network model, comprising a processor and a memory, the memory for storing program instructions, the processor for invoking the program instructions to perform the method of any one of claims 1-3 or 4-6.
14. A computer-readable storage medium, in which program instructions are stored, which, when executed by a processor, implement the method of any one of claims 1 to 3 or 4 to 6.
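Claims 2 and 3 (and their counterparts 5-6, 8-9, and 11-12) select the first deep neural network model by looking up a network structure storage table whose entries pair feature vectors one-to-one with deep neural network models, accepting a stored (second) feature vector whose distance to the first feature vector is below a first threshold. The Python sketch below illustrates only that lookup step, under assumptions that are not in the claims: the distance is taken to be Euclidean, models are represented by hypothetical string labels, and when several stored vectors fall under the threshold the nearest one is chosen. How the first feature vector is derived from the training sample, and how the looked-up model is combined with the pre-training model, are not described in this section, so the sketch leaves both out.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class TableEntry:
    """One row of the network structure storage table (claim 2).

    `architecture` is a hypothetical label standing in for a stored
    deep neural network model.
    """
    feature_vector: List[float]
    architecture: str


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    # Assumption: the claims only say "distance"; Euclidean is one choice.
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def lookup_architecture(
    table: List[TableEntry],
    first_vector: Sequence[float],
    threshold: float,
) -> Optional[str]:
    """Return the model whose stored (second) feature vector lies within
    `threshold` of `first_vector` (claim 3), preferring the nearest one.

    Returns None when no stored feature vector is close enough.
    """
    best: Optional[TableEntry] = None
    best_dist = math.inf
    for entry in table:
        dist = euclidean_distance(entry.feature_vector, first_vector)
        if dist < threshold and dist < best_dist:
            best, best_dist = entry, dist
    return best.architecture if best is not None else None


table = [
    TableEntry([0.1, 0.9], "cnn-small"),
    TableEntry([0.8, 0.2], "cnn-wide"),
]
print(lookup_architecture(table, [0.15, 0.85], 0.2))  # cnn-small
print(lookup_architecture(table, [0.5, 0.5], 0.2))    # None
```

In this example the query [0.15, 0.85] is about 0.07 from the first row, so "cnn-small" is returned, while [0.5, 0.5] is more than 0.2 from both rows and the lookup returns None. Picking the nearest qualifying entry is only one reasonable tie-breaking rule; the claims accept any stored vector under the threshold.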
CN202010755919.7A 2020-07-31 2020-07-31 Method and device for training neural network model Pending CN114065901A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010755919.7A CN114065901A (en) 2020-07-31 2020-07-31 Method and device for training neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010755919.7A CN114065901A (en) 2020-07-31 2020-07-31 Method and device for training neural network model

Publications (1)

Publication Number Publication Date
CN114065901A CN114065901A (en) 2022-02-18

Family

ID=80227327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010755919.7A Pending CN114065901A (en) 2020-07-31 2020-07-31 Method and device for training neural network model

Country Status (1)

Country Link
CN (1) CN114065901A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926698A (en) * 2022-07-19 2022-08-19 深圳市南方硅谷半导体股份有限公司 Image classification method for neural network architecture search based on evolutionary game theory

Similar Documents

Publication Publication Date Title
US20220092351A1 (en) Image classification method, neural network training method, and apparatus
Springenberg et al. Improving deep neural networks with probabilistic maxout units
CN111507378A (en) Method and apparatus for training image processing model
CN111353076A (en) Method for training cross-modal retrieval model, cross-modal retrieval method and related device
US20230095606A1 (en) Method for training classifier, and data processing method, system, and device
WO2022052601A1 (en) Neural network model training method, and image processing method and device
CN112639828A (en) Data processing method, method and equipment for training neural network model
CN112529146B (en) Neural network model training method and device
CN111882031A (en) Neural network distillation method and device
CN112580369B (en) Sentence paraphrasing method, and method and device for training sentence paraphrasing model
CN113570029A (en) Method for obtaining neural network model, image processing method and device
CN111340190A (en) Method and device for constructing network structure, and image generation method and device
CN113592060A (en) Neural network optimization method and device
CN111797882A (en) Image classification method and device
US20240078428A1 (en) Neural network model training method, data processing method, and apparatus
CN111797881A (en) Image classification method and device
CN111695673A (en) Method for training neural network predictor, image processing method and device
CN114091554A (en) Training set processing method and device
CN115879508A (en) Data processing method and related device
WO2021136058A1 (en) Video processing method and device
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN113407820A (en) Model training method, related system and storage medium
CN114065901A (en) Method and device for training neural network model
CN114254686A (en) Method and device for identifying adversarial sample
WO2023273934A1 (en) Method for selecting hyper-parameter of model, and related apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination