CN115130539A - Classification model training method, data classification device and computer equipment - Google Patents


Info

Publication number
CN115130539A
Authority
CN
China
Prior art keywords
target
classification model
training
training sample
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210421701.7A
Other languages
Chinese (zh)
Inventor
许剑清
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210421701.7A
Publication of CN115130539A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/08 — Learning methods
    • G06N 3/084 — Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a classification model training method, a data classification method, an apparatus, a computer device, a storage medium, and a computer program product, and relates to artificial intelligence technology. The method comprises the following steps: determining at least two reference classification models from a first target classification model and a second target classification model set obtained by training based on a training sample set, where the second target classification model set comprises second target classification models that are aligned in feature space with the first target classification model; inputting the same training sample into each reference classification model to obtain reference sample features, and fusing the obtained reference sample features to obtain a target sample feature; and inputting the training samples into a third initial classification model to obtain training sample features, and adjusting model parameters of the third initial classification model based on the target sample features and the training sample features corresponding to the same training sample until a target convergence condition is met, to obtain a third target classification model. By adopting the method, the classification accuracy of the model can be improved.

Description

Classification model training method, data classification device and computer equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a classification model training method, a data classification method, an apparatus, a computer device, a storage medium, and a computer program product.
Background
With the development of computer technology, machine learning has emerged. Machine learning is a multi-disciplinary subject spanning probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other fields. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures, so as to continuously improve its performance.
In conventional technology, supervised training is usually performed directly on the classification model to be trained based on training samples to obtain a trained classification model. However, the knowledge that can be learned by training on the training samples alone is limited, which easily results in low classification accuracy of the trained model.
Disclosure of Invention
In view of the above, it is necessary to provide a classification model training method, a data classification method, an apparatus, a computer device, a computer readable storage medium, and a computer program product, which can improve the classification accuracy of a model.
A classification model training method. The method comprises the following steps:
acquiring a training sample set; the training sample set comprises training samples and training labels corresponding to the training samples;
determining at least two reference classification models from a first target classification model and a second target classification model set obtained by training based on the training sample set; in a second target classification model included in the second target classification model set, class center features corresponding to various training labels are obtained from the first target classification model, and the class center features are used for representing position information of the training labels in a feature space;
inputting the training sample set into each reference classification model to obtain reference sample characteristics corresponding to each training sample, and fusing the reference sample characteristics corresponding to the same training sample to obtain target sample characteristics corresponding to each training sample;
and inputting the training sample set into a third initial classification model to obtain training sample characteristics corresponding to each training sample, and adjusting model parameters of the third initial classification model based on the target sample characteristics and the training sample characteristics corresponding to the same training sample until a target convergence condition is met to obtain a third target classification model.
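The four steps above can be sketched end to end. The patent does not specify the model architectures, the fusion operator, or the convergence condition, so the following is only a minimal NumPy sketch under stated assumptions: the reference models and the third (student) model are plain linear feature extractors, fusion is an element-wise mean, the training objective is an MSE feature-matching loss, and a fixed iteration count stands in for the unspecified target convergence condition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training sample set: 64 samples, 8 raw dimensions
# (stand-ins for images, text, speech, etc.).
X = rng.normal(size=(64, 8))

# Two frozen reference classification models, modeled here as fixed
# linear feature extractors into a shared 4-dimensional feature space.
W_ref1 = rng.normal(size=(8, 4))
W_ref2 = rng.normal(size=(8, 4))

# Target sample features: fuse (average) the reference sample features
# that correspond to the same training sample.
target_feats = (X @ W_ref1 + X @ W_ref2) / 2.0

# Third initial classification model, also linear; its parameters are
# adjusted so its training sample features match the target features.
W_student = rng.normal(size=(8, 4)) * 0.1
lr = 0.1
for step in range(500):                     # fixed budget as the "convergence condition"
    student_feats = X @ W_student
    diff = student_feats - target_feats
    loss = (diff ** 2).mean()               # MSE feature-matching loss
    grad = 2.0 * X.T @ diff / diff.size     # dLoss/dW_student
    W_student -= lr * grad

print(f"final feature-matching loss: {loss:.6f}")
```

After training, `W_student` plays the role of the third target classification model's feature extractor; any concrete implementation would of course use deep networks and mini-batch optimization instead.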
A classification model training device. The device comprises:
the training sample set acquisition module is used for acquiring a training sample set; the training sample set comprises training samples and training labels corresponding to the training samples;
the reference classification model determining module is used for determining at least two reference classification models from a first target classification model and a second target classification model set which are obtained through training based on the training sample set; in a second target classification model included in the second target classification model set, class center features corresponding to various training labels are obtained from the first target classification model, and the class center features are used for representing position information of the training labels in a feature space;
the characteristic fusion module is used for inputting the training sample set into each reference classification model to obtain reference sample characteristics corresponding to each training sample, and fusing the reference sample characteristics corresponding to the same training sample to obtain target sample characteristics corresponding to each training sample;
and the third target classification model determining module is used for inputting the training sample set into a third initial classification model to obtain training sample characteristics corresponding to each training sample, and adjusting the model parameters of the third initial classification model based on the target sample characteristics and the training sample characteristics corresponding to the same training sample until a target convergence condition is met to obtain a third target classification model.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the above classification model training method when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned classification model training method.
A computer program product comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned classification model training method.
A data classification method. The method comprises the following steps:
acquiring data to be classified, and inputting the data to be classified into a third target classification model to obtain target characteristics corresponding to the data to be classified;
determining a classification result corresponding to the data to be classified based on the target features;
the training process of the third target classification model is as follows:
acquiring a training sample set; the training sample set comprises training samples and training labels corresponding to the training samples;
determining at least two reference classification models from a first target classification model and a second target classification model set obtained by training based on the training sample set; in a second target classification model included in the second target classification model set, class center features respectively corresponding to various training labels are obtained from the first target classification model, and the class center features are used for representing position information of the training labels in a feature space;
inputting the training sample set into each reference classification model to obtain reference sample characteristics corresponding to each training sample, and fusing the reference sample characteristics corresponding to the same training sample to obtain target sample characteristics corresponding to each training sample;
and inputting the training sample set into a third initial classification model to obtain training sample characteristics corresponding to each training sample, and adjusting model parameters of the third initial classification model based on the target sample characteristics and the training sample characteristics corresponding to the same training sample until a target convergence condition is met to obtain a third target classification model.
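The two inference steps of the data classification method (extracting target features, then deciding a classification result from them) can be sketched as follows. The patent does not say how the classification result is derived from the target features; the sketch assumes one common choice, nearest class-center by cosine similarity, and uses random stand-in weights, so every concrete value here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assume the trained third target classification model exposes a feature
# extractor (a linear map in this toy sketch) and a class-center matrix
# with one L2-normalized row per training label.
num_classes, feat_dim, raw_dim = 3, 4, 8
W_student = rng.normal(size=(raw_dim, feat_dim))
class_centers = rng.normal(size=(num_classes, feat_dim))
class_centers /= np.linalg.norm(class_centers, axis=1, keepdims=True)

def classify(x):
    """Extract target features for the data to be classified, then pick
    the label whose class-center feature is most cosine-similar."""
    feats = x @ W_student
    feats = feats / np.linalg.norm(feats)
    scores = class_centers @ feats          # cosine similarities
    return int(np.argmax(scores)), scores

sample = rng.normal(size=raw_dim)           # "data to be classified"
label, scores = classify(sample)
print(f"predicted class: {label}")
```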
A data classification device. The device comprises:
the data acquisition module is used for acquiring data to be classified and inputting the data to be classified into a third target classification model to obtain target characteristics corresponding to the data to be classified;
the classification result determining module is used for determining a classification result corresponding to the data to be classified based on the target characteristics;
the training process of the third target classification model is as follows:
acquiring a training sample set; the training sample set comprises training samples and training labels corresponding to the training samples;
determining at least two reference classification models from a first target classification model and a second target classification model set obtained by training based on the training sample set; in a second target classification model included in the second target classification model set, class center features corresponding to various training labels are obtained from the first target classification model, and the class center features are used for representing position information of the training labels in a feature space;
inputting the training sample set into each reference classification model to obtain reference sample characteristics corresponding to each training sample, and fusing the reference sample characteristics corresponding to the same training sample to obtain target sample characteristics corresponding to each training sample;
and inputting the training sample set into a third initial classification model to obtain training sample characteristics corresponding to each training sample, and adjusting model parameters of the third initial classification model based on the target sample characteristics and the training sample characteristics corresponding to the same training sample until a target convergence condition is met to obtain a third target classification model.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the data classification method described above when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the data classification method as described above.
A computer program product comprising a computer program which, when executed by a processor, carries out the steps of the data classification method described above.
In the above classification model training and data classification methods, apparatuses, computer devices, storage media, and computer program products, a training sample set is obtained, the training sample set comprising training samples and training labels corresponding to the training samples. At least two reference classification models are determined from a first target classification model and a second target classification model set obtained by training based on the training sample set, where the class center features respectively corresponding to the various training labels in the second target classification models are obtained from the first target classification model, the class center features being used to represent position information of the training labels in a feature space. The training sample set is input into each reference classification model to obtain reference sample features corresponding to each training sample, and the reference sample features corresponding to the same training sample are fused to obtain target sample features corresponding to each training sample. The training sample set is then input into a third initial classification model to obtain training sample features corresponding to each training sample, and model parameters of the third initial classification model are adjusted based on the target sample features and training sample features corresponding to the same training sample until a target convergence condition is met, to obtain a third target classification model.
In this way, because the class center features represent the positions of the training labels in the feature space and each class center feature in the second target classification model is obtained from the first target classification model, the second target classification model and the first target classification model are aligned in the feature space, and their feature space distributions are consistent. Each reference classification model determined from the first target classification model and the second target classification model set is therefore also aligned in the feature space; reference classification models aligned in the feature space are complementary models, and the features they extract are complementary. Furthermore, feature fusion is carried out by combining a plurality of reference classification models, and a third initial classification model is trained using the fused target sample features, so that the third initial classification model can learn the more accurate feature distribution of the fused features. The trained third target classification model can then extract more accurate data features from input data, and more accurate classification results can be obtained from those features, thereby improving the classification accuracy of the model and of the data.
Drawings
FIG. 1 is a diagram of an exemplary implementation of a classification model training method and a data classification method;
FIG. 2 is a schematic flow chart diagram illustrating a classification model training method according to an embodiment;
FIG. 3 is a schematic illustration of feature fusion in one embodiment;
FIG. 4 is a schematic flow chart illustrating a process of training a first target classification model and a second target classification model according to an embodiment;
FIG. 5A is a schematic flow chart diagram illustrating the training of the first target classification model in one embodiment;
FIG. 5B is a schematic diagram illustrating an exemplary process for training a second target classification model;
FIG. 6 is a schematic diagram of a process for training a third target classification model according to an embodiment;
FIG. 7 is a flow diagram that illustrates a method for data classification in one embodiment;
FIG. 8A is a diagram illustrating model training and model deployment, in one embodiment;
FIG. 8B is a schematic diagram of a knowledge distillation process for a small recognition network in one embodiment;
FIG. 8C is a schematic flow chart diagram illustrating model deployment in one embodiment;
FIG. 9 is a block diagram showing the structure of a classification model training apparatus according to an embodiment;
FIG. 10 is a block diagram showing the structure of a data classification apparatus according to an embodiment;
FIG. 11 is a diagram of the internal structure of a computer device in one embodiment;
FIG. 12 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It will be understood that the terms "first," "second," and the like as used herein may be used to describe various data, but the data is not limited by these terms. These terms are only used to distinguish one type of data from another. For example, a first target classification model may be referred to as a second target classification model, and similarly, a second target classification model may be referred to as a first target classification model, without departing from the scope of the present application. Both the first and the second target classification model are target classification models, but they are not the same target classification model.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The scheme provided by the embodiment of the application relates to the computer vision technology, the machine learning technology and other technologies of artificial intelligence, and is specifically explained by the following embodiments:
the classification model training method and the data classification method provided by the embodiment of the application can be applied to the application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104, or may be placed on the cloud or other server. The terminal 102 may be, but is not limited to, various desktop computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, and the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart car-mounted devices, and the like. The portable wearable device can be a smart watch, a smart bracelet, a head-mounted device, and the like. The server 104 may be implemented as a stand-alone server or a server cluster consisting of a plurality of servers or a cloud server.
Both the terminal and the server can be independently used for executing the classification model training method and the data classification method provided in the embodiment of the application.
For example, the server locally obtains a training sample set, where the training sample set includes training samples and training labels corresponding to the training samples. The server determines at least two reference classification models from a first target classification model and a second target classification model set which are obtained through training based on a training sample set. In a second target classification model included in the second target classification model set, class center features corresponding to various training labels are obtained from the first target classification model, and the class center features are used for representing position information of the training labels in a feature space. And the server inputs the training sample set into each reference classification model to obtain reference sample characteristics corresponding to each training sample, and fuses the reference sample characteristics corresponding to the same training sample to obtain target sample characteristics corresponding to each training sample. And the server inputs the training sample set into a third initial classification model to obtain training sample characteristics corresponding to each training sample, and adjusts model parameters of the third initial classification model based on the target sample characteristics and the training sample characteristics corresponding to the same training sample until a target convergence condition is met to obtain a third target classification model.
The server acquires the data to be classified, inputs the data to be classified into the third target classification model to obtain target characteristics corresponding to the data to be classified, and determines classification results corresponding to the data to be classified based on the target characteristics.
The terminal and the server can also be cooperatively used for executing the classification model training method and the data classification method provided in the embodiment of the application.
For example, the server obtains a training sample set from the terminal, and determines at least two reference classification models from a first target classification model and a second target classification model which are trained based on the training sample set. And the server inputs the training sample set into each reference classification model to obtain reference sample characteristics corresponding to each training sample, and fuses the reference sample characteristics corresponding to the same training sample to obtain target sample characteristics corresponding to each training sample. And the server inputs the training sample set into a third initial classification model to obtain training sample characteristics corresponding to each training sample, and adjusts model parameters of the third initial classification model based on the target sample characteristics and the training sample characteristics corresponding to the same training sample until a target convergence condition is met to obtain a third target classification model.
The server may send the third target classification model to the terminal. And the terminal acquires the data to be classified, inputs the data to be classified into the third target classification model to obtain target characteristics corresponding to the data to be classified, and determines a classification result corresponding to the data to be classified based on the target characteristics.
In one embodiment, as shown in FIG. 2, a classification model training method is provided, which is illustrated by applying the method to a computer device, which may be the terminal 102 or the server 104 in FIG. 1. Referring to fig. 2, the classification model training method includes the steps of:
step S202, acquiring a training sample set; the training sample set comprises training samples and training labels corresponding to the training samples.
The training samples refer to samples of known classes and are used for training the classification model. And the training labels corresponding to the training samples are used for representing the categories corresponding to the training samples. The training sample set comprises a plurality of training samples, and each training sample has a corresponding training label.
It can be understood that different training sample sets may be generated in different domains, and the classification model for a specific domain is trained based on the training sample set corresponding to that domain. For example, in the image domain, the training samples may be images of known classes, and the image classification model is trained based on those images. In the text domain, the training samples may be texts of known classes, and the text classification model is trained based on those texts. In the speech domain, the training samples may be speech of known classes, and the speech classification model is trained based on that speech. The training samples can be video, image, text, speech, and other types of data; correspondingly, the classification model can be a video classification model, an image classification model, a text classification model, a speech classification model, or another such model.
Further, in a certain field, different training sample sets may be generated according to the subdivided application scenarios, and the classification model corresponding to the specific application scenario may be trained based on the training sample set corresponding to the specific application scenario. For example, in a face recognition scenario in the image domain, the training samples may be face images of known classes, and the face classification model is trained based on the face images of known classes. In a vehicle recognition scenario in the image domain, the training samples may be vehicle images of known classes, and the vehicle classification model is trained based on the vehicle images of known classes.
Step S204, determining at least two reference classification models from a first target classification model and a second target classification model set which are obtained by training based on a training sample set; in a second target classification model included in the second target classification model set, class center features corresponding to various training labels are obtained from the first target classification model, and the class center features are used for representing position information of the training labels in a feature space.
Wherein the second set of object classification models comprises at least one second object classification model. The first and second object classification models are trained classification models. The first target classification model is obtained by training based on a training sample set, and comprises class center features corresponding to various training labels respectively. The class center feature is used for representing position information of the training label in the feature space, and the determination of the class center feature represents the position determination of the training label in the feature space. The class center feature is a statistical center of sample features corresponding to each sample belonging to the same training label, and the sample features are obtained by performing feature extraction on the samples by using a model. The class center feature is a model parameter of the classification model and is obtained by training and updating the classification model. It can be understood that model parameters are adjusted during model training, so that sample features and feature spaces extracted by the model are changed. The feature space refers to a space in which the features of the sample exist. The distribution of each class center feature in the feature space may reflect the distribution of the sample feature set corresponding to each training label in the feature space. After the model training is completed, the sample features corresponding to different training labels are usually located in different regions in the feature space, and the sample features corresponding to the same training label are usually located in the same region in the feature space, that is, the model has the feature distinguishing capability, and can effectively classify data.
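The paragraph above describes a class center feature as the statistical center of the sample features that share a training label. In practice such centers are trained model parameters (for example, rows of the final classification layer updated by backpropagation); the sketch below only illustrates the "statistical center" reading on toy, randomly generated features, so the shapes and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy sample features already extracted by a model: 30 samples in a
# 4-dimensional feature space, 10 samples per training label {0, 1, 2}.
feats = rng.normal(size=(30, 4))
labels = np.repeat(np.arange(3), 10)

# A class center feature, in the statistical-center sense, is the mean of
# the sample features belonging to the same training label; its row index
# encodes the label's position in the feature space.
centers = np.stack([feats[labels == c].mean(axis=0) for c in range(3)])
print(centers.shape)
```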
The second target classification model is obtained through training based on the training sample set, but class center features corresponding to various training labels in the second target classification model are obtained from the first target classification model. That is, although the first and second object classification models are different models, the class center features of the first and second object classification models are consistent, and the first and second object classification models are feature space aligned. Different models of feature space alignment may be considered complementary models, any of which may help improve the performance of another model, any of which may supplement the knowledge learned by the other model. It is to be understood that, if the second target classification model set includes at least two second target classification models, although each second target classification model is a different model, the class center features of the second target classification models are consistent, and the second target classification models are aligned in the feature space.
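The alignment described above amounts to the second model taking over (and keeping) the first model's class center features, so both models place every training label at the same position in feature space. A minimal sketch, assuming the models are simple parameter dictionaries with a distinct backbone each (all shapes here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
num_classes, feat_dim, raw_dim = 3, 4, 8

# First target classification model: its own backbone weights plus
# trained class center features (one row per training label).
first_model = {
    "backbone": rng.normal(size=(raw_dim, feat_dim)),
    "class_centers": rng.normal(size=(num_classes, feat_dim)),
}

# Second model: a different backbone, but its class center features are
# copied from the first model (and would be held fixed during training),
# so the two models are feature-space aligned even though their other
# parameters differ.
second_model = {
    "backbone": rng.normal(size=(raw_dim, feat_dim)),
    "class_centers": first_model["class_centers"].copy(),
}

aligned = np.array_equal(first_model["class_centers"],
                         second_model["class_centers"])
print("feature spaces aligned:", aligned)
```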
Specifically, the computer device may obtain a training sample set locally or from another device, obtain a first target classification model and a second target classification model set obtained through training based on the training sample set, determine a reference classification model from the first target classification model and the second target classification model set, assist training of a third initial classification model through the reference classification model, and finally train to obtain a third target classification model with higher classification accuracy.
In one embodiment, the determination of the reference classification model comprises any one of the following ways:
acquiring at least two second target classification models from the second target classification model set as reference classification models; or acquiring the first target classification model together with at least one second target classification model from the second target classification model set as reference classification models.
In particular, when determining at least two reference classification models from the first target classification model and the second target classification model set, the computer device may determine the reference classification models from the second target classification models only, or from both the first and second target classification models. Because the feature spaces of the second target classification models are aligned, the computer device can take at least two second target classification models from the set as reference classification models. Because the first target classification model and each second target classification model are also feature-space aligned, the computer device can likewise take the first target classification model as one reference classification model and at least one second target classification model from the set as the other reference classification model(s).
Step S206, inputting the training sample set into each reference classification model to obtain reference sample characteristics corresponding to each training sample, and fusing the reference sample characteristics corresponding to the same training sample to obtain target sample characteristics corresponding to each training sample.
The reference sample features are sample features obtained by performing feature extraction on the training samples by referring to the classification model. The target sample features are obtained by performing feature fusion on the reference sample features corresponding to the same training sample, and the reference sample features corresponding to the same training sample are extracted by different reference classification models respectively.
Specifically, the reference classification models are feature-space-aligned, complementary models, and each reference classification model can be regarded as a single sampling point of the input data. Therefore, fusing the sample features extracted by the reference classification models yields more accurate sample features, and performing model training of the third initial classification model with the fused features improves the classification accuracy of the trained model.
The computer device may input the training samples in the training sample set into each reference classification model to obtain reference sample characteristics corresponding to each training sample, where the same training sample has the reference sample characteristics extracted by each reference classification model, i.e., the same training sample has multiple reference sample characteristics. The computer device can perform feature fusion on each reference sample feature corresponding to the same training sample to obtain a target sample feature, and finally can obtain a target sample feature corresponding to each training sample.
In one embodiment, referring to fig. 3, there are six reference classification models, and a training sample is input into the six reference classification models to obtain reference sample feature 1 through reference sample feature 6. The reference sample features corresponding to the training sample are fused to obtain the target sample feature. During fusion, an arithmetic average of the reference sample features may be used as the target sample feature, or a weighted average of the reference sample features may be used as the target sample feature. The weight corresponding to each reference sample feature may be user-defined, or may be determined based on the prediction accuracy of each reference classification model: the higher the prediction accuracy, the larger the corresponding weight. The prediction accuracy is calculated based on the training labels corresponding to test samples and the prediction labels obtained by inputting the test samples into the reference classification model.
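The fusion step above can be sketched as follows. This is a minimal NumPy illustration; the function name, feature dimension, and weight values are assumptions for illustration, not from the patent:

```python
import numpy as np

def fuse_reference_features(reference_features, weights=None):
    """Fuse the reference sample features of one training sample.

    reference_features: list of 1-D arrays, one per reference classification model.
    weights: optional per-model weights (e.g. derived from each model's
             prediction accuracy); None -> plain arithmetic average.
    """
    stacked = np.stack(reference_features)            # (n_models, feat_dim)
    if weights is None:
        return stacked.mean(axis=0)                   # arithmetic average
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalize the weights
    return (w[:, None] * stacked).sum(axis=0)         # weighted average

# Six reference models as in fig. 3; 4-dimensional features (illustrative size)
feats = [np.full(4, i + 1.0) for i in range(6)]
target_feature = fuse_reference_features(feats)       # arithmetic average
```

A weighted call such as `fuse_reference_features(feats, weights=[3, 1, 1, 1, 1, 1])` would bias the target sample feature toward the model with the highest prediction accuracy.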
Step S208, inputting the training sample set into a third initial classification model to obtain training sample characteristics corresponding to each training sample, and adjusting model parameters of the third initial classification model based on the target sample characteristics and the training sample characteristics corresponding to the same training sample until a target convergence condition is met to obtain a third target classification model.
The initial classification model is a classification model to be trained, and the target classification model is a trained classification model. The third initial classification model refers to a third classification model to be trained. The third target classification model refers to a trained third classification model. The training sample features are sample features obtained by performing feature extraction on the training samples by the third initial classification model.
Specifically, the target sample features obtained through fusion are relatively accurate sample features, and the training target of the third initial classification model is to make the training sample features close to the target sample features, so that the third target classification model obtained through training can also extract the relatively accurate sample features.
The computer device may input the training samples in the training sample set into the third initial classification model to obtain training sample characteristics corresponding to each training sample, and then adjust model parameters of the third initial classification model based on a difference between a target sample characteristic and a training sample characteristic corresponding to the same training sample until a target convergence condition is satisfied to obtain a third target classification model. The computer device may generate loss information based on a difference between a target sample feature and a training sample feature corresponding to the same training sample, and perform back propagation to adjust a model parameter of the third initial classification model based on the loss information until a target convergence condition is satisfied, to obtain a third target classification model. The computer device may generate loss information based on a loss function, which may be a commonly used loss function in model training or a custom loss function.
The target convergence condition may be at least one of: the loss information is smaller than a preset threshold, the number of model iterations is greater than a preset number, and the like.
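A minimal sketch of this loss and convergence check follows. A mean-squared feature-matching loss is assumed here as one concrete choice (the patent leaves the loss function open, allowing any commonly used or custom loss); all names and threshold values are illustrative:

```python
import numpy as np

def feature_matching_loss(training_feature, target_feature):
    """Loss from the difference between the third model's training sample
    feature and the fused target sample feature of the same training sample.
    Mean squared error is one common choice; any standard or custom loss
    function could be substituted."""
    diff = np.asarray(training_feature) - np.asarray(target_feature)
    return float(np.mean(diff ** 2))

def converged(loss, n_iterations, loss_threshold=1e-3, max_iterations=10000):
    """Target convergence condition: loss information below a preset
    threshold, or number of model iterations above a preset number."""
    return loss < loss_threshold or n_iterations > max_iterations

loss = feature_matching_loss([1.0, 2.0], [1.0, 4.0])  # mean of (0**2, 2**2)
```

During back propagation, the gradient of this loss with respect to the third model's parameters pulls the training sample feature toward the target sample feature, which is the stated training objective.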
In one embodiment, the computer device may determine an initial training sample from the training sample set and input it into the third initial classification model to obtain training sample features. It then adjusts the model parameters of the third initial classification model based on the target sample features and training sample features corresponding to the initial training sample to obtain a third intermediate classification model, takes the next training sample as the new initial training sample and the third intermediate classification model as the new third initial classification model, and returns to the step of inputting the initial training sample into the third initial classification model to obtain training sample features, iterating in this way and continually adjusting the model parameters until the target convergence condition is satisfied, whereupon the third target classification model is obtained. For example, if in a certain round of training the loss information calculated based on the training sample features and the target sample features is smaller than a preset threshold, the adjustment of the model parameters stops and the most recently adjusted classification model is taken as the third target classification model. Likewise, if after a certain round of training the number of model iterations is greater than a preset number, the most recently adjusted classification model is taken as the third target classification model. It can be understood that at least one initial training sample may be obtained in each round of training.
In one embodiment, the first, second and third target classification models may have the same model size, or their model sizes may differ.
In the above classification model training method, a training sample set is obtained, the training sample set comprising training samples and their corresponding training labels. At least two reference classification models are determined from a first target classification model and a second target classification model set, both trained based on the training sample set; the class center features corresponding to the various training labels in the second target classification models included in the set are obtained from the first target classification model, and a class center feature represents the position information of a training label in the feature space. The training sample set is input into each reference classification model to obtain the reference sample features corresponding to each training sample, and the reference sample features corresponding to the same training sample are fused to obtain the target sample feature corresponding to each training sample. The training sample set is also input into a third initial classification model to obtain the training sample features corresponding to each training sample, and the model parameters of the third initial classification model are adjusted based on the target sample feature and training sample feature corresponding to the same training sample until a target convergence condition is satisfied, yielding a third target classification model.
In this way, because the class center features represent the positions of the training labels in the feature space and every class center feature in each second target classification model is obtained from the first target classification model, the second target classification models are feature-space aligned with the first target classification model and share a consistent feature space distribution. The reference classification models determined from the first and second target classification models are therefore also feature-space aligned, and feature-space-aligned reference classification models are complementary models whose extracted features complement one another. Furthermore, by fusing the features of multiple reference classification models and training the third initial classification model with the fused target sample features, the third initial classification model can learn the more accurate feature distribution of the fused features. The trained third target classification model can then extract more accurate data features from input data, and more accurate classification results can be obtained from those features, improving both the classification accuracy of the model and the classification accuracy of the data.
In one embodiment, referring to fig. 4, before determining at least two reference classification models from the first target classification model and the second target classification model set trained based on the training sample set, the classification model training method further includes:
step S402, model training is carried out on the first initial classification model based on the training sample set, and a first target classification model is obtained.
The first initial classification model refers to a first classification model to be trained. The first target classification model refers to a trained first classification model.
Specifically, the computer device may first train based on the training sample set to obtain a first target classification model, and then train based on the first target classification model and the training sample set to obtain a second target classification model.
The computer device may input training samples in the training sample set into the first initial classification model to obtain first prediction labels corresponding to the training samples, and adjust model parameters of the first initial classification model based on the training labels and the prediction labels corresponding to the same training sample until a first convergence condition is satisfied to obtain a first target classification model.
Step S404, obtaining target class central features respectively corresponding to various training labels from the first target classification model, and generating at least one second initial classification model aligned with the first target classification model feature space; and the class center features respectively corresponding to various training labels in the second initial classification model are corresponding target class center features.
Step S406, performing model training on the at least one second initial classification model based on the training sample set to obtain at least one second target classification model, the second target classification models forming the second target classification model set; each second initial classification model keeps the various class center features unchanged during model training.
The target class center feature refers to a class center feature obtained from a trained model, and is a class center feature which is not adjusted any more. The second initial classification model refers to a second classification model to be trained. The second target classification model refers to a trained second classification model.
Specifically, the computer device performs supervised training on the first initial classification model based on the training sample set to obtain the first target classification model, which comprises the target class center features corresponding to the various training labels. Further, in order to align the feature spaces of the first and second target classification models, the computer device may first generate at least one second initial classification model aligned with the feature space of the first target classification model, then perform model training on each second initial classification model based on the training sample set to obtain each second target classification model, and form the second target classification model set from the second target classification models.
When generating a second initial classification model, the computer device may obtain the target class center features corresponding to the various training labels from the first target classification model and use them as the class center features corresponding to the various training labels in the second initial classification model, thereby obtaining a second initial classification model aligned with the feature space of the first target classification model. During training, the class center features corresponding to the various training labels in the second initial classification model are not updated. Because the class center features are kept unchanged during model training, the feature space distributions of the second initial classification models remain consistent throughout training; the second target classification models finally obtained are still feature-space aligned with one another, and each is also feature-space aligned with the first target classification model.
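This alignment mechanism can be sketched as follows, representing each model as a plain dictionary of NumPy arrays (an assumption for illustration; names and sizes are hypothetical). The class centers are copied from the first target model and excluded from parameter updates, so they remain identical across all second models even as their other parameters diverge:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target class center features taken from the trained first target model:
# one center per training label (10 labels, dimension 8 — illustrative sizes).
first_model_centers = rng.normal(size=(10, 8))

def make_second_initial_model(centers):
    """A second initial model: class centers copied from the first target
    model (frozen), all other parameters randomly initialized."""
    return {
        "class_centers": centers.copy(),       # not updated during training
        "backbone": rng.normal(size=(8, 8)),   # randomly initialized weights
    }

def training_step(model, grad_backbone, lr=0.1):
    """One gradient step that updates only the non-center parameters,
    leaving the class center features unchanged."""
    model["backbone"] -= lr * grad_backbone
    return model

model_a = make_second_initial_model(first_model_centers)
model_b = make_second_initial_model(first_model_centers)
training_step(model_a, rng.normal(size=(8, 8)))
training_step(model_b, rng.normal(size=(8, 8)))
```

After any number of such steps, `model_a["class_centers"]` and `model_b["class_centers"]` still match the first model's centers: the models remain feature-space aligned while their backbones differ, preserving model diversity.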
In one embodiment, the second initial classification model and the first initial classification model may be trained in the same or different ways, and the loss functions used by the two models may be the same or different.
In one embodiment, in order to ensure the diversity of the models, when the second initial classification model is generated, parameters except for the class center feature in the second initial classification model may be randomly initialized, so that the initial model parameters are different between the second initial classification models. In one embodiment, the penalty functions between different second target classification models may be different to further secure the diversity of the models.
In an embodiment, in order to improve the model training speed, other model parameters of the first target classification model except the class center feature may also be adjusted to obtain a second initial classification model, and since the first target classification model is a converged model, the model parameters of the first target classification model are fine-tuned to generate the second initial classification model, which may effectively improve the convergence speed of the second initial classification model. It will be appreciated that different second initial classification models may correspond to different adjustment magnitudes. In one embodiment, the loss functions corresponding to the first target classification model and the second target classification model may be different, so as to ensure that the first target classification model and the second target classification model obtained by final training are different models.
In the above embodiment, the first target classification model is obtained through training, then the target class center feature is obtained from the first target classification model and is used as the class center feature of the second initial classification model, and the class center feature is kept unchanged during training of the second initial classification model, so that it is ensured that the second target classification model and the first target classification model obtained through training are models with aligned feature spaces.
In one embodiment, model training the first initial classification model based on the training sample set to obtain a first target classification model includes:
inputting training samples in a training sample set into a first initial classification model; the model parameters of the first initial classification model comprise initial class center features corresponding to various training labels respectively; extracting current sample characteristics corresponding to the current training sample, and obtaining a first prediction label corresponding to the current training sample based on the current sample characteristics and each initial class center characteristic; and adjusting model parameters of the first initial classification model based on the first prediction label and the training label corresponding to the same training sample until a first convergence condition is met, so as to obtain a first target classification model.
Wherein, the initial class center feature refers to the class center feature to be adjusted. It is understood that the initial model parameters in the first initial classification model may be initialized randomly or may be set manually.
Specifically, when a first initial classification model is trained, the computer device may input training samples in a training sample set into the first initial classification model, the first initial classification model outputs a first prediction label corresponding to the training sample, the computer device generates loss information based on a difference between the first prediction label and the training label corresponding to the same training sample, and performs back propagation to adjust model parameters of the first initial classification model based on the loss information until a first convergence condition is satisfied, so as to obtain a first target classification model.
Similar to the target convergence condition, the first convergence condition may be at least one of: the loss information is smaller than a preset threshold, the number of model iterations is greater than a preset number, and the like.
The internal data processing process of the model is explained by taking a current training sample as an example, wherein the current training sample refers to any one training sample in a training sample set. After the current training sample is input into the first initial classification model, the first initial classification model extracts sample features corresponding to the current training sample to obtain current sample features, each initial class center feature is obtained, and the current sample features and each initial class center feature are fused to obtain a first prediction label corresponding to the current training sample.
In one embodiment, the prediction probabilities of the current training sample belonging to various training labels can be calculated based on the current sample features and the initial class center features, and the first prediction label corresponding to the current training sample is obtained based on the prediction probabilities. In one embodiment, feature distances between the current sample feature and each of the initial class-center features may be calculated, and the prediction probability may be obtained based on the feature distances. It will be appreciated that the smaller the feature distance, the greater the prediction probability.
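One way this computation could look is shown below, using cosine similarity between the current sample feature and each initial class center feature followed by a softmax. This is one concrete choice for illustration; the patent only requires that a smaller feature distance yield a larger prediction probability, and the function name is hypothetical:

```python
import numpy as np

def predict_label(sample_feature, class_centers):
    """Compute prediction probabilities of a sample belonging to each
    training label from its feature and the class center features, then
    take the most probable label as the first prediction label."""
    f = sample_feature / np.linalg.norm(sample_feature)
    c = class_centers / np.linalg.norm(class_centers, axis=1, keepdims=True)
    sims = c @ f                          # cosine similarity to each center
    exp = np.exp(sims - sims.max())       # numerically stable softmax
    probs = exp / exp.sum()
    return int(np.argmax(probs)), probs

# Three labels with orthogonal 2-D/3-D-style centers (illustrative values)
centers = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
label, probs = predict_label(np.array([0.9, 0.1]), centers)
```

Here the sample feature is closest (smallest angle, hence smallest distance on the unit sphere) to the first class center, so that label receives the highest probability.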
In the above embodiment, the first prediction labels corresponding to the training samples may be obtained based on the sample features extracted from the training samples and each of the initial class center features, the first prediction labels may reflect which kind of label the model prediction training samples belong to, and based on the first prediction label and the training label corresponding to the same training sample, the model parameters of the first initial classification model are adjusted, so that the first prediction label may gradually approach the training label, and finally the first target classification model with higher prediction accuracy is obtained through training.
In one embodiment, model training is performed on at least one second initial classification model respectively based on a training sample set, so as to obtain at least one second target classification model, including:
sampling the training sample set to obtain a plurality of training sample subsets, determining a target sample subset from the training sample subsets, determining a candidate classification model from the second initial classification models, and taking the candidate classification model as a current classification model; inputting the target training samples in the target sample subset into the current classification model to obtain the initial sample features corresponding to each target training sample; calculating feature included angle information based on the initial sample features corresponding to the target training samples and the various class center features, and obtaining a target loss based on the feature included angle information corresponding to each target training sample and the included angle compensation parameter corresponding to the candidate classification model; adjusting the model parameters of the current classification model based on the target loss to obtain an intermediate classification model; and taking the next training sample subset as the target sample subset, taking the intermediate classification model as the current classification model, returning to the step of inputting the target training samples in the target sample subset into the current classification model to obtain the initial sample features corresponding to each target training sample, and iterating until a second convergence condition is satisfied, thereby obtaining a second target classification model corresponding to the candidate classification model.
The initial sample features are sample features obtained by performing feature extraction on the target training samples with the current classification model. The feature included angle information represents the included angles between a training sample's initial sample features and the class center features. The included angle compensation parameter is used to adjust the feature included angle information so as to generate loss information that improves the model's ability to discriminate between classes.
Specifically, when training the second initial classification model, the computer device may perform iterative training on the second initial classification model to obtain a second target classification model. The computer device can perform sampling processing on the training sample set to obtain a plurality of training sample subsets, and perform iterative training on the second initial classification model based on each training sample subset to obtain a second target classification model.
During sampling, a preset number of training samples can be selected from the training sample set to form a training sample subset, a plurality of training sample subsets can be obtained through multiple selections, and one training sample subset can be obtained through each selection. It is to be appreciated that training samples in different subsets of training samples may or may not be repeated.
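The sampling step can be sketched as an index-based draw (the sizes and seed are illustrative assumptions):

```python
import numpy as np

def sample_subsets(n_samples, subset_size, n_subsets, seed=0):
    """Select a preset number of training-sample indices per subset.
    Each draw yields one training sample subset; indices are unique
    within a subset but may repeat across subsets, matching the text."""
    rng = np.random.default_rng(seed)
    return [rng.choice(n_samples, size=subset_size, replace=False)
            for _ in range(n_subsets)]

# 100 training samples, 5 subsets of 16 samples each (illustrative sizes)
subsets = sample_subsets(n_samples=100, subset_size=16, n_subsets=5)
```

Passing `replace=True` instead would also allow repeats within a single subset, which the text likewise permits.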
During iterative training, one model can be randomly selected from each second initial classification model as a candidate classification model, the candidate classification model is used as a current classification model, one training sample subset can be randomly selected from each training sample subset as a target sample subset, the current classification model is trained on the basis of the target sample subset to obtain an intermediate classification model, the next training sample subset is used as a new target sample subset, the intermediate classification model is used as a new current classification model, the new current classification model is trained on the basis of the new target sample subset to obtain a new intermediate classification model, iterative training is carried out in this way until a second convergence condition is met, and finally a second target classification model corresponding to the candidate classification model is obtained. Then, the computer device may obtain a next model from each second initial classification model as a new candidate classification model, and repeat the above process to obtain a second target classification model corresponding to the new candidate classification model. It can be understood that the training modes of the second initial classification models are the same, and the second target classification models corresponding to the second initial classification models can be finally obtained by training with reference to the training modes. Of course, the subset of training samples used to train different second initial classification models may be the same or may be different.
When training the current classification model based on the target sample subset, the computer device may input the target training samples in the target sample subset into the current classification model, extract their sample features through the current classification model, and obtain the initial sample features corresponding to each target training sample in the target sample subset. The computer device then calculates, for each target training sample, the included angle information between its initial sample features and the class center features, obtaining the feature included angle information corresponding to each target training sample; calculates the target loss based on this feature included angle information and the included angle compensation parameter corresponding to the candidate classification model; and finally performs back propagation based on the target loss to update the model parameters of the current classification model, obtaining an intermediate classification model. Each second initial classification model has its own included angle compensation parameter, and the parameters of different second initial classification models may differ, which ensures that the second target classification models finally trained are different models and thus preserves model diversity.
In an embodiment, the characteristic angle information may be adjusted based on the angle compensation parameter to obtain target information, each target information may be normalized, and the target loss may be obtained based on each normalized target information.
It can be understood that, referring to the above-mentioned manner of training the second initial classification model, the first initial classification model may also be trained to obtain the first target classification model.
In the above embodiment, the second initial classification model is iteratively trained to obtain a second target classification model, in any round of training, feature included angle information is calculated based on the initial sample features and the class center features corresponding to the training samples in the round, target loss is calculated based on the feature included angle information corresponding to each training sample and the included angle compensation parameters corresponding to the second initial classification model, and model parameters are adjusted based on the target loss, so that the feature distinguishing capability of the model can be improved, and the second target classification model is obtained through rapid training. And each second initial classification model has corresponding included angle compensation parameters, so that the trained second target classification models can be different models.
In one embodiment, calculating feature included angle information based on initial sample features and various class center features corresponding to target training samples, and obtaining target loss based on feature included angle information and included angle compensation parameters corresponding to candidate classification models, which respectively correspond to the target training samples, includes:
taking a training label corresponding to a current target training sample as a target training label, obtaining target included angle information corresponding to the current target training sample based on an initial sample characteristic corresponding to the current target training sample and a class center characteristic corresponding to the target training label, obtaining reference included angle information corresponding to the current target training sample based on the initial sample characteristic corresponding to the current target training sample and the class center characteristics corresponding to other training labels, and taking the target included angle information and the reference included angle information corresponding to the current target training sample as feature included angle information corresponding to the current target training sample; respectively adjusting target included angle information and reference included angle information corresponding to the same target training sample based on included angle compensation parameters to obtain target characteristic information and reference characteristic information, and obtaining initial loss based on the target characteristic information and the reference characteristic information corresponding to the same target training sample; and obtaining target loss based on the initial loss corresponding to each target training sample.
The current target training sample refers to any one of the target training samples. The target characteristic information is obtained by adjusting target included angle information based on included angle compensation parameters, and the reference characteristic information is obtained by adjusting reference included angle information based on included angle compensation parameters.
Specifically, the feature included angle information corresponding to any target training sample includes the target included angle information and the reference included angle information corresponding to that sample. The target included angle information is obtained based on the initial sample feature corresponding to the target training sample and the class center feature corresponding to the training label of the target training sample, and the reference included angle information is obtained based on that initial sample feature and the class center features corresponding to the other training labels. For the target included angle information, the feature included angle between the initial sample feature corresponding to the current target training sample and the class center feature corresponding to the target training label may be calculated as the target included angle, and the target included angle information is obtained based on the target included angle: the target included angle may be used directly as the target included angle information, or it may be converted, for example by taking its cosine, to obtain the target included angle information. For the reference included angle information, the feature included angle between the initial sample feature corresponding to the current target training sample and each class center feature other than that of the target training label may be calculated as a reference included angle, and the reference included angle information is obtained based on the reference included angle in a manner similar to the target included angle information.
When calculating the loss, the computer device can respectively adjust the target included angle information and the reference included angle information corresponding to the same target training sample based on the included angle compensation parameters to obtain target characteristic information and reference characteristic information, so that each target training sample has corresponding target characteristic information and reference characteristic information. An initial loss is then calculated based on the target characteristic information and the reference characteristic information corresponding to the same target training sample, yielding an initial loss for each target training sample. For example, the target characteristic information may be normalized based on the reference characteristic information to obtain the initial loss. The normalization maps the target characteristic information into a preset range, so that the numerical range of the target characteristic information corresponding to each target training sample is unified, which facilitates data calculation. Finally, the computer device obtains the target loss based on the initial losses corresponding to the respective target training samples; for example, the average value of the initial losses may be calculated as the target loss, or the median of the initial losses may be used as the target loss.
In one embodiment, the angle compensation parameter includes a first parameter and a second parameter, and the target characteristic information may be obtained by adjusting target angle information corresponding to the target training sample based on the first parameter and the second parameter in the angle compensation parameter, and the reference characteristic information may be obtained by adjusting reference angle information corresponding to the target training sample based on the first parameter in the angle compensation parameter. The first parameter is used for additively adjusting the included angle information, and the second parameter is used for multiplicatively adjusting the included angle information. Additive adjustment refers to adjustment by addition, and multiplicative adjustment refers to adjustment by multiplication.
In one embodiment, the target loss is calculated as follows:

L_loss = -(1/N) · Σ_{i=1}^{N} log( e^{s·cos(m2·θ_{y_i} + m1)} / ( e^{s·cos(m2·θ_{y_i} + m1)} + Σ_{j=1, j≠y_i}^{k} e^{s·cos θ_j} ) )

wherein L_loss represents the target loss and N represents the number of target training samples. θ_{y_i} represents the target included angle corresponding to the ith target training sample, i.e. the feature included angle between the initial sample feature corresponding to the ith target training sample and the class center feature corresponding to the training label of the ith target training sample. e^{s·cos(m2·θ_{y_i} + m1)} represents the target characteristic information corresponding to the ith target training sample. cos θ_j represents a reference included angle corresponding to the ith target training sample, i.e. the feature included angle between the initial sample feature corresponding to the ith target training sample and one of the other class center features, and e^{s·cos θ_j} represents the corresponding reference characteristic information. k represents the number of classes of training labels, and y_i represents the training label of the ith target training sample. s, m1 and m2 represent angle adjustment parameters. m1 and m2 enlarge the interval between the regions occupied by different training labels in the feature space, improving class separation. The cosine values originally lie in the interval [-1, 1], which is too small to distinguish differences effectively, so s scales the cosine values by a factor of s, spreading the feature distribution and improving the convergence speed.
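The loss above can be sketched in NumPy as follows. This is an illustrative implementation, not code from this application: the function name, the clipping step, and the default parameter values are assumptions.

```python
import numpy as np

def margin_softmax_loss(features, centers, labels, s=16.0, m1=0.5, m2=1.0):
    """Sketch of the margin-softmax target loss.

    features: (N, d) L2-normalized initial sample features
    centers:  (k, d) L2-normalized class center features
    labels:   (N,) integer training labels y_i
    s: scale; m1: additive angle margin; m2: multiplicative angle margin
    """
    cos = np.clip(features @ centers.T, -1.0, 1.0)   # (N, k) cosines of feature included angles
    theta = np.arccos(cos)                           # feature included angles
    n = features.shape[0]
    logits = s * cos                                 # log of reference characteristic information
    target_theta = theta[np.arange(n), labels]       # target included angle per sample
    # log of target characteristic information: s * cos(m2 * theta_yi + m1)
    logits[np.arange(n), labels] = s * np.cos(m2 * target_theta + m1)
    logits -= logits.max(axis=1, keepdims=True)      # numerically stable softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(n), labels].mean())  # average of the initial losses
```

With m1 = 0 and m2 = 1 this reduces to a plain scaled softmax cross-entropy; increasing m1 penalizes the target included angle and raises the loss, which is what forces the margin between classes.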
In the above embodiment, the feature included angle information corresponding to a target training sample includes target included angle information and reference included angle information. The target included angle information reflects the distance between the sample feature and the class center feature corresponding to the true label, and the reference included angle information reflects the distance between the sample feature and the class center features corresponding to the wrong labels. The target included angle information and the reference included angle information corresponding to the same target training sample are respectively adjusted based on the included angle compensation parameters to obtain the target characteristic information and the reference characteristic information; the included angle compensation parameters penalize the target included angle information and the reference included angle information and increase their influence on the loss. The initial loss is obtained based on the target characteristic information and the reference characteristic information corresponding to the same target training sample, the target loss is obtained based on the initial losses corresponding to the respective target training samples, and the model parameters are adjusted based on the target loss, so that the feature discrimination of the model is enhanced and the classification accuracy of the model is improved.
In one embodiment, a first target classification model is obtained by training, and then a second target classification model is obtained by training based on the first target classification model. The training process of the first initial classification model is explained with reference to fig. 5A. It mainly involves a training data preparation module, a basic identification network unit module, a category center storage module, a loss function calculation module, and a loss function optimization module.
(a) A training data preparation module: in the training process, training samples are obtained, the obtained training samples are combined into a training sample subset, and the training sample subset is sent to a deep network unit (namely a first initial classification model) for processing.
(b) Basic identification network unit module: this module extracts features from the training samples. The module generally has the structure of a Convolutional Neural Network (CNN), which includes operations such as convolution calculation, nonlinear activation function (ReLU) calculation, and pooling calculation. Taking an image training sample as an example, the module can extract the spatial features of the image, and the output feature map retains the spatial structure information of the image.
(c) A category center storage module: the data stored in the module is the class center feature corresponding to each training label in the training data, and the shape of the class center feature is (d × m), wherein d is the feature dimension, and m is the number of classes of the training labels in the training data. And performing matrix operation on the central features of each class and the features extracted by the identification network unit module to obtain probability values of the training samples belonging to various training labels, wherein the probability values of the training samples belonging to various training labels form a probability vector (also called a prediction label).
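The matrix operation described for module (c) can be sketched as follows. The function name and the use of a softmax to turn the scores into a probability vector are illustrative assumptions.

```python
import numpy as np

def predict_labels(features, class_centers):
    """Project sample features onto the class center features to obtain
    the probability vector (prediction label) for each sample.

    features:      (n, d) features from the identification network unit module
    class_centers: (d, m) one d-dimensional center per training label class
    """
    logits = features @ class_centers            # the matrix operation from the text, (n, m)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)    # softmax: probabilities of each training label
    return probs
```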
(d) A loss function calculation module: the module takes the probability vector corresponding to the training sample and the training label of the training sample as the input of the loss function, and calculates the loss information loss. The loss function may be classified loss function, such as softmax function, margin-softmax function, etc., or may be other types of loss function.
(e) A loss function optimization module: this module trains and optimizes the whole network based on gradient descent, such as stochastic gradient descent, stochastic gradient descent with momentum, Adam (Adaptive Moment Estimation, a momentum-based algorithm with an adaptive learning rate), the adaptive gradient optimization algorithm, and so on. Steps (a) to (d) are repeated until the training result meets the training termination condition. The condition for terminating model training is generally that the number of iterations reaches a set value, or that the loss calculated by the loss function is smaller than a set value, at which point the model training is complete.
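Steps (a) to (e) can be illustrated with a minimal training loop. This is a toy sketch, not the patented procedure: only the class-center matrix is optimized under a plain softmax loss, whereas a real model would also update the recognition network, and all names and hyper-parameters are assumptions.

```python
import numpy as np

def train_centers(features, labels, num_classes, lr=0.5, max_iters=200, tol=1e-3):
    """Forward pass, softmax loss, gradient-descent update, repeated until a
    termination condition (iteration cap or loss below a set value) is met."""
    n, d = features.shape
    rng = np.random.default_rng(0)
    centers = rng.normal(scale=0.1, size=(d, num_classes))
    loss = np.inf
    for _ in range(max_iters):                            # repeat steps (a)-(d)
        logits = features @ centers                       # (b) + (c): forward pass
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        loss = -np.log(probs[np.arange(n), labels]).mean()  # (d): softmax loss
        if loss < tol:                                    # termination condition
            break
        grad = probs.copy()
        grad[np.arange(n), labels] -= 1.0                 # d loss / d logits
        centers -= lr * (features.T @ grad) / n           # (e): gradient descent
    return centers, loss
```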
The training process of the second initial classification model is explained with reference to fig. 5B. In order to ensure that the features extracted by the reference classification model can be directly fused, the feature spaces of the first classification model and the second classification model need to be aligned. Therefore, in the process of training the second initial classification model, the class center storage module in fig. 5A needs to be used in cooperation. And in the training process of the second initial classification model, the parameters of the class center storage module are not updated. Because each class center feature of the training model is fixed and unchangeable in the training, the feature space distribution of the first classification model and the second classification model in the training can be ensured to be consistent.
The training process of the second initial classification model mainly comprises a training data preparation module, a complementary recognition network unit module, a category center storage module, a loss function calculation module, a random seed control module, a loss super-parameter control module and a loss function optimization module.
(1) A training data preparation module: the function of this module is identical to the training data preparation module in fig. 5A.
(2) Complementary identification network element module: the function of this module is identical to the basic identification network element in fig. 5A.
(3) A category center storage module: the function of this module is identical to the category centric storage module in fig. 5A. The data of this module is obtained from the trained category-centric storage module in fig. 5A.
(4) A loss function calculation module: the function of this block is identical to the loss function calculation block in fig. 5A.
In the training process, parameters are updated only on the complementary recognition network unit module, and the category center storage module only provides gradient calculation and does not participate in the parameter updating process.
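This asymmetric update can be sketched as follows. The sketch is illustrative only: the complementary recognition network is reduced to a single linear layer W, and all names are assumptions. The key point is that the gradient flows through the frozen class-center matrix, but only W is updated.

```python
import numpy as np

def train_complementary_step(x, labels, W, frozen_centers, lr=0.1):
    """One parameter update of the complementary recognition network unit.

    x:              (n, p) input samples
    W:              (p, d) weights of the complementary network (updated)
    frozen_centers: (d, m) class-center matrix (provides gradients, never updated)
    """
    n = x.shape[0]
    feats = x @ W                                   # complementary network forward pass
    logits = feats @ frozen_centers                 # frozen class center storage module
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    grad_logits = probs
    grad_logits[np.arange(n), labels] -= 1.0
    grad_feats = grad_logits @ frozen_centers.T     # gradient flows THROUGH the centers
    W -= lr * (x.T @ grad_feats) / n                # only W participates in the update
    return W
```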
(5) Random seed control module: this module randomly initializes the second initial classification models and ensures that the initialization random seed of each second initial classification model is different, so as to enrich the diversity of the models.
(6) Loss super-parameter control module: this module controls the loss function

L_loss = -(1/N) · Σ_{i=1}^{N} log( e^{s·cos(m2·θ_{y_i} + m1)} / ( e^{s·cos(m2·θ_{y_i} + m1)} + Σ_{j=1, j≠y_i}^{k} e^{s·cos θ_j} ) )

Since the class center directions of the training data are fixed, the direction vector of each training label in space is determined. It will be appreciated that the sample features of a training sample cluster around the direction vector of its training label, with the distance s from the origin expressing the quality of the training sample. In order to guarantee the diversity and complementarity of the models, adjusting s, m1 and m2 changes how the training samples are constrained in space, and therefore promotes each model to learn complementary knowledge. The loss super-parameter control module therefore controls the configuration of s, m1 and m2, so that the configurations of the different second initial classification models are inconsistent.
(7) A loss function optimization module: the function of this module is consistent with the loss function optimization module in fig. 5A.
Finally, different second target classification models (also called complementary models) can be obtained through training, and because the feature spaces of the models are aligned, the features of the models can be directly fused.
In an embodiment, as shown in fig. 6, adjusting model parameters of a third initial classification model based on target sample features and training sample features corresponding to the same training sample until a target convergence condition is satisfied, to obtain a third target classification model, includes:
step S602, obtaining the target class central feature corresponding to the current training sample from the reference classification model, and obtaining the reference label corresponding to the current training sample based on the target sample feature and the target class central feature corresponding to the current training sample.
Step S604, obtaining an initial class center feature corresponding to the current training sample from the third initial classification model, and obtaining a second prediction label corresponding to the current training sample based on the training sample feature and the initial class center feature corresponding to the current training sample.
Step S606, based on the reference label and the second prediction label corresponding to the same training sample, adjusting the model parameters of the third initial classification model until the target convergence condition is met, and obtaining a third target classification model.
Specifically, in addition to constraining the distance between the reference classification models and the third initial classification model by using features, label information may be introduced so that this distance is also constrained by labels. The computer device can obtain a reference label corresponding to a training sample based on the target sample feature corresponding to the training sample, and obtain a second prediction label corresponding to the training sample based on the training sample feature corresponding to the training sample. The reference label is the fused prediction result of the plurality of reference classification models; because the reference classification models are trained models, the reference label can be regarded as a relatively accurate result and is therefore used as a supervision signal. The second prediction label is the prediction result of the third initial classification model, and the training target of the third initial classification model is to make the second prediction label approach the reference label, so that the third target classification model is finally obtained through training.
The current training sample refers to any one of the training samples. When the reference label is determined, the target class center feature corresponding to the current training sample can be obtained from any one of the reference classification models, and the reference label corresponding to the current training sample is obtained based on the target sample feature corresponding to the current training sample and the target class center feature. When the second prediction label is determined, the initial class center feature corresponding to the current training sample may be obtained from the third initial classification model, and the second prediction label corresponding to the current training sample is obtained based on the training sample feature corresponding to the current training sample and the initial class center feature. It will be appreciated that the determination of the reference label and the second predictive label may refer to the determination of the first predictive label in the previous embodiment.
After the reference label and the second prediction label are obtained, the computer device may generate loss information based on a difference between the reference label and the second prediction label corresponding to the same training sample, and perform back propagation to adjust model parameters of the third initial classification model based on the loss information until a target convergence condition is satisfied, so as to obtain a third target classification model.
In one embodiment, the computer device may determine an initial training sample from the training sample set and determine the reference label corresponding to the initial training sample. The initial training sample is input into the third initial classification model to obtain a second prediction label, and the model parameters of the third initial classification model are adjusted based on the target sample feature corresponding to the initial training sample and the second prediction label, to obtain a third intermediate classification model. The next training sample is then taken as the new initial training sample, the third intermediate classification model is taken as the new third initial classification model, and the process returns to the step of inputting the initial training sample into the third initial classification model to obtain the second prediction label, for iterative training. The model parameters are continuously adjusted in this way until the target convergence condition is satisfied, and the third target classification model is obtained.
In one embodiment, the difference between the reference label and the second predicted label corresponding to the same training sample may be calculated by a KL distance (Kullback-Leibler Divergence, also referred to as relative entropy).
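A minimal sketch of the KL distance between a reference label and a second prediction label, averaged over samples (the function name and the epsilon clipping used for numerical safety are assumptions):

```python
import numpy as np

def kl_label_loss(reference_probs, predicted_probs, eps=1e-12):
    """KL distance (relative entropy) D(reference || predicted), averaged
    over samples. Both inputs are (n, k) probability vectors: the reference
    labels from the reference classification models and the second
    prediction labels from the third initial classification model."""
    p = np.clip(reference_probs, eps, 1.0)
    q = np.clip(predicted_probs, eps, 1.0)
    return float((p * (np.log(p) - np.log(q))).sum(axis=1).mean())
```

The KL distance is zero exactly when the two label distributions match, so driving it down makes the second prediction label approach the reference label.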
In the foregoing embodiment, in addition to the distance between the feature constraint reference classification model and the third initial classification model, the distance between the label constraint reference classification model and the third initial classification model may also be used, so that the third initial classification model learns the knowledge of the reference classification model, and the third target classification model with higher classification accuracy is obtained.
In one embodiment, the model size of the first and second target classification models is larger than the model size of the third target classification model.
In particular, the model scale is used to characterize the size of the model. The model scale of the first target classification model and the second target classification model is larger than that of the third target classification model, that is, the first target classification model and the second target classification model are large models, and the third target classification model is a small model. The model scale may be determined based on at least one of the model's calculations, parameters, memory footprint, etc.
The classification model training method can be applied to knowledge distillation, where a small model is distilled by fusing the features of a plurality of large models. In knowledge distillation, the knowledge learned by large models is used to guide the training of a small model, so that the small model achieves performance comparable to that of the large models while the model scale is greatly reduced, realizing model compression and acceleration. Specifically, a plurality of trained, feature-space-aligned large models are obtained, and the sample features they extract from the same training sample are fused. Each large model can be regarded as one sampling point of the training sample, so fusing the features of multiple large models yields a more accurate feature distribution for the training sample. Knowledge distillation is then performed on the small model based on this feature distribution, constraining the feature distribution of the small model to be consistent with the fused features, so that the classification accuracy of the distilled small model is higher than that of a small model trained directly on the training samples.
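The feature fusion step can be sketched as follows. Averaging L2-normalized features and re-normalizing is one plausible fusion operator; the text does not commit to a specific one, so this choice and the function name are assumptions.

```python
import numpy as np

def fuse_teacher_features(teacher_feats):
    """Fuse the sample features extracted by several feature-space-aligned
    large models into one target sample feature per training sample.

    teacher_feats: list of (n, d) arrays, one per reference (teacher) model
    """
    # normalize each model's features so every teacher contributes equally
    stacked = np.stack([f / np.linalg.norm(f, axis=1, keepdims=True)
                        for f in teacher_feats])
    fused = stacked.mean(axis=0)                 # elementwise fusion across teachers
    return fused / np.linalg.norm(fused, axis=1, keepdims=True)
```

Because the teachers share an aligned feature space, the mean of their (normalized) features is itself a meaningful point in that space; without alignment, averaging would mix incompatible coordinate systems.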
In the above embodiment, the model scales of the first target classification model and the second target classification model are larger than the model scale of the third target classification model, so that the third target classification model has high classification accuracy and high classification efficiency.
In one embodiment, when the first target classification model, the second target classification model and the third target classification model are face recognition models, the third target classification model is used for extracting image features corresponding to the input image, and the image features are used for identity recognition.
Specifically, in a face recognition application scenario, the first target classification model, the second target classification model, and the third target classification model may be face recognition models, and the models are used to recognize faces in images. The third target classification model obtained through the training of the classification model training method is used for extracting image features corresponding to the input image, the image features corresponding to the input image are used for identity recognition, and identity information of an object in the input image is judged. The input image may be a face image containing face information.
In one embodiment, the first image may be input into a third target classification model to obtain a first image feature corresponding to the first image, the second image may be input into the third target classification model to obtain a second image feature corresponding to the second image, the first image feature and the second image feature are compared, and whether the first image and the second image are images corresponding to the same object is determined according to a comparison result. For example, feature similarity of the first image feature and the second image feature may be calculated, and if the feature similarity is greater than a preset similarity, it is determined that the first image and the second image are images corresponding to the same object.
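A minimal sketch of this 1:1 comparison (cosine similarity as the feature similarity and the preset similarity value are illustrative assumptions):

```python
import numpy as np

def same_object(feat1, feat2, sim_threshold=0.5):
    """Decide whether the first image and the second image correspond to the
    same object by comparing their image features: compute the feature
    similarity and test it against a preset similarity."""
    sim = feat1 @ feat2 / (np.linalg.norm(feat1) * np.linalg.norm(feat2))
    return bool(sim > sim_threshold)
```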
In one embodiment, image features corresponding to a face image of a known object may be extracted as candidate image features by the third target classification model, and a corresponding object identifier exists for each candidate image feature. The object identifier is used to identify an object corresponding to the face image, and may specifically include a character string of at least one character of letters, numbers and symbols, for example, a job number or a communication number of the object may be used as the object identifier. If the face image to be recognized of the identity information to be judged is obtained, extracting image features corresponding to the face image to be recognized through a third target classification model, respectively matching the image features corresponding to the face image to be recognized with the candidate image features, and determining the identity information corresponding to the face image to be recognized according to the matching result. For example, feature similarity between image features corresponding to the face image to be recognized and each candidate image feature may be calculated, the candidate image feature corresponding to the maximum feature similarity is taken as the target image feature, and the object identifier corresponding to the target image feature is obtained as the identity information corresponding to the face image to be recognized.
In one embodiment, the first and second object classification models may be larger model-scale face recognition models than the third object classification model.
In the embodiment, the classification model training method can be applied to a face recognition task, and the accuracy of face recognition is improved.
In one embodiment, as shown in fig. 7, a data classification method is provided. The method is described as being applied to a computer device, which may be the terminal 102 or the server 104 in fig. 1 described above. Referring to fig. 7, the data classification method includes the following steps:
Step S702, obtaining the data to be classified, inputting the data to be classified into a third target classification model, and obtaining the target characteristics corresponding to the data to be classified.
The training process of the third target classification model is as follows: a training sample set is acquired, where the training sample set includes training samples and the training labels corresponding to the training samples; at least two reference classification models are determined from the first target classification model and the second target classification model set that were trained based on the training sample set, where, in each second target classification model included in the second target classification model set, the class center features respectively corresponding to the various training labels are obtained from the first target classification model, the class center features being used to represent the position information of the training labels in the feature space; the training sample set is input into each reference classification model to obtain the reference sample features corresponding to each training sample, and the reference sample features corresponding to the same training sample are fused to obtain the target sample feature corresponding to each training sample; and the training sample set is input into a third initial classification model to obtain the training sample features corresponding to each training sample, and the model parameters of the third initial classification model are adjusted based on the target sample feature and the training sample feature corresponding to the same training sample until a target convergence condition is met, to obtain the third target classification model.
It can be understood that, for the specific training process of the first target classification model, the second target classification model and the third target classification model, reference may be made to the foregoing embodiments of the classification model training method, and details are not repeated here.
Step S704, determining a classification result corresponding to the data to be classified based on the target feature.
The data to be classified refers to data of a classification result to be determined. The classification model can be a video classification model, an image classification model, a text classification model, a voice classification model and other models, and correspondingly, the data to be classified can be video, image, text, voice and other types of data.
Specifically, the computer device may determine the classification result corresponding to the data to be classified through the model. The computer device can obtain the data to be classified locally or from other devices, input the data to be classified into the third target classification model, extract the target features corresponding to the data to be classified through the third target classification model, and finally determine the classification result corresponding to the data to be classified based on the target features. The reference classification models are mutually complementary models, so complementarity exists among the features they extract, and the target sample feature obtained by fusing the sample features extracted by the reference classification models for the same training sample is a feature with higher accuracy. The initial sample features extracted by the third initial classification model are constrained by the target sample features, so the third initial classification model is trained to learn the target sample features. Therefore, the third target classification model obtained through the final training can extract more accurate data features from the input data, and an accurate classification result corresponding to the data to be classified can be obtained based on the target features extracted by the third target classification model.
In one embodiment, the computer device may input the data to be classified into a third target classification model, extract target features corresponding to the data to be classified through the third target classification model, determine a classification result corresponding to the data to be classified based on the target features corresponding to the data to be classified through the third target classification model, and finally output the classification result corresponding to the data to be classified through the third target classification model.
In one embodiment, the computer device may input the data to be classified into a third target classification model, extract target features corresponding to the data to be classified through the third target classification model, and output the target features corresponding to the data to be classified through the third target classification model. And subsequently, matching the target features corresponding to the data to be classified with the candidate features corresponding to the candidate data of the known class to determine the classification result corresponding to the data to be classified.
In the data classification method, a training sample set is obtained, where the training sample set includes training samples and the training labels corresponding to the training samples. At least two reference classification models are determined from the first target classification model and the second target classification model set obtained by training based on the training sample set; in each second target classification model included in the second target classification model set, the class center features respectively corresponding to the various training labels are obtained from the first target classification model, and the class center features are used to represent the position information of the training labels in the feature space. The training sample set is input into each reference classification model to obtain the reference sample features corresponding to each training sample, and the reference sample features corresponding to the same training sample are fused to obtain the target sample feature corresponding to each training sample. The training sample set is also input into a third initial classification model to obtain the training sample features corresponding to each training sample, and the model parameters of the third initial classification model are adjusted based on the target sample feature and the training sample feature corresponding to the same training sample until a target convergence condition is met, to obtain the third target classification model.
In this way, the class center features represent the positions of the training labels in the feature space, and each class center feature in the second target classification model is obtained from the first target classification model; therefore, the second target classification model and the first target classification model are aligned in feature space, and their feature space distributions are consistent. Each reference classification model determined from the first target classification model and the second target classification models is likewise aligned in feature space; such feature-space-aligned reference classification models are complementary models, and there is complementarity between the features they extract. Furthermore, feature fusion is carried out by combining a plurality of reference classification models, and the third initial classification model is trained with the fused target sample features, so that it can learn the more accurate feature distribution of the fused features. The trained third target classification model can thus extract more accurate data features from the input data, and a more accurate classification result can be obtained based on them, improving both the classification accuracy of the model and the classification accuracy of the data.
In one embodiment, determining a classification result corresponding to the data to be classified based on the target feature includes:
calculating the similarity between the target characteristic and each candidate characteristic in the characteristic library; determining similar features from the candidate features based on the similarity; and obtaining a classification result based on the class labels corresponding to the similar features.
Specifically, when determining the classification result, the computer device may match the target feature with each candidate feature in the feature library and determine the classification result according to the matching result. The computer device may calculate the similarity between the target feature and each candidate feature in the feature library, where each candidate feature has a corresponding category label, determine a similar feature from the candidate features based on the similarities, for example by taking the candidate feature with the maximum similarity as the similar feature, and use the category label corresponding to the similar feature as the final classification result.
In one embodiment, each candidate feature may be pre-extracted. In one embodiment, the candidate features may be obtained by inputting candidate data of known classes into any one of the object classification models.
In the above embodiment, the classification result can be determined quickly by feature matching.
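As a hedged illustration (not the patented implementation itself), the feature matching described above can be sketched with cosine similarity over a small in-memory feature library; the library contents, labels and function name below are hypothetical:

```python
import numpy as np

def classify_by_feature_matching(target_feature, feature_library, class_labels):
    """Match a target feature against a library of candidate features.

    Returns the class label of the most similar candidate feature,
    using cosine similarity as the matching criterion.
    """
    # Normalize so that the dot product equals cosine similarity.
    target = target_feature / np.linalg.norm(target_feature)
    library = feature_library / np.linalg.norm(feature_library, axis=1, keepdims=True)
    similarities = library @ target          # one similarity per candidate
    best = int(np.argmax(similarities))      # candidate with maximum similarity
    return class_labels[best]

# Hypothetical feature library: three candidate features of known classes.
library = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
labels = ["cat", "dog", "fox"]
print(classify_by_feature_matching(np.array([0.9, 0.1]), library, labels))  # → cat
```

In practice the candidate features would be pre-extracted by the third target classification model and indexed for fast search, but the maximum-similarity rule is the same.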
In a specific embodiment, the classification model training method and the data classification method of the present application can be applied to a face recognition task, for example in mobile-terminal face recognition systems for actual scenes such as security, payment and access control, which place high requirements on the running time and accuracy of the face recognition model: the highest possible recognition accuracy must be obtained in as little inference time as possible. In order to meet the time-consumption requirement, a small network is generally adopted as the recognition inference model in such scenes. However, directly training a small network on a large amount of data often does not yield a model that meets the accuracy requirement. This is because the small network has limited fitting capacity and can fall into a local minimum of the loss function during training and then begin to oscillate, so that it cannot be optimized further.
The present application provides a method that fuses a plurality of large models and then performs knowledge distillation on a small model, improving the accuracy of face recognition. The feature distribution extracted by the large models is transferred to the small model by knowledge distillation, which helps the small model escape local minima. The method first trains a large model on face image data, then trains a plurality of large models complementary to it, fuses the plurality of models to obtain more accurate picture features, and finally performs knowledge distillation on the small model using the fused features. As training progresses, the features extracted by the small model gradually converge to the distribution of the fused features. Because the data distribution extracted with the fused features is more accurate than that obtained by directly training the small model, the face recognition accuracy of the finally trained small model is higher than that of a directly trained one.
Referring to fig. 8A, the method of the present application is mainly divided into two phases: a network module training phase and a network module deployment phase. In the network module training stage, a large recognition network (which may be called a large model) is trained using face data with known identities, and then complementary models (which may also be called large recognition networks) aligned with the feature space of the large model are trained. Finally, a plurality of large recognition networks are combined to fuse the picture features, and the fused features are used for distillation training of a small model (which may also be called a small recognition network). In the module deployment stage, only the trained small model is deployed, and the whole face recognition system is formed together with the feature comparison search module. It can be understood that the small recognition network has a structure similar to that of the large recognition network, but has far fewer parameters and needs shorter time for forward inference.
Specifically, the training process of the large recognition network in the network module training phase may refer to fig. 5A, and the process of training the complementary model in the feature space alignment manner may refer to fig. 5B, which is not described herein again.
Referring to fig. 8B, the training process of the small recognition network includes a training data preparation module, a small recognition network unit module, a large recognition network unit module, a feature fusion calculation module, a knowledge distillation loss function calculation module, and a knowledge distillation objective function optimization module.
(a) A training data preparation module: the function of this module is consistent with the training data preparation module in fig. 5A. The read data is face image data.
(b) Small recognition network unit module: the small recognition network corresponding to this module has a structure similar to that of the large recognition network, but far fewer parameters and a short forward inference time. The small recognition network unit extracts features from the input picture to obtain the feature F_X.
(c) Large recognition network unit module: this module comprises the large recognition networks obtained by training, which respectively extract features from the input pictures to obtain fea_1 to fea_n (i.e., feature 1 to feature n).
(d) A feature fusion calculation module: the module fuses the features extracted by the large-scale recognition network, and the fusion formula is as follows:
[Fusion formula, rendered as an image in the original: the features fea_1 to fea_n extracted by the large recognition networks are combined into the fused feature F_Y.]
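The fusion formula itself appears only as an image in the original; as a hedged sketch, one common way to fuse n feature vectors fea_1 to fea_n into F_Y is an L2-normalized mean, which is what the code below assumes:

```python
import numpy as np

def fuse_features(features):
    """Fuse features fea_1..fea_n from the large recognition networks.

    Assumption: the fusion is an L2-normalized mean of the per-model
    features; the actual patented formula is shown only as an image.
    """
    normalized = [f / np.linalg.norm(f) for f in features]
    fused = np.mean(normalized, axis=0)
    return fused / np.linalg.norm(fused)

fea_1 = np.array([2.0, 0.0])
fea_2 = np.array([0.0, 1.0])
f_y = fuse_features([fea_1, fea_2])
print(f_y)  # a unit-length vector lying between the two inputs
```

Normalizing before averaging prevents one large network's feature magnitude from dominating the fused direction.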
(e) Knowledge distillation loss function calculation module: a cosine similarity loss function is typically employed. This module is used to evaluate the similarity between the feature F_X extracted by the small recognition network unit module and the fused feature F_Y. The loss function is as follows:
L_f = ||F_X - F_Y||_2
(f) Knowledge distillation loss function optimization module: this module trains and optimizes the whole network based on gradient descent. Steps (a) to (e) are repeated during training until the termination condition is met: the number of iterations reaches a set value, or the loss calculated by the knowledge distillation loss function falls below a set value, at which point the model training is complete.
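The loop through modules (a) to (f) can be sketched as a toy numpy distillation, in which the small network is reduced to a single linear layer and the fused teacher features are fixed targets; all shapes, names and hyperparameters below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 40 input "pictures" (4-dim) and the fused teacher
# features F_Y (2-dim) they map to.
X = rng.normal(size=(40, 4))
teacher_W = rng.normal(size=(4, 2))
F_Y = X @ teacher_W                          # fused features from the large networks

W = np.zeros((4, 2))                         # small network: a single linear layer
lr = 0.1
for step in range(500):                      # (f) gradient-descent optimization
    F_X = X @ W                              # (b) small network extracts features
    diff = F_X - F_Y
    loss = float(np.mean(np.sum(diff ** 2, axis=1)))  # (e) distillation loss
    if loss < 1e-12:                         # termination: loss below a set value
        break
    W -= lr * 2.0 * (X.T @ diff) / len(X)    # gradient of the mean squared loss

print(loss)
```

As training progresses, the features F_X extracted by the "small network" converge to the distribution of the fused features F_Y, which is the behavior the distillation stage relies on.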
The network module deployment stage mainly combines and deploys the small recognition network obtained in the network module training stage with other modules to form a complete solution. Referring to fig. 8C, the image acquisition input module acquires the face image to be recognized, the recognition network module (including the small recognition network) extracts its features, and the feature comparison search module then performs recognition. Because the small recognition network is obtained by knowledge transfer from a large recognition network with stronger expressive capacity, the distribution of the features it extracts is highly similar to that of the large recognition network, so it achieves higher recognition accuracy.
In the embodiment, the face recognition accuracy of the small recognition network can be improved by adopting the method, so that the method is suitable for various complex application scenes.
It can be understood that the classification model training method and the data classification method of the present application can be applied to face recognition tasks, other image recognition and classification tasks, such as vehicle recognition tasks based on vehicle images, and other types of data recognition and classification tasks, such as text classification tasks, voice classification tasks, video classification tasks, and the like.
It should be understood that, although the steps in the flowcharts of the embodiments described above are displayed sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not limited to the exact order illustrated and may be performed in other orders. Moreover, at least part of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and whose execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the present application further provides a classification model training apparatus for implementing the above-mentioned classification model training method and a data classification apparatus for implementing the above-mentioned data classification method. The implementation scheme for solving the problem provided by the device is similar to the implementation scheme described in the above method, so the specific limitations in one or more embodiments of the classification model training device provided below can be referred to the limitations in the above classification model training method, and the specific limitations in one or more embodiments of the data classification device can be referred to the limitations in the above data classification method, which is not described herein again.
In one embodiment, as shown in fig. 9, there is provided a classification model training apparatus including: a training sample set obtaining module 902, a reference classification model determining module 904, a feature fusion module 906, and a third target classification model determining module 908, wherein:
a training sample set obtaining module 902, configured to obtain a training sample set; the training sample set comprises training samples and training labels corresponding to the training samples.
A reference classification model determining module 904, configured to determine at least two reference classification models from a first target classification model and a second target classification model set obtained through training based on a training sample set; in a second target classification model included in the second target classification model set, class center features corresponding to various training labels are obtained from the first target classification model, and the class center features are used for representing position information of the training labels in a feature space.
The feature fusion module 906 is configured to input the training sample set into each reference classification model to obtain reference sample features corresponding to each training sample, and fuse the reference sample features corresponding to the same training sample to obtain target sample features corresponding to each training sample.
The third target classification model determining module 908 is configured to input the training sample set into the third initial classification model, obtain training sample characteristics corresponding to each training sample, and adjust model parameters of the third initial classification model based on the target sample characteristics and the training sample characteristics corresponding to the same training sample until a target convergence condition is satisfied, so as to obtain a third target classification model.
In the classification model training device, the class center features represent the positions of the training labels in the feature space, and each class center feature in the second target classification model is obtained from the first target classification model; therefore, the second target classification model and the first target classification model are aligned in feature space, and their feature space distributions are consistent. Each reference classification model determined from the first target classification model and the second target classification models is likewise aligned in feature space; such feature-space-aligned reference classification models are complementary models, and there is complementarity between the features they extract. Furthermore, feature fusion is carried out by combining a plurality of reference classification models, and the third initial classification model is trained with the fused target sample features, so that it can learn the more accurate feature distribution of the fused features. The trained third target classification model can thus extract more accurate data features from the input data, and a more accurate classification result can be obtained based on them, improving both the classification accuracy of the model and the classification accuracy of the data.
In one embodiment, the classification model training apparatus further includes:
and the first model training module is used for carrying out model training on the first initial classification model based on the training sample set to obtain a first target classification model.
The initial model generation module is used for acquiring target class central features respectively corresponding to various training labels from the first target classification model and generating at least one second initial classification model aligned with the feature space of the first target classification model; class center features respectively corresponding to various training labels in the second initial classification model are corresponding target class center features.
The second model training module is used for carrying out model training on at least one second initial classification model based on the training sample set to obtain at least one second target classification model, each second target classification model forming the second target classification model set; each second initial classification model keeps the class center features of the various classes unchanged during model training.
In one embodiment, the first model training module is further configured to input training samples in the set of training samples into the first initial classification model; the model parameters of the first initial classification model comprise initial class center features corresponding to various training labels respectively; extracting current sample characteristics corresponding to the current training sample, and obtaining a first prediction label corresponding to the current training sample based on the current sample characteristics and each initial class center characteristic; and adjusting model parameters of the first initial classification model based on the first prediction label and the training label corresponding to the same training sample until a first convergence condition is met, so as to obtain a first target classification model.
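The first prediction label above is obtained from the current sample feature and the initial class center features; a minimal sketch (assuming the prediction is the class whose center has the highest cosine similarity to the sample feature, with illustrative names) might look like:

```python
import numpy as np

def predict_label(sample_feature, class_centers):
    """Predict a label from the cosine similarity between a sample
    feature and each class center feature (one row per class)."""
    f = sample_feature / np.linalg.norm(sample_feature)
    c = class_centers / np.linalg.norm(class_centers, axis=1, keepdims=True)
    cosines = c @ f                      # one cosine per class center
    return int(np.argmax(cosines))       # class whose center is closest in angle

# Hypothetical class center features for two classes in a 3-dim feature space.
centers = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(predict_label(np.array([0.2, 0.9, 0.1]), centers))  # → 1
```

Because the class centers are themselves model parameters, adjusting them during training of the first initial classification model moves each class's position in the feature space.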
In one embodiment, the second model training module is further configured to sample the training sample set to obtain a plurality of training sample subsets, determine a target sample subset from each of the training sample subsets, determine a candidate classification model from each of the second initial classification models, and use the candidate classification model as the current classification model; inputting target training samples in the target sample subset into the current classification model to obtain initial sample characteristics corresponding to each target training sample; calculating characteristic included angle information based on initial sample characteristics corresponding to the target training samples and each class center characteristic, and obtaining target loss based on the characteristic included angle information corresponding to each target training sample and an included angle compensation parameter corresponding to the candidate classification model; adjusting model parameters corresponding to the current classification model based on the target loss to obtain an intermediate classification model; and taking the next training sample subset as a target sample subset, taking the intermediate classification model as a current classification model, returning to the step of inputting the target training samples in the target sample subset into the current classification model to obtain initial sample characteristics corresponding to each target training sample, and executing until a second convergence condition is met to obtain a second target classification model corresponding to the candidate classification model.
In one embodiment, the second model training module is further configured to use a training label corresponding to a current target training sample as a target training label, obtain target included angle information corresponding to the current target training sample based on an initial sample feature corresponding to the current target training sample and a class center feature corresponding to the target training label, obtain reference included angle information corresponding to the current target training sample based on the initial sample feature corresponding to the current target training sample and the class center features corresponding to other training labels, and use the target included angle information and the reference included angle information corresponding to the current target training sample as feature included angle information corresponding to the current target training sample; respectively adjusting target included angle information and reference included angle information corresponding to the same target training sample based on included angle compensation parameters to obtain target characteristic information and reference characteristic information, and obtaining initial loss based on the target characteristic information and the reference characteristic information corresponding to the same target training sample; and obtaining target loss based on the initial loss corresponding to each target training sample.
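The included-angle computation with an angle compensation parameter described above resembles additive angular-margin losses; the following is only a hedged sketch of one such formulation, with hypothetical margin and scale values:

```python
import numpy as np

def margin_loss(feature, class_centers, target_idx, margin=0.3, scale=16.0):
    """Loss built from feature/class-center included angles, with the
    target angle enlarged by an angle compensation parameter (margin)."""
    f = feature / np.linalg.norm(feature)
    c = class_centers / np.linalg.norm(class_centers, axis=1, keepdims=True)
    cos = np.clip(c @ f, -1.0, 1.0)
    angles = np.arccos(cos)              # feature included angle information
    angles[target_idx] += margin         # compensate the target included angle
    logits = scale * np.cos(angles)
    # Softmax cross-entropy on the compensated logits.
    logits -= logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[target_idx])

centers = np.eye(2)                      # two hypothetical class centers
loss_aligned = margin_loss(np.array([1.0, 0.05]), centers, target_idx=0)
loss_misaligned = margin_loss(np.array([0.05, 1.0]), centers, target_idx=0)
print(loss_aligned < loss_misaligned)    # → True
```

Enlarging only the target angle forces the feature to sit closer to its own class center than the unmodified angles would require, which tightens the per-class feature clusters.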
In one embodiment, the reference classification model determination module is further configured to obtain at least two second target classification models from the second target classification model set as the reference classification models. The reference classification model determination module is further configured to obtain the first target classification model and at least one second target classification model from the second target classification model set as the reference classification models.
In one embodiment, the model sizes of the first target classification model and the second target classification model are larger than the model size of the third target classification model.
In one embodiment, when the first target classification model, the second target classification model and the third target classification model are face recognition models, the third target classification model is used for extracting image features corresponding to the input image, and the image features are used for identity recognition.
In one embodiment, as shown in fig. 10, there is provided a data sorting apparatus including: a data acquisition module 1002 and a classification result determination module 1004, wherein:
the data obtaining module 1002 is configured to obtain data to be classified, and input the data to be classified into the third target classification model to obtain target features corresponding to the data to be classified.
And a classification result determining module 1004 for determining a classification result corresponding to the data to be classified based on the target feature.
The training process of the third target classification model is as follows: acquiring a training sample set; the training sample set comprises training samples and training labels corresponding to the training samples; determining at least two reference classification models from a first target classification model and a second target classification model set obtained by training based on a training sample set; in a second target classification model included in a second target classification model set, class center features respectively corresponding to various training labels are obtained from the first target classification model, and the class center features are used for representing position information of the training labels in a feature space; inputting the training sample set into each reference classification model to obtain reference sample characteristics corresponding to each training sample, and fusing the reference sample characteristics corresponding to the same training sample to obtain target sample characteristics corresponding to each training sample; and inputting the training sample set into a third initial classification model to obtain training sample characteristics corresponding to each training sample, and adjusting model parameters of the third initial classification model based on the target sample characteristics and the training sample characteristics corresponding to the same training sample until a target convergence condition is met to obtain a third target classification model.
In the data classification device, the class center features represent the positions of the training labels in the feature space, and each class center feature in the second target classification model is obtained from the first target classification model; therefore, the second target classification model and the first target classification model are aligned in feature space, and their feature space distributions are consistent. Each reference classification model determined from the first target classification model and the second target classification models is likewise aligned in feature space; such feature-space-aligned reference classification models are complementary models, and there is complementarity between the features they extract. Furthermore, feature fusion is carried out by combining a plurality of reference classification models, and the third initial classification model is trained with the fused target sample features, so that it can learn the more accurate feature distribution of the fused features. The trained third target classification model can thus extract more accurate data features from the input data, and a more accurate classification result can be obtained based on them, improving both the classification accuracy of the model and the classification accuracy of the data.
In one embodiment, the classification result determining module is further configured to calculate similarities between the target features and the candidate features in the feature library, respectively; determining similar features from the candidate features based on the similarity; and obtaining a classification result based on the class labels corresponding to the similar features.
The modules in the classification model training device or the data classification device can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 11. The computer device includes a processor, a memory, an Input/Output interface (I/O for short), and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data such as the training sample set, the first target classification model, the second target classification model and the third target classification model. The input/output interface of the computer device is used for exchanging information between the processor and an external device. The communication interface of the computer device is used for connecting and communicating with an external terminal through a network. The computer program is executed by the processor to implement a classification model training method or a data classification method.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 12. The computer apparatus includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory and the input/output interface are connected by a system bus, and the communication interface, the display unit and the input device are connected by the input/output interface to the system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The input/output interface of the computer device is used for exchanging information between the processor and an external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a classification model training method or a data classification method. The display unit of the computer equipment is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device, the display screen can be a liquid crystal display screen or an electronic ink display screen, the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on a shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the configurations shown in fig. 11 and 12 are block diagrams of only some of the configurations relevant to the present disclosure, and do not constitute a limitation on the computing devices to which the present disclosure may be applied, and that a particular computing device may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer readable storage medium. The computer instructions are read by a processor of the computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps of the above-described method embodiments.
It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant countries and regions.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided herein may include at least one of relational and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, etc., without limitation.
The technical features of the above embodiments may be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features have been described; nevertheless, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (16)

1. A classification model training method, the method comprising:
acquiring a training sample set; the training sample set comprises training samples and training labels corresponding to the training samples;
determining at least two reference classification models from a first target classification model and a second target classification model set obtained by training based on the training sample set; in a second target classification model included in the second target classification model set, class center features respectively corresponding to various training labels are obtained from the first target classification model, and the class center features are used for representing position information of the training labels in a feature space;
inputting the training sample set into each reference classification model to obtain reference sample characteristics corresponding to each training sample, and fusing the reference sample characteristics corresponding to the same training sample to obtain target sample characteristics corresponding to each training sample;
and inputting the training sample set into a third initial classification model to obtain training sample characteristics corresponding to each training sample, and adjusting model parameters of the third initial classification model based on the target sample characteristics and the training sample characteristics corresponding to the same training sample until a target convergence condition is met to obtain a third target classification model.
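Outside the claim language, the overall training flow of claim 1 can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: averaging as the fusion rule, a linear student feature extractor, a squared-error feature-matching loss, and all function names are assumptions introduced here.

```python
import numpy as np

def fuse_reference_features(reference_features):
    # Fuse the reference sample features produced by each reference
    # (teacher) model for the same training sample, here by averaging.
    return np.mean(np.stack(reference_features, axis=0), axis=0)

def distill_step(W, x, target_feature, lr=0.1):
    # Student feature extractor assumed linear: f(x) = W @ x.
    # One gradient step on the feature-matching loss ||W @ x - target||^2,
    # i.e. adjusting the student's model parameters toward the fused
    # target sample feature.
    student_feature = W @ x
    grad = 2.0 * np.outer(student_feature - target_feature, x)
    return W - lr * grad

# Toy usage: two reference models' features for one training sample.
x = np.array([1.0, 2.0])
teacher_feats = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
target = fuse_reference_features(teacher_feats)
W = np.zeros((2, 2))
for _ in range(200):
    W = distill_step(W, x, target)
```

After a few steps the student's feature for `x` matches the fused target, which is the convergence condition the claim's loop drives toward.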
2. The method of claim 1, wherein prior to determining at least two reference classification models from a first set of target classification models and a second set of target classification models trained based on the set of training samples, the method further comprises:
performing model training on a first initial classification model based on the training sample set to obtain the first target classification model;
acquiring target class center features respectively corresponding to the various training labels from the first target classification model, and generating at least one second initial classification model aligned with the feature space of the first target classification model; the class center features respectively corresponding to the various training labels in each second initial classification model are the corresponding target class center features;
performing model training on the at least one second initial classification model based on the training sample set to obtain at least one second target classification model, wherein the second target classification models form the second target classification model set; each second initial classification model keeps the class center features of all classes unchanged during model training.
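The construction of the aligned second initial models in claim 2 might be sketched as below. The dict-based model representation and the example margin values are assumptions for illustration, not the patented implementation: each second initial model copies the first target model's class-center features verbatim (aligning the feature spaces) and marks them read-only so training cannot move them.

```python
import numpy as np

def make_aligned_models(teacher_centers, margins=(0.3, 0.5)):
    # One second initial classification model per included-angle
    # compensation parameter ("margin"); every model reuses the first
    # target model's class-center features, so all feature spaces align.
    models = []
    for m in margins:
        centers = teacher_centers.copy()
        centers.setflags(write=False)   # centers stay fixed during training
        models.append({"centers": centers, "margin": m})
    return models

teacher_centers = np.array([[1.0, 0.0], [0.0, 1.0]])
models = make_aligned_models(teacher_centers)
```

Freezing the shared centers is what makes features from different models comparable enough to fuse later.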
3. The method of claim 2, wherein model training a first initial classification model based on the training sample set to obtain the first target classification model comprises:
inputting training samples in the set of training samples into the first initial classification model; the model parameters of the first initial classification model comprise initial class center features corresponding to various training labels respectively;
extracting current sample features corresponding to the current training samples, and obtaining first prediction labels corresponding to the current training samples based on the current sample features and each initial class center feature;
and adjusting the model parameters of the first initial classification model based on a first prediction label and a training label corresponding to the same training sample until a first convergence condition is met, so as to obtain the first target classification model.
4. The method according to claim 2, wherein the performing model training on the at least one second initial classification model respectively based on the training sample set to obtain at least one second target classification model comprises:
sampling the training sample set to obtain a plurality of training sample subsets, determining a target sample subset from the training sample subsets, determining a candidate classification model from the at least one second initial classification model, and taking the candidate classification model as a current classification model;
inputting target training samples in the target sample subset into the current classification model to obtain initial sample characteristics corresponding to each target training sample;
calculating characteristic included angle information based on initial sample characteristics corresponding to the target training samples and various class center characteristics, and obtaining target loss based on the characteristic included angle information corresponding to each target training sample and included angle compensation parameters corresponding to the candidate classification models;
adjusting model parameters corresponding to the current classification model based on the target loss to obtain an intermediate classification model;
and taking the next training sample subset as a target sample subset, taking the intermediate classification model as a current classification model, returning to the step of inputting the target training samples in the target sample subset into the current classification model to obtain initial sample characteristics corresponding to each target training sample, and executing until a second convergence condition is met to obtain a second target classification model corresponding to the candidate classification model.
5. The method according to claim 4, wherein the calculating of the feature angle information based on the initial sample features and the class center features corresponding to the target training samples and the obtaining of the target loss based on the feature angle information corresponding to each target training sample and the angle compensation parameters corresponding to the candidate classification models comprise:
taking a training label corresponding to a current target training sample as a target training label, obtaining target included angle information corresponding to the current target training sample based on an initial sample characteristic corresponding to the current target training sample and a class center characteristic corresponding to the target training label, obtaining reference included angle information corresponding to the current target training sample based on the initial sample characteristic corresponding to the current target training sample and the class center characteristics corresponding to other training labels, and taking the target included angle information and the reference included angle information corresponding to the current target training sample as feature included angle information corresponding to the current target training sample;
respectively adjusting target included angle information and reference included angle information corresponding to the same target training sample based on the included angle compensation parameters to obtain target characteristic information and reference characteristic information, and obtaining initial loss based on the target characteristic information and the reference characteristic information corresponding to the same target training sample;
and obtaining the target loss based on the initial loss corresponding to each target training sample.
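One common instantiation of an included-angle-compensated loss like that of claims 4 and 5 is an additive angular-margin softmax (ArcFace-style); the sketch below is illustrative only, and the margin and scale values, like the function name, are assumptions rather than values taken from the patent.

```python
import numpy as np

def margin_loss(sample_feature, class_centers, label, margin=0.5, scale=16.0):
    # Feature included-angle information: the angle between the normalized
    # sample feature and each normalized class-center feature.
    f = sample_feature / np.linalg.norm(sample_feature)
    c = class_centers / np.linalg.norm(class_centers, axis=1, keepdims=True)
    cosines = c @ f
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    # Included-angle compensation: enlarge only the target-class angle,
    # so the loss demands a stricter separation for the true class.
    logits = cosines.copy()
    logits[label] = np.cos(angles[label] + margin)
    logits = scale * logits
    logits -= logits.max()                    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[label])              # per-sample cross-entropy
```

Summing (or averaging) this initial loss over the target training samples gives the target loss used to adjust the current classification model.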
6. The method of claim 1, wherein the determining of the reference classification model comprises any one of:
acquiring at least two second target classification models from the second target classification model set as reference classification models;
and acquiring at least one second target classification model in the first target classification model and the second target classification model set as a reference classification model.
7. The method according to claim 1, wherein the adjusting the model parameters of the third initial classification model based on the target sample features and the training sample features corresponding to the same training sample until a target convergence condition is satisfied to obtain a third target classification model comprises:
acquiring a target class central feature corresponding to the current training sample from the reference classification model, and acquiring a reference label corresponding to the current training sample based on the target sample feature and the target class central feature corresponding to the current training sample;
acquiring initial class center features corresponding to the current training sample from the third initial classification model, and acquiring a second prediction label corresponding to the current training sample based on the training sample features corresponding to the current training sample and the initial class center features;
and adjusting the model parameters of the third initial classification model based on the reference label and the second prediction label corresponding to the same training sample until a target convergence condition is met, so as to obtain the third target classification model.
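A sketch of the label-based adjustment in claim 7 follows; the softmax form, temperature, and KL divergence are assumed choices that the claim does not mandate. The reference label is a distribution over classes derived from the target sample feature and the target class-center features, the second prediction label comes from the student's training sample feature and its own initial class-center features, and the two are compared as a distillation loss.

```python
import numpy as np

def soft_label(feature, class_centers, temperature=1.0):
    # Turn feature/class-center similarities into a label distribution.
    f = feature / np.linalg.norm(feature)
    c = class_centers / np.linalg.norm(class_centers, axis=1, keepdims=True)
    logits = (c @ f) / temperature
    logits -= logits.max()                  # numerical stability
    e = np.exp(logits)
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    # Loss comparing the reference label p with the student's second
    # prediction label q; zero when the distributions coincide.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Driving this divergence toward zero for every training sample is one way the target convergence condition of the claim could be realized.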
8. The method of claim 1, wherein the model sizes of the first target classification model and the second target classification models are larger than the model size of the third target classification model.
9. The method according to any one of claims 1 to 8, wherein, when the first, second and third target classification models are face recognition models, the third target classification model is used to extract image features corresponding to an input image, and the image features are used for identity recognition.
10. A method of data classification, the method comprising:
acquiring data to be classified, and inputting the data to be classified into a third target classification model to obtain target characteristics corresponding to the data to be classified;
determining a classification result corresponding to the data to be classified based on the target features;
the training process of the third target classification model is as follows:
acquiring a training sample set; the training sample set comprises training samples and training labels corresponding to the training samples;
determining at least two reference classification models from a first target classification model and a second target classification model set obtained by training based on the training sample set; in a second target classification model included in the second target classification model set, class center features respectively corresponding to various training labels are obtained from the first target classification model, and the class center features are used for representing position information of the training labels in a feature space;
inputting the training sample set into each reference classification model to obtain reference sample characteristics corresponding to each training sample, and fusing the reference sample characteristics corresponding to the same training sample to obtain target sample characteristics corresponding to each training sample;
and inputting the training sample set into a third initial classification model to obtain training sample characteristics corresponding to each training sample, and adjusting model parameters of the third initial classification model based on the target sample characteristics and the training sample characteristics corresponding to the same training sample until a target convergence condition is met to obtain a third target classification model.
11. The method according to claim 10, wherein the determining the classification result corresponding to the data to be classified based on the target feature comprises:
calculating the similarity between the target features and each candidate feature in a feature library respectively;
determining similar features from the candidate features based on the similarity;
and obtaining the classification result based on the class label corresponding to the similar feature.
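The retrieval step of claim 11 amounts to a nearest-neighbour search over a feature library. The sketch below uses cosine similarity and illustrative function names, both of which are assumptions; the claims leave the similarity measure open.

```python
import numpy as np

def classify_by_library(target_feature, feature_library, class_labels):
    # Cosine similarity between the target feature and every candidate
    # feature in the library; the class label of the most similar
    # candidate feature becomes the classification result.
    lib = feature_library / np.linalg.norm(feature_library, axis=1,
                                           keepdims=True)
    q = target_feature / np.linalg.norm(target_feature)
    similarities = lib @ q
    best = int(np.argmax(similarities))
    return class_labels[best], float(similarities[best])

library = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = ["cat", "dog"]
```

For instance, a query feature close to the first library entry is assigned that entry's class label, together with the similarity score that justified the match.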
12. A classification model training apparatus, characterized in that the apparatus comprises:
the training sample set acquisition module is used for acquiring a training sample set; the training sample set comprises training samples and training labels corresponding to the training samples;
the reference classification model determining module is used for determining at least two reference classification models from a first target classification model and a second target classification model set which are obtained through training based on the training sample set; in a second target classification model included in the second target classification model set, class center features respectively corresponding to various training labels are obtained from the first target classification model, and the class center features are used for representing position information of the training labels in a feature space;
the characteristic fusion module is used for inputting the training sample set into each reference classification model to obtain reference sample characteristics corresponding to each training sample, and fusing the reference sample characteristics corresponding to the same training sample to obtain target sample characteristics corresponding to each training sample;
and the third target classification model determining module is used for inputting the training sample set into a third initial classification model to obtain training sample characteristics corresponding to each training sample, and adjusting the model parameters of the third initial classification model based on the target sample characteristics and the training sample characteristics corresponding to the same training sample until a target convergence condition is met to obtain a third target classification model.
13. An apparatus for classifying data, the apparatus comprising:
the data acquisition module is used for acquiring data to be classified and inputting the data to be classified into a third target classification model to obtain target characteristics corresponding to the data to be classified;
the classification result determining module is used for determining a classification result corresponding to the data to be classified based on the target characteristics;
the training process of the third target classification model is as follows:
acquiring a training sample set; the training sample set comprises training samples and training labels corresponding to the training samples;
determining at least two reference classification models from a first target classification model and a second target classification model set obtained by training based on the training sample set; in a second target classification model included in the second target classification model set, class center features respectively corresponding to various training labels are obtained from the first target classification model, and the class center features are used for representing position information of the training labels in a feature space;
inputting the training sample set into each reference classification model to obtain reference sample characteristics corresponding to each training sample, and fusing the reference sample characteristics corresponding to the same training sample to obtain target sample characteristics corresponding to each training sample;
and inputting the training sample set into a third initial classification model to obtain training sample characteristics corresponding to each training sample, and adjusting model parameters of the third initial classification model based on the target sample characteristics and the training sample characteristics corresponding to the same training sample until a target convergence condition is met to obtain a third target classification model.
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 9 or 10 to 11.
15. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 9 or 10 to 11.
16. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 9 or 10 to 11.
CN202210421701.7A 2022-04-21 2022-04-21 Classification model training method, data classification device and computer equipment Pending CN115130539A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210421701.7A CN115130539A (en) 2022-04-21 2022-04-21 Classification model training method, data classification device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210421701.7A CN115130539A (en) 2022-04-21 2022-04-21 Classification model training method, data classification device and computer equipment

Publications (1)

Publication Number Publication Date
CN115130539A true CN115130539A (en) 2022-09-30

Family

ID=83377007

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210421701.7A Pending CN115130539A (en) 2022-04-21 2022-04-21 Classification model training method, data classification device and computer equipment

Country Status (1)

Country Link
CN (1) CN115130539A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805185A (en) * 2018-05-29 2018-11-13 腾讯科技(深圳)有限公司 Training method, device, storage medium and the computer equipment of model
US20190318202A1 (en) * 2016-10-31 2019-10-17 Tencent Technology (Shenzhen) Company Limited Machine learning model training method and apparatus, server, and storage medium
CN111091166A (en) * 2020-03-25 2020-05-01 腾讯科技(深圳)有限公司 Image processing model training method, image processing device, and storage medium
CN112329916A (en) * 2020-10-27 2021-02-05 上海眼控科技股份有限公司 Model training method and device, computer equipment and storage medium
US20210133457A1 (en) * 2018-11-28 2021-05-06 Beijing Dajia Internet Information Technology Co., Ltd. Method, computer device, and storage medium for video action classification
US20210166119A1 (en) * 2019-12-03 2021-06-03 Fujitsu Limited Information processing apparatus and information processing method
WO2021139309A1 (en) * 2020-07-31 2021-07-15 平安科技(深圳)有限公司 Method, apparatus and device for training facial recognition model, and storage medium
CN113449700A (en) * 2021-08-30 2021-09-28 腾讯科技(深圳)有限公司 Training of video classification model, video classification method, device, equipment and medium
WO2021244521A1 (en) * 2020-06-04 2021-12-09 广州虎牙科技有限公司 Object classification model training method and apparatus, electronic device, and storage medium
US20210390428A1 (en) * 2020-06-11 2021-12-16 Beijing Baidu Netcom Science And Technology Co., Ltd. Method, apparatus, device and storage medium for training model
US20210390695A1 (en) * 2019-06-28 2021-12-16 Tencent Technology (Shenzhen) Company Limited Image classification method, apparatus, and device, storage medium, and medical electronic device
WO2021248859A1 (en) * 2020-06-11 2021-12-16 中国科学院深圳先进技术研究院 Video classification method and apparatus, and device, and computer readable storage medium
CN114330499A (en) * 2021-11-30 2022-04-12 腾讯科技(深圳)有限公司 Method, device, equipment, storage medium and program product for training classification model


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
OUYANG Ruiqi; YONG Yang; WANG Bingxue: "Application of Convolutional Neural Networks in Aircraft Type Recognition", Ordnance Industry Automation, no. 12, pages 81 - 85 *

Similar Documents

Publication Publication Date Title
CN111523621B (en) Image recognition method and device, computer equipment and storage medium
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN112819023A (en) Sample set acquisition method and device, computer equipment and storage medium
CN114283350B (en) Visual model training and video processing method, device, equipment and storage medium
CN114298122B (en) Data classification method, apparatus, device, storage medium and computer program product
CN113033507B (en) Scene recognition method and device, computer equipment and storage medium
CN113822315A (en) Attribute graph processing method and device, electronic equipment and readable storage medium
CN116664719A (en) Image redrawing model training method, image redrawing method and device
CN114358109A (en) Feature extraction model training method, feature extraction model training device, sample retrieval method, sample retrieval device and computer equipment
CN113641797A (en) Data processing method, device, equipment, storage medium and computer program product
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN113254687B (en) Image retrieval and image quantification model training method, device and storage medium
CN114492601A (en) Resource classification model training method and device, electronic equipment and storage medium
CN114329065A (en) Processing method of video label prediction model, video label prediction method and device
CN115130539A (en) Classification model training method, data classification device and computer equipment
CN114328904A (en) Content processing method, content processing device, computer equipment and storage medium
CN113822291A (en) Image processing method, device, equipment and storage medium
CN116645700B (en) Feature extraction model processing method and device and feature extraction method and device
CN113269176B (en) Image processing model training method, image processing device and computer equipment
CN116895091B (en) Facial recognition method and device for incomplete image, chip and terminal
CN114936327B (en) Element recognition model acquisition method and device, computer equipment and storage medium
CN117938951B (en) Information pushing method, device, computer equipment and storage medium
CN117237856B (en) Image recognition method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination