WO2022100045A1 - Training method for a classification model, method and apparatus for sample classification, and device

Training method for a classification model, method and apparatus for sample classification, and device

Info

Publication number
WO2022100045A1
WO2022100045A1 (PCT/CN2021/094064, CN2021094064W)
Authority
WO
WIPO (PCT)
Prior art keywords
sample
samples
classification model
model
classification
Prior art date
Application number
PCT/CN2021/094064
Other languages
English (en)
Chinese (zh)
Inventor
何烩烩
王乐义
刘明浩
郭江亮
Original Assignee
北京百度网讯科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Priority to US17/619,533 (US20220383190A1)
Priority to EP21819712.7A (EP4027268A4)
Publication of WO2022100045A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Definitions

  • the present disclosure relates to the field of machine learning technologies, and in particular to active learning, neural network and natural language processing technologies. More specifically, the present disclosure provides a method and apparatus for training a classification model, a method and apparatus for classifying samples, an electronic device, and a storage medium.
  • machine learning models require a large amount of labeled data, which is also a problem often encountered in practical application scenarios of machine learning models.
  • the model requires a large number of labeled samples, but the cost of labeling is relatively high.
  • provided are a method and device for training a classification model, a method and device for classifying samples, an electronic device, and a storage medium.
  • a method for training a classification model is provided, where the classification model includes a first classification model and a second classification model, and the method includes: selecting, from an original sample set and according to the category prediction results of a plurality of original samples in the original sample set, a plurality of original samples whose category prediction results meet preset conditions as samples to be labeled; labeling the categories of the samples to be labeled using the second classification model to obtain a first labeled sample set; and training the first classification model using the first labeled sample set.
  • a method for classifying samples is provided, including: obtaining samples to be classified; and classifying the samples to be classified using a classification model to obtain a classification result of the samples to be classified, wherein the classification model is trained according to the above training method for a classification model.
  • a training device for a classification model is provided, where the classification model includes a first classification model and a second classification model, and the device includes: a selection module, configured to select, from an original sample set and according to the category prediction results of a plurality of original samples in the original sample set, a plurality of original samples whose category prediction results meet preset conditions as samples to be labeled; a labeling module, configured to label the categories of the samples to be labeled using the second classification model to obtain a first labeled sample set; and a training module, configured to train the first classification model using the first labeled sample set.
  • a sample classification device is provided, comprising: an acquisition module for acquiring samples to be classified; and a classification module for classifying the samples to be classified using a classification model to obtain a classification result of the samples to be classified, wherein the classification model is trained according to the above training method for a classification model.
  • an electronic device is provided, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the training method of the classification model.
  • a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a training method for a classification model provided according to the present disclosure.
  • FIG. 1 is an exemplary system architecture to which a training method and apparatus for a classification model and a sample classification method and apparatus can be applied, according to an embodiment of the present disclosure;
  • FIG. 2 is a flowchart of a training method of a classification model according to an embodiment of the present disclosure
  • FIG. 3 is a flowchart of a method for classifying a plurality of samples in a sample set using a first classification model according to an embodiment of the present disclosure
  • FIG. 4 is a flowchart of a method for selecting a plurality of samples from a sample set whose category prediction results meet preset conditions as samples to be labeled according to an embodiment of the present disclosure
  • FIG. 5 is a flowchart of a method for training a first classification model using a first labeled sample set according to an embodiment of the present disclosure
  • FIG. 6 is a flowchart of a method for training a second classification model using a second labeled sample set according to an embodiment of the present disclosure
  • FIG. 7 is a schematic diagram of a training method of a second classification model according to an embodiment of the present disclosure.
  • FIG. 8 is a flowchart of a sample classification method according to an embodiment of the present disclosure.
  • FIG. 9 is a block diagram of a training apparatus for a classification model according to an embodiment of the present disclosure.
  • FIG. 10 is a block diagram of a sample classification apparatus according to an embodiment of the present disclosure.
  • FIG. 11 is a block diagram of an electronic device for a training method of a classification model and/or a sample classification method according to an embodiment of the present disclosure.
  • IM auditing is essential to improve the health rate of IM channels and reduce the risk of being closed due to improper usage of IM channels.
  • a common approach to instant message review is keyword matching, supplemented by a large amount of manual review. Specifically, instant messages are matched and filtered based on a keyword database and regular expressions, and samples are then reviewed manually.
  • the above methods rely heavily on the keyword library, and the labor cost is high.
  • with the method based on keywords and manual review, as the business continuously develops, the keyword database must be manually summarized and supplemented, which involves a large workload.
  • the keyword matching method only utilizes surface lexical features and does not fully mine semantic information and semantic relationships, which is a significant limitation.
  • the machine learning approach to instant message auditing treats instant message auditing as a binary text classification task and performs text auditing based on a machine learning algorithm.
  • machine learning models require large amounts of labeled data. This is also a problem often encountered in practical application scenarios of machine learning models.
  • the model requires a large number of labeled samples, but the cost of labeling is relatively high. Sometimes, the newly added annotated samples do not significantly help the performance of the model. Therefore, the method of viewing instant message auditing as a binary classification task requires a large amount of manually labeled data.
  • the instant messaging service generates massive instant message logs every day, and samples to be labeled that are randomly selected from them may not improve the model even after a certain labeling cost has been spent.
  • the model can actively select samples that are valuable for the current training iteration and submit them to experts for labeling, so that such valuable samples can be found.
  • this method still has the following shortcomings: (1) The newly selected samples to be labeled need to be labeled by experts, which introduces additional labor costs. (2) To improve the efficiency of manual annotation, an online annotation platform with automatic model iteration must be developed, which adds development workload. (3) Due to the limited efficiency of manual annotation, the number of newly added labeled samples is limited, which limits the performance gains of iterative model optimization.
  • the present disclosure provides a method for training a classification model
  • the classification model includes a first classification model and a second classification model
  • the method includes: generating a category prediction result for each sample in the process of classifying a plurality of samples in a sample set using the first classification model; selecting, from the sample set, a plurality of samples whose category prediction results meet preset conditions as samples to be labeled; labeling the categories of the samples to be labeled using the second classification model to obtain a first labeled sample set; and training the first classification model using the first labeled sample set.
  • FIG. 1 is an exemplary system architecture 100 to which a training method and apparatus of a classification model and a sample classification method and apparatus may be applied, according to one embodiment of the present disclosure. It should be noted that FIG. 1 is only an example of a system architecture to which the embodiments of the present disclosure can be applied, so as to help those skilled in the art understand the technical content of the present disclosure, but it does not mean that the embodiments of the present disclosure cannot be applied to other devices, systems, environments, or scenarios.
  • the system architecture 100 may include an instant message production system 110 , a database 120 , an automatic tagging system 130 , an online review system 140 and an active learning system 150 .
  • the instant message production system 110 may store the daily generated instant messages in the database 120 .
  • the automatic tagging system 130 and the online review system 140 obtain the instant message text from the database 120 for processing.
  • the automatic tagging system 130 may include an instant message moderation model capable of automatically tagging categories of instant message text; the model may include an instant messaging domain-specific language sub-model trained to be sensitive to instant message terminology, and a classification sub-model that automatically classifies the texts.
  • the categories of instant message texts can include pass or fail.
  • the instant message production system 110 can generate massive instant message logs every day, and the models in the automatic labeling system 130 need to be constantly supplemented with the latest labeling data, so that the models can be optimized with business iterations.
  • for example, the model in the automatic tagging system 130 can be set to iterate once a month. The automatic tagging system 130 can then select the unlabeled historical instant message samples of the previous month from the database 120, input them into the language sub-model, and incrementally train the language sub-model on that data to obtain the instant messaging domain-specific language sub-model. Before the samples are input into the language sub-model, the sample set can be reduced by deleting some samples with high similarity, which reduces sample redundancy and speeds up model iteration.
  • a small number of manually labeled samples can be used to train the language sub-model and the classification sub-model together to obtain an instant message review model based on the language model.
  • since the language sub-model is an instant message domain-specific model, it can capture the text semantics of instant messages; the classification sub-model can then classify instant messages based on these semantics and output the category of each instant message as approved or not, so that instant message auditing can be realized.
  • model adjustment is performed based on the sum of the loss functions of the two sub-models.
  • the model finally trained can obtain excellent performance on the instant message auditing task, so that the automatic labeling system 130 can output instant message text with category labels, which may be referred to as weakly labelled text herein.
  • the automatic labeling system 130 has a high accuracy rate for labeling the instant message text category, and can replace the manual generation of a large amount of weakly labelled data, thereby reducing labeling costs.
  • Model distillation refers to using a complex model (also called a teacher model, Teacher Model) to train a simple model (also called a student model, Student Model), letting the student model learn the predictive ability of the teacher model.
  • the instant message review model based on the language model is used as the teacher model, and then the convolutional neural network TextCNN can be selected as the online student model.
  • TextCNN is selected here because TextCNN has the characteristic of weight sharing and relatively few parameters.
  • the teacher model interacts with the student model based on the data pool, so that the student model learns the prediction results of the teacher model.
  • the student model resides in the online review system 140 .
  • Model distillation is divided into two stages: one is to input a large number of unlabeled historical instant message logs into the teacher model, output prediction results, and use the prediction results as weakly labeled data. The second is to input the weakly labeled data together with the manually labeled data into the student model of the online review system 140, and perform incremental training on the student model.
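As an illustration of the two-stage distillation flow just described, the following Python sketch assumes generic `teacher` and `student` objects with scikit-learn-style `predict_proba`/`fit` interfaces; all function and variable names are illustrative and not taken from the disclosure.

```python
import numpy as np

def distill(teacher, student, unlabeled_texts, manual_texts, manual_labels):
    """Two-stage distillation sketch.

    Stage 1: the teacher model predicts categories for unlabeled instant
    message logs, and the predictions are kept as weakly labeled data.
    Stage 2: the weakly labeled data is merged with the small manually
    labeled set and used to incrementally train the online student model.
    """
    # Stage 1: generate weak labels with the teacher model.
    teacher_probs = teacher.predict_proba(unlabeled_texts)  # shape (N, num_classes)
    weak_labels = teacher_probs.argmax(axis=1)

    # Stage 2: combine weak labels with manual labels and train the student.
    train_texts = list(unlabeled_texts) + list(manual_texts)
    train_labels = np.concatenate([weak_labels, np.asarray(manual_labels)])
    student.fit(train_texts, train_labels)
    return student
```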
  • the weakly labeled data obtained based on the teacher model is introduced, which increases the amount of training corpus data of the online model and enhances the generalization ability of the online student model.
  • the classification performance of the online student model is largely close to the teacher model, and the online model has a simple structure, and the model itself can meet the business requirements for prediction time.
  • samples with a high amount of information should be selected as much as possible for training the student model.
  • based on the per-category probabilities generated by the student model during text classification, the active learning system 150 can use an active learning algorithm to actively select informative unlabeled samples to be labeled by experts or by the automatic labeling system 130, so that a higher classification accuracy can be obtained even with a small training set.
  • the active learning algorithm may include a maximum entropy strategy, a minimum confidence strategy, a minimum interval strategy, a classifier voting strategy, and the like.
  • the samples selected by the active learning system 150 through the active learning algorithm can be input to the automatic labeling system 130 for automatic labeling to generate weak label data, and the weak label data and the manual label data are then input to the online review system 140 together to train the student model and obtain the trained instant message moderation model.
  • the online auditing system 140 can use the trained instant message auditing model to audit the newly added instant message text when conducting instant message auditing online, and quickly audit whether the instant message is passed.
  • the instant message auditing model in the online auditing system 140 also needs to be supplemented with the latest annotation data continuously, so that the model can be optimized with business iterations.
  • the instant message auditing model in the online auditing system 140 may be set to iterate once a day; the online auditing system 140 may select the unlabeled instant message samples of the previous day from the database 120 and send them to the online instant message auditing model for category labeling. In this process, a predicted probability can be generated for each instant message. Based on the predicted probabilities, an active learning algorithm can be used to select the instant message samples whose categories are hard to predict; after the samples are reduced again, the final set of data to be labeled selected by active learning is formed.
  • the data to be labelled selected by the active learning is sent to the instant message review model based on the language model in the automatic labeling system 130, and the prediction result of the model is regarded as the weak labeling result of the unlabeled data. Then, a large amount of new weakly labeled data and a small amount of existing manually labeled data are combined into the training corpus of the day, and the online instant message review model is incrementally trained.
  • FIG. 2 is a flowchart of a training method of a classification model according to an embodiment of the present disclosure.
  • the training method 200 of the classification model may include operations S210 to S230.
  • the classification model may be a model used for class prediction and classification of samples such as text and images.
  • the classification model may be a model for predicting and classifying instant message texts.
  • the classification model may include a first classification model and a second classification model, wherein the first classification model may be obtained by performing model distillation on the second classification model.
  • model distillation refers to transferring the knowledge learned by a complex model with strong learning ability to a simple model.
  • the complex model can be called the teacher model
  • the simple model can be called the student model.
  • the second classification model may be a combination of multiple neural networks: after being trained on different training tasks, classification training is performed on the same data set, finally obtaining a complex model that outputs classification results.
  • the second classification model has excellent performance and high prediction accuracy, but the structure is too complex and the prediction time is too long. It is difficult to meet the requirements of the review time when it is used as an online instant message review model. Therefore, the present disclosure selects the first classification model as the online instant message review model.
  • the first classification model may be a TextCNN model, which has relatively few parameters and a fast iteration speed, so that in practical applications online review of instant messages can meet the review time requirements.
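The disclosure does not give a concrete network configuration; as an illustration only, a compact TextCNN student model could be sketched in PyTorch as follows, with all hyperparameters (embedding size, kernel sizes, filter counts) being assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Compact TextCNN text classifier: embedding -> parallel 1-D convolutions
    with different kernel sizes -> max-over-time pooling -> fully connected
    output layer producing pass/fail logits."""
    def __init__(self, vocab_size, embed_dim=128, num_classes=2,
                 kernel_sizes=(2, 3, 4), num_filters=64, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                        # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)    # (batch, embed_dim, seq_len)
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(features)                         # (batch, num_classes) logits
```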
  • the second classification model is used as the teacher model, the first classification model is used as the student model, and the second classification model is used to train the first classification model, so that the first classification model can learn the prediction ability of the second classification model.
  • a plurality of original samples whose category prediction results meet the preset conditions are selected from the original sample set as samples to be labeled.
  • the category prediction results of the multiple original samples are generated in the process of classifying the multiple original samples in the original sample set by using the first classification model.
  • the samples may be regularly sampled every day from the logs of the instant message production system.
  • the samples to be labeled that carry a large amount of information can be selected to be labeled by experts or by the second classification model with high prediction accuracy, so that a higher classification accuracy can be obtained even with a small training set.
  • a class prediction result of each sample is generated, and the class prediction result may be the predicted probability that the sample belongs to each class.
  • the first classification model performs binary classification on the samples, and can generate the probability that a sample belongs to category A and the probability that the sample belongs to category B. It can be understood that the greater the amount of information of a sample, the higher the uncertainty of the sample, and the more difficult it is to predict its category. Therefore, selecting samples with a large amount of information amounts to selecting samples whose categories are difficult to predict.
  • a sample of a difficult-to-predict category can be a sample whose probability of belonging to category A and the probability of belonging to category B are both 0.5.
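As a worked illustration of why such samples are the hardest to predict, the binary information entropy is maximal at (0.5, 0.5) and much lower for a confident prediction such as (0.9, 0.1):

```latex
H(x) = -\sum_{c} p(c \mid x)\,\log p(c \mid x)
\quad\Rightarrow\quad
H_{(0.5,\,0.5)} = \log 2 \approx 0.693,
\qquad
H_{(0.9,\,0.1)} \approx 0.325 .
```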
  • the samples to be labeled are labeled by experts or a second classification model with strong performance.
  • the categories of the samples to be labeled are labeled using the second classification model to obtain a first labeled sample set.
  • since the second classification model has excellent performance and strong prediction ability, the second classification model is used to predict and classify the selected hard-to-predict data to be labeled, and the obtained classification results have a high accuracy rate.
  • for example, after a sample is input into the second classification model for processing, if the probability of the sample belonging to category A is 0.8 and the probability of belonging to category B is 0.2, it can be determined that the sample belongs to category A; the category of the sample is then marked as A, and a sample with a category label is obtained.
  • Use the second classification model to perform class prediction and class labeling on each hard-to-predict sample to obtain a first labeled sample set with class labels.
  • the first classification model is trained using the first labeled sample set.
  • for example, sample X is input into the first classification model, and the first classification model outputs a category prediction result for sample X.
  • the loss function of the first classification model is determined according to the error between the category prediction result output by the first classification model and the category label of sample X marked by the second classification model; the model parameters of the first classification model are adjusted according to the loss function to obtain an updated first classification model.
  • the above training process is repeated using the updated first classification model until the loss function converges, and the trained first classification model is obtained.
  • in the embodiments of the present disclosure, the first classification model is trained using the first labeled sample set processed by the second classification model, so that the first classification model learns the prediction ability of the second classification model and its classification performance constantly approaches that of the second classification model, which can improve the accuracy of the classification results of the first classification model.
  • the first labeled sample set obtained based on the second classification model is introduced, which can increase the amount of training corpus data of the first classification model and enhance the generalization ability of the first classification model.
  • the classification performance of the first classification model is largely close to that of the second classification model, and the structure of the first classification model is simple, and the model itself can meet the business requirements for prediction time.
  • the first classification model may also be trained using both the manually labeled sample set and the first labeled sample set, allowing the first classification model to be trained incrementally.
  • repeatedly adding the manually annotated instant message review corpus can enhance the stability of the online model during iteration. This is because, if only the latest corpus labeled by the second classification model is introduced during the daily incremental iteration of the online instant message review model, the uncertainty of the model gradually increases, and the uncertainty of the model's prediction performance also increases. Therefore, using the manually labeled sample set together with the first labeled sample set to train the first classification model can enhance the stability of the online model during iteration.
  • a plurality of samples whose category prediction results meet the preset conditions are selected from the sample set
  • use the second classification model to label the categories of the samples to be labeled to obtain a first labeled sample set
  • use the first labeled sample set to train the first classification model. Since the labeled samples obtained by the second classification model are used to train the first classification model, it is possible to reduce manual labeling data and reduce labeling costs.
  • training the first classification model using the sample set labeled by the second classification model can make the prediction capability of the first classification model approach that of the second classification model and improve the classification accuracy of the first classification model. Furthermore, during the iteration process of the first classification model, samples with a large amount of information are actively selected to be labeled by the second classification model, so that a higher classification accuracy rate can be obtained even when the training set is small.
  • FIG. 3 is a flowchart of a method for classifying a plurality of samples in a sample set using a first classification model according to one embodiment of the present disclosure.
  • the method may include operations S311 to S312.
  • the first classification model is used to calculate the probability that each original sample belongs to each category in the original sample set, as a category prediction result of each original sample.
  • the predicted probability of each class of each sample is generated.
  • the first classification model performs category prediction on the samples, and can generate the probability that the sample belongs to category A and/or the probability that the sample belongs to category B.
  • the class of the sample is then determined to be A or B based on the probability that the sample belongs to class A and/or the probability that it belongs to class B.
  • the first classification model predicts the category of sample Y, and the probability that sample Y belongs to category A is 0.9, and the probability of belonging to category B is 0.1, then it can be determined that the category of sample Y is A, and the category of sample Y can be marked as A.
  • category A may be pass and category B may be fail. If sample Y is processed by the first classification model and marked as A, then sample Y is approved.
  • FIG. 4 is a flowchart of a method for selecting, from a sample set, multiple samples whose category prediction results meet preset conditions as samples to be labeled, according to an embodiment of the present disclosure.
  • the method may include operations S411 to S413.
  • an uncertainty of each raw sample is calculated based on the probability that each raw sample belongs to each of the plurality of classes.
  • an active learning strategy can be used to actively select unlabeled samples with a large amount of information to be labeled by the second classification model. Since the greater the amount of information of a sample, the higher its uncertainty for the classification model, valuable samples can be selected by calculating the uncertainty of each sample.
  • the uncertainty of a sample can be calculated using active learning strategies such as the maximum information entropy strategy, the minimum confidence strategy, the minimum interval strategy, and the classifier voting strategy.
  • a class prediction result of each sample is generated, and the class prediction result may be the predicted probability that the sample belongs to each class. Based on the predicted probability that the sample belongs to each category, at least one of the maximum information entropy, the minimum confidence, the minimum interval between different categories, and the voting result of the classifier can be calculated to measure the uncertainty of the sample.
  • the confidence level of a sample refers to the degree of certainty that the sample belongs to a certain category. For example, if the probabilities that a sample belongs to category A and category B are (0.55, 0.45), the probability that the sample belongs to category A is the largest, but the degree of certainty that the sample belongs to category A is only 0.55. Therefore, the minimum confidence strategy focuses on samples whose highest per-category probability is still small; such samples are difficult to distinguish. Likewise, the smaller the interval between the probabilities of a sample belonging to different categories, the more easily the sample can be classified into either category, and the higher its uncertainty.
  • for example, if the interval between the probability of category A and the probability of category B is small, the sample can easily be judged as either category.
  • the classification model can include multiple classifiers, and a sample is submitted to the multiple classifiers at the same time; based on classifier voting, the more inconsistent the voting results of the classifiers are, the higher the uncertainty of the sample.
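A minimal NumPy sketch of the four uncertainty measures named above (maximum information entropy, minimum confidence, minimum interval/margin, and classifier vote disagreement); the function names and the example numbers are illustrative assumptions, not part of the disclosure.

```python
import numpy as np

def entropy_uncertainty(probs):
    """Maximum information entropy: higher entropy means more uncertain."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def least_confidence(probs):
    """Minimum confidence: 1 - probability of the most likely category."""
    return 1.0 - probs.max(axis=1)

def margin_uncertainty(probs):
    """Minimum interval: a small gap between the two top categories means uncertain."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return 1.0 - (top2[:, 1] - top2[:, 0])

def vote_disagreement(votes):
    """Classifier voting: fraction of classifiers disagreeing with the majority vote."""
    votes = np.asarray(votes)                    # shape (num_classifiers, num_samples)
    majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
    return (votes != majority).mean(axis=0)

# Example: probabilities from the first classification model for three samples.
probs = np.array([[0.5, 0.5], [0.9, 0.1], [0.55, 0.45]])
print(entropy_uncertainty(probs), least_confidence(probs), margin_uncertainty(probs))
```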
  • each original sample is scored according to the uncertainty of each original sample.
  • an original sample whose score is greater than a preset threshold is selected from the original sample set as a sample to be labeled.
  • the uncertainty of the sample is calculated for each strategy, and the uncertainty of the sample can be scored.
  • a plurality of intervals of increasing uncertainty may be divided according to the size of the uncertainty, and each interval corresponds to a score: the higher the uncertainty, the higher the score.
  • for example, when the uncertainty of a sample belongs to interval 1, the sample score is 1; when the uncertainty of the sample belongs to interval 2, the sample score is 2; and so on, so that each sample is scored.
  • different uncertainty intervals can be set for different strategies, and after scoring each strategy, the scores under each strategy can be weighted to obtain a final score. Then, samples with a score higher than a certain value (eg, a score higher than 3) can be selected as samples to be labeled.
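A small sketch of the interval-based scoring and weighted combination described above, assuming per-strategy uncertainty values have already been computed (for example with the functions sketched earlier); the interval boundaries, weights, and selection threshold are illustrative assumptions.

```python
import numpy as np

def score_by_intervals(uncertainty, boundaries=(0.2, 0.4, 0.6, 0.8)):
    """Map an uncertainty value to an integer score: interval 1 -> score 1,
    interval 2 -> score 2, and so on (higher uncertainty, higher score)."""
    return np.digitize(uncertainty, boundaries) + 1

def select_samples_to_label(strategy_uncertainties, weights, threshold=3.0):
    """Weight the per-strategy scores and keep samples scoring above the threshold."""
    scores = sum(w * score_by_intervals(u)
                 for u, w in zip(strategy_uncertainties, weights))
    return np.where(scores > threshold)[0]       # indices of samples to be labeled

# Example: two strategies (entropy and margin) over five samples.
entropy_u = np.array([0.69, 0.10, 0.55, 0.30, 0.65])
margin_u  = np.array([0.95, 0.15, 0.80, 0.40, 0.90])
print(select_samples_to_label([entropy_u, margin_u], weights=[0.5, 0.5]))
```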
  • before selecting the samples to be labeled, or after selecting the samples to be labeled and before sending them to the second classification model for labeling, the samples can be further screened based on their representativeness.
  • the samples in the sample set can be reduced based on the similarity between the samples, that is, some samples with high similarity are deleted, so that the selected samples have a certain representativeness.
  • a strategy of deleting multiple samples whose similarity is higher than a certain threshold and retaining only one of them may be adopted. It is also possible to set a target number of samples and delete the most similar samples until the preset number is reached; for example, set the system to retain 10,000 samples and delete from 15,000 samples until 10,000 remain. Reducing the samples can reduce sample redundancy and improve the model iteration speed.
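The disclosure does not specify a similarity measure; as one possible sketch, TF-IDF cosine similarity with a greedy keep/drop rule can perform the described reduction, retaining a single representative among near-duplicate samples.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def reduce_samples(texts, threshold=0.9):
    """Greedily keep a text only if its cosine similarity to every already-kept
    text is below `threshold`; among near-duplicates, one representative remains."""
    vectors = TfidfVectorizer().fit_transform(texts)
    sim = cosine_similarity(vectors)
    kept = []
    for i in range(len(texts)):
        if all(sim[i, j] < threshold for j in kept):
            kept.append(i)
    return [texts[i] for i in kept]

samples = ["please contact me today", "please contact me today!", "meeting at noon"]
print(reduce_samples(samples))   # the near-duplicate second text is dropped
```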
  • FIG. 5 is a flowchart of a method for training a first classification model using a first labeled sample set according to an embodiment of the present disclosure.
  • the method may include operations S531 to S534.
  • a first classification model is used to classify a plurality of first samples, and a class of any first sample is determined.
  • a loss function of the first classification model is determined based on the determined class and the class label of the first sample.
  • in operation S533, it is judged whether the loss function of the first classification model has converged; if not, operation S534 is performed, and if it has converged, the training is ended.
  • in operation S534, the model parameters of the first classification model are adjusted to obtain an updated first classification model, so that the above-mentioned training process is repeated using the updated first classification model until the loss function converges.
  • for example, sample X is input into the first classification model, and the first classification model outputs a category prediction result for sample X.
  • the loss function of the first classification model is determined according to the error between the category prediction result output by the first classification model and the category label of sample X marked by the second classification model; the model parameters of the first classification model are adjusted according to the loss function to obtain an updated first classification model.
  • the above training process is repeated using the updated first classification model until the loss function converges, and the trained first classification model is obtained.
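A hedged PyTorch sketch of operations S531 to S534: the first classification model is trained on samples carrying the category labels produced by the second classification model, the parameters are updated from the loss, and training stops once the loss stops improving. The optimizer, loss, and convergence test are assumptions made for illustration.

```python
import torch
import torch.nn as nn

def train_first_classification_model(model, dataloader, epochs=10, tol=1e-4):
    """Train the first classification model on the first labeled sample set
    (inputs plus category labels assigned by the second classification model)."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    prev_loss = float("inf")
    for epoch in range(epochs):
        epoch_loss = 0.0
        for token_ids, labels in dataloader:        # labels from the second model
            logits = model(token_ids)               # S531: classify the samples
            loss = criterion(logits, labels)        # S532: loss vs. category label
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                        # S534: adjust model parameters
            epoch_loss += loss.item()
        if abs(prev_loss - epoch_loss) < tol:       # S533: crude convergence check
            break
        prev_loss = epoch_loss
    return model
```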
  • FIG. 6 is a flowchart of a method for training a second classification model using a second set of labeled samples according to one embodiment of the present disclosure.
  • the second labeled sample set may be a manually labeled sample set, and each sample in the sample set may include manually labeled category information.
  • the method may include operations S611 to S616.
  • semantic analysis is performed on the plurality of second samples by using the language sub-model to obtain a semantic vector of each second sample and a semantic relationship between the plurality of second samples.
  • the second classification model includes a language sub-model and a classification sub-model. For the samples in the second labeled sample set, the samples are input into the language sub-model, and the language sub-model can output a semantic vector of each sample as well as the semantic relationship between the samples. For example, for sample 1 and sample 2, the language sub-model can output the semantic vector of sample 1, the semantic vector of sample 2, and whether sample 2 is the next sentence of sample 1.
  • an associated second sample of any second sample is determined according to the semantic relationship among the plurality of second samples.
  • for example, if the vector representation of sample 1 is output and the next sentence of sample 1 is determined to be sample 2, then the vector representation of sample 1 and the vector representation of sample 2 are input into the classification sub-model for processing.
  • the classification sub-model is used to determine the category of any second sample according to the semantic vector of any second sample and the semantic vector associated with the second sample.
  • sample 1 and its associated sample 2 are input into the second classification model, and the weight of sample 1 may be higher than that of sample 2. This is because a sample may not have explicit semantics, and the semantics of such a sample can be inferred from the semantics of its context sample.
  • the second classification model can output the category prediction result of sample 1 based on the semantics of sample 1 and sample 2.
  • the loss function of the second classification model includes the loss of the language sub-model and the loss of the classification sub-model.
  • the loss of the language sub-model includes the loss between the sample vector output by the language sub-model and the actual sample vector, and also includes the loss between the context relationship between sample 1 and sample 2 output by the language sub-model and the actual relationship between sample 1 and sample 2.
  • the loss of the classification sub-model is determined according to the error between the class prediction result of sample 1 output by the second classification model and the manually labeled class label of sample 1. The sum of the loss of the language submodel and the loss of the classification submodel can be used as the loss of the second classification model.
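A sketch of this combined loss, where the language sub-model loss (masked-token prediction plus next-sentence prediction) and the classification sub-model loss are simply summed; the tensor names are illustrative.

```python
import torch.nn as nn

mlm_criterion = nn.CrossEntropyLoss()   # masked-token (cloze) prediction loss
nsp_criterion = nn.CrossEntropyLoss()   # next-sentence ("is sample 2 next?") loss
cls_criterion = nn.CrossEntropyLoss()   # category (pass / fail) prediction loss

def second_model_loss(mlm_logits, mlm_targets,
                      nsp_logits, nsp_targets,
                      cls_logits, cls_targets):
    """Loss of the second classification model = language sub-model loss
    (MLM + NSP) + classification sub-model loss."""
    language_loss = (mlm_criterion(mlm_logits, mlm_targets)
                     + nsp_criterion(nsp_logits, nsp_targets))
    classification_loss = cls_criterion(cls_logits, cls_targets)
    return language_loss + classification_loss
```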
  • in operation S615, it is judged whether the loss function of the second classification model has converged; if not, operation S616 is performed, and if it has converged, the training is ended.
  • in operation S616, the model parameters of the language sub-model and the classification sub-model are adjusted based on the loss function of the second classification model to obtain an updated second classification model, so that the above-mentioned training process is repeated using the updated second classification model until the loss function of the second classification model converges, resulting in a trained second classification model.
  • the model parameters of the second classification model are adjusted according to the loss function to obtain an updated second classification model, and the above training process is repeated using the updated second classification model , until the loss function converges, and the trained second classification model is obtained.
  • FIG. 7 is a schematic diagram of a training method of a second classification model according to an embodiment of the present disclosure.
  • the second classification model includes a language sub-model 701 and a classification sub-model 702 .
  • the language sub-model 701 can be an ERNIE (Enhanced Representation from kNowledge IntEgration, knowledge-enhanced semantic representation) model, or a Bert_base model, a Bert_large model, or an ERNIE-tiny model, where ERNIE-tiny is a pre-trained model distilled from ERNIE with a shorter training time and faster prediction speed.
  • the ERNIE model is based on a large amount of unsupervised text corpus, and the trained model contains a large amount of semantic knowledge, and ERNIE can learn the language characteristics of the training corpus.
  • after introducing the ERNIE model, a large amount of unlabeled instant message text is input into the ERNIE language model, and the Masked Language Model (cloze) task and the Next Sentence Prediction task are used as training tasks to perform incremental training on the model, obtaining an instant messaging domain-specific language sub-model that is sensitive to instant message terminology.
  • the cloze task refers to covering or replacing any character or word in a sentence, and then letting the model predict the covered or replaced part.
  • the word "today” in sentence 1 "I work overtime today" is masked, and then input to the model to predict and restore the masked part.
  • next sentence prediction inputs two sentences (such as sentence 1 and sentence 2) into the model. When these two sentences are input, they can be concatenated as [sentence 1] [sentence 2], and the model is asked to predict whether the next sentence of sentence 1 is sentence 2; the model can output yes or no, and the model loss is calculated according to the real context relationship between sentence 1 and sentence 2.
  • for example, the masked pair [I mask overtime] [today late mask] can be input into the model, and the sum of the losses of the two tasks is considered when calculating the loss; for example, the loss function is determined with "today", "go home" and "yes" as the target model outputs, the model parameters are adjusted and training continues, so that the trained language sub-model can output the semantic representation and semantic relationships of sentences.
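A toy sketch of how one training example for the two language-model tasks could be constructed: one token is masked in each sentence (cloze targets) and the pair is tagged with whether sentence 2 really follows sentence 1. The whitespace tokenization and the [MASK]/[SEP] markers are simplifying assumptions and do not reflect the exact ERNIE preprocessing.

```python
import random

MASK = "[MASK]"

def make_mlm_nsp_example(sentence1, sentence2, is_next):
    """Mask one random token in each sentence (cloze targets) and attach the
    next-sentence label, yielding targets for both training tasks."""
    def mask_one(tokens):
        i = random.randrange(len(tokens))
        target = tokens[i]                         # the word the model must recover
        return tokens[:i] + [MASK] + tokens[i + 1:], target

    tokens1, target1 = mask_one(sentence1.split())
    tokens2, target2 = mask_one(sentence2.split())
    return {
        "input": tokens1 + ["[SEP]"] + tokens2,    # concatenated sentence pair
        "mlm_targets": [target1, target2],         # masked words to recover
        "nsp_target": int(is_next),                # 1 = sentence 2 follows sentence 1
    }

example = make_mlm_nsp_example("I work overtime today", "I will go home late", True)
print(example)
```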
  • a classification sub-model 702 is added to the output end of the language sub-model ERNIE, that is, a two-layer fully-connected network can be spliced on the output end of the language sub-model as a classification network. Input a small amount of manually annotated instant message audit data into the network, combine the language sub-model loss function and the classification sub-model loss function, continue to iterate the language sub-model, and obtain an instant message audit model based on language model fine-tuning.
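As an illustration of splicing a two-layer fully connected classification network onto the output of the language sub-model, a PyTorch sketch might look as follows; the hidden sizes and the pooled-output interface are assumptions.

```python
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Two-layer fully connected network spliced onto the pooled output of the
    language sub-model; outputs pass / fail logits for an instant message."""
    def __init__(self, hidden_size=768, intermediate_size=256, num_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, intermediate_size),
            nn.ReLU(),
            nn.Linear(intermediate_size, num_classes),
        )

    def forward(self, pooled_output):            # (batch, hidden_size)
        return self.net(pooled_output)           # (batch, num_classes) logits
```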
  • the classification sub-model 702 can output the instant message classification as pass or fail.
  • FIG. 8 is a flowchart of a sample classification method according to one embodiment of the present disclosure.
  • the sample classification method 800 may include operations S810 to S820.
  • the samples to be classified are classified by using the classification model to obtain a classification result of the samples to be classified.
  • the training method of the classification model above is used for training to obtain a trained classification model, and inputting the samples to be classified into the trained classification model can output the category of the samples.
  • the sample can be instant message text
  • the instant message text is input into the classification model
  • the classification model can output a label indicating that the instant message passes or fails, which enables fast instant message review and saves labor costs.
  • FIG. 9 is a block diagram of a training apparatus of a classification model according to an embodiment of the present disclosure.
  • the training device 900 of the classification model may include a selection module 901 , a labeling module 902 and a first training module 903 .
  • the selection module 901 is configured to select, from the original sample set, a plurality of original samples whose class prediction results meet the preset conditions according to the class prediction results of the plurality of original samples in the original sample set as the samples to be labeled, wherein the class prediction of the plurality of original samples is The results are generated during the process of classifying a plurality of original samples in the original sample set using the first classification model.
  • the labeling module 902 is configured to use the second classification model to label the categories of the samples to be labelled to obtain a first labelled sample set.
  • the first training module 903 is configured to use the first labeled sample set to train the first classification model.
  • the training apparatus 900 of the classification model further includes a calculation module and a classification module.
  • the calculation module is configured to use the first classification model to calculate, for a plurality of preset categories, the probability that each original sample in the original sample set belongs to each category, as the category prediction result of each original sample.
  • the classification module is used for determining the category of each original sample and marking the category of each original sample according to the probability that each original sample belongs to each category in the original sample set.
  • the selection module 901 includes a calculation unit, a scoring unit, and a selection unit.
  • the calculation unit is configured to calculate the uncertainty of each original sample based on the probability that each original sample belongs to each of the plurality of categories.
  • the scoring unit is used to score each raw sample according to the uncertainty of each raw sample.
  • the selection unit is used to select the original samples whose scores are greater than the preset threshold from the original sample set as the samples to be labeled.
  • the calculation unit is specifically configured to calculate, based on the probability that each sample belongs to each of the multiple categories, at least one of the maximum information entropy, the minimum confidence level, the minimum interval between different categories, and the classifier voting result of each sample.
  • the first labeled sample set includes a plurality of first samples with category labels
  • the first training module includes a first prediction unit, a first determination unit, a first judgment unit, and a first adjustment unit.
  • the first prediction unit is configured to use the first classification model to classify a plurality of first samples, and determine the category of any first sample.
  • the first determining unit is configured to, for any first sample in the plurality of first samples, determine the loss function of the first classification model based on the determined class and the class label of the first sample.
  • the first judgment unit is used for judging whether the loss function of the first classification model converges.
  • the first adjustment unit is configured to adjust the model parameters of the first classification model to obtain an updated first classification model when the first judgment unit judges that the loss function does not converge.
  • the training apparatus 900 of the classification model further includes a second training module.
  • the second training module is used for training the second classification model using the second labeled sample set.
  • the second labeled sample set includes a plurality of second samples with category labels
  • the second classification model includes a language sub-model and a classification sub-model.
  • the second training module includes a first processing unit, a second determination unit, a second prediction unit, a third determination unit, a second judgment unit and a second adjustment unit.
  • the first processing unit is configured to use the language sub-model to perform semantic analysis on the plurality of second samples to obtain a semantic vector of each second sample and a semantic relationship between the plurality of second samples.
  • the second determining unit is configured to determine the associated second sample of any second sample according to the semantic relationship among the plurality of second samples.
  • the second prediction unit is configured to use the classification sub-model to determine the category of any second sample according to the semantic vector of any second sample and the semantic vector associated with the second sample.
  • the third determining unit is configured to, for any second sample in the plurality of second samples, determine the loss function of the second classification model based on the semantic vector of the second sample, the semantic relationship between the second sample and its associated second sample, the determined category of the second sample, and the category label of the second sample.
  • the second judgment unit is used for judging whether the loss function of the second classification model converges.
  • the second adjustment unit is configured to, when the second judgment unit judges that the loss function of the second classification model does not converge, adjust the model parameters of the second classification model based on the loss function of the second classification model to obtain an updated second classification model.
  • the second training module is configured to train the first classification model using the first labeled sample set and the second labeled sample set.
  • the training apparatus 900 of the classification model further includes a processing module.
  • the processing module is used to determine the similarity between the multiple samples to be labeled before the labeling module 902 labels the categories of the samples to be labeled, and to delete a part of the multiple samples to be labeled based on the similarity between them.
  • FIG. 10 is a block diagram of a sample classification apparatus according to an embodiment of the present disclosure.
  • the sample classification apparatus 1000 may include an acquisition module 1001 and a classification module 1002 .
  • the obtaining module 1001 is used for obtaining samples to be classified.
  • the classification module 1002 is configured to use the classification model to classify the samples to be classified, and obtain the classification results of the samples to be classified.
  • the acquisition, storage and application of the user's personal information involved are all in compliance with the relevant laws and regulations, and do not violate public order and good customs.
  • the present disclosure also provides an electronic device and a readable storage medium.
  • FIG. 11 is a block diagram of an electronic device for a method of training a classification model according to an embodiment of the present disclosure.
  • Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the electronic device 1100 includes: one or more processors 1101, a memory 1102, and interfaces for connecting various components, including a high-speed interface and a low-speed interface.
  • the various components are interconnected using different buses and may be mounted on a common motherboard or otherwise as desired.
  • the processor may process instructions executed within the electronic device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface.
  • multiple processors and/or multiple buses may be used with multiple memories, if desired.
  • multiple electronic devices may be connected, each providing some of the necessary operations (eg, as a server array, a group of blade servers, or a multiprocessor system).
  • a processor 1101 is used as an example.
  • the memory 1102 is the non-transitory computer-readable storage medium provided by the present disclosure.
  • the memory stores instructions executable by at least one processor, so that the at least one processor executes the training method of the classification model provided by the present disclosure.
  • the non-transitory computer-readable storage medium of the present disclosure stores computer instructions for causing the computer to execute the training method of the classification model provided by the present disclosure.
  • the memory 1102 can be used to store non-transitory software programs, non-transitory computer-executable programs and modules, such as program instructions/modules (for example, the selection module 901, the labeling module 902 and the first training module 903 shown in FIG. 9).
  • the processor 1101 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions and modules stored in the memory 1102, ie, implements the training method of the classification model in the above method embodiments.
  • the memory 1102 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created by the use of the electronic device 1100 according to the training method of the classification model, and the like. Additionally, memory 1102 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1102 may optionally include memory located remotely relative to the processor 1101, and these remote memories may be connected to the electronic device 1100 of the training method of the classification model through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the electronic device 1100 for the training method of the classification model may further include: an input device 1103 and an output device 1104 .
  • the processor 1101 , the memory 1102 , the input device 1103 and the output device 1104 may be connected by a bus or in other ways, and the connection by a bus is taken as an example in FIG. 11 .
  • the input device 1103 may receive input numerical or character information, and generate key signal input related to user settings and function control of the electronic device 1100 for the training method of the classification model; examples include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, a joystick, and other input devices.
  • Output devices 1104 may include display devices, auxiliary lighting devices (eg, LEDs), haptic feedback devices (eg, vibration motors), and the like.
  • the display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
  • Various implementations of the systems and techniques described herein can be implemented in digital electronic circuitry, integrated circuit systems, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor.
  • the programmable processor, which may be a special-purpose or general-purpose programmable processor, may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memories, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals.
  • The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).
  • The systems and techniques described herein may be implemented in a computing system that includes a back-end component (e.g., as a data server), or a computing system that includes a middleware component (e.g., an application server), or a computing system that includes a front-end component (e.g., a user's computer having a graphical user interface or a web browser through which the user may interact with an implementation of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
  • a computer system can include clients and servers.
  • Clients and servers are generally remote from each other and usually interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other.
  • In the technical solutions of the present disclosure, according to the category prediction results of a plurality of samples in the sample set, samples whose category prediction results meet the preset conditions are selected from the sample set as samples to be labeled, the categories of the samples to be labeled are labeled by using the second classification model to obtain a first labeled sample set, and the first labeled sample set is used to train the first classification model. Since the labeled samples produced by the second classification model are used to train the first classification model, manual data labeling can be reduced and labeling costs lowered.
  • In addition, training the first classification model with the sample set labeled by the second classification model allows the prediction capability of the first classification model to approach that of the second classification model, improving the classification accuracy of the first classification model. Furthermore, during the iteration of the first classification model, samples carrying a large amount of information are actively selected for the second classification model to label, so that a high classification accuracy can be achieved even when the training set is small.
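As a rough, illustrative sketch only (not taken from the disclosure): assuming classifier objects with scikit-learn-style fit/predict/predict_proba methods, a NumPy array of feature vectors as the sample set, and predictive entropy as the "preset condition" for picking informative samples, the iterative procedure described above could be arranged along the following lines; all function names and parameters are assumptions introduced here.

```python
# Hypothetical sketch of the active-learning loop described above.
# Assumptions: `first_model` and `second_model` expose scikit-learn-style
# fit / predict / predict_proba methods, and `sample_set` is a NumPy array.
import numpy as np

def select_uncertain(model, samples, k):
    """Pick the k samples with the highest predictive entropy, i.e. those whose
    category prediction results carry the most information."""
    probs = model.predict_proba(samples)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-k:]

def train_first_model(first_model, second_model, sample_set, rounds=5, batch=100):
    """Iteratively label informative samples with the second (teacher) model and
    train the first (student) classification model on the growing labeled set."""
    labeled_x, labeled_y = [], []
    pool = np.arange(len(sample_set))          # indices of still-unlabeled samples
    for _ in range(rounds):
        if len(pool) == 0:
            break
        k = min(batch, len(pool))
        if not labeled_y:
            # Bootstrap: before the first model has been fit, pick a batch at random.
            chosen = np.random.choice(len(pool), size=k, replace=False)
        else:
            chosen = select_uncertain(first_model, sample_set[pool], k)
        picked = pool[chosen]
        # The second classification model supplies the labels; no manual annotation.
        labeled_x.append(sample_set[picked])
        labeled_y.append(second_model.predict(sample_set[picked]))
        pool = np.delete(pool, chosen)
        first_model.fit(np.vstack(labeled_x), np.concatenate(labeled_y))
    return first_model
```

The point of this arrangement is that the second, stronger classification model rather than a human annotator supplies labels for the actively selected samples, so the first classification model can approach the second model's accuracy at a much lower labeling cost.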

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to active learning, neural network and natural language processing technologies, and discloses a training method for a classification model. A specific implementation scheme is as follows: according to category prediction results of multiple original samples in an original sample set, selecting from the original sample set multiple original samples whose category prediction results meet preset conditions as samples to be labeled; labeling the categories of the samples to be labeled by using a second classification model to obtain a first labeled sample set; and training a first classification model by using the first labeled sample set. The present disclosure further relates to a training apparatus for the classification model, a sample classification method and apparatus, an electronic device, and a storage medium.
PCT/CN2021/094064 2020-11-13 2021-05-17 Procédé d'entraînement pour modèle de classification, procédé et appareil de classification d'échantillon, et dispositif WO2022100045A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/619,533 US20220383190A1 (en) 2020-11-13 2021-05-17 Method of training classification model, method of classifying sample, and device
EP21819712.7A EP4027268A4 (fr) 2020-11-13 2021-05-17 Procédé d'entraînement pour modèle de classification, procédé et appareil de classification d'échantillon, et dispositif

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011274936.5 2020-11-13
CN202011274936.5A CN112270379B (zh) 2020-11-13 2020-11-13 分类模型的训练方法、样本分类方法、装置和设备

Publications (1)

Publication Number Publication Date
WO2022100045A1 true WO2022100045A1 (fr) 2022-05-19

Family

ID=74339086

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/094064 WO2022100045A1 (fr) 2020-11-13 2021-05-17 Procédé d'entraînement pour modèle de classification, procédé et appareil de classification d'échantillon, et dispositif

Country Status (4)

Country Link
US (1) US20220383190A1 (fr)
EP (1) EP4027268A4 (fr)
CN (1) CN112270379B (fr)
WO (1) WO2022100045A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116258390A (zh) * 2022-12-22 2023-06-13 华中师范大学 一种面向教师在线教学反馈的认知支持质量评价方法及系统

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270379B (zh) * 2020-11-13 2023-09-19 北京百度网讯科技有限公司 分类模型的训练方法、样本分类方法、装置和设备
CN112819076B (zh) * 2021-02-03 2022-06-17 中南大学 基于深度迁移学习的医学图像分类模型的训练方法及装置
CN112559602B (zh) * 2021-02-21 2021-07-13 北京工业大数据创新中心有限公司 一种工业设备征兆的目标样本的确定方法及系统
CN112884055B (zh) * 2021-03-03 2023-02-03 歌尔股份有限公司 一种目标标注方法和一种目标标注装置
CN112784818B (zh) * 2021-03-03 2023-03-14 电子科技大学 基于分组式主动学习在光学遥感图像上的识别方法
CN112989051B (zh) * 2021-04-13 2021-09-10 北京世纪好未来教育科技有限公司 文本分类的方法、装置、设备和计算机可读存储介质
CN113178189B (zh) * 2021-04-27 2023-10-27 科大讯飞股份有限公司 一种信息分类方法及装置、信息分类模型训练方法及装置
CN113204614B (zh) * 2021-04-29 2023-10-17 北京百度网讯科技有限公司 模型训练方法、优化训练数据集的方法及其装置
CN113011534B (zh) * 2021-04-30 2024-03-29 平安科技(深圳)有限公司 分类器训练方法、装置、电子设备和存储介质
CN113205189B (zh) * 2021-05-12 2024-02-27 北京百度网讯科技有限公司 训练预测模型的方法、预测方法及装置
CN113313314A (zh) * 2021-06-11 2021-08-27 北京沃东天骏信息技术有限公司 模型训练方法、装置、设备及存储介质
CN113420533B (zh) * 2021-07-09 2023-12-29 中铁七局集团有限公司 信息提取模型的训练方法、装置及电子设备
CN113516185B (zh) * 2021-07-09 2023-10-31 北京百度网讯科技有限公司 模型训练的方法、装置、电子设备及存储介质
US11450225B1 (en) * 2021-10-14 2022-09-20 Quizlet, Inc. Machine grading of short answers with explanations
CN115033689B (zh) * 2022-05-27 2023-04-18 重庆邮电大学 一种基于小样本文本分类原型网络欧氏距离计算方法
CN115580841B (zh) * 2022-12-05 2023-03-28 安徽创瑞信息技术有限公司 一种降低短信发送延迟的方法

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664893A (zh) * 2018-04-03 2018-10-16 福州海景科技开发有限公司 一种人脸检测方法及存储介质
CN110472681A (zh) * 2019-08-09 2019-11-19 北京市商汤科技开发有限公司 基于知识蒸馏的神经网络训练方案和图像处理方案
CN111554268A (zh) * 2020-07-13 2020-08-18 腾讯科技(深圳)有限公司 基于语言模型的语言识别方法、文本分类方法和装置
CN111754985A (zh) * 2020-07-06 2020-10-09 上海依图信息技术有限公司 一种语音识别模型的训练以及语音识别的方法和装置
CN111858943A (zh) * 2020-07-30 2020-10-30 杭州网易云音乐科技有限公司 音乐情感识别方法及装置、存储介质和电子设备
EP3736749A1 (fr) * 2019-05-09 2020-11-11 Siemens Aktiengesellschaft Procédé et dispositif de commande d'un appareil à l'aide d'un ensemble de données
CN112270379A (zh) * 2020-11-13 2021-01-26 北京百度网讯科技有限公司 分类模型的训练方法、样本分类方法、装置和设备

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8818932B2 (en) * 2011-02-14 2014-08-26 Decisive Analytics Corporation Method and apparatus for creating a predictive model
US20160140409A1 (en) * 2014-11-13 2016-05-19 Alcatel Lucent Text classification based on joint complexity and compressed sensing
CN108292205A (zh) * 2015-09-23 2018-07-17 太平洋资产评估公司 用于根据数学问题自动提炼概念以及根据多个数学概念对数学问题进行动态构建和测试的系统和方法
CN107403430B (zh) * 2017-06-15 2020-08-07 中山大学 一种rgbd图像语义分割方法
US10942977B2 (en) * 2017-08-16 2021-03-09 Social Evidence, Llc Systems and methods for targeting, reviewing, and presenting online social network data by evidence context
CN107612893B (zh) * 2017-09-01 2020-06-02 北京百悟科技有限公司 短信的审核系统和方法以及构建短信审核模型方法
CN109271633B (zh) * 2018-09-17 2023-08-18 鼎富智能科技有限公司 一种单语义监督的词向量训练方法及装置
CN109471944B (zh) * 2018-11-12 2021-07-16 中山大学 文本分类模型的训练方法、装置及可读存储介质
CN109635116B (zh) * 2018-12-17 2023-03-24 腾讯科技(深圳)有限公司 文本词向量模型的训练方法、电子设备及计算机存储介质
CN110083705B (zh) * 2019-05-06 2021-11-02 电子科技大学 一种用于目标情感分类的多跳注意力深度模型、方法、存储介质和终端
US11748613B2 (en) * 2019-05-10 2023-09-05 Baidu Usa Llc Systems and methods for large scale semantic indexing with deep level-wise extreme multi-label learning
GB2584727B (en) * 2019-06-14 2024-02-28 Vision Semantics Ltd Optimised machine learning
CN110457675B (zh) * 2019-06-26 2024-01-19 平安科技(深圳)有限公司 预测模型训练方法、装置、存储介质及计算机设备
CN111079406B (zh) * 2019-12-13 2022-01-11 华中科技大学 自然语言处理模型训练方法、任务执行方法、设备及系统
CN111506702A (zh) * 2020-03-25 2020-08-07 北京万里红科技股份有限公司 基于知识蒸馏的语言模型训练方法、文本分类方法及装置
CN111899727B (zh) * 2020-07-15 2022-05-06 思必驰科技股份有限公司 用于多说话人的语音识别模型的训练方法及系统

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664893A (zh) * 2018-04-03 2018-10-16 福州海景科技开发有限公司 一种人脸检测方法及存储介质
EP3736749A1 (fr) * 2019-05-09 2020-11-11 Siemens Aktiengesellschaft Procédé et dispositif de commande d'un appareil à l'aide d'un ensemble de données
CN110472681A (zh) * 2019-08-09 2019-11-19 北京市商汤科技开发有限公司 基于知识蒸馏的神经网络训练方案和图像处理方案
CN111754985A (zh) * 2020-07-06 2020-10-09 上海依图信息技术有限公司 一种语音识别模型的训练以及语音识别的方法和装置
CN111554268A (zh) * 2020-07-13 2020-08-18 腾讯科技(深圳)有限公司 基于语言模型的语言识别方法、文本分类方法和装置
CN111858943A (zh) * 2020-07-30 2020-10-30 杭州网易云音乐科技有限公司 音乐情感识别方法及装置、存储介质和电子设备
CN112270379A (zh) * 2020-11-13 2021-01-26 北京百度网讯科技有限公司 分类模型的训练方法、样本分类方法、装置和设备

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116258390A (zh) * 2022-12-22 2023-06-13 华中师范大学 一种面向教师在线教学反馈的认知支持质量评价方法及系统
CN116258390B (zh) * 2022-12-22 2024-04-05 华中师范大学 一种面向教师在线教学反馈的认知支持质量评价方法及系统

Also Published As

Publication number Publication date
US20220383190A1 (en) 2022-12-01
CN112270379B (zh) 2023-09-19
EP4027268A4 (fr) 2022-11-16
EP4027268A1 (fr) 2022-07-13
CN112270379A (zh) 2021-01-26

Similar Documents

Publication Publication Date Title
WO2022100045A1 (fr) Procédé d'entraînement pour modèle de classification, procédé et appareil de classification d'échantillon, et dispositif
CN112507715B (zh) 确定实体之间关联关系的方法、装置、设备和存储介质
US11663404B2 (en) Text recognition method, electronic device, and storage medium
CN112560912B (zh) 分类模型的训练方法、装置、电子设备和存储介质
US11948058B2 (en) Utilizing recurrent neural networks to recognize and extract open intent from text inputs
US20220050967A1 (en) Extracting definitions from documents utilizing definition-labeling-dependent machine learning background
US10832001B2 (en) Machine learning to identify opinions in documents
CN111753060A (zh) 信息检索方法、装置、设备及计算机可读存储介质
EP3958145A1 (fr) Procédé et appareil de recherche sémantique, dispositif et support d'enregistrement
WO2021121198A1 (fr) Procédé et appareil d'extraction de relation d'entité basée sur une similitude sémantique, dispositif et support
KR102573637B1 (ko) 엔티티 링킹 방법, 장치, 전자 기기 및 기록 매체
WO2022077880A1 (fr) Procédé et appareil d'apprentissage de modèle, procédé et appareil de vérification de messages courts, dispositif, et support d'enregistrement
CN113591483A (zh) 一种基于序列标注的文档级事件论元抽取方法
US20220129448A1 (en) Intelligent dialogue method and apparatus, and storage medium
WO2023134083A1 (fr) Procédé et appareil de classification de sentiments basée sur texte, dispositif informatique et support de stockage
US20220172039A1 (en) Machine learning techniques to predict document type for incomplete queries
CN117453921B (zh) 一种大语言模型的数据信息标签处理方法
US11704326B2 (en) Generalization processing method, apparatus, device and computer storage medium
CN111859953A (zh) 训练数据的挖掘方法、装置、电子设备及存储介质
CN115859980A (zh) 一种半监督式命名实体识别方法、系统及电子设备
KR102608867B1 (ko) 업계 텍스트를 증분하는 방법, 관련 장치 및 매체에 저장된 컴퓨터 프로그램
CN111460791A (zh) 文本分类方法、装置、设备以及存储介质
EP4222635A1 (fr) Gestion de cycle de vie pour traitement automatique du langage naturel personnalisé
US11562150B2 (en) Language generation method and apparatus, electronic device and storage medium
WO2023173554A1 (fr) Procédé et appareil d'identification de langage d'agent inapproprié, dispositif électronique et support de stockage

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021819712

Country of ref document: EP

Effective date: 20211215

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21819712

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE