WO2022237215A1 - Model training method and system, and device and computer-readable storage medium - Google Patents

Model training method and system, and device and computer-readable storage medium

Info

Publication number
WO2022237215A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample data
information
model
category
training
Prior art date
Application number
PCT/CN2022/071325
Other languages
French (fr)
Chinese (zh)
Inventor
李明磊
怀宝兴
袁晶
Original Assignee
华为云计算技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为云计算技术有限公司
Publication of WO2022237215A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology

Abstract

This application provides a model training method and system, a device, and a computer-readable storage medium. The method includes the following steps: acquiring an original sample set, where the original sample set includes a plurality of pieces of sample data; receiving, through a labeling interface, a user's labeling result for each piece of sample data to obtain a training sample set, where the training sample set includes the plurality of pieces of sample data and the labeling result for each piece, and the labeling result for each piece of sample data includes the category information of that piece and the association information of that category information; and training a classification model based on the training sample set. The method improves the training efficiency of the classification model and reduces its training cost.

Description

Model training method, system, device, and computer-readable storage medium
This application claims priority to Chinese Patent Application No. 202110513038.9, filed with the China National Intellectual Property Administration on May 11, 2021 and entitled "Model training method, system, device, and computer-readable storage medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the technical field of artificial intelligence (AI), and in particular to a model training method, system, device, and computer-readable storage medium.
Background
In recent years, with the rapid development of AI technology, deep learning has achieved remarkable results in many fields, especially classification, for example, text classification, image classification, and speech classification. Classification learning requires a large amount of labeled data as training samples for the classification model. At present, to guarantee the accuracy of a classification model, sample data still needs to be classified and labeled manually. Because manual labeling is inefficient and expensive, the training cost of classification models is high.
Summary
This application provides a model training method, system, device, and computer-readable storage medium. The method improves the training efficiency of classification models and reduces their training cost.
In a first aspect, this application provides a model training method, which includes the following steps:
obtaining an original sample set, where the original sample set includes a plurality of pieces of sample data;
receiving, through a labeling interface, a user's labeling result for each piece of sample data to obtain a training sample set, where the training sample set includes the plurality of pieces of sample data and the labeling result for each piece, and the labeling result for each piece of sample data includes the category information of that piece and the association information of that category information; and
training a classification model based on the training sample set.
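The three steps above can be sketched end to end as follows. This is a hypothetical illustration, not the patent's implementation: the function names, the sentiment categories, and the keyword-based stand-in for the human labeling interface are all assumptions made for the example.

```python
# Illustrative pipeline: acquire samples, collect labeling results through a
# labeling interface, assemble the training sample set (training itself is
# out of scope here).

def acquire_original_sample_set():
    # Stand-in for step 1: a tiny original sample set.
    return ["The food was great!", "Terrible delivery, very slow."]

def receive_labeling_result(sample):
    # Stand-in for the labeling interface of step 2: the user supplies both
    # the category and the association information of that category (here,
    # the substring that explains the chosen category).
    if "great" in sample:
        return {"category": "positive", "association": {"explanation": "great"}}
    return {"category": "negative", "association": {"explanation": "Terrible"}}

samples = acquire_original_sample_set()
training_sample_set = [
    {"data": s, **receive_labeling_result(s)} for s in samples
]
# Step 3 would train a classification model on training_sample_set.
```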
With the method described in the first aspect, the user labels not only the category information of each piece of sample data but also the association information of that category information. The classification model can then be trained on both the category information and the association information of the category information, which improves the training efficiency of the classification model and reduces its training cost.
In a possible implementation of the first aspect, the association information of the category information of first sample data among the plurality of pieces of sample data is contained in the first sample data. Optionally, the first sample data may be some of the plurality of pieces of sample data, or all of them.
In a possible implementation of the first aspect, the association information of the category information of the first sample data includes category explanation information and metadata of the category explanation information. The category explanation information is the part of the first sample data that reflects its category information. The metadata of the category explanation information includes the number of pieces of category explanation information, the position of the category explanation information within the first sample data, and the like.
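A labeled sample of this shape might look like the following. The record layout is an assumption for illustration; the patent does not prescribe a concrete format. The metadata carries the two items named above: the count of category-explanation pieces and their positions within the sample.

```python
# Illustrative labeled sample (assumed format): the association information
# holds the category-explanation substrings plus their metadata, namely the
# count and the character positions of each substring within the sample.
sample = "The battery lasts all day and charges fast."
explanations = ["lasts all day", "charges fast"]  # parts that reflect the category

labeled = {
    "data": sample,
    "category": "positive",
    "association": {
        "explanations": explanations,
        "metadata": {
            "count": len(explanations),
            "positions": [sample.index(e) for e in explanations],
        },
    },
}
```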
In a possible implementation of the first aspect, the association information of the category information of second sample data among the plurality of pieces of sample data is not contained in the second sample data. Optionally, the second sample data may be some of the plurality of pieces of sample data, or all of them.
It can be seen that, when labeling the association information of the category information of a piece of sample data, the user may mark that association information within the sample data itself, or may record it outside the sample data. Moreover, when the association information includes multiple parts (category explanation information and metadata of the category explanation information), some parts may be marked within the sample data while other parts are not. In this way, the user can label the association information of the category information as needed.
In a possible implementation of the first aspect, the association information of the category information of each piece of sample data is confirmed by the user in the course of generating the category information of that piece. In this way, labeling a single piece of sample data does not cost the user much extra time.
In a possible implementation of the first aspect, training the classification model based on the training sample set includes: inputting the plurality of pieces of sample data into the classification model to obtain, for each piece, predicted category information and predicted association information of the category information; and adjusting the parameters of the classification model according to a loss function until the output of the loss function satisfies a threshold. The loss function includes a first loss function and a second loss function. The first loss function indicates the difference between the labeled category information and the predicted category information of each piece of sample data, and the second loss function indicates the difference between the labeled association information and the predicted association information of the category information of each piece of sample data.
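The two-part loss can be sketched as below. This is an assumed concretization, not the patent's implementation: cross-entropy is chosen as both loss functions, the association information is modeled as a per-token 0/1 tag marking the category-explanation span, and the `weight` term balancing the two losses is an added illustrative parameter.

```python
# Sketch of a combined loss: first loss on the category prediction, second
# loss on the predicted association information (per-token explanation tags).
import math

def cross_entropy(probs, target_idx):
    # Negative log-likelihood of the target class.
    return -math.log(probs[target_idx])

def combined_loss(pred_cat_probs, true_cat, pred_assoc_probs, true_assoc, weight=1.0):
    # First loss: labeled vs. predicted category information.
    l1 = cross_entropy(pred_cat_probs, true_cat)
    # Second loss: labeled vs. predicted association information, averaged
    # over the tokens of the sample.
    l2 = sum(cross_entropy(p, t) for p, t in zip(pred_assoc_probs, true_assoc)) / len(true_assoc)
    return l1 + weight * l2

# Example: 3-way category prediction plus 4-token association tagging.
loss = combined_loss(
    pred_cat_probs=[0.7, 0.2, 0.1], true_cat=0,
    pred_assoc_probs=[[0.9, 0.1], [0.2, 0.8], [0.8, 0.2], [0.95, 0.05]],
    true_assoc=[0, 1, 0, 0],
)
```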
Through the above method, the classification model is trained not only on the category information of the sample data but also on the association information of that category information. Because the association information is correlated with the category information, training on the association information amounts to training the classification model along an additional dimension, which helps improve the accuracy of the classification model. The above method therefore improves the training efficiency of the classification model and reduces its training cost.
In a possible implementation of the first aspect, the classification model includes an encoding model, a first task model, and a second task model. The encoding model extracts features from the plurality of pieces of sample data; the first task model determines the category information of each piece of sample data from those features; and the second task model determines the association information of the category information of each piece of sample data from the same features. With this design, the classification task and the prediction of the association information of the category information share the features extracted by the encoding model, while the classification result and the association information are output separately.
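A minimal structural sketch of this shared-encoder design follows. Everything concrete here is an assumption for illustration: the toy hand-crafted encoder (a real system would use a neural encoder such as a CNN), the sentiment categories, and the per-token explanation flags standing in for association information.

```python
# Shared encoder with two task heads: the category head and the
# association-information head both consume the same extracted features.

def encode(tokens):
    # Stand-in encoding model: one feature vector per token
    # [token length, exclamation count]. A real encoder would be neural.
    return [[len(tok), tok.count("!")] for tok in tokens]

def category_head(features):
    # First task model: pool token features into a category decision.
    total_excl = sum(f[1] for f in features)
    return "positive" if total_excl > 0 else "neutral"

def association_head(features):
    # Second task model: per-token flag marking category-explanation tokens.
    return [1 if f[1] > 0 else 0 for f in features]

tokens = ["great", "service!", "fast", "delivery"]
feats = encode(tokens)                 # shared features, computed once
category = category_head(feats)        # classification result
association = association_head(feats)  # association information, output separately
```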
In a possible implementation of the first aspect, the method further includes: acquiring another original sample set, which also includes a plurality of pieces of sample data; and then receiving, through the labeling interface, the category information that the user labels for each piece of sample data in this other original sample set, so as to obtain the training sample set. The training sample set then includes two parts: first, the original sample set, the category information of each piece of sample data in it, and the association information of that category information; and second, the other original sample set and the category information of each piece of sample data in it. The classification model is then trained on this training sample set. In other words, when labeling sample data, the user may choose to label the association information of the category information, or choose not to.
In a second aspect, this application provides a model training system, which includes:
a training data labeling module, configured to obtain an original sample set, where the original sample set includes a plurality of pieces of sample data;
the training data labeling module is further configured to receive, through a labeling interface, a user's labeling result for each piece of sample data to obtain a training sample set, where the training sample set includes the plurality of pieces of sample data and the labeling result for each piece, and the labeling result for each piece of sample data includes its category information and the association information of that category information; and
a model training module, configured to train a classification model based on the training sample set.
In a possible implementation of the second aspect, the association information of the category information of first sample data among the plurality of pieces of sample data is contained in the first sample data. Optionally, the first sample data may be some of the plurality of pieces of sample data, or all of them.
In a possible implementation of the second aspect, the association information of the category information of the first sample data includes category explanation information and metadata of the category explanation information. The category explanation information is the part of the first sample data that reflects its category information. The metadata of the category explanation information includes the number of pieces of category explanation information, the position of the category explanation information within the first sample data, and the like.
In a possible implementation of the second aspect, the association information of the category information of second sample data among the plurality of pieces of sample data is not contained in the second sample data. Optionally, the second sample data may be some of the plurality of pieces of sample data, or all of them.
In a possible implementation of the second aspect, the association information of the category information of each piece of sample data is confirmed by the user in the course of generating the category information of that piece.
In a possible implementation of the second aspect, the model training module is specifically configured to: input the plurality of pieces of sample data into the classification model to obtain, for each piece, predicted category information and predicted association information of the category information; and adjust the parameters of the classification model according to a loss function until the output of the loss function satisfies a threshold. The loss function includes a first loss function and a second loss function. The first loss function indicates the difference between the labeled category information and the predicted category information of each piece of sample data, and the second loss function indicates the difference between the labeled association information and the predicted association information of the category information of each piece of sample data.
In a possible implementation of the second aspect, the classification model includes an encoding model, a first task model, and a second task model. The encoding model extracts features from the plurality of pieces of sample data; the first task model determines the category information of each piece of sample data from those features; and the second task model determines the association information of the category information of each piece of sample data from the same features.
In a possible implementation of the second aspect, the training data labeling module is further configured to: acquire another original sample set, which also includes a plurality of pieces of sample data; and then receive, through the labeling interface, the category information that the user labels for each piece of sample data in this other original sample set, so as to obtain the training sample set. The training sample set then includes two parts: first, the original sample set, the category information of each piece of sample data in it, and the association information of that category information; and second, the other original sample set and the category information of each piece of sample data in it. This training sample set may also be used to train the classification model.
In a third aspect, this application provides a computing device. The computing device includes a processor and a memory; the memory stores computer instructions, and the processor executes the computer instructions so that the computing device performs the method in the first aspect or any possible implementation of the first aspect.
In a fourth aspect, this application provides a computer-readable storage medium. The computer-readable storage medium stores computer program code, and when the computer program code is executed by a computing device, the computing device performs the method in the first aspect or any possible implementation of the first aspect.
Brief Description of the Drawings
To describe the technical solutions provided in this application more clearly, the accompanying drawings used in this application are briefly introduced below. Evidently, the drawings described below show some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a model training system provided in this application;
FIG. 2 is a schematic flowchart of a model training method provided in this application;
FIG. 3 is a schematic diagram of a GUI provided in this application;
FIG. 4 is a schematic structural diagram of a classification model provided in this application;
FIG. 5 shows one deployment of the model training system provided in this application;
FIG. 6 shows another deployment of the model training system provided in this application;
FIG. 7 is a schematic structural diagram of a computing device provided in this application;
FIG. 8 is a schematic structural diagram of a computing device system provided in this application.
Detailed Description
The technical solutions provided in this application are described in detail below with reference to the accompanying drawings.
Deep learning is a class of machine learning techniques based on deep neural network algorithms, whose main characteristic is the use of multiple nonlinear transformations to process and analyze data. In recent years, deep learning has achieved great success in classification (for example, text classification, image classification, and speech classification), a success largely attributable to the accuracy of classification models.
Generally, a classification model needs to be trained before it is used to perform a classification task. Training a classification model requires manually labeling a large amount of sample data with categories, using the sample data as the model's input, and using the category label corresponding to each piece of sample data as a reference for the model's output value. A loss function computes the loss between the classification model's output value and the category label corresponding to the sample data, and the parameters of the classification model are adjusted repeatedly according to the loss value until, given input sample data, the model outputs values very close to the corresponding category labels. The trained classification model can then be used to predict the category of new data.
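The loss-driven training loop just described can be sketched in miniature. This is a generic illustration, not the patent's method: a one-parameter linear "model" and mean squared error stand in for a real classification model and its loss function, so that the adjust-until-threshold loop is visible in a few lines.

```python
# Minimal loss-driven training loop: adjust the model parameter by gradient
# descent until the loss function's output falls below a threshold.

def loss(w, xs, ys):
    # Mean squared error of the one-parameter "model" y = w * x.
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, lr=0.01, threshold=1e-6):
    w = 0.0
    while loss(w, xs, ys) > threshold:
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # repeated parameter adjustment
    return w

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # underlying relation: y = 2x
w = train(xs, ys)
```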
One point to note, however, is that at present, when sample data is classified and labeled, an annotator typically reads a piece of sample data, labels its category using the provided category labels, and then moves on to the next piece until all sample data is labeled. Consider a text classification example: to train a classification model that recognizes the sentiment carried by text data, two category labels may be set: a positive-sentiment label and a negative-sentiment label. Annotators then label the collected sample data one piece at a time; if an annotator judges that a piece of sample data expresses positive sentiment, a positive-sentiment label is added to it, and otherwise a negative-sentiment label is added. Finally, the labeled sample data is used to train the sentiment classification model. It is easy to see that with this approach, the accuracy of the trained classification model depends mainly on the category labels. Moreover, manual labeling is inefficient yet very expensive, so training a well-performing classification model comes at a high cost.
To address these problems, this application provides a model training system that improves the training efficiency of classification models and reduces their training cost.
Refer to FIG. 1, which shows a schematic structural diagram of a possible model training system. As shown in FIG. 1, the model training system 100 includes a training data labeling module 110 and a model training module 120. Optionally, the model training system 100 may further include a training data storage module 130, a model storage module 140, and a model selection module 150. The functions of these modules are briefly introduced below.
Training data labeling module 110: configured to obtain an original sample set, where the original sample set includes a plurality of pieces of sample data. The training data labeling module 110 is also configured to receive a first user's labeling result for each piece of sample data in the original sample set to obtain a training sample set.
In a specific embodiment, the original sample set includes a first original sample set, which includes a plurality of pieces of sample data. The training data labeling module 110 is specifically configured to receive the first user's labeling result for each piece of sample data in the first original sample set, where that labeling result includes the category information of each piece of sample data in the first original sample set and the association information of that category information. The training sample set then includes the pieces of sample data in the first original sample set and the first user's labeling result for each of them.
Optionally, the original sample set further includes a second original sample set, which also includes a plurality of pieces of sample data. The training data labeling module 110 is further configured to receive the first user's labeling result for each piece of sample data in the second original sample set, where that labeling result includes the category information of each piece of sample data in the second original sample set. In this case, the training sample set includes two parts: one part is the pieces of sample data in the first original sample set together with the first user's labeling result for each of them; the other part is the pieces of sample data in the second original sample set together with the first user's labeling result for each of them. In other words, the first user may choose to label the association information of the category information of the sample data, or choose not to.
Optionally, the training data labeling module 110 is further configured to send the training sample set to the training data storage module 130. The training data labeling module 110 may also send the original sample set to the training data storage module 130.
In a possible implementation, the training data labeling module 110 may be implemented as a graphical user interface (GUI). For example, the GUI may display each piece of sample data in the original sample set to the first user and provide a labeling interface through which the first user labels the category information of each piece of sample data and the association information of that category information. After the first user has labeled all the sample data in the original sample set, the GUI stores the resulting training sample set in the training data storage module 130.
Model training module 120: configured to read the training sample set from the training data storage module 130 and train the classification model on the training sample set to obtain a trained classification model.
Optionally, the model training module 120 is further configured to receive a second user's input describing the expected performance of the trained classification model (including accuracy, running time, and so on), so that the model training module 120 can train the classification model toward the second user's expectations. The second user may be a person who needs a classification model trained; the first user and the second user may be the same user or different users.
In some embodiments, the model training system 100 may further include a model storage module 140, which stores the trained classification model produced by the model training module 120.
Before the model training module 120 trains the classification model, the classification model to be trained must first be obtained. The model training module 120 can obtain the classification model in various ways, for example:
方式一:模型训练模块120获取第二用户上传的分类模型。Way 1: The model training module 120 acquires the classification model uploaded by the second user.
方式二:模型存储模块140还存储了多个未训练的模型。模型选择模块150向第二用户展示模型存储模块140中存储的多个未训练的模型,并接收第二用户选择的模型作为分类模型,然后发送至模型训练模块140。Way 2: The model storage module 140 also stores multiple untrained models. The model selection module 150 presents a plurality of untrained models stored in the model storage module 140 to the second user, and receives the model selected by the second user as a classification model, and then sends it to the model training module 140 .
Way 3: the model selection module 150 receives requirements input by the second user, selects a suitable model from the model storage module 140 as the classification model according to those requirements, and sends it to the model training module 120. Optionally, the requirements input by the user may include the classification task the user expects to complete; for example, if that task is a text classification task, the model selection module 150 obtains from the model storage module 140 a model suited to text classification (for example, a convolutional neural network (CNN)). Optionally, the requirements input by the user may also include the user's requirements on the parameters of the classification model, such as the CNN's convolution kernel size, number of convolutional layers, activation function, and number of pooling layers.
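The requirement-driven selection of Way 3 can be sketched as a simple lookup. The mapping, field names, and stored model names below are assumptions chosen only for illustration; they are not part of the patented system.

```python
# Hypothetical sketch of Way 3: map the classification task in the second
# user's requirements to a stored untrained model. All names are illustrative.
TASK_TO_MODEL = {
    "text classification": "CNN",    # e.g. a text-CNN
    "image classification": "CNN",
    "audio classification": "TDNN",
}

def select_model(requirements):
    """requirements: dict holding the expected task and optional model
    parameters (e.g. convolution kernel size, number of convolutional layers)."""
    task = requirements.get("task")
    model = TASK_TO_MODEL.get(task)
    if model is None:
        raise ValueError(f"no stored model for task: {task}")
    return {"model": model, "params": requirements.get("params", {})}
```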
It should be noted that the above model training system may be a system that interacts with users; it may be a software system, a hardware system, or a combination of software and hardware, which is not specifically limited in this application. It should also be noted that FIG. 1 only shows an exemplary structure of the model training system; in practical applications, the model training system shown in FIG. 1 may be adapted according to the specific situation.

The model training process performed by the above model training system 100 is described in detail below with reference to FIG. 2.

FIG. 2 is a schematic flowchart of a model training method provided by this application; the method is executed by the model training system 100 shown in FIG. 1. As shown in FIG. 2, the method includes but is not limited to the following steps:
S101: the model training system 100 obtains an original sample set.

In a specific embodiment, the original sample set includes multiple pieces of sample data. The model training system 100 may obtain the original sample set in the following ways:
Way 1: obtain an original sample set uploaded by the second user. Specifically, the model training system 100 provides a data upload page that includes a data upload control; the second user can click the control to upload a pre-prepared original sample set to the model training system 100.

Way 2: obtain the business requirements of the second user and search a database (a local database or a database on another device) for sample data that meets those requirements, thereby obtaining the original sample set. For example, if the second user's business requirement is text sentiment classification, the model training system 100 may obtain a large amount of text data from a local database or from the Internet as the original sample set. As another example, if the second user's business requirement is face detection, a large number of images may be obtained from a local database or from the Internet as the original sample set.
S102: the model training system 100 receives, through the labeling interface, the first user's labeling result for each piece of sample data in the original sample set, thereby obtaining a training sample set.

The original sample set includes a first original sample set, which includes multiple pieces of sample data. The first user's labeling result for each piece of sample data in the first original sample set includes the category information of that sample data and the associated information of that category information. The category information of a piece of sample data indicates its category; the associated information of the category information is information the first user confirmed in the course of generating the category information of that sample data. The first user can therefore label the associated information of the category information without spending much extra time.

In a specific embodiment, the first user is a person who labels the sample data, and the second user is a person who needs to train the classification model. Optionally, the first user and the second user may be the same user or different users.
Optionally, the multiple pieces of sample data include first sample data, and the associated information of the category information of the first sample data is contained in the first sample data itself. The first sample data may be some of the multiple pieces of sample data, or all of them.

Optionally, the multiple pieces of sample data further include second sample data, and the associated information of the category information of the second sample data is not contained in the second sample data. The second sample data may be some of the multiple pieces of sample data, or all of them.
In a specific embodiment, the associated information of the category information of a piece of sample data may include category explanation information and metadata of the category explanation information. Optionally, the category explanation information may be the part of the sample data that reflects its category information, or it may be annotated outside the sample data to reflect the category information; this is not specifically limited here.

Further, the category explanation information may be the reason why the user assigned the corresponding category information to the sample data; the reason may reflect the category information of the sample data directly, or reflect it indirectly from the opposite side. For example, for the text "This phone looks so good", when the first user needs to label the text with category information (covering positive and negative sentiment), the user can determine from the phrase "looks so good" that the text expresses positive sentiment and label it accordingly; "looks so good" can then serve as the category explanation information within the text. Alternatively, the first user may add an annotation such as "not good-looking" outside a piece of sample data to serve as its category explanation information.

It should be understood that the form of the category explanation information may vary with the sample data. For example, the category explanation information of text data may be a keyword, key phrase, or key sentence in the text; the category explanation information of image data may be one or more regions of the image; and the category explanation information of audio data may be one or more segments of the audio.
Further, the metadata of the category explanation information describes the category explanation information and may include, among other things, the quantity of category explanation information and its position in the sample data; this application does not specifically limit it. It should be understood that the metadata of the category explanation information can also reflect the category information of the sample data. For example, consider the text "Although the color of this phone is not good-looking, I think its performance is good and its appearance is unique, so I like it very much." The text contains a keyword expressing negative sentiment ("not good-looking") as well as keywords expressing positive sentiment ("good performance", "unique appearance", "like it very much"). Because the number of keywords expressing positive sentiment (i.e., the quantity of category explanation information) is greater than the number expressing negative sentiment, and the positive keywords appear after the contrast word "but" (i.e., the position of the category explanation information in the sample data), the category information of this text should be positive sentiment.
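As a rough illustration of how this metadata can hint at a category, the quantity-and-position reasoning in the example above might be sketched as follows. This is a hypothetical heuristic for the sentiment example only, not the patented method:

```python
def infer_sentiment(positive_keywords, negative_keywords, pivot_position):
    """positive/negative_keywords: (keyword, position) pairs found in the text;
    pivot_position: position of a contrast word such as 'but'."""
    positives_after_pivot = [k for k, p in positive_keywords if p > pivot_position]
    # More positive cues than negative ones, appearing after the contrast word,
    # suggest the text as a whole expresses positive sentiment.
    if len(positive_keywords) > len(negative_keywords) and positives_after_pivot:
        return "positive"
    if len(negative_keywords) > len(positive_keywords):
        return "negative"
    return "uncertain"
```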
In a specific embodiment, the model training system 100 provides a data labeling interface that can display sample data and receive the category information and the associated information of the category information that the user annotates on that sample data. Taking FIG. 3 as an example, the data labeling interface 200 includes a toolbar 210 and a labeling area 220; the toolbar 210 carries category labels 211 and labeling tools 212 for the associated information of the category information, and the labeling area 220 displays the sample data. The user can view the sample data in the labeling area 220 and label it using the toolbar 210. Specifically, after reading the sample data, the user selects a suitable tool from the labeling tools 212 to mark the associated information of the category information within the sample data, then clicks the category label corresponding to the sample data among the category labels 211, and finally clicks Save to complete the labeling of that sample data. As shown in FIG. 3, when the user sees that the sample data is "This phone looks so good", the user can select the underline tool from the labeling tools 212, underline "looks so good", click the positive-sentiment label, and click the Save button, thereby completing the labeling of the sample data.

It should be understood that FIG. 3 is only an example. In practical applications, the data labeling interface 200 may also display the number of labeled samples, the number of samples still to be labeled, the labeling progress, and so on, and may provide a back button, zoom buttons, and the like; this application does not specifically limit this.
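One labeling result produced through such an interface might be represented as below. Every field name here is hypothetical, chosen only to mirror the category information, category explanation information, and metadata described above:

```python
# Hypothetical record for one labeled sample. The position metadata
# (start/end) locates the explanation span inside the sample data.
annotation = {
    "sample": "这个手机太好看了",     # the displayed sample data
    "category": "positive",          # category information
    "association": {                 # associated information of the category
        "explanations": [            # category explanation information
            {"text": "太好看", "start": 4, "end": 7},
        ],
        "count": 1,                  # metadata: quantity of explanations
    },
}

# Sanity check: the position metadata really selects the explanation text.
span = annotation["association"]["explanations"][0]
assert annotation["sample"][span["start"]:span["end"]] == span["text"]
```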
Optionally, the original sample set may further include a second original sample set, and the first user's labeling result for each piece of sample data in the second original sample set includes only the category information of that sample data. In this case, the training sample set includes not only the first original sample set and the user's labeling results for each piece of sample data in it, but also the second original sample set and the user's labeling results for each piece of sample data in it. In other words, the first user may choose whether or not to annotate the sample data in the original sample set with the associated information of the category information.

S103: the model training system 100 trains the classification model according to the training sample set.
Taking as an example a training sample set that includes the first original sample set and the user's labeling results for each piece of sample data in it, but not the second original sample set or its labeling results, the model training system 100 trains the classification model as follows. The system inputs the multiple pieces of sample data into the classification model to obtain each piece's predicted category information (i.e., the predicted category information of each piece of sample data) and predicted associated information of the category information (i.e., the predicted associated information of the category information of each piece of sample data), and then adjusts the network parameters of the classification model according to a loss function until the output of the loss function satisfies a preset condition, thereby completing the training of the classification model. The loss function includes a first loss function and a second loss function: the first loss function indicates the difference between the user-labeled category information of each piece of sample data and its predicted category information, and the second loss function indicates the difference between the user-labeled associated information of the category information of each piece of sample data and its predicted associated information.
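Under the assumption of cross-entropy losses over predicted class probabilities (an illustrative choice; the patent does not fix the form of either loss), the two-part loss might be sketched as:

```python
import math

def cross_entropy(probs, true_idx):
    """Negative log-likelihood of the labeled class under predicted probabilities."""
    return -math.log(probs[true_idx])

def total_loss(cat_probs, cat_label, assoc_probs, assoc_labels, weight=1.0):
    """First loss (category) plus weighted second loss (associated information).

    cat_probs:    predicted category distribution for one sample
    cat_label:    index of the user-labeled category
    assoc_probs:  per-token predicted distributions over association tags
    assoc_labels: user-labeled association tag index for each token
    """
    first = cross_entropy(cat_probs, cat_label)
    second = sum(cross_entropy(p, t) for p, t in zip(assoc_probs, assoc_labels))
    return first + weight * second
```

The `weight` factor balancing the two terms is likewise an assumption; the patent only requires that both differences drive the parameter adjustment.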
FIG. 4 shows a schematic structural diagram of a classification model. The classification model 300 includes an input module 310, an encoding module 320, a first task module 330, and a second task module 340.

Input module 310: obtains the training sample set and sends it to the encoding module 320. Optionally, the input module 310 may also preprocess the training sample set. It should be understood that different kinds of sample data require different preprocessing. For example, for image samples the input module 310 may perform color-space conversion, cropping, scaling, and similar preprocessing; for text samples it may remove non-text data, remove stop words, and so on; for audio samples it may perform audio format conversion, framing, windowing, and similar preprocessing. Optionally, the input module 310 may also preprocess the sample data according to its labeling result; for example, a piece of sample data that has no corresponding category information may be deleted.

Encoding module 320: reads the training sample set from the input module 310 and encodes each piece of sample data in it, extracting the encoded features of each piece.

In a specific embodiment, the encoding module 320 may include an encoding model 321. In practical applications, the encoding module 320 may use different encoding models for different sample data. For image samples, it may use an existing neural network model with good image feature extraction capability (for example, a CNN model or a VGG network model) as the encoding model; for text samples, it may use an existing neural network model with good text feature extraction capability (for example, a long short-term memory (LSTM) model, a bidirectional encoder representations from transformers (BERT) model, or a Transformer model); for audio samples, it may use an existing neural network model with good audio feature extraction capability (for example, a time delay neural network (TDNN) model or a CNN model).

First task module 330: includes a first task model 331. The first task module 330 receives the encoded features of each piece of sample data extracted by the encoding module 320 and feeds them into the first task model 331, which, through classification learning, produces the predicted category information of each piece of sample data.

Second task module 340: includes a second task model 341. The second task module 340 receives the encoded features of each piece of sample data extracted by the encoding module 320 and feeds them into the second task model 341, which, through learning, produces the predicted associated information of the category information of each piece of sample data.
It should be understood that the second task model 341 should be selected according to the specific classification task, the sample data in the training sample set, the labeling results of the sample data, and so on. For example, for a text classification task, when the associated information of the category information of the text data labeled by the user is contained in the text data, the second task model 341 may be a sequence labeling model that identifies the keywords or key phrases in the text that reflect its category information, such as a conditional random field (CRF), a bidirectional LSTM-CRF (BiLSTM-CRF), or a hidden Markov model (HMM). As another example, for a text classification task where the associated information is not contained in the text data, the second task model 341 may be a text summarization model that generates keywords or key phrases reflecting the category information, such as an attention model or an LSTM model.

As another example, for an image classification task where the associated information of the category information of the image data is contained in the image data, the second task model 341 may be a region/object detection model that identifies the regions of the image reflecting its category information, such as a one-stage unified real-time object detection ("you only look once", YOLO) model, a single shot multibox detector (SSD) model, or a region convolutional neural network (RCNN) model. As another example, for an audio classification task where the associated information of the category information of the audio data is contained in the audio data, the second task model 341 may identify the segments of the audio that characterize its category information; since this is essentially audio sequence recognition, the second task model 341 may likewise use a sequence labeling model.
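The selection criteria above can be summarized as a lookup from (sample type, whether the associated information is contained in the sample) to candidate model families. The table below is only a paraphrase of the examples given, not an exhaustive rule:

```python
# (sample type, association contained in sample) -> candidate second task models,
# summarizing the examples in the text. Not an exhaustive or normative mapping.
SECOND_TASK_MODELS = {
    ("text", True):  ["CRF", "BiLSTM-CRF", "HMM"],   # sequence labeling
    ("text", False): ["attention model", "LSTM"],    # text summarization
    ("image", True): ["YOLO", "SSD", "RCNN"],        # region/object detection
    ("audio", True): ["sequence labeling model"],    # audio sequence recognition
}

def candidate_second_task_models(sample_type, contained):
    return SECOND_TASK_MODELS.get((sample_type, contained), [])
```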
In a specific embodiment, the encoding model 321, the first task model 331, and the second task model 341 may be uploaded to the model training system 100 by the user. Optionally, the model training system 100 may store multiple models, in which case the encoding model 321, the first task model 331, and the second task model 341 may instead be selected by the user from the model training system 100. Optionally, these models may also be selected by the model training system 100 according to the user's requirements; this is not specifically limited here. The user's requirements may include the classification task the user expects to complete, the user's requirements on the initial parameters of the encoding model 321, the first task model 331, and the second task model 341, and so on.
Based on the classification model shown in FIG. 4, the model training system 100 trains the classification model on the training sample set as follows. The system feeds the training sample set into the input module 310 of the classification model 300, from which it reaches the encoding module 320; the encoding module 320 produces the encoded features of each piece of sample data. The encoded features of each piece, together with the user-labeled category information of each piece, are then input into the first task module 330. Upon receiving them, the first task module 330 feeds the encoded features into the first task model 331 to obtain the predicted category information of each piece, and then uses the first loss function to compute a first loss value between the user-labeled category information and the predicted category information of each piece.

At the same time, the encoding module 320 also inputs the encoded features of each piece of sample data and the associated information of its category information into the second task module 340. Upon receiving them, the second task module 340 feeds the encoded features into the second task model 341 to predict the associated information of the category information of each piece, and then uses the second loss function to compute a second loss value between the user-labeled associated information and the predicted associated information. The parameters of the first task model 331, the second task model 341, and the encoding model 321 are then adjusted according to the first and second loss values (for example, their sum or product), yielding an adjusted classification model. This process is repeated until the difference between the category information the classification model predicts for each input piece of sample data and the user-labeled category information satisfies a first threshold, and the difference between the predicted and user-labeled associated information of the category information satisfies a second threshold. The first threshold and the second threshold may be set by the user, or dynamically adjusted by the model training system 100 according to the actual situation; they are not specifically limited here.
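The data flow of one training iteration can be sketched structurally as follows. The encoder and the two task heads are passed in as plain callables, the two loss values are combined by summation (one of the combinations mentioned), and the actual gradient-based parameter adjustment is elided:

```python
def joint_training_step(encode, task1, task2, loss1, loss2,
                        sample, cat_label, assoc_label):
    """One forward pass through the shared encoder and both task heads,
    returning the combined loss (here, the sum of the two loss values)."""
    features = encode(sample)            # encoding module 320
    cat_pred = task1(features)           # first task module 330
    assoc_pred = task2(features)         # second task module 340
    first_loss = loss1(cat_pred, cat_label)
    second_loss = loss2(assoc_pred, assoc_label)
    return first_loss + second_loss

def training_converged(cat_diff, assoc_diff, first_threshold, second_threshold):
    """Training stops once both differences satisfy their thresholds."""
    return cat_diff <= first_threshold and assoc_diff <= second_threshold
```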
In the above training process of the classification model 300, the encoding model 321 is trained not only on the category information of each piece of sample data but also on the associated information of that category information. As a result, the features extracted by the encoding model 321 can characterize both the associated information of the category information and the category information itself. Since the associated information of the category information is related to the category information, this training improves the encoding model 321's ability to extract features from the sample data that are relevant to its category information, which in turn improves the accuracy of the classification learning performed by the first task model 331, that is, the accuracy of the classification model 300 on classification tasks.

In this embodiment, once the classification model has been trained, it can be used to perform classification tasks. Specifically, the trained classification model can be used directly, in which case it predicts not only the category information of the input data but also its category explanation information, which increases the credibility of the predicted category information. Alternatively, the second task module may first be removed from the trained classification model before the classification task is performed, which improves the efficiency of the classification task and saves computing resources.
It should be understood that when the training sample set includes not only the first original sample set and the user's labeling results for each piece of sample data in it, but also the second original sample set and the user's labeling results for each piece of sample data in it, the model training system 100 may, when training the classification model, set the associated information of the category information of samples labeled only with category information to 0, and then use the above loss function (including the first loss function and the second loss function) to adjust the parameters of the classification model, thereby completing the training. When the training sample set includes the second original sample set and the user's labeling results for each piece of sample data in it, but not the first original sample set or its labeling results, the model training system 100 may adjust the parameters of the classification model using only the first loss function, thereby completing the training.
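One way to realize the mixed-sample-set case is to zero out the second-loss contribution of samples without labeled associated information. This masking sketch is an assumption about the implementation; the text itself only states that the associated information of such samples is set to 0:

```python
def batch_loss(first_losses, second_losses, has_association):
    """first_losses / second_losses: per-sample loss values for a batch;
    has_association[i] is False for samples labeled with category info only,
    whose second-loss contribution is masked out (equivalent to a 0 target
    producing no gradient in this sketch)."""
    total = sum(first_losses)
    total += sum(l for l, keep in zip(second_losses, has_association) if keep)
    return total
```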
It should be noted that the above model training method may be implemented jointly by one or more modules in the model training system 100. Specifically, the training data labeling module 110 is used to implement steps S101-S102. The training data storage module 130 is used to store the original sample set and the training sample set obtained by the training data labeling module 110 after steps S101-S102. The model training module 120 is used to implement step S103. The model storage module 140 is used to store the trained classification model obtained by the model training module 120 after step S103. Optionally, the model storage module 140 is also used to store the encoding model, the first task model, and the second task model. Optionally, the model selection module 150 is configured to select the above encoding model 321, first task model 331, and second task model 341 according to the user's requirements.
When the model training system 100 executes the above model training method, the first user can label not only the category information of the sample data but also the association information of that category information. Since the association information of the category information of a piece of sample data reflects that category information, a classification model trained with both the category information and its association information achieves higher accuracy than one trained with the category information alone. Moreover, the association information of the category information of a piece of sample data is confirmed and labeled by the first user in the course of generating the category information of that sample data, so the additional time the first user needs to label it is small. Therefore, compared with labeling only the category information of the sample data, the method provided by this application allows the user to train a classification model of the same accuracy with fewer labeled samples and in less time, thereby improving the training efficiency of the classification model and reducing its training cost.
The preceding content has described in detail the model training system 100 provided by this application and the process of model training using the system. The deployment modes and application scenarios of the model training system 100 are described below with reference to FIG. 5 and FIG. 6.
The above model training system 100 can be deployed flexibly; in particular, it can be deployed in a cloud environment. A cloud environment is an entity that uses basic resources to provide cloud services to users under the cloud computing model. The cloud environment includes a cloud data center and a cloud service platform. The cloud data center includes a large number of basic resources owned by the cloud service provider (including computing resources, storage resources, and network resources); the computing resources included in the cloud data center may be a large number of computing devices (for example, servers). The model training system may be a server in the cloud data center used for model training; it may also be a virtual machine created in the cloud data center for model training; it may also be a software apparatus deployed on servers or virtual machines in the cloud data center, the software apparatus implementing the training of the model, and this software apparatus may be deployed in a distributed manner on multiple servers, on multiple virtual machines, or on a mix of virtual machines and servers.
When the model training system 100 is deployed in a cloud environment, the model training system 100 can be abstracted by the cloud service provider on a cloud service platform (for example, an AI development platform) as a cloud service (hereinafter referred to as the cloud service for model training) provided to the above second user. The second user can then purchase this cloud service for model training on the cloud service platform, and thereby complete the training of the classification model on the model training system 100. The second user may purchase the cloud service for model training in various ways: for example, the second user may pay per period (for example, per hour or per month); or the second user may pay on demand, that is, the second user may recharge an account in advance, and after the cloud service has been used, the cloud service platform charges according to the final resource usage.
Taking FIG. 5 as an example, FIG. 5 shows an application scenario in which the model training system 100 is deployed in a cloud environment. As shown in FIG. 5, after the second user purchases the cloud service for model training on the cloud service platform, the second user uploads the original sample set to the model training system 100 through the application programming interface (API) provided by the cloud service platform. The first user then labels, on the model training system 100, the category information and the association information of the category information for each piece of sample data in the original sample set, obtaining the training sample set. Afterwards, the model training system 100 automatically trains the classification model according to the training sample set, obtaining a trained classification model. In addition, the model training system 100 can also return the trained classification model to the second user through the API provided by the cloud service platform, so that the second user can use the trained classification model to complete the corresponding classification tasks.
The model training system 100 can also be deployed in an edge environment. An edge environment is a collection of edge data centers or edge computing devices (for example, edge servers or edge stations with computing capabilities) that are close to terminal computing devices. Terminal computing devices include terminal servers, smartphones, laptop computers, tablet computers, personal desktop computers, smart cameras, and other devices. When the model training system 100 is deployed in an edge environment, it may be deployed alone on a single edge server or a single virtual machine in the edge environment, or deployed in a distributed manner across multiple edge servers, across multiple virtual machines, or partly on edge servers and partly on virtual machines.
The model training system 100 can also be deployed on one or more terminal computing devices.
Since the model training system 100 can be logically divided into multiple functional modules (as shown in FIG. 1), the model training system 100 can also be deployed in a distributed manner across different environments, which may include the above cloud environment, the above edge environment, and the above terminal computing devices. Taking FIG. 6 as an example, FIG. 6 shows an application scenario in which the model training system 100 is deployed in a distributed manner across different environments. As shown in FIG. 6, the training data labeling module 110 in the model training system 100 may be deployed on a terminal computing device, while the model training module 120 is deployed in the cloud environment. Optionally, when the model training system 100 includes the training data storage module 130, the model storage module 140, and the model selection module 150, these modules may also be deployed in the cloud environment.
When the model training system 100 is deployed alone on a single computing device in any environment (for example, alone on a terminal computing device), the computing device on which the model training system 100 is deployed may be the computing device shown in FIG. 7. FIG. 7 shows a schematic diagram of the hardware structure of a computing device 400 on which the model training system 100 is deployed. The computing device 400 includes a memory 410, a processor 420, a communication interface 430, and a bus 440, where the memory 410, the processor 420, and the communication interface 430 are communicatively connected to one another through the bus 440.
The memory 410 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 410 may store programs, for example, the program in the training data labeling module 110 and the program in the model training module 120. When the programs stored in the memory 410 are executed by the processor 420, the processor 420 and the communication interface 430 are used to execute some or all of the methods described in steps S101-S103 above. The memory 410 may also store data; for example, a part of the storage resources in the memory 410 may be used to store the original sample set and the training sample set held by the training data storage module 130, a part may be used to store the models held by the model storage module 140, and a part may be used to store intermediate data or result data generated by the processor 420 during execution, such as the parameters of the classification model.
The processor 420 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits.
The processor 420 may also be an integrated circuit chip with signal processing capability. During implementation, some or all of the functions of the above model training system 100 may be completed by integrated logic circuits of hardware in the processor 420 or by instructions in the form of software. The processor 420 may also be a general-purpose processor, a digital signal processor (DSP), a field-programmable gate array (FPGA), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in connection with the embodiments of this application may be directly embodied as being executed and completed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 410; the processor 420 reads the information in the memory 410 and, in combination with its hardware, completes some or all of the functions of the above model training system 100.
The communication interface 430 uses a transceiver module, such as but not limited to a transceiver, to implement communication between the computing device 400 and other devices or communication networks. For example, the original sample set uploaded by the user can be obtained through the communication interface 430, and the trained classification model can be sent to other devices through the communication interface 430.
The bus 440 may include a pathway for transferring information between the components of the computing device 400 (for example, the memory 410, the processor 420, and the communication interface 430).
When the modules of the above model training system 100 are deployed in a distributed manner on multiple computing devices in the same environment or in different environments, the multiple computing devices on which the model training system 100 is deployed may constitute the computing device system shown in FIG. 8. FIG. 8 shows a schematic diagram of the hardware structure of a computing device system 500 on which the model training system 100 is deployed. The computing device system 500 includes multiple computing devices 600, and the multiple computing devices 600 in the computing device system 500 can cooperatively implement the functions of the model training system 100 by executing computer instructions on their internal processors.
As shown in FIG. 8, each computing device 600 includes a memory 610, a processor 620, a communication interface 630, and a bus 640, where the memory 610, the processor 620, and the communication interface 630 are communicatively connected to one another through the bus 640.
The memory 610 may be a ROM, a RAM, a static storage device, or a dynamic storage device. The memory 610 may store computer instructions; when the computer instructions stored in the memory 610 are executed by the processor 620, the processor 620 and the communication interface 630 are used to execute some or all of the methods described in steps S101-S103 above. The memory 610 may also store data; for example, a part of the storage resources in the memory 610 may be used to store the original sample set and the training sample set held by the training data storage module 130, a part may be used to store the models held by the model storage module 140, and a part may be used to store intermediate data or result data generated by the processor 620 during execution, such as the parameters of the classification model.
The processor 620 may be a general-purpose CPU, a GPU, an ASIC, a microprocessor, or one or more integrated circuits. The processor 620 may also be an integrated circuit chip with signal processing capability. During implementation, some or all of the functions of the model training system of this application may be completed by integrated logic circuits of hardware in the processor 620 or by instructions in the form of software. The processor 620 may also be a DSP, an FPGA, another programmable logic device, a general-purpose processor, a discrete gate, a discrete hardware component, or a transistor logic device, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in connection with the embodiments of this application may be directly embodied as being executed and completed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 610; the processor 620 reads the information in the memory 610 and, in combination with its hardware, completes some of the functions of the above model training system 100.
The communication interface 630 uses a transceiver module, such as but not limited to a transceiver, to implement communication between the computing device 600 and other devices or communication networks. For example, the sample data set to be labeled that is uploaded by the user can be obtained through the communication interface 630.
The bus 640 may include a pathway for transferring information between the components of the computing device 600 (for example, the memory 610, the processor 620, and the communication interface 630).
A communication path is established between the above computing devices 600 through a communication network. Each computing device 600 runs a part of the model training system 100 (for example, one or more of the training data labeling module 110, the model training module 120, the training data storage module 130, the model storage module 140, and the model selection module 150). Any computing device 600 may be a server in a cloud data center, a computing device in an edge data center, or a terminal computing device.
The descriptions of the processes corresponding to the above drawings each have their own emphasis; for a part not described in detail in one process, reference may be made to the related descriptions of the other processes.
The above embodiments may be implemented in whole or in part by software, hardware, or a combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product that provides the model training system includes one or more computer instructions executed by the model training system; when these computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part.
The above computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The above computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or twisted pair) or a wireless manner (for example, infrared, radio, or microwave). The above computer-readable storage medium stores the computer program instructions that provide the model training system. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, an optical disc), or a semiconductor medium (for example, a solid-state disk (SSD)).

Claims (16)

  1. A model training method, characterized by comprising:
    obtaining an original sample set, the original sample set comprising a plurality of pieces of sample data;
    receiving, by means of a labeling interface, a labeling result of a user for each piece of sample data, so as to obtain a training sample set, wherein the training sample set comprises the plurality of pieces of sample data and the labeling result for each piece of sample data, and the labeling result for each piece of sample data comprises category information of each piece of sample data and association information of the category information of each piece of sample data; and
    training a classification model according to the training sample set.
  2. The method according to claim 1, characterized in that association information of category information of first sample data among the plurality of pieces of sample data is contained in the first sample data.
  3. The method according to claim 1, characterized in that the association information of the category information of the first sample data comprises category interpretation information and metadata of the category interpretation information, the category interpretation information being the part of the first sample data that embodies the category information of the first sample data.
  4. The method according to any one of claims 1 to 3, characterized in that association information of category information of second sample data among the plurality of pieces of sample data is not contained in the second sample data.
  5. The method according to any one of claims 1 to 4, characterized in that the association information of the category information of each piece of sample data is confirmed by the user in the process of generating the category information of that piece of sample data.
  6. The method according to any one of claims 1 to 5, characterized in that the training of a classification model according to the training sample set comprises:
    inputting the plurality of pieces of sample data into the classification model to obtain predicted category information of each piece of sample data and association information of the predicted category information; and
    adjusting parameters of the classification model according to a loss function until an output of the loss function satisfies a threshold;
    wherein the loss function comprises a first loss function and a second loss function, the first loss function being used to indicate a difference between the category information of each piece of sample data and the predicted category information of each piece of sample data, and the second loss function being used to indicate a difference between the association information of the category information of each piece of sample data and the association information of the predicted category information of each piece of sample data.
  7. The method according to any one of claims 1 to 6, characterized in that the classification model comprises an encoding model, a first task model, and a second task model, the encoding model being used to extract features of the plurality of pieces of sample data, the first task model being used to determine the category information of each piece of sample data according to the features of the plurality of pieces of sample data, and the second task model being used to determine the association information of the category information of each piece of sample data according to the features of the plurality of pieces of sample data.
  8. A model training system, characterized by comprising:
    a training data labeling module, configured to obtain an original sample set, the original sample set comprising a plurality of pieces of sample data;
    the training data labeling module being further configured to receive, by means of a labeling interface, a labeling result of a user for each piece of sample data, so as to obtain a training sample set, wherein the training sample set comprises the plurality of pieces of sample data and the labeling result for each piece of sample data, and the labeling result for each piece of sample data comprises category information of each piece of sample data and association information of the category information of each piece of sample data; and
    a model training module, configured to train a classification model according to the training sample set.
  9. The system according to claim 8, characterized in that association information of category information of first sample data among the plurality of pieces of sample data is contained in the first sample data.
  10. The system according to claim 8, characterized in that the association information of the category information of the first sample data comprises category interpretation information and metadata of the category interpretation information, the category interpretation information being the part of the first sample data that embodies the category information of the first sample data.
  11. The system according to any one of claims 8 to 10, characterized in that association information of category information of second sample data among the plurality of pieces of sample data is not contained in the second sample data.
  12. The system according to any one of claims 8 to 11, characterized in that the association information of the category information of each piece of sample data is confirmed by the user in the process of generating the category information of that piece of sample data.
  13. The system according to any one of claims 8 to 12, characterized in that the model training module is specifically configured to:
    input the plurality of pieces of sample data into the classification model to obtain predicted category information of each piece of sample data and association information of the predicted category information; and
    adjust parameters of the classification model according to a loss function until an output of the loss function satisfies a threshold;
    wherein the loss function comprises a first loss function and a second loss function, the first loss function being used to indicate a difference between the category information of each piece of sample data and the predicted category information of each piece of sample data, and the second loss function being used to indicate a difference between the association information of the category information of each piece of sample data and the association information of the predicted category information of each piece of sample data.
  14. The system according to any one of claims 8 to 12, characterized in that the classification model comprises an encoding model, a first task model, and a second task model, the encoding model being used to extract features of the plurality of pieces of sample data, the first task model being used to determine the category information of each piece of sample data according to the features of the plurality of pieces of sample data, and the second task model being used to determine the association information of the category information of each piece of sample data according to the features of the plurality of pieces of sample data.
  15. A computing device, characterized in that the computing device comprises a processor and a memory, wherein the memory stores computer instructions and the processor executes the computer instructions to cause the computing device to perform the method according to any one of claims 1 to 7.
  16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer program code which, when executed by a computing device, causes the computing device to perform the method according to any one of claims 1 to 7.
PCT/CN2022/071325 2021-05-11 2022-01-11 Model training method and system, and device and computer-readable storage medium WO2022237215A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110513038.9A CN115329825A (en) 2021-05-11 2021-05-11 Model training method, system, device and computer readable storage medium
CN202110513038.9 2021-05-11

Publications (1)

Publication Number Publication Date
WO2022237215A1 true WO2022237215A1 (en) 2022-11-17

Family

ID=83912904

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071325 WO2022237215A1 (en) 2021-05-11 2022-01-11 Model training method and system, and device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN115329825A (en)
WO (1) WO2022237215A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198052A (en) * 2013-04-09 2013-07-10 山东师范大学 Active learning method based on support vector machine
CN109492549A (en) * 2018-10-24 2019-03-19 杭州睿琪软件有限公司 A kind of processing of training sample set, model training method and system
US20200279136A1 (en) * 2019-03-01 2020-09-03 Royal Bank Of Canada System and method for multi-type mean field reinforcement machine learning
CN112328823A (en) * 2020-11-25 2021-02-05 Oppo广东移动通信有限公司 Training method and device for multi-label classification model, electronic equipment and storage medium
CN112434811A (en) * 2019-08-26 2021-03-02 华为技术有限公司 Knowledge graph construction method and device, computing equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI MINGLEI, WANG DA, LU QIN, LONG YUNFEI: "Event Based Emotion Classification for News Articles", PROCEEDINGS OF THE 30TH PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION, 30 October 2016 (2016-10-30), pages 153 - 162, XP093003892 *

Also Published As

Publication number Publication date
CN115329825A (en) 2022-11-11

Similar Documents

Publication Publication Date Title
WO2022142014A1 (en) Multi-modal information fusion-based text classification method, and related device thereof
WO2020125445A1 (en) Classification model training method, classification method, device and medium
WO2020114429A1 (en) Keyword extraction model training method, keyword extraction method, and computer device
US20180365258A1 (en) Artificial intelligence-based searching method and apparatus, device and computer-readable storage medium
WO2021190174A1 (en) Information determining method and apparatus, computer device, and storage medium
CN106919652B (en) Short-sighted frequency automatic marking method and system based on multi-source various visual angles transductive learning
JP7334395B2 (en) Video classification methods, devices, equipment and storage media
CN112084334B (en) Label classification method and device for corpus, computer equipment and storage medium
CN111680159A (en) Data processing method and device and electronic equipment
WO2020147409A1 (en) Text classification method and apparatus, computer device, and storage medium
CN107832338B (en) Method and system for recognizing core product words
CN112328761B (en) Method and device for setting intention label, computer equipment and storage medium
CN111950279B (en) Entity relationship processing method, device, equipment and computer readable storage medium
CN111144120A (en) Training sentence acquisition method and device, storage medium and electronic equipment
CN111191445B (en) Advertisement text classification method and device
CN115982376B (en) Method and device for training model based on text, multimode data and knowledge
CN111866610A (en) Method and apparatus for generating information
CN111177319A (en) Risk event determination method and device, electronic equipment and storage medium
WO2023045605A1 (en) Data processing method and apparatus, computer device, and storage medium
CN112231569A (en) News recommendation method and device, computer equipment and storage medium
CN113051914A (en) Enterprise hidden label extraction method and device based on multi-feature dynamic portrait
US20230214423A1 (en) Video generation
WO2021114634A1 (en) Text annotation method, device, and storage medium
WO2024021685A1 (en) Reply content processing method and media content interactive content interaction method
Andriyanov Combining Text and Image Analysis Methods for Solving Multimodal Classification Problems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22806193

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE