CN113204614B - Model training method, method for optimizing training data set and device thereof - Google Patents

Model training method, method for optimizing training data set and device thereof

Info

Publication number
CN113204614B
CN113204614B (application CN202110476915.XA)
Authority
CN
China
Prior art keywords
training data
data set
model
training
prediction result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110476915.XA
Other languages
Chinese (zh)
Other versions
CN113204614A (en)
Inventor
王述
冯知凡
柴春光
朱勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110476915.XA priority Critical patent/CN113204614B/en
Publication of CN113204614A publication Critical patent/CN113204614A/en
Application granted granted Critical
Publication of CN113204614B publication Critical patent/CN113204614B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a model training method, a method for optimizing a training data set, and devices thereof, relating to the field of artificial intelligence, and in particular to deep learning and knowledge graphs. A specific implementation scheme is as follows: training a model based on a first training data set comprising annotation information; determining a prediction result of training data in the first training data set using the trained model; determining the training data as at least a first portion of a second training data set if the prediction result is different from the corresponding annotation information of the training data, the second training data set being different from the first training data set; and training the model based on the second training data set. In this way, the disclosed technical scheme can optimize the sample data for the next round of model training according to problems revealed by model prediction, thereby improving model performance.

Description

Model training method, method for optimizing training data set and device thereof
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to machine learning, and more particularly to model training methods, methods of optimizing training data sets, and corresponding apparatuses, electronic devices, computer-readable storage media and computer program products.
Background
In the process of training a model, effective training data must be selected from a large amount of training data so as to avoid conditions such as imbalance of entity types, thereby improving the performance of the model. However, because the amount of training data in a training data set is very large and its quality is uneven, achieving an optimal selection of training data requires considerable labor cost, and requires workers with professional-level domain knowledge.
Disclosure of Invention
The present disclosure provides a model training method, a method of optimizing a training data set, and an apparatus, an electronic device, a computer readable storage medium, and a computer program product thereof.
According to a first aspect of the present disclosure, a model training method is provided. The method may include training the model based on a first training data set containing annotation information. Further, the trained model may be utilized to determine a prediction result of the training data in the first training data set. The method may further include determining the training data as at least a first portion of a second training data set if the prediction result is different from the corresponding annotation information of the training data, the second training data set being different from the first training data set. Additionally, the method may further include training the model based on the second training data set.
According to a second aspect of the present disclosure, a method of optimizing a training data set is provided. The method may include determining, using a trained model, a prediction result of training data in a first training data set used to train the model. In addition, the method may further include determining the training data as at least a first portion of a second training data set if the prediction result is different from the corresponding annotation information of the training data, the second training data set being different from the first training data set and being used for further training the model.
In a third aspect of the present disclosure, there is provided a model training apparatus comprising: a first model training module configured to train the model based on a first training data set containing annotation information; a prediction result determination module configured to determine a prediction result of training data in the first training data set using the trained model; a first training data set determination module configured to determine the training data as at least a first portion of a second training data set if the prediction result is different from corresponding annotation information of the training data, the second training data set being different from the first training data set; a second model training module configured to train the model based on the second training data set.
In a fourth aspect of the present disclosure, there is provided an apparatus for optimizing a training data set, comprising: a prediction result determination module configured to determine, using a trained model, a prediction result of training data in a first training data set used to train the model; and a first training data set determination module configured to determine the training data as at least a first portion of a second training data set if the prediction result is different from the corresponding annotation information of the training data, the second training data set being different from the first training data set and being used for further training the model.
In a fifth aspect of the present disclosure, an electronic device is provided that includes one or more processors; and storage means for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the methods according to the first and second aspects of the present disclosure.
In a sixth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the methods according to the first and second aspects of the present disclosure.
In a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the methods according to the first and second aspects of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 illustrates a schematic diagram of an example environment in which various embodiments of the present disclosure may be implemented;
FIG. 2 illustrates a flow chart of a process of model training according to an embodiment of the present disclosure;
FIG. 3 shows a flow chart of a detailed process of model training in accordance with an embodiment of the present disclosure;
FIG. 4 shows a schematic block diagram of a main architecture for training an entity recognition model according to an embodiment of the present disclosure;
FIG. 5 illustrates a flow chart of a process of optimizing a training data set according to an embodiment of the present disclosure;
FIG. 6 shows a block diagram of a model training apparatus according to an embodiment of the present disclosure;
FIG. 7 illustrates a block diagram of an apparatus for optimizing a training data set in accordance with an embodiment of the present disclosure; and
FIG. 8 illustrates a block diagram of a computing device capable of implementing various embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In describing embodiments of the present disclosure, the term "comprising" and its like should be taken to be open-ended, i.e., including, but not limited to. The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first," "second," and the like, may refer to different or the same object. Other explicit and implicit definitions are also possible below.
It will be appreciated that the performance of a trained model is often unsatisfactory due to poor quality (such as noise) of the training data set used during model training. To solve this problem, the conventional approach is to clean and re-label the training data set through purely manual labeling. For example, when training an entity recognition model, if the trained model is found to fall short of the performance requirement, workers must rely on expertise to clean, screen, and even re-label the training data set, thereby adjusting the entity class distribution. The optimized data set is then used to continue training the entity recognition model, and if the model still fails to meet the performance requirement, the sample data must be adjusted manually again; these operations are repeated until the model effect reaches the standard. Such a model training process therefore incurs considerable labor cost, and it cannot be migrated across domains, because each domain requires corresponding expertise.
To this end, the present disclosure provides a model training method that can optimize the training data set during the training process and use it for final model training, so that the model training effect can be improved without relying on manual labeling. In addition, the present disclosure also provides a method of optimizing a training data set.
According to an embodiment of the present disclosure, a model training scheme is presented. In this scheme, model training may be performed based on one labeled set of training data, and in the event that the performance of the trained model is determined to be substandard, the model is utilized to determine the prediction result of each item of training data in that set. If there is training data whose prediction result differs from the corresponding annotation information, that training data is collected into an enhanced training data set. The enhanced training data set may also include training data whose prediction results lie at threshold boundaries and a small amount of training data with excellent prediction results. Once the enhanced training data set is formed, the model may be further trained using it. In this way, efficient, accurate model training is achieved.
Corresponding to the model training method, the present disclosure also provides a method of optimizing a training data set. For example, a trained model may be utilized to determine a prediction result for training data in a training dataset used to train the model. If the predicted outcome is different from the corresponding labeling information of the training data, the training data may be collected into an enhanced training data set. The enhanced training data set may be used to further train the model. In this way, optimization of the training data set may be achieved in a manner that does not rely on manual labeling.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. FIG. 1 illustrates a schematic diagram of an example environment 100 in which various embodiments of the present disclosure may be implemented. As shown in fig. 1, the present disclosure illustrates the manner in which a model is trained and applied with an entity recognition model as an example. The example environment 100 includes input text 110 to be identified, a computing device 120, and recognition results 130 determined via the computing device 120.
In some embodiments, the text 110 to be identified entered by the user may be any text string. By identifying named entities in the text string, numerous natural language processing tasks such as information extraction, question-answering systems, syntactic analysis, machine translation, and the like can be further realized. Based on the processing of the computing device 120, the recognition result 130 of the text 110 to be recognized may be determined; for example, the recognition result of the text to be recognized "Zhang San sings AAA" is "Zhang San" (a person's name, here the name of a singer) and "AAA" (the title of a song).
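The span-and-type output described above can be illustrated with a minimal sketch. Note this is purely hypothetical: a toy dictionary lookup stands in for the trained recognition model 140, and all names are assumptions rather than the patent's implementation.

```python
def recognize(text, lexicon):
    """Toy dictionary-based recognizer: `lexicon` maps surface strings to
    entity types. A real entity recognition model would be a trained
    neural network, not a lookup table."""
    spans = []
    for surface, etype in lexicon.items():
        start = text.find(surface)
        if start != -1:
            spans.append((surface, etype, start))
    # Return entities in the order they appear in the text.
    return sorted(spans, key=lambda s: s[2])
```

For instance, `recognize("Zhang San sings AAA", {"Zhang San": "singer", "AAA": "song"})` yields the (entity, type) pairs discussed above.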
In some embodiments, computing device 120 may include, but is not limited to, personal computers, server computers, hand-held or laptop devices, mobile devices (such as mobile phones, personal digital assistants PDAs, media players, etc.), consumer electronics, minicomputers, mainframe computers, cloud computing resources, and the like.
The training and use of models in computing device 120 will be described below in terms of a machine learning model. As shown in FIG. 1, the example environment 100 may generally include a model training system 160 and a model application system 170. As an example, model training system 160 and/or model application system 170 may be implemented in computing device 120 as shown in fig. 1. It should be understood that the description of the structure and functionality of the example environment 100 is for illustrative purposes only and is not intended to limit the scope of the subject matter described herein. The subject matter described herein may be implemented in different structures and/or functions.
As previously described, the process of determining the recognition result 130 of the text 110 to be recognized can be divided into two phases: a model training phase and a model application phase. As an example, in the model training phase, model training system 160 may utilize training data set 150 to train the recognition model 140 for implementing named entity recognition. It should be appreciated that the training data set 150 may be a combination of a plurality of reference feature data items (as input to the model 140) and corresponding reference annotation information (as output of the model 140). In the model application phase, the model application system 170 may receive the trained recognition model 140 and use it to determine the recognition result 130 based on the text 110 to be recognized.
In other embodiments, the recognition model 140 may be constructed as a learning network. In some embodiments, the learning network may include a plurality of networks, wherein each network may be a multi-layer neural network, which may be composed of a large number of neurons. Through the training process, the corresponding parameters of the neurons in each network can be determined. The parameters of the neurons in these networks are collectively referred to as parameters of the recognition model 140.
The training process of the recognition model 140 may be performed in an iterative manner until at least some of the parameters of the recognition model 140 converge or until a predetermined number of iterations is reached, thereby obtaining final model parameters.
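The iterative stopping rule above (stop when parameters converge or a maximum iteration count is reached) can be sketched as follows. The scalar parameter, the tolerance, and all names are illustrative assumptions, not the patent's actual training procedure:

```python
def train_until_converged(step, params, tol=1e-6, max_iters=1000):
    """Repeatedly apply a parameter-update `step` until the parameter
    change falls below `tol` (convergence) or `max_iters` is reached.
    Returns the final parameter value and the number of iterations run."""
    for i in range(1, max_iters + 1):
        new_params = step(params)
        if abs(new_params - params) < tol:
            return new_params, i
        params = new_params
    return params, max_iters
```

For example, with `step = lambda p: 0.5 * p` the loop halts once updates become negligible, well before the iteration cap.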
The technical solutions described above are only examples and do not limit the invention. It should be understood that the individual networks may also be arranged and connected in other ways. In order to more clearly explain the principles of the disclosed solution, the process of model training will be described in more detail below with reference to FIG. 2.
FIG. 2 illustrates a flow chart of a process 200 of model training according to an embodiment of the present disclosure. In some embodiments, process 200 may be implemented in computing device 120 of fig. 1. A process 200 for training a model according to an embodiment of the present disclosure is now described with reference to fig. 2 in conjunction with fig. 1. For ease of understanding, the specific examples mentioned in the following description are illustrative and are not intended to limit the scope of the disclosure.
At 202, the computing device 120 may train a model based on a first training data set containing annotation information. As described above, the model may be a recognition model 140 for text entity recognition. In some embodiments, to train the recognition model 140, the computing device 120 may apply a first training data set to the recognition model 140 to be trained to determine parameters of convergence of the recognition model 140. In this way, training of the model can be achieved preliminarily. If the performance of the model meets the criteria, the model may be directly output for text entity recognition.
At 204, computing device 120 may determine a prediction result of the training data in the first training data set using trained recognition model 140. To describe embodiments of the present disclosure in more detail, the process is now described in conjunction with fig. 3. FIG. 3 shows a flowchart of a detailed process 300 of model training, according to an embodiment of the present disclosure. In some embodiments, process 300 may be implemented in computing device 120 of fig. 1. A detailed process 300 for training a model according to an embodiment of the present disclosure will now be described with reference to fig. 3 in conjunction with fig. 1. For ease of understanding, the specific examples mentioned in the following description are illustrative and are not intended to limit the scope of the disclosure.
At 302, the computing device 120 may determine an effect parameter of the trained recognition model 140, that is, evaluate the performance of the recognition model 140. In some embodiments, the effect may be evaluated in terms of the precision and recall of the recognition model 140. As an example, in the labeled training data "Zhang San sings AAA", the entity type of "Zhang San" is labeled as "singer" and the entity type of "AAA" is labeled as "song". During model training, after the training data "Zhang San sings AAA" is input into the recognition model 140, if the predicted entity type of "Zhang San" is "actor" and the predicted entity type of "AAA" is "song", then the entity type of "Zhang San" is predicted incorrectly: the entity type "singer" was not predicted, so the prediction for "Zhang San" is not recalled. Thus, computing device 120 may determine recall for each item of training data one by one and use the aggregate recall as the effect parameter.
At 304, the computing device 120 may compare the determined effect parameter with a predetermined effect. For example, the determined recall may be compared with a threshold recall. Upon determining that the effect meets the criteria, the computing device 120 may output the trained model at 306. Upon determining that the effect does not meet the criteria, the process proceeds to 308. At 308, the computing device 120 may apply the training data in the first training data set to the trained recognition model 140 to determine the prediction results. In this way, whether to automatically perform the training-data optimization work can be decided by evaluating the model effect.
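The effect gate at 302–308 can be sketched as computing recall over labeled entity types and comparing it with a threshold. The 0.9 threshold, the parallel-list data layout, and the function names are assumptions for illustration only:

```python
def recall(gold, predicted, entity_type):
    """Fraction of gold labels of `entity_type` that the model recovered.
    `gold` and `predicted` are parallel lists of entity-type labels."""
    relevant = [i for i, y in enumerate(gold) if y == entity_type]
    if not relevant:
        return 1.0
    hits = sum(1 for i in relevant if predicted[i] == entity_type)
    return hits / len(relevant)

def needs_data_optimization(gold, predicted, entity_type, threshold=0.9):
    """Trigger training-data optimization only when recall misses the
    threshold, mirroring the branch at 304/308 above."""
    return recall(gold, predicted, entity_type) < threshold
```

For the "Zhang San sings AAA" example, gold labels `["singer", "song"]` against predictions `["actor", "song"]` give a "singer" recall of 0.0, so optimization would be triggered.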
Returning to 206, the computing device 120 compares the determined prediction result with the corresponding annotation information of the training data. If the prediction result is different from the corresponding annotation information, the computing device 120 may determine the training data as at least a first portion of the second training data set. It will be appreciated that the second training data set is different from the first training data set described above. That is, computing device 120 may add training data that was predicted incorrectly to the second training data set.
In some embodiments, to enrich the samples of the second training data set, further training data may be added to the second training data set. As an example, the computing device 120 may determine a portion of the training data in the first training data set whose prediction results are the same as the corresponding annotation information as a second portion of the second training data set. It will be appreciated that the second portion is different from the first portion described above. That is, the computing device 120 may select a small amount of training data from the large amount of correctly predicted training data and add it to the second training data set.
As another example, where the corresponding annotation information of the training data indicates a range within which the prediction result should fall, the computing device 120 may determine whether the prediction result of the training data lies at a boundary of the range, or, equivalently, whether the prediction result is equal to a threshold value of the range. If the prediction result is equal to the threshold value of the range indicated by the corresponding annotation information of the training data, the computing device 120 may determine the training data as a third portion of the second training data set. It will be appreciated that this third portion is different from the first portion described above, and also from the second portion. That is, the computing device 120 may add training data with hard-to-classify predictions to the second training data set for targeted model training.
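The three portions discussed above (mispredicted samples, a small slice of correctly predicted samples, and boundary cases) can be sketched as a single selection pass. The 5% keep ratio, the use of a confidence score for the boundary test, and all names are illustrative assumptions rather than the patent's concrete implementation:

```python
import random

def build_second_training_set(samples, predict, confidence, threshold=0.5,
                              correct_keep_ratio=0.05, seed=0):
    """Partition (features, label) pairs into three portions and assemble
    the second training data set from them."""
    rng = random.Random(seed)
    first, correct_pool, third = [], [], []
    for x, y in samples:
        if predict(x) != y:
            first.append((x, y))            # portion 1: mispredicted
        elif confidence(x) == threshold:
            third.append((x, y))            # portion 3: at the boundary
        else:
            correct_pool.append((x, y))
    # portion 2: a small random slice of the correctly predicted data
    k = max(1, int(len(correct_pool) * correct_keep_ratio)) if correct_pool else 0
    second = rng.sample(correct_pool, k)
    return first + second + third
```

The design keeps every mispredicted and boundary sample while only sampling from the correct pool, so the new set stays small and focused on the model's weaknesses.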
Having constructed the second training data set as described above, at 208 the computing device 120 may train the recognition model 140 based on the second training data set.
The above embodiments provide a model training method in which, by improving the learning framework, the manual labeling step is removed from the model training process. In this way, the present disclosure can optimize the sample data for the next round of model training based on problems revealed by model prediction, thereby improving the model effect. In addition, eliminating the manual labeling step reduces labor cost more effectively and gives the method better expansion and migration capabilities.
In order to more clearly demonstrate the technical solution of the present disclosure, a model training architecture according to one of the specific embodiments of the present disclosure will be described below with reference to fig. 4. Fig. 4 shows a schematic block diagram of a primary architecture 400 for training an entity recognition model, according to an embodiment of the present disclosure. It should be understood that embodiments of the present disclosure are exemplary and that the entity recognition model may be replaced by any other learning model.
As shown in FIG. 4, training data 410 may be a training data set having a large number of training samples and corresponding annotation information. To complete model training, training data 410 is input into computing device 420. The computing device 420 contains a plurality of units for model training, for example, a model training unit 421, an effect evaluation unit 422, a training data screening unit 423, and optimized training data 424.
At the model training unit 421, a model training process may be performed based on the training data 410, so that a corresponding recognition model may be trained. Thereafter, the effect evaluation unit 422 may evaluate the effect of the trained model. When the evaluated effect does not reach the standard, the training data screening unit 423 re-inputs the training data 410 into the trained recognition model and determines the prediction result of each item of training data one by one. The training data screening unit 423 then selects, from all the results, the training data whose predictions are wrong, the training data whose predictions lie at a boundary, and a small amount of training data whose predictions are correct, and determines these as the optimized training data 424. Based on the optimized training data 424, the model training unit 421 may perform the model training process again, and the other units may likewise continue their processes until the model effect reaches the standard. When the model effect reaches the standard, the computing device 420 may output the entity recognition model 430 for text entity recognition.
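The cycle among units 421–424 can be sketched as a train–evaluate–reselect loop. Here `train`, `evaluate`, and `select` are placeholders standing in for the units above; the names, the 0.9 target, and the round cap are assumptions for illustration:

```python
def iterative_training(train, evaluate, select, data, target=0.9, max_rounds=5):
    """Train a model, evaluate its effect, and -- while the effect misses
    the target -- rebuild the training data from the model's own
    predictions and retrain, up to `max_rounds` additional rounds."""
    model = train(data)
    for _ in range(max_rounds):
        if evaluate(model) >= target:
            break
        data = select(model, data)   # optimized training data 424
        model = train(data)
    return model
```

The loop terminates either when the effect reaches the standard or when the round budget is exhausted, matching the flow in FIG. 4.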
In addition, for the model training mode proposed in the present disclosure, the optimization process of the training data set in the training mode will be described in detail below. Fig. 5 illustrates a flow chart of a process 500 of optimizing a training data set according to an embodiment of the present disclosure.
As shown in fig. 5, at 502, the computing device 120 may determine, using a trained model, a prediction result of training data in a first training data set used to train the model. For example, the computing device 120 may first evaluate the effect of the model. Thereafter, at 504, the computing device 120 may compare the determined prediction result with the corresponding annotation information of the training data. If the prediction result is different from the corresponding annotation information, the computing device 120 may determine the training data as at least a first portion of a second training data set. That is, computing device 120 may add training data that was predicted incorrectly to the second training data set. In this way, the training data can be screened and optimized without manual labeling, thereby improving model training efficiency.
In some embodiments, to enrich the samples of the second training data set, further training data may be added to the second training data set. As an example, the computing device 120 may determine a portion of the training data in the first training data set whose prediction results are the same as the corresponding annotation information as a second portion of the second training data set. That is, the computing device 120 may select a small amount of training data from the large amount of correctly predicted training data and add it to the second training data set.
As another example, where the corresponding annotation information of the training data indicates a range within which the prediction result should fall, the computing device 120 may determine whether the prediction result of the training data lies at a boundary of the range, or, equivalently, whether the prediction result is equal to a threshold value of the range. If the prediction result is equal to the threshold value of the range indicated by the corresponding annotation information of the training data, the computing device 120 may determine the training data as a third portion of the second training data set. That is, the computing device 120 may add training data with hard-to-classify predictions to the second training data set for targeted model training.
Fig. 6 shows a block diagram of an apparatus 600 for model training according to an embodiment of the disclosure. As shown in fig. 6, the apparatus 600 may include: a first model training module 602 configured to train the model based on a first training data set comprising annotation information; a prediction result determination module 604 configured to determine a prediction result of training data in the first training data set using the trained model; a first training data set determination module 606 configured to determine the training data as at least a first portion of a second training data set if the prediction result is different from the corresponding annotation information of the training data, the second training data set being different from the first training data set; and a second model training module 608 configured to train the model based on the second training data set.
In an embodiment of the present disclosure, the model is an entity recognition model.
In an embodiment of the present disclosure, the prediction result determination module 604 includes: an effect parameter determination module configured to determine an effect parameter of the trained model; and a decision module configured to apply training data in the first training data set to the trained model to determine the prediction result if the determined effect parameter does not meet a predetermined effect.
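A minimal sketch of this effect-parameter gate, assuming a callable model and an evaluation function; the name `maybe_predict` and the 0.9 target are illustrative assumptions, not part of the disclosure:

```python
def maybe_predict(model, eval_fn, first_dataset, target_effect=0.9):
    """Run prediction over the first training data set only when the trained
    model's effect parameter (here, an evaluation score such as F1 on a
    held-out set) falls short of the predetermined effect."""
    effect = eval_fn(model)
    if effect >= target_effect:
        return None                 # predetermined effect met: no mining needed
    # effect parameter does not meet the predetermined effect:
    # collect (sample, prediction) pairs for later selection
    return [(sample, model(sample["features"])) for sample in first_dataset]
```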
In an embodiment of the present disclosure, the apparatus 600 further comprises: a second training data set determination module configured to determine, as a second portion of the second training data set, a portion of the training data in the first training data set whose prediction result is the same as the corresponding annotation information, the second portion being different from the first portion.
In an embodiment of the present disclosure, the corresponding annotation information of the training data indicates a range within which the prediction result should fall, and the apparatus 600 further includes: a third training data set determination module configured to determine the training data as a third portion of the second training data set if the prediction result is equal to a threshold value of the range indicated by the corresponding annotation information of the training data, the third portion being different from the first portion.
In an embodiment of the present disclosure, the first model training module 602 is further configured to: apply the first training data set to the model to be trained to determine converged parameters of the model.
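Tying the modules above together, one hedged end-to-end sketch of the two-phase flow (train to convergence, evaluate the effect parameter, mine mispredicted data, retrain) might look like the following; all function names are assumptions, not the patented implementation:

```python
def two_phase_train(fit, predict, evaluate, first_dataset, target_effect=0.9):
    """Sketch of the disclosed flow: train on the annotated first data set,
    and if the effect parameter misses the predetermined target, retrain on
    a second data set built from mispredicted training data."""
    model = fit(first_dataset)                   # phase 1: train to convergence
    if evaluate(model) >= target_effect:
        return model                             # predetermined effect met
    # phase 2: first portion of the second set = mispredicted training data
    second_dataset = [s for s in first_dataset
                      if predict(model, s["features"]) != s["label"]]
    if second_dataset:
        model = fit(second_dataset)              # targeted retraining
    return model
```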
Fig. 7 shows a block diagram of an apparatus 700 for optimizing a training data set according to an embodiment of the disclosure. As shown in fig. 7, the apparatus 700 may include: a prediction result determination module 702 configured to determine, using a trained model, a prediction result of training data in a first training data set used to train the model; and a first training data set determination module 704 configured to determine the training data as at least a first portion of a second training data set if the prediction result is different from the corresponding annotation information of the training data, the second training data set being different from the first training data set and being used for further training the model.
In an embodiment of the present disclosure, the apparatus 700 further comprises: a second training data set determination module configured to determine, as a second portion of the second training data set, a portion of the training data in the first training data set whose prediction result is the same as the corresponding annotation information, the second portion being different from the first portion.
In an embodiment of the present disclosure, the corresponding annotation information of the training data indicates a range within which the prediction result should fall, and the apparatus 700 further includes: a third training data set determination module configured to determine the training data as a third portion of the second training data set if the prediction result is equal to a threshold value of the range indicated by the corresponding annotation information of the training data, the third portion being different from the first portion.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 8 illustrates a block diagram of a computing device 800 capable of implementing various embodiments of the disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the various methods and processes described above, such as processes 200, 300, 500. For example, in some embodiments, the processes 200, 300, 500 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When the computer program is loaded into RAM 803 and executed by computing unit 801, one or more steps of processes 200, 300, 500 described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the processes 200, 300, 500 by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that steps of the various flows shown above may be reordered, added, or deleted. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

1. A model training method, comprising:
training the model based on a first training dataset comprising annotation information;
determining a predicted outcome of training data in the first training data set using the trained model;
determining the training data as at least a first portion of a second training data set if the prediction result is different from corresponding annotation information of the training data, the second training data set being different from the first training data set;
training the model based on the second training data set,
wherein determining the prediction result comprises:
determining an effect parameter of the trained model; and
if the determined effect parameter does not meet a predetermined effect, applying training data in the first training data set to the trained model to determine the predicted outcome,
wherein the method further comprises:
determining, as a second part of the second training data set, a part of the training data in the first training data set whose prediction result is the same as the corresponding annotation information, the second part being different from the first part,
wherein the corresponding annotation information of the training data indicates a range within which the prediction result should fall, and the method further comprises:
determining the training data as a third part of the second training data set if the prediction result is equal to a threshold value of the range indicated by the corresponding annotation information of the training data, the third part being different from the first part.
2. The method of claim 1, wherein the model is an entity recognition model.
3. The method of claim 1, wherein training the model based on the first training data set comprises:
the first training data set is applied to the model to be trained to determine converged parameters of the model.
4. A method of optimizing a training dataset, comprising:
determining, using a trained model, a prediction result of training data in a first training data set used to train the model; and
determining the training data as at least a first part of a second training data set if the prediction result is different from the corresponding annotation information of the training data, the second training data set being different from the first training data set and being used for further training the model,
wherein determining the prediction result comprises:
determining an effect parameter of the trained model; and
if the determined effect parameter does not meet a predetermined effect, applying training data in the first training data set to the trained model to determine the predicted outcome,
wherein the method further comprises:
determining, as a second part of the second training data set, a part of the training data in the first training data set whose prediction result is the same as the corresponding annotation information, the second part being different from the first part,
wherein the corresponding annotation information of the training data indicates a range within which the prediction result should fall, and the method further comprises:
determining the training data as a third part of the second training data set if the prediction result is equal to a threshold value of the range indicated by the corresponding annotation information of the training data, the third part being different from the first part.
5. A model training apparatus comprising:
a first model training module configured to train the model based on a first training data set containing annotation information;
a prediction result determination module configured to determine a prediction result of training data in the first training data set using the trained model;
a first training data set determination module configured to determine the training data as at least a first portion of a second training data set if the prediction result is different from corresponding annotation information of the training data, the second training data set being different from the first training data set;
a second model training module configured to train the model based on the second training data set,
wherein the prediction result determining module includes:
an effect parameter determination module configured to determine an effect parameter of the trained model; and
a decision module configured to apply training data in the first training data set to the trained model to determine the prediction result if the determined effect parameter does not meet a predetermined effect,
wherein the apparatus further comprises:
a second training data set determination module configured to determine, as a second portion of the second training data set, a portion of the training data in the first training data set whose prediction result is the same as the corresponding annotation information, the second portion being different from the first portion,
wherein the respective annotation information of the training data is used to indicate the range within which the prediction result should fall, and the apparatus further comprises:
a third training data set determination module configured to determine the training data as a third portion of the second training data set if the prediction result is equal to a threshold value of the range indicated by the corresponding annotation information of the training data, the third portion being different from the first portion.
6. The apparatus of claim 5, wherein the model is an entity recognition model.
7. The apparatus of claim 5, wherein the first model training module is further configured to:
the first training data set is applied to the model to be trained to determine converged parameters of the model.
8. An apparatus for optimizing a training data set, comprising:
a prediction result determination module configured to determine, using a trained model, a prediction result of training data in a first training data set used to train the model; and
a first training data set determination module configured to determine the training data as at least a first portion of a second training data set if the prediction result is different from the corresponding annotation information of the training data, the second training data set being different from the first training data set and being used for further training the model,
wherein the prediction result determining module includes:
an effect parameter determination module configured to determine an effect parameter of the trained model; and
a decision module configured to apply training data in the first training data set to the trained model to determine the prediction result if the determined effect parameter does not meet a predetermined effect,
wherein the apparatus further comprises:
a second training data set determination module configured to determine, as a second portion of the second training data set, a portion of the training data in the first training data set whose prediction result is the same as the corresponding annotation information, the second portion being different from the first portion,
wherein the respective annotation information of the training data is used to indicate the range within which the prediction result should fall, and the apparatus further comprises:
a third training data set determination module configured to determine the training data as a third portion of the second training data set if the prediction result is equal to a threshold value of the range indicated by the corresponding annotation information of the training data, the third portion being different from the first portion.
9. An electronic device, the electronic device comprising:
one or more processors; and
storage means for storing one or more programs which when executed by the one or more processors cause the one or more processors to implement the method of any of claims 1-4.
10. A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method of any of claims 1-4.
CN202110476915.XA 2021-04-29 2021-04-29 Model training method, method for optimizing training data set and device thereof Active CN113204614B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110476915.XA CN113204614B (en) 2021-04-29 2021-04-29 Model training method, method for optimizing training data set and device thereof


Publications (2)

Publication Number Publication Date
CN113204614A CN113204614A (en) 2021-08-03
CN113204614B true CN113204614B (en) 2023-10-17

Family

ID=77027868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110476915.XA Active CN113204614B (en) 2021-04-29 2021-04-29 Model training method, method for optimizing training data set and device thereof

Country Status (1)

Country Link
CN (1) CN113204614B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113793604B (en) * 2021-09-14 2024-01-05 思必驰科技股份有限公司 Speech recognition system optimization method and device
CN114048759A (en) * 2021-11-16 2022-02-15 北京百度网讯科技有限公司 Model training method, data processing method, device, equipment and medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875821A (en) * 2018-06-08 2018-11-23 Oppo广东移动通信有限公司 The training method and device of disaggregated model, mobile terminal, readable storage medium storing program for executing
EP3662418A1 (en) * 2017-11-08 2020-06-10 Siemens Aktiengesellschaft Method and device for machine learning in a computing unit
CN111428008A (en) * 2020-06-11 2020-07-17 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for training a model
KR20200100388A (en) * 2019-02-18 2020-08-26 주식회사 아이도트 Deep learning system
CN111640425A (en) * 2020-05-22 2020-09-08 北京百度网讯科技有限公司 Model training and intention recognition method, device, equipment and storage medium
CN112270379A (en) * 2020-11-13 2021-01-26 北京百度网讯科技有限公司 Training method of classification model, sample classification method, device and equipment
CN112347769A (en) * 2020-10-30 2021-02-09 北京百度网讯科技有限公司 Entity recognition model generation method and device, electronic equipment and storage medium
CN112489637A (en) * 2020-11-03 2021-03-12 北京百度网讯科技有限公司 Speech recognition method and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Named entity recognition based on collaborative training of multiple neural networks; 王栋; 李业刚; 张晓; Intelligent Computer and Applications (02); full text *
Named entity recognition method based on reinforcement-learning collaborative training; 程钟慧; 陈珂; 陈刚; 徐世泽; 傅丁莉; Software Engineering (01); full text *

Also Published As

Publication number Publication date
CN113204614A (en) 2021-08-03

Similar Documents

Publication Publication Date Title
CN111667054B (en) Method, device, electronic equipment and storage medium for generating neural network model
US9684634B2 (en) Method and apparatus for evaluating predictive model
CN113204614B (en) Model training method, method for optimizing training data set and device thereof
CN114118287A (en) Sample generation method, sample generation device, electronic device and storage medium
CN113127365A (en) Method and device for determining webpage quality, electronic equipment and computer-readable storage medium
CN112784050A (en) Method, device, equipment and medium for generating theme classification data set
CN115186738B (en) Model training method, device and storage medium
CN113961765B (en) Searching method, searching device, searching equipment and searching medium based on neural network model
CN115454261A (en) Input method candidate word generation method and device, electronic equipment and readable storage medium
CN112579587B (en) Data cleaning method and device, equipment and storage medium
CN112905743B (en) Text object detection method, device, electronic equipment and storage medium
CN115393034A (en) Method for carrying out risk identification on enterprise account based on natural language processing technology
CN114866437A (en) Node detection method, device, equipment and medium
CN114896418A (en) Knowledge graph construction method and device, electronic equipment and storage medium
CN114037060A (en) Pre-training model generation method and device, electronic equipment and storage medium
CN116127948B (en) Recommendation method and device for text data to be annotated and electronic equipment
CN114037058B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN114428887B (en) Click data denoising method and device, electronic equipment and storage medium
CN114037057B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN113361712B (en) Training method of feature determination model, semantic analysis method, semantic analysis device and electronic equipment
CN113239296B (en) Method, device, equipment and medium for displaying small program
CN113344621B (en) Determination method and device for abnormal account and electronic equipment
CN114331379B (en) Method for outputting task to be handled, model training method and device
CN116992150A (en) Research and development component recommendation method, device, equipment and storage medium
CN114596637A (en) Image sample data enhancement training method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant