CN113221564B - Method, device, electronic equipment and storage medium for training entity recognition model

Method, device, electronic equipment and storage medium for training entity recognition model

Info

Publication number
CN113221564B
Authority
CN
China
Prior art keywords
model
training
entity
training data
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110475117.5A
Other languages
Chinese (zh)
Other versions
CN113221564A (en)
Inventor
王述
冯知凡
柴春光
朱勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110475117.5A
Publication of CN113221564A
Application granted
Publication of CN113221564B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The disclosure provides a method, a device, electronic equipment and a storage medium for training an entity recognition model, and relates to the field of artificial intelligence, in particular to deep learning and knowledge graphs. The specific implementation scheme is as follows: determining a problem category of a received training data set; selecting a target model framework corresponding to the determined problem category from a plurality of candidate model frameworks for training the entity recognition model; and training the entity recognition model by inputting the training data set into the target model framework. In this way, the present disclosure can automatically locate the data scene problem of the sample data, reducing labor costs significantly.

Description

Method, device, electronic equipment and storage medium for training entity recognition model
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to machine learning, and more particularly to a method, apparatus, electronic device, computer-readable storage medium and computer program product for training an entity recognition model.
Background
Named entity recognition (NER), an important component of natural language processing, is a basic tool for many natural language processing tasks such as information extraction, question-answering systems, syntactic analysis and machine translation. The accuracy of named entity recognition determines the effect of downstream tasks. However, when named entity recognition technology is deployed in real business scenarios, it faces problems such as expensive labeling costs, insufficient model generalization and migration capability, entity nesting, and discontinuous entities. These difficulties make it harder to solve the named entity recognition problem and obtain ideal results in real business scenarios.
Disclosure of Invention
The present disclosure provides a method, apparatus, electronic device, computer-readable storage medium, and computer program product for training an entity recognition model.
According to a first aspect of the present disclosure, a method for training an entity recognition model is provided. The method may include determining a problem category for the received training data set. The method may further include selecting a target model framework corresponding to the determined problem category from a plurality of candidate model frameworks for training the entity recognition model. Additionally, the method may further include training the entity recognition model by inputting the training data set into the target model framework.
In a second aspect of the present disclosure, there is provided an apparatus for training an entity recognition model, comprising: a problem category determination module configured to determine a problem category of the received training data set; a target model framework selection module configured to select a target model framework corresponding to the determined problem category from a plurality of candidate model frameworks for training the entity recognition model; and an entity recognition model training module configured to train the entity recognition model by inputting the training data set into the target model framework.
In a third aspect of the present disclosure, an electronic device is provided that includes one or more processors; and storage means for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement a method according to the first aspect of the present disclosure.
In a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method according to the first aspect of the present disclosure.
In a fifth aspect of the present disclosure, there is provided a computer program product, which when executed by a processor, implements a method according to the first aspect of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 illustrates a schematic diagram of an example environment in which various embodiments of the present disclosure may be implemented;
FIG. 2 illustrates a flow chart of a process for training an entity recognition model in accordance with an embodiment of the present disclosure;
FIG. 3 illustrates a flowchart of a detailed process for training an entity recognition model, according to an embodiment of the present disclosure;
FIG. 4 shows a schematic block diagram of a primary architecture for training an entity recognition model, according to an embodiment of the present disclosure;
FIG. 5 illustrates a block diagram of an apparatus for training an entity recognition model, according to an embodiment of the present disclosure; and
FIG. 6 illustrates a block diagram of a computing device capable of implementing various embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In describing embodiments of the present disclosure, the term "comprising" and its like should be taken to be open-ended, i.e., including, but not limited to. The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first," "second," and the like, may refer to different or the same object. Other explicit and implicit definitions are also possible below.
It should be appreciated that a difficulty of named entity recognition is that, when the entity recognition model recognizes text, it often faces different categories of scene problems. For example, when the text "Nanjing Yangtze bridge" is identified, an entity nesting problem is faced, i.e., the text contains two nested entities, "Nanjing" (city) and "Nanjing Yangtze bridge" (place). When the text "urethra, bladder, renal colic" is recognized, a discontinuous entity problem is faced, i.e., the text contains three discontinuous entities: "urethral pain", "bladder pain" and "renal colic". When the text "apple" is identified, a type confusion entity problem is faced, i.e., it cannot be determined whether the entity is a fruit, a company or a brand. In addition, recognizing the text "1,1-tris(p-hydroxyphenyl)ethane triglycidyl ether" faces the long-boundary (long-span) entity problem, i.e., the entity is excessively long. In the face of these situations, the recognition results are often unsatisfactory.
The traditional way to solve these problems is mainly rule templates, i.e., combining a rule template with a domain dictionary to remedy substandard named entity recognition performance in a business scenario. For example, multiple word segmentation tools and syntactic analysis tools can be fused to extract candidate entities, and NER can be performed in combination with a dictionary and a template; in addition, an entity dictionary combined with a rule template can quickly resolve some difficult cases. However, acquiring entity dictionary data is labor-intensive and expensive. Moreover, this scheme depends too heavily on the completeness of the entity dictionary and on the quality of the underlying word segmentation algorithm, and the generalization and migration capability of the model is insufficient.
To solve the above problems, the present disclosure improves the training scheme of the entity recognition model, so that a specific model framework for training the model can be selected according to the category of the scene problem, and an entity recognition model with excellent performance can be trained.
According to an embodiment of the present disclosure, a training scheme for an entity recognition model is provided. In this scheme, when training the model, the category of the major scene problem that the training data set faces may be determined first. Based on the determined problem category, a model framework that can purposefully solve problems of that category can be selected from a plurality of model frameworks. The model to be trained corresponds in effect to an instance of the selected model framework, so model training can be performed based on the selected model framework and the training data set. In this way, the model framework is selected for the scene problem at hand, so the performance of the model can be optimized. Meanwhile, the whole process of model training, including model framework selection, requires no manual participation, so labor costs can be saved.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings. FIG. 1 illustrates a schematic diagram of an example environment 100 in which various embodiments of the present disclosure may be implemented. As shown in fig. 1, the example environment 100 includes input text 110 to be recognized, a computing device 120, and recognition results 130 determined via the computing device 120.
In some embodiments, the text 110 to be identified, entered by the user, may be any text string. By identifying named entities in the text string, numerous natural language processing tasks such as information extraction, question-answering systems, syntactic analysis and machine translation can be further realized. Based on the processing of the computing device 120, the recognition result 130 of the text 110 to be recognized may be determined; for example, for the text to be recognized "Zhang San sings AAA", the recognition result is "Zhang San" (a person name, here a singer) and "AAA" (a song name).
In some embodiments, computing device 120 may include, but is not limited to, personal computers, server computers, hand-held or laptop devices, mobile devices (such as mobile phones, personal digital assistants PDAs, media players, etc.), consumer electronics, minicomputers, mainframe computers, cloud computing resources, and the like.
The training and use of models in computing device 120 will be described below in terms of a machine learning model. As shown in FIG. 1, the example environment 100 may generally include a model training system 160 and a model application system 170. As an example, model training system 160 and/or model application system 170 may be implemented in computing device 120 as shown in fig. 1. It should be understood that the description of the structure and functionality of the example environment 100 is for illustrative purposes only and is not intended to limit the scope of the subject matter described herein. The subject matter described herein may be implemented in different structures and/or functions.
As previously described, the process of determining the recognition result 130 of the text 110 to be recognized can be divided into two phases: a model training phase and a model application phase. As an example, in the model training phase, model training system 160 may utilize training data set 150 to train the recognition model 140 for implementing named entity recognition. It should be appreciated that the training data set 150 may be a combination of a plurality of reference feature data items (as input to the model 140) and corresponding reference annotation information (as the expected output of the model 140). In the model application phase, the model application system 170 may receive the trained recognition model 140 and determine the recognition result 130 by applying the recognition model 140 to the text 110 to be recognized.
In other embodiments, the recognition model 140 may be constructed as a learning network. In some embodiments, the learning network may include a plurality of networks, wherein each network may be a multi-layer neural network, which may be composed of a large number of neurons. Through the training process, the corresponding parameters of the neurons in each network can be determined. The parameters of the neurons in these networks are collectively referred to as parameters of the recognition model 140.
The training process of the recognition model 140 may be performed in an iterative manner until at least some of the parameters of the recognition model 140 converge or until a predetermined number of iterations is reached, thereby obtaining final model parameters.
The technical solutions described above are only for example and do not limit the invention. It should be understood that the individual networks may also be arranged in other ways and connections. In order to more clearly explain the principles of the disclosed solution, the process of model training will be described in more detail below with reference to fig. 2.
FIG. 2 illustrates a flow chart of a process 200 for training an entity recognition model in accordance with an embodiment of the present disclosure. In some embodiments, process 200 may be implemented in computing device 120 of fig. 1. A process 200 for training an entity recognition model according to an embodiment of the present disclosure is now described with reference to fig. 2 in conjunction with fig. 1. For ease of understanding, the specific examples mentioned in the following description are illustrative and are not intended to limit the scope of the disclosure.
At 202, computing device 120 may determine a problem category for the received training data set 150. It should be appreciated that the training data set may be a small set of clean samples, a sufficient set of clean samples, or a sufficient set of noisy samples. In some embodiments, the problem categories of training data sets in different scenarios may include, but are not limited to: entity nesting problems, discontinuous entity problems, type confusion entity problems, long-boundary entity problems, and the like. The entity nesting problem means that, when identifying a text such as "Nanjing Yangtze river bridge", two nested entities, "Nanjing" (city) and "Nanjing Yangtze river bridge" (place), are present. The discontinuous entity problem means that, when recognizing a text such as "urethra, bladder, renal colic", three discontinuous entities, "urethral pain", "bladder pain" and "renal colic", are present. The type confusion entity problem means that, when identifying a text such as "apple", it cannot be determined whether the entity is a fruit, a company or a brand. In addition, the long-boundary entity problem refers to the case where an entity is excessively long (i.e., the number of text characters exceeds a threshold number), as when recognizing a text such as "1,1-tris(p-hydroxyphenyl)ethane triglycidyl ether".
In some embodiments, to determine the problem category, the computing device 120 may identify a respective problem category for each training data item in the training data set and compute statistics over the problem categories of all training data in the training data set. If the ratio of the number of training data corresponding to one identified problem category to the total amount of training data in the training data set is greater than or equal to a threshold ratio, the computing device 120 may determine that category to be the problem category of the training data set. For example, if the training data identified as having an entity nesting problem accounts for more than 60% of the total data volume of a particular training data set, or carries more weight than the other problem categories, the entity nesting problem may be determined to be the problem category of that training data set. Determining the problem category provides a basis for subsequent model framework selection, so that targeted model training can be realized, as the sketch below illustrates.
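As an illustration of these statistics, the following Python sketch counts per-sample problem categories and applies the threshold ratio. The heuristic detector, the span-based data layout, and the category names are assumptions made for illustration; the disclosure does not fix a concrete per-sample detector.

```python
from collections import Counter

def identify_problem(sample):
    # Illustrative heuristic only; the disclosure leaves the per-sample detector open.
    # `sample` is assumed to be {"spans": [[(start, end), ...], ...]}, one inner list
    # of (start, end) character offsets per entity; several pairs per entity mean a
    # discontinuous entity, and end is exclusive.
    entities = sample["spans"]
    if any(len(parts) > 1 for parts in entities):
        return "discontinuous_entity"
    flat = [pair for parts in entities for pair in parts]
    if any(a[0] < b[1] and b[0] < a[1]
           for i, a in enumerate(flat) for b in flat[i + 1:]):
        return "entity_nesting"
    if any(end - start > 20 for start, end in flat):  # length threshold is arbitrary
        return "long_boundary_entity"
    return "other"

def dominant_problem_category(training_set, threshold_ratio=0.6):
    # The 60% default mirrors the example ratio above.
    counts = Counter(identify_problem(s) for s in training_set)
    category, count = counts.most_common(1)[0]
    return category if count / len(training_set) >= threshold_ratio else None
```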
At 204, the computing device 120 may select a target model framework corresponding to the determined problem category from a plurality of candidate model frameworks for training the recognition model 140. It should be appreciated that the plurality of candidate model frameworks may include a sequence labeling framework, which labels each sequence position with a tag (for example under the BIOES tagging scheme) and commonly uses a CRF as the decoder; this framework can generally address discontinuous entity problems. The plurality of candidate model frameworks may also include a pointer labeling framework, which identifies the start and end positions of each entity boundary and then classifies the entity into a category, and which mainly divides into a multi-pointer labeling framework and a machine reading comprehension framework; this framework can generally solve entity nesting problems, type confusion entity problems and long-boundary entity problems effectively. The plurality of candidate model frameworks may also include a multi-head selection framework, which marks each token pair and can effectively solve the entity nesting problem. The plurality of candidate model frameworks may also include a fragment arrangement framework, which enumerates all candidate entity boundaries and then classifies each boundary to determine the entity type, effectively solving the entity nesting and type confusion entity problems. A sketch of the pointer-labeling idea follows.
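To make the pointer-labeling idea concrete, the sketch below builds binary start/end target vectors over a character sequence; nested entities simply set additional positions, which is why this framework tolerates nesting and long boundaries. The offset-based layout is an assumption, and a full implementation would keep one vector pair per entity type.

```python
def pointer_targets(text_len, entities):
    # entities: list of (start, end) character offsets, end exclusive (assumed layout).
    starts = [0] * text_len
    ends = [0] * text_len
    for start, end in entities:
        starts[start] = 1
        ends[end - 1] = 1
    return starts, ends

# "Nanjing Yangtze bridge": the nested entities "Nanjing" (0-7) and the full
# span (0-22) coexist, since each only sets its own start/end positions.
starts, ends = pointer_targets(22, [(0, 7), (0, 22)])
assert starts[0] == 1 and ends[6] == 1 and ends[21] == 1
```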
In some embodiments, the target model framework may be selected as follows. As an example, if the determined problem category is the entity nesting problem, the computing device 120 may select at least one of the pointer labeling model framework, the multi-head selection model framework and the fragment arrangement model framework as the target model framework. If the determined problem category is the discontinuous entity problem, the computing device 120 may select the sequence labeling model framework as the target model framework. If the determined problem category is the type confusion entity problem, the computing device 120 may select the pointer labeling model framework or the fragment arrangement model framework as the target model framework. If the determined problem category is the long-boundary entity problem, the computing device 120 may select the pointer labeling model framework as the target model framework. In this way, as many scene problems as possible can be covered and model training performed in a targeted manner.
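The selection logic described in this paragraph amounts to a lookup from problem category to candidate frameworks. A minimal sketch follows, with the framework names as illustrative strings rather than a fixed API:

```python
FRAMEWORKS_BY_PROBLEM = {
    "entity_nesting": ["pointer_labeling", "multi_head_selection", "fragment_arrangement"],
    "discontinuous_entity": ["sequence_labeling"],
    "type_confusion_entity": ["pointer_labeling", "fragment_arrangement"],
    "long_boundary_entity": ["pointer_labeling"],
}

def select_target_framework(problem_category):
    # Any listed candidate is a valid target framework for the category.
    candidates = FRAMEWORKS_BY_PROBLEM.get(problem_category)
    if not candidates:
        raise ValueError(f"no candidate framework for {problem_category!r}")
    return candidates[0]
```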
It should be understood that, when determining the problem category of the training data set, if two problem categories occur in the training data set in equal proportion, two model frameworks respectively suited to the two problem categories may be selected, two models may be trained on the two selected model frameworks based on the training data set, and the prediction results of the two models may be combined to obtain the final prediction result.
At 206, the computing device 120 may train the recognition model 140 by inputting the training data set 150 into the target model framework. In some embodiments, computing device 120 may input training data set 150 into the target model framework to obtain a processed training data set. It should be appreciated that each training data item in the processed training data set contains format information corresponding to the target model framework. As an example, assume the input training data is "respiratory center affected" and the sequence labeling framework described above is selected; the training data processed by this framework may illustratively tag each character of the entity "respiratory center" with "B-site"/"I-site" and each character of "affected" with "O" (based on the BIO format). Thereafter, the computing device 120 may train the entity recognition model based at least on the processed training data set and the target model framework. In this way, targeted training can be performed, so the performance of the model can be improved. A minimal sketch of this format conversion follows.
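The sketch below performs the format conversion for the sequence labeling framework, tagging each character of an annotated span in BIO style; the character-level tagging and the (start, end, label) layout are assumptions for illustration, not the disclosure's prescribed format.

```python
def to_bio(text, entities):
    # entities: list of (start, end, label) with end exclusive (assumed layout).
    tags = ["O"] * len(text)
    for start, end, label in entities:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return list(zip(text, tags))

# "respiratory center affected", with "respiratory center" annotated as a body site:
pairs = to_bio("respiratory center affected", [(0, 18, "site")])
# -> [('r', 'B-site'), ('e', 'I-site'), ..., ('d', 'O')]
```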
In some embodiments, the recognition model 140 may be trained based on knowledge data in addition to the processed training data set and the target model framework. In some embodiments, the knowledge data consists essentially of at least one of a universal knowledge graph and domain entity dictionary data. In this way, more reliable knowledge can be added to the training process to improve the performance of the model.
It should also be appreciated that the present disclosure can further enhance model performance by optimizing the model training process. FIG. 3 illustrates a flowchart of a detailed process 300 for training an entity recognition model, according to an embodiment of the present disclosure. In some embodiments, process 300 may be implemented in computing device 120 of fig. 1. A process 300 according to an embodiment of the present disclosure is now described with reference to fig. 3 in conjunction with fig. 1. For ease of understanding, the specific examples mentioned in the following description are illustrative and are not intended to limit the scope of the disclosure.
At 302, the computing device 120 may determine an effect parameter of the trained recognition model 140, that is, evaluate the performance of the recognition model 140. In some embodiments, the evaluation may measure the accuracy and recall of the recognition model 140. As an example, in the labeled training data "Zhang San sings AAA", the entity type of "Zhang San" is labeled "singer" and the entity type of "AAA" is labeled "song". During model training, after the training data "Zhang San sings AAA" is input into the recognition model 140, if the predicted entity type of "Zhang San" is "actor" and the predicted entity type of "AAA" is "song", then the entity type of "Zhang San" is predicted incorrectly and the "singer" label is missed, so the prediction for "Zhang San" is not recalled. Thus, computing device 120 may determine the recall for each training data item one by one and compute the overall recall as the effect parameter, as sketched below.
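The recall statistic described here can be computed by comparing predicted (span, type) tuples with the gold annotations. A sketch follows, under the assumption that entities are represented as (start, end, type) tuples:

```python
def entity_recall(gold, predicted):
    # gold, predicted: collections of (start, end, type) tuples (assumed layout).
    gold, predicted = set(gold), set(predicted)
    if not gold:
        return 1.0
    return len(gold & predicted) / len(gold)

# "Zhang San sings AAA": the span of "Zhang San" is predicted but typed "actor"
# instead of "singer", so that gold entity is not recalled.
gold = {(0, 9, "singer"), (16, 19, "song")}
pred = {(0, 9, "actor"), (16, 19, "song")}
assert entity_recall(gold, pred) == 0.5
```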
At 304, the computing device 120 may compare the determined effect parameter to a predetermined effect. For example, the determined recall may be compared to a threshold recall. Upon determining that the effect meets the standard, the computing device 120 may output the trained model at 306. Upon determining that the effect does not meet the standard, the process proceeds to 308. At 308, computing device 120 may generate an enhanced training data set from the training data set, for example by means of text enhancement, vocabulary enhancement, active learning, semi-supervised learning, confidence learning, and the like.
Text enhancement improves the model effect by introducing a pre-trained language model and augmenting the labeled/unlabeled data. Vocabulary enhancement introduces entity vocabulary information to help the model judge entity boundaries. Active learning acquires sample data of hard categories through machine learning, confirms and audits them again manually or automatically, and then retrains a supervised or semi-supervised learning model on the labeled data, gradually improving the model's effect. Semi-supervised learning uses a small amount of labeled data and a large amount of unlabeled data to improve the effect of supervised learning. Confidence learning is mainly used to identify label errors in samples and characterize label noise, so as to clean the sample data.
At 310, the computing device 120 may train the entity recognition model based on the enhanced training data set and the target model framework. The training may be repeated until the effect parameter meets the standard, as the sketch below shows. Through such training data enhancement, the training process can be further optimized when the trained model does not meet the standard, so the performance of the model is ultimately improved.
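Steps 302 through 310 form a retrain-until-standard loop. A hedged sketch follows, in which the `train_fn`, `evaluate_fn`, and `enhance_fn` callables stand in for the framework-specific training, evaluation, and data-enhancement routines the disclosure leaves open:

```python
def train_until_standard(train_fn, evaluate_fn, enhance_fn,
                         dataset, threshold=0.9, max_rounds=5):
    # Repeat training with an enhanced data set until the effect parameter
    # (e.g., recall) reaches the predetermined threshold; max_rounds is a guard.
    model = train_fn(dataset)
    for _ in range(max_rounds):
        if evaluate_fn(model) >= threshold:
            break
        dataset = enhance_fn(dataset)  # text/vocabulary enhancement, active learning, ...
        model = train_fn(dataset)
    return model
```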
Through the above embodiments, a training approach for a scene-generic entity recognition model is provided. On one hand, the present disclosure can automatically locate the data scene problem of the sample data, reducing a large amount of labor cost. On the other hand, the present disclosure can automatically select model frameworks for model training according to different scenarios, making full use of the advantages of different model frameworks to output models that fit the business problem. In addition, different effect optimization schemes are formulated according to the model's effect and the sample data conditions, forming a complete and effective technical framework that addresses entity recognition problems in complex business scenarios across different dimensions and data conditions.
In order to more clearly demonstrate the technical solution of the present disclosure, a model training architecture according to one of the specific embodiments of the present disclosure will be described below with reference to fig. 4. Fig. 4 shows a schematic block diagram of a primary architecture 400 for training an entity recognition model, according to an embodiment of the present disclosure.
As shown in FIG. 4, training data 410 may be a training data set having a large number of training samples and corresponding annotation information. To complete model training, training data 410 is input into computing device 420. The computing device 420 contains a plurality of units for model training, for example, a problem category determination unit 421, a model framework selection unit 422, a model training unit 423, and a knowledge database 424.
In the problem category determination unit 421, the problem category of the training data 410 may be determined. For example, the statistical analysis described above may be performed on the sample data, and the problem category may be determined based on the problem distribution of the sample data. As an example, the problem category determination unit 421 may determine one problem category from among an entity nesting problem, a discontinuous entity problem, a type confusion entity problem, a long-boundary entity problem, and the like.
At model framework selection unit 422, a target model framework corresponding to the determined problem category may be selected from a plurality of candidate model frameworks for model training. As an example, the candidate model frameworks may include a sequence labeling framework, a pointer labeling framework, a multi-head selection framework, a fragment arrangement framework, and the like.
When an appropriate model framework is determined, model training unit 423 may perform the specific process of model training based on that framework and training data 410, so that entity recognition model 430 may be obtained. Of course, as shown in FIG. 4, model training may be further based on knowledge data in knowledge database 424 in addition to the processed training data 410 and the selected model framework. In some embodiments, the knowledge data may include a universal knowledge graph and/or domain entity dictionary data.
Fig. 5 illustrates a block diagram of an apparatus 500 for training an entity recognition model, according to an embodiment of the present disclosure. As shown in fig. 5, the apparatus 500 may include: a problem category determination module 502 configured to determine a problem category of the received training data set; a target model framework selection module 504 configured to select a target model framework corresponding to the determined problem category from a plurality of candidate model frameworks for training the entity recognition model; and an entity recognition model training module 506 configured to train the entity recognition model by inputting the training data set into the target model framework.
In an embodiment of the present disclosure, the problem category determination module 502 may include: a problem category identification module configured to identify a respective problem category for each training data item in the training data set; and a determination module configured to determine one identified problem category as the problem category if the ratio of the number of training data corresponding to that category to the total amount of training data in the training data set is greater than or equal to a threshold ratio.
In embodiments of the present disclosure, the problem category may include at least one of: an entity nesting problem; a discontinuous entity problem; a type confusion entity problem; and a long-boundary entity problem.
In an embodiment of the present disclosure, the target model framework selection module may be further configured to: select at least one of a pointer labeling model framework, a multi-head selection model framework and a fragment arrangement model framework as the target model framework if the problem category is the entity nesting problem; select a sequence labeling model framework as the target model framework if the problem category is the discontinuous entity problem; select a pointer labeling model framework or a fragment arrangement model framework as the target model framework if the problem category is the type confusion entity problem; or select a pointer labeling model framework as the target model framework if the problem category is the long-boundary entity problem.
In an embodiment of the present disclosure, the entity recognition model training module 506 may include: a data set processing module configured to input the training data set into the target model framework to obtain a processed training data set, each training data item in the processed training data set containing format information corresponding to the target model framework; and a training module configured to train the entity recognition model based at least on the processed training data set and the target model framework.
In an embodiment of the present disclosure, the training module may be further configured to train the entity recognition model based on the processed training data set, the target model framework and a domain lexicon.
In an embodiment of the present disclosure, the apparatus 500 may further include: an effect parameter determination module configured to determine an effect parameter of the trained entity recognition model; an enhanced data set generation module configured to generate an enhanced training data set based on the training data set in response to the determined effect parameter not conforming to a predetermined effect; and an enhanced training module configured to train the entity recognition model based on the enhanced training dataset and the target model framework.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 6 illustrates a block diagram of a computing device 600 capable of implementing various embodiments of the disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the various methods and processes described above, such as processes 200, 300. For example, in some embodiments, the processes 200, 300 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by the computing unit 601, one or more of the steps of the processes 200, 300 described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the processes 200, 300 by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (14)

1. A method of training an entity recognition model, comprising:
determining a problem category of the received training data set;
selecting a target model framework corresponding to the determined problem category from a plurality of candidate model frameworks for training the entity recognition model; and
training the entity recognition model by inputting the training data set into the target model framework,
wherein the problem category includes at least one of:
an entity nesting problem;
a discontinuous entity problem;
a type confusion entity problem; and
a long-boundary entity problem.
2. The method of claim 1, wherein determining the problem category comprises:
identifying a respective problem category for each training data item in the training data set; and
if the ratio of the number of training data corresponding to one identified problem category to the total amount of training data in the training data set is greater than or equal to a threshold ratio, determining the one problem category as the problem category of the training data set.
3. The method of claim 1, wherein selecting the target model framework comprises:
if the problem category is the entity nesting problem, selecting at least one of a pointer labeling model framework, a multi-head selection model framework and a fragment arrangement model framework as the target model framework;
if the problem category is the discontinuous entity problem, selecting a sequence labeling model framework as the target model framework;
if the problem category is the type confusion entity problem, selecting a pointer labeling model framework or a fragment arrangement model framework as the target model framework; or
if the problem category is the long-boundary entity problem, selecting a pointer labeling model framework as the target model framework.
4. The method of claim 1, wherein training the entity recognition model comprises:
inputting the training data set into the target model framework to obtain a processed training data set, each training data item in the processed training data set containing format information corresponding to the target model framework; and
training the entity recognition model based at least on the processed training data set and the target model framework.
5. The method of claim 4, wherein training the entity recognition model based at least on the processed training data set and the target model framework comprises:
training the entity recognition model based on the processed training data set, the target model framework, and a domain lexicon.
6. The method of claim 1, further comprising:
determining an effect parameter of the trained entity recognition model;
generating an enhanced training data set based on the training data set in response to the determined effect parameter not meeting a predetermined effect; and
training the entity recognition model based on the enhanced training data set and the target model framework.
7. An apparatus for training an entity recognition model, comprising:
a problem category determination module configured to determine a problem category of the received training data set;
a target model framework selection module configured to select a target model framework corresponding to the determined problem category from a plurality of candidate model frameworks for training the entity recognition model; and
an entity recognition model training module configured to train the entity recognition model by inputting the training data set into the target model framework,
wherein the problem category includes at least one of:
an entity nesting problem;
a discontinuous entity problem;
a type confusion entity problem; and
a long-boundary entity problem.
8. The apparatus of claim 7, wherein the problem category determination module comprises:
a problem category identification module configured to identify a respective problem category for each training data item in the training data set; and
a determination module configured to determine one identified problem category as the problem category if the ratio of the number of training data corresponding to that category to the total amount of training data in the training data set is greater than or equal to a threshold ratio.
9. The apparatus of claim 7, wherein the target model framework selection module is further configured to:
if the problem category is the entity nesting problem, select at least one of a pointer labeling model framework, a multi-head selection model framework and a fragment arrangement model framework as the target model framework;
if the problem category is the discontinuous entity problem, select a sequence labeling model framework as the target model framework;
if the problem category is the type confusion entity problem, select a pointer labeling model framework or a fragment arrangement model framework as the target model framework; or
if the problem category is the long-boundary entity problem, select a pointer labeling model framework as the target model framework.
10. The apparatus of claim 7, wherein the entity recognition model training module comprises:
a data set processing module configured to input the training data set into the target model framework to obtain a processed training data set, each training data item in the processed training data set containing format information corresponding to the target model framework; and
a training module configured to train the entity recognition model based at least on the processed training data set and the target model framework.
11. The apparatus of claim 10, wherein the training module is further configured to:
train the entity recognition model based on the processed training data set, the target model framework, and a domain lexicon.
12. The apparatus of claim 7, further comprising:
an effect parameter determination module configured to determine an effect parameter of the trained entity recognition model;
an enhanced data set generation module configured to generate an enhanced training data set based on the training data set in response to the determined effect parameter not conforming to a predetermined effect; and
an enhanced training module configured to train the entity recognition model based on the enhanced training data set and the target model framework.
13. An electronic device, the electronic device comprising:
one or more processors; and
storage means for storing one or more programs which when executed by the one or more processors cause the one or more processors to implement the method of any of claims 1-6.
14. A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method of any of claims 1-6.
CN202110475117.5A 2021-04-29 2021-04-29 Method, device, electronic equipment and storage medium for training entity recognition model Active CN113221564B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110475117.5A CN113221564B (en) 2021-04-29 2021-04-29 Method, device, electronic equipment and storage medium for training entity recognition model

Publications (2)

Publication Number Publication Date
CN113221564A CN113221564A (en) 2021-08-06
CN113221564B (en) 2024-03-01

Family

ID=77090330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110475117.5A Active CN113221564B (en) 2021-04-29 2021-04-29 Method, device, electronic equipment and storage medium for training entity recognition model

Country Status (1)

Country Link
CN (1) CN113221564B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10606952B2 (en) * 2016-06-24 2020-03-31 Elemental Cognition Llc Architecture and processes for computer learning and understanding
EP3557439A1 (en) * 2018-04-16 2019-10-23 Tata Consultancy Services Limited Deep learning techniques based multi-purpose conversational agents for processing natural language queries

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210624A (en) * 2018-07-05 2019-09-06 第四范式(北京)技术有限公司 Execute method, apparatus, equipment and the storage medium of machine-learning process
CN110689134A (en) * 2018-07-05 2020-01-14 第四范式(北京)技术有限公司 Method, apparatus, device and storage medium for performing machine learning process
CN109190120A (en) * 2018-08-31 2019-01-11 第四范式(北京)技术有限公司 Neural network training method and device and name entity recognition method and device
CN109492230A (en) * 2019-01-11 2019-03-19 浙江大学城市学院 A method of insurance contract key message is extracted based on textview field convolutional neural networks interested
CN111241837A (en) * 2020-01-04 2020-06-05 大连理工大学 Theft case legal document named entity identification method based on anti-migration learning
CN112328749A (en) * 2020-11-25 2021-02-05 北京百度网讯科技有限公司 Knowledge element extraction method, knowledge element extraction device, electronic apparatus, knowledge element extraction medium, and program product
CN112528009A (en) * 2020-12-07 2021-03-19 北京健康有益科技有限公司 Method, device and computer readable medium for generating user chronic disease conditioning scheme

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A two-stage deep learning approach for extracting entities and relationships from medical texts; Víctor Suárez-Paniagua et al.; Journal of Biomedical Informatics; full text *
Named entity recognition model based on hierarchical pattern matching; 王昊; 现代图书情报技术 (05); full text *

Also Published As

Publication number Publication date
CN113221564A (en) 2021-08-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant