CN109543153B - Sequence labeling system and method - Google Patents


Info

Publication number
CN109543153B
Authority
CN
China
Prior art keywords
strategy
module
labeling
model
elements
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811344499.2A
Other languages
Chinese (zh)
Other versions
CN109543153A (en)
Inventor
纪大胜
崔诚煜
刘世林
丁国栋
曾途
吴桐
Current Assignee
Chengdu Business Big Data Technology Co Ltd
Original Assignee
Chengdu Business Big Data Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Business Big Data Technology Co Ltd filed Critical Chengdu Business Big Data Technology Co Ltd
Priority to CN201811344499.2A priority Critical patent/CN109543153B/en
Publication of CN109543153A publication Critical patent/CN109543153A/en
Application granted granted Critical
Publication of CN109543153B publication Critical patent/CN109543153B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/117Tagging; Marking up; Designating a block; Setting of attributes

Abstract

The application relates to a sequence labeling system comprising a model labeling module, an adjustment module and a strategy library, wherein the output end of the model labeling module is connected with the input end of the adjustment module. The model labeling module is used for performing sequence labeling on input text data. The strategy library stores one or more strategies, and the adjustment module is used for retrieving strategies from the strategy library and adjusting the labeling results output by the model labeling module according to the strategies and the input text data. The system or method of the application can adjust the labeling of a text sequence, thereby improving the accuracy and applicability of the original model's labels.

Description

Sequence labeling system and method
Technical Field
The application relates to the technical field of natural language processing, in particular to a sequence labeling system and a sequence labeling method.
Background
Knowledge and information of human society are mostly recorded in the form of text. They are described in human language, which machines cannot directly understand. Natural language processing comprises algorithmic techniques for processing human natural-language text, among which word segmentation (Word Segmentation), part-of-speech tagging (POS Tagging) and named entity recognition (Named Entity Recognition) are fundamental tasks. Word segmentation divides a sentence from a character sequence into a word sequence; part-of-speech tagging assigns a part of speech to each word, such as noun, verb or adjective; named entity recognition extracts nouns of particular types from the text, such as "小明" ("Xiaoming"; type: person name) or "今天早上" ("this morning"; type: time). Word segmentation, part-of-speech tagging and named entity recognition can all be cast as sequence labeling (Sequence Labeling) problems.
As shown in fig. 1, the sequence labeling problem is mostly handled with a "model + CRF" approach: a model performs the sequence labeling, and a CRF probability model then corrects it. For example, the Chinese patent application No. 201710828497.X, entitled "Text sequence labeling system and method based on Bi-LSTM and CRF", labels sequences with a Bi-LSTM model followed by a CRF model. This prior art is a supervised machine learning algorithm: the model is trained on a large labeled corpus, and the trained model can then perform the sequence labeling task on new (unlabeled) data. However, the new data may differ greatly from the training data: proper nouns absent from the training data may appear (e.g., an unusual person's name), or the training data may be insufficiently covering or unevenly distributed. The trained model may therefore fail to process some text correctly, and relabeling the data is time-consuming and laborious.
Disclosure of Invention
The application aims to overcome the defects in the prior art and provide a sequence labeling system and a sequence labeling method so as to improve the accuracy of sequence labeling.
In order to achieve the above object, the embodiment of the present application provides the following technical solutions:
the sequence labeling system comprises a model labeling module, an adjusting module and a strategy library, wherein the output end of the model labeling module is connected with the input end of the adjusting module;
the model labeling module is used for carrying out sequence labeling on the input text data;
the strategy library stores one or more strategies, and the adjustment module is used for retrieving strategies from the strategy library and adjusting the labeling results output by the model labeling module according to the strategies and the input text data.
According to the embodiment of the application, each strategy comprises three elements of words, boundaries and scores, and the adjustment module is specifically used for:
sequentially retrieving strategies from the strategy library, retrieving the next strategy after the current strategy has been executed, until all strategies have been traversed;
for the current strategy, matching its word element against the input text data; if the match fails, retrieving the next strategy; if the match succeeds, obtaining the sequence items and scores to be adjusted according to the boundary and score elements, and adjusting the scores of the corresponding sequence items in the labeling result output by the model labeling module.
On the other hand, the embodiment of the application also provides a sequence labeling method, which comprises the following steps:
step 1, performing preliminary sequence labeling on input text data;
step 2, retrieving strategies from the strategy library, and adjusting the preliminary labeling result according to the strategies and the input text data.
According to an embodiment of the present application, the step 2 specifically includes the following steps:
step 21, a strategy is called from a strategy library;
step 22, matching the word element in the current strategy against the input text data; if the match fails, returning to step 21; if it succeeds, proceeding to step 23;
step 23, obtaining sequence items and scores to be adjusted according to boundary elements and score elements in the current strategy, and adjusting scores of corresponding sequence items in the labeling results output by the model labeling module;
step 24, determining whether all the strategies in the strategy library have been executed; if not, returning to step 21; if so, ending.
In yet another aspect, embodiments of the present application also provide a computer-readable storage medium comprising computer-readable instructions that, when executed, cause a processor to perform operations in the methods described in embodiments of the present application.
In still another aspect, an embodiment of the present application also provides an electronic device, including: a memory storing program instructions; and the processor is connected with the memory and executes program instructions in the memory to realize the steps in the method in the embodiment of the application.
Compared with the prior art, the sequence labeling system with the added adjustment module successfully solves the problem that certain texts could not previously be recognized accurately, without damaging the original recognition capability. Even in fields where the text corpus differs considerably, adding adjustment strategies can solve most cases of unrecognized special entities, greatly improving the reuse rate of the original model and effectively improving production efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a prior art sequence labeling method.
FIG. 2 is a flow chart of one example of a process employing the method shown in FIG. 1.
Fig. 3 is a schematic block diagram of a sequence labeling system described in an embodiment.
FIG. 4 is a flowchart illustrating the operation of the adjustment module according to an embodiment.
FIG. 5 is a flow chart of one example of a process employing the system shown in FIG. 3 in an embodiment.
Fig. 6 is a block diagram of an electronic device according to an embodiment.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
In order to facilitate understanding of the sequence marking system of the present application, a brief description of a sequence marking method in the prior art will be given here.
As shown in fig. 1, sequence labeling achieves chunking and classification by assigning a label to each unit (character or word). Take the text "小明今天早上迟到了。" ("Xiaoming was late this morning."). For word segmentation it is labeled "小(B) 明(I) 今(B) 天(I) 早(I) 上(I) 迟(B) 到(I) 了(B) 。(B)", where B denotes the beginning (Begin) of a word and I a middle position (Inside). Every B marks a boundary, so the words can be extracted by identifying the B/I tags: "小明", "今天早上", "迟到", "了", "。". For part-of-speech tagging the labels become "小(B-NR) 明(I-NR) 今(B-T) 天(I-T) 早(I-T) 上(I-T) 迟(B-VI) 到(I-VI) 了(B-Y) 。(B-WJ)", where in B-NR the B denotes a boundary and NR denotes the type, here a person name. This scheme both distinguishes boundaries and identifies types, i.e. parts of speech. For entity recognition the labels are "小(B-Person) 明(I-Person) 今(B-Time) 天(I-Time) 早(I-Time) 上(I-Time) 迟(O) 到(O) 了(O) 。(O)", similar to the part-of-speech labeling but with one more label, O (Outside), for categories the task is not interested in. By processing the tags, the entities can be extracted: "小明" (type: Person), "今天早上" (type: Time).
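The tag-to-entity extraction described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the patent; the helper name is ours, and it assumes the type suffix is consistent within a span:

```python
def extract_entities(chars, tags):
    """Group characters into typed spans from B/I/O tags such as 'B-Person'.
    Illustrative helper: assumes the type is consistent within a span."""
    entities, current = [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B"):
            if current:
                entities.append(tuple(current))
            # a tag may carry a type after '-', e.g. 'B-Person'; a bare 'B' has none
            etype = tag.split("-", 1)[1] if "-" in tag else None
            current = [ch, etype]
        elif tag.startswith("I") and current:
            current[0] += ch
        else:  # 'O' (or a stray 'I') closes any open span
            if current:
                entities.append(tuple(current))
            current = None
    if current:
        entities.append(tuple(current))
    return entities

tags = ["B-Person", "I-Person", "B-Time", "I-Time", "I-Time", "I-Time",
        "O", "O", "O", "O"]
print(extract_entities(list("小明今天早上迟到了。"), tags))
# → [('小明', 'Person'), ('今天早上', 'Time')]
```

The same function handles the bare B/I word-segmentation tags, returning spans with no type.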
It should be noted that marking boundaries with B, I and O is not the only option; many other schemes exist. For example, the BIESO scheme uses B (Begin), I (Inside), E (End), S (Single) and O (Outside). The same sentence "小明今天早上迟到了。" can then be labeled "小(B-Person) 明(E-Person) 今(B-Time) 天(I-Time) 早(I-Time) 上(E-Time) 迟(O) 到(O) 了(O) 。(O)".
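Converting between the two schemes is a mechanical pass over the tag sequence. The sketch below is illustrative only (not from the patent) and assumes the '-Type' suffix does not change inside a span:

```python
def bio_to_bieso(tags):
    """Convert BIO tags to BIESO: the last tag of a span becomes E (End),
    and a one-character span becomes S (Single)."""
    out = list(tags)
    for i, t in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        if t.startswith("B") and not nxt.startswith("I"):
            out[i] = "S" + t[1:]          # lone B -> Single
        elif t.startswith("I") and not nxt.startswith("I"):
            out[i] = "E" + t[1:]          # last I of a run -> End
    return out

bio = ["B-Person", "I-Person", "B-Time", "I-Time", "I-Time", "I-Time",
       "O", "O", "O", "O"]
print(bio_to_bieso(bio))
# → ['B-Person', 'E-Person', 'B-Time', 'I-Time', 'I-Time', 'E-Time',
#    'O', 'O', 'O', 'O']
```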
The most popular algorithm for the sequence labeling task is "model+crf":
1) The model part. Bi-LSTM (bidirectional LSTM), as in the Bi-LSTM-CRF model, is a deep learning model whose task is to assign a score to each candidate label for each unit (here, each character), which amounts to performing a classification task for every character. As shown in fig. 2, the higher the score, the higher the probability that the character belongs to that category. Bi-LSTM may be replaced with other models, such as Bi-GRU, multilayer CNN, or multilayer Bi-LSTM.
2) The CRF part (optional), specifically a linear-chain CRF (Conditional Random Field). This is a probability model whose main purpose is to optimize the relations among labels and find the label sequence with the maximum probability (generally decoded with the Viterbi algorithm). For example, a B-Person tag cannot be followed by an I-Time tag, whereas the probability of a B-Person tag being followed by an I-Person tag is high. Optimization by the CRF layer improves the sequence labeling precision, as shown in fig. 2.
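The CRF decoding just described can be sketched with the Viterbi algorithm over per-token label scores plus pairwise transition scores. This is a minimal illustration under our own assumptions: the score values are made up, and an impossible transition is modeled as a large negative transition score:

```python
def viterbi(emissions, transitions, tags):
    """Decode the highest-scoring tag sequence from per-token emission
    scores and pairwise transition scores (linear-chain CRF decoding).
    Transitions not listed in `transitions` default to 0."""
    best = [dict(emissions[0])]   # best[i][t]: best path score ending in t
    back = [{}]
    for i in range(1, len(emissions)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + transitions.get((p, t), 0.0))
                 for p in tags),
                key=lambda pair: pair[1])
            best[i][t] = score + emissions[i][t]
            back[i][t] = prev
    # trace back from the best final tag
    tag = max(best[-1], key=best[-1].get)
    path = [tag]
    for i in range(len(emissions) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]

# Per-token greedy argmax would pick the invalid "O" then "I";
# forbidding the O->I transition makes the decoder choose "B", "I" instead.
emissions = [{"B": 1.5, "I": 0.0, "O": 2.0},
             {"B": 0.0, "I": 2.0, "O": 0.5}]
print(viterbi(emissions, {("O", "I"): -1e9}, ["B", "I", "O"]))  # → ['B', 'I']
```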
Referring to fig. 3, the sequence labeling method or system provided in this embodiment adds an adjustment module or step between the model and the CRF based on the existing model+crf method. Specifically, the sequence labeling system provided in this embodiment includes a model labeling module, an adjusting module, a policy repository, and a CRF module, where an output end of the model labeling module is connected to an input end of the adjusting module, and an output end of the adjusting module is connected to an input end of the CRF module.
The policy library stores a plurality of policies, each consisting of three or four elements. For example, in this embodiment each policy is composed of four elements:
1) regex, used to match the text. The simplest form is a word list; for example, the word "小明" will match every occurrence of "小明" in the text;
2) pattern, used to designate the label whose score is to be adjusted; for example, "Person" means adjusting the scores of labels associated with Person. If the task needs no category information (e.g., word segmentation), this element may be omitted;
3) bounds, which represents the two boundaries, left and right. As an example, the symbols "!", "+" and "?" are used: "!" denotes a determined boundary, "+" denotes a non-boundary, and "?" means it is undetermined whether the position is a boundary. For instance, "+!" means the left side is not a boundary and the right side is a boundary;
4) confidence, which specifies the size of the score adjustment.
For example: one strategy is: "Small Person! The following is carried out 5", representing the score of the Person type corresponding to the adjustment text" small ", corresponding score +5, the modification corresponding to the different bounds is as follows.
For flexibility, the above strategy can be extended in many ways, for example:
1) regex: regular expressions can be used, allowing more flexible text matching. For example, "张三丰?" will match "张三" or "张三丰", preferring the longer string;
2) pattern: labels may be combined, e.g. "Person|Company" denotes the Person or Company type; special symbols may also be used for special types, such as "+" for all entity types;
3) bounds: the left and right boundaries can be represented as two separate elements; for example, "BI" on the left means adjusting the scores of both the "B" and "I" labels;
4) confidence: negative numbers are supported, used to subtract from scores. The larger the value, the more likely the term is to be recognized; the smaller the value, the more likely it is not to be recognized.
The content of a policy depends on its purpose. For example, to raise the probability that a sequence ending in "有限公司" ("Co., Ltd.") is recognized as a Company, the policy may be written "有限公司 Company +! 5". To raise the probability that "张三" is recognized as a Person, the policy may be written "张三 Person !! 10". The policies in the policy library can be modified and dynamically adjusted for different application scenarios.
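For illustration, a policy of the four-element form above could be represented as a small record plus a regex match. The `Policy` class and its field names are our assumptions; the patent does not fix an encoding:

```python
import re
from dataclasses import dataclass

@dataclass
class Policy:
    """One adjustment policy with the four elements described above."""
    regex: str        # text or regular expression to match
    pattern: str      # entity type whose label scores are adjusted, e.g. "Person"
    bounds: str       # two marks: '!' boundary, '+' non-boundary, '?' undetermined
    confidence: float # amount added to (or, if negative, subtracted from) the scores

# "小明 Person !! 5": both sides are determined boundaries, Person scores +5
p = Policy(regex="小明", pattern="Person", bounds="!!", confidence=5.0)
matches = [m.span() for m in re.finditer(p.regex, "小明今天早上迟到了。")]
print(matches)  # → [(0, 2)]
```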
The adjustment module retrieves policies from the policy library. Referring to fig. 3, for each piece of text data input into the adjustment module, the adjustment module retrieves one policy from the policy library at a time, retrieving the next policy after the current one has been executed, until all policies in the library have been traversed. For the current policy, its regex (word element) is matched against the input text data. If the match fails, the next policy is retrieved; if it succeeds, the sequence items and scores that need adjusting are derived from the current policy, and the scores of the corresponding items in the result output by the model labeling module are adjusted. The adjusted scores are then output to the CRF module.
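The traversal just described can be sketched as follows. This is an illustrative simplification under our own assumptions: boundary handling (the bounds element) is omitted, and a policy's pattern is mapped to label names by a simple suffix match:

```python
import re

def apply_policies(text, scores, policies):
    """Traverse the policy library once, adjusting label scores in place.

    `scores` holds one {label: score} dict per character of `text`;
    each policy is a (regex, pattern, confidence) triple. Boundary
    handling is omitted here for brevity.
    """
    for regex, pattern, confidence in policies:
        for m in re.finditer(regex, text):       # no match -> loop body skipped
            for i in range(*m.span()):
                for label in scores[i]:
                    # adjust every label carrying the policy's type,
                    # e.g. "B-Person" and "I-Person" for pattern "Person"
                    if label.endswith(pattern):
                        scores[i][label] += confidence
    return scores

text = "小明迟到"
scores = [{"B-Person": 1.0, "I-Person": 0.0, "O": 2.0} for _ in text]
# the second policy matches nothing, exercising the unsuccessful-match path
apply_policies(text, scores, [("小明", "Person", 5.0), ("有限公司", "Company", 5.0)])
print(scores[0])  # → {'B-Person': 6.0, 'I-Person': 5.0, 'O': 2.0}
```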
For example, the input text data is "小明今天早上迟到了。", and the policy library contains two policies, "小明 Person !! 5" and "有限公司 Company +! 5". The adjustment module proceeds as follows:
(1) Retrieve the first policy, "小明 Person !! 5";
(2) Match its regex "小明" against the input text data; the match succeeds;
(3) Derive the items and scores to be adjusted from the definitions in the current policy, specifically:
a. the pattern in the policy is Person, indicating that the scores corresponding to Person are to be adjusted;
b. the bounds in the policy is "!!", indicating that both the left and right sides are boundaries; the left boundary corresponds to B (for "小") and the right boundary corresponds to E (for "明");
c. the confidence in the policy is "5", indicating that the scores are increased by 5;
combining a-c, the score adjustment items obtained from this policy are the Person labels of "小" (B side) and "明" (E side), each increased by 5;
(4) Adjust the output of the model labeling module according to the adjustment items and scores obtained in step (3), as shown in fig. 5;
(5) Retrieve the next policy, "有限公司 Company +! 5", and match its regex "有限公司" against the input text data; "有限公司" does not occur, so the match fails;
(6) All policies in the policy library have now been traversed, so the adjusted result is output to the CRF module, as shown in fig. 5.
The adjustment module traverses all policies in the policy repository, and when regex successfully matches the input text data, the policy is executed, and thus one or more policies may be executed for the same input text data.
The adjustment performed by the module is a mild, probability-based one. Its advantage is that it does not damage the sequence as a whole: for example, adjusting the scores for "张三" as "张(B-Person +2) 三(I-Person +2)" does not prevent "张三丰" from being recognized as a person name when "张三丰" is encountered. The scores adjusted by the module can be decoded by the CRF layer to find the most likely label sequence, or used directly as output without passing through the CRF module.
After training and testing on the labeled data, the F1 value on the test set reaches 95%. The sequence labeling system with the added adjustment module successfully solves the problem that certain texts could not previously be recognized accurately, without damaging the original recognition capability. Even in fields where the text corpus differs considerably, adding adjustment policies can solve most cases of unrecognized special entities, greatly improving the reuse rate of the model and effectively improving production efficiency. The model labeling module performs sequence labeling on the input text data; the model may be a Bi-LSTM, Bi-GRU, multilayer CNN, multilayer Bi-LSTM, etc. The policy library stores one or more policies, and the adjustment module retrieves policies from the policy library and adjusts the labeling results output by the model labeling module according to the policies and the input text data.
As shown in fig. 6, the present embodiment also provides an electronic device that may include a processor 51 and a memory 52, wherein the memory 52 is coupled to the processor 51. It is noted that the figure is exemplary and that other types of structures may be used in addition to or in place of the structure to implement data extraction, report generation, communication, or other functions.
As shown in fig. 6, the electronic device may further include: an input unit 53, a display unit 54, and a power supply 55. It is noted that the electronic device need not necessarily include all of the components shown in fig. 6. In addition, the electronic device may further comprise components not shown in fig. 6, to which reference is made to the prior art.
The processor 51, sometimes also referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, which processor 51 receives inputs and controls the operation of the various components of the electronic device.
The memory 52 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a nonvolatile memory, or other suitable devices, and may store information such as configuration information of the processor 51, instructions executed by the processor 51, and recorded table data. The processor 51 may execute programs stored in the memory 52 to realize information storage or processing, and the like. In one embodiment, a buffer memory, i.e., a buffer, is also included in memory 52 to store intermediate information.
The input unit 53 is for example used for providing the processor 51 with text data to be annotated. The display unit 54 is used for displaying various results in the processing, such as input text data, output results of the adjustment module, output results of the CRF module, etc., and may be, for example, an LCD display, but the present application is not limited thereto. The power supply 55 is used to provide power to the electronic device.
Embodiments of the present application also provide computer-readable instructions which, when executed in an electronic device, cause the electronic device to perform the operational steps comprised by the method of the present application.
Embodiments of the present application also provide a storage medium storing computer-readable instructions that cause an electronic device to perform the operational steps involved in the methods of the present application.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application is essentially or a part contributing to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Those of ordinary skill in the art will appreciate that the modules of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the constituent modules and steps of the examples have been described generally in terms of functionality in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed system may be implemented in other ways. For example, the system embodiments described above are merely illustrative, e.g., the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (6)

1. The sequence labeling system comprises a model labeling module and is characterized by further comprising an adjusting module and a strategy library, wherein the output end of the model labeling module is connected with the input end of the adjusting module;
the model labeling module is used for carrying out sequence labeling on the input text data;
one or more strategies are stored in the strategy library, each strategy comprising three elements: word, boundary and score; and the adjustment module is specifically used for: sequentially retrieving strategies from the strategy library, retrieving the next strategy after the current strategy is executed, until all strategies are traversed; for the current strategy, matching its word element against the input text data, and retrieving the next strategy if the matching is unsuccessful; if the matching is successful, obtaining the data items and scores that need adjusting according to the boundary and score elements, and adjusting the scores of the corresponding data items in the labeling result output by the model labeling module.
2. The system of claim 1, further comprising a CRF module, wherein an output of the adjustment module is coupled to an input of the CRF module for optimizing an output of the adjustment module.
3. A method for sequence annotation, comprising the steps of:
step 1, performing preliminary sequence labeling on input text data;
step 2, a strategy is called from a strategy library, and the preliminary labeling result is adjusted according to the strategy and the input text data;
one or more strategies are stored in the strategy library, each strategy comprises three elements of words, boundaries and scores, and the step 2 specifically comprises the following steps:
step 21, a strategy is called from a strategy library;
step 22, matching word elements in the current strategy with the input text data, and returning to the step 21 if the matching is unsuccessful; if the matching is successful, the step 23 is entered;
step 23, obtaining the data items and the scores to be adjusted according to the boundary elements and the score elements in the current strategy, and adjusting the scores of the corresponding data items in the labeling results output by the model labeling module;
step 24, judging whether all strategies in the strategy library are executed, if not, returning to the step 21, and if so, ending.
4. A method according to claim 3, further comprising:
and 3, optimizing the result output in the step 2 through a CRF model.
5. A computer readable storage medium comprising computer readable instructions which, when executed, cause a processor to perform the operations of the method of any of claims 3-4.
6. An electronic device, said device comprising:
a memory storing program instructions;
a processor, coupled to the memory, for executing program instructions in the memory, for implementing the steps of the method of any of claims 3-4.
CN201811344499.2A 2018-11-13 2018-11-13 Sequence labeling system and method Active CN109543153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811344499.2A CN109543153B (en) 2018-11-13 2018-11-13 Sequence labeling system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811344499.2A CN109543153B (en) 2018-11-13 2018-11-13 Sequence labeling system and method

Publications (2)

Publication Number Publication Date
CN109543153A CN109543153A (en) 2019-03-29
CN109543153B true CN109543153B (en) 2023-08-18

Family

ID=65846846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811344499.2A Active CN109543153B (en) 2018-11-13 2018-11-13 Sequence labeling system and method

Country Status (1)

Country Link
CN (1) CN109543153B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110489514B (en) * 2019-07-23 2023-05-23 成都数联铭品科技有限公司 System and method for improving event extraction labeling efficiency, event extraction method and system
CN112765967A (en) * 2019-11-05 2021-05-07 北京字节跳动网络技术有限公司 Text regularization processing method and device, electronic equipment and storage medium
CN111339250B (en) 2020-02-20 2023-08-18 北京百度网讯科技有限公司 Mining method for new category labels, electronic equipment and computer readable medium
CN111985583B (en) * 2020-09-27 2021-04-30 上海松鼠课堂人工智能科技有限公司 Deep learning sample labeling method based on learning data
CN113177124B (en) * 2021-05-11 2023-05-02 北京邮电大学 Method and system for constructing knowledge graph in vertical field
CN113761044A (en) * 2021-08-30 2021-12-07 上海快确信息科技有限公司 Labeling system and method for labeling text into a table

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593373A (en) * 2012-08-16 2014-02-19 北京百度网讯科技有限公司 Search result sorting method and search result sorting device
CN105678600A (en) * 2015-12-31 2016-06-15 百度在线网络技术(北京)有限公司 Credit data acquisition method and apparatus
CN106156286A (en) * 2016-06-24 2016-11-23 广东工业大学 Type extraction system and method for knowledge entities in technical literature
CN106372060A (en) * 2016-08-31 2017-02-01 北京百度网讯科技有限公司 Search text labeling method and device
CN106844530A (en) * 2016-12-29 2017-06-13 北京奇虎科技有限公司 Training method and device for a question-answer pair classification model
CN107330011A (en) * 2017-06-14 2017-11-07 北京神州泰岳软件股份有限公司 Multi-strategy fusion named entity recognition method and device
CN107622050A (en) * 2017-09-14 2018-01-23 武汉烽火普天信息技术有限公司 Text sequence labeling system and method based on Bi-LSTM and CRF
CN107622333A (en) * 2017-11-02 2018-01-23 北京百分点信息科技有限公司 Event prediction method, apparatus and system
CN108228557A (en) * 2016-12-14 2018-06-29 北京国双科技有限公司 Sequence labeling method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7346519B2 (en) * 2001-04-10 2008-03-18 Metropolitan Regional Information Systems, Inc Method and system for MRIS platinum database
US10402453B2 (en) * 2014-06-27 2019-09-03 Nuance Communications, Inc. Utilizing large-scale knowledge graphs to support inference at scale and explanation generation
US9824385B2 (en) * 2014-12-29 2017-11-21 Ebay Inc. Method for performing sequence labelling on queries

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Insurance Named Entity Recognition Based on CRF and Bi-LSTM; Chen Yanyu et al.; Intelligent Computer and Applications; 2018-06-26 (Issue 03); 111-114 *

Also Published As

Publication number Publication date
CN109543153A (en) 2019-03-29

Similar Documents

Publication Publication Date Title
CN109543153B (en) Sequence labeling system and method
CN109145153B (en) Intention category identification method and device
US10380236B1 (en) Machine learning system for annotating unstructured text
WO2021051560A1 (en) Text classification method and apparatus, electronic device, and computer non-volatile readable storage medium
CN107544726B (en) Speech recognition result error correction method and device based on artificial intelligence and storage medium
US11860684B2 (en) Few-shot named-entity recognition
US9645988B1 (en) System and method for identifying passages in electronic documents
US11657232B2 (en) Source code compiler using natural language input
CN110532563A Method and device for detecting key paragraphs in text
US20190317986A1 (en) Annotated text data expanding method, annotated text data expanding computer-readable storage medium, annotated text data expanding device, and text classification model training method
US11934781B2 (en) Systems and methods for controllable text summarization
CN112115721A (en) Named entity identification method and device
US11593557B2 (en) Domain-specific grammar correction system, server and method for academic text
US11657151B2 (en) System and method for detecting source code anomalies
CN109753976B (en) Corpus labeling device and method
US11210473B1 (en) Domain knowledge learning techniques for natural language generation
CN114970529A (en) Weakly supervised and interpretable training of machine learning based Named Entity Recognition (NER) mechanisms
US20220335335A1 (en) Method and system for identifying mislabeled data samples using adversarial attacks
JP7287699B2 (en) Information provision method and device using learning model through machine learning
CN107609006B (en) Search optimization method based on local log research
US11853696B2 (en) Automated text amendment based on additional domain text and control text
CN115618054A (en) Video recommendation method and device
CN110457683B (en) Model optimization method and device, computer equipment and storage medium
US20210350088A1 (en) Systems and methods for digital document generation using natural language interaction
CN113158678A Named entity identification method and device for electric power text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190329

Assignee: Shansikaiwu Technology (Chengdu) Co.,Ltd.

Assignor: CHENGDU BUSINESS BIG DATA TECHNOLOGY Co.,Ltd.

Contract record no.: X2023510000034

Denomination of invention: A sequence annotation system and method

Granted publication date: 20230818

License type: Common License

Record date: 20231219