CN116304023A - Method, system and storage medium for extracting bidding elements based on NLP technology - Google Patents
Method, system and storage medium for extracting bidding elements based on NLP technology
- Publication number
- CN116304023A (application number CN202310088650.5A)
- Authority
- CN
- China
- Prior art keywords
- model
- training
- acquiring
- extraction
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a bidding element extraction method based on NLP information extraction technology, comprising the following steps: S1, acquiring original bidding files; S2, obtaining a pre-trained model A; S3, acquiring labeled training samples; S4, applying data augmentation to the labeled samples; S5, training a sentence-level potential element type recognition model B; S6, training an element and relation extraction model C; S7, passing the data through a standardization module to output the result. The invention also provides a bidding element extraction system based on NLP information extraction technology, comprising a processor and a memory, and a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above NLP-based bidding element extraction method and system. The advantages of the invention are: development efficiency is greatly improved, development cost is reduced, the 512-token input length limit of conventional models is overcome, nested elements are handled efficiently, and the recall of element information extraction is high.
Description
Technical Field
The invention relates to the technical field of bidding, and in particular to a bidding element extraction method, system and storage medium based on NLP technology.
Background
Bidding documents are the information published by a tenderer for a given procurement requirement: the tender announcement itself, together with the information published during the subsequent bid evaluation, bid award and related stages. Their structure and writing format may differ slightly between regions and between tendering procedures. They typically include tender notices, bid evaluation notices, bid award notices, change and clarification notices, and the like (hereinafter collectively referred to as bidding documents). Because these documents record important information about the bidding process and its results, such as the names of the tendered goods (items), budget amounts, winning bid amounts, winning units and project locations, this information is valuable for analysis, for example for profiling winning bidders and for analysing the business credit of enterprises.
The current mainstream approach uses BERT+BiLSTM+CRF to identify elements and then uses a classification model to determine the relations between them. In practice, however, extracting key fields and relations from bidding announcements faces the following difficulties:
1) Training the model requires a large amount of high-quality manually labeled data, and obtaining this data requires substantial manpower, material and financial resources;
2) Current methods mainly extract the relation between two entities within a single sentence, a task known as sentence-level relation extraction. However, many entity relations in bidding documents are expressed jointly across multiple sentences;
3) The maximum input length accepted by BERT is 512 tokens, while the actual text of a bidding document is far longer than this limit. In addition, the publicly available pre-trained models are trained on general-purpose corpora rather than on bidding-domain text. It is therefore desirable to provide an improved bidding element extraction scheme and to obtain a pre-trained model for the bidding domain.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a method, system and storage medium for extracting bidding elements based on NLP technology, which greatly improve development efficiency, reduce development cost, overcome the 512-token input length limit of conventional models, handle nested elements efficiently, and achieve high recall in element information extraction.
To solve the above technical problems, the technical scheme provided by the invention is a bidding element extraction method based on NLP information extraction technology, comprising the following steps:
S1, acquiring original bidding files: bidding document information is acquired from the Internet;
S2, converting each original file into a plain text document, removing special character strings with regular expressions, segmenting the plain text into a set of sentences according to rules, and joining the sentences with the newline character \n into a new text document to obtain a pre-training corpus; after word segmentation of the corpus, BERT is trained on it to obtain a bidding-domain pre-trained model A based on the Transformer network structure;
S3, acquiring labeled training samples: part of the data is matched with regular expressions, further data is matched by a general-purpose language model using a large number of varied templates, and the final complete samples are obtained after manual checking;
S4, applying data augmentation to the labeled samples: word segmentation, clustering and screening are performed on a large corpus of bidding documents to obtain a corpus of element key fields, and more training samples are generated with data augmentation techniques;
S5, training a sentence-level potential element type recognition model B: the element types are summarized and classified, M elements being grouped into N classes, and an element dictionary table mapping element labels to element types is constructed; a sentence-level NER recognition model is trained to recognize the element types a sentence may contain; the type of the sentence containing each piece of labeled data is obtained from the element labels through the element dictionary table; the CLS-layer features of the pre-trained model are taken as the sentence representation, each sentence representation is treated as one token, a token-pair matrix is constructed following the multi-head idea, and the element types contained in each sentence are obtained with the GlobalPointer method; finally, sentences carrying the same element type mark are spliced together according to an element type combination rule strategy to obtain target paragraph texts of known element information type;
S6, training an element and relation extraction model C: from the paragraph sets of different types identified in S5, all element information of each type and the relations among the elements are acquired;
S7, passing the data through a standardization module to output the result: each element and each group of relation pairs has a standard model for its standardization. After a newly acquired original bidding file is cleaned, its element information is obtained with the models A, B and C from the preceding steps. Because the preliminary extraction results for elements such as dates, amounts, addresses, telephone numbers and e-mail addresses come in many formats, they are converted to a unified format by the format standardization module before the final output.
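As an illustration of the text preparation in step S2, the following Python sketch cleans a raw document, splits it into sentences and joins them with \n. The regular expressions, the punctuation set and the function names are assumptions made for this example; the source does not specify which special character strings are removed or which segmentation rules are used.

```python
import re

# Hypothetical cleanup patterns; the source only says that
# "special character strings" are removed with regular expressions.
TAG_RE = re.compile(r"<[^>]+>")                 # leftover HTML tags
ENTITY_RE = re.compile(r"&[a-zA-Z]+;|&#\d+;")   # HTML entities such as &nbsp;
SPACE_RE = re.compile(r"[ \t\u3000]+")          # spaces, tabs, full-width spaces

# Assumed segmentation rule: split after sentence-ending punctuation
# (Chinese or Western) and at existing line breaks.
SENT_RE = re.compile(r"(?<=[。！？；!?;])|\n+")


def clean_text(raw: str) -> str:
    """Strip markup residue and collapse horizontal whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = ENTITY_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def split_sentences(text: str) -> list[str]:
    """Split cleaned text into non-empty, trimmed sentences."""
    return [part.strip() for part in SENT_RE.split(text) if part and part.strip()]


def build_corpus_document(raw: str) -> str:
    """Return one sentence per line, joined with the newline character."""
    return "\n".join(split_sentences(clean_text(raw)))
```

The resulting one-sentence-per-line document is the kind of input that would then be tokenized and used for the domain pre-training of model A; the pre-training itself is outside the scope of this sketch.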
Further, step S6 specifically includes:
S61, acquiring a sentence set from which information is to be extracted, and constructing a schema according to the type of the sentence set: for example, for information such as address and contact person, the schema = {winning unit name: [address, contact, phone, winning amount]} is constructed;
S62, constructing the model input: a fixed prefix template is used, and the concatenation schema+text is taken as the input;
S63, passing the model input through the pre-trained model to obtain token-level representation vectors, mapping them through a fully connected layer to feature vectors whose dimension equals the number of output element types, and using a sigmoid to judge whether each token is the start or the end of an element; the complete element information is then obtained from the element starts and ends.
Further, step S61 specifically includes:
S611, first acquiring the element "winning unit name": the concatenation schema+text is "winning unit name" + X, where X is the text known to contain winning-bid information, and the start and end positions of the winning unit name are obtained through model B, giving the winning unit name Y;
S612, acquiring the address of Y: the concatenation schema+text is "address of Y" + X, and the start and end positions of the address are obtained through model B, giving the address;
S613, acquiring the contact person of Y, the telephone number of Y and the winning amount of Y in the same way.
Further, in S613, not only the numeric value of the winning amount is obtained, but also whether its unit is yuan or ten thousand yuan.
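To make the point about units concrete, the sketch below parses an extracted amount span into its numeric value, its unit, and the equivalent value in yuan. The surface forms (a number followed by 元 or 万元), the regular expression and the function name are assumptions for illustration; the source does not describe how the unit is recognized.

```python
import re
from decimal import Decimal

# Assumed surface forms: digits (optionally with thousands separators and a
# decimal part) followed by 万元 (ten thousand yuan) or 元 (yuan).
AMOUNT_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*(万元|元)")
UNIT_FACTOR = {"元": Decimal(1), "万元": Decimal(10000)}


def parse_amount(span: str):
    """Return (number, unit, value_in_yuan), or None if no amount is found."""
    match = AMOUNT_RE.search(span)
    if match is None:
        return None
    number = Decimal(match.group(1).replace(",", ""))
    unit = match.group(2)
    return number, unit, number * UNIT_FACTOR[unit]
```

Keeping the unit next to the number avoids a ten-thousand-fold error when amounts from different announcements are compared.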
A bidding element extraction system based on NLP information extraction technology comprises a processor and a memory. The memory stores the instruction files and algorithm models used in processing; the processor is configured with a data acquisition module and a data cleaning module for bidding documents, an extraction module implementing the NLP-based bidding element extraction method, and an output module for the extraction results.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the above NLP-based bidding element extraction method and system.
Compared with the prior art, the invention has the advantages that:
Starting from a small amount of labeled data, a large number of labeled samples are obtained by combining multiple template models with data augmentation, and a large amount of high-quality labeled data is finally obtained by manually checking these samples; the quantity and quality of the labeled data are key to the subsequent models. The current mainstream extraction algorithm is improved to one based on BERT+GlobalPointer, followed by a pre-trained prompt model based on Ernie, giving two-level cascaded information extraction. This achieves the following:
1. Development efficiency is greatly improved and development cost is reduced; model development can be completed with only a small number of samples;
2. The 512-token input length limit of conventional models is overcome, since model B is not affected by text length;
3. The multi-level joint information extraction models used by the method predict the start and end of each element, which solves the problem that conventional methods cannot handle nested elements efficiently;
4. The scheme involves multiple element categories. During element recognition, one category at a time is spliced as a template with the original text to form the model input, so a text is predicted multiple times, once per element category, which greatly improves the recall of element information extraction.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a model training flow chart of the present invention.
FIG. 3 is a schematic diagram of the structure of extraction model B of the present invention, which uses model A + GlobalPointer.
FIG. 4 is a schematic diagram of the structure of extraction model C of the present invention, which uses model A + prompt.
Detailed Description
The present invention will be described in further detail with reference to examples.
Examples
A bidding element extraction system based on NLP information extraction technology comprises a processor and a memory. The memory stores the instruction files and algorithm models used in processing; the processor is configured with a data acquisition module and a data cleaning module for bidding documents, an extraction module implementing the NLP-based bidding element extraction method, and an output module for the extraction results. The structure is shown in FIG. 1.
The original bidding file acquisition module acquires the text files to be extracted. The original file cleaning module cleans the various file formats of the text to be extracted and converts them uniformly into plain text strings. The bidding document element extraction module integrates the functional models of the NLP-based bidding element extraction method and extracts element information from the plain text strings produced by the cleaning module, yielding an element information set. The data standardization module determines the element extraction results of the text to be extracted from the element information set, based on element boundary features. The output module outputs the element extraction results in the form required by the business.
S1, acquiring original bidding files: bidding document information is acquired from the Internet; the documents may be in PDF, HTML, Word or other formats;
S2, converting each original file into a plain text document, removing special character strings with regular expressions, segmenting the plain text into a set of sentences according to rules, and joining the sentences with the newline character \n into a new text document to obtain a pre-training corpus; after word segmentation of the corpus, BERT is trained on it to obtain a bidding-domain pre-trained model A based on the Transformer network structure;
S3, acquiring labeled training samples: the samples come from several sources, such as the partially structured data contained in the original files, data matched with regular expressions, and data matched by a general-purpose language model using a large number of varied templates; the final complete samples are obtained after manual proofreading;
S4, applying data augmentation to the labeled samples: word segmentation, clustering and screening are performed on a large corpus of bidding documents to obtain a corpus of element key fields, and more training samples are generated with data augmentation techniques;
S5, training a sentence-level potential element type recognition model B, as shown in FIG. 3. Most of the text in a bidding document contains no element information. The element types are summarized and classified, M elements being grouped into N classes (M > N), and an element dictionary table mapping element labels to element types is constructed. The objective is to split the target text into several target paragraph texts. A sentence-level NER recognition model is trained to recognize the element types a sentence may contain. The type of the sentence containing each piece of labeled data can be obtained from the element labels through the element dictionary table, so the training samples can be produced without further manual labeling. The CLS-layer features of the pre-trained model are taken as the sentence representation, each sentence representation is treated as one token, a token-pair matrix is constructed following the multi-head idea, and the element categories contained in each sentence are obtained with the GlobalPointer method. Finally, sentences carrying the same element type mark are spliced together according to the element type combination rule strategy to obtain target paragraph texts of known element information type.
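The bookkeeping around model B in step S5 can be sketched in Python as follows: an element dictionary table maps element labels to element types, sentence-level labels are derived from existing element annotations, and sentences sharing a predicted type are joined into a target paragraph. The label names, the two example types and the simple "concatenate in document order" combination rule are assumptions for illustration; the GlobalPointer-based type prediction itself is not shown, and its output is represented here by a plain list of type sets.

```python
from collections import defaultdict

# Hypothetical element dictionary table: M = 6 element labels grouped
# into N = 2 element types (M > N).
ELEMENT_TO_TYPE = {
    "winning_unit_name": "winning_info",
    "winning_amount": "winning_info",
    "winner_address": "winning_info",
    "tenderer_name": "tender_info",
    "budget_amount": "tender_info",
    "tenderer_phone": "tender_info",
}


def sentence_types(element_labels):
    """Derive sentence-level type labels from the element labels already
    annotated in a sentence, so no second round of labeling is needed."""
    return {ELEMENT_TO_TYPE[label] for label in element_labels}


def assemble_paragraphs(sentences, predicted_types):
    """Join sentences that share a predicted element type, in document order.

    predicted_types[i] is the set of types predicted for sentences[i];
    a sentence with an empty set is dropped, and a sentence with several
    types appears in several paragraphs.
    """
    grouped = defaultdict(list)
    for sentence, types in zip(sentences, predicted_types):
        for element_type in sorted(types):
            grouped[element_type].append(sentence)
    return {t: "\n".join(group) for t, group in grouped.items()}
```

Because sentences without any predicted type are discarded here, each target paragraph is typically much shorter than the full document, which is how this stage helps with the input length limit of the downstream extraction model.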
S6, training an element and relation extraction model C, whose structure is shown in FIG. 4:
From the paragraph sets of different types identified in step S5, all element information of each type and the relations among the elements are acquired. The steps are as follows:
A sentence set from which information is to be extracted is taken, and a schema is constructed according to the type of the sentence set. For example, for a sentence set of the winning-bid information class, the winning unit and its winning amount, address, contact person and other information need to be extracted, so the schema = {winning unit name: [address, contact, phone, winning amount]} is constructed.
The model input is constructed: a fixed prefix template is used, and the concatenation schema+text is taken as the input.
The model input is passed through the pre-trained model to obtain token-level representation vectors, which are mapped through a fully connected layer to feature vectors whose dimension equals the number of output element types. A sigmoid is then used to judge whether each token is the start or the end of an element, and the complete element information is obtained from the element starts and ends.
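The start/end decoding described above can be sketched in plain Python. The sketch assumes that the fully connected layer has already produced, for each element type, one start logit and one end logit per token; the 0.5 threshold, the rule of pairing each start with the nearest end at or after it, and the character-level tokens are assumptions for illustration and are not stated in the source.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_spans(start_logits, end_logits, threshold=0.5):
    """Pair each predicted start with the nearest predicted end at or after it."""
    starts = [i for i, z in enumerate(start_logits) if sigmoid(z) > threshold]
    ends = [i for i, z in enumerate(end_logits) if sigmoid(z) > threshold]
    spans = []
    for s in starts:
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans


def extract_elements(tokens, logits_by_type, threshold=0.5):
    """Decode every element type independently.

    logits_by_type maps an element type to (start_logits, end_logits).
    Because each type has its own start and end scores, spans of different
    types may overlap or nest without conflicting.
    """
    result = {}
    for element_type, (start_logits, end_logits) in logits_by_type.items():
        spans = decode_spans(start_logits, end_logits, threshold)
        result[element_type] = ["".join(tokens[s:e + 1]) for s, e in spans]
    return result
```

Unlike a single tag sequence, where each token carries exactly one label, this per-type pointer scheme lets an inner element sit inside an outer one, which is the nesting case mentioned among the advantages.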
More specifically: let X be a text known to contain winning-bid information, and let schema = {winning unit name: [address, contact, phone, winning amount]}.
First, the start and end positions of the element "winning unit name" are acquired: the concatenation schema+text is "winning unit name" + X, and the winning unit name Y is obtained through model B.
Next, the address of Y is acquired: the concatenation schema+text is "address of Y" + X, and the start and end positions of the address are obtained through model B, giving the address.
The contact person of Y, the telephone number of Y and the winning amount of Y are acquired in the same way.
In particular, not only the numeric value of the winning amount is obtained, but also whether its unit is yuan or ten thousand yuan.
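The cascaded prompting just described (first the winning unit name Y, then "address of Y", "contact of Y" and so on) can be sketched as follows. The extraction model is represented by a caller-supplied function that takes the spliced prompt+text input and returns a span or None; that function, the [SEP] separator and the English prompt wording are assumptions for illustration. The toy extractor at the end merely stands in for a trained model.

```python
SEP = "[SEP]"  # assumed separator between the prompt and the text

SCHEMA = {"winning unit name": ["address", "contact", "phone", "winning amount"]}


def build_input(prompt: str, text: str) -> str:
    """Splice the schema prompt and the text into one model input."""
    return f"{prompt}{SEP}{text}"


def cascade_extract(text, schema, extract_span):
    """Two-stage extraction: subject first, then each attribute of the subject.

    extract_span(model_input) is a hypothetical callable wrapping the trained
    extraction model; it returns the extracted string or None.
    """
    records = []
    for subject_key, attributes in schema.items():
        subject = extract_span(build_input(subject_key, text))
        if subject is None:
            continue
        record = {subject_key: subject}
        for attribute in attributes:
            prompt = f"{attribute} of {subject}"
            record[attribute] = extract_span(build_input(prompt, text))
        records.append(record)
    return records


def toy_extractor(model_input: str):
    """Stand-in for a trained model: looks the prompt up in a fixed table."""
    prompt, _text = model_input.split(SEP, 1)
    answers = {
        "winning unit name": "Acme Ltd",
        "address of Acme Ltd": "1 Main Road",
        "winning amount of Acme Ltd": "120万元",
    }
    return answers.get(prompt)
```

Because the subject Y is written into each attribute prompt, the relation between Y and its address, contact, phone and amount is fixed by construction rather than predicted by a separate relation classifier.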
S7, passing the data through a standardization module to output the result: each element and each group of relation pairs has a standard model for its standardization. After a newly acquired original bidding file is cleaned, its element information is obtained with the models A, B and C from the preceding steps. Because the preliminary extraction results for elements such as dates, amounts, addresses, telephone numbers and e-mail addresses come in many formats, they are converted to a unified format by the format standardization module before the final output.
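A minimal sketch of such a format standardization step is given below for two field types, dates and telephone numbers. The target formats (ISO-style YYYY-MM-DD dates, digits-only phone numbers), the field names and the dispatch table are assumptions for illustration; the source states only that the output must be brought into a unified format. The date pattern does not validate the calendar (for example, month 13 would pass through).

```python
import re

# Accepts forms such as 2023年2月9日, 2023-02-09, 2023/2/9 and 2023.02.09.
DATE_RE = re.compile(r"(\d{4})\s*[年/.-]\s*(\d{1,2})\s*[月/.-]\s*(\d{1,2})\s*日?")


def normalize_date(raw: str):
    """Return the date as YYYY-MM-DD, or None if no date is recognized."""
    match = DATE_RE.search(raw)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    return f"{year:04d}-{month:02d}-{day:02d}"


def normalize_phone(raw: str):
    """Keep digits only; return None if the span contains no digits."""
    digits = re.sub(r"\D", "", raw)
    return digits or None


# Hypothetical dispatch table: one normalizer per element type.
NORMALIZERS = {"date": normalize_date, "phone": normalize_phone}


def standardize(record: dict) -> dict:
    """Apply the matching normalizer to each field; pass other fields through."""
    output = {}
    for key, value in record.items():
        normalizer = NORMALIZERS.get(key)
        if normalizer is not None and value is not None:
            output[key] = normalizer(value)
        else:
            output[key] = value
    return output
```

Further element types such as amounts or e-mail addresses would be handled by adding entries to the dispatch table.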
The invention and its embodiments have been described above in a non-limiting manner; what is shown in the drawings is only one embodiment, and the actual structure is not limited to it. In summary, structural forms and embodiments similar to this technical solution that a person of ordinary skill in the art, inspired by this disclosure, devises without creative effort and without departing from the gist of the invention shall fall within the protection scope of the invention.
Claims (6)
1. A bidding element extraction method based on NLP information extraction technology, characterized by comprising the following steps:
S1, acquiring original bidding files: bidding document information is acquired from the Internet;
S2, converting each original file into a plain text document, removing special character strings with regular expressions, segmenting the plain text into a set of sentences according to rules, and joining the sentences with the newline character \n into a new text document to obtain a pre-training corpus; after word segmentation of the corpus, BERT is trained on it to obtain a bidding-domain pre-trained model A based on the Transformer network structure;
S3, acquiring labeled training samples: part of the data is matched with regular expressions, further data is matched by a general-purpose language model using a large number of varied templates, and the final complete samples are obtained after manual checking;
S4, applying data augmentation to the labeled samples: word segmentation, clustering and screening are performed on a large corpus of bidding documents to obtain a corpus of element key fields, and more training samples are generated with data augmentation techniques;
S5, training a sentence-level potential element type recognition model B: the element types are summarized and classified, M elements being grouped into N classes, and an element dictionary table mapping element labels to element types is constructed; a sentence-level NER recognition model is trained to recognize the element types a sentence may contain; the type of the sentence containing each piece of labeled data is obtained from the element labels through the element dictionary table; the CLS-layer features of the pre-trained model are taken as the sentence representation, each sentence representation is treated as one token, a token-pair matrix is constructed following the multi-head idea, and the element types contained in each sentence are obtained with the GlobalPointer method; finally, sentences carrying the same element type mark are spliced together according to an element type combination rule strategy to obtain target paragraph texts of known element information type;
S6, training an element and relation extraction model C: from the paragraph sets of different types identified in S5, all element information of each type and the relations among the elements are acquired;
S7, passing the data through a standardization module to output the result: each element and each group of relation pairs has a standard model for its standardization; after a newly acquired original bidding file is cleaned, its element information is obtained with the models A, B and C from the preceding steps; because the preliminary extraction results for elements such as dates, amounts, addresses, telephone numbers and e-mail addresses come in many formats, they are converted to a unified format by the format standardization module before the final output.
2. The bidding element extraction method based on NLP information extraction technology according to claim 1, characterized in that step S6 specifically includes:
S61, acquiring a sentence set from which information is to be extracted, and constructing a schema according to the type of the sentence set: for example, for information such as address and contact person, the schema = {winning unit name: [address, contact, phone, winning amount]} is constructed;
S62, constructing the model input: a fixed prefix template is used, and the concatenation schema+text is taken as the input;
S63, passing the model input through the pre-trained model to obtain token-level representation vectors, mapping them through a fully connected layer to feature vectors whose dimension equals the number of output element types, and using a sigmoid to judge whether each token is the start or the end of an element; the complete element information is then obtained from the element starts and ends.
3. The bidding element extraction method based on NLP information extraction technology according to claim 2, characterized in that step S61 specifically includes:
S611, first acquiring the element "winning unit name": the concatenation schema+text is "winning unit name" + X, and the start and end positions of the winning unit name are obtained through model B, giving the winning unit name Y;
S612, acquiring the address of Y: the concatenation schema+text is "address of Y" + X, and the start and end positions of the address are obtained through model B, giving the address;
S613, acquiring the contact person of Y, the telephone number of Y and the winning amount of Y in the same way.
4. The bidding element extraction method based on NLP information extraction technology according to claim 3, characterized in that in S613 not only the numeric value of the winning amount is obtained, but also whether its unit is yuan or ten thousand yuan.
5. A bidding element extraction system based on NLP information extraction technology, characterized in that the system comprises a processor and a memory, wherein the memory stores the instruction files and algorithm models used in processing, and the processor is configured with a data acquisition module and a data cleaning module for bidding documents, an extraction module implementing the NLP-based bidding element extraction method, and an output module for the extraction results.
6. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, implements the above NLP-based bidding element extraction method and system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310088650.5A CN116304023A (en) | 2023-02-09 | 2023-02-09 | Method, system and storage medium for extracting bidding elements based on NLP technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310088650.5A CN116304023A (en) | 2023-02-09 | 2023-02-09 | Method, system and storage medium for extracting bidding elements based on NLP technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116304023A true CN116304023A (en) | 2023-06-23 |
Family
ID=86784143
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310088650.5A Withdrawn CN116304023A (en) | 2023-02-09 | 2023-02-09 | Method, system and storage medium for extracting bidding elements based on NLP technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116304023A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117391086A (en) * | 2023-12-11 | 2024-01-12 | 四川隧唐科技股份有限公司 | Bid participation information extraction method, device, equipment and medium |
CN118132683A (en) * | 2024-05-07 | 2024-06-04 | 杭州海康威视数字技术股份有限公司 | Training method of text extraction model, text extraction method and equipment |
CN118132683B (en) * | 2024-05-07 | 2024-08-20 | 杭州海康威视数字技术股份有限公司 | Training method of text extraction model, text extraction method and equipment |
CN118170891A (en) * | 2024-05-13 | 2024-06-11 | 浙江大学 | Text information extraction method, device, equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20230623 |