CN116304023A - Method, system and storage medium for extracting bidding elements based on NLP technology - Google Patents

Method, system and storage medium for extracting bidding elements based on NLP technology

Info

Publication number
CN116304023A
CN116304023A CN202310088650.5A CN202310088650A CN116304023A CN 116304023 A CN116304023 A CN 116304023A CN 202310088650 A CN202310088650 A CN 202310088650A CN 116304023 A CN116304023 A CN 116304023A
Authority
CN
China
Prior art keywords
model
training
acquiring
extraction
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202310088650.5A
Other languages
Chinese (zh)
Inventor
李正
张晴晴
徐立群
郭海涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Zhiyuxin Information Technology Co ltd
Original Assignee
Anhui Zhiyuxin Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Zhiyuxin Information Technology Co ltd
Priority to CN202310088650.5A
Publication of CN116304023A
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a bidding element extraction method based on NLP information extraction technology, which comprises the following steps: S1, acquiring original bidding files; S2, obtaining a pre-trained model A; S3, acquiring labeled training samples; S4, performing data augmentation on the labeled samples; S5, training a sentence-level potential element type recognition model B; S6, training an element and relation extraction model C; S7, outputting the results through a standardization module. Also provided are a bidding element extraction system based on NLP information extraction technology, comprising a processor and a memory, and a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described method and electronic system for extracting bidding elements based on NLP technology. The advantages of the invention are that development efficiency is greatly improved, development cost is reduced, the 512-token text length limit of conventional models is overcome, nested elements can be handled efficiently, and the recall of element information extraction is high.

Description

Method, system and storage medium for extracting bidding elements based on NLP technology
Technical Field
The invention relates to the technical field of bidding, and in particular to a bidding element extraction method, system and storage medium based on NLP technology.
Background
Bidding documents comprise the application information and tender announcements published by a tendering party for a given procurement requirement, together with the information published during subsequent stages such as bid evaluation and contract award. Their structure and writing format may differ slightly between regions and between tendering procedures. They typically include tender notices, bid evaluation notices, award notices, and change or clarification notices (hereinafter collectively referred to as bidding documents). Because these documents record important information about the bidding process and its results, such as the name of the tendered goods or project, the budget amount, the winning bid amount, the winning bidder and the project location, the information is valuable for analyses such as profiling winning bidders and assessing enterprise operating credit.
The current mainstream approach uses BERT+BiLSTM+CRF to recognize elements and then a classification model to determine the relations between them. In practice, however, extracting key fields and relations from bidding announcements runs into the following difficulties:
1) Training the model requires a large amount of high-quality manually labeled data, and obtaining such data consumes considerable manpower, material and financial resources;
2) Current methods mainly extract the relation between two entities within a single sentence, a task known as sentence-level relation extraction. However, a large number of entity relations in bidding documents are expressed jointly across multiple sentences;
3) BERT accepts a maximum input length of 512 tokens, while the actual text of a bidding document is far longer than this limit; moreover, the publicly available pre-trained models are trained on general-purpose corpora rather than on bidding-domain text. It is therefore desirable to provide an improved bidding element extraction scheme and to obtain a pre-trained model for the bidding domain.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a method, system and storage medium for extracting bidding elements based on NLP technology, which greatly improve development efficiency, reduce development cost, overcome the 512-token text length limit of conventional models, handle nested elements efficiently, and achieve high recall in element information extraction.
In order to solve the above technical problems, the technical scheme provided by the invention is as follows: a bidding element extraction method based on NLP information extraction technology, comprising the following steps:
S1, acquiring original bidding files: bidding document information is acquired from the Internet;
S2, converting each original file into a plain text document, removing special character strings with regular expressions, segmenting the plain text into a set of sentences according to rules, and joining the sentences with the newline character \n into a new text document to obtain the pre-training corpus; the corpus is word-segmented and used to train BERT, yielding a bidding-domain pre-trained model A based on the Transformer network structure;
S3, acquiring labeled training samples: part of the data is matched with regular expressions, part is matched by a general-purpose language model using a large number of varied templates, and the final complete samples are obtained after manual checking;
S4, performing data augmentation on the labeled samples: a large corpus of bidding documents is word-segmented, clustered and screened to obtain a corpus of element key fields, and more training samples are generated using data augmentation techniques;
S5, training a sentence-level potential element type recognition model B: the element types are summarized and classified, with M elements grouped into N types, and an element dictionary table mapping element labels to element types is constructed; a sentence-level NER recognition model is trained to recognize the element types a sentence may contain; the type of the sentence containing each piece of annotated data is obtained from the element labels via the element dictionary table; the CLS-layer features from the pre-trained model are used as the sentence representation, each sentence representation is treated as one token, a token-pair matrix is constructed using the multi-head idea, and the element types contained in each sentence are obtained with the GlobalPointer method; finally, sentences carrying the same element type mark are spliced together according to the element type combination rules to obtain target paragraph texts of known element information type;
S6, training an element and relation extraction model C: from the paragraph sets of different types identified in S5, all element information of each type and the relations among the elements are acquired;
S7, outputting the results through a standardization module: each element and each group of relation pairs has a standard model for its normalization; after a newly acquired original bidding file is cleaned, its element information is obtained using models A, B and C from the preceding steps and is then standardized according to the standard output format of each element type, such as date, amount, address, telephone and mailbox; because the preliminary extraction results come in varied formats, they pass through the format standardization module so that the final output has a unified format.
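The corpus preparation in S2 can be illustrated with a minimal sketch. The cleanup patterns, the sentence-final punctuation set, and the function names below are assumptions made for illustration; the patent does not list its actual rules.

```python
import re
from typing import Iterable, List

# Hypothetical cleanup rules; the source does not specify the real ones.
TAG_RE = re.compile(r"<[^>]+>")               # leftover HTML tags
SPACE_RE = re.compile(r"[ \t\u3000\xa0]+")    # runs of ASCII, full-width and no-break spaces
SENT_END_RE = re.compile(r"(?<=[。！？；!?;])")  # zero-width split point after sentence-final punctuation


def clean_text(raw: str) -> str:
    """Strip tags and collapse whitespace runs into single spaces."""
    text = TAG_RE.sub("", raw)
    text = SPACE_RE.sub(" ", text)
    return text.strip()


def split_sentences(text: str) -> List[str]:
    """Split on line breaks first, then after sentence-final punctuation."""
    sentences = []
    for line in text.splitlines():
        for piece in SENT_END_RE.split(line):
            piece = piece.strip()
            if piece:
                sentences.append(piece)
    return sentences


def build_corpus(raw_documents: Iterable[str]) -> str:
    """Join all sentences with '\n' so that each corpus line holds one sentence."""
    lines: List[str] = []
    for raw in raw_documents:
        lines.extend(split_sentences(clean_text(raw)))
    return "\n".join(lines)
```

The resulting one-sentence-per-line file is the shape commonly fed to masked-language-model pre-training scripts; the pre-training itself is outside this sketch.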
Further, S6 specifically comprises:
S61, acquiring the sentence set from which information is to be extracted and constructing a schema according to the type of the sentence set: for information such as address and contact person, the schema = {winning unit name: [address, contact, phone, winning amount]} is constructed;
S62, constructing the model input: a fixed prefix template is used, and the concatenation schema+text is taken as the input;
S63, passing the model input through the pre-trained model to obtain token-level representation vectors, mapping them through a fully connected layer into feature vectors whose dimension equals the number of output element types, and using a sigmoid to judge whether each token is the beginning or the end of an element; the complete element information is then obtained from the beginning and end positions.
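The decoding described in S63 can be sketched as follows for a single element type. The logits stand in for the output of the fully connected layer, and the 0.5 threshold and the rule of pairing each start with the nearest following end are assumptions; the patent does not state either.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_spans(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    threshold: float = 0.5,  # assumed decision threshold
) -> List[Tuple[int, int]]:
    """Return (start, end) token index pairs, end inclusive, for one element type."""
    starts = [i for i, z in enumerate(start_logits) if sigmoid(z) > threshold]
    ends = [i for i, z in enumerate(end_logits) if sigmoid(z) > threshold]
    spans = []
    for s in starts:
        # Assumed pairing rule: the nearest predicted end at or after the start.
        e = next((e for e in ends if e >= s), None)
        if e is not None:
            spans.append((s, e))
    return spans


def spans_to_text(tokens: Sequence[str], spans: Sequence[Tuple[int, int]]) -> List[str]:
    """Recover the element strings from the decoded index pairs."""
    return ["".join(tokens[s : e + 1]) for s, e in spans]
```

Because each token is scored independently for "start" and "end", two elements may overlap or nest without conflicting labels, which a single BIO tag sequence cannot express.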
Further, step S61 specifically comprises:
S611, first acquiring the element "winning unit name": the concatenation schema+text takes the form winning unit name+X, and the start and end positions of the winning unit name are acquired through model B, giving the winning unit name, denoted Y;
S612, acquiring the address of Y: the concatenation schema+text takes the form address of Y+X, and the start and end positions of the address are acquired through model B, giving the address;
S613, acquiring the contact person of Y, the telephone of Y and the winning amount of Y in the same way.
Further, in S613, not only the numeric value of the winning amount is obtained, but also whether its unit is yuan or ten thousand yuan.
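The two-level cascade in S611 to S613 can be sketched as a loop over the schema. The `extract` callable is a hypothetical stand-in for the trained extraction model, and the English prompt wording "attribute of Y" is an assumption mirroring the translated text.

```python
from typing import Callable, Dict, List

Schema = Dict[str, List[str]]
# Hypothetical model interface: (prompt, text) -> list of extracted strings.
Extractor = Callable[[str, str], List[str]]


def cascade_extract(text: str, schema: Schema, extract: Extractor) -> Dict[str, Dict[str, List[str]]]:
    """Stage 1 finds each head element Y; stage 2 asks for every attribute of Y."""
    results: Dict[str, Dict[str, List[str]]] = {}
    for head, attributes in schema.items():
        for subject in extract(head, text):          # e.g. prompt "winning unit name"
            record: Dict[str, List[str]] = {}
            for attribute in attributes:
                prompt = f"{attribute} of {subject}"  # e.g. "address of Y"
                record[attribute] = extract(prompt, text)
            results[subject] = record
    return results
```

Querying one attribute per forward pass costs more inference calls than a single tagging pass, but ties every attribute to its subject Y, so the relation comes out together with the element.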
A bidding element extraction system based on NLP information extraction technology comprises a processor and a memory, wherein the memory stores the instruction files and algorithm models used in processing; the processor is configured with a data acquisition module, a data cleaning module, an extraction module implementing the NLP-based bidding element extraction method, and an output module for the extraction results, wherein the data acquisition module comprises a bidding document.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method and electronic system for extracting bid elements based on NLP technology.
Compared with the prior art, the invention has the advantages that:
Based on a small amount of annotated data, a large number of annotated samples are obtained by combining multiple template models with data augmentation, and a large amount of high-quality annotated data is finally obtained by manual checking; the quantity and quality of the annotated data are key to the subsequent models. The current mainstream extraction algorithm is improved to one based on BERT+GlobalPointer, and a pre-trained prompt model based on Ernie performs two-level cascaded information extraction, achieving the following:
1. development efficiency is greatly improved and development cost is reduced, since model development can be completed with a small number of samples;
2. the 512-token text length limit of conventional models is overcome, and model B is not affected by text length;
3. the multi-level joint information extraction models used here predict the beginning and end of each element, which solves the problem that conventional methods cannot handle nested elements efficiently;
4. the scheme has multiple element categories; during element recognition, one category at a time is spliced as a template with the original text to form the model input, so that the text is predicted once per category, which greatly improves the recall of element information extraction.
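To see why scoring (start, end) pairs accommodates nesting, consider a minimal token-pair sketch in the style of GlobalPointer. In the published GlobalPointer method the query and key vectors come from learned projections of the encoder output with rotary position encoding; here they are hand-set toy values, and the zero threshold and function names are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def decode_pairs(queries: Sequence[Vector], keys: Sequence[Vector], threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Score every pair (i, j) with i <= j as q_i . k_j and keep those above the threshold."""
    spans = []
    n = len(queries)
    for i in range(n):
        for j in range(i, n):  # only the upper triangle: a span cannot end before it starts
            if dot(queries[i], keys[j]) > threshold:
                spans.append((i, j))
    return spans


# Toy vectors for four tokens and one element type (illustrative values only).
queries = [[1, 0], [0, 1], [-1, -1], [-1, -1]]
keys = [[-1, -1], [-1, -1], [-1, 1], [1, -1]]
nested = decode_pairs(queries, keys)  # span (1, 2) lies inside span (0, 3)
```

Each cell of the pair matrix is an independent decision, so an outer span and an inner span can both be positive at once.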
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a model training flow chart of the present invention.
FIG. 3 is a schematic diagram of the structure of the extraction model B of the present invention using model A+GlobalPointer.
FIG. 4 is a schematic diagram of the structure of the extraction model C of the present invention using model A+prompt.
Detailed Description
The present invention will be described in further detail with reference to examples.
Examples
A bidding element extraction system based on NLP information extraction technology comprises a processor and a memory, wherein the memory stores the instruction files and algorithm models used in processing; the processor is configured with a data acquisition module, a data cleaning module, an extraction module implementing the NLP-based bidding element extraction method, and an output module for the extraction results, wherein the data acquisition module and the data cleaning module comprise bidding documents; the structure is shown in FIG. 1.
The original bidding file acquisition module acquires the text files to be extracted. The original file cleaning module cleans the various file formats of the text to be extracted and converts them uniformly into plain text strings. The bidding document element extraction module integrates the functional models of the NLP-based bidding element extraction method and extracts element information from the plain text strings produced by the cleaning module, yielding an element information set. The data standardization module determines, from the element information set and based on element boundary features, the element extraction results for the text to be extracted. The output module outputs the element extraction results in the form required by the business.
S1, acquiring the original bidding files: bidding document information is acquired from the Internet; the documents may be in formats such as PDF, HTML and Word.
S2, converting each original file into a plain text document, removing special character strings with regular expressions, segmenting the plain text into a set of sentences according to rules, and joining the sentences with the newline character \n into a new text document to obtain the pre-training corpus; the corpus is word-segmented and used to train BERT, yielding a bidding-domain pre-trained model A based on the Transformer network structure.
S3, acquiring labeled training samples: the samples come from several sources, such as the partially structured data contained in the original files, data matched with regular expressions, and data matched by a general-purpose language model using a large number of varied templates; the final complete samples are obtained after manual proofreading.
S4, performing data augmentation on the labeled samples: a large corpus of bidding documents is word-segmented, clustered and screened to obtain a corpus of element key fields, and more training samples are generated using data augmentation techniques.
S5, training a sentence-level potential element type recognition model B, as shown in FIG. 3: most of the content in bidding documents contains no element information. The element types are summarized and classified, with M elements grouped into N types (M > N), and an element dictionary table mapping element labels to element types is constructed. The objective is to split the target text into several target paragraph texts. A sentence-level NER recognition model is trained to recognize the element types a sentence may contain. The type of the sentence containing each piece of annotated data can be obtained from the element labels via the element dictionary table, so the training samples can be produced without further manual labeling. The CLS-layer features from the pre-trained model are used as the sentence representation, each sentence representation is treated as one token, a token-pair matrix is constructed using the multi-head idea, and the element types contained in each sentence are obtained with the GlobalPointer method; finally, sentences carrying the same element type mark are spliced together according to the element type combination rules to obtain target paragraph texts of known element information type.
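The element dictionary table and the final merging step of S5 can be sketched as follows. The labels, the type names, and the rule of joining sentences in document order are hypothetical; the patent specifies neither the table contents nor the combination rules.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence

# Hypothetical element dictionary table: M = 6 element labels mapped to N = 3 types.
ELEMENT_TYPE_OF: Dict[str, str] = {
    "winning unit name": "award info",
    "winning amount": "award info",
    "project name": "project info",
    "budget amount": "project info",
    "contact": "contact info",
    "phone": "contact info",
}


def sentence_types(element_labels: Iterable[str]) -> List[str]:
    """Derive a sentence's type labels from the element labels annotated inside it."""
    return sorted({ELEMENT_TYPE_OF[label] for label in element_labels})


def merge_by_type(sentences: Sequence[str], predicted_types: Sequence[Sequence[str]]) -> Dict[str, str]:
    """Concatenate, in document order, the sentences that share an element type."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for sentence, types in zip(sentences, predicted_types):
        for element_type in types:  # sentences with no predicted type are dropped
            grouped[element_type].append(sentence)
    return {element_type: " ".join(parts) for element_type, parts in grouped.items()}
```

`sentence_types` reflects how sentence-level training labels can be derived from existing element annotations without a second labeling pass, and `merge_by_type` yields the shorter, type-specific paragraphs that are passed on to S6.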
S6, training an element and relation extraction model C, whose structure is shown in FIG. 4:
From the paragraph sets of different types identified in step S5, all element information of each type and the relations among the elements are acquired. The steps are as follows:
A sentence set from which information is to be extracted is taken, and a schema is constructed according to the type of the sentence set. For example, for a sentence set of the winning-bid information type, the winning unit and its winning amount, address, contact and other information need to be extracted, so the schema = {winning unit name: [address, contact, telephone, winning amount]} is constructed.
The model input is constructed: a fixed prefix template is used, and the concatenation schema+text is taken as the input.
The model input is passed through the pre-trained model to obtain token-level representation vectors, which are mapped through a fully connected layer into feature vectors whose dimension equals the number of output element types; a sigmoid is used to judge whether each token is the beginning or the end of an element, and the complete element information is obtained from the beginning and end positions.
More specifically: let X be a text known to contain winning-bid information, with schema = {winning unit name: [address, contact, telephone, winning amount]}.
First, the element "winning unit name" is acquired: the concatenation schema+text takes the form winning unit name+X, and the start and end positions of the "winning unit name" are acquired through model B, giving the winning unit name, denoted Y.
Next, the address of Y is acquired: the concatenation schema+text takes the form address of Y+X, and the start and end positions of the address are acquired through model B, giving the address.
The contact person of Y, the telephone of Y and the winning amount of Y are acquired in the same way.
Note in particular that not only the numeric value of the winning amount is obtained, but also whether its unit is yuan or ten thousand yuan.
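The point about the amount unit matters because 元 (yuan) and 万元 (ten thousand yuan) differ by a factor of 10,000. A minimal sketch of parsing an extracted amount string follows; the regular expression and function names are assumptions, and real announcements may use other forms (for example amounts written in Chinese numerals) that this sketch does not cover.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Assumed surface form: digits with optional thousands separators and decimals,
# followed by the unit. "万元" is listed first so that it is tried before "元".
AMOUNT_RE = re.compile(r"(\d+(?:,\d{3})*(?:\.\d+)?)\s*(万元|元)")
UNIT_FACTOR = {"元": Decimal(1), "万元": Decimal(10000)}


def parse_amount(text: str) -> Optional[Tuple[Decimal, str]]:
    """Return (number, unit) for the first amount found, or None."""
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    number = Decimal(match.group(1).replace(",", ""))
    return number, match.group(2)


def amount_in_yuan(text: str) -> Optional[Decimal]:
    """Convert the first amount found to yuan using its unit."""
    parsed = parse_amount(text)
    if parsed is None:
        return None
    number, unit = parsed
    return number * UNIT_FACTOR[unit]
```

`Decimal` is used instead of `float` so that monetary values such as 12.5万元 convert exactly.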
S7, outputting the results through a standardization module: each element and each group of relation pairs has a standard model for its normalization. After a newly acquired original bidding file is cleaned, its element information is obtained using models A, B and C from the preceding steps and is then standardized according to the standard output format of each element type, such as date, amount, address, telephone and mailbox. Because the preliminary extraction results come in varied formats, they pass through the format standardization module so that the final output has a unified format.
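As an illustration of the format standardization in S7, the sketch below normalizes dates to ISO 8601 and telephone numbers to digit strings. The target formats and accepted input patterns are assumptions; the patent only states that the output must be unified.

```python
import re
import unicodedata
from datetime import date
from typing import Optional

# Assumed input forms: 2023年2月9日, 2023-02-09, 2023/2/9, 2023.02.09
DATE_RE = re.compile(r"(\d{4})\s*[年/\-.]\s*(\d{1,2})\s*[月/\-.]\s*(\d{1,2})\s*日?")


def normalize_date(text: str) -> Optional[str]:
    """Return the first date found as YYYY-MM-DD, or None if absent or invalid."""
    # NFKC folds full-width digits and punctuation to their ASCII forms.
    match = DATE_RE.search(unicodedata.normalize("NFKC", text))
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        return date(year, month, day).isoformat()
    except ValueError:  # e.g. month 13
        return None


def normalize_phone(text: str) -> str:
    """Keep digits only, dropping spaces, brackets and hyphens."""
    return re.sub(r"\D", "", unicodedata.normalize("NFKC", text))
```

Validating through `datetime.date` rejects impossible dates that a regular expression alone would accept.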
The invention and its embodiments have been described above in a non-limiting manner, and what is shown in the drawings is only one embodiment of the invention; the actual construction is not limited to it. In summary, if a person of ordinary skill in the art, inspired by this disclosure and without departing from the gist of the invention, devises without inventive effort a structure or embodiment similar to this technical solution, it shall fall within the scope of protection of the invention.

Claims (6)

1. A bidding element extraction method based on NLP information extraction technology, characterized by comprising the steps of:
S1, acquiring original bidding files: bidding document information is acquired from the Internet;
S2, converting each original file into a plain text document, removing special character strings with regular expressions, segmenting the plain text into a set of sentences according to rules, and joining the sentences with the newline character \n into a new text document to obtain the pre-training corpus; the corpus is word-segmented and used to train BERT, yielding a bidding-domain pre-trained model A based on the Transformer network structure;
S3, acquiring labeled training samples: part of the data is matched with regular expressions, part is matched by a general-purpose language model using a large number of varied templates, and the final complete samples are obtained after manual checking;
S4, performing data augmentation on the labeled samples: a large corpus of bidding documents is word-segmented, clustered and screened to obtain a corpus of element key fields, and more training samples are generated using data augmentation techniques;
S5, training a sentence-level potential element type recognition model B: the element types are summarized and classified, with M elements grouped into N types, and an element dictionary table mapping element labels to element types is constructed; a sentence-level NER recognition model is trained to recognize the element types a sentence may contain; the type of the sentence containing each piece of annotated data is obtained from the element labels via the element dictionary table; the CLS-layer features from the pre-trained model are used as the sentence representation, each sentence representation is treated as one token, a token-pair matrix is constructed using the multi-head idea, and the element types contained in each sentence are obtained with the GlobalPointer method; finally, sentences carrying the same element type mark are spliced together according to the element type combination rules to obtain target paragraph texts of known element information type;
S6, training an element and relation extraction model C: from the paragraph sets of different types identified in S5, all element information of each type and the relations among the elements are acquired;
S7, outputting the results through a standardization module: each element and each group of relation pairs has a standard model for its normalization; after a newly acquired original bidding file is cleaned, its element information is obtained using models A, B and C from the preceding steps and is then standardized according to the standard output format of each element type, such as date, amount, address, telephone and mailbox; because the preliminary extraction results come in varied formats, they pass through the format standardization module so that the final output has a unified format.
2. The bidding element extraction method based on NLP information extraction technology of claim 1, wherein S6 specifically comprises:
S61, acquiring the sentence set from which information is to be extracted and constructing a schema according to the type of the sentence set: for information such as address and contact person, the schema = {winning unit name: [address, contact, phone, winning amount]} is constructed;
S62, constructing the model input: a fixed prefix template is used, and the concatenation schema+text is taken as the input;
S63, passing the model input through the pre-trained model to obtain token-level representation vectors, mapping them through a fully connected layer into feature vectors whose dimension equals the number of output element types, and using a sigmoid to judge whether each token is the beginning or the end of an element; the complete element information is then obtained from the beginning and end positions.
3. The bidding element extraction method based on NLP information extraction technology of claim 2, wherein step S61 specifically comprises:
S611, first acquiring the element "winning unit name": the concatenation schema+text takes the form winning unit name+X, and the start and end positions of the winning unit name are acquired through model B, giving the winning unit name, denoted Y;
S612, acquiring the address of Y: the concatenation schema+text takes the form address of Y+X, and the start and end positions of the address are acquired through model B, giving the address;
S613, acquiring the contact person of Y, the telephone of Y and the winning amount of Y in the same way.
4. The bidding element extraction method based on NLP information extraction technology of claim 3, wherein in S613 not only the numeric value of the winning amount is obtained, but also whether its unit is yuan or ten thousand yuan.
5. A bidding element extraction system based on NLP information extraction technology, characterized in that: the system comprises a processor and a memory, wherein the memory stores the instruction files and algorithm models used in processing; the processor is configured with a data acquisition module, a data cleaning module, an extraction module implementing the NLP-based bidding element extraction method, and an output module for the extraction results, wherein the data acquisition module comprises a bidding document.
6. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, implements the above-described method and electronic system for extracting bidding elements based on NLP technology.
CN202310088650.5A 2023-02-09 2023-02-09 Method, system and storage medium for extracting bidding elements based on NLP technology Withdrawn CN116304023A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310088650.5A CN116304023A (en) 2023-02-09 2023-02-09 Method, system and storage medium for extracting bidding elements based on NLP technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310088650.5A CN116304023A (en) 2023-02-09 2023-02-09 Method, system and storage medium for extracting bidding elements based on NLP technology

Publications (1)

Publication Number Publication Date
CN116304023A true CN116304023A (en) 2023-06-23

Family

ID=86784143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310088650.5A Withdrawn CN116304023A (en) 2023-02-09 2023-02-09 Method, system and storage medium for extracting bidding elements based on NLP technology

Country Status (1)

Country Link
CN (1) CN116304023A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117391086A (en) * 2023-12-11 2024-01-12 四川隧唐科技股份有限公司 Bid participation information extraction method, device, equipment and medium
CN118132683A (en) * 2024-05-07 2024-06-04 杭州海康威视数字技术股份有限公司 Training method of text extraction model, text extraction method and equipment
CN118132683B (en) * 2024-05-07 2024-08-20 杭州海康威视数字技术股份有限公司 Training method of text extraction model, text extraction method and equipment
CN118170891A (en) * 2024-05-13 2024-06-11 浙江大学 Text information extraction method, device, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
US11734328B2 (en) Artificial intelligence based corpus enrichment for knowledge population and query response
CN109685056B (en) Method and device for acquiring document information
CN109886270B (en) Case element identification method for electronic file record text
CN116304023A (en) Method, system and storage medium for extracting bidding elements based on NLP technology
CN113128227A (en) Entity extraction method and device
CN112836514A (en) Nested entity recognition method and device, electronic equipment and storage medium
KR20200139008A (en) User intention-analysis based contract recommendation and autocomplete service using deep learning
CN114297987B (en) Document information extraction method and system based on text classification and reading understanding
CN112464927A (en) Information extraction method, device and system
CN107783958B (en) Target statement identification method and device
CN112257442A (en) Policy document information extraction method based on corpus expansion neural network
CN111178080A (en) Named entity identification method and system based on structured information
CN114239579A (en) Electric power searchable document extraction method and device based on regular expression and CRF model
CN114398492B (en) Knowledge graph construction method, terminal and medium in digital field
CN115600561A (en) Webpage structuring method, equipment and storage medium fusing rules and small samples
CN116069946A (en) Biomedical knowledge graph construction method based on deep learning
CN115688703A (en) Specific field text error correction method, storage medium and device
CN114676699A (en) Entity emotion analysis method and device, computer equipment and storage medium
CN115329783A (en) Tibetan Chinese neural machine translation method based on cross-language pre-training model
CN114611489A (en) Text logic condition extraction AI model construction method, extraction method and system
Suriyachay et al. Thai named entity tagged corpus annotation scheme and self verification
CN114048321A (en) Multi-granularity text error correction data set generation method, device and equipment
CN113962196A (en) Resume processing method and device, electronic equipment and storage medium
CN112632985A (en) Corpus processing method and device, storage medium and processor
CN112133308A (en) Method and device for multi-label classification of voice recognition text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20230623