CN116127986A - Method for extracting key information of bidding documents based on pre-training model and BiLatticeLSTM

Info

Publication number
CN116127986A
Authority
CN
China
Prior art keywords
word
model
key information
extracting
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310165102.8A
Other languages
Chinese (zh)
Inventor
涂著刚
汤双明
周鸿章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guiyang Gaoxin Ston Information Co ltd
Original Assignee
Guiyang Gaoxin Ston Information Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guiyang Gaoxin Ston Information Co ltd
Priority to CN202310165102.8A
Publication of CN116127986A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of information extraction, and in particular to a method for extracting key information from bidding documents based on a pre-training model and BiLatticeLSTM. The method comprises the following steps: S100: acquiring a plurality of bidding documents and preprocessing them to generate a data set; S200: inputting the data set into a Bert model for pre-training, and learning the semantic information of bidding documents to obtain a BidBiert pre-training model; S300: labeling the key information in the data set and inputting the labeled data set into the BidBiert model to obtain a character vector for each character in the bidding documents and a word vector for each word related to the key information; S400: extracting, from the character vectors and word vectors, the feature vectors required to identify key information in the bidding documents, and decoding the feature vectors through a conditional random field to obtain an optimal parameter model; S500: performing iterative training to obtain a final model for extracting key information from bidding documents. The method can improve the accuracy and efficiency of extracting key information from bidding documents.

Description

Method for extracting key information of bidding documents based on pre-training model and BiLatticeLSTM
Technical Field
The invention relates to the technical field of information extraction, and in particular to a method for extracting key information from bidding documents based on a pre-training model and BiLatticeLSTM.
Background
A bidding document is a document compiled by the tendering unit, or by a design unit it has commissioned, that informs bidders of the main technical, quality, construction-period and other requirements of a project. A bidding document contains more than 30 items of key information that are of great interest, such as the project name, the bidding unit, the winning bid amount and the bidding deadline. At present, key information in bidding documents is usually found by manual copy and paste or by rule-based extraction. However, when an engineering project is put out to tender or goods are purchased, the documents are usually published on multiple sites, have no fixed template, are unstructured, and come in various forms (Word, PDF, HTML, scanned pictures and the like). The manual approach is time-consuming and labor-intensive and can only be carried out by experienced workers. Rule-based extraction requires specialized personnel to configure a large number of rules, and the boundaries of its extraction results are fuzzy, so the extraction effect is not ideal and its adaptability to different documents is poor. Moreover, the prior art cannot learn semantic information from a large number of bidding documents, so key information that is semantically ambiguous is difficult to extract correctly.
Disclosure of Invention
The technical problem solved by the invention is to provide a method for extracting the key information of the bidding document based on a pre-training model and BiLatticeLSTM, which can improve the accuracy and efficiency of extracting the key information of the bidding document.
The basic scheme provided by the invention is as follows: a method for extracting key information of a bidding document based on a pre-training model and BiLatticeLSTM comprises the following steps:
S100: acquiring a plurality of bidding documents, preprocessing them, extracting text information and generating a data set;
S200: inputting the data set into a Bert model for pre-training, and learning the semantic information of bidding documents to obtain a BidBiert pre-training model;
S300: labeling the key information in the data set and inputting the labeled data set into the BidBiert model to obtain a character vector for each character in the bidding documents and a word vector for each word related to the key information;
S400: extracting, from the character vectors and word vectors, the feature vectors required to identify key information in the bidding documents, and decoding the feature vectors through a conditional random field to obtain an optimal parameter model;
S500: performing iterative training to obtain a final model for extracting key information from bidding documents.
The principle of the invention is as follows: first, a massive number of bidding documents are acquired as a data set, and the data set is input into a Bert model for pre-training to obtain the BidBiert model. Because semantic learning is carried out over a massive number of bidding documents, the result is a pre-training model specific to the bidding field, through which the vectors of the input data can be obtained more accurately. The key information in the bidding documents of the data set is then labeled, the labeled data is input into the BidBiert model, and character vectors and word vectors are extracted and trained. The feature vectors required to extract key information from the bidding documents are obtained from the character vectors and word vectors and are decoded to obtain an optimal model. After repeated iterative training, the final bidding information extraction model is obtained, and the key information in a bidding document can be extracted simply by inputting the document into the model.
Compared with the prior art, the following advantages exist:
Compared with the traditional manual approach, key information only needs to be labeled during model training. Afterwards, a target document is input directly into the model and its key information is obtained directly, which reduces labor, material and time costs.
Compared with extraction based on rules and word libraries, the method identifies key information accurately by learning the semantic information in bidding documents, achieves higher coverage and accuracy, is applicable to bidding documents in various formats, and does not require word libraries or recognition rules to be maintained.
Further, the step S100 includes the steps of:
S110: acquiring bidding documents published on the network through a crawler;
S120: extracting the text information in the bidding documents;
S130: cutting long sentences in the text information down to a preset sentence length.
A massive number of bidding documents published on the network are obtained through a crawler, the text information in them is extracted as training samples, and long sentences are cut down to a preset sentence length so that the semantic information of each sentence can be learned in the subsequent steps. Cutting long sentences into short sentences reduces the amount of computation during recognition and avoids ambiguity in semantic recognition.
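Step S130 can be illustrated with a minimal Python sketch. The patent does not state how sentences are delimited or what the preset length is, so the punctuation set, the default `max_len` of 128 and the function name `split_to_max_length` are all assumptions made for illustration only.

```python
import re


def split_to_max_length(text, max_len=128):
    """Split text at sentence-ending punctuation, then cut over-long pieces.

    The delimiter set and max_len are illustrative assumptions; the patent
    only says that long sentences are cut to a preset sentence length.
    """
    # Zero-width split right after Chinese or Western sentence-ending marks.
    pieces = [p.strip() for p in re.split(r"(?<=[。！？!?；;])", text) if p.strip()]
    segments = []
    for piece in pieces:
        # Cut any piece longer than max_len into consecutive windows.
        for start in range(0, len(piece), max_len):
            segments.append(piece[start:start + max_len])
    return segments
```

Every returned segment has at most `max_len` characters, so the downstream model never receives a sequence longer than the preset length.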
Further, the step S200 includes the steps of:
S210: segmenting the input sentence into words and then randomly masking several of the words;
S220: passing the words through the Embedding layer to obtain the word vector of each word;
S230: predicting the masked words from the word vectors through the Encoder;
S240: repeating S210-S230, and obtaining the BidBiert model through iterative learning.
After a sentence is input into the Bert model, it is segmented and several words are masked; the word vector of each word is obtained through Embedding, and the masked words are predicted through the Encoder. The Bert model comprises two unsupervised prediction tasks, Masked LM and Next Sentence Prediction. After the massive bidding documents are acquired, the Masked LM task of the Bert model is used. The working logic of Masked LM is: given a sentence, one or more words in the sentence are randomly erased, and what each erased word is must be predicted from the remaining words. The pre-training model is refined through iterative learning and learns the semantic information of the bidding field.
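The random masking used by the Masked LM task can be sketched as follows. This is a simplified illustration, not the patent's implementation: the masking rate `mask_prob`, the `[MASK]` placeholder string and the function name `mask_tokens` are assumptions, and the sketch always replaces a selected token with the placeholder.

```python
import random

MASK = "[MASK]"  # placeholder token; the exact string is an assumption


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Randomly mask tokens and return (masked_tokens, targets).

    targets maps each masked position to the original token, which is
    what the encoder is trained to predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one token, and never more than the sentence holds.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_prob)))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets
```

During pre-training, the loss would be computed only at the positions recorded in `targets`, so the model learns to recover each erased token from the surrounding context.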
Further, the step S300 includes the steps of:
S310: manually labeling the key information in the data set;
S320: inputting the labeled data set into the BidBiert model to obtain the character vector of each character in the data set;
S330: performing word segmentation with a preset self-built word library for the bidding field and a word segmentation tool, and carrying out word vector training on the segmentation results to obtain the word vector of each word.
After the key information in the bidding documents is labeled manually, the labeled documents are input into the BidBiert model to obtain the character vector of each character in the data set. Word segmentation is then performed by combining the self-built word library for the bidding field with a word segmentation tool, and word vector training is carried out on the segmentation results to obtain the word vector of each word.
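The patent does not name the word segmentation tool it uses. As a stand-in, the sketch below uses forward maximum matching against a word library, a simple dictionary-based technique, to show how a self-built domain word library can steer segmentation. The function name `segment`, the `max_word_len` limit and the sample Chinese strings are hypothetical.

```python
def segment(text, lexicon, max_word_len=6):
    """Forward maximum matching: at each position take the longest
    word-library entry, falling back to a single character.

    This stands in for the unspecified segmentation tool of step S330.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, down to a single character.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words
```

For example, with a hypothetical word library containing "市政工程" (municipal engineering) and "中标" (winning bid), `segment("贵阳市政工程中标结果", {"市政工程", "中标"})` keeps both domain terms intact and leaves the remaining characters as single tokens. The segmented words would then be the units on which word vectors are trained.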
Further, the step S400 includes the steps of:
S410: inputting the character vectors and word vectors obtained in S300 into the BiLatticeLSTM model, and extracting the feature vectors required to identify key information in the bidding project data;
S420: inputting the feature vectors into a CRF model, calculating the optimal labeling sequence, and fitting it to the manually labeled sequence to obtain the optimal model parameters.
The character vectors and word vectors are input into the BiLatticeLSTM model to extract feature vectors. The BiLatticeLSTM model is an LSTM model with a bidirectional Lattice structure: it extracts the features of a text sequence in both the forward and backward directions and fuses the word vectors of the bidding field in both directions, so that entity boundary information is more clearly defined and the problem of entity ambiguity is solved. After the feature vectors of the text sequence are extracted, they are input into a CRF model for decoding. A CRF (conditional random field) is a model of the conditional probability distribution of one group of output random variables given another group of input random variables; decoding with it yields the optimal model parameters.
Further, the step S500 includes the following step:
S510: repeating S200-S400, and performing iterative training to obtain a BidBiert+BiLatticeLSTM+CRF model.
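Calculating the optimal labeling sequence in a linear-chain CRF is commonly done with the Viterbi algorithm. The patent does not say which decoding algorithm it uses, so the following is a sketch under that assumption; the score layout (`emissions` as one dict per position, `transitions` keyed by tag pairs) and the function name `viterbi_decode` are illustrative choices.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions:   list of {tag: score}, one dict per position
    transitions: {(prev_tag, cur_tag): score}; missing pairs score 0.0
    """
    if not emissions:
        return []
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t-1][tag] = best predecessor of `tag` at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))
```

In this picture, the emission scores would come from the BiLatticeLSTM feature vectors and the transition scores are CRF parameters learned by fitting the decoded sequence to the manual labels.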
Drawings
FIG. 1 is a schematic flow chart of BidBiert training in an embodiment of the method for extracting key information of bidding documents based on a pre-training model and BiLatticeLSTM;
FIG. 2 is a schematic diagram of the training process of the BidBiert+BiLatticeLSTM+CRF model in an embodiment of the method for extracting key information of bidding documents based on a pre-training model and BiLatticeLSTM;
FIG. 3 is a schematic diagram of the BidBiert model in an embodiment of the method for extracting key information of bidding documents based on a pre-training model and BiLatticeLSTM;
FIG. 4 is a schematic diagram of the framework of the BidBiert+BiLatticeLSTM+CRF model in an embodiment of the method for extracting key information of bidding documents based on a pre-training model and BiLatticeLSTM.
Detailed Description
The following is a further detailed description of the embodiments:
An embodiment is substantially as shown in FIG. 1 and FIG. 2:
a method for extracting key information of a bidding document based on a pre-training model and BiLatticeLSTM comprises the following steps:
S100: acquiring a plurality of bidding documents and preprocessing them to generate a data set. S100 specifically comprises the following steps:
S110: acquiring bidding documents published on the network through a crawler;
S120: extracting the text information in the bidding documents;
S130: cutting long sentences in the text information down to a preset sentence length.
Specifically, in this embodiment, a plurality of published bidding documents are obtained from the network with a Python crawler tool, and the text information in them is extracted. The bidding documents come in various formats, including Word, PDF, HTML and scanned pictures. For Word and PDF bidding documents, data enhancement processing is applied directly and the text information is extracted. For HTML bidding documents, the HTML tags are removed first and data enhancement processing is then performed. For scanned documents, the text in the pictures is extracted with existing image text extraction technology. Long sentences in the text information are then cut into short sentences according to the preset sentence length.
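Removing HTML tags from an HTML bidding document can be sketched with Python's standard `html.parser` module. The patent does not describe how tags are removed; the class and function names below, and the decision to skip the contents of `script` and `style` elements, are assumptions made for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of script and style."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The extracted text would then go through the same sentence-length cutting as text taken from Word or PDF documents.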
S200: inputting the data set into a Bert model for pre-training, and learning the semantic information in the bidding documents to obtain a BidBiert pre-training model. S200 specifically comprises the following steps:
S210: segmenting the input sentence into words and then randomly masking several of the words;
S220: passing the words through the Embedding layer to obtain the word vector of each word;
S230: predicting the masked words from the word vectors through the Encoder;
S240: repeating S210-S230, and obtaining the BidBiert model through iterative learning.
The Bert model comprises two unsupervised prediction tasks, Masked LM and Next Sentence Prediction. After the massive bidding documents are acquired, the Masked LM task of the Bert model is used. The working logic of Masked LM is: given a sentence, one or more words in the sentence are randomly erased, and what each erased word is must be predicted from the remaining words. Specifically, as shown in FIG. 3, the sentence "a tornado project steel bid announcement" is input into the Bert model. After word segmentation and random masking (MASK), words 1-n are obtained; the word vectors E1-En of these words are obtained through Embedding; finally, the masked words are predicted through several Encoders, and the iterative learning is repeated to obtain the BidBiert model. Through iterative training, the pre-training of the BidBiert model for the bidding field is completed and the semantic information of the bidding field is learned.
S300: labeling the key information in the data set and inputting the labeled data set into the BidBiert model to obtain a character vector for each character in the bidding documents and a word vector for each word related to the key information. S300 specifically comprises the following steps:
S310: manually labeling the key information in the data set;
S320: inputting the labeled data set into the BidBiert model to obtain the character vector of each character in the data set;
S330: performing word segmentation with a preset self-built word library for the bidding field and a word segmentation tool, and carrying out word vector training on the segmentation results to obtain the word vector of each word.
Specifically, the key information in the data set is first labeled manually; the key information comprises the project name, the bidding unit, the bid amount, the bidding deadline and the like. The labeled data is input into the BidBiert model to obtain the character vector (Char Embedding) of each character. The data set is then segmented using the pre-configured self-built word library for the bidding field together with a word segmentation tool, and word vector training is carried out on the segmentation results to obtain the word vector (Word Embedding) of each word.
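Manual labels are typically stored as character spans and converted into one tag per character before training. The sketch below assumes the B-/I-/E-/O tag style that appears in the description of FIG. 4 (for example B-PN, I-PN, E-PN for a project name); the `S-` prefix for single-character entities, the span format and the function name `spans_to_tags` are assumptions not stated in the patent.

```python
def spans_to_tags(length, spans):
    """Convert labeled spans into one tag per character.

    spans: list of (start, end, label) with 0-based inclusive indices.
    Characters outside every span receive the "O" tag.
    """
    tags = ["O"] * length
    for start, end, label in spans:
        if start == end:
            # Single-character entity; the S- prefix is an assumption.
            tags[start] = "S-" + label
            continue
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
        tags[end] = "E-" + label
    return tags
```

For a six-character sentence whose characters 1 to 3 form a project name, `spans_to_tags(6, [(1, 3, "PN")])` gives `["O", "B-PN", "I-PN", "E-PN", "O", "O"]`, which is the kind of sequence the CRF layer is later fitted to.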
S400: extracting, from the character vectors and word vectors, the feature vectors required to identify key information in the bidding documents, and decoding the feature vectors through a conditional random field to obtain an optimal parameter model.
The step S400 includes the following steps:
S410: inputting the character vectors and word vectors obtained in S300 into the BiLatticeLSTM model, and extracting the feature vectors required to identify key information in the bidding project data;
S420: inputting the feature vectors into a CRF model, calculating the optimal labeling sequence, and fitting it to the manually labeled sequence to obtain the optimal model parameters.
The character vectors (Char Embedding) and word vectors (Word Embedding) are input into the BiLatticeLSTM model to extract the feature vectors (Feature Vector) required to identify key information in the bidding documents. The BiLatticeLSTM model is an LSTM model with a bidirectional Lattice structure: it extracts the features of a text sequence in both the forward and backward directions and fuses the word vectors of the bidding field in both directions, so that entity boundary information is more clearly defined and the problem of entity ambiguity is solved.
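A lattice structure needs to know which word-library entries match which character spans, so that a word vector can be merged into the character sequence at the right position. The sketch below only builds that index; it does not implement the LSTM itself. Grouping words by end position for the forward pass and by start position for the backward pass is an assumption about how a bidirectional lattice would consume them, and `build_lattice` is a hypothetical name.

```python
from collections import defaultdict


def build_lattice(sentence, lexicon, max_word_len=6):
    """Find every word-library entry in the sentence.

    Returns (forward, backward): forward maps a 1-based end position to
    the (start, end, word) spans ending there; backward maps a 1-based
    start position to the spans starting there.
    """
    forward = defaultdict(list)
    backward = defaultdict(list)
    for start in range(len(sentence)):
        # Only multi-character words are lattice words.
        for size in range(2, max_word_len + 1):
            end = start + size
            if end > len(sentence):
                break
            word = sentence[start:end]
            if word in lexicon:
                span = (start + 1, end, word)  # 1-based, inclusive
                forward[end].append(span)
                backward[start + 1].append(span)
    return dict(forward), dict(backward)
```

With a hypothetical 15-character sentence in which "市政工程" occupies characters 3 to 6 and "中标" occupies characters 12 to 13, the forward index has entries at positions 6 and 13 and the backward index at positions 3 and 12.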
S510: repeating S200-S400, and performing iterative training to obtain a BidBiert+BiLatticeLSTM+CRF model.
The BidBiert+BiLatticeLSTM+CRF model is shown in FIG. 4. The input sentence has 15 characters (in translation: "bid-winning results in a bid section of Guiyang municipal engineering service"). The 15 characters are passed through the BidBiert model to obtain the character vectors CE1-CE15 of the individual characters. Two words in the sentence, "municipal engineering" and "winning bid", are in the self-built word library for the bidding field, and their word vectors are obtained as WE3,6 and WE12,13 respectively. The character vectors CE1-CE15 and the word vectors WE3,6 and WE12,13 are passed through BiLatticeLSTM to extract key information features, giving the feature vectors FV1-FV15. The feature vectors are passed through the CRF to obtain a label for each character: characters belonging to key information receive the label of that feature (for example, the project name in FIG. 4 is labeled B-PN, I-PN and E-PN), and characters that are not key information receive the O label. In this way the key information in the bidding document is extracted.
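Once every character carries a label, the key information can be read off by collecting each run that starts with a B- tag and ends with the matching E- tag. The following is a minimal sketch of that last step; the function name `extract_entities`, the handling of an assumed `S-` single-character tag and the policy of discarding malformed runs are illustrative assumptions.

```python
def extract_entities(chars, tags):
    """Collect (label, text) pairs from a per-character tag sequence.

    A run counts only if it starts with B-<label> and ends with
    E-<label>; malformed runs are silently dropped.
    """
    entities = []
    start, label = None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            start, label = None, None
            continue
        prefix, _, tag_label = tag.partition("-")
        if prefix == "S":
            # Single-character entity (assumed tag, not in the patent).
            entities.append((tag_label, "".join(chars[i:i + 1])))
            start, label = None, None
        elif prefix == "B":
            start, label = i, tag_label
        elif prefix == "E" and start is not None and tag_label == label:
            entities.append((label, "".join(chars[start:i + 1])))
            start, label = None, None
        elif prefix == "I" and (start is None or tag_label != label):
            # An I- tag without a matching B- tag breaks the run.
            start, label = None, None
    return entities
```

Applied to the tag sequence produced by the CRF, this returns each piece of key information together with its type, for example the project name tagged PN.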
The foregoing is merely an embodiment of the present invention. Specific structures and features that are common knowledge in the art are not described in detail here. A person of ordinary skill in the art knows all of the common technical knowledge in the field before the application date or the priority date, has access to all of the prior art in the field, and is able to apply the routine experimental means available before that date; in light of the teaching of this application, such a person can complete and implement this solution using their own abilities, and certain typical well-known structures or methods should not be an obstacle to their doing so. It should be noted that those skilled in the art can make modifications and improvements without departing from the structure of the present invention; these also fall within the scope of protection of the present invention and do not affect the effect of implementing the invention or the utility of the patent. The scope of protection of this application shall be determined by the content of the claims, and the specific embodiments and other statements in the description may be used to interpret the content of the claims.

Claims (6)

1. A method for extracting key information of bidding documents based on a pre-training model and BiLatticeLSTM, characterized by comprising the following steps:
S100: acquiring a plurality of bidding documents and preprocessing them to generate a data set;
S200: inputting the data set into a Bert model for pre-training, and learning the semantic information of bidding documents to obtain a BidBiert pre-training model;
S300: labeling the key information in the data set and inputting the labeled data set into the BidBiert model to obtain a character vector for each character in the bidding documents and a word vector for each word related to the key information;
S400: extracting, from the character vectors and word vectors, the feature vectors required to identify key information in the bidding documents, and decoding the feature vectors through a conditional random field to obtain an optimal parameter model;
S500: performing iterative training to obtain a final model for extracting key information from bidding documents.
2. The method for extracting key information of bidding documents based on a pre-training model and BiLatticeLSTM according to claim 1, wherein the step S100 includes the steps of:
S110: acquiring bidding documents published on the network through a crawler;
S120: extracting the text information in the bidding documents;
S130: cutting long sentences in the text information down to a preset sentence length.
3. The method for extracting key information of bidding documents based on a pre-training model and BiLatticeLSTM according to claim 2, wherein the step S200 includes the steps of:
S210: segmenting the input sentence into words and then randomly masking several of the words;
S220: passing the words through the Embedding layer to obtain the word vector of each word;
S230: predicting the masked words from the word vectors through the Encoder;
S240: repeating S210-S230, and obtaining the BidBiert model through iterative learning.
4. The method for extracting key information of bidding documents based on a pre-training model and BiLatticeLSTM according to claim 3, wherein the step S300 includes the steps of:
S310: manually labeling the key information in the data set;
S320: inputting the labeled data set into the BidBiert model to obtain the character vector of each character in the data set;
S330: performing word segmentation with a preset self-built word library for the bidding field and a word segmentation tool, and carrying out word vector training on the segmentation results to obtain the word vector of each word.
5. The method for extracting key information of bidding documents based on a pre-training model and BiLatticeLSTM according to claim 3, wherein the step S400 includes the steps of:
S410: inputting the character vectors and word vectors obtained in S300 into the BiLatticeLSTM model, and extracting the feature vectors required to identify key information in the bidding project data;
S420: inputting the feature vectors into a CRF model, calculating the optimal labeling sequence, and fitting it to the manually labeled sequence to obtain the optimal model parameters.
6. The method for extracting key information of bidding documents based on a pre-training model and BiLatticeLSTM according to claim 5, wherein the step S500 includes the step of:
S510: repeating S200-S400, and performing iterative training to obtain a BidBiert+BiLatticeLSTM+CRF model.
CN202310165102.8A 2023-02-24 2023-02-24 Method for extracting key information of bidding documents based on pre-training model and BiLatticeLSTM Pending CN116127986A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310165102.8A CN116127986A (en) 2023-02-24 2023-02-24 Method for extracting key information of bidding documents based on pre-training model and BiLatticeLSTM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310165102.8A CN116127986A (en) 2023-02-24 2023-02-24 Method for extracting key information of bidding documents based on pre-training model and BiLatticeLSTM

Publications (1)

Publication Number Publication Date
CN116127986A true CN116127986A (en) 2023-05-16

Family

ID=86302774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310165102.8A Pending CN116127986A (en) 2023-02-24 2023-02-24 Method for extracting key information of bidding documents based on pre-training model and BiLatticeLSTM

Country Status (1)

Country Link
CN (1) CN116127986A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117010390A (en) * 2023-07-04 2023-11-07 北大荒信息有限公司 Company entity identification method, device, equipment and medium based on bidding information
CN116882416A (en) * 2023-09-08 2023-10-13 江西省精彩纵横采购咨询有限公司 Information identification method and system for bidding documents
CN116882416B (en) * 2023-09-08 2023-11-21 江西省精彩纵横采购咨询有限公司 Information identification method and system for bidding documents

Similar Documents

Publication Publication Date Title
CN111709241B (en) Named entity identification method oriented to network security field
CN111143550B (en) Method for automatically identifying dispute focus based on hierarchical attention neural network model
CN110110054B (en) Method for acquiring question-answer pairs from unstructured text based on deep learning
CN116127986A (en) Method for extracting key information of bidding documents based on pre-training model and BiLatticeLSTM
CN111708882B (en) Transformer-based Chinese text information missing completion method
US20220300546A1 (en) Event extraction method, device and storage medium
CN114580424B (en) Labeling method and device for named entity identification of legal document
CN116151132B (en) Intelligent code completion method, system and storage medium for programming learning scene
CN112163424A (en) Data labeling method, device, equipment and medium
CN108205524B (en) Text data processing method and device
Moeng et al. Canonical and surface morphological segmentation for nguni languages
CN115687331A (en) Intelligent matching method and system for engineering cost quota
CN115952791A (en) Chapter-level event extraction method, device and equipment based on machine reading understanding and storage medium
CN115357699A (en) Text extraction method, device, equipment and storage medium
CN115630648A (en) Address element analysis method and system for man-machine conversation and computer readable medium
CN115098673A (en) Business document information extraction method based on variant attention and hierarchical structure
CN114416991A (en) Method and system for analyzing text emotion reason based on prompt
CN114356924A (en) Method and apparatus for extracting data from structured documents
CN114239576A (en) Issue label classification method based on topic model and convolutional neural network
CN115017144B (en) Judicial document case element entity identification method based on graphic neural network
CN112148879B (en) Computer readable storage medium for automatically labeling code with data structure
CN113961696B (en) Automatic oracle conjugation verification method based on ObiBert
CN115203415A (en) Resume document information extraction method and related device
CN112819622B (en) Information entity relationship joint extraction method and device and terminal equipment
CN110472243B (en) Chinese spelling checking method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination