CN113553852B - Contract information extraction method, system and storage medium based on neural network - Google Patents


Publication number
CN113553852B
Authority
CN
China
Prior art keywords
text data
speech
named entity
preprocessed
named
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111016995.7A
Other languages
Chinese (zh)
Other versions
CN113553852A (en
Inventor
柴茂森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur General Software Co Ltd
Original Assignee
Inspur General Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur General Software Co Ltd
Priority to CN202111016995.7A
Publication of CN113553852A
Application granted
Publication of CN113553852B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295: Named entity recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

A contract information extraction method based on a neural network comprises the following steps: extracting original text data from the contract text, and preprocessing the original text data to obtain preprocessed text data; performing named entity recognition on the preprocessed text data through an NER entity recognition model to obtain the named entities in the preprocessed text data; processing the preprocessed text data through a part-of-speech recognition model to obtain the parts of speech of all segmented words in the preprocessed text data; and completing the named entities in the preprocessed text data according to the parts of speech of the segmented words before and after the position of each named entity in the original text data. The invention addresses the problem that, under certain conditions, a named entity recognition model cannot effectively recognize relatively long named entities; it improves the accuracy of named entity recognition in contract information extraction and the credibility of the named entity recognition model.

Description

Contract information extraction method, system and storage medium based on neural network
Technical Field
The invention belongs to the field of computers, and particularly relates to a contract information extraction method, a contract information extraction system and a storage medium based on a neural network.
Background
With the continuous development of office automation, electronic office work is gradually replacing paper-based office work, which effectively reduces the use of paper documents in the workplace. With the development of artificial intelligence, algorithms and computing speed have improved, and, because contract information is now available in electronic form, customers increasingly prefer to extract the content of contracts by means of artificial intelligence, which reduces the workload of manual contract information processing and improves office efficiency.
At present, traditional contract content extraction relies on two approaches, manual extraction and rule-based extraction, each of which has an obvious disadvantage: 1. manual extraction has a high labor cost and low efficiency; 2. rule matching generalizes poorly, is limited in scope, and requires frequent maintenance, because the rules must be modified whenever a different contract template is to be recognized, and some contracts run to hundreds of pages, so recognition efficiency is low. As a result, contract information extraction often performs poorly and struggles to meet customers' expectations.
Therefore, a new and practical method for extracting contract information is needed to overcome the defects of the prior art and genuinely reduce the workload of manual contract information extraction.
Disclosure of Invention
To remedy the shortcomings of the current state of the industry, the invention provides a contract information extraction method based on a neural network and rule matching. After a single round of training, the method can be used directly to extract the content of contracts that do not follow a universal template, automatically classifies the extracted content into the information fields of each party, and is well suited to cloud deployment with a high level of security.
In order to achieve the above object, an aspect of the present invention provides a method for extracting contract information based on a neural network, including:
extracting original text data from the contract text, and preprocessing the original text data to obtain preprocessed text data;
carrying out named entity recognition on the preprocessed text data through an NER entity recognition model to obtain named entities in the preprocessed text data;
identifying the preprocessed text data through a part-of-speech identification model to obtain parts of speech of all segmentation words in the preprocessed text data;
and completing the named entity in the preprocessed text data according to word segmentation parts of speech before and after the position of the named entity in the preprocessed text data in the original text data.
In some embodiments of the present invention, extracting original text data from the contract text and preprocessing the original text data to obtain preprocessed text data includes:
deleting the spaces and punctuation marks in the original text data.
In some embodiments of the invention, the NER entity recognition model is based on an LSTM neural network and is trained using the Keras and TensorFlow frameworks.
In some embodiments of the present invention, identifying the preprocessed text data by the part-of-speech recognition model to obtain part-of-speech of all the segmented words in the preprocessed text data includes:
and performing part-of-speech tagging on the segmented words while performing word segmentation on the preprocessed text data by using a jieba word segmentation tool.
In some embodiments of the present invention, identifying the preprocessed text data by the part-of-speech recognition model to obtain part-of-speech of all the segmented words in the preprocessed text data further includes:
intercepting continuous text paragraphs formed by a preset number of segmentation words before and after the named entity in the preprocessed text data according to the named entity in the preprocessed text data;
manually checking part-of-speech tagging results of the word segments in the continuous text paragraphs, and establishing a manual part-of-speech table for the word segments with incorrect part-of-speech tagging;
in response to word segmentation of the preprocessed text data using the jieba word segmentation tool, adding the manual part-of-speech table to the part-of-speech table of the jieba word segmentation tool.
In some embodiments of the present invention, identifying the preprocessed text data by the part-of-speech recognition model to obtain part-of-speech of all the segmented words in the preprocessed text data further includes:
and intercepting continuous text paragraphs formed by a preset number of segmentation words before and after the named entity in the original text data according to the named entity in the preprocessed text data.
In some embodiments of the present invention, complementing the named entity in the pre-processed text data according to the part of speech of the word before and after the location of the named entity in the pre-processed text data in the original text data comprises:
judging whether the parts of speech of the segmented words before and after the position of the named entity in the original text data are the same as the part of speech of the named entity;
and, in response to the parts of speech of the segmented words before and after the position of the named entity in the original text data being the same as the part of speech of the named entity, combining those segmented words and the named entity into one named entity.
In some embodiments of the present invention, complementing the named entity in the pre-processed text data according to the part of speech of the word before and after the location of the named entity in the pre-processed text data in the original text data further comprises:
and in response to the presence of punctuation marks between the named entity and the preceding and following words of the named entity in the position in the original text data, prohibiting the preceding and following words of the named entity in the position in the original text data from being combined with the named entity into one named entity.
Another aspect of the present invention also provides a system for extracting contract information based on a neural network, including:
the text processing module is configured to extract original text data from the contract text and preprocess the original text data to obtain preprocessed text data;
the text recognition module is configured to perform named entity recognition on the preprocessed text data through the NER entity recognition model to obtain named entities in the preprocessed text data;
the text part-of-speech tagging module is configured to identify the preprocessed text data through a part-of-speech identification model to obtain part of speech of all the segmented words in the preprocessed text data;
and the text entity completion module is configured to complete the named entities in the preprocessed text data according to word segmentation parts of speech before and after the named entities in the preprocessed text data are positioned in the original text data.
Still another aspect of the present invention provides a computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any of the above embodiments.
The neural-network-based contract information extraction method provided by the invention compares the part of speech of each named entity recognized by the entity recognition model with the parts of speech of the segmented words before and after it in the preprocessed text data or the original text data, and, if the parts of speech are the same, appends those segmented words to the named entity. This addresses the problem that, under certain conditions, a named entity recognition model cannot effectively recognize very long named entities, improves the accuracy of named entity recognition in contract information extraction, and improves the credibility of the named entity recognition model.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for extracting contract information based on a neural network according to an embodiment of the present invention;
Fig. 2 is a structural diagram of a contract information extraction system based on a neural network provided by the invention;
Fig. 3 is a schematic structural diagram of a computer storage medium for extracting contract information based on a neural network according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
As shown in fig. 1, in a first aspect of the embodiment of the present invention, a method for extracting contract information based on a neural network is provided, including:
step S01, extracting original text data from the contract text, and preprocessing the original text data to obtain preprocessed text data;
step S02, performing named entity recognition on the preprocessed text data through an NER entity recognition model to obtain the named entities in the preprocessed text data;
step S03, processing the preprocessed text data through a part-of-speech recognition model to obtain the parts of speech of all segmented words in the preprocessed text data;
and step S04, completing the named entities in the preprocessed text data according to the parts of speech of the segmented words before and after the position of each named entity in the original text data.
In this embodiment, the neural-network-based contract information extraction method is deployed on a web server. A user uploads a contract file carrying contract information to the server through a client or a browser, and the contract file is stored on the server in binary form. When the user authenticates and invokes the service with the API key of the corresponding tenant, the server extracts information from the corresponding contract and performs named entity recognition according to the steps of the method.
In step S01, the content of the contract file is read by a suitable file-reading tool and then preprocessed. For example, a contract file of the PDF type is read with the PDFMiner tool for Python, and its content is extracted as the original text data. For a contract file of the Docx type, the text content is extracted with a Docx tool for Python and stored as the original text data. After text extraction, the original text data is preprocessed: useless characters are removed, and the result is stored as the preprocessed text data.
In step S02, entity recognition is performed on the preprocessed text data, the trained NER entity recognition model is invoked, the preprocessed text data is input into the NER entity recognition model, and the output named entity of the entity recognition model is saved.
In step S03, word segmentation operation is performed on the preprocessed text data through the part-of-speech recognition model, and the part of speech of the segmented words is labeled, so as to obtain a word segmentation table and the part of speech of each segmented word in the word segmentation table.
In step S04, the position of each named entity obtained in step S02 is located in the preprocessed text data, and the parts of speech of the segmented words before and after that position are obtained. The named entity is merged with the preceding and following segmented words if their parts of speech are the same as that of the named entity.
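The file-reading part of step S01 might be sketched as follows. This is a minimal sketch, not the patented implementation: it assumes the `pdfminer.six` and `python-docx` packages are installed, and the function name `extract_raw_text` is hypothetical.

```python
from pathlib import Path


def extract_raw_text(path: str) -> str:
    """Return the text of a PDF or DOCX contract file as one string.

    Hypothetical helper: the source names PDFMiner and a Docx tool but
    does not show how they are called.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        # pdfminer.six: high-level helper that returns the text of all pages.
        from pdfminer.high_level import extract_text
        return extract_text(path)
    if suffix == ".docx":
        # python-docx: join paragraph texts, keeping paragraph breaks.
        from docx import Document
        return "\n".join(p.text for p in Document(path).paragraphs)
    raise ValueError(f"unsupported contract file type: {suffix!r}")
```

The third-party imports are deferred into the branches so that a deployment handling only one file type does not need both packages.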
In some embodiments of the present invention, extracting original text data from the contract text and preprocessing the original text data to obtain preprocessed text data includes:
deleting the spaces and punctuation marks in the original text data.
In this embodiment, to prevent invalid symbols or repeated meaningless characters from affecting the NER entity recognition model, such characters are deleted from the original text data. Specifically, preprocessing the original text data includes deleting the spaces and punctuation marks in the original text data.
In some embodiments of the invention, the NER entity recognition model is based on an LSTM neural network and is trained using the Keras and TensorFlow frameworks.
In this embodiment, the NER entity recognition model adopts an LSTM neural network, is built with the Keras and TensorFlow tools, and is trained on People's Daily Chinese corpus data (about 70,000 sentences, about 2.5 million words). It is used to extract common entities in a contract, such as addresses, personal names and place names.
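One way to implement this preprocessing is sketched below. It assumes that "punctuation" means any character in a Unicode punctuation category (so currency and math symbols are kept), and it additionally records, for each kept character, its index in the original text. That offset list is an assumption of this sketch, added because later steps need to map an entity back to its position in the original text data.

```python
import unicodedata


def preprocess(raw: str) -> tuple[str, list[int]]:
    """Delete whitespace and punctuation from raw text.

    Returns the cleaned text and, for each kept character, its index in
    the original string (hypothetical bookkeeping, not from the source).
    """
    kept_chars = []
    offsets = []
    for index, char in enumerate(raw):
        # Unicode categories starting with "P" cover ASCII and CJK punctuation.
        if char.isspace() or unicodedata.category(char).startswith("P"):
            continue
        kept_chars.append(char)
        offsets.append(index)
    return "".join(kept_chars), offsets
```

For a cleaned-text span `[i, j)`, the corresponding span in the original text is `[offsets[i], offsets[j - 1] + 1)`.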
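The source does not state how the model's per-character output is turned into named entities. A common choice for character-level Chinese NER is the BIO tagging scheme, and the sketch below decodes such tags into entity spans. The BIO scheme, the label names and the function `decode_bio` are all assumptions made for illustration.

```python
def decode_bio(chars: str, tags: list[str]) -> list[tuple[str, str, int, int]]:
    """Turn per-character BIO tags into (text, label, start, end) spans.

    Assumes tags such as "B-LOC", "I-LOC" and "O"; the patent does not
    specify the tagging scheme actually used.
    """
    entities = []
    start = None
    label = None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and start is not None and tag[2:] == label
        if continues:
            continue
        if start is not None:
            entities.append((chars[start:i], label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities
```

A stray "I-" tag with no preceding "B-" tag is ignored rather than treated as the start of an entity, which is one of several reasonable conventions.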
In some embodiments of the present invention, identifying the preprocessed text data by the part-of-speech recognition model to obtain part-of-speech of all the segmented words in the preprocessed text data includes:
and performing part-of-speech tagging on the segmented words while performing word segmentation on the preprocessed text data by using a jieba word segmentation tool.
In this embodiment, the jieba word segmentation tool performs word segmentation on the preprocessed text data and generates the part-of-speech tag of each segmented word in the same pass.
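A minimal sketch of joint segmentation and tagging with jieba follows. It uses `jieba.posseg.cut`, which yields word and flag pairs; the `Token` type, the character offsets and the injectable `cut` parameter are assumptions of this sketch. The offsets are correct only if the segmented words concatenate back to the input text, which is assumed here.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    flag: str   # part-of-speech tag, e.g. "ns" for a place name in jieba
    start: int  # offset of the first character in the input text
    end: int    # offset one past the last character


def segment_with_pos(text, cut=None) -> list[Token]:
    """Segment text and tag each word, recording character offsets.

    `cut` defaults to jieba.posseg.cut; any callable yielding
    (word, flag) pairs can be passed instead (hypothetical hook).
    """
    if cut is None:
        import jieba.posseg as pseg
        cut = pseg.cut
    tokens = []
    position = 0
    for word, flag in cut(text):
        tokens.append(Token(word, flag, position, position + len(word)))
        position += len(word)
    return tokens
```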
In some embodiments of the present invention, identifying the preprocessed text data by the part-of-speech recognition model to obtain part-of-speech of all the segmented words in the preprocessed text data further includes:
intercepting continuous text paragraphs formed by a preset number of segmentation words before and after the named entity in the preprocessed text data according to the named entity in the preprocessed text data;
manually checking part-of-speech tagging results of the word segments in the continuous text paragraphs, and establishing a manual part-of-speech table for the word segments with incorrect part-of-speech tagging;
in response to word segmentation of the preprocessed text data using the jieba word segmentation tool, adding the manual part-of-speech table to the part-of-speech table of the jieba word segmentation tool.
In this embodiment, the NER entity recognition model may fail to recognize some entities. For example, some personal names are unusual and contain characters that are verbs, and some addresses likewise contain verbs or words of other parts of speech, so that neither the NER entity recognition model nor the jieba word segmentation tool can process them effectively, and recognition accuracy drops. To address the inaccurate part-of-speech judgments that the NER entity recognition model and the jieba word segmentation tool make for such words, their results are corrected.
Specifically, according to the named entity output by the NER model, a continuous text passage formed by at least 5 segmented words before and after the named entity is intercepted from the preprocessed text data, and the parts of speech of the words in the passage are looked up in the word segmentation table. The parts of speech and the real entity in the passage are then judged manually against the original text of the contract file, and the result is stored in a manual part-of-speech table. For example, for a place name of the form "a certain city, a certain district, Fishing Village", the word "fishing" may be tagged as a verb denoting an action during word segmentation, so that "Fishing Village" is not recognized as a single place name and recognition is inaccurate. It is therefore necessary to identify such special words manually and compile the results into a special part-of-speech table, which is supplied to the NER entity recognition model and the jieba word segmentation model for training to improve their accuracy.
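Feeding a manual part-of-speech table into jieba could look like the sketch below. It relies on `jieba.add_word(word, freq=None, tag=None)`, which registers a word together with a part-of-speech tag; the function name `register_manual_pos`, the dictionary layout and the injectable `add_word` parameter are assumptions, and the source does not say which jieba mechanism is actually used.

```python
def register_manual_pos(manual_table: dict[str, str], add_word=None) -> int:
    """Register manually corrected words and their tags with jieba.

    manual_table maps a word to its corrected tag, for example a village
    name mapped to "ns" (place name). Returns the number of entries added.
    """
    if add_word is None:
        import jieba
        add_word = jieba.add_word
    for word, tag in manual_table.items():
        # Adding the whole word keeps jieba from splitting it into a verb
        # plus a noun, and attaches the corrected tag.
        add_word(word, tag=tag)
    return len(manual_table)
```

For instance, `register_manual_pos({"打鱼村": "ns"})` would register one hypothetical village name as a place name; the Chinese rendering of the example is a guess at the original wording.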
In some embodiments of the present invention, identifying the preprocessed text data by the part-of-speech recognition model to obtain part-of-speech of all the segmented words in the preprocessed text data further includes:
and intercepting continuous text paragraphs formed by a preset number of segmentation words before and after the named entity in the original text data according to the named entity in the preprocessed text data.
In this embodiment, when the context of the named entity is determined, the continuous text passage is formed from at least 5 segmented words of the original text data, so that the boundary of the named entity can be located accurately from the positions of the paragraph marks and punctuation marks in the original text data. This prevents recognition errors in which, once punctuation has been removed, the end of one sentence and the beginning of the next happen to join into what looks like a phrase or a name.
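Intercepting the context of a named entity can be sketched as follows, assuming the segmented words are available as `(word, start, end)` tuples with character offsets into the same text as the entity span. The function name `context_window` and the tuple layout are hypothetical; the default of 5 follows the "at least 5" figure in the text.

```python
def context_window(tokens, entity_start, entity_end, size=5):
    """Return up to `size` tokens before and after an entity span.

    tokens: list of (word, start, end) tuples in text order.
    The entity span is [entity_start, entity_end).
    """
    # Indices of the tokens that overlap the entity span.
    inside = [
        i for i, (_, start, end) in enumerate(tokens)
        if start < entity_end and end > entity_start
    ]
    if not inside:
        return [], []
    first, last = inside[0], inside[-1]
    before = tokens[max(0, first - size):first]
    after = tokens[last + 1:last + 1 + size]
    return before, after
```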
In some embodiments of the present invention, complementing the named entity in the pre-processed text data according to the part of speech of the word before and after the location of the named entity in the pre-processed text data in the original text data comprises:
judging whether the parts of speech of the segmented words before and after the position of the named entity in the original text data are the same as the part of speech of the named entity;
and, in response to the parts of speech of the segmented words before and after the position of the named entity in the original text data being the same as the part of speech of the named entity, combining those segmented words and the named entity into one named entity.
In this embodiment, if the segmented words before and after the named entity recognized by the NER entity recognition model have the same part of speech in the original text data as the named entity itself, those segmented words and the named entity are combined into one named entity. For example, in the address "Beijing city Chaoyang district Datun way", the segmented words before and after the named entity "Chaoyang district" are both tagged as geographical nouns, so "Beijing city Chaoyang district Datun way" is taken as a single named entity.
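The completion rule can be illustrated with a short sketch. It assumes the recognized entity coincides with one segmented word, and that merging continues outward for as long as neighbouring words share the entity's part of speech; the source does not say whether merging is repeated, so that is an assumption, as are the function name and the Chinese rendering of the address.

```python
def complete_entity(tokens, index):
    """Extend the entity at tokens[index] over same-tagged neighbours.

    tokens: list of (word, flag) pairs in text order.
    Returns the merged entity text.
    """
    flag = tokens[index][1]
    left = index
    while left > 0 and tokens[left - 1][1] == flag:
        left -= 1
    right = index
    while right + 1 < len(tokens) and tokens[right + 1][1] == flag:
        right += 1
    return "".join(word for word, _ in tokens[left:right + 1])


# Assumed rendering of the address example, with jieba-style tags.
example = [("地址", "n"), ("北京市", "ns"), ("朝阳区", "ns"), ("大屯路", "ns"), ("电话", "n")]
merged = complete_entity(example, 2)
```

Here `merged` covers the three place-name words around the recognized entity and stops at the ordinary nouns on either side.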
In some embodiments of the present invention, complementing the named entity in the pre-processed text data according to the part of speech of the word before and after the location of the named entity in the pre-processed text data in the original text data further comprises:
and in response to the presence of punctuation marks between the named entity and the preceding and following words of the named entity in the position in the original text data, prohibiting the preceding and following words of the named entity in the position in the original text data from being combined with the named entity into one named entity.
In this embodiment, if a punctuation mark or the paragraph separator "\n" lies between the named entity and the preceding or following segmented word in the original text data, the named entity is not merged with that segmented word.
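The punctuation guard might be checked against the original text as in the sketch below, which assumes both the entity and its neighbouring word have been mapped to character offsets in the original text data. The function name `can_merge` is hypothetical, and treating plain spaces as mergeable is an assumption.

```python
import unicodedata


def can_merge(raw: str, left_end: int, right_start: int) -> bool:
    """Return True if two adjacent spans of the original text may be merged.

    The gap raw[left_end:right_start] must contain no punctuation mark
    and no line break.
    """
    gap = raw[left_end:right_start]
    return not any(
        char == "\n" or unicodedata.category(char).startswith("P")
        for char in gap
    )
```

A caller would apply this test to each neighbour before extending the entity, so that text on either side of a comma or paragraph break is never joined.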
As shown in fig. 2, another aspect of the present invention further provides a system for extracting contract information based on a neural network, including:
the text processing module 1 is configured to extract original text data from contract text, and preprocess the original text data to obtain preprocessed text data;
the text recognition module 2 is configured to perform named entity recognition on the preprocessed text data through the NER entity recognition model to obtain named entities in the preprocessed text data;
the text part-of-speech tagging module 3 is configured to identify the preprocessed text data through a part-of-speech identification model to obtain part of speech of all the segmented words in the preprocessed text data;
and the text entity completion module 4 is configured to complete the named entities in the preprocessed text data according to word segmentation parts of speech before and after the named entities in the preprocessed text data are located in the original text data.
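The four modules could be wired together as in the following sketch, in which each module is modelled as a plain callable. The class name, field names and callable signatures are assumptions made for illustration; the source describes the modules only functionally.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ContractExtractionSystem:
    # Module 1: contract path -> (original text, preprocessed text)
    text_processing: Callable[[str], tuple[str, str]]
    # Module 2: preprocessed text -> named entities
    text_recognition: Callable[[str], list[str]]
    # Module 3: preprocessed text -> (word, part-of-speech) pairs
    pos_tagging: Callable[[str], list[tuple[str, str]]]
    # Module 4: (original text, entities, tagged words) -> completed entities
    entity_completion: Callable[[str, list[str], list[tuple[str, str]]], list[str]]

    def run(self, contract_path: str) -> list[str]:
        """Run the four modules in the order of steps S01 to S04."""
        raw, preprocessed = self.text_processing(contract_path)
        entities = self.text_recognition(preprocessed)
        tagged = self.pos_tagging(preprocessed)
        return self.entity_completion(raw, entities, tagged)
```

Keeping the modules as injected callables makes it straightforward to swap the NER model or the tagger without touching the orchestration.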
As shown in fig. 3, a further aspect of the present invention further provides a computer readable storage medium 401 storing a computer program 402, the computer program 402 implementing the steps of the method according to any of the above embodiments when being executed by a processor.
The neural-network-based contract information extraction method provided by the invention compares the part of speech of each named entity recognized by the entity recognition model with the parts of speech of the segmented words before and after it in the preprocessed text data or the original text data, and, if the parts of speech are the same, appends those segmented words to the named entity. This addresses the problem that, under certain conditions, a named entity recognition model cannot effectively recognize very long named entities, improves the accuracy of named entity recognition in contract information extraction, and improves the credibility of the named entity recognition model.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
Those of ordinary skill in the art will appreciate that: the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to such examples; combinations of features of the above embodiments or in different embodiments are also possible within the idea of an embodiment of the invention, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present invention.

Claims (9)

1. A method for extracting contract information based on a neural network, comprising:
extracting original text data from the contract text, and preprocessing the original text data to obtain preprocessed text data;
carrying out named entity recognition on the preprocessed text data through an NER entity recognition model to obtain named entities in the preprocessed text data;
identifying the preprocessed text data through a part-of-speech identification model to obtain parts of speech of all segmentation words in the preprocessed text data;
completing the named entities in the preprocessed text data according to the parts of speech of the segmented words before and after the positions of the named entities in the original text data, which comprises:
judging whether the parts of speech of the segmented words before and after the position of the named entity in the original text data are the same as the part of speech of the named entity;
and, in response to the parts of speech of the segmented words before and after the position of the named entity in the original text data being the same as the part of speech of the named entity, combining those segmented words and the named entity into one named entity.
2. The method of claim 1, wherein extracting the original text data from the contract text and preprocessing the original text data to obtain the preprocessed text data comprises:
deleting the spaces and punctuation marks in the original text data.
3. The method of claim 1, wherein the NER entity recognition model is based on an LSTM neural network and is trained using the Keras and TensorFlow frameworks.
4. The method of claim 1, wherein the identifying the pre-processed text data by the part-of-speech recognition model to obtain the part of speech of all the tokens in the pre-processed text data comprises:
and performing part-of-speech tagging on the segmented words while performing word segmentation on the preprocessed text data by using a jieba word segmentation tool.
5. The method of claim 4, wherein the identifying the pre-processed text data by the part-of-speech recognition model to obtain part-of-speech of all the tokens in the pre-processed text data further comprises:
intercepting continuous text paragraphs formed by a preset number of segmentation words before and after the named entity in the preprocessed text data according to the named entity in the preprocessed text data;
manually checking part-of-speech tagging results of the word segments in the continuous text paragraphs, and establishing a manual part-of-speech table for the word segments with incorrect part-of-speech tagging;
in response to word segmentation of the preprocessed text data using the jieba word segmentation tool, adding the manual part-of-speech table to the part-of-speech table of the jieba word segmentation tool.
6. The method of claim 5, wherein the identifying the pre-processed text data by the part-of-speech recognition model to obtain part-of-speech of all the tokens in the pre-processed text data further comprises:
and intercepting continuous text paragraphs formed by a preset number of segmentation words before and after the named entity in the original text data according to the named entity in the preprocessed text data.
7. The method of claim 1, wherein completing the named entities in the preprocessed text data according to the parts of speech of the segmented words before and after the position of each named entity in the original text data further comprises:
in response to a punctuation mark being present between the named entity and the segmented word before or after its position in the original text data, prohibiting that segmented word from being merged with the named entity into one named entity.
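The completion rule, together with the punctuation restriction of claim 7, can be sketched as follows. The sketch assumes the original text has been segmented into `(word, tag)` pairs with punctuation kept as separate tokens, and that the entity occupies the half-open range `[start, end)`. Extending repeatedly while neighbors keep matching is an assumption of this sketch; the claims only speak of the segmented words before and after the entity. All names are hypothetical.

```python
import unicodedata
from typing import List, Tuple

Token = Tuple[str, str]  # (segmented word, part-of-speech tag)


def is_punct(word: str) -> bool:
    """True if word is non-empty and consists only of punctuation."""
    return bool(word) and all(
        unicodedata.category(c).startswith("P") for c in word
    )


def complete_entity(
    tokens: List[Token], start: int, end: int, entity_pos: str
) -> Tuple[int, int]:
    """Grow [start, end) over neighbors tagged entity_pos, stopping at punctuation."""
    # Extend to the left while the neighbor shares the entity's tag.
    while start > 0:
        word, pos = tokens[start - 1]
        if is_punct(word) or pos != entity_pos:
            break  # punctuation or a different tag blocks merging
        start -= 1
    # Extend to the right under the same rule.
    while end < len(tokens):
        word, pos = tokens[end]
        if is_punct(word) or pos != entity_pos:
            break
        end += 1
    return start, end
```

The completed entity text is then the concatenation of the words in the returned range, for example `"".join(w for w, _ in tokens[start:end])`.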
8. A neural network-based contract information extraction system, comprising:
a text processing module configured to extract original text data from the contract text and preprocess the original text data to obtain preprocessed text data;
a text recognition module configured to perform named entity recognition on the preprocessed text data through an NER entity recognition model to obtain the named entities in the preprocessed text data;
a text part-of-speech tagging module configured to identify the preprocessed text data through a part-of-speech recognition model to obtain the parts of speech of all segmented words in the preprocessed text data;
a text entity completion module configured to complete the named entities in the preprocessed text data according to the parts of speech of the segmented words before and after the position of each named entity in the original text data;
wherein the text entity completion module is further configured to: judge whether the part of speech of the segmented words before and after the position of the named entity in the original text data is the same as the part of speech of the named entity;
and, in response to the part of speech of those segmented words being the same as the part of speech of the named entity, combine those segmented words and the named entity into one named entity.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111016995.7A CN113553852B (en) 2021-08-31 2021-08-31 Contract information extraction method, system and storage medium based on neural network

Publications (2)

Publication Number Publication Date
CN113553852A (en) 2021-10-26
CN113553852B (en) 2023-06-20

Family

ID=78106333

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111016995.7A Active CN113553852B (en) 2021-08-31 2021-08-31 Contract information extraction method, system and storage medium based on neural network

Country Status (1)

Country Link
CN (1) CN113553852B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109166608A (en) * 2018-09-17 2019-01-08 新华三大数据技术有限公司 Electronic health record information extracting method, device and equipment
CN111144118A (en) * 2019-12-26 2020-05-12 携程计算机技术(上海)有限公司 Method, system, device and medium for identifying named entities in spoken text
CN111178079A (en) * 2019-12-31 2020-05-19 北京明略软件系统有限公司 Triple extraction method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8214346B2 (en) * 2008-06-27 2012-07-03 Cbs Interactive Inc. Personalization engine for classifying unstructured documents

Also Published As

Publication number Publication date
CN113553852A (en) 2021-10-26

Similar Documents

Publication Publication Date Title
US9384389B1 (en) Detecting errors in recognized text
CN108664474B (en) Resume analysis method based on deep learning
CN109670494B (en) Text detection method and system with recognition confidence
CN105260727A (en) Academic-literature semantic restructuring method based on image processing and sequence labeling
CN110909123B (en) Data extraction method and device, terminal equipment and storage medium
CN112580308A (en) Document comparison method and device, electronic equipment and readable storage medium
CN112766255A (en) Optical character recognition method, device, equipment and storage medium
CN113935710A (en) Contract auditing method and device, electronic equipment and storage medium
CN111178080A (en) Named entity identification method and system based on structured information
CN114003692A (en) Contract text information processing method and device, computer equipment and storage medium
CN112464927B (en) Information extraction method, device and system
CN113553852B (en) Contract information extraction method, system and storage medium based on neural network
CN112036330A (en) Text recognition method, text recognition device and readable storage medium
CN114579796B (en) Machine reading understanding method and device
CN116110066A (en) Information extraction method, device and equipment of bill text and storage medium
CN112348022B (en) Free-form document identification method based on deep learning
EP4167106A1 (en) Method and apparatus for data structuring of text
Agamamidi et al. Extraction of textual information from images using mobile devices
Desai et al. An approach for Text Recognition from Document Images
Shimomura et al. Construction of restoration system for old books written in braille
Banerjee et al. Quote examiner: verifying quoted images using web-based text similarity
KR102646428B1 (en) Method and apparatus for extracting similar letters using artificial intelligence learning model
CN113989822B (en) Picture table content extraction method based on computer vision and natural language processing
CN117807291B (en) Intelligent identification interaction processing method and platform for business materials
Kurhekar et al. Automated text and tabular data extraction from scanned document images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant