CN109190098A - Method and system for automatic document generation based on natural language processing - Google Patents
- Publication number
- CN109190098A (application CN201810928628.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method and system for automatic document generation based on natural language processing, capable of automatically generating report documents for a professional domain. The technical solution is as follows: the input original documents are automatically classified and processed according to their classification, yielding intermediate data and structured data respectively; word segmentation, entity recognition, relation extraction, event extraction, and knowledge base construction are performed on the intermediate data, and the extracted data are stored in a database as structured data; a document template is selected according to the output document type, the document is assembled from the obtained structured data, and the final target document is output.
Description
Technical field
The present invention relates to the field of automatic document generation, and in particular to automatic document generation for legal analysis.
Background
In the legal field, lawyers usually need to review a large volume of documents covering a legal entity's status, equity structure, business scope, business licenses, major assets and business contracts, litigation and arbitration cases, and so on. They confirm the facts through field investigation, interviews, and similar methods, and then write the corresponding legal report to support decision analysis.
To produce such an analysis report, a legal practitioner must manually analyze these documents, confirm the facts through field investigation and interviews, manually sort out and extract the key information, and then generate a report with the required conclusions. This approach depends on the practitioner's many years of accumulated industry experience, is difficult to apply at scale, requires information from across the whole domain, and has a high barrier to learning and wider adoption.
Summary of the invention
A brief summary of one or more aspects is given below to provide a basic understanding of these aspects. This summary is not an extensive overview of all contemplated aspects, and is intended neither to identify key or critical elements of all aspects nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description given later.
The purpose of the present invention is to solve the above problem by providing a method and system for automatic document generation based on natural language processing that can automatically generate report documents for a professional domain (for example, intelligent legal reports at the level of a junior legal analyst).
The technical solution of the present invention is as follows. The present invention discloses a method for automatic document generation based on natural language processing, comprising:
Step 1: automatically classifying the input original documents and processing them according to their classification, yielding intermediate data and structured data respectively;
Step 2: performing word segmentation, entity recognition, relation extraction, event extraction, and knowledge base construction on the intermediate data, and storing the extracted data in a database as structured data;
Step 3: selecting a document template according to the output document type, assembling the document from the obtained structured data, and outputting the final target document.
According to one embodiment of the method for automatic document generation based on natural language processing of the present invention, step 1 further comprises:
determining the data acquisition requirements;
obtaining the file type of each input original document, so that the different types of original documents can be distinguished;
determining whether the document is an image document; if not, first converting the original document into an image and then performing the subsequent steps; if so, performing the subsequent steps directly;
classifying the document based on image processing;
determining from the document classification whether the document is a fixed-format document; if so, performing information extraction on the fixed-format document based on machine learning to obtain structured data; if not, performing the subsequent steps;
determining whether the document supports direct text extraction; if so, obtaining the text content from the original document and storing it as intermediate data; if not, performing the subsequent steps;
recognizing the document, converting the text in the image into text format;
repairing the content of the recognized text based on natural language processing, and storing the repaired data as intermediate data.
According to one embodiment of the method for automatic document generation based on natural language processing of the present invention, step 2 further comprises:
performing word segmentation on the intermediate data;
performing entity recognition on the segmented data;
performing relation extraction on the data after entity recognition, obtaining the grammatical or semantic connections between entities in the text;
performing event extraction on the data after relation extraction, extracting the required event information of interest from text containing event information and presenting events expressed in natural language in structured form;
performing knowledge graph verification on the data after event extraction, building the relevant knowledge graph from the entities, relations, and events obtained from multiple documents, for mutual confirmation of information and automatic discovery of anomalous events;
forming structured data from the data after knowledge graph verification.
According to one embodiment of the method for automatic document generation based on natural language processing of the present invention, coreference resolution is also performed before relation extraction to improve the accuracy of the subsequent extraction results.
According to one embodiment of the method for automatic document generation based on natural language processing of the present invention, step 3 further comprises:
selecting, based on the structured data, a task tree path for generating the report according to the type of target document required;
performing the processing that corresponds to the current stage of the document: if the document is in an intermediate processing stage, automatically generating a draft document for the professional domain from the template; if the document is in the final output stage, automatically generating an official document for the professional domain from the template.
The present invention further discloses a system for automatic document generation based on natural language processing, the system comprising:
an original document processing module, which automatically classifies the input original documents and processes them according to their classification, yielding intermediate data and structured data respectively;
an intermediate data processing module, which performs word segmentation, entity recognition, relation extraction, event extraction, and knowledge base construction on the intermediate data, and stores the extracted data in a database as structured data; and
a target document generation module, which selects a document template according to the output document type, assembles the document from the obtained structured data, and outputs the final target document.
According to one embodiment of the system for automatic document generation based on natural language processing of the present invention, the original document processing module further comprises:
a requirements unit, which determines the data acquisition requirements;
a document type analysis unit, which obtains the file type of each input original document, so that the different types of original documents can be distinguished;
a document-to-image conversion unit, which first determines whether the document is an image document; if not, it converts the original document into an image before passing it to the subsequent units; if so, it passes the document directly to the subsequent units;
a document classification unit, which classifies the document based on image processing;
a fixed-format document information extraction unit, which first determines from the document classification whether the document is a fixed-format document; if so, it performs information extraction on the fixed-format document based on machine learning to obtain structured data; if not, it passes the document to the subsequent units;
a direct text extraction unit, which first determines whether the document supports direct text extraction; if so, it obtains the text content from the original document and stores it as intermediate data; if not, it passes the document to the subsequent units;
a text recognition unit, which recognizes the document, converting the text in the image into text format;
a content repair unit, which repairs the content of the recognized text based on natural language processing and stores the repaired data as intermediate data.
According to one embodiment of the system for automatic document generation based on natural language processing of the present invention, the intermediate data processing module further comprises:
a word segmentation unit, which performs word segmentation on the intermediate data;
an entity recognition unit, which performs entity recognition on the segmented data;
a relation extraction unit, which performs relation extraction on the data after entity recognition, obtaining the grammatical or semantic connections between entities in the text;
an event extraction unit, which performs event extraction on the data after relation extraction, extracting the required event information of interest from text containing event information and presenting events expressed in natural language in structured form;
a knowledge graph verification unit, which performs knowledge graph verification on the data after event extraction, building the relevant knowledge graph from the entities, relations, and events obtained from multiple documents, for mutual confirmation of information and automatic discovery of anomalous events;
a structured data storage unit, which stores the data after knowledge graph verification as structured data.
According to one embodiment of the system for automatic document generation based on natural language processing of the present invention, the intermediate data processing module further comprises:
a coreference resolution unit, which performs coreference resolution before relation extraction to improve the accuracy of the subsequent extraction results.
According to one embodiment of the system for automatic document generation based on natural language processing of the present invention, the target document generation module further comprises:
a template selection unit, which, based on the structured data, selects a task tree path for generating the report according to the type of target document required, including selecting the appropriate template;
a target document generation unit, which performs the processing that corresponds to the current stage of the document: if the document is in an intermediate processing stage, it automatically generates a draft document for the professional domain from the template; if the document is in the final output stage, it automatically generates an official document for the professional domain from the template.
The present invention also discloses a system for automatic document generation based on natural language processing, comprising:
a processor; and
a memory configured to store a series of computer-executable instructions and computer-accessible data associated with that series of computer-executable instructions,
wherein the series of computer-executable instructions, when executed by the processor, causes the processor to carry out the aforementioned method.
The present invention also discloses a non-transitory computer-readable storage medium, characterized in that a series of computer-executable instructions is stored on the non-transitory computer-readable storage medium, and the series of executable instructions, when executed by a computing device, causes the computing device to carry out the aforementioned method.
Compared with the prior art, the present invention has the following beneficial effects. The present invention combines professional domain knowledge (for example, legal knowledge) with technologies such as natural language processing. Through a combination of multiple paths, including classification of massive document sets, OCR extraction, NLP repair, segmentation of Chinese text and proprietary terms, entity recognition, event extraction, and template-based generation, it ultimately generates and applies intelligent reports at the level of a junior analyst in the professional domain (for example, intelligent legal reports at the level of a junior legal analyst). Intelligent legal reports can be widely applied in legal scenarios such as mergers and acquisitions, securities IPOs (Initial Public Offerings), financial institution loans, restructuring, and major asset transfers.
Brief description of the drawings
The above features and advantages of the present invention can be better understood after reading the detailed description of the embodiments of the disclosure in conjunction with the following drawings. In the drawings, components are not necessarily drawn to scale, and components with similar properties or features may have the same or similar reference numerals.
Fig. 1 shows a flow chart of an embodiment of the method for automatic document generation based on natural language processing of the present invention.
Fig. 2 shows a flow chart of step S1 in the embodiment shown in Fig. 1.
Fig. 3 shows a flow chart of step S2 in the embodiment shown in Fig. 1.
Fig. 4 shows a flow chart of step S3 in the embodiment shown in Fig. 1.
Fig. 5 shows a schematic diagram of an embodiment of the system for automatic document generation based on natural language processing of the present invention.
Fig. 6 shows a schematic diagram of the original document processing module in the embodiment shown in Fig. 5.
Fig. 7 shows a schematic diagram of the intermediate data processing module in the embodiment shown in Fig. 5.
Fig. 8 shows a schematic diagram of the target document generation module in the embodiment shown in Fig. 5.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and specific embodiments. Note that the aspects described below with reference to the drawings and specific embodiments are merely exemplary and should not be understood as limiting the scope of protection of the present invention.
Fig. 1 shows the flow of an embodiment of the method for automatic document generation based on natural language processing of the present invention. Referring to Fig. 1, the implementation steps of the method of this embodiment are detailed as follows.
Step S1: automatically classify the input original documents and process them according to their classification, yielding intermediate data and structured data respectively.
Specifically, Fig. 2 shows the detailed processing of step S1.
Step S101: determine the data acquisition requirements.
Based on the scenario of the professional domain (for example, the legal field), the expert team and the product team discuss and determine the specific key data acquisition requirements.
Step S102: analyze and determine the original document type.
From the input original documents, the file type of each original document is obtained, so that the different types of original documents can be distinguished.
Step S103: determine whether the document is an image document.
Based on the original document type obtained, determine whether the document is an image document. If it is, go to step S105; if it is not, go to step S104.
Step S104: convert the original document into an image.
The conversion process generally includes correcting the original document, including offset correction, denoising, pooling, and so on, and finally outputs the converted image document.
Step S105: classify the document based on image processing.
Document classification in this step uses a convolutional neural network model from deep learning: a document recognition model is built from a multilayer convolutional neural network, the image document is then fed into the network for recognition, and once recognition is complete the document is assigned to its category.
Step S106: determine from the document classification whether the document is a fixed-format document. If it is, go to step S112; if it is not, go to step S107.
In the legal field, fixed-format documents include business licenses, tax registration certificates, patent certificates, identity cards, and so on.
Step S107: determine whether the document supports direct text extraction. If it does, go to step S110; if it does not, go to step S108.
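The branching in steps S103 to S107 can be summarized as a small routing function. The following is only a minimal sketch: the `DocInfo` class, its three boolean fields, and the `route` function are hypothetical names introduced here for illustration, and the three decisions are assumed to have already been made by the upstream classifiers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocInfo:
    # Hypothetical flags standing in for the decisions made in S103, S106, S107.
    is_image: bool          # S103: is the original already an image document?
    is_fixed_format: bool   # S106: classified as a fixed-format document?
    has_text_layer: bool    # S107: does it support direct text extraction?


def route(doc: DocInfo) -> List[str]:
    """Return the ordered step labels a document passes through in step S1."""
    steps = ["S101", "S102", "S103"]
    if not doc.is_image:
        steps.append("S104")       # convert to an image first
    steps += ["S105", "S106"]
    if doc.is_fixed_format:
        steps.append("S112")       # ML-based field extraction -> structured data
        return steps
    steps.append("S107")
    if doc.has_text_layer:
        steps.append("S110")       # direct text extraction
    else:
        steps += ["S108", "S109"]  # OCR, then NLP-based repair
    steps.append("S111")           # intermediate data is available
    return steps
```

For example, a scanned business license would end at S112, while a Word contract would pass through S104, S110, and S111.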
Step S108: perform OCR on the document.
For documents whose text content cannot be obtained directly, such as scanned PDFs and pictures, OCR is performed to recognize the text portions of the original document.
OCR stands for Optical Character Recognition: a technology that optically converts the text of a paper document into a black-and-white dot-matrix image file, then uses recognition software to convert the text in the image into text format for further editing and processing by word-processing software.
Step S109: repair the OCR-recognized content based on NLP.
Because of image quality and other factors, the text recognized by OCR contains some errors. The recognition problem is combined with an NLP model: transition probabilities are computed over the OCR-recognized text, words are then judged according to semantics, and words with a very low transition probability or judged semantically wrong are corrected, improving the accuracy of the text conversion.
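As an illustration of the transition-probability part of this repair, the sketch below trains character bigram probabilities on a small reference corpus and replaces a character whose transition from its predecessor is very unlikely with a more probable look-alike. This is an assumed simplification, not the patent's model: the function names, the `confusions` table of visually similar characters, and the `threshold` value are all hypothetical, and the semantic judgment described above is not covered.

```python
from collections import Counter


def train_bigrams(corpus):
    """Count character pairs and how often each character starts a pair."""
    pair_counts, char_counts = Counter(), Counter()
    for sentence in corpus:
        for a, b in zip(sentence, sentence[1:]):
            pair_counts[(a, b)] += 1
            char_counts[a] += 1
    return pair_counts, char_counts


def transition_prob(a, b, pair_counts, char_counts):
    """Estimate P(b follows a); 0.0 if a was never seen as a predecessor."""
    if char_counts[a] == 0:
        return 0.0
    return pair_counts[(a, b)] / char_counts[a]


def repair(text, confusions, pair_counts, char_counts, threshold=0.05):
    """Replace low-probability characters with a likelier confusable character."""
    chars = list(text)
    for i in range(1, len(chars)):
        prev, cur = chars[i - 1], chars[i]
        current_p = transition_prob(prev, cur, pair_counts, char_counts)
        if current_p >= threshold:
            continue
        # Pick the candidate with the highest transition probability, if any.
        best = max(
            confusions.get(cur, []),
            key=lambda c: transition_prob(prev, c, pair_counts, char_counts),
            default=None,
        )
        if best is not None and transition_prob(prev, best, pair_counts, char_counts) > current_p:
            chars[i] = best
    return "".join(chars)
```

A production system would typically use a much larger language model and word-level context; the bigram table here only makes the idea of "low transition probability" concrete.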
NLP stands for Natural Language Processing. It is a subfield of artificial intelligence (AI) and an interdisciplinary field that combines artificial intelligence with the theory and techniques of linguistics and computer science. It includes techniques such as word segmentation, part-of-speech tagging, entity recognition, keyword extraction, dependency parsing, time phrase recognition, clustering, and reasoning, and has already been applied successfully in fields such as recommender systems, public opinion monitoring, and voice interaction. The present invention applies natural language processing to professional domains such as legal document analysis: it processes the massive information of an enterprise, extracts the data that lawyers care about, and confirms the facts and outputs documents accordingly, reducing unnecessary manual work as far as possible and helping lawyers work more efficiently.
Step S110: for documents that support direct access to their text content (for example Word, Excel, and text-based PDF files), read the original document, obtain the text content, and store it as intermediate data.
Step S111: direct text extraction from the document, or OCR followed by NLP content repair, produces the initial usable valid data, which already has preliminary value for analysis and research.
Step S112: perform information extraction on the fixed-format document based on machine learning.
For documents already determined to be of a similar fixed format, a convolutional neural network model from machine learning is used: a recognition model for the specific document format is built from a multilayer convolutional neural network, the fixed-format document is fed into the network to obtain the images of specific regions, text recognition is then performed on those region images, and the recognized text is output as structured data.
Step S2: perform word segmentation (regular-expression matching), entity recognition, coreference resolution, relation extraction, event extraction, and knowledge base construction on the intermediate data, and store the extracted data in a database as structured data.
Specifically, Fig. 3 shows the detailed processing of step S2.
Step S201: perform word segmentation on the intermediate data.
Intermediate data are the valid data obtained after processing the original documents. Part of these data is text obtained directly from the original documents, and part is text recognized by OCR and then repaired by NLP; together they form the input to intermediate data processing.
Word segmentation uses the segmentation techniques of natural language processing (including segmentation into characters, words, and phrases). This embodiment combines the forward maximum matching method and the backward maximum matching method into a bidirectional matching method to improve segmentation accuracy. This embodiment can prepare in advance a basic general-purpose dictionary and a specialized term dictionary for the professional domain (for example, the legal industry), which helps improve the segmentation of professional-domain documents such as legal documents.
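A minimal sketch of bidirectional maximum matching follows. The text does not say how the forward and backward results are reconciled, so the tie-breaking rule used here (fewer tokens, then fewer single-character tokens, then the backward result) is a commonly used heuristic and an assumption, as are the function names and the tiny sample dictionaries.

```python
def forward_max_match(text, vocab, max_len):
    """Greedy longest match scanning left to right."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:   # single characters always match
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len):
    """Greedy longest match scanning right to left."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def bidirectional_match(text, vocab):
    """Run both directions and keep the result the heuristic prefers."""
    max_len = max((len(word) for word in vocab), default=1)
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd

    def singles(tokens):
        return sum(len(token) == 1 for token in tokens)

    # On a full tie the backward result is returned.
    return fwd if singles(fwd) < singles(bwd) else bwd


# Hypothetical sample dictionaries: a general one plus a legal-domain one.
general_dict = {"研究", "研究生", "生命", "起源"}
legal_dict = {"营业执照"}
vocab = general_dict | legal_dict
```

With these dictionaries, forward matching splits "研究生命的起源" into 研究生 / 命 / 的 / 起源, backward matching into 研究 / 生命 / 的 / 起源, and the heuristic keeps the backward result because it contains fewer single-character tokens.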
Step S202: entity recognition.
Named entity recognition (NER) is the basic work of information extraction. Its task is to identify items such as Party A, Party B, the subject matter, the amount, the liability for breach of contract, and the time in a contract text, or the date, accounting firm name, report number, and capital verification result in a capital verification report, and to add the corresponding annotation information to them, which facilitates the subsequent information extraction work.
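To make the labeling task concrete, the sketch below tags a few contract fields with regular expressions. It is a deliberately simple stand-in: the label names, the patterns, and the English sample clause format are hypothetical, and the embodiment does not state which NER technique it uses (practical systems often rely on trained sequence-labeling models).

```python
import re

# Hypothetical patterns for a semicolon-separated English contract summary.
PATTERNS = {
    "PARTY_A": re.compile(r"Party A:\s*([^;\n]+)"),
    "PARTY_B": re.compile(r"Party B:\s*([^;\n]+)"),
    "AMOUNT": re.compile(r"RMB\s*([\d,]+(?:\.\d+)?)"),
    "DATE": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
}


def tag_entities(text):
    """Return (label, text) pairs ordered by their position in the input."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(1), label, match.group(1).strip()))
    return [(label, value) for _, label, value in sorted(found)]


sample = "Party A: Acme Ltd; Party B: Beta LLC; Amount: RMB 1,200,000; Signed on 2018-08-15"
entities = tag_entities(sample)
```

Each pair carries the annotation (the label) alongside the recognized text, which is the form the later extraction steps consume.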
Step S203: coreference resolution.
Reference is a common linguistic phenomenon, generally divided into two kinds: anaphora and coreference. Anaphora means that the current anaphor has a close semantic association with a word, phrase, or sentence that appeared earlier; coreference mainly means that multiple nouns (including pronouns and noun phrases) point to the same referent in the real world. Coreference resolution can simplify and unify the way entities are expressed, and greatly helps improve the accuracy of information extraction results.
Step S204: relation extraction.
The role of relation extraction is to obtain the grammatical or semantic connections between entities in the text; it is a key link in information extraction. Pattern matching, dictionary-driven methods, and the machine-learning MBL and SVM methods are used together; the results of the multiple methods are then evaluated, and the best result is output.
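The embodiment does not specify how the results of the different methods are compared. One possible reading, sketched below purely as an assumption, is agreement voting: each extractor proposes (subject, relation, object) triples and only triples proposed by enough extractors are kept. The two toy extractors, the relation labels, and the `min_votes` parameter are hypothetical; the MBL and SVM classifiers are not reproduced.

```python
import re
from collections import Counter


def pattern_extractor(text):
    """Pattern matching: '<X> holds <N>% of <Y>' yields a shareholder relation."""
    return {
        (m.group(1), "shareholder_of", m.group(2))
        for m in re.finditer(r"(\w+) holds \d+% of (\w+)", text)
    }


def dictionary_extractor(text, lexicon):
    """Dictionary-driven: a trigger word links the word before it to the last word."""
    triples = set()
    for sentence in re.split(r"[.;]\s*", text):
        words = sentence.split()
        for i, word in enumerate(words):
            if word in lexicon and 0 < i < len(words) - 1:
                triples.add((words[i - 1], lexicon[word], words[-1]))
    return triples


def vote(candidate_sets, min_votes=2):
    """Keep triples proposed by at least min_votes extractors."""
    counts = Counter(triple for triples in candidate_sets for triple in triples)
    return {triple for triple, n in counts.items() if n >= min_votes}


text = "Alpha holds 60% of Beta. Gamma controls Beta."
lexicon = {"holds": "shareholder_of", "controls": "controls"}
candidates = [pattern_extractor(text), dictionary_extractor(text, lexicon)]
agreed = vote(candidates, min_votes=2)
```

Raising `min_votes` favors precision and lowering it favors recall; another reasonable design would score each method on labeled validation data and keep the output of the best-scoring one.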
Step S205: event extraction.
In information extraction, an event is something that occurs within a specific time span and geographic area, involves one or more participants, and consists of one or more actions; it is usually at the sentence level. The main goal of event extraction is to extract the required event information of interest from text containing event information and to present events expressed in natural language in structured form.
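The idea of presenting a natural-language event in structured form can be sketched with a small record type and a trigger-word lookup. Everything named here (the `Event` fields, the `TRIGGERS` table, the sentence pattern) is a hypothetical illustration rather than the extraction model used in the embodiment.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    event_type: str
    time: Optional[str]
    participants: List[str]
    trigger: str


# Hypothetical trigger words mapped to event types.
TRIGGERS = {"acquired": "acquisition", "sued": "litigation"}


def extract_event(sentence: str) -> Optional[Event]:
    """Turn '<A> <trigger> <B> on <YYYY-MM-DD>.' into a structured Event."""
    date = re.search(r"\d{4}-\d{2}-\d{2}", sentence)
    for trigger, event_type in TRIGGERS.items():
        before, sep, after = sentence.partition(f" {trigger} ")
        if not sep:
            continue  # this trigger word does not occur in the sentence
        # Drop the trailing date phrase and punctuation from the object.
        obj = re.split(r" on \d{4}-\d{2}-\d{2}", after)[0].rstrip(". ")
        return Event(
            event_type=event_type,
            time=date.group(0) if date else None,
            participants=[before.strip(), obj],
            trigger=trigger,
        )
    return None


event = extract_event("Alpha Ltd acquired Beta LLC on 2018-08-15.")
```

The resulting record has fixed fields (type, time, participants, trigger) that can be stored in a database row, which is what "structured form" amounts to in practice.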
Step S206: knowledge graph verification.
Knowledge graph verification builds the relevant knowledge graph from the entities, relations, events, and other information obtained from multiple documents, for mutual confirmation of information and automatic discovery of anomalous events. In the legal field, examples include discovering related-party transactions by combining multi-level shareholder information with contract information, and detecting missing supporting documents for property ownership certificates.
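A minimal sketch of cross-document confirmation and anomaly discovery follows. It assumes, for illustration only, that each (subject, predicate) pair should have a single value, such as a company's legal representative or registered capital; the function names, predicate names, and sample facts are hypothetical.

```python
from collections import defaultdict


def build_graph(facts):
    """facts: iterable of (source_document, subject, predicate, value)."""
    graph = defaultdict(lambda: defaultdict(set))
    for source, subject, predicate, value in facts:
        graph[(subject, predicate)][value].add(source)
    return graph


def verify(graph):
    """Split facts into those confirmed by 2+ documents and conflicting ones."""
    confirmed, anomalies = [], []
    for (subject, predicate), values in graph.items():
        if len(values) > 1:
            # Different documents disagree on a single-valued attribute.
            anomalies.append((subject, predicate, sorted(values)))
            continue
        value, sources = next(iter(values.items()))
        if len(sources) >= 2:
            confirmed.append((subject, predicate, value))
    return confirmed, anomalies


facts = [
    ("license.pdf", "Alpha Ltd", "legal_representative", "Zhang San"),
    ("contract.docx", "Alpha Ltd", "legal_representative", "Zhang San"),
    ("license.pdf", "Alpha Ltd", "registered_capital", "10000000"),
    ("audit.pdf", "Alpha Ltd", "registered_capital", "8000000"),
]
confirmed, anomalies = verify(build_graph(facts))
```

Multi-valued relations such as shareholders would need a different rule, and checks like related-party transaction discovery require traversing paths in the graph rather than comparing values for a single key.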
Mapping knowledge domains abbreviation KG, full name Knowledge Graph/Vault are explicit knowledge's development process and structure
A series of a variety of different figures of relationship, describe knowledge resource and its carrier with visualization technique, excavate, analyze, construct, draw
System and explicit knowledge and connecting each other between them.
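As a non-limiting illustration of the related-party example, the following Python sketch treats shareholding facts extracted from several documents as directed graph edges and flags any transaction whose two parties share a direct or indirect shareholder. The entity names, the edge list, and the rule "a common ancestor means related parties" are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_parent_index(
    holdings: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Map each company to its direct shareholders (holder -> company edges)."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for holder, company in holdings:
        parents[company].add(holder)
    return parents


def ancestors(entity: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """All direct and indirect shareholders of an entity (multi-level)."""
    seen: Set[str] = set()
    stack = [entity]
    while stack:
        node = stack.pop()
        for holder in parents.get(node, ()):
            if holder not in seen:
                seen.add(holder)
                stack.append(holder)
    return seen


def find_related_party_transactions(
    holdings: Iterable[Tuple[str, str]],
    transactions: Iterable[Tuple[str, str]],
) -> List[Tuple[str, str, List[str]]]:
    """Flag transactions whose parties share a shareholder at any level."""
    parents = build_parent_index(holdings)
    flagged = []
    for buyer, seller in transactions:
        buyer_group = ancestors(buyer, parents) | {buyer}
        seller_group = ancestors(seller, parents) | {seller}
        common = buyer_group & seller_group
        if common:
            flagged.append((buyer, seller, sorted(common)))
    return flagged


# Invented facts, as if collected from different source documents.
HOLDINGS = [
    ("HoldCo", "SubA"),
    ("SubA", "OpCo1"),
    ("HoldCo", "OpCo2"),
    ("Other", "OpCo3"),
]
TRANSACTIONS = [("OpCo1", "OpCo2"), ("OpCo1", "OpCo3")]

flagged = find_related_party_transactions(HOLDINGS, TRANSACTIONS)
```

Here the first transaction is flagged because both parties trace back to `HoldCo`, which only becomes visible once shareholder information from multiple levels is combined.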
Step S207: form structured data.
The structured data contains the required extracted information, which is used later for automatic document generation.
Step S3: select a document template according to the type of document to be output, assemble the document by combining the template with the obtained structured data, and output the final document.
Specifically, Fig. 4 shows the detailed processing of step S3.
Step S301: determine the target document type.
Based on the structured data, a different task tree path is selected to generate the report according to the target document type to be output.
In this embodiment, the types are Excel documents, Word documents, and PPT documents. If the target output type is a Word document, the corresponding Word document template is selected. If the target output type is an Excel document, the corresponding Excel document template is selected. If the target output type is a PPT document, the corresponding PPT document template is selected.
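As a non-limiting illustration, the type-to-template selection of step S301 could be sketched in Python as a simple lookup. The `DocType` enumeration and the template paths are hypothetical names introduced for this example.

```python
from enum import Enum


class DocType(Enum):
    WORD = "word"
    EXCEL = "excel"
    PPT = "ppt"


# Hypothetical template locations; the source does not name any files.
TEMPLATES = {
    DocType.WORD: "templates/legal_report.docx",
    DocType.EXCEL: "templates/legal_report.xlsx",
    DocType.PPT: "templates/legal_report.pptx",
}


def select_template(type_name: str) -> str:
    """Return the template path for a target type such as 'Word' or 'PPT'.

    Raises ValueError for a type outside the three supported ones.
    """
    doc_type = DocType(type_name.strip().lower())
    return TEMPLATES[doc_type]


word_template = select_template("Word")
```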
Step S302: determine the document processing stage.
The current document generation stage is determined: the document may be in an intermediate processing stage or in the final formal output stage. If it is in an intermediate processing stage, go to step S304; if it is in the final formal output stage, go to step S303.
Step S303: when the document is in the final output stage, automatically generate the formal document for the professional field according to the template.
In this embodiment, the formal document of a legal report is generated automatically. A legal report is a comprehensive written document through which a lawyer provides legal services; its content includes the legal basis, legal advice, and solutions provided to the client. Legal reports are widely used in mergers and acquisitions, securities IPOs (initial public offerings), loans from financial institutions, corporate restructuring, major asset transfers, and similar matters.
Automatic report generation is a technique that fills the layout of a document template with the acquired information to automatically produce the required report; it is widely used across industries.
Step S304: when the document is in an intermediate processing stage, automatically generate the draft document for the professional field according to the template.
In this embodiment, the draft document of a legal report is generated automatically.
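As a non-limiting illustration, steps S302 to S304 could be sketched in Python with the standard `string.Template` class. The template text and field names are invented for this example. The policy that a draft tolerates missing fields while a formal document requires all of them is an assumption of the sketch, not something stated in this embodiment.

```python
from string import Template
from typing import Dict

# Hypothetical plain-text template; real templates would be Word, Excel, or PPT files.
REPORT_TEMPLATE = Template(
    "$header\nClient: $client\nMatter: $matter\nOpinion: $opinion\n"
)


def generate_report(data: Dict[str, str], final: bool) -> str:
    """Fill the template with structured data for the current stage.

    Draft stage (S304): unfilled placeholders are left visible in the text.
    Final stage (S303): a missing field raises KeyError.
    """
    if final:
        return REPORT_TEMPLATE.substitute(header="LEGAL REPORT", **data)
    return REPORT_TEMPLATE.safe_substitute(header="LEGAL REPORT (DRAFT)", **data)


draft = generate_report({"client": "Acme"}, final=False)
formal = generate_report(
    {"client": "Acme", "matter": "Asset transfer", "opinion": "No objection"},
    final=True,
)
```

Leaving placeholders such as `$matter` visible in the draft makes it easy to see which information still has to be extracted before the formal document can be produced.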
Fig. 5 shows the principle of an embodiment of the automatic document generation system based on natural language processing of the present invention. Referring to Fig. 5, the system of this embodiment includes an original document processing module, an intermediate data processing module, and a target document automatic generation module.
The original document processing module automatically classifies the input original documents and processes them according to their classification to obtain intermediate data and structured data, respectively.
The intermediate data processing module performs word segmentation, entity recognition, relation extraction, event extraction, and knowledge base construction on the intermediate data; the extracted data is stored in a database as structured data.
The target document automatic generation module selects a document template according to the type of document to be output, assembles the document by combining the template with the obtained structured data, and outputs the final target document.
As shown in Fig. 6, the original document processing module of this embodiment includes a requirement definition unit, a document type analysis unit, a document imaging unit, a document classification unit, a fixed-format document information extraction unit, a direct text content extraction unit, a text recognition unit, and a content repair unit.
The requirement definition unit determines the data acquisition requirements.
The document type analysis unit obtains the file type of each input original document, so that the different kinds of original documents can be distinguished.
The document imaging unit first determines whether the document is an image document. If it is not, the unit converts the original document into images before the subsequent units process it; if it is, the subsequent units process it directly.
The document classification unit classifies documents based on image processing.
The fixed-format document information extraction unit first determines from the document classification whether the document is a fixed-format document. If it is, the unit performs machine-learning-based information extraction on it to obtain structured data; if it is not, processing continues with the subsequent units.
The direct text content extraction unit first determines whether the document supports direct text extraction. If it does, the unit obtains the text content from the original document and stores it as intermediate data; if it does not, processing continues with the subsequent units.
The text recognition unit recognizes the document and converts the text in the images into text format.
The content repair unit repairs the content of the recognized text based on natural language processing; the repaired data is stored as intermediate data.
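As a non-limiting illustration, the decision flow across these units could be sketched in Python as follows. The `SourceDocument` fields, the category names, and the three helper functions are stand-ins invented for this example: the `ocr_output` field replaces a real recognition engine, and a small correction table replaces the natural-language-processing-based repair. The imaging and classification steps are assumed to have run already.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class SourceDocument:
    name: str
    category: str         # label assumed to come from the document classification unit
    text_layer: str = ""  # directly extractable text; empty if there is none
    ocr_output: str = ""  # stands in for what a recognition engine would return


# Hypothetical categories treated as fixed-format documents.
FIXED_FORMAT_CATEGORIES = {"business_license", "id_card"}
# Toy correction table standing in for NLP-based content repair.
OCR_CORRECTIONS = {"c0ntract": "contract"}


def extract_fixed_format(doc: SourceDocument) -> Dict[str, str]:
    """Placeholder for machine-learning-based field extraction."""
    return {"source": doc.name, "category": doc.category}


def recognize_text(doc: SourceDocument) -> str:
    """Placeholder for text recognition on the document image."""
    return doc.ocr_output


def repair_text(text: str) -> str:
    """Replace known misrecognized words with their corrections."""
    return " ".join(OCR_CORRECTIONS.get(word, word) for word in text.split())


def process_original(doc: SourceDocument) -> Tuple[str, Union[Dict[str, str], str]]:
    """Route a document and return (kind of result, result)."""
    if doc.category in FIXED_FORMAT_CATEGORIES:
        return "structured", extract_fixed_format(doc)
    if doc.text_layer:
        return "intermediate", doc.text_layer
    return "intermediate", repair_text(recognize_text(doc))


licence = process_original(SourceDocument("lic.png", "business_license"))
typed = process_original(SourceDocument("a.docx", "contract", text_layer="Contract text"))
scanned = process_original(SourceDocument("scan.png", "contract", ocr_output="c0ntract text"))
```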
As shown in Fig. 7, the intermediate data processing module of this embodiment includes a word segmentation unit, an entity recognition unit, a coreference resolution unit, a relation extraction unit, an event extraction unit, a knowledge graph verification unit, and a structured data storage unit.
The word segmentation unit performs word segmentation on the intermediate data.
The entity recognition unit performs entity recognition on the segmented data.
The coreference resolution unit performs coreference resolution before relation extraction, to improve the accuracy of the subsequent extraction results.
The relation extraction unit performs relation extraction on the data after entity recognition, obtaining the grammatical or semantic connections that exist between entities in the text.
The event extraction unit performs event extraction on the data after relation extraction: it extracts the event information of interest from text that contains event information and presents events expressed in natural language in a structured form.
The knowledge graph verification unit performs knowledge graph verification on the data after event extraction: it builds a knowledge graph from the entities, relations, and events already obtained from multiple documents, for mutual corroboration of information and automatic discovery of anomalous events.
The structured data storage unit stores the data after knowledge graph verification as structured data.
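As a non-limiting illustration, chaining such units into one pipeline could be sketched in Python as a list of stage functions applied in order. Only four toy stages are shown (segmentation, entity recognition, relation extraction, storage); coreference resolution, event extraction, and knowledge graph verification are omitted, and the heuristics (whitespace splitting, capitalized words as entities) are assumptions that only suit simple English input.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def segment(state: State) -> State:
    """Toy word segmentation: split on whitespace and strip punctuation."""
    tokens = [t.strip(".,") for t in state["text"].split()]
    return {**state, "tokens": tokens}


def recognize_entities(state: State) -> State:
    """Toy entity recognition: capitalized tokens are treated as entities."""
    entities = [t for t in state["tokens"] if t[:1].isupper()]
    return {**state, "entities": entities}


def extract_relations(state: State) -> State:
    """Toy relation extraction: entity, single word, entity."""
    tokens, entities = state["tokens"], state["entities"]
    relations = [
        (tokens[i - 1], tokens[i], tokens[i + 1])
        for i in range(1, len(tokens) - 1)
        if tokens[i - 1] in entities
        and tokens[i + 1] in entities
        and tokens[i] not in entities
    ]
    return {**state, "relations": relations}


def to_structured(state: State) -> State:
    """Keep only the fields that would be stored as structured data."""
    structured = {"entities": state["entities"], "relations": state["relations"]}
    return {**state, "structured": structured}


STAGES: List[Callable[[State], State]] = [
    segment,
    recognize_entities,
    extract_relations,
    to_structured,
]


def run_pipeline(text: str) -> Dict[str, Any]:
    """Apply every stage in order to the intermediate text."""
    state: State = {"text": text}
    for stage in STAGES:
        state = stage(state)
    return state["structured"]


result = run_pipeline("Acme acquired Beta.")
```

Because every stage takes and returns the same kind of state, an additional unit can be slotted in by inserting one more function into `STAGES`.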
As shown in Fig. 8, the target document automatic generation module of this embodiment includes a template selection unit and a target document generation unit.
The template selection unit, based on the structured data, selects a different task tree path to generate the report according to the target document type to be output, including selecting a different template.
The target document generation unit performs the processing that corresponds to the current processing stage of the document: if the document is in an intermediate processing stage, the draft document for the professional field is generated automatically according to the template; if the document is in the final output stage, the formal document for the professional field is generated automatically according to the template.
In addition, the present invention discloses an automatic document generation system based on natural language processing, comprising a processor and a memory. The memory is configured to store a series of computer-executable instructions and computer-accessible data associated with those instructions, wherein the instructions, when executed by the processor, cause the processor to carry out the foregoing method.
The present invention also discloses a non-transitory computer-readable storage medium storing a series of computer-executable instructions which, when executed by a computing device, cause the computing device to carry out the foregoing method.
The specific implementation of the method has been described in detail in the foregoing embodiments and is not repeated here.
In addition to the automatic generation of legal-industry reports described in the foregoing embodiments, the invention can also be applied to the news field. Building on a news mining and search system, a news collection is organized by news topic and entity; news topic mining, analysis of relations between news entities, and extraction of topic sentences are performed on news events to obtain statistics and semantic descriptions for a large number of news events. The information mined from the news collection is described in the form of charts, tables, and text paragraphs, which serve as material for a roundup report. Finally, the material is organized according to the writing style of a roundup report, and a concise, objective, multi-perspective news roundup report is generated automatically.
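As a non-limiting illustration, turning mined topic statistics into table and text material could be sketched in Python as follows. The article records, the `topic` field, and the output wording are invented for this example; topic mining itself is assumed to have been done already.

```python
from collections import Counter
from typing import Dict, List


def topic_table(articles: List[Dict[str, str]], top_n: int = 3) -> str:
    """Render the most frequent topics as a small pipe-delimited table."""
    counts = Counter(article["topic"] for article in articles)
    rows = ["Topic | Articles", "---|---"]
    rows += [f"{topic} | {n}" for topic, n in counts.most_common(top_n)]
    return "\n".join(rows)


def lead_sentence(articles: List[Dict[str, str]]) -> str:
    """Describe the dominant topic in one sentence."""
    counts = Counter(article["topic"] for article in articles)
    topic, n = counts.most_common(1)[0]
    return f"Of {len(articles)} articles, {n} concern the topic '{topic}'."


# Invented sample of already-labeled news articles.
ARTICLES = [
    {"title": "A", "topic": "merger"},
    {"title": "B", "topic": "merger"},
    {"title": "C", "topic": "ipo"},
    {"title": "D", "topic": "merger"},
    {"title": "E", "topic": "ipo"},
    {"title": "F", "topic": "loan"},
]

table = topic_table(ARTICLES, top_n=2)
sentence = lead_sentence(ARTICLES)
```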
Although the above methods are illustrated and described as a series of actions for simplicity of explanation, it should be understood and appreciated that these methods are not limited by the order of the actions, since, in accordance with one or more embodiments, some actions may occur in different orders and/or concurrently with other actions that are illustrated and described herein, or that are not illustrated and described herein but would be understood by those skilled in the art.
Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software as a computer program product, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Any connection is also properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (12)
1. An automatic document generation method based on natural language processing, characterized by comprising:
Step 1: automatically classifying the input original documents and processing them according to their classification to obtain intermediate data and structured data, respectively;
Step 2: performing word segmentation, entity recognition, relation extraction, event extraction, and knowledge base construction on the intermediate data, and storing the extracted data in a database as structured data;
Step 3: selecting a document template according to the type of document to be output, assembling the document by combining the template with the obtained structured data, and outputting the final target document.
2. The automatic document generation method based on natural language processing according to claim 1, characterized in that step 1 further comprises:
determining the data acquisition requirements;
obtaining the file type of each input original document, so that the different kinds of original documents can be distinguished;
determining whether the document is an image document; if not, first converting the original document into images and then proceeding to the subsequent steps; if so, proceeding directly to the subsequent steps;
classifying the document based on image processing;
determining from the document classification whether the document is a fixed-format document; if so, performing machine-learning-based information extraction on the fixed-format document to obtain structured data; if not, proceeding to the subsequent steps;
determining whether the document supports direct text extraction; if so, obtaining the text content from the original document and storing it as intermediate data; if not, proceeding to the subsequent steps;
recognizing the document and converting the text in the images into text format;
repairing the content of the recognized text based on natural language processing, and storing the repaired data as intermediate data.
3. The automatic document generation method based on natural language processing according to claim 1, characterized in that step 2 further comprises:
performing word segmentation on the intermediate data;
performing entity recognition on the segmented data;
performing relation extraction on the data after entity recognition, to obtain the grammatical or semantic connections that exist between entities in the text;
performing event extraction on the data after relation extraction, extracting the event information of interest from text that contains event information and presenting events expressed in natural language in a structured form;
performing knowledge graph verification on the data after event extraction, building a knowledge graph from the entities, relations, and events already obtained from multiple documents, for mutual corroboration of information and automatic discovery of anomalous events;
forming structured data from the data after knowledge graph verification.
4. The automatic document generation method based on natural language processing according to claim 3, characterized in that coreference resolution is also performed before relation extraction, to improve the accuracy of the subsequent extraction results.
5. The automatic document generation method based on natural language processing according to claim 1, characterized in that step 3 further comprises:
based on the structured data, selecting a different task tree path to generate the report according to the target document type to be output;
performing the processing that corresponds to the current processing stage of the document: if the document is in an intermediate processing stage, automatically generating the draft document for the professional field according to the template; if the document is in the final output stage, automatically generating the formal document for the professional field according to the template.
6. An automatic document generation system based on natural language processing, characterized in that the system comprises:
an original document processing module, which automatically classifies the input original documents and processes them according to their classification to obtain intermediate data and structured data, respectively;
an intermediate data processing module, which performs word segmentation, entity recognition, relation extraction, event extraction, and knowledge base construction on the intermediate data, the extracted data being stored in a database as structured data; and
a target document automatic generation module, which selects a document template according to the type of document to be output, assembles the document by combining the template with the obtained structured data, and outputs the final target document.
7. The automatic document generation system based on natural language processing according to claim 6, characterized in that the original document processing module further comprises:
a requirement definition unit, which determines the data acquisition requirements;
a document type analysis unit, which obtains the file type of each input original document, so that the different kinds of original documents can be distinguished;
a document imaging unit, which first determines whether the document is an image document; if not, it first converts the original document into images before the subsequent units process it; if so, the subsequent units process it directly;
a document classification unit, which classifies documents based on image processing;
a fixed-format document information extraction unit, which first determines from the document classification whether the document is a fixed-format document; if so, it performs machine-learning-based information extraction on the fixed-format document to obtain structured data; if not, processing continues with the subsequent units;
a direct text content extraction unit, which first determines whether the document supports direct text extraction; if so, it obtains the text content from the original document and stores it as intermediate data; if not, processing continues with the subsequent units;
a text recognition unit, which recognizes the document and converts the text in the images into text format;
a content repair unit, which repairs the content of the recognized text based on natural language processing, the repaired data being stored as intermediate data.
8. The automatic document generation system based on natural language processing according to claim 6, characterized in that the intermediate data processing module further comprises:
a word segmentation unit, which performs word segmentation on the intermediate data;
an entity recognition unit, which performs entity recognition on the segmented data;
a relation extraction unit, which performs relation extraction on the data after entity recognition, to obtain the grammatical or semantic connections that exist between entities in the text;
an event extraction unit, which performs event extraction on the data after relation extraction, extracting the event information of interest from text that contains event information and presenting events expressed in natural language in a structured form;
a knowledge graph verification unit, which performs knowledge graph verification on the data after event extraction, building a knowledge graph from the entities, relations, and events already obtained from multiple documents, for mutual corroboration of information and automatic discovery of anomalous events;
a structured data storage unit, which stores the data after knowledge graph verification as structured data.
9. The automatic document generation system based on natural language processing according to claim 8, characterized in that the intermediate data processing module further comprises:
a coreference resolution unit, which performs coreference resolution before relation extraction, to improve the accuracy of the subsequent extraction results.
10. The automatic document generation system based on natural language processing according to claim 6, characterized in that the target document automatic generation module further comprises:
a template selection unit, which, based on the structured data, selects a different task tree path to generate the report according to the target document type to be output, including selecting a different template;
a target document generation unit, which performs the processing that corresponds to the current processing stage of the document: if the document is in an intermediate processing stage, the draft document for the professional field is generated automatically according to the template; if the document is in the final output stage, the formal document for the professional field is generated automatically according to the template.
11. An automatic document generation system based on natural language processing, characterized by comprising:
a processor; and
a memory configured to store a series of computer-executable instructions and computer-accessible data associated with the series of computer-executable instructions,
wherein the series of computer-executable instructions, when executed by the processor, cause the processor to carry out the method according to any one of claims 1 to 5.
12. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores a series of computer-executable instructions which, when executed by a computing device, cause the computing device to carry out the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810928628.6A CN109190098A (en) | 2018-08-15 | 2018-08-15 | A kind of document automatic creation method and system based on natural language processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810928628.6A CN109190098A (en) | 2018-08-15 | 2018-08-15 | A kind of document automatic creation method and system based on natural language processing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109190098A (en) | 2019-01-11 |
Family
ID=64935869
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810928628.6A Pending CN109190098A (en) | 2018-08-15 | 2018-08-15 | A kind of document automatic creation method and system based on natural language processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109190098A (en) |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110096257A (en) * | 2019-04-10 | 2019-08-06 | 沈阳哲航信息科技有限公司 | A kind of design configuration automation evaluation system and method based on intelligent recognition |
CN110110332A (en) * | 2019-05-06 | 2019-08-09 | 中国联合网络通信集团有限公司 | Text snippet generation method and equipment |
CN110265024A (en) * | 2019-05-20 | 2019-09-20 | 平安普惠企业管理有限公司 | Requirement documents generation method and relevant device |
CN110377751A (en) * | 2019-06-17 | 2019-10-25 | 深圳壹账通智能科技有限公司 | Courseware intelligent generation method, device, computer equipment and storage medium |
CN110532370A (en) * | 2019-06-11 | 2019-12-03 | 福建奇点时空数字科技有限公司 | A kind of expert data entity attribute abstracting method based on attribute labeling |
CN110795923A (en) * | 2019-11-01 | 2020-02-14 | 达而观信息科技(上海)有限公司 | Automatic generation system and generation method of technical document based on natural language processing |
CN110866382A (en) * | 2019-10-14 | 2020-03-06 | 深圳价值在线信息科技股份有限公司 | Document generation method, device, terminal equipment and medium |
CN110955801A (en) * | 2019-12-06 | 2020-04-03 | 中国建设银行股份有限公司 | Knowledge graph analysis method and system for cognos report indexes |
CN111144116A (en) * | 2019-12-25 | 2020-05-12 | 国网江苏省电力有限公司电力科学研究院 | Document knowledge structuralization extraction method and device |
CN111339311A (en) * | 2019-12-30 | 2020-06-26 | 智慧神州(北京)科技有限公司 | Method, device and processor for extracting structured events based on generative network |
CN111680490A (en) * | 2020-06-10 | 2020-09-18 | 东南大学 | Cross-modal document processing method and device and electronic equipment |
CN111897781A (en) * | 2020-08-03 | 2020-11-06 | 厦门渊亭信息科技有限公司 | Method and system for extracting knowledge graph data |
CN112001703A (en) * | 2020-08-26 | 2020-11-27 | 中国银行股份有限公司 | Front-end data self-service processing method and system for bank transaction |
CN112115694A (en) * | 2020-08-21 | 2020-12-22 | 江苏徐工工程机械研究院有限公司 | Simulation report generation method and device based on multi-element data structure |
CN112541337A (en) * | 2020-12-16 | 2021-03-23 | 格美安(北京)信息技术有限公司 | Document template automatic generation method and system based on recurrent neural network language model |
CN112668323A (en) * | 2019-10-14 | 2021-04-16 | 北京慧点科技有限公司 | Text element extraction method based on natural language processing and text examination system thereof |
CN112800765A (en) * | 2021-01-22 | 2021-05-14 | 南京亚派软件技术有限公司 | Automatic work order generation method |
CN112800719A (en) * | 2020-12-28 | 2021-05-14 | 北京思题科技有限公司 | Electronic document structuring method |
CN113221516A (en) * | 2020-09-14 | 2021-08-06 | 苏州七星天专利运营管理有限责任公司 | Method and system for assisting in editing document |
CN113553812A (en) * | 2021-06-22 | 2021-10-26 | 北京来也网络科技有限公司 | News processing method and device combining RPA and AI |
CN113723918A (en) * | 2021-08-25 | 2021-11-30 | 北京来也网络科技有限公司 | Information input method and device combining RPA and AI |
CN113779948A (en) * | 2021-09-10 | 2021-12-10 | 成都材智科技有限公司 | Nuclear power structural material data file automatic extraction system and method |
CN113779215A (en) * | 2021-08-25 | 2021-12-10 | 海南硬壳科技有限公司 | Data processing platform |
CN113792155A (en) * | 2021-08-30 | 2021-12-14 | 北京百度网讯科技有限公司 | Text verification method and device based on knowledge graph, electronic equipment and medium |
CN114186072A (en) * | 2021-12-13 | 2022-03-15 | 长安大学 | Method, system and storage medium for extracting traffic accident report and reasoning scene type |
CN116501875A (en) * | 2023-04-28 | 2023-07-28 | 中电科大数据研究院有限公司 | Document processing method and system based on natural language and knowledge graph |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103034940A (en) * | 2012-12-07 | 2013-04-10 | 深圳市智维通达科技有限公司 | Method and system for automatic analysis report generation |
CN106649223A (en) * | 2016-12-23 | 2017-05-10 | 北京文因互联科技有限公司 | Financial report automatic generation method based on natural language processing |
-
2018
- 2018-08-15 CN CN201810928628.6A patent/CN109190098A/en active Pending
Cited By (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110096257A (en) * | 2019-04-10 | 2019-08-06 | 沈阳哲航信息科技有限公司 | A kind of design configuration automation evaluation system and method based on intelligent recognition |
CN110110332A (en) * | 2019-05-06 | 2019-08-09 | 中国联合网络通信集团有限公司 | Text snippet generation method and equipment |
CN110265024A (en) * | 2019-05-20 | 2019-09-20 | 平安普惠企业管理有限公司 | Requirement documents generation method and relevant device |
CN110532370A (en) * | 2019-06-11 | 2019-12-03 | 福建奇点时空数字科技有限公司 | A kind of expert data entity attribute abstracting method based on attribute labeling |
CN110377751A (en) * | 2019-06-17 | 2019-10-25 | 深圳壹账通智能科技有限公司 | Courseware intelligent generation method, device, computer equipment and storage medium |
CN110866382A (en) * | 2019-10-14 | 2020-03-06 | 深圳价值在线信息科技股份有限公司 | Document generation method, device, terminal equipment and medium |
CN112668323A (en) * | 2019-10-14 | 2021-04-16 | 北京慧点科技有限公司 | Text element extraction method based on natural language processing and text examination system thereof |
CN112668323B (en) * | 2019-10-14 | 2024-02-02 | 北京慧点科技有限公司 | Text element extraction method based on natural language processing and text examination system thereof |
CN110795923A (en) * | 2019-11-01 | 2020-02-14 | 达而观信息科技(上海)有限公司 | Automatic generation system and generation method of technical document based on natural language processing |
CN110795923B (en) * | 2019-11-01 | 2024-03-22 | 达观数据有限公司 | Automatic generation system and generation method for technical document based on natural language processing |
CN110955801A (en) * | 2019-12-06 | 2020-04-03 | 中国建设银行股份有限公司 | Knowledge graph analysis method and system for cognos report indexes |
CN110955801B (en) * | 2019-12-06 | 2022-10-21 | 中国建设银行股份有限公司 | Knowledge graph analysis method and system for cognos report indexes |
CN111144116A (en) * | 2019-12-25 | 2020-05-12 | 国网江苏省电力有限公司电力科学研究院 | Document knowledge structuralization extraction method and device |
CN111144116B (en) * | 2019-12-25 | 2024-02-02 | 国网江苏省电力有限公司电力科学研究院 | Document knowledge structured extraction method and device |
CN111339311A (en) * | 2019-12-30 | 2020-06-26 | 智慧神州(北京)科技有限公司 | Method, device and processor for extracting structured events based on generative network |
CN111680490A (en) * | 2020-06-10 | 2020-09-18 | 东南大学 | Cross-modal document processing method and device and electronic equipment |
CN111897781A (en) * | 2020-08-03 | 2020-11-06 | 厦门渊亭信息科技有限公司 | Method and system for extracting knowledge graph data |
CN111897781B (en) * | 2020-08-03 | 2023-12-26 | 厦门渊亭信息科技有限公司 | Knowledge graph data extraction method and system |
CN112115694A (en) * | 2020-08-21 | 2020-12-22 | 江苏徐工工程机械研究院有限公司 | Simulation report generation method and device based on multi-element data structure |
CN112115694B (en) * | 2020-08-21 | 2023-07-04 | 江苏徐工工程机械研究院有限公司 | Simulation report generation method and device based on multi-element data structure |
CN112001703B (en) * | 2020-08-26 | 2024-03-29 | 中国银行股份有限公司 | Front-end data self-service processing method and system for banking transaction |
CN112001703A (en) * | 2020-08-26 | 2020-11-27 | 中国银行股份有限公司 | Front-end data self-service processing method and system for bank transaction |
CN113221516B (en) * | 2020-09-14 | 2021-11-30 | 苏州七星天专利运营管理有限责任公司 | Method and system for assisting in editing document |
CN113221516A (en) * | 2020-09-14 | 2021-08-06 | 苏州七星天专利运营管理有限责任公司 | Method and system for assisting in editing document |
CN112541337B (en) * | 2020-12-16 | 2022-05-24 | 格美安(北京)信息技术有限公司 | Document template automatic generation method and system based on recurrent neural network language model |
CN112541337A (en) * | 2020-12-16 | 2021-03-23 | 格美安(北京)信息技术有限公司 | Document template automatic generation method and system based on recurrent neural network language model |
CN112800719A (en) * | 2020-12-28 | 2021-05-14 | 北京思题科技有限公司 | Electronic document structuring method |
CN112800765A (en) * | 2021-01-22 | 2021-05-14 | 南京亚派软件技术有限公司 | Automatic work order generation method |
CN113553812A (en) * | 2021-06-22 | 2021-10-26 | 北京来也网络科技有限公司 | News processing method and device combining RPA and AI |
CN113779215A (en) * | 2021-08-25 | 2021-12-10 | 海南硬壳科技有限公司 | Data processing platform |
CN113723918A (en) * | 2021-08-25 | 2021-11-30 | 北京来也网络科技有限公司 | Information input method and device combining RPA and AI |
CN113792155A (en) * | 2021-08-30 | 2021-12-14 | 北京百度网讯科技有限公司 | Text verification method and device based on knowledge graph, electronic equipment and medium |
CN113779948A (en) * | 2021-09-10 | 2021-12-10 | 成都材智科技有限公司 | Nuclear power structural material data file automatic extraction system and method |
CN114186072A (en) * | 2021-12-13 | 2022-03-15 | 长安大学 | Method, system and storage medium for extracting traffic accident report and reasoning scene type |
CN114186072B (en) * | 2021-12-13 | 2024-08-02 | 长安大学 | Traffic accident report extraction and scene type reasoning method, system and storage medium |
CN116501875A (en) * | 2023-04-28 | 2023-07-28 | 中电科大数据研究院有限公司 | Document processing method and system based on natural language and knowledge graph |
CN116501875B (en) * | 2023-04-28 | 2024-04-26 | 中电科大数据研究院有限公司 | Document processing method and system based on natural language and knowledge graph |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109190098A (en) | | A kind of document automatic creation method and system based on natural language processing |
CN111488465A (en) | | Knowledge graph construction method and related device |
Teixeira de Melo et al. | | Thinking (in) complexity: (In) definitions and (mis) conceptions |
EP2191421A1 (en) | | System for assisting in drafting applications |
Zehra et al. | | Financial knowledge graph based financial report query system |
Koulu | | Law, technology and dispute resolution: The privatisation of coercion |
RU2640718C1 (en) | | Verification of information object attributes |
CN110880142A (en) | | Risk entity acquisition method and device |
CN110489565A (en) | | Based on the object root type design method and system in domain knowledge map ontology |
CN112396437A (en) | | Trade contract verification method and device based on knowledge graph |
Jeon et al. | | Extraction of construction quality requirements from textual specifications via natural language processing |
CN114254617A (en) | | Method, device, computing equipment and storage medium for revising clauses |
Newberry et al. | | Constructing causal loop diagrams from large interview data sets |
CN117952104A (en) | | Small sample triplet extraction method based on fusion of large model and knowledge graph |
Cetera et al. | | Potential for the use of large unstructured data resources by public innovation support institutions |
Hermansson et al. | | Tracking amendments to legislation and other political texts with a novel minimum-edit-distance algorithm: DocuToads |
Macanovic et al. | | A systematic evaluation of text mining methods for short texts: Mapping individuals’ internal states from online posts |
Kiyavitskaya et al. | | Requirements model generation to support requirements elicitation: the Secure Tropos experience |
Köse | | Crypto asset taxonomy classification and crypto news sentiment analysis |
Newman et al. | | A controllable QA-based framework for decontextualization |
CN113010647A (en) | | Corpus processing model training method and device, storage medium and electronic equipment |
Le Billon et al. | | A theory of change for the extractive industries transparency initiative: designing resource governance pathways to improve developmental outcomes |
Chandrasekaran et al. | | Automating Transfer Credit Assessment in Student Mobility--A Natural Language Processing-based Approach |
Pont et al. | | Legal Summarisation through LLMs: The PRODIGIT Project |
Xu et al. | | Research on intelligent campus and visual teaching system based on Internet of things |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| TA01 | Transfer of patent application right | Effective date of registration: 20210707; Address after: 100098 fb106-30, ground floor, building 1, yard 13, Dazhongsi, Haidian District, Beijing; Applicant after: Beijing youfatian Technology Co.,Ltd.; Address before: 200120 5th floor, building 28, 498 GuoShouJing Road, Pudong New Area, Shanghai; Applicant before: SHANGHAI WIZLAWGIC INFORMATION TECHNOLOGY Co.,Ltd. |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190111 |