CN110059176A - Rule-based general text information extraction and information generation method - Google Patents
- Publication number
- CN110059176A (CN201910153119.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
Abstract
The present invention provides a rule-based general method for text information extraction and information generation, comprising: initializing the information dictionary context, rule word packets, rule engine and template engine; annotating information in the text; defining information extraction algorithms and writing rule script code; generating a rule dependency directed graph; executing the text extraction rules and fine-tuning them according to extraction accuracy; defining information generation meta-templates; and selecting custom template rules and generating text. The invention modularizes extraction rules and makes them easier to share, supports analysis of the structure of complex text information, and greatly improves the efficiency of generating text from extracted and external information. It is particularly suited to fields such as legal documents, where large volumes of text require information extraction and generation. The method can significantly improve extraction efficiency and accuracy, reduce the complexity of text extraction, and improve the efficiency of information text generation.
Description
Technical field
The present invention relates to the fields of software engineering and rule engines, and to rule-based general text information extraction and text generation; more specifically, it relates to a rule-based general method for text information extraction and information generation.
Background art
As enterprises have become more thoroughly computerized, conventional data entry has been well solved. The problem that more and more enterprises now face is that much of their information comes from semi-structured text, for which good extraction tools are still lacking. At present such information is either entered manually or extracted by complex hand-written logic. Manual entry consumes a great deal of labor and resources and is inefficient. Complex extraction logic achieves higher accuracy, but it is costly to maintain, its extraction rules are hard to reuse, and the whole process has a long cycle time, which hinders rapid software delivery.
Summary of the invention
To solve the above problems, the present invention provides a rule-based general method for text information extraction and text generation. It supports annotating information in text, lets staff write extraction rules efficiently, and maximizes the reuse and sharing of extraction rules. It can also integrate with third-party data sources to generate well-formatted text, which in turn facilitates the extraction process and forms a virtuous cycle.
To achieve the above object, the invention provides the following technical solution:
A rule-based general method for text information extraction and text generation, comprising the following steps:
Step 1: initialize the information dictionary context, rule word packets, rule engine and template engine
Initialize the information dictionary as the context for information extraction, so that information can be extracted from information text in a dynamic, extensible way; load the engine type defined in the configuration information, and load the rule syntax parser, the syntax dependency analyzer and the rule executor; initialize the data access engine that depends on the rule engine and supports third-party data sources; load the template engine configuration information, and load the predefined precompiled template instructions and the already written information generation templates, thereby completing the loading of the whole template engine;
Step 2: annotate information in the text
Model and analyze text information extraction. The text information extraction model is divided into single-value extraction and multi-value extraction: single-value extraction extracts the text of a single region from a passage, whereas multi-value extraction extracts the information of several specified regions from a passage. The text information annotation model comprises the range of the text annotation, the features of the annotated information and the annotation identifier; for each annotation, the desired extraction text can be located in a passage of information text;
Step 3: define information extraction algorithms and write rule script code
Analyze and model the extraction rules. The extraction rule model comprises scalar rules, shared rules, dependency-free computation rules, dependent computation rules and variable-context rules. When extracting information: if the current item depends neither on other rules nor on any evident text context, a scalar rule can be used; if the item is extracted in much the same way as items in other, similarly structured texts, the extraction rule can be shared by direct reference or by copying; if the item does not depend on other rules in the current rule context, it can be extracted with a dependency-free computation rule; if the item depends on other rules in the current rule context, it can be computed with a dependent computation rule; if the item has deep structural dependencies and its intermediate state does not need to be extracted explicitly, a variable-context rule can extract it without affecting the current rule context;
Step 4: generate the rule dependency directed graph
Parse the extraction rules written by the user, derive each rule's dependency items and export items, and generate the rule dependency directed graph;
Step 5: execute the text extraction rules and fine-tune them according to extraction accuracy
Place the text extraction rules in the rule engine and execute them to produce well-structured extracted text; compare the extracted information with the text annotations made at the start to compute the extraction accuracy;
Step 6: define information generation meta-templates
The user can define an information generation meta-template to suit the presentation requirements. A meta-template contains basic information text and several rule filling regions. To provide a general way of generating information, custom information rules can be extended, so that the user can import information from third-party data sources in a form that conforms to the rule format;
Step 7: select custom template rules and generate text
For the same information generation meta-template, the user can select different information rules for the rule filling regions to generate text adapted to different sub-scenarios; the user can also select the output format for the generated information text.
Further, step 1 comprises the following steps:
Step 1-1: initial state;
Step 1-2: define the data structure that stores the information dictionary; the information dictionary is a hierarchical hash table that can support multi-level information structures;
Step 1-3: load the information dictionary according to its hierarchy, loading the root node first and then proceeding level by level until the leaf nodes are loaded;
Step 1-4: to obtain the result for an information sub-item of the dictionary, first access the leaf node and check whether the sub-item exists there; if it does, return its result directly; otherwise search upward along the hierarchy until some level contains the sub-item, and then return its result;
Step 1-5: load the word packets from the database into the system; a word packet contains one or more phrases and several optional conditional expressions; an information extraction rule can include one or more word packets, which are obtained from within the rule script;
Step 1-6: load the condition discriminant functions associated with the word packets for use at run time; for word packets, some condition discriminant functions are predefined that can determine whether a given word or words exist in a word packet and whether a sentence contains vocabulary from a word packet; the user can also add custom condition functions for particular word packets;
Step 1-7: initialize the rule engine by loading the rule engine configuration information: select the rule engine grammar set and load the syntax parser, then load the optional syntax context dependency analyzer for the parser, and finally load the rule executor, completing the loading of the whole rule engine;
Step 1-8: initialize the template engine by loading the template engine configuration information: select the template engine type, load the template engine instruction set, and load the information generation templates already defined in the system, completing the loading of the whole template engine;
Step 1-9: import the information dictionary context, the rule word packets and the rule engine into the algorithm, and finally integrate the template engine with the algorithm; initialization is then complete.
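The leaf-first, then-upward lookup of steps 1-2 to 1-4 can be sketched as follows. This is only an illustration under stated assumptions: the class name `InfoDictNode`, its methods and the sample keys are hypothetical and do not come from the patent, which does not prescribe a concrete implementation.

```python
from typing import Any, Dict, Optional


class InfoDictNode:
    """One level of a hierarchical hash-table information dictionary (hypothetical)."""

    def __init__(self, name: str, parent: Optional["InfoDictNode"] = None) -> None:
        self.name = name
        self.parent = parent
        self.items: Dict[str, Any] = {}  # information sub-items defined at this level

    def child(self, name: str) -> "InfoDictNode":
        """Create a lower level whose parent is this node."""
        return InfoDictNode(name, parent=self)

    def lookup(self, key: str) -> Any:
        """Check this level first, then search upward toward the root."""
        node: Optional[InfoDictNode] = self
        while node is not None:
            if key in node.items:
                return node.items[key]
            node = node.parent
        raise KeyError(key)


# Load the root first, then descend level by level to the leaf.
root = InfoDictNode("root")
root.items["currency"] = "CNY"
contracts = root.child("contracts")
contracts.items["party_label"] = "Party A"
leases = contracts.child("leases")
leases.items["term_unit"] = "month"

at_leaf = leases.lookup("term_unit")   # found directly at the leaf
inherited = leases.lookup("currency")  # found by searching upward to the root
```

Keeping a parent pointer on each level lets a lower level override a value defined higher up while still inheriting everything it does not define itself.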
Further, step 2 comprises the following steps:
Step 2-1: initial state;
Step 2-2: the user selects the text from which information is to be extracted, or imports the text to be extracted into the system;
Step 2-3: the user determines the extraction regions by dividing the text in a custom way;
Step 2-4: the user adds the type of information to be extracted: single-value or multi-value;
Step 2-5: if the user selects the single-value type, the user annotates the specified text;
Step 2-6: if the user selects the multi-value type, the user determines the number of extraction regions and then selects and annotates the specified text;
Step 2-7: the user names the annotation, and the system assigns it a unique annotation identifier;
Step 2-8: text information annotation is complete.
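One possible shape for the annotation produced by these steps is sketched below. It assumes that an annotation stores character offsets and that the unique identifier is a UUID; the `Annotation` class, the `span_of` helper and the sample document are hypothetical illustrations rather than anything specified in the patent.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Annotation:
    name: str                     # name given by the user (step 2-7)
    kind: str                     # "single" or "multi" (step 2-4)
    spans: List[Tuple[int, int]]  # (start, end) character offsets in the text
    identifier: str = field(default_factory=lambda: uuid.uuid4().hex)

    def __post_init__(self) -> None:
        if self.kind not in ("single", "multi"):
            raise ValueError("kind must be 'single' or 'multi'")
        if self.kind == "single" and len(self.spans) != 1:
            raise ValueError("a single-value annotation marks exactly one region")

    def texts(self, document: str) -> List[str]:
        """Return the annotated text of every marked region."""
        return [document[start:end] for start, end in self.spans]


def span_of(document: str, phrase: str) -> Tuple[int, int]:
    """Helper for the example: offsets of the first occurrence of a phrase."""
    start = document.index(phrase)
    return (start, start + len(phrase))


document = "Lessor: Alice Wang. Lessee: Bob Li. Rent: 3000 per month."
rent = Annotation("rent", "single", [span_of(document, "3000")])
parties = Annotation(
    "parties", "multi",
    [span_of(document, "Alice Wang"), span_of(document, "Bob Li")],
)
```

Storing offsets rather than copies of the text keeps the annotation tied to a position in the document, which is what a later accuracy comparison needs.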
Further, step 3 comprises the following steps:
Step 3-1: initial state;
Step 3-2: the user selects an annotation made in step 2 and then writes the specific information extraction rule for it;
Step 3-3: during actual extraction, the user can use several classes of extraction algorithms predefined in the rule engine; if an algorithm's extraction result is satisfactory, no specific rule needs to be written;
Step 3-4: otherwise, the user writes a custom rule, for which the user needs to identify features in the text to be extracted;
Step 3-5: the rule syntax parser performs lexical analysis on the extraction rule written by the user, checking whether the variables defined in the rule conform to the specification and whether the rules it depends on exist in the rule context;
Step 3-6: based on the token sequence produced in step 3-5, the rule syntax parser then performs syntax analysis, analyzing the functions and program structure defined in the rule and reporting errors for redefined functions and incorrect program structure;
Step 3-7: in the rule script, the user exports the extracted text information through predefined functions; the export items are made available to other rules in the current rule context, which facilitates the extraction of structured text information;
Step 3-8: in the rule context list, the user can perform the following operations on a written rule: view the extracted content, apply the extraction rule, and analyze the rule's dependencies;
Step 3-9: defining information extraction algorithms and writing rule script code is complete.
Further, in step 3-4, identifying features in the text to be extracted specifically comprises: the user can identify features in a context-free way, for example with keywords, specified phrases or regular expressions, or in a context-sensitive way that carries specific semantics.
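As a rough illustration of the two kinds of feature, the sketch below uses Python's standard `re` module. The function names, the sample sentence and the date format are assumptions made for the example; the patent does not specify the rule language or any concrete patterns.

```python
import re
from typing import List, Optional


def extract_after_keyword(text: str, keyword: str) -> Optional[str]:
    """Context-free feature: the value that follows a keyword and an optional colon."""
    pattern = re.escape(keyword) + r"\s*:?\s*([^.;\n]+)"
    match = re.search(pattern, text)
    return match.group(1).strip() if match else None


def extract_dates(text: str) -> List[str]:
    """Context-free feature: every YYYY-MM-DD date, wherever it appears."""
    return re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)


def extract_filing_date(text: str) -> Optional[str]:
    """Context-sensitive feature: a date counts only when preceded by 'filed on'."""
    match = re.search(r"[Ff]iled on\s+(\d{4}-\d{2}-\d{2})", text)
    return match.group(1) if match else None


text = "Plaintiff: Zhang San; Defendant: Li Si. Filed on 2019-02-28, heard on 2019-03-15."
plaintiff = extract_after_keyword(text, "Plaintiff")
defendant = extract_after_keyword(text, "Defendant")
all_dates = extract_dates(text)
filing_date = extract_filing_date(text)
```

The context-free date pattern returns both dates, while the context-sensitive one uses the surrounding words to decide which date carries the intended meaning.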
Further, step 4 comprises the following steps:
Step 4-1: initial state;
Step 4-2: the user can optionally analyze rule dependencies; if the user chooses to perform dependency analysis, proceed to step 7; otherwise proceed to step 3;
Step 4-3: once the user has selected rule dependency analysis, the system builds an abstract syntax tree for the rule through the rule engine;
Step 4-4: the dependency analyzer analyzes the dependent variables, local variables and rule export items in the abstract syntax tree, and completes the rule dependency analysis with a depth-first search of the tree;
Step 4-5: the dependency analyzer displays the resulting dependency directed graph to the user; by selecting a rule dependency item or rule export item of interest, the user can view the in-degree and out-degree relations of the current item and understand the dependency context of the current rule;
Step 4-6: by selecting a rule item in the directed graph, the user can go directly to the rule adjustment stage;
Step 4-7: generation of the rule dependency directed graph is complete.
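The analysis in steps 4-3 to 4-5 can be sketched with Python's standard `ast` module, under strong assumptions: rule scripts are taken to be Python source, a hypothetical `export("name", value)` call marks an export item, and any name that is read but never assigned (and is neither a builtin nor an engine helper) counts as a dependency item. None of these conventions is stated in the patent.

```python
import ast
import builtins
from typing import Dict, Set, Tuple

HELPERS = {"export", "text", "re"}  # hypothetical names injected by the engine
BUILTINS = set(dir(builtins))


class RuleAnalyzer(ast.NodeVisitor):
    """Depth-first walk collecting assigned names, read names and exports."""

    def __init__(self) -> None:
        self.assigned: Set[str] = set()
        self.loaded: Set[str] = set()
        self.exports: Set[str] = set()

    def visit_Name(self, node: ast.Name) -> None:
        target = self.assigned if isinstance(node.ctx, ast.Store) else self.loaded
        target.add(node.id)

    def visit_Call(self, node: ast.Call) -> None:
        if (isinstance(node.func, ast.Name) and node.func.id == "export"
                and node.args and isinstance(node.args[0], ast.Constant)):
            self.exports.add(node.args[0].value)
        self.generic_visit(node)  # keep descending into the call's children


def analyze(source: str) -> Tuple[Set[str], Set[str]]:
    """Return (dependency items, export items) of one rule script."""
    analyzer = RuleAnalyzer()
    analyzer.visit(ast.parse(source))
    depends = analyzer.loaded - analyzer.assigned - HELPERS - BUILTINS
    return depends, analyzer.exports


def build_graph(rules: Dict[str, str]) -> Dict[str, Set[str]]:
    """Edges run from the rule that exports an item to each rule that reads it."""
    info = {name: analyze(src) for name, src in rules.items()}
    provider = {item: name for name, (_, exports) in info.items() for item in exports}
    graph: Dict[str, Set[str]] = {name: set() for name in rules}
    for name, (depends, _) in info.items():
        for item in depends:
            if item in provider:
                graph[provider[item]].add(name)
    return graph


rules = {
    "rent": r'''
m = re.search(r"Rent: (\d+)", text)
export("rent", int(m.group(1)))
''',
    "months": 'export("months", 12)',
    "total": 'export("total", rent * months)',
}
graph = build_graph(rules)
in_degree = {name: sum(name in targets for targets in graph.values()) for name in graph}
```

Here `graph` maps each rule to the rules that consume its export items, so the out-degree of a rule is the size of its set and `in_degree` counts how many rules it depends on.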
Further, in step 5, if the text extraction accuracy does not reach the target, the insufficiently accurate extraction rules continue to be adjusted until the extraction accuracy reaches the specified threshold.
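The patent does not give a formula for extraction accuracy. A minimal sketch, assuming accuracy is the fraction of annotations whose extracted value exactly matches the annotated text, might look like this; the function name, the sample data and the 0.9 threshold are all hypothetical.

```python
from typing import Dict, List, Tuple


def extraction_accuracy(annotated: Dict[str, str],
                        extracted: Dict[str, str]) -> Tuple[float, List[str]]:
    """Return (accuracy, names of annotations that were not matched)."""
    if not annotated:
        raise ValueError("no annotations to compare against")
    missed = [name for name, expected in annotated.items()
              if extracted.get(name) != expected]
    return (len(annotated) - len(missed)) / len(annotated), missed


annotated = {"plaintiff": "Zhang San", "defendant": "Li Si",
             "filed": "2019-02-28", "court": "Nanjing"}
extracted = {"plaintiff": "Zhang San", "defendant": "Li Si",
             "filed": "2019-02-28", "court": "Nanjing Court"}

accuracy, missed = extraction_accuracy(annotated, extracted)
THRESHOLD = 0.9                       # assumed target, not taken from the patent
needs_tuning = accuracy < THRESHOLD   # True here: the "court" rule needs adjusting
```

Returning the missed annotations alongside the score tells the user which rules to adjust before the next run.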
Further, step 5 comprises the following steps:
Step 5-1: initial state;
Step 5-2: the user executes a single rule, or, once all extraction rules have been written, executes all rules together;
Step 5-3: when the user executes rules, the system takes the rule content written in step 3 that is free of syntax errors and places it in the rule executor for execution, using the execution mode that corresponds to the configured execution engine;
Step 5-4: the rule executor first places the information dictionary and word packets that the rules depend on into the rule execution context, and then executes the required rules in order; for a given rule, if the set of rules it depends on contains rules that have not yet been executed, those rules are executed first, until all the dependencies of the current rule have been executed; the executor then backtracks and executes the rule that had been deferred;
Step 5-5: after a rule has finished executing, the system compares the rule's export items with the corresponding text annotations, computes the extraction accuracy, and flags the annotations that were not matched;
Step 5-6: if the extraction accuracy meets the requirement, continue to step 5-7; otherwise adjust the rule content and return to step 5-2;
Step 5-7: executing the text extraction rules and fine-tuning them according to extraction accuracy is complete.
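The dependency-first ordering of step 5-4 can be sketched as a depth-first traversal that runs a rule only after everything it depends on has run. In this illustration rules are plain Python callables and the names `execute_rules`, `depends_on` and `context` are hypothetical; the circular-dependency check is an addition for safety and is not mentioned in the patent.

```python
from typing import Any, Callable, Dict, List, Set

Rule = Callable[[Dict[str, Any]], Any]


def execute_rules(rules: Dict[str, Rule],
                  depends_on: Dict[str, List[str]],
                  context: Dict[str, Any]) -> List[str]:
    """Run every rule after its dependencies and return the execution order."""
    done: Set[str] = set()
    in_progress: Set[str] = set()
    order: List[str] = []

    def run(name: str) -> None:
        if name in done:
            return
        if name in in_progress:
            raise ValueError(f"circular dependency at rule {name!r}")
        in_progress.add(name)
        for dep in depends_on.get(name, []):
            run(dep)                       # execute unexecuted dependencies first
        context[name] = rules[name](context)
        in_progress.discard(name)
        done.add(name)
        order.append(name)                 # back at the deferred rule: it has now run

    for name in rules:
        run(name)
    return order


rules: Dict[str, Rule] = {
    "total": lambda ctx: ctx["rent"] * ctx["months"],
    "rent": lambda ctx: 3000,
    "months": lambda ctx: 12,
}
depends_on = {"total": ["rent", "months"]}
context: Dict[str, Any] = {}
order = execute_rules(rules, depends_on, context)
```

Although `total` is listed first, it is deferred until `rent` and `months` have run, so `order` is `["rent", "months", "total"]`.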
Further, step 6 comprises the following steps:
Step 6-1: initial state;
Step 6-2: the user creates a named information generation meta-template;
Step 6-3: the user adds basic text blocks, fixed dependent rule items and placeholders to the meta-template; a basic text block is arbitrary text; a fixed dependent rule item is a rule item that extracts a certain type of text; a placeholder can be replaced, when text is later generated from the template, with text or with a rule item from the rule-writing context;
Step 6-4: the user saves the information generation meta-template to the database;
Step 6-5: defining the information generation meta-template is complete.
Further, step 7 comprises the following steps:
Step 7-1: initial state;
Step 7-2: the user selects an existing information generation meta-template for text generation;
Step 7-3: the user chooses either to generate a temporary version or to create a new custom template;
Step 7-4: the user replaces the placeholders in the meta-template; a placeholder can be replaced with plain text, an information item in the rule-writing context, or a rule item in the rule context;
Step 7-5: the user selects the format of the generated text, and after the placeholders in the template have been filled, downloads the generated text;
Step 7-6: selecting custom template rules and generating text is complete.
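A meta-template with placeholders, filled in different ways for different sub-scenarios, can be sketched as follows. The `{{name}}` placeholder syntax, the `render` function and the binding format are assumptions made for this example; the patent does not define a template syntax.

```python
import re
from typing import Any, Dict, Tuple

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")  # hypothetical {{name}} syntax


def render(meta_template: str,
           bindings: Dict[str, Tuple[str, str]],
           rule_context: Dict[str, Any]) -> str:
    """Replace each placeholder with plain text or with a rule item's value."""
    def fill(match):
        kind, value = bindings[match.group(1)]  # KeyError if a placeholder is unbound
        if kind == "text":
            return value
        if kind == "rule":
            return str(rule_context[value])
        raise ValueError(f"unknown binding kind {kind!r}")
    return PLACEHOLDER.sub(fill, meta_template)


meta_template = "Notice to {{recipient}}: the amount due is {{amount}} {{currency}}."
rule_context = {"tenant_name": "Li Si", "total_rent": 36000, "deposit": 3000}

# Same meta-template, different rule selections for two sub-scenarios.
rent_notice = render(meta_template, {
    "recipient": ("rule", "tenant_name"),
    "amount": ("rule", "total_rent"),
    "currency": ("text", "CNY"),
}, rule_context)
deposit_notice = render(meta_template, {
    "recipient": ("rule", "tenant_name"),
    "amount": ("rule", "deposit"),
    "currency": ("text", "CNY"),
}, rule_context)
```

Keeping the bindings separate from the meta-template is what allows one template to serve several scenarios, as step 7 describes.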
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The invention effectively extends conventional text information extraction, making the whole extraction process more effective and helping users mine more content from text. Extraction is rule-based: turning extraction content into rules makes it easier to reuse and share, and, supported by the information dictionary and word packets, the extraction process can be extended dynamically without repeated changes to the logic. Once the extraction logic has been refined into rules, syntax analysis of the rules allows the dependencies between them to be shown visually, so that the user can clearly see how extracted information flows through the current text; this lays the foundation for optimizing extraction logic and making the extraction process efficient. By introducing information generation meta-templates, extraction rules are combined with the enterprise's third-party data sources, making text generation simple and efficient.
(2) Compared with conventional information extraction, the invention modularizes extraction rules and makes them easier to share; through rule dependency analysis it supports analysis of the structure of complex text information; and through custom information generation templates it greatly improves the efficiency of generating text from extracted and external information. The invention is particularly suited to fields such as legal documents, where large volumes of text require information extraction and generation. Practice has shown that the method can significantly improve extraction efficiency and accuracy, reduce extraction complexity, and improve the efficiency of information text generation.
Brief description of the drawings
Fig. 1 is a flow chart of the rule-based general text information extraction and information generation method provided by the invention.
Fig. 2 is a structural diagram of the invention.
Fig. 3 is a run-time diagram of the invention.
Fig. 4 is a flow chart of the rule dependency analysis algorithm of the invention.
Fig. 5 is a diagram of document annotation in the invention.
Fig. 6 is a diagram of writing an actual extraction rule in the invention.
Fig. 7 is a diagram of a dependency analysis result of the invention.
Fig. 8 is a diagram of meta-template editing and text generation in the invention.
Fig. 9 shows a result of information text generation in the invention.
Detailed description of the embodiments
The technical solution provided by the invention is described in detail below with reference to specific embodiments. It should be understood that the following embodiments only illustrate the invention and do not limit its scope.
Fig. 1 is the flow chart of the invention, which broadly comprises an initialization stage, an information annotation stage, an extraction rule writing stage, and a template editing and text generation stage.
In the initialization stage, the preconfigured information dictionary and word packets are loaded into the system from storage such as a database; the rule parser, dependency analyzer and rule executor of the rule engine are connected, completing initialization of the rule engine; and the meta-templates, predefined template instructions and template engine are connected, completing initialization of the template engine.
In the information annotation stage, once initialization is complete, the user can select the text from which information is to be extracted and annotate the information to be extracted. This stage annotates the text into a structure that corresponds to the rules written later, which makes it easier to improve extraction accuracy subsequently.
In the extraction rule writing stage, actual text information extraction requires analysis and modeling of the extraction rules. The user writes extraction rule scripts according to the characteristics of the information to be extracted. By their characteristics, extraction rules fall into the following categories:
A. Scalar rule: the current extraction item depends neither on other rules nor on text context.
B. Shared rule: the current extraction item is similar in text structure to others, so the extraction rule can be shared by direct reference or by copying.
C. Dependency-free computation rule: the current extraction item does not depend on other rules in the current rule context.
D. Dependent computation rule: the current extraction item depends on other rules in the current rule context.
E. Variable-context rule: the current extraction item has deep structural dependencies and its intermediate state does not need to be extracted explicitly, so it can be extracted through a variable-context rule without affecting the current rule context.
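The variable-context category is the least obvious of the five. One way to read it, offered here only as an interpretation, is that the rule runs against a scratch copy of the rule context, so its intermediate state never appears in the shared context and only the final result is published. The helper `run_variable_context_rule` and the sample rule are hypothetical.

```python
from typing import Any, Callable, Dict


def run_variable_context_rule(rule: Callable[[Dict[str, Any]], Any],
                              context: Dict[str, Any],
                              export_name: str) -> Any:
    """Run a rule in a scratch copy of the context and publish only its result."""
    scratch = dict(context)   # shallow copy: new keys written by the rule stay here
    value = rule(scratch)
    context[export_name] = value
    return value


def clause_count(ctx: Dict[str, Any]) -> int:
    """Walk sections, then clauses; the intermediate lists are not exported."""
    ctx["sections"] = ctx["document"].split("\n\n")
    ctx["clauses"] = [line
                      for section in ctx["sections"]
                      for line in section.splitlines()
                      if line.startswith("Clause")]
    return len(ctx["clauses"])


context: Dict[str, Any] = {
    "document": "Section 1\nClause 1.1\nClause 1.2\n\nSection 2\nClause 2.1",
}
result = run_variable_context_rule(clause_count, context, "clause_count")
```

After the call the shared context holds only `document` and `clause_count`; the `sections` and `clauses` lists existed only in the scratch copy.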
In the template editing and text generation stage, the user can regenerate text from a meta-template by adding information to the template or by selecting extraction rules; third-party data can also be brought into text generation through a third-party data source adapter.
More specifically, as shown in Fig. 1, the rule-based general text information extraction and information generation method provided by the invention comprises the following steps:
Step 1: initialize the information dictionary context, rule word packets, rule engine and template engine
Information texts in different fields each have an information dictionary of terms that occur frequently in that field. In this step, initializing the information dictionary as the context for extraction allows information to be extracted from the text in a dynamic, extensible way. In addition, information text has its own characteristics: for the same meaning, different authors may use different words of similar sense, so initializing the rule word packets improves extraction accuracy. For the rule engine, the engine type defined in the configuration information is loaded, followed by the rule syntax parser, the syntax dependency analyzer and the rule executor; the data access engine that depends on the rule engine and supports third-party data sources must also be initialized. For information generation, the template engine configuration information is loaded, and the predefined precompiled template instructions and the already written information generation templates are loaded, completing the loading of the whole template engine.
This step comprises the following sub-steps:
Step 1-1: initial state;
Step 1-2: define the data structure that stores the information dictionary; the information dictionary is a hierarchical hash table that can support multi-level information structures;
Step 1-3: load the information dictionary according to its hierarchy, loading the root node first and then proceeding level by level until the leaf nodes are loaded;
Step 1-4: to obtain the result for an information sub-item of the dictionary, first access the leaf node and check whether the sub-item exists there; if it does, return its result directly; otherwise search upward along the hierarchy until some level contains the sub-item, and then return its result;
Step 1-5: load the word packets from the database into the system; a word packet contains one or more phrases and several optional conditional expressions; an information extraction rule may include one or more word packets, which are obtained from within the rule script;
Step 1-6: load the condition discriminant functions associated with the word packets for use at run time. For word packets, some condition discriminant functions are predefined that can determine, for example, whether a given word or words exist in a word packet and whether a sentence contains vocabulary from a word packet; the user can also add custom condition functions for particular word packets;
Step 1-7: initialize the rule engine by loading the rule engine configuration information. Select the rule engine grammar set and load the syntax parser, then load the optional syntax context dependency analyzer for the parser, and finally load the rule executor, completing the loading of the whole rule engine;
Step 1-8: initialize the template engine by loading the template engine configuration information. Select the template engine type, load the template engine instruction set, and load the information generation templates already defined in the system, completing the loading of the whole template engine;
Step 1-9: import the information dictionary context, the rule word packets and the rule engine into the algorithm, and finally integrate the template engine with the algorithm; initialization is then complete.
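The word packet of steps 1-5 and 1-6 can be pictured as a named set of near-synonymous phrases with predefined and user-added condition functions. The sketch below is a hypothetical illustration: the `WordPacket` class, its method names and the sample phrases are assumptions, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WordPacket:
    name: str
    phrases: List[str]
    # User-added condition functions, keyed by name (the extension point of step 1-6).
    conditions: Dict[str, Callable[["WordPacket", str], bool]] = field(default_factory=dict)

    def contains(self, word: str) -> bool:
        """Predefined check: does the word exist in the packet?"""
        return word in self.phrases

    def matches(self, sentence: str) -> bool:
        """Predefined check: does the sentence contain any phrase of the packet?"""
        return any(phrase in sentence for phrase in self.phrases)

    def check(self, condition: str, sentence: str) -> bool:
        """Run a user-defined condition function registered on this packet."""
        return self.conditions[condition](self, sentence)


# Different authors use different words for the same party.
lessor = WordPacket("lessor", ["lessor", "landlord", "Party A"])
lessor.conditions["starts_with_phrase"] = (
    lambda packet, sentence: any(sentence.startswith(p) for p in packet.phrases)
)

has_word = lessor.contains("landlord")
hit = lessor.matches("The landlord shall return the deposit.")
miss = lessor.matches("The tenant pays rent monthly.")
custom = lessor.check("starts_with_phrase", "Party A agrees to the terms.")
```

A rule script that asks the packet rather than hard-coding one word keeps working when a new synonym is added to the packet.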
Step 2: annotate information in the text
This step models and analyzes text information extraction. Guided by the purpose of the extraction, the text information extraction model is divided into single-value extraction and multi-value extraction. Single-value extraction extracts the text of a single region from a passage, whereas multi-value extraction extracts the information of several specified regions from a passage. The information annotation model comprises the range of the text annotation, the features of the annotated information and the annotation identifier; for each annotation, the desired extraction text can be located in a passage of information text.
This step includes following sub-step:
Step 2-1: initial state;
Step 2-2: user's selection needs to carry out the text of information extraction or by text import system to be extracted;
Step 2-3: user carries out the determination of extracting region by carrying out customized division to text;
Step 2-4: user adds the type of Extracting Information, monodrome or multivalue;
Step 2-5:, can be by carrying out text marking in specified text if user selects monodrome type;
Step 2-6: if user select multivalue type, user it needs to be determined that text extracting region quantity, it is then right
Specified text carries out selection mark;
Step 2-7: the secondary information labeling is named by user, and then system gives unique information flag identifier;
Step 2-8: text information mark is finished.
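The annotation model of step 2 can be sketched as a small data structure. This is an illustrative sketch only: the class name `Annotation`, the field names, the use of character offsets for the marked range and the counter-based identifier are all assumptions, since the patent does not specify a concrete representation.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Tuple

_next_id = count(1)  # hypothetical source of unique annotation identifiers


@dataclass
class Annotation:
    """One information annotation: a name, a type and the marked text ranges."""
    name: str
    kind: str                     # "single" or "multi"
    spans: List[Tuple[int, int]]  # (start, end) character offsets, end exclusive
    ident: int = field(default_factory=lambda: next(_next_id))

    def __post_init__(self) -> None:
        if self.kind not in ("single", "multi"):
            raise ValueError("kind must be 'single' or 'multi'")
        if not self.spans:
            raise ValueError("an annotation needs at least one marked region")
        if self.kind == "single" and len(self.spans) != 1:
            raise ValueError("a single-value annotation marks exactly one region")

    def locate(self, text: str) -> List[str]:
        """Return the annotated text of every marked region."""
        return [text[start:end] for start, end in self.spans]


text = "Invoice 2019-02-28 total 42 USD"
date = Annotation("date", "single", [(8, 18)])
amount = Annotation("amount", "multi", [(25, 27), (28, 31)])
print(date.locate(text), amount.locate(text))
```

The annotated values returned by `locate` are what a later accuracy check would compare against the output of an extraction rule.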
Step 3: Define the information extraction algorithm and write the rule script code
Before text information is actually extracted, the extraction rules must be analyzed and modeled. To support general text information extraction, the extraction rule model comprises scalar rules, shared rules, dependency-free computation rules, dependent computation rules and variable context rules. These rules have been explained in detail above, in the description of the rule-writing stage.
This step includes the following sub-steps:
Step 3-1: Initial state;
Step 3-2: The user selects an information annotation made in step 2 and then writes the specific information extraction rule for it;
Step 3-3: During actual extraction, the user can use the several classes of extraction algorithms predefined in the rule engine; if an algorithm's extraction result is satisfactory, no specific rule needs to be written;
Step 3-4: Otherwise, the user needs to write a custom rule. The user must generalize features from the text to be extracted, either in a context-free way, by specifying keywords, phrases or regular expressions, or in a context-sensitive way that carries specific semantics;
Step 3-5: For a general rule script, the extraction rule written by the user first undergoes lexical analysis by the rule grammar parser. This stage mainly checks whether the variables the user defined while writing the rule conform to the specification and whether the rules in the dependent rule context exist;
Step 3-6: Based on the token sequence produced in step 3-5, the rule syntax analyzer then performs syntactic analysis. This stage mainly analyzes the functions and program structure defined in the user's rule and reports errors for redefined functions and incorrect program structure;
Step 3-7: The rule script written by the user can export the extracted text information through predefined functions, and the exported items can be used by other rules in the current rule context, which facilitates the extraction of structured text information;
Step 3-8: In the rule context list, the user can perform the following operations on a written rule: view the extracted content, apply the extraction rule, and analyze the dependencies of the extraction rule;
Step 3-9: Defining the information extraction algorithm and writing the rule script code is complete.
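Steps 3-4 and 3-7 can be illustrated with a short sketch of two context-free features (a keyword and a regular expression) whose results are exported into a shared rule context. The patent does not disclose its rule script language, so plain Python stands in for it here; the function names `export`, `extract_after_keyword` and `extract_by_regex`, and the dictionary used as rule context, are hypothetical.

```python
import re
from typing import Dict, List

RuleContext = Dict[str, List[str]]


def export(context: RuleContext, name: str, values: List[str]) -> None:
    """Hypothetical export function: publish extracted values under a name
    so that other rules in the same rule context can use them."""
    context[name] = values


def extract_after_keyword(text: str, keyword: str) -> List[str]:
    """Keyword feature: return the remainder of each line after the keyword."""
    results = []
    for line in text.splitlines():
        idx = line.find(keyword)
        if idx != -1:
            results.append(line[idx + len(keyword):].strip())
    return results


def extract_by_regex(text: str, pattern: str) -> List[str]:
    """Regular-expression feature: return every full match of the pattern."""
    return [m.group(0) for m in re.finditer(pattern, text)]


text = "Applicant: Example Co.\nFiled: 2019-02-28\n"
context: RuleContext = {}
export(context, "applicant", extract_after_keyword(text, "Applicant:"))
export(context, "dates", extract_by_regex(text, r"\d{4}-\d{2}-\d{2}"))
print(context)
```

Context-sensitive features that carry specific semantics would need more than string matching and are not covered by this sketch.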
Step 4: Generate the rule dependency directed graph
By parsing the extraction rules written by the user, the dependency items and export items of each rule are obtained, and a rule dependency directed graph can be generated. The graph helps the user optimize the current extraction rules and understand the structure of the current text.
This step includes the following sub-steps:
Step 4-1: Initial state;
Step 4-2: The user can optionally perform rule dependency analysis. If the user chooses to perform dependency analysis, go to step 7; otherwise go to step 3;
Step 4-3: After the user selects rule dependency analysis, the system builds an abstract syntax tree for the rule through the rule engine;
Step 4-4: The dependency analyzer analyzes the dependency variables, local variables and rule export items in the abstract syntax tree, and completes the rule dependency analysis by a depth-first search of the abstract syntax tree;
Step 4-5: The dependency analyzer displays the generated dependency analysis directed graph to the user. The user can select a rule dependency item or rule export item of interest to view the in-degree and out-degree relations of the current item and understand the dependency context of the current rule;
Step 4-6: The user can also enter the rule adjustment stage directly by selecting a rule item in the directed graph;
Step 4-7: Generating the rule dependency directed graph is complete.
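Once each rule's dependency items and export items are known, the directed graph and the in-degree and out-degree relations mentioned in step 4-5 can be derived as sketched below. This is an illustration under assumptions: the input format (rule name mapped to a pair of item sets), the edge direction (from a rule to the rule that exports an item it depends on) and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Rules = Dict[str, Tuple[Set[str], Set[str]]]  # name -> (dependency items, export items)


def build_dependency_graph(rules: Rules) -> Dict[str, Set[str]]:
    """Edge A -> B means rule A depends on an item exported by rule B."""
    producer = {}
    for name, (_, exports) in rules.items():
        for item in exports:
            producer[item] = name
    graph: Dict[str, Set[str]] = {name: set() for name in rules}
    for name, (deps, _) in rules.items():
        for item in deps:
            if item in producer and producer[item] != name:
                graph[name].add(producer[item])
    return graph


def degrees(graph: Dict[str, Set[str]]):
    """Return (in_degree, out_degree) dictionaries for every rule."""
    out_deg = {name: len(targets) for name, targets in graph.items()}
    in_deg = {name: 0 for name in graph}
    for targets in graph.values():
        for target in targets:
            in_deg[target] += 1
    return in_deg, out_deg


rules: Rules = {
    "title": (set(), {"title"}),
    "author": (set(), {"author"}),
    "citation": ({"title", "author"}, {"citation"}),
}
graph = build_dependency_graph(rules)
in_deg, out_deg = degrees(graph)
print(graph["citation"], in_deg["title"], out_deg["citation"])
```

A rule with a high in-degree is one that many other rules rely on, so changing it is more likely to require re-checking the rules that depend on it.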
Step 5: Execute the text extraction rules and fine-tune them according to extraction accuracy
Executing the text extraction rules in the rule engine produces well-structured extracted text. The extracted information can be compared with the initial text annotation information to compute the extraction accuracy. If the accuracy does not reach the target, the rules with insufficient accuracy can be adjusted further until the extraction accuracy reaches the specified threshold.
This step includes the following sub-steps:
Step 5-1: Initial state;
Step 5-2: The user can execute a single rule, or execute all rules together after all extraction rules have been written;
Step 5-3: When the user executes rules, the system places the rule content that is free of syntax errors, as defined in step 3, into the rule executor for execution. The concrete execution mode differs depending on the configured execution engine;
Step 5-4: First, the rule executor places the information dictionary and word packages that the rules depend on into the rule execution context and executes the rules to be run in sequence. For a given rule, if the set of rules it depends on contains rules that have not yet been executed, those rules are executed first, until all rules that the current rule depends on have been executed; execution then backtracks to the rule that was deferred;
Step 5-5: After a rule has been executed, the system compares the text information annotation corresponding to the rule with the rule's export items, computes the extraction accuracy, and reports the text annotation information that was not hit;
Step 5-6: If the extraction accuracy meets the requirement, go to step 6; otherwise adjust the rule content and continue with step 1;
Step 5-7: Executing the text extraction rules and fine-tuning them according to extraction accuracy is complete.
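The dependency-first execution order of step 5-4 and the comparison of step 5-5 can be sketched as follows. The patent gives no formula for accuracy, so the hit ratio used here is an assumption, as are the function names, the input format and the error raised on cyclic dependencies (a case the patent does not discuss).

```python
from typing import Dict, List, Set, Tuple


def execution_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Order rules so that every rule runs after the rules it depends on.
    graph maps a rule name to the set of rule names it depends on."""
    order: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(rule: str) -> None:
        if rule in done:
            return
        if rule in in_progress:
            raise ValueError(f"cyclic dependency at {rule!r}")
        in_progress.add(rule)
        for dep in sorted(graph.get(rule, ())):  # run unexecuted dependencies first
            visit(dep)
        in_progress.discard(rule)
        done.add(rule)
        order.append(rule)                       # then return to the deferred rule

    for rule in graph:
        visit(rule)
    return order


def accuracy(annotated: List[str], extracted: List[str]) -> Tuple[float, List[str]]:
    """Assumed metric: fraction of annotated values present in the rule output,
    together with the annotated values that were not hit."""
    missed = [value for value in annotated if value not in extracted]
    if not annotated:
        return 1.0, missed
    return (len(annotated) - len(missed)) / len(annotated), missed


graph = {"citation": {"title", "author"}, "title": set(), "author": set()}
print(execution_order(graph))
print(accuracy(["a", "b", "c", "d"], ["a", "c", "d", "x"]))
```

The missed values returned by `accuracy` correspond to the unmatched annotations that the system reports to the user before the rule is adjusted.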
Step 6: Define the information generation meta-template
The user can define an information generation meta-template according to presentation requirements. An information generation meta-template mainly comprises a basic information text format and several rule filling regions. To provide a general way of generating information, custom rule information can be extended, so that the user can import information from third-party data sources in a form that conforms to the rule format.
This step includes the following sub-steps:
Step 6-1: Initial state;
Step 6-2: The user can create a named information generation meta-template;
Step 6-3: The user can add basic text information blocks, fixed dependent rule items and placeholders to the meta-template;
For a basic text information block, the text information can be arbitrary;
For a fixed dependent rule, it can be a rule item of a certain type of text extraction;
For a placeholder, when the template is later generated, the placeholder can be replaced with text information or with a rule item from the rule-writing context;
Step 6-4: The user saves the information generation meta-template to the database;
Step 6-5: Defining the information generation meta-template is complete.
Step 7: Select custom template rules and generate text
For the same information generation meta-template, the user can select different rule information for the rule filling regions and thereby generate text suited to different sub-scenarios. The user can choose the final output format and generate the information text.
This step includes the following sub-steps:
Step 7-1: Initial state;
Step 7-2: The user can select an existing information generation meta-template for text generation;
Step 7-3: The user can choose to generate a temporary version, or generate a new custom template;
Step 7-4: First, the user replaces the placeholders in the information generation meta-template; a placeholder can be replaced with ordinary text information, an information item from the rule-writing context, or a rule item from the rule context;
Step 7-5: After filling the placeholders in the template, the user can choose the format of the generated text, such as TXT, DOC, DOCX or PDF, and can then download the generated text;
Step 7-6: Custom template rule selection and text generation is complete.
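Steps 6 and 7 amount to filling the placeholders of a stored meta-template with either literal text or items from the rule context. The sketch below illustrates that idea with Python's standard `string.Template`; the `$name` placeholder syntax, the `render` function and the sample data are assumptions, since the patent does not describe its template engine's syntax. Only plain text output is shown.

```python
from string import Template
from typing import Dict, List


def render(meta_template: str,
           rule_context: Dict[str, List[str]],
           choices: Dict[str, str]) -> str:
    """Fill each placeholder with the values of the chosen rule item, or with
    the chosen string itself when it does not name an item in the rule context."""
    values = {}
    for placeholder, source in choices.items():
        if source in rule_context:
            values[placeholder] = ", ".join(rule_context[source])
        else:
            values[placeholder] = source  # ordinary text information
    # substitute() raises KeyError if a placeholder is left unfilled
    return Template(meta_template).substitute(values)


meta_template = "Summary for $who: filed $when."
rule_context = {"applicant": ["Example Co."], "dates": ["2019-02-28"]}

print(render(meta_template, rule_context, {"who": "applicant", "when": "dates"}))
print(render(meta_template, rule_context, {"who": "the reviewer", "when": "dates"}))
```

The two calls use the same meta-template with different selections, which is the mechanism the description relies on to produce text for different sub-scenarios.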
Fig. 2 is a structural schematic diagram of the invention, and Fig. 3 is a schematic diagram of the invention at runtime. The core structure of the invention consists of dynamic configuration information, an extensible rule engine and an efficient template engine. If the information to be extracted from some region of the text changes, the user can handle the change by adding to the information dictionary context and extending the word packages. If both the extraction rule and the extracted content of some region change, the user can first clarify the dependencies of the current extraction through the dependency directed graph, and then select the rules that need modification, adjust them and re-apply them. In enterprises, requirements for text generation often change frequently. With the meta-template design of the invention, certain information items and text regions can be conveniently replaced or redesigned without writing any code, and the template generation task can be completed through online configuration of information items and rules.
Fig. 4 is the algorithm flowchart of the rule dependency analyzer of the invention. After the user has written a rule, the rule passes through the rule syntax analyzer, which generates an abstract syntax tree carrying expression information. By traversing the abstract syntax tree, the dependency items and export items of the rule can be obtained. The algorithm needs to recognize specific expressions in the syntax tree, which are classified as follows:
A. Dependency-item-related expressions: property access expressions, variable expressions and array expressions.
B. Export-item-related expressions: method call expressions.
C. Local variable expressions: declaration expressions and assignment expressions.
The algorithm steps are as follows.
Step 1: Generate an abstract syntax tree from the rule through the grammar parser;
Step 2: Traverse the abstract syntax tree;
Step 3: Has the expression traversal finished? If so, proceed to exporting the dependency items and definition items; otherwise go to step 4;
Step 4: Is the current expression a local variable definition expression? If so, add it to the local variable set and go to step 3; otherwise go to step 5;
Step 5: Is the current expression a property expression whose property is not in the local variable set? If so, add it to the dependency set; otherwise go to step 6;
Step 6: Is the current expression a method call expression whose method is the export function? If so, add it to the definition set; otherwise go to step 7;
Step 7: Recursively traverse other expressions, then go to step 3;
Step 8: Export the dependency items and definition items;
Step 9: End.
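The traversal above can be sketched with Python's standard `ast` module. This is an analogy, not the patent's analyzer: the rule script language is not disclosed, so Python source stands in for a rule script, free variable names stand in for dependency items, assignment targets stand in for local variables, and the export function is assumed to be called `export` with the exported name as its first string argument.

```python
import ast
from typing import Set, Tuple

EXPORT_FUNCTION = "export"  # hypothetical name of the rule language's export function


def analyze_rule(source: str) -> Tuple[Set[str], Set[str]]:
    """Return (dependency items, export items) of a rule written in Python syntax."""
    local_vars: Set[str] = set()
    dependencies: Set[str] = set()
    exports: Set[str] = set()

    def visit(node: ast.AST) -> None:
        # Local variable definition: inspect the value first, then record the target
        if isinstance(node, ast.Assign):
            visit(node.value)
            for target in node.targets:
                if isinstance(target, ast.Name):
                    local_vars.add(target.id)
                else:
                    visit(target)
            return
        # Call of a plain function: record exports, never treat the callee as a dependency
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            args = list(node.args)
            if (node.func.id == EXPORT_FUNCTION and args
                    and isinstance(args[0], ast.Constant)):
                exports.add(str(args[0].value))
                args = args[1:]
            for arg in args:
                visit(arg)
            for keyword in node.keywords:
                visit(keyword.value)
            return
        # A name that is read but was never defined locally is a dependency item
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            if node.id not in local_vars:
                dependencies.add(node.id)
            return
        # Any other expression: traverse its children recursively
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return dependencies, exports


rule = 'first = names[0]\nlabel = first.strip()\nexport("title", label + suffix)\n'
deps, exps = analyze_rule(rule)
print(sorted(deps), sorted(exps))
```

Here `names` and `suffix` are reported as dependency items because the rule reads them without defining them, while `first` and `label` are local variables and `title` is the export item.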
Fig. 5 is a text annotation schematic diagram of the invention. After importing text information, the user can annotate it. The user selects a text region and, after selection, can assign a name to the region for use when writing the subsequent extraction rules. Once the text has been annotated, the text information can be located through the annotated text segments.
Fig. 6 is a schematic diagram of writing an actual extraction rule of the invention, and Fig. 7 is a schematic diagram of a dependency analysis result of the invention. After annotating the text, the user can write specific extraction rules to extract information from it. The user can extract text information in detail by configuring algorithm rules, by using keywords and regular expressions, or by writing scripts. At the same time, the user can view the dependency directed graph of the text, analyze the dependency items and export items of the current rule, and optimize and adjust the rule being written.
Fig. 8 is a schematic diagram of meta-template editing and text generation of the invention, and Fig. 9 shows an information text generation result of the invention. After writing the extraction rules for the text information, the user can enter the text generation template editing module and, by defining the template content for text generation and the extraction rules it depends on, finally generate the required information text.
In conclusion traditional Text Information Extraction method has been carried out effective expansion by the present invention, so that entire extract
Process is more efficient, excavates more information contents from text information convenient for user.And rule-based extraction script can be with
Efficient multiplexing is carried out, and brings the promotion of maintenance aspect.By carrying out syntactic analysis to rule script, the present invention can be helped
It helps user to understand the dependence of current various texts, makees good place mat for subsequent information extraction and extraction optimization.Except this
In addition, in order to preferably be utilized Extracting Information, the invention proposes the concept of meta template, user can be by online may be used
Mode depending on changing carries out the generation of text, greatly reduces the complexity of text generation, improves the efficiency of text generation.
The technical means disclosed in the embodiments of the present invention is not limited only to technological means disclosed in above embodiment, further includes
Technical solution consisting of any combination of the above technical features.It should be pointed out that for those skilled in the art
For, various improvements and modifications may be made without departing from the principle of the present invention, these improvements and modifications are also considered as
Protection scope of the present invention.
Claims (10)
1. A rule-based general text information extraction and text generation method, characterized by comprising the following steps:
Step 1: Initialize the information dictionary context, rule word packages, rule engine and template engine
Initialize the information dictionary as the context of information extraction, used for dynamic, extensible information extraction from information text; load the engine classes defined in the configuration information, and load the rule grammar parser, the grammar dependency analyzer and the rule executor; initialize the data access engine that depends on the rule engine and supports third-party data sources; by loading the template engine configuration information, load the defined precompiled template instructions and the written information generation templates, thereby completing the loading of the entire template engine;
Step 2: Annotate the text information
Model the text information extraction; the text information extraction model is divided into single-value information extraction and multi-value information extraction; single-value information extraction extracts the text of a single region from a passage; multi-value information extraction extracts the information of several specified regions from a passage; the text information annotation model comprises: the range of the text information mark, the annotation features and the information annotation identifier; for each information annotation, the desired extraction text can be found in a passage of information text;
Step 3: Define the information extraction algorithm and write the rule script code
Analyze and model the extraction rules; the extraction rule model comprises: scalar rules, shared rules, dependency-free computation rules, dependent computation rules and variable context rules; when the user extracts information, if the currently extracted information item does not depend on other items and has no obvious dependency on text context rules, a scalar rule can be used for information extraction; if the extraction mode of the currently extracted information item is similar to that of other text with a similar structure, the extraction rule can be shared by direct reference or copying; if the currently extracted information item does not depend on other rules of the current rule context, the information can be extracted by a dependency-free computation rule; if the currently extracted information item depends on other rules of the current rule context, it can be computed by a dependent computation rule; if the currently extracted information item has a very deep structural dependency, and the information of its intermediate states does not need to be extracted explicitly, the information can be extracted by a variable context rule without affecting the current rule context;
Step 4: Generate the rule dependency directed graph
Parse the extraction rules written by the user, obtain the dependency items and export items of each rule, and generate the rule dependency directed graph;
Step 5: Execute the text extraction rules and fine-tune them according to extraction accuracy
Execute the text extraction rules in the rule engine to produce well-structured extracted text, compare the extracted information with the initial text annotation information, and compute the extraction accuracy;
Step 6: Define the information generation meta-template
The user can define an information generation meta-template according to presentation requirements; the information generation meta-template comprises a basic information text format and several rule filling regions; to provide a general way of generating information, custom information rules can be extended, so that the user can import information from third-party data sources in a form that conforms to the rule format;
Step 7: Select custom template rules and generate text
For the same information generation meta-template, the user can select different rule information for the rule filling regions to generate text suited to different sub-scenarios; the user can select a format and generate the information text.
2. The rule-based general text information extraction and text generation method according to claim 1, characterized in that step 1 comprises the following steps:
Step 1-1: Initial state;
Step 1-2: Define the data structure table that stores the information dictionary; the data structure of the information dictionary is a hierarchical hash table, which supports multi-level information structures;
Step 1-3: Load information into the information dictionary according to the hierarchy, loading the root node first and then proceeding level by level until the leaf nodes have been loaded;
Step 1-4: To obtain the information result for an information sub-item of the information dictionary, first access the leaf node and check whether the sub-item exists there; if it exists in the current leaf node, return the information item result directly; otherwise search upward along the hierarchy until some level contains the information sub-item, then return the information item result;
Step 1-5: Load the word packages from the database into the system; a word package comprises one or more phrases and several optional conditional expression statements; an information extraction rule can contain one or more word packages, and the word packages can be obtained by writing rule scripts;
Step 1-6: Load the condition discrimination functions associated with the word packages for use at runtime; for word packages, some condition discrimination functions are predefined that can judge whether a given word or words exist in a word package and whether a given sentence contains vocabulary from a word package; the user can also extend the condition functions of certain word packages in a self-defined way;
Step 1-7: Initialize the rule engine by loading the rule engine configuration information: select the rule engine grammar set and load the grammar parser, then load the optional grammar context dependency analyzer for the grammar parser, and finally load the rule executor, completing the loading of the entire rule engine;
Step 1-8: Initialize the template engine by loading the template engine configuration information: select the template engine type, load the template engine instruction set, and load the information generation templates already defined in the system, completing the loading of the entire template engine;
Step 1-9: Import the information dictionary context, the rule word packages and the rule engine into the algorithm, and finally integrate the template engine with the algorithm; the entire initialization is then complete.
3. The rule-based general text information extraction and text generation method according to claim 1, characterized in that step 2 comprises the following steps:
Step 2-1: Initial state;
Step 2-2: The user selects the text from which information is to be extracted, or imports the text to be extracted into the system;
Step 2-3: The user determines the extraction regions by dividing the text in a user-defined way;
Step 2-4: The user adds the type of the information to be extracted: single-value or multi-value;
Step 2-5: If the user selects the single-value type, the user annotates the text within the specified text;
Step 2-6: If the user selects the multi-value type, the user determines the number of extraction regions and then selects and annotates the specified text;
Step 2-7: The user names this information annotation, and the system then assigns it a unique information annotation identifier;
Step 2-8: Text information annotation is complete.
4. The rule-based general text information extraction and text generation method according to claim 1, characterized in that step 3 comprises the following steps:
Step 3-1: Initial state;
Step 3-2: The user selects an information annotation made in step 2 and then writes the specific information extraction rule for it;
Step 3-3: During actual extraction, the user uses the several classes of extraction algorithms predefined in the rule engine; if an algorithm's extraction result is satisfactory, no specific rule needs to be written;
Step 3-4: Otherwise, the user needs to write a custom rule: the user must generalize features from the text to be extracted;
Step 3-5: The rule grammar parser performs lexical analysis on the extraction rule written by the user, checking whether the variables the user defined while writing the rule conform to the specification and whether the rules in the dependent rule context exist;
Step 3-6: Based on the token sequence produced in step 3-5, the rule syntax analyzer then performs syntactic analysis, analyzing the functions and program structure defined in the user's rule and reporting errors for redefined functions and incorrect program structure;
Step 3-7: The rule script written by the user exports the extracted text information through predefined functions, and the exported items are used by other rules of the current rule context, which facilitates the extraction of structured text information;
Step 3-8: In the rule context list, the user can perform the following operations on a written rule: view the extracted content, apply the extraction rule, and analyze the dependencies of the extraction rule;
Step 3-9: Defining the information extraction algorithm and writing the rule script code is complete.
5. The rule-based general text information extraction and text generation method according to claim 4, characterized in that, in step 3-4, the user generalizing features from the text to be extracted specifically comprises: the user can generalize features in a context-free way, by specifying keywords, phrases or regular expressions, or in a context-sensitive way that carries specific semantics.
6. The rule-based general text information extraction and text generation method according to claim 1, characterized in that step 4 comprises the following steps:
Step 4-1: Initial state;
Step 4-2: The user can optionally perform rule dependency analysis; if the user chooses to perform dependency analysis, go to step 7; otherwise go to step 3;
Step 4-3: After the user selects rule dependency analysis, the system builds an abstract syntax tree for the rule through the rule engine;
Step 4-4: The dependency analyzer analyzes the dependency variables, local variables and rule export items in the abstract syntax tree, and completes the rule dependency analysis by a depth-first search of the abstract syntax tree;
Step 4-5: The dependency analyzer displays the generated dependency analysis directed graph to the user; the user selects a rule dependency item or rule export item of interest to view the in-degree and out-degree relations of the current item and understand the dependency context of the current rule;
Step 4-6: The user can enter the rule adjustment stage directly by selecting a rule item in the directed graph;
Step 4-7: Generating the rule dependency directed graph is complete.
7. The rule-based general text information extraction and text generation method according to claim 1, characterized in that, in step 5, if the text extraction accuracy does not reach the target, the extraction rules with insufficient accuracy continue to be adjusted until the extraction accuracy reaches the specified threshold.
8. The rule-based general text information extraction and text generation method according to claim 7, characterized in that step 5 comprises the following steps:
Step 5-1: Initial state;
Step 5-2: The user executes a single rule, or executes all rules together after all extraction rules have been written;
Step 5-3: When the user executes rules, the system places the rule content that is free of syntax errors, as defined in step 3, into the rule executor for execution, using the execution mode that corresponds to the configured execution engine;
Step 5-4: First, the rule executor places the information dictionary and word packages that the rules depend on into the rule execution context and executes the rules to be run in sequence; for a given rule, if the set of rules it depends on contains rules that have not yet been executed, those rules are executed first, until all rules that the current rule depends on have been executed; execution then backtracks to the rule that was deferred;
Step 5-5: After a rule has been executed, the system compares the text information annotation corresponding to the rule with the rule's export items, computes the extraction accuracy, and reports the text annotation information that was not hit;
Step 5-6: If the extraction accuracy meets the requirement, continue with step 5-7; otherwise adjust the rule content and continue with step 5-2;
Step 5-7: Executing the text extraction rules and fine-tuning them according to extraction accuracy is complete.
9. The rule-based general text information extraction and text generation method according to claim 1, characterized in that step 6 comprises the following steps:
Step 6-1: Initial state;
Step 6-2: The user creates a named information generation meta-template;
Step 6-3: The user adds basic text information blocks, fixed dependent rule items and placeholders to the meta-template; a basic text information block is arbitrary text information; a fixed dependent rule is a rule item of a certain type of text extraction; a placeholder can be replaced, when the template is later generated, with text information or with a rule item from the rule-writing context;
Step 6-4: The user saves the information generation meta-template to the database;
Step 6-5: Defining the information generation meta-template is complete.
10. The rule-based general text information extraction and text generation method according to claim 1, characterized in that step 7 comprises the following steps:
Step 7-1: Initial state;
Step 7-2: The user selects an existing information generation meta-template for text generation;
Step 7-3: The user chooses to generate a temporary version, or generates a new custom template;
Step 7-4: The user replaces the placeholders in the information generation meta-template; a placeholder is replaced with ordinary text information, an information item from the rule-writing context, or a rule item from the rule context;
Step 7-5: After filling the placeholders in the template, the user selects the format of the generated text and then downloads the generated text;
Step 7-6: Custom template rule selection and text generation is complete.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910153119.5A CN110059176B (en) | 2019-02-28 | 2019-02-28 | Rule-based general text information extraction and information generation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910153119.5A CN110059176B (en) | 2019-02-28 | 2019-02-28 | Rule-based general text information extraction and information generation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110059176A true CN110059176A (en) | 2019-07-26 |
CN110059176B CN110059176B (en) | 2021-07-13 |
Family
ID=67316534
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910153119.5A Active CN110059176B (en) | 2019-02-28 | 2019-02-28 | Rule-based general text information extraction and information generation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110059176B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9280528B2 (en) * | 2010-10-04 | 2016-03-08 | Yahoo! Inc. | Method and system for processing and learning rules for extracting information from incoming web pages |
US9594747B2 (en) * | 2012-03-27 | 2017-03-14 | Accenture Global Services Limited | Generation of a semantic model from textual listings |
CN104572849A (en) * | 2014-12-17 | 2015-04-29 | 西安美林数据技术股份有限公司 | Automatic standardized filing method based on text semantic mining |
CN106156035A (en) * | 2015-02-28 | 2016-11-23 | 南京网感至察信息科技有限公司 | A kind of generic text method for digging and system |
CN105069560A (en) * | 2015-07-30 | 2015-11-18 | 中国科学院软件研究所 | Resume information extraction and characteristic identification analysis system and method based on knowledge base and rule base |
CN107092674A (en) * | 2017-04-14 | 2017-08-25 | 福建工程学院 | The automatic abstracting method and system of a kind of Chinese medicine acupuncture field event trigger word |
CN107608948A (en) * | 2017-10-16 | 2018-01-19 | 北京神州泰岳软件股份有限公司 | A kind of construction method and device of Text Information Extraction model |
CN108763483A (en) * | 2018-05-25 | 2018-11-06 | 南京大学 | A kind of Text Information Extraction method towards judgement document |
JP2018206423A (en) * | 2018-08-30 | 2018-12-27 | 三井住友カード株式会社 | User information input assistance system |
Non-Patent Citations (3)
Title |
---|
TAO XIE, SHENGSHENG SHI,YIHUA HUANG: "Research on Complex Structure-Oriented Accurate Web Information Extraction Rules", 《2010 IEEE INTERNATIONAL CONFERENCE ON PROGRESS IN INFORMATICS AND COMPUTING》 * |
WU WEI, SHENGSHENG SHI,YIHUA HUANG: "Extraction Rule Language for Web Information Extraction and Integration", 《WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE》 * |
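辛欣, 李涓子 (Xin Xin, Li Juanzi): "文本信息抽取平台的设计与实现——基于机器学习" (Design and Implementation of a Text Information Extraction Platform Based on Machine Learning), 《第七届中文信息处理国际会议》 (7th International Conference on Chinese Information Processing) * |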
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110597959A (en) * | 2019-09-17 | 2019-12-20 | 北京百度网讯科技有限公司 | Text information extraction method and device and electronic equipment |
CN111476034A (en) * | 2020-04-07 | 2020-07-31 | 同方赛威讯信息技术有限公司 | Legal document information extraction method and system based on combination of rules and models |
CN111476034B (en) * | 2020-04-07 | 2023-05-12 | 同方赛威讯信息技术有限公司 | Legal document information extraction method and system based on combination of rules and models |
CN113590769A (en) * | 2020-04-30 | 2021-11-02 | 阿里巴巴集团控股有限公司 | State tracking method and device in task-driven multi-turn dialogue system |
CN111639480A (en) * | 2020-05-28 | 2020-09-08 | 深圳壹账通智能科技有限公司 | Text labeling method based on artificial intelligence, electronic device and storage medium |
CN112560460A (en) * | 2020-12-08 | 2021-03-26 | 北京百度网讯科技有限公司 | Method and device for extracting structured information, electronic equipment and readable storage medium |
CN112560460B (en) * | 2020-12-08 | 2022-02-25 | 北京百度网讯科技有限公司 | Method and device for extracting structured information, electronic equipment and readable storage medium |
CN112669076A (en) * | 2020-12-30 | 2021-04-16 | 平安证券股份有限公司 | Data distribution method based on rule engine, server and storage medium |
CN113485182A (en) * | 2021-06-30 | 2021-10-08 | 中冶华天工程技术有限公司 | Method for automatically generating material yard belt flow control program |
CN115185502A (en) * | 2022-09-14 | 2022-10-14 | 中国人民解放军国防科技大学 | Rule-based data processing workflow definition method, device, terminal and medium |
CN115185502B (en) * | 2022-09-14 | 2022-11-15 | 中国人民解放军国防科技大学 | Rule-based data processing workflow definition method, device, terminal and medium |
CN116484768A (en) * | 2023-05-25 | 2023-07-25 | 之江实验室 | System dynamics model construction method and device |
CN116484768B (en) * | 2023-05-25 | 2023-08-18 | 之江实验室 | System dynamics model construction method and device |
Also Published As
Publication number | Publication date |
---|---|
CN110059176B (en) | 2021-07-13 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110059176A (en) | Rule-based general text information extraction and information generation method | |
US11138005B2 (en) | Methods and systems for automatically generating documentation for software | |
US7191119B2 (en) | Integrated development tool for building a natural language understanding application | |
US7165216B2 (en) | Systems and methods for converting legacy and proprietary documents into extended mark-up language format | |
Alexa et al. | A review of software for text analysis | |
CN104199871B (en) | A kind of high speed examination question introduction method for wisdom teaching | |
US9645988B1 (en) | System and method for identifying passages in electronic documents | |
US11537797B2 (en) | Hierarchical entity recognition and semantic modeling framework for information extraction | |
CN107203468A (en) | A kind of software version evolution comparative analysis method based on AST | |
CN102609427A (en) | Public opinion vertical search analysis system and method | |
CN107992476B (en) | Corpus generation method and system for sentence-level biological relation network extraction | |
de Almeida Ferreira et al. | RSL-PL: A linguistic pattern language for documenting software requirements | |
Koznov et al. | Clone detection in reuse of software technical documentation | |
CN111898024A (en) | Intelligent question and answer method and device, readable storage medium and computing equipment | |
Xia et al. | Enriching a massively multilingual database of interlinear glossed text | |
CN111753536A (en) | Automatic patent application text writing method and device | |
CN109325217B (en) | File conversion method, system, device and computer readable storage medium | |
Han et al. | A novel part of speech tagging framework for nlp based business process management | |
Agnoloni et al. | xmLegesEditor: an opensource visual XML editor for supporting legal national standards | |
US20090217156A1 (en) | Method for Storing Localized XML Document Values | |
CN114691820A (en) | Question-answering implementation method and device based on knowledge graph | |
Khoufi et al. | A Framework for Language Resource Construction and Syntactic Analysis: Case of Arabic | |
CN117852637B (en) | Definition-based subject concept knowledge system automatic construction method and system | |
Labský et al. | The ex project: Web information extraction using extraction ontologies | |
WO2024092553A1 (en) | Methods and systems for model generation and instantiation of optimization models from markup documents |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||