CN114492399A - Contract information extraction system and method based on regular expression - Google Patents
- Publication number
- CN114492399A (application CN202111682272.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- information extraction
- module
- regular expression
- contract
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a contract information extraction system and method based on regular expressions. The invention extracts key information from unstructured data by means of a regular-expression-based conversion technique, stores the information as structured data, and screens the data according to inherent rules.
Description
Technical Field
The invention belongs to the technical field of intelligent auditing, relates to an auditing information extraction system, and particularly relates to a contract information extraction system and method based on a regular expression.
Background
The traditional way of extracting information from files is simply to page through the files and record the information by hand. This is error-prone and inefficient, and it cannot produce structured information data efficiently. The traditional method can no longer meet the requirements of present-day work. With the continuous development of science and technology, big data, intelligent and innovative audit modes and innovative data analysis techniques and methods are being promoted, which has given rise to effective information extraction systems and methods oriented to audit information.
Upon search, no prior art publications that are the same or similar to the present invention were found.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a contract information extraction system and method based on a regular expression.
The invention solves the practical problem by adopting the following technical scheme:
A contract information extraction system based on regular expressions comprises a task setting module, a data acquisition module, an information extraction module, a data storage module and a big data analysis module. The output of the task setting module is connected to the data acquisition module, and the task setting module is used to preset tasks and parameters. The output of the data acquisition module is connected to the information extraction module; the data acquisition module collects the target data accurately through the process automation operation terminal, according to the tasks and parameters preset by the task setting module, and provides the data source for the information extraction module. The output of the information extraction module is connected to the data storage module; the information extraction module processes the data collected by the data acquisition module, applies a regular expression matching algorithm to the unstructured data to mine the key information required for auditing, and matches character strings by building the automaton that corresponds to each regular expression. The data storage module stores the data of the data acquisition module and the information extraction module. The output of the data storage module is connected to the big data analysis module, which performs further analysis on the data held in the data storage module.
Moreover, the method for matching character strings by building the automaton that corresponds to a regular expression is as follows: the regular expression is first converted into a nondeterministic finite automaton (NFA), and the NFA is then converted into a deterministic finite automaton (DFA).
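A contract information extraction method based on regular expressions comprises the following steps:
step 1, setting tasks and constructing an audit task list;
step 2, collecting target data through a process automation operation terminal according to the audit task list in step 1;
step 3, extracting information from the target data acquired in step 2.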
Further, the specific steps of step 1 include:
(1) designing a contract information auditing intermediate table according to the fields of the required data given by the auditors and the meaning of those fields;
(2) at the same time, setting a data acquisition path and simulating the working operations and pre-programmed operations of the auditors;
(3) setting an audit task list according to the audit task.
Moreover, the specific method of step 2 is as follows:
according to the acquisition path, the simulated operations, the data intermediate table and the audit task list set for the task, the process automation operation terminal collects contract information in the business system and downloads the unstructured contract files.
Further, the specific steps of step 3 include:
(1) reading the contract documents acquired in the data acquisition stage into text information by using the reading technology of the robot;
(2) converting the unstructured data in the text that has been read by using an information extraction technique based on regular expressions: expressions that match the key information are built by combining regular expression syntax elements, the corresponding automaton is constructed, and the key information in the text is mined.
Furthermore, the syntax elements of the regular expression in sub-step (2) of step 3 include: common characters, character sets, matching times qualifiers, grouping expressions, selection expressions, and escape characters.
Furthermore, the method comprises the following steps after step 3:
The specific method of step 4 is as follows:
First, the auditor analyzes the collected data to find the logical relationships among the data, builds a fixed audit model by formalizing the business logic, and communicates that business logic to the program developer. Second, the program developer translates the business logic into a computer language, so that audit doubts are judged and output automatically by logical operations.
The specific method of step 5 is as follows:
The contract auditing robot automatically sends the audit doubts to the auditor's mailbox to assist in verifying them; once the auditor has verified and confirmed a doubt, the audit problem is locked directly and the process ends.
The invention has the advantages and beneficial effects that:
1. The invention combines Robotic Process Automation (RPA) with regular expressions: the RPA robot serves as a virtual workforce, the regular expression serves as the algorithm for converting unstructured data, and audit tasks are preset so that information extraction, data storage and data analysis run automatically. This effectively streamlines the traditional office process, improves working efficiency, indirectly optimizes the allocation of the enterprise's human resources, and supports the enterprise's digital upgrade.
2. The invention uses RPA and regular expressions to schedule the work tasks of the RPA robot so that they execute automatically at set times without manual triggering. The robot works around the clock, and the whole working process forms a closed loop. With the regular expression algorithm, the required information is extracted from files accurately and efficiently, which helps the RPA robot analyze the key information in the files. The traditional method, by contrast, relies on a large amount of manual labor to review the files, extract the key information by hand, paste or transcribe it, and arrange it into normalized information that can be used for work. The invention can replace the highly repetitive, low-complexity manual operations in this process. Following presets such as unit, time and range, it automatically collects the required data from the system according to a preset automated process, downloads files in batches, extracts key information from the unstructured data with regular expressions, stores the information as structured data, and screens the data according to inherent rules. The invention can be described collectively as an information extraction robot, whose automated process produces an intuitive structured data result that lets staff review files quickly.
Drawings
FIG. 1 is a system configuration diagram of the present invention;
FIG. 2 is a process flow diagram of a data acquisition module of the present invention;
FIG. 3(a) is a schematic diagram of the nondeterministic automaton for a|b in the data extraction module of the present invention;
FIG. 3(b) is a schematic diagram of the nondeterministic automaton for a* in the data extraction module of the present invention;
FIG. 3(c) is a schematic diagram of the nondeterministic automaton for the regular expression (a|b)*abb in the data extraction module of the present invention;
FIG. 3(d) is a schematic diagram of the deterministic automaton for the regular expression (a|b)*abb in the data extraction module of the present invention;
FIG. 4 is a process flow diagram of the present invention.
Detailed Description
The embodiments of the invention will be described in further detail below with reference to the accompanying drawings:
A contract information extraction system based on regular expressions, as shown in figure 1, comprises a task setting module, a data acquisition module, an information extraction module, a data storage module and a big data analysis module. The output of the task setting module is connected to the data acquisition module, and the task setting module is used to preset tasks and parameters. The output of the data acquisition module is connected to the information extraction module; the data acquisition module collects the target data accurately through the process automation operation terminal, according to the tasks and parameters preset by the task setting module, and provides the data source for the information extraction module. The output of the information extraction module is connected to the data storage module; the information extraction module processes the data collected by the data acquisition module, applies a regular expression matching algorithm to the unstructured data to mine the key information required for auditing, and matches character strings by building the automaton that corresponds to each regular expression. The data storage module stores the data of the data acquisition module and the information extraction module. The output of the data storage module is connected to the big data analysis module, which performs further analysis on the data held in the data storage module.
In this embodiment, the method for matching character strings by building the automaton that corresponds to a regular expression is as follows: the regular expression is first converted into a nondeterministic finite automaton (NFA), and the NFA is then converted into a deterministic finite automaton (DFA).
The composition and operation of the various modules within the system are further described below:
1. The task setting module presets the work plan entered by the staff and is an important point of interaction between the staff and the robot. The process automation operation terminal reads the work plan parameters and obtains the preset login website, login account, password, unit, time and range information, which are used to set the query conditions during collection in the data acquisition module.
2. As shown in fig. 2, the data acquisition module uses RPA to access web pages and identify the HTML code of the interface according to the tasks and parameters preset by the task setting module, so that the target data is collected accurately. The process automation operation terminal and the target web page communicate in full duplex over the WebSocket protocol to achieve synchronous data interaction. According to work requirements, the data acquisition module can collect business data from the business system as well as internet data, download the related files, store the data in the data warehouse through the data storage module, and provide the data source for the information extraction module.
3. The information extraction module processes the data collected by the data acquisition module, applies a regular expression matching algorithm to the unstructured data to mine the key information required for auditing, and matches character strings by building the automaton that corresponds to each regular expression.
The automaton is generally built in two steps: the regular expression is first converted into a nondeterministic finite automaton (NFA), and the NFA is then converted into a deterministic finite automaton (DFA).
The nondeterministic finite automaton (NFA) is defined as a quintuple M = (K, Σ, f, S, Z), where:
(1) K is a finite set, each element of which is called a state;
(2) Σ is a finite alphabet, each element of which is called an input symbol, so Σ is also called the input symbol table;
(3) f is a mapping from K × Σ* to the subsets of K, where Σ* denotes the set of strings over the alphabet Σ;
(4) S ⊆ K is the set of initial states;
(5) Z ⊆ K is the set of final states.
The deterministic finite automaton (DFA) is defined as a quintuple M = (K, Σ, f, S, Z), where:
(1) K is a finite set, each element of which is called a state;
(2) Σ is a finite alphabet, each element of which is called an input symbol;
(3) f is the transition function, a single-valued mapping from K × Σ to K;
(4) S ∈ K is the unique initial state;
(5) Z ⊆ K is the set of final states.
Both deterministic and nondeterministic automata can be represented by graphs or matrices, as shown in figs. 3(a) to 3(d). In the graph representation, the nodes represent states and the character on each edge represents the transition from one state to another. By definition the two kinds of automaton differ mainly as follows: a deterministic automaton has a unique initial state and a set of final states, whereas a nondeterministic automaton has a set of initial states and a set of final states; on a given character, a state of a nondeterministic automaton may move to one or more states, whereas a state of a deterministic automaton can move to only one definite state.
Take the regular expression Q = (a|b)*abb as an example. The nondeterministic automaton (NFA) for a|b is shown in fig. 3(a), where $ denotes the empty string, node 1 is the start node and node 6 is the end node. The NFA for a* is shown in fig. 3(b), where node 1 is the start node and node 4 is the end node. The nondeterminism shows mainly in that node 1 can reach both node 2 and node 4 via $; equivalently, nodes 1, 2 and 4 can all be regarded as start nodes. The NFA for Q is shown in fig. 3(c), where node 1 is the start node and node 7 is the end node. This NFA can be converted into a deterministic automaton (DFA); the result is shown in fig. 3(d), where node 1 is the start node and node 5 is the end node. The conversion mainly uses the subset construction algorithm, whose main idea is that each state of the DFA corresponds to a set of states of the NFA: a DFA state records all the states that the NFA may have reached after reading the input characters so far.
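The subset construction described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the NFA below is a textbook Thompson-style automaton for (a|b)*abb whose state numbers (0 to 10) are assumptions and do not reproduce the node numbering of fig. 3(c). The names `eps_closure`, `move`, `subset_construction` and `dfa_accepts` are likewise hypothetical.

```python
from collections import deque

EPS = "$"  # the text uses "$" for the empty string

# Assumed Thompson-style NFA for (a|b)*abb; state numbers are illustrative only.
NFA = {
    0: {EPS: {1, 7}},
    1: {EPS: {2, 4}},
    2: {"a": {3}},
    3: {EPS: {6}},
    4: {"b": {5}},
    5: {EPS: {6}},
    6: {EPS: {1, 7}},
    7: {"a": {8}},
    8: {"b": {9}},
    9: {"b": {10}},
    10: {},
}
NFA_START, NFA_FINAL = 0, {10}
ALPHABET = ("a", "b")


def eps_closure(states):
    """All NFA states reachable from `states` using only empty-string edges."""
    closure, stack = set(states), list(states)
    while stack:
        for target in NFA[stack.pop()].get(EPS, ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)


def move(states, symbol):
    """NFA states reachable from `states` by reading one `symbol`."""
    reached = set()
    for state in states:
        reached |= NFA[state].get(symbol, set())
    return reached


def subset_construction():
    """Build a DFA whose states are sets of NFA states."""
    start = eps_closure({NFA_START})
    transitions, seen, queue = {}, {start}, deque([start])
    while queue:
        current = queue.popleft()
        for symbol in ALPHABET:
            target = eps_closure(move(current, symbol))
            if not target:
                continue  # no transition on this symbol
            transitions[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    finals = {state for state in seen if state & NFA_FINAL}
    return start, transitions, finals


def dfa_accepts(text):
    """Run the constructed DFA over `text`; True if it ends in a final state."""
    state, transitions, finals = subset_construction()
    for ch in text:
        state = transitions.get((state, ch))
        if state is None:
            return False
    return state in finals
```

With this NFA the construction yields five DFA states, one of which is final, which agrees with the five-node DFA the text describes for fig. 3(d).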
When the deterministic automaton is used to match a text string, the string matches the regular expression if, starting from the start node and following the edge labelled with each successive character, it reaches an end node. A currently more efficient approach is regular expression matching based on passive factors: the passive factors divide the text string into short substrings, and each short substring is checked for the presence of a prefix and a suffix; a substring that lacks a prefix or a suffix is filtered out directly. When a short substring contains both a prefix and a suffix, match verification is run on the deterministic automaton from each prefix position. In this way all start and end positions in the text that match a given regular expression can be found exactly.
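As a rough sketch of the matching step, the following Python code runs a DFA for (a|b)*abb from every position in a text and records each start and end position that matches. It is a naive scan that leaves out the factor-based prefix and suffix filtering mentioned above. The transition table is an assumption: it uses states 1 to 5 with state 5 as the end state, in line with the description of fig. 3(d), but the actual edges of that figure are not available here.

```python
# Assumed DFA for (a|b)*abb: state 1 is the start state, state 5 the end state.
DFA = {
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 2, (2, "b"): 4,
    (3, "a"): 2, (3, "b"): 3,
    (4, "a"): 2, (4, "b"): 5,
    (5, "a"): 2, (5, "b"): 3,
}
START, FINALS = 1, {5}


def find_all_matches(text):
    """Return every (start, end) span of `text` accepted by the DFA.

    `end` is exclusive, so text[start:end] is the matched substring.
    """
    spans = []
    for start in range(len(text)):
        state = START
        for pos in range(start, len(text)):
            state = DFA.get((state, text[pos]))
            if state is None:
                break  # character outside the alphabet: no match from here
            if state in FINALS:
                spans.append((start, pos + 1))
    return spans
```

For example, `find_all_matches("xabbab")` returns `[(1, 4)]`, the span of the substring "abb".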
4. The data storage module stores the data of the data acquisition module and the information extraction module, and is implemented as a data warehouse. The subject domains of the data warehouse are first determined from the actual business fields of the enterprise, and the analysis subjects within each subject domain are then determined from the model. Examples of analysis subjects are the compliance of a company employee during a given term of office, the implementation of a major company policy in a given period, and the contracts and bid-winning documents signed in a given month of a given year. Once the subject is clear, the measures, data granularity and dimensions of the analysis are determined. For example, if the company wants to analyze the implementation of a major policy by time, unit and file type, then time, unit and file type are the corresponding dimensions. Determining the dimensions and the raw data establishes the basis for analyzing the data of each subject and identifies the key objects of data maintenance work.
5. The big data analysis module performs further analysis on the data of the data storage module. It adopts an OLAP service, which supports complex analysis operations and provides intuitive, easy-to-understand query functions. For each subject, staff analyze the data from different business angles and obtain intuitive results by applying analysis operations such as rotation, slicing and drilling to the data in the database.
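To make the dimension and measure vocabulary concrete, here is a small Python sketch of slicing and aggregating fact rows along the dimensions named above (time, unit, file type). It uses plain in-memory rows rather than a real OLAP service; the rows, the `amount` measure and the function names are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (year, unit, file_type), one measure.
FACTS = [
    {"year": 2020, "unit": "Unit A", "file_type": "contract", "amount": 120.0},
    {"year": 2020, "unit": "Unit B", "file_type": "contract", "amount": 80.0},
    {"year": 2021, "unit": "Unit A", "file_type": "contract", "amount": 200.0},
    {"year": 2021, "unit": "Unit A", "file_type": "bid notice", "amount": 0.0},
]


def slice_rows(rows, **fixed):
    """Slicing: keep only rows whose dimensions equal the fixed values."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]


def aggregate(rows, dims, measure="amount"):
    """Sum the measure grouped by the given dimensions.

    Passing more dimensions drills down; reordering them rotates the view.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)
```

Calling `aggregate(FACTS, ["year"])` gives totals per year, while `aggregate(slice_rows(FACTS, unit="Unit A"), ["year", "file_type"])` slices to one unit and then drills down by file type.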
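A contract information extraction method based on regular expressions, as shown in FIG. 4, includes the following steps:
step 1, setting tasks and constructing an audit task list;
step 2, collecting target data through a process automation operation terminal according to the audit task list in step 1;
step 3, extracting information from the target data acquired in step 2.
The specific steps of step 1 include:
(1) designing a contract information auditing intermediate table according to the fields of the required data given by the auditors and the meaning of those fields;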
In this embodiment, the contract information auditing intermediate table mainly includes the following fields: contract name, contract signing unit, contract undertaking unit, contract amount, contract signing date, bid winning date, purchasing mode, contract text link, bid winning notice link, and supplementary agreement link. The intermediate table covers both structured and unstructured data.
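One way to picture the intermediate table is as a record type with one attribute per field. The Python sketch below is an assumption about shape only: the attribute names are English renderings of the fields listed above, and the types and the `missing_fields` helper are hypothetical.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class ContractAuditRecord:
    """One row of a hypothetical contract information auditing intermediate table."""
    contract_name: str
    signing_unit: str
    undertaking_unit: str
    contract_amount: Optional[float] = None
    signing_date: Optional[date] = None
    bid_winning_date: Optional[date] = None
    purchasing_mode: Optional[str] = None
    contract_text_link: Optional[str] = None
    bid_notice_link: Optional[str] = None
    supplementary_agreement_link: Optional[str] = None


def missing_fields(record):
    """Names of the fields that extraction has not filled in yet."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]
```

A helper such as `missing_fields` would let a robot report which fields still need to be extracted from the linked contract text or notice.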
(2) at the same time, setting a data acquisition path and simulating the working operations and pre-programmed operations of the auditors;
(3) setting an audit task list according to the audit task.
The specific method of step 2 is as follows:
according to the acquisition path, the simulated operations, the data intermediate table and the audit task list set for the task, the process automation operation terminal collects contract information in the business system and downloads the unstructured contract files.
The specific steps of step 3 include:
(1) reading the contract documents acquired in the data acquisition stage into text information by using the reading technology of the robot;
(2) converting the unstructured data in the text that has been read by using an information extraction technique based on regular expressions: expressions that match the key information are built by combining regular expression syntax elements, the corresponding automaton is constructed, and the key information in the text is mined.
In this embodiment, the syntax elements of the regular expression include the following 6 kinds:
(1) common characters
Letters, digits, Chinese characters, underscores, and punctuation marks without a defined special meaning are all "common characters"; in matching, a common character matches one character identical to itself.
(2) Character set
Several characters are enclosed in square brackets [], and any one of the enclosed characters can be matched. Only one character is matched at a time.
[m-n]: for example, [1-5] indicates that the character to be matched must be in the range 1 to 5;
[n1n2n3]: for example, [135] indicates that the character to be matched is 1, 3 or 5.
(3) Matching times qualifier
The number of repetitions is enclosed in curly brackets {}, so that the modified expression is matched repeatedly.
{n}: the expression is repeated exactly n times; for example, A{2} requires a match of 2 consecutive letters A;
{m,n}: the expression is repeated at least m times and at most n times;
{m,}: the expression is repeated at least m times, with no upper limit.
(4) Grouping expressions
Other expressions are enclosed in parentheses () so that they form a whole; a matching times qualifier then applies to the group as a whole.
(5) Selecting an expression
The vertical bar "|" separates several expressions that stand in an "or" relationship; for example, the expression 010|021 matches only 010 or 021.
(6) Escape character
?: the modified expression is matched 0 or 1 times; for example, to match a month from 1 to 12, the regular expression can be written as 0?[1-9]|1[0-2];
+: the modified expression is matched at least once;
*: the modified expression is matched 0 or more times;
The three symbols above carry a special meaning, so each must be preceded by the escape character "\" before it can match the literal character itself.
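The six kinds of syntax element can be tried out with Python's standard `re` module, as in the sketch below. The patterns for items (2), (3), (5) and the month expression come from the text; the remaining patterns and test strings are made-up examples, and `matches` is a hypothetical helper that requires the whole string to match.

```python
import re


def matches(pattern, text):
    """True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


# (label, pattern, text, expected result)
EXAMPLES = [
    ("common characters", r"contract", "contract", True),
    ("character set range", r"[1-5]", "3", True),
    ("character set list", r"[135]", "4", False),
    ("exact repetition", r"A{2}", "AA", True),
    ("bounded repetition", r"A{2,3}", "AAAA", False),
    ("open-ended repetition", r"A{2,}", "AAAA", True),
    ("grouping", r"(ab){2}", "abab", True),
    ("selection", r"010|021", "021", True),
    ("month with ?", r"0?[1-9]|1[0-2]", "07", True),
    ("month upper bound", r"0?[1-9]|1[0-2]", "12", True),
    ("month out of range", r"0?[1-9]|1[0-2]", "13", False),
    ("escaped + and ?", r"3\+4\?", "3+4?", True),
    ("unescaped * is a quantifier", r"a*", "a*", False),
    ("escaped *", r"a\*", "a*", True),
]

if __name__ == "__main__":
    for label, pattern, text, expected in EXAMPLES:
        print(label, pattern, repr(text), matches(pattern, text) == expected)
```

The last two rows show why the escape character is needed: `a*` treats the asterisk as a quantifier, while `a\*` matches the literal two-character string "a*".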
By combining the 6 kinds of syntax element above, matching rules can be formulated for strings in a wide variety of formats, such as numbers, characters, dates and amounts, as well as more complicated strings such as Email addresses, telephone numbers and Internet URLs.
The content of each file is extracted according to the audit task list preset by the auditor: an expression that matches the key information is built by combining regular expression syntax elements, the key information is extracted, and the robot stores it in the audit contract information data table.
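As an illustration of such combinations, the Python sketch below builds patterns for a date, an amount and an Email address and applies them to one sentence. The patterns are simplified assumptions written for this example (for instance, the amount pattern only recognizes figures with thousands separators); they are not the expressions used by the invention.

```python
import re

# Simplified, hypothetical patterns built from the syntax elements above.
DATE = re.compile(r"\b\d{4}-(?:1[0-2]|0?[1-9])-(?:3[01]|[12]\d|0?[1-9])\b")
AMOUNT = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d{2})?")
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_fields(text):
    """Return the first date, amount and Email address found in `text`."""
    found = {}
    for name, pattern in (("date", DATE), ("amount", AMOUNT), ("email", EMAIL)):
        match = pattern.search(text)
        found[name] = match.group(0) if match else None
    return found


SAMPLE = "Signed on 2021-12-30 for 1,250,000.00 yuan; contact audit@example.com."
```

Here `extract_fields(SAMPLE)` returns the date "2021-12-30", the amount "1,250,000.00" and the address "audit@example.com". Note that the longer alternatives are listed first in the date pattern so that "30" is not cut short to "3".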
For example, to extract the winning bid unit from a bid-winning notice, the robot first locates the notice with an isamatch (material name, "# notice. # doc") method and reads the file text with the robot plug-in Read Text. A regular expression beginning with "(?" is then used to search for all matching items; the successful matches that are returned are the winning bid units in the notice. Each value is assigned to a data field and stored. The operation is repeated in a loop N times until all files have been read and the key information in every file has been extracted and listed, forming structured audit material with which auditors can carry out full-coverage audit work.
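The loop described in this example might look roughly like the following Python sketch. Everything specific in it is an assumption: the file-name test, the label "Winning bidder: " and the lookbehind pattern stand in for the robot's own method and for the regular expression that is cut off in the text, and the documents are passed in as an in-memory mapping instead of being read by an RPA plug-in.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical pattern: capture whatever follows the label on the same line.
WINNER_PATTERN = re.compile(r"(?<=Winning bidder: )[^\n;,]+")


def is_bid_notice(file_name):
    """Assumed file-name test for bid-winning notices (case-insensitive)."""
    return fnmatchcase(file_name.lower(), "*notice*.doc*")


def extract_winners(documents):
    """Collect one structured row per winning unit found in the notices.

    `documents` maps a file name to the text already read from that file.
    """
    rows = []
    for file_name, text in documents.items():
        if not is_bid_notice(file_name):
            continue  # skip files that are not bid-winning notices
        for winner in WINNER_PATTERN.findall(text):
            rows.append({"file": file_name, "winning_unit": winner.strip()})
    return rows
```

Each returned row corresponds to one value assigned to a data field, so the list as a whole plays the role of the structured audit material.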
The specific method of step 4 is as follows:
First, the auditor analyzes the collected data to find the logical relationships among the data, builds a fixed audit model by formalizing the business logic, and communicates that business logic to the program developer. Second, the program developer translates the business logic into a computer language, so that audit doubts are judged and output automatically by logical operations.
The specific method of step 5 is as follows:
The contract auditing robot automatically sends the audit doubts to the auditor's mailbox to assist in verifying them; once the auditor has verified and confirmed a doubt, the audit problem is locked directly and the process ends.
After the model has been built successfully, the auditing robot can be started to carry out the collection, processing, analysis and output of the data relevant to the model.
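To show what translating business logic into a computer language can mean in practice, the Python sketch below encodes two invented audit rules as functions and applies them to extracted contract rows. The rules (a contract signed before the bid was won, and a missing contract amount) are hypothetical examples, not rules stated in the source.

```python
from datetime import date


def signed_before_award(contract):
    """Hypothetical rule: the contract was signed before the bid was won."""
    signed = contract.get("signing_date")
    awarded = contract.get("bid_winning_date")
    return signed is not None and awarded is not None and signed < awarded


def amount_missing(contract):
    """Hypothetical rule: no contract amount could be extracted."""
    return contract.get("contract_amount") is None


RULES = [
    ("signed before bid award", signed_before_award),
    ("contract amount missing", amount_missing),
]


def find_doubts(contracts):
    """Return (contract name, rule label) for every rule a contract triggers."""
    return [
        (contract["contract_name"], label)
        for contract in contracts
        for label, rule in RULES
        if rule(contract)
    ]
```

Keeping each rule as a small named function lets the auditor and the developer review the business logic rule by rule.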
The regular-expression-based information extraction method plays a key role in building the contract audit model. Contract management is an important area of internal control and compliance management in enterprise business, and contract information is basic data used frequently by professional audit groups such as engineering and finance. Using digital audit means to mine the value of contract data, to locate problems in contract management accurately and efficiently, and to provide high-quality information support to each professional audit group has therefore become an essential requirement of audit work.
With the regular-expression-based contract information extraction system and method, the computer gains the ability to read text and helps staff process massive amounts of text data automatically. Staff can quickly handle complex work such as review, search and proofreading, and risk terms in contract files can be monitored effectively, which saves labor and time. Enterprise bidding files, internal documents and other long documents can be analyzed effectively, and valuable information can be extracted from large volumes of text, which improves word processing efficiency and the depth of text mining.
It should be emphasized that the examples described herein are illustrative and not restrictive, and thus the present invention includes, but is not limited to, those examples described in this detailed description, as well as other embodiments that can be derived from the teachings of the present invention by those skilled in the art and that are within the scope of the present invention.
Claims (8)
1. A contract information extraction system based on regular expressions is characterized in that: the system comprises a task setting module, a data acquisition module, an information extraction module, a data storage module and a big data analysis module; the output end of the task setting module is connected with the data acquisition module and is used for presetting tasks and parameters; the output end of the data acquisition module is connected with the information extraction module and used for realizing accurate acquisition flow of target data through the process automation operation terminal according to tasks and parameters preset by the task setting module and providing a data source for the information extraction module; the output end of the information extraction module is connected with the data storage module and used for processing the data acquired by the data acquisition module, the key information required by auditing is mined by adopting a regular expression matching algorithm for non-structural data, and a corresponding automaton is established by using a regular expression to match character strings; the output end of the information extraction module is connected with the data storage module and is used for storing the data of the data acquisition module and the information extraction module; and the output end of the data storage module is connected with the big data analysis module and is used for further data analysis of the data storage module.
2. The regular-expression-based contract information extraction system according to claim 1, wherein building the corresponding automaton from the regular expression to match character strings comprises: converting the regular expression into a nondeterministic finite automaton (NFA), and then converting the nondeterministic finite automaton into a deterministic finite automaton (DFA).
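The conversion named in claim 2 can be illustrated with a minimal Python sketch. It is not the patent's implementation: it assumes a deliberately small regular expression syntax (literal characters, concatenation, `|`, `*` and parentheses), uses Thompson's construction for the regex-to-NFA step and the subset construction for the NFA-to-DFA step, and all function names are hypothetical.

```python
EPS = None  # label used for epsilon (empty) transitions


def regex_to_nfa(pattern):
    """Thompson construction for literals, concatenation, '|', '*' and '()'."""
    trans = {}  # NFA state -> list of (symbol, target state)
    pos = 0     # current read position in the pattern

    def new_state():
        s = len(trans)
        trans[s] = []
        return s

    def parse_alt():  # alt := concat ('|' concat)*
        nonlocal pos
        start, end = parse_concat()
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            s2, e2 = parse_concat()
            s, e = new_state(), new_state()
            trans[s] += [(EPS, start), (EPS, s2)]
            trans[end].append((EPS, e))
            trans[e2].append((EPS, e))
            start, end = s, e
        return start, end

    def parse_concat():  # concat := star*
        start = end = new_state()  # empty fragment to chain onto
        while pos < len(pattern) and pattern[pos] not in "|)":
            s2, e2 = parse_star()
            trans[end].append((EPS, s2))
            end = e2
        return start, end

    def parse_star():  # star := atom '*'*
        nonlocal pos
        start, end = parse_atom()
        while pos < len(pattern) and pattern[pos] == "*":
            pos += 1
            s, e = new_state(), new_state()
            trans[s] += [(EPS, start), (EPS, e)]
            trans[end] += [(EPS, start), (EPS, e)]
            start, end = s, e
        return start, end

    def parse_atom():  # atom := '(' alt ')' | literal
        nonlocal pos
        ch = pattern[pos]
        pos += 1
        if ch == "(":
            frag = parse_alt()
            pos += 1  # skip the closing ')'
            return frag
        s, e = new_state(), new_state()
        trans[s].append((ch, e))
        return s, e

    start, accept = parse_alt()
    return trans, start, accept


def eps_closure(trans, states):
    """All NFA states reachable from `states` through epsilon moves only."""
    stack, seen = list(states), set(states)
    while stack:
        for sym, target in trans[stack.pop()]:
            if sym is EPS and target not in seen:
                seen.add(target)
                stack.append(target)
    return frozenset(seen)


def nfa_to_dfa(trans, start, accept):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    d_start = eps_closure(trans, {start})
    dfa, todo = {}, [d_start]
    while todo:
        cur = todo.pop()
        if cur in dfa:
            continue
        moves = {}
        for s in cur:
            for sym, target in trans[s]:
                if sym is not EPS:
                    moves.setdefault(sym, set()).add(target)
        dfa[cur] = {sym: eps_closure(trans, ts) for sym, ts in moves.items()}
        todo.extend(dfa[cur].values())
    accepting = {state for state in dfa if accept in state}
    return dfa, d_start, accepting


def dfa_fullmatch(dfa, d_start, accepting, text):
    """Run the DFA over `text`; exactly one transition lookup per character."""
    state = d_start
    for ch in text:
        state = dfa[state].get(ch)
        if state is None:
            return False
    return state in accepting
```

For example, building the automaton for `(a|b)*abb` and calling `dfa_fullmatch` on `"aababb"` accepts the string, while `"ab"` is rejected. The point of the second conversion is that the DFA needs no backtracking: matching time grows linearly with the length of the text.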
3. A contract information extraction method based on regular expressions, characterized in that the method comprises the following steps:
step 1, setting tasks and constructing an audit task list;
step 2, collecting target data through a process automation operation terminal according to the audit task list in step 1;
step 3, extracting information from the target data acquired in step 2.
4. The regular-expression-based contract information extraction method according to claim 3, characterized in that step 1 specifically comprises:
(1) designing a contract information audit intermediate table according to the fields of the required data given by the auditors and the meanings of those fields;
(2) at the same time, setting a data acquisition path, and pre-programming operations that simulate the auditors' work operations;
(3) setting an audit task list according to the audit task.
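As an illustration of what step 1 produces, the following minimal Python sketch models the intermediate table fields, the acquisition path, the simulated operations and the audit task list as plain data structures. The patent does not specify any data model, so every class, field and function name here is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class FieldSpec:
    """One column of the (hypothetical) contract audit intermediate table."""
    name: str     # column name
    meaning: str  # meaning of the field as given by the auditors


@dataclass
class AuditTask:
    """One entry of the audit task list (hypothetical structure)."""
    task_id: str
    acquisition_path: str       # where the automation terminal should collect data
    simulated_steps: List[str]  # pre-programmed operations mimicking the auditor
    fields: List[FieldSpec] = field(default_factory=list)


def build_task_list(tasks: Iterable[AuditTask]) -> Dict[str, AuditTask]:
    """Index tasks by id, rejecting duplicates and tasks without any fields."""
    task_list: Dict[str, AuditTask] = {}
    for task in tasks:
        if task.task_id in task_list:
            raise ValueError(f"duplicate task id: {task.task_id}")
        if not task.fields:
            raise ValueError(f"task {task.task_id} has no intermediate-table fields")
        task_list[task.task_id] = task
    return task_list


example_task = AuditTask(
    task_id="contract-amount-check",
    acquisition_path="business-system/contracts",  # placeholder path
    simulated_steps=["log in", "open contract list", "download attachment"],
    fields=[
        FieldSpec("contract_number", "unique number printed on the contract"),
        FieldSpec("amount", "total contract amount"),
    ],
)
task_list = build_task_list([example_task])
```

Keeping the field definitions next to the task makes it explicit which columns the later extraction step is expected to fill.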
5. The regular-expression-based contract information extraction method according to claim 3, characterized in that step 2 specifically comprises:
according to the acquisition path, the simulated operations, the data intermediate table and the audit task list set in the task setting step, the process automation operation terminal collects contract information from the business system and downloads the unstructured contract files.
6. The regular-expression-based contract information extraction method according to claim 3, characterized in that step 3 specifically comprises:
(1) reading the contract documents acquired in the data acquisition stage into text information using the reading technology of the robot;
(2) according to the read text information, performing unstructured data conversion using the regular-expression-based information extraction technology, constructing an automaton from a combination of regular expression syntax elements and the expressions that match key information, and mining the key information in the text.
7. The regular-expression-based contract information extraction method according to claim 6, wherein the syntax elements of the regular expression in sub-step (2) of step 3 comprise: ordinary characters, character sets, quantifiers (match-count qualifiers), grouping expressions, alternation (selection) expressions, and escape characters.
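The syntax elements listed in claim 7 can be seen together in one pattern. The sketch below uses Python's standard `re` module on an invented contract snippet; the field labels, the contract number format and the function name are assumptions for illustration, not taken from the patent. Note also that Python's `re` is a backtracking matcher, not the DFA-based matcher described in claim 2, so the example shows only the pattern syntax.

```python
import re

CONTRACT_PATTERN = re.compile(
    r"Contract No\.:\s*"                 # ordinary characters and an escaped '.'
    r"(?P<number>[A-Z]{2}-\d{4}-\d{3})"  # character sets with quantifiers, in a named group
    r".*?"                               # lazily skip the text between the two fields
    r"(?:Amount|Total):\s*"              # alternation inside a non-capturing group
    r"(?P<currency>CNY|USD)\s*"          # alternation inside a named group
    r"(?P<amount>\d+(?:\.\d{2})?)",      # optional decimal part via the '?' quantifier
    re.DOTALL,                           # let '.' cross line breaks
)


def extract_contract_fields(text):
    """Return the matched key fields as a dict, or None if the pattern fails."""
    match = CONTRACT_PATTERN.search(text)
    return match.groupdict() if match else None


sample = (
    "Contract No.: SG-2021-007\n"
    "Party A: Example Power Co.\n"
    "Total: CNY 125000.50\n"
)
fields = extract_contract_fields(sample)
```

Named groups map directly onto the columns of an intermediate table, which is why they are used here instead of positional groups.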
8. The regular-expression-based contract information extraction method according to claim 3, characterized in that the following steps follow step 3:
step 4, analyzing and processing the data extracted in step 3, and outputting audit doubtful points;
the specific method of step 4 comprises:
first, the auditor analyzes the collected data to find the logical relationships among the data, constructs a fixed audit model by converting them into business logic, and communicates the business logic to the program developer; second, the program developer converts the business logic into a computer language, and audit doubtful points are judged and output automatically through logical operations;
step 5, verifying the audit doubtful points output in step 4;
the specific method of step 5 comprises:
the contract audit robot automatically sends the audit doubtful points to the auditor's mailbox to assist in verifying them; once the auditor has verified and confirmed a doubtful point, the audit problem is locked directly and the process ends.
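Steps 4 and 5 can be sketched as rule evaluation followed by a notification. The Python example below is a minimal illustration under stated assumptions: the audit model is represented as a list of named predicates, the two rules and the record fields are invented, and the email is only composed with the standard library's `email.message.EmailMessage`, not sent. None of these details come from the patent.

```python
from email.message import EmailMessage


def find_doubtful_points(records, rules):
    """Apply every (name, predicate) rule to every record and collect the hits."""
    return [
        {"contract": record["number"], "rule": name}
        for record in records
        for name, is_doubtful in rules
        if is_doubtful(record)
    ]


def build_notification(points, auditor_address):
    """Compose (but do not send) the message that lists the doubtful points."""
    msg = EmailMessage()
    msg["To"] = auditor_address
    msg["Subject"] = f"{len(points)} audit doubtful point(s) to verify"
    msg.set_content("\n".join(f"{p['contract']}: {p['rule']}" for p in points))
    return msg


# Hypothetical business logic agreed between auditor and developer.
RULES = [
    ("amount exceeds approval limit", lambda r: r["amount"] > 100000),
    ("signed after work started", lambda r: r["signed"] > r["start"]),
]

records = [
    {"number": "SG-2021-007", "amount": 125000.50,
     "signed": "2021-03-01", "start": "2021-03-15"},
    {"number": "SG-2021-008", "amount": 8000.00,
     "signed": "2021-04-01", "start": "2021-04-10"},
]

points = find_doubtful_points(records, RULES)
message = build_notification(points, "auditor@example.com")
```

Expressing each rule as a named predicate keeps the business logic readable for the auditor while leaving the judgment itself fully automatic. The dates are ISO-formatted strings, so plain string comparison orders them correctly.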
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111682272.0A CN114492399A (en) | 2021-12-29 | 2021-12-29 | Contract information extraction system and method based on regular expression |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114492399A (en) | 2022-05-13 |
Family
ID=81509819
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111682272.0A (Pending) CN114492399A (en) | 2021-12-29 | 2021-12-29 | Contract information extraction system and method based on regular expression |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114492399A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101201836A (en) * | 2007-09-04 | 2008-06-18 | 浙江大学 | Method for matching in speedup regular expression based on finite automaton containing memorization determination |
US20120150887A1 (en) * | 2010-12-08 | 2012-06-14 | Clark Christopher F | Pattern matching |
CN102637180A (en) * | 2011-02-14 | 2012-08-15 | 汉王科技股份有限公司 | Character post processing method and device based on regular expression |
CN103259793A (en) * | 2013-05-02 | 2013-08-21 | 东北大学 | Method for inspecting deep packets based on suffix automaton regular engine structure |
CN111461668A (en) * | 2020-04-08 | 2020-07-28 | 国网天津市电力公司 | Digital auditing system and method based on process automation technology |
CN112771514A (en) * | 2019-09-30 | 2021-05-07 | 尤帕斯公司 | Document processing framework for robotic process automation |
CN113435218A (en) * | 2021-07-22 | 2021-09-24 | 贵州电网有限责任公司 | Regular expression-based speech translation text information extraction method |
Non-Patent Citations (1)
Title |
---|
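米丝鱼: "The complete process of generating a nondeterministic finite automaton from a regular expression and then converting it into a deterministic finite automaton", Retrieved from the Internet <URL:https://blog.csdn.net/qq1187239259/article/details/103921068> * |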
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7418440B2 (en) | Method and system for extraction and organizing selected data from sources on a network | |
CN100485603C (en) | Systems and methods for generating concept units from search queries | |
CN102073726B (en) | Structured data import method and device for search engine system | |
CN105095369A (en) | Website matching method and device | |
CN111898023A (en) | Message pushing method and device, readable storage medium and computing equipment | |
JP7222040B2 (en) | Model training, image processing method and device, storage medium, program product | |
CN110334343B (en) | Method and system for extracting personal privacy information in contract | |
CN110134845A (en) | Project public sentiment monitoring method, device, computer equipment and storage medium | |
Jiang et al. | Towards reengineering web sites to web-services providers | |
CN104598570A (en) | Resource fetching method and device | |
CN103399968B (en) | A kind of micro-blog information acquisition method and system | |
CN109636303B (en) | Storage method and system for semi-automatically extracting and structuring document information | |
CN110737432A (en) | script aided design method and device based on root list | |
CN1367446A (en) | Chinese personal biographical notes information treatment system and method | |
US10360208B2 (en) | Method and system of process reconstruction | |
CN112069305B (en) | Data screening method and device and electronic equipment | |
CN108549714A (en) | A kind of data processing method and device | |
CN100456291C (en) | Glossary shared system and method | |
CN113806647A (en) | Method for identifying development framework and related equipment | |
CN114492399A (en) | Contract information extraction system and method based on regular expression | |
CN115757995A (en) | Method and device for processing characteristic-free data label, computer equipment and storage medium | |
Lalis | Searching for Architectural Design Decisions in Open-Source Software Mailing Lists | |
CN103577560A (en) | Method and device for inputting data base operating instructions | |
WO2023185377A1 (en) | Multi-granularity data pattern mining method and related device | |
CN110633431B (en) | Web request correlation analysis method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||