CN114492399A - Contract information extraction system and method based on regular expression - Google Patents

Contract information extraction system and method based on regular expression

Info

Publication number
CN114492399A
Authority
CN
China
Prior art keywords
data
information extraction
module
regular expression
contract
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111682272.0A
Other languages
Chinese (zh)
Inventor
孙常鹏
戴斐斐
高静
赵猛
贾晓亮
李博
刘德玉
张耀心
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Tianjin Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Tianjin Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Tianjin Electric Power Co Ltd, and Information and Telecommunication Branch of State Grid Tianjin Electric Power Co Ltd
Priority to CN202111682272.0A
Publication of CN114492399A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a contract information extraction system and method based on a regular expression. The invention extracts key information by means of a regular-expression-based technique for converting unstructured data, stores the information as structured data, and screens the data according to inherent rules.

Description

Contract information extraction system and method based on regular expression
Technical Field
The invention belongs to the technical field of intelligent auditing, relates to an auditing information extraction system, and particularly relates to a contract information extraction system and method based on a regular expression.
Background
The traditional way of extracting information from files is simply to page through the files and record the information manually. This is error-prone and inefficient, and it cannot efficiently produce structured information data. The traditional method no longer meets current working requirements. With the continuous development of science and technology, big data, intelligent tools, and innovative audit modes and data analysis techniques are being promoted, and an effective information extraction system and method oriented to audit information has emerged accordingly.
Upon search, no prior art publications that are the same or similar to the present invention were found.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a contract information extraction system and method based on a regular expression.
The invention solves the practical problem by adopting the following technical scheme:
a contract information extraction system based on a regular expression comprises a task setting module, a data acquisition module, an information extraction module, a data storage module and a big data analysis module; the task setting module is used for presetting tasks and parameters, and its output end is connected with the data acquisition module; the data acquisition module is used for accurately acquiring target data through the process automation operation terminal according to the tasks and parameters preset by the task setting module, thereby providing a data source for the information extraction module, and its output end is connected with the information extraction module; the information extraction module is used for processing the data acquired by the data acquisition module, applying a regular expression matching algorithm to unstructured data to mine the key information required for auditing, a corresponding automaton being built from the regular expression to match character strings, and its output end is connected with the data storage module; the data storage module is used for storing the data of the data acquisition module and the information extraction module, and its output end is connected with the big data analysis module; and the big data analysis module is used for further analysis of the data held in the data storage module.
Moreover, the automaton used to match character strings is built from the regular expression as follows: the regular expression is first converted into a nondeterministic finite automaton (NFA), and the NFA is then converted into a deterministic finite automaton (DFA).
A contract information extraction method based on regular expressions comprises the following steps:
step 1, setting tasks and constructing an audit task list;
step 2, collecting target data through a process automation operation terminal according to the audit task list in the step 1;
step 3, extracting information from the target data acquired in step 2.
Further, the specific steps of step 1 include:
(1) designing a contract information auditing intermediate table according to the fields of the required data given by the auditors and the meaning of those fields;
(2) at the same time, setting a data acquisition path, simulating the auditors' working operations and pre-programming those operations;
(3) setting an audit task list according to the audit task.
Moreover, the specific method of the step 2 is as follows:
according to the acquisition path, the simulated operations, the data intermediate table and the audit task list defined during task setting, the process automation operation terminal collects contract information from the business system and downloads the unstructured contract files.
Further, the specific steps of step 3 include:
(1) reading the contract documents obtained in the data acquisition stage into text using the robot's reading technology;
(2) converting the unstructured data in the text that has been read by means of regular-expression-based information extraction: expressions matching the key information are built by combining regular expression syntax elements, automata are constructed from those expressions, and the key information in the text is mined;
furthermore, the regular expression syntax elements used in sub-step (2) of step 3 include: ordinary characters, character sets, match-count qualifiers, grouping expressions, selection expressions, and escape characters.
Furthermore, step 3 is followed by the following steps:
step 4, analyzing and processing the data extracted in step 3, and outputting audit doubts;
the specific method of step 4 is as follows:
first, the auditors analyze the collected data to find the logical relationships among the data, construct a fixed audit model through business logic conversion, and communicate the business logic to the program developers; second, the program developers translate the business logic into a computer language, so that audit doubts are judged and output automatically through logical operations.
Step 5, verifying the audit doubts output in step 4;
the specific method of step 5 is as follows:
the contract auditing robot automatically sends the audit doubts to the auditor's mailbox to assist in verifying them; once the auditor has verified and confirmed a doubt, the audit problem is locked directly and the process ends.
The invention has the advantages and beneficial effects that:
1. The invention adopts Robotic Process Automation (RPA) technology and regular expression technology. It uses an RPA robot as a virtual workforce and regular expressions as the algorithm for converting unstructured data, presets audit tasks, and carries out automated information extraction, data storage and data analysis. It can effectively streamline traditional office processes, improve working efficiency, indirectly optimize an enterprise's allocation of human resources, and support the enterprise's digital upgrade.
2. The invention uses RPA and regular expression technology to schedule the RPA robot's work tasks so that they run automatically at set times without manual triggering. The robot works around the clock, and the whole working process can form a closed loop. The regular expression algorithm extracts the required information from files accurately and efficiently, which helps the RPA robot analyze the key information in the files. The traditional method, by contrast, relies on a large amount of manual labor to read through files, extract key information by hand, paste or write it out by hand, and arrange it into normalized information for use in work. The invention can replace highly repetitive, low-complexity manual operations. Tasks are preset by unit, time, range and so on; the required data are collected automatically from the system according to the preset automated process and files are downloaded in batches; key information is extracted by the regular-expression-based unstructured data conversion technique, stored as structured data, and screened according to inherent rules. The invention as a whole can be called an information extraction robot, and its automated process produces an intuitive structured data result that lets staff review files quickly.
Drawings
FIG. 1 is a system configuration diagram of the present invention;
FIG. 2 is a process flow diagram of a data acquisition module of the present invention;
FIG. 3(a) is a schematic diagram of the nondeterministic automaton for A|B in the data extraction module of the present invention;
FIG. 3(b) is a schematic diagram of the nondeterministic automaton for A* in the data extraction module of the present invention;
FIG. 3(c) is a schematic diagram of the nondeterministic automaton for the regular expression (A|B)*ABB in the data extraction module of the present invention;
FIG. 3(d) is a schematic diagram of the deterministic automaton for the regular expression (A|B)*ABB in the data extraction module of the present invention;
FIG. 4 is a process flow diagram of the present invention.
Detailed Description
The embodiments of the invention will be described in further detail below with reference to the accompanying drawings:
A contract information extraction system based on regular expressions, as shown in FIG. 1, comprises a task setting module, a data acquisition module, an information extraction module, a data storage module and a big data analysis module; the task setting module is used for presetting tasks and parameters, and its output end is connected with the data acquisition module; the data acquisition module is used for accurately acquiring target data through the process automation operation terminal according to the tasks and parameters preset by the task setting module, thereby providing a data source for the information extraction module, and its output end is connected with the information extraction module; the information extraction module is used for processing the data acquired by the data acquisition module, applying a regular expression matching algorithm to unstructured data to mine the key information required for auditing, a corresponding automaton being built from the regular expression to match character strings, and its output end is connected with the data storage module; the data storage module is used for storing the data of the data acquisition module and the information extraction module, and its output end is connected with the big data analysis module; and the big data analysis module is used for further analysis of the data held in the data storage module.
In this embodiment, the automaton used to match character strings is built from the regular expression as follows: the regular expression is first converted into a nondeterministic finite automaton (NFA), and the NFA is then converted into a deterministic finite automaton (DFA).
The composition and operation of the various modules within the system are further described below:
1. The task setting module presets the work plan entered by staff and is an important point of interaction between staff and the robot. The process automation operation terminal reads the work plan parameters and obtains the preset login website, login account, password, unit, time and range information, which the data acquisition module uses to set query conditions during acquisition.
2. As shown in fig. 2, the data collection module accesses the web page by using the RPA and identifies the interface HTML program code according to the preset task and parameter of the task setting module, so as to realize accurate collection of the target data. And the process automation operation terminal and the target web page carry out full duplex communication through a WebSocket protocol to realize synchronous data interaction. The data acquisition module can acquire service data from the service system and internet data according to working requirements, download related files, store the data into a data warehouse by using the data storage module and provide a data source for the information extraction module.
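The patent does not say how the RPA terminal identifies the page's HTML, so the following is only a minimal sketch of one part of that idea: picking out links to downloadable contract files from a page. The class name `ContractLinkParser`, the file extensions and the sample page are all assumptions made for illustration; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class ContractLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at downloadable files.

    Hypothetical helper: the extensions below are assumptions, not taken
    from the patent.
    """

    def __init__(self, extensions=(".doc", ".docx", ".pdf")):
        super().__init__()
        self.extensions = extensions
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # str.endswith accepts a tuple of suffixes
        if href and href.lower().endswith(self.extensions):
            self.links.append(href)


def extract_contract_links(html):
    """Return the file links found in an HTML string, in page order."""
    parser = ContractLinkParser()
    parser.feed(html)
    return parser.links


# Made-up page fragment standing in for a business-system result list.
SAMPLE_PAGE = """
<table>
  <tr><td>Contract A</td><td><a href="/files/contract_a.doc">text</a></td></tr>
  <tr><td>Contract B</td><td><a href="/files/contract_b.PDF">text</a></td></tr>
  <tr><td>Help</td><td><a href="/help.html">help</a></td></tr>
</table>
"""

links = extract_contract_links(SAMPLE_PAGE)
print(links)
```

A real RPA flow would also log in, set the query conditions from the task parameters and download each file; those steps depend on the target system and are left out here.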
3. The information extraction module processes the data acquired by the data acquisition module. It applies a regular expression matching algorithm to unstructured data to mine the key information required for auditing, building a corresponding automaton from the regular expression to match character strings;
the automaton is generally built as follows: the regular expression is first converted into a nondeterministic finite automaton (NFA), and the NFA is then converted into a deterministic finite automaton (DFA).
The nondeterministic finite automaton (NFA) is defined as a quintuple M = (K, Σ, f, S, Z), where:
(1) K is a finite set, each element of which is called a state;
(2) Σ is a finite alphabet, each element of which is called an input symbol, so Σ is also called the input symbol table;
(3) f is a mapping from K × Σ* to the subsets of K, where Σ* denotes the set of strings over the alphabet Σ;
(4) S ⊆ K is a non-empty set of initial states;
(5) Z ⊆ K is the set of final states.
The deterministic finite automaton (DFA) is defined as a quintuple M = (K, Σ, f, S, Z), where:
(1) K is a finite set, each element of which is called a state;
(2) Σ is a finite alphabet, each element of which is called an input symbol;
(3) f is the transition function, a mapping from K × Σ to K;
(4) S ∈ K is the unique initial state;
(5) Z ⊆ K is the set of final states.
Both deterministic and nondeterministic automata can be represented by graphs or matrices, as shown in FIGS. 3(a) to 3(d). In the graph representation, the nodes represent states and the character on each edge represents a transition from one state to another. By definition the two kinds of automaton differ mainly as follows: a deterministic automaton has a unique initial state and a set of final states, whereas a nondeterministic automaton has a set of initial states and a set of final states; and a state of a nondeterministic automaton may move to one or more states on a given character, whereas a state of a deterministic automaton moves to exactly one determined state on a given character.
Take the regular expression Q = (A|B)*ABB as an example. A|B is represented by the nondeterministic automaton shown in FIG. 3(a), where $ denotes the empty string, the start node is node 1 and the end node is node 6. A* is represented by the nondeterministic automaton shown in FIG. 3(b), where the start node is node 1 and the end node is node 4; the nondeterminism shows mainly in that node 1 can reach both node 2 and node 4 via $, or equivalently that nodes 1, 2 and 4 are all start nodes. The nondeterministic automaton for Q is shown in FIG. 3(c), where node 1 is the start node and node 7 is the end node. This nondeterministic automaton can be converted into a deterministic automaton; the result is shown in FIG. 3(d), where node 1 is the start node and node 5 is the end node. The conversion mainly uses the subset construction algorithm, whose main idea is that each state of the deterministic automaton corresponds to a set of states of the nondeterministic automaton, i.e. each state of the deterministic automaton records all the states the nondeterministic automaton may reach after reading an input character.
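The subset construction can be sketched in a few lines of Python. The NFA below is a textbook Thompson-style automaton for (A|B)*ABB with states numbered 0 to 10. That numbering is an assumption made for this sketch and does not reproduce the node numbers of FIG. 3(c), which is not available in this text. `$` stands for the empty string, as in the description above.

```python
from collections import deque

EPS = "$"  # empty-string label, following the notation in the description

# Assumed NFA for (A|B)*ABB: (state, symbol) -> set of next states.
NFA = {
    (0, EPS): {1, 7},
    (1, EPS): {2, 4},
    (2, "A"): {3},
    (4, "B"): {5},
    (3, EPS): {6},
    (5, EPS): {6},
    (6, EPS): {1, 7},
    (7, "A"): {8},
    (8, "B"): {9},
    (9, "B"): {10},
}
NFA_START, NFA_FINAL, ALPHABET = 0, {10}, ("A", "B")


def eps_closure(states):
    """All NFA states reachable from `states` using only $ edges."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    """NFA states reachable from `states` by reading one `symbol`."""
    out = set()
    for s in states:
        out |= NFA.get((s, symbol), set())
    return out


def subset_construction():
    """Build a DFA whose states are sets of NFA states."""
    start = eps_closure({NFA_START})
    dfa, seen, queue = {}, {start}, deque([start])
    while queue:
        current = queue.popleft()
        for symbol in ALPHABET:
            target = eps_closure(move(current, symbol))
            if not target:
                continue
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A DFA state is final when it contains a final NFA state.
    finals = {s for s in seen if s & NFA_FINAL}
    return start, dfa, finals, seen


DFA_START, DFA, DFA_FINALS, DFA_STATES = subset_construction()


def dfa_accepts(text):
    """Run the DFA over `text`; exactly one transition per character."""
    state = DFA_START
    for ch in text:
        state = DFA.get((state, ch))
        if state is None:
            return False
    return state in DFA_FINALS


print(len(DFA_STATES), dfa_accepts("BABABB"), dfa_accepts("AB"))
```

For this particular NFA the construction yields five DFA states with a single accepting state, which is at least consistent with the description of FIG. 3(d) having node 5 as its end node.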
When the deterministic automaton is used to match a text string, the string matches the regular expression if, starting from the start node and matching each character in turn against the edges of the automaton, it reaches the end node. A more efficient approach currently in use is regular expression matching based on passive factors: the passive factors divide the text string into short substrings, and each short substring is checked for the presence of a prefix and a suffix. A substring that lacks the prefix or the suffix is filtered out directly. When both a prefix and a suffix are present in a short substring, match verification is performed in the deterministic automaton starting from each prefix position. In this way all start and end positions in the text that match a given regular expression can be found exactly.
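The text gives few details of the factor-based method, so the following is a loose sketch of the filtering idea only, under stated assumptions: the text is split into lines, the prefix and suffix are plain literal strings supplied by the caller, and verification uses Python's `re` module (a backtracking engine, not the deterministic automaton of the patent). The function name `find_matches` and the sample text are made up.

```python
import re


def find_matches(text, pattern, prefix, suffix):
    """Return (line_no, start, end, matched_text) for each verified match.

    Segments (here: lines) that lack the literal prefix or suffix are
    skipped without running the regular expression at all.
    """
    regex = re.compile(pattern)
    results = []
    for line_no, segment in enumerate(text.splitlines(), start=1):
        # Cheap literal filter before the more expensive verification.
        if prefix not in segment or suffix not in segment:
            continue
        pos = segment.find(prefix)
        while pos != -1:
            # Verification anchored at this prefix position.
            m = regex.match(segment, pos)
            if m:
                results.append((line_no, m.start(), m.end(), m.group()))
            pos = segment.find(prefix, pos + 1)
    return results


# Made-up sample text; the pattern is hypothetical, not from the patent.
TEXT = (
    "Notice of award\n"
    "Winning bidder: Example Power Co., Ltd.\n"
    "Bidder list attached\n"
)
hits = find_matches(
    TEXT,
    r"Winning bidder: [\w ]+Co\., Ltd\.",
    prefix="Winning bidder:",
    suffix="Ltd.",
)
print(hits)
```

Only the second line survives the literal filter, so the regular expression runs once; on large document sets most segments would be discarded this way before any matching takes place.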
4. The data storage module stores the data of the data acquisition module and the information extraction module and is implemented as a data warehouse. The subject domains of the data warehouse are first determined according to the enterprise's actual business fields, and the analysis subjects within each subject domain are determined according to the model, for example analyzing the compliance of company staff during their term of employment within a given period, the implementation of the company's important policies within a given period, or the contracts and bid-winning documents signed in a given month of a given year. Once the subjects are clear, the measures, data granularity and dimensions of the analysis are determined. For example, if the company wishes to analyze the implementation of important policies by time, unit and file type, then time, unit and file type are the corresponding dimensions. Determining the dimensions and the original data establishes the basis for analyzing each subject's data and identifies the key objects of data maintenance work.
5. The big data analysis module performs further analysis of the data in the data storage module. It adopts an OLAP service, which supports complex analysis operations and provides intuitive, easy-to-understand query functions. For each subject, staff analyze the data from different business angles and obtain intuitive results by applying analysis operations such as rotation (pivoting), slicing and drilling to the data in the database.
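To make the dimension and OLAP vocabulary concrete, here is a small in-memory sketch. It is not an OLAP service: the fact rows, the dimension names `year`, `unit` and `file_type`, the `amount` measure and the three helper functions are all assumptions chosen to mirror the time, unit and file-type dimensions mentioned above.

```python
from collections import defaultdict

# Made-up fact rows: three dimensions and one measure.
FACTS = [
    {"year": 2021, "unit": "Unit A", "file_type": "contract", "amount": 120.0},
    {"year": 2021, "unit": "Unit B", "file_type": "contract", "amount": 80.0},
    {"year": 2022, "unit": "Unit A", "file_type": "contract", "amount": 50.0},
    {"year": 2022, "unit": "Unit A", "file_type": "supplementary agreement", "amount": 10.0},
]


def slice_facts(facts, **fixed):
    """Slicing: keep only rows whose dimensions equal the fixed values."""
    return [r for r in facts if all(r[k] == v for k, v in fixed.items())]


def roll_up(facts, dimension, measure="amount"):
    """Aggregate the measure along a single dimension."""
    totals = defaultdict(float)
    for r in facts:
        totals[r[dimension]] += r[measure]
    return dict(totals)


def pivot(facts, row_dim, col_dim, measure="amount"):
    """Rotation: lay the measure out as rows x columns of two dimensions."""
    table = defaultdict(lambda: defaultdict(float))
    for r in facts:
        table[r[row_dim]][r[col_dim]] += r[measure]
    return {row: dict(cols) for row, cols in table.items()}


by_unit = roll_up(FACTS, "unit")
by_unit_2021 = roll_up(slice_facts(FACTS, year=2021), "unit")
year_by_unit = pivot(FACTS, "year", "unit")
print(by_unit, by_unit_2021, year_by_unit)
```

Drilling down corresponds to rolling up along a finer dimension, for example `file_type` within one unit, after slicing on that unit.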
A contract information extraction method based on regular expressions, as shown in FIG. 4, includes the following steps:
step 1, setting tasks and constructing an audit task list;
the specific steps of the step 1 comprise:
(1) designing a contract information auditing intermediate table according to the fields of the required data given by the auditors and the meaning of those fields;
in this embodiment, the contract information auditing intermediate table mainly includes fields such as "contract name, contract signing unit, contract undertaking unit, contract amount, contract signing date, bid winning date, purchasing mode, contract text link, bid winning notice link, and supplementary agreement link". Both structured and unstructured data are covered in the intermediate table.
(2) at the same time, setting a data acquisition path, simulating the auditors' working operations and pre-programming those operations;
(3) setting an audit task list according to the audit task.
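The intermediate table and the audit task list of step 1 could be modeled as simple records. In the sketch below the field list follows the fields named in this embodiment, while the Python attribute names, the types, and the `AuditTask` fields (drawn from the unit, time and range parameters of the task setting module) are assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ContractAuditRecord:
    """One row of the contract information auditing intermediate table."""
    contract_name: str
    signing_unit: str
    undertaking_unit: str
    contract_amount: Optional[float] = None
    signing_date: Optional[date] = None
    bid_winning_date: Optional[date] = None
    purchasing_mode: Optional[str] = None
    # Links point at the unstructured files downloaded in step 2.
    contract_text_link: Optional[str] = None
    bid_winning_notice_link: Optional[str] = None
    supplementary_agreement_link: Optional[str] = None


@dataclass
class AuditTask:
    """One entry of the audit task list (hypothetical shape)."""
    unit: str
    start: date
    end: date
    fields_to_extract: List[str] = field(default_factory=list)


task = AuditTask(
    unit="Unit A",
    start=date(2021, 1, 1),
    end=date(2021, 12, 31),
    fields_to_extract=["contract_amount", "signing_date"],
)
record = ContractAuditRecord(
    contract_name="Example supply contract",
    signing_unit="Unit A",
    undertaking_unit="Example Power Co., Ltd.",
)
print(task, record)
```

Fields left as `None` are the ones the extraction step is expected to fill in from the downloaded documents.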
Step 2, collecting target data through a process automation operation terminal according to the audit task list in the step 1;
the specific method of the step 2 comprises the following steps:
according to the acquisition path, the simulated operations, the data intermediate table and the audit task list defined during task setting, the process automation operation terminal collects contract information from the business system and downloads the unstructured contract files.
Step 3, extracting information of the target data acquired in the step 2;
the specific steps of the step 3 comprise:
(1) reading the contract documents obtained in the data acquisition stage into text using the robot's reading technology;
(2) converting the unstructured data in the text that has been read by means of regular-expression-based information extraction: expressions matching the key information are built by combining regular expression syntax elements, automata are constructed from those expressions, and the key information in the text is mined;
in this embodiment, the regular expression has the following six kinds of syntax elements:
(1) common characters
Letters, numbers, Chinese characters, underlines, and punctuation marks without defined special meanings are all "common characters" which, when matched, match one character the same as it.
(2) Character set
Multiple characters are enclosed in square brackets [ ], and any one of the enclosed characters can be matched. Only one character is matched at a time.
[m-n]: e.g. [1-5] indicates that the character to be matched must be in the range 1 to 5;
[n1n2n3]: e.g. [135] indicates that the character to be matched is 1, 3 or 5.
(3) Matching times qualifier
The number of repetitions is enclosed in curly brackets { }, so that the modified expression can be matched repeatedly.
{n}: the expression is repeated exactly n times, e.g. A{2} requires a match of 2 consecutive letters A;
{m,n}: the expression is repeated at least m times and at most n times;
{m,}: the expression is repeated at least m times, with no upper limit on the number of repetitions.
(4) Grouping expressions
Other expressions are contained in parentheses () so that the contained expressions form a whole and can be decorated as a whole when decorated for the number of matches.
(5) Selecting an expression
The vertical line "|" separates multiple expressions, and the expressions on its left and right are in an "or" relationship. For example, the expression 010|021 matches only 010 or 021.
(6) Escape character
?: the modified expression matches 0 or 1 times; for example, to match the months 1 to 12, the regular expression can be set as 0?[1-9]|1[0-2];
+: the modified expression matches at least 1 time;
*: the modified expression matches 0 or any number of times;
the three symbols above have special meanings, so a preceding "\" is required to escape them before the character itself can be matched.
By combining the six kinds of syntax elements above, matching rules can be formulated for strings in many formats, such as numbers, characters, dates and amounts, as well as more complicated ones such as email addresses, telephone numbers and Internet URLs.
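The six kinds of syntax elements can be tried out with Python's standard `re` module, whose syntax agrees with the description for everything shown here. The patterns `[1-5]`, `[135]`, `A{2}`, `010|021` and `0?[1-9]|1[0-2]` come from the text above; the remaining patterns and all test strings are illustrative assumptions.

```python
import re


def matches(pattern, text):
    """True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


checks = [
    # (syntax element, pattern, text, expected result)
    ("ordinary characters", r"contract_2021", "contract_2021", True),
    ("character set, range", r"[1-5]", "3", True),
    ("character set, range", r"[1-5]", "7", False),
    ("character set, list", r"[135]", "5", True),
    ("character set, list", r"[135]", "13", False),  # a set matches one character
    ("qualifier {n}", r"A{2}", "AA", True),
    ("qualifier {m,n}", r"A{1,3}", "AAAA", False),
    ("qualifier {m,}", r"A{2,}", "AAAAA", True),
    ("grouping", r"(AB){2}", "ABAB", True),
    ("selection", r"010|021", "021", True),
    ("selection", r"010|021", "011", False),
    ("month 1-12", r"0?[1-9]|1[0-2]", "07", True),
    ("month 1-12", r"0?[1-9]|1[0-2]", "12", True),
    ("month 1-12", r"0?[1-9]|1[0-2]", "13", False),
    ("escape", r"1\+1\?", "1+1?", True),  # \+ and \? match the literal characters
]

for name, pattern, text, expected in checks:
    print(f"{name:22} {pattern!r:22} {text!r:18} {matches(pattern, text)}")
```

`re.fullmatch` is used so that each pattern must account for the entire test string; with `re.search` the month pattern would also find the `1` inside `13`.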
File content is extracted according to the audit task list preset by the auditors: expressions matching the key information are built by combining regular expression syntax elements, the key information is extracted, and the robot stores it in the audit contract information data table.
For example, to extract the winning bidder from a bid-winning notice, the robot locates the notice with an isamatch (material name, "# notice. # doc") method, reads the file text with the robot plug-in Read Text, and applies a regular expression "(? to search for all matching items. Successful matches, i.e. the winning bidder named in the bid-winning notice, are assigned to the data field and stored. The operation is looped N times so that all files are read and the key information in all of them is extracted and listed, forming structured audit material with which auditors can carry out full-coverage audit work.
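The file-matching call and the regular expression in the example above are garbled or cut off in this text, so they cannot be reproduced. The sketch below only shows the general shape of such an extraction loop in Python. The file name pattern `*notice*.doc`, the look-behind pattern in `WINNER_RE` and the sample documents are hypothetical stand-ins, and the documents are assumed to have been read into text already.

```python
import re
from fnmatch import fnmatch

# Hypothetical pattern: capture whatever follows a fixed label on a line.
WINNER_RE = re.compile(r"(?<=Winning bidder: )[^\n]+")


def extract_winners(documents):
    """Build structured rows from {file name: text} pairs.

    Only files whose names look like bid-winning notices are scanned.
    """
    rows = []
    for name, text in documents.items():
        if not fnmatch(name, "*notice*.doc"):
            continue
        for match in WINNER_RE.finditer(text):
            rows.append({"file": name, "winning_unit": match.group().strip()})
    return rows


# Made-up documents standing in for the text read by the robot.
DOCUMENTS = {
    "2021_bid_notice_01.doc": "Project X\nWinning bidder: Example Power Co., Ltd.\n",
    "2021_bid_notice_02.doc": "Project Y\nWinning bidder: Sample Grid Services Ltd.\n",
    "2021_contract_01.doc": "Winning bidder: should be ignored\n",
}

rows = extract_winners(DOCUMENTS)
for row in rows:
    print(row)
```

Each row corresponds to one value assigned to a data field; writing the rows to the audit contract information data table is left out.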
Step 4, analyzing and processing the data extracted in the step 3, and outputting audit doubtful points;
the specific method of the step 4 comprises the following steps:
first, the auditors analyze the collected data to find the logical relationships among the data, construct a fixed audit model through business logic conversion, and communicate the business logic to the program developers; second, the program developers translate the business logic into a computer language, so that audit doubts are judged and output automatically through logical operations.
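As an illustration of business logic translated into code, the sketch below flags contracts whose signing date precedes the bid-winning date. The patent does not state any specific audit rule, so this rule, the record layout and the function name `find_doubts` are assumptions; the two date fields do appear in the intermediate table described earlier.

```python
from datetime import date


def find_doubts(records):
    """Return a description for every record that violates the rule.

    Hypothetical rule: a contract should not be signed before the bid
    was won. Records missing either date are skipped, not flagged.
    """
    doubts = []
    for r in records:
        signed = r.get("signing_date")
        awarded = r.get("bid_winning_date")
        if signed and awarded and signed < awarded:
            doubts.append(
                f"{r['contract_name']}: signed {signed.isoformat()} "
                f"before bid won {awarded.isoformat()}"
            )
    return doubts


# Made-up records in the shape of the intermediate table.
RECORDS = [
    {"contract_name": "Contract A", "signing_date": date(2021, 3, 1),
     "bid_winning_date": date(2021, 2, 15)},
    {"contract_name": "Contract B", "signing_date": date(2021, 1, 10),
     "bid_winning_date": date(2021, 2, 1)},
    {"contract_name": "Contract C", "signing_date": None,
     "bid_winning_date": date(2021, 5, 1)},
]

doubts = find_doubts(RECORDS)
print(doubts)
```

Further rules would be added as separate checks, each corresponding to one piece of business logic agreed between auditors and developers.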
Step 5, verifying the doubtful points output in the step 4;
the specific method of the step 5 comprises the following steps:
the contract auditing robot automatically sends the audit doubts to the auditor's mailbox to assist in verifying them; once the auditor has verified and confirmed a doubt, the audit problem is locked directly and the process ends.
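The notification in step 5 could be composed with Python's standard `email.message.EmailMessage`. The sketch below only builds the message; the addresses, the subject wording and the function name are assumptions, and actually sending it (for example through `smtplib`) depends on the enterprise mail system and is omitted.

```python
from email.message import EmailMessage


def build_doubt_email(auditor_address, doubts, sender="audit-robot@example.com"):
    """Compose, but do not send, a message listing the audit doubts.

    The sender address is a placeholder, not a real mailbox.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = auditor_address
    msg["Subject"] = f"Audit doubts for verification ({len(doubts)})"
    body = "\n".join(f"{i}. {d}" for i, d in enumerate(doubts, start=1))
    msg.set_content(body or "No audit doubts were raised.")
    return msg


message = build_doubt_email(
    "auditor@example.com",
    ["Contract B: signed before bid won", "Contract D: amount missing"],
)
print(message)
```

After the auditor confirms a doubt, the corresponding record would be marked as a locked audit problem; how that status is stored is not specified in the text.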
After the model has been built successfully, the auditing robot can be started to carry out the acquisition, processing, analysis and output of the data relevant to the model.
Contract management is an important field of internal control and compliance management, and the regular-expression-based information extraction method plays a key role in building contract audit models and in managing the contracts of enterprise business. Contract information is also basic data frequently used by professional audit groups such as engineering and finance. Mining the value of contract data by digital auditing means, locating problems in contract management accurately and efficiently, and providing high-quality information support services to each professional audit group have therefore become essential requirements of audit work.
The system and method have the following advantages. With the regular-expression-based contract information extraction system and method, the computer gains the ability to read text and helps staff process massive amounts of text data automatically, so that staff can quickly handle complex work such as review, search and proofreading. Risk terms in contract files can be monitored effectively, saving labor and time costs. Enterprise bidding documents, internal document data and other long documents can be analyzed effectively and valuable information extracted from large amounts of text data, improving the efficiency of text processing and the depth of text mining.
It should be emphasized that the examples described herein are illustrative and not restrictive, and thus the present invention includes, but is not limited to, those examples described in this detailed description, as well as other embodiments that can be derived from the teachings of the present invention by those skilled in the art and that are within the scope of the present invention.

Claims (8)

1. A contract information extraction system based on regular expressions, characterized in that: the system comprises a task setting module, a data acquisition module, an information extraction module, a data storage module and a big data analysis module; the task setting module is used for presetting tasks and parameters, and its output end is connected with the data acquisition module; the data acquisition module is used for accurately acquiring target data through the process automation operation terminal according to the tasks and parameters preset by the task setting module, thereby providing a data source for the information extraction module, and its output end is connected with the information extraction module; the information extraction module is used for processing the data acquired by the data acquisition module, applying a regular expression matching algorithm to unstructured data to mine the key information required for auditing, a corresponding automaton being built from the regular expression to match character strings, and its output end is connected with the data storage module; the data storage module is used for storing the data of the data acquisition module and the information extraction module, and its output end is connected with the big data analysis module; and the big data analysis module is used for further analysis of the data held in the data storage module.
2. The regular-expression-based contract information extraction system according to claim 1, characterized in that: the automaton used to match character strings is built from the regular expression as follows: the regular expression is first converted into a nondeterministic finite automaton (NFA), and the NFA is then converted into a deterministic finite automaton (DFA).
3. A contract information extraction method based on regular expressions is characterized in that: the method comprises the following steps:
step 1, setting tasks and constructing an audit task list;
step 2, collecting target data through a process automation operation terminal according to the audit task list in the step 1;
step 3, extracting information from the target data acquired in step 2.
4. The regular expression-based contract information extraction method according to claim 3, characterized in that: the specific steps of the step 1 comprise:
(1) designing a contract information auditing intermediate table according to the fields of the required data given by the auditors and the meanings of those fields;
(2) setting a data acquisition path, and pre-programming operations that simulate the working operations of the auditors;
(3) setting an audit task list according to the audit task.
5. The regular expression-based contract information extraction method according to claim 3, characterized in that: the specific method of the step 2 comprises the following steps:
according to the acquisition path, the simulated operations, the data intermediate table and the audit task list set in step 1, the process automation operation terminal collects contract information in the business system and downloads the unstructured contract files.
6. The regular-expression-based contract information extraction method according to claim 3, characterized in that: the specific steps of the step 3 comprise:
(1) reading the contract documents acquired in the data acquisition stage into text information by using the document reading technology of the robot;
(2) according to the read text information, converting the unstructured data by using an information extraction technique based on regular expressions: constructing an automaton from expressions that combine regular expression syntax elements so as to match the key information, and mining the key information of the text.
7. The regular expression-based contract information extraction method according to claim 6, wherein: the syntax elements of the regular expression in sub-step (2) of step 3 comprise: ordinary characters, character sets, quantifiers (match-count qualifiers), grouping expressions, selection (alternation) expressions, and escape characters.
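The extraction of claims 6 and 7 can be sketched as follows, assuming the contract has already been read into plain text. The field labels, contract number format and currency codes are hypothetical, since the claims do not specify any concrete patterns. The sketch uses Python's standard `re` module, which is a backtracking engine rather than the DFA-based matcher recited in claim 2, so it illustrates only how the listed syntax elements combine into field patterns.

```python
import re

# Hypothetical field labels and formats; real contracts would need their own patterns.
FIELD_PATTERNS = {
    # ordinary characters, an escaped dot, character sets, and quantifiers
    "contract_no": re.compile(r"Contract No\.\s*[:：]\s*([A-Z]{2,4}-\d{4}-\d{3,6})"),
    # selection expressions (A|B) in non-capturing groups, then a captured amount
    "amount": re.compile(
        r"(?:Total Amount|Contract Price)\s*[:：]\s*(?:RMB|CNY)\s*(\d[\d,]*(?:\.\d{1,2})?)"
    ),
    # a grouping expression that captures a YYYY-MM-DD date
    "signing_date": re.compile(r"Signing Date\s*[:：]\s*(\d{4}-\d{2}-\d{2})"),
}


def extract_fields(text):
    """Return one structured record; fields whose pattern does not match are None."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(1) if match else None
    if record["amount"] is not None:
        # Strip thousands separators so the amount can be stored as a number.
        record["amount"] = float(record["amount"].replace(",", ""))
    return record
```

Each key of the returned record would correspond to one column of the contract information auditing intermediate table designed in step 1, so that the unstructured contract text becomes a structured row.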
8. The regular expression-based contract information extraction method according to claim 3, characterized in that: the step 3 is followed by the following steps:
step 4, analyzing and processing the data extracted in step 3, and outputting audit doubtful points;
the specific method of step 4 comprises:
first, an auditor analyzes the collected data to find the logical relationships among the data, converts them into business logic so as to construct a fixed audit model, and communicates the business logic to a program developer; second, the program developer converts the business logic into a computer language, so that audit doubtful points are automatically judged and output through logical operations;
step 5, verifying the doubtful points output in step 4;
the specific method of step 5 comprises:
the contract auditing robot automatically sends the audit doubtful points to the mailbox of the auditor to assist in verifying them; after the auditor verifies and confirms the doubtful points, the audit problem is locked directly and the process ends.
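The conversion of business logic into program logic described in step 4 can be sketched as a small rule function. The three rules below, the amount limit and the record field names (including `start_date`) are hypothetical illustrations; the claims do not disclose the actual audit model.

```python
from datetime import date


def audit_contract(record, amount_limit=1_000_000.0):
    """Return the list of audit doubtful points for one extracted contract record."""
    doubts = []
    # Rule 1 (hypothetical): every key field must have been extracted.
    for field in ("contract_no", "amount", "signing_date", "start_date"):
        if record.get(field) is None:
            doubts.append(f"missing field: {field}")
    # Rule 2 (hypothetical): amounts above the limit need manual review.
    amount = record.get("amount")
    if amount is not None and amount > amount_limit:
        doubts.append(f"amount {amount:.2f} exceeds limit {amount_limit:.2f}")
    # Rule 3 (hypothetical): a contract should be signed before performance starts.
    signed, started = record.get("signing_date"), record.get("start_date")
    if signed is not None and started is not None and signed > started:
        doubts.append("contract signed after performance started")
    return doubts
```

An empty list means that no doubtful point was found. A non-empty list is what the robot of step 5 would send to the auditor's mailbox for verification.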
CN202111682272.0A 2021-12-29 2021-12-29 Contract information extraction system and method based on regular expression Pending CN114492399A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111682272.0A CN114492399A (en) 2021-12-29 2021-12-29 Contract information extraction system and method based on regular expression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111682272.0A CN114492399A (en) 2021-12-29 2021-12-29 Contract information extraction system and method based on regular expression

Publications (1)

Publication Number Publication Date
CN114492399A true CN114492399A (en) 2022-05-13

Family

ID=81509819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111682272.0A Pending CN114492399A (en) 2021-12-29 2021-12-29 Contract information extraction system and method based on regular expression

Country Status (1)

Country Link
CN (1) CN114492399A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101201836A (en) * 2007-09-04 2008-06-18 浙江大学 Method for matching in speedup regular expression based on finite automaton containing memorization determination
US20120150887A1 (en) * 2010-12-08 2012-06-14 Clark Christopher F Pattern matching
CN102637180A (en) * 2011-02-14 2012-08-15 汉王科技股份有限公司 Character post processing method and device based on regular expression
CN103259793A (en) * 2013-05-02 2013-08-21 东北大学 Method for inspecting deep packets based on suffix automaton regular engine structure
CN111461668A (en) * 2020-04-08 2020-07-28 国网天津市电力公司 Digital auditing system and method based on process automation technology
CN112771514A (en) * 2019-09-30 2021-05-07 尤帕斯公司 Document processing framework for robotic process automation
CN113435218A (en) * 2021-07-22 2021-09-24 贵州电网有限责任公司 Regular expression-based speech translation text information extraction method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
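米丝鱼: "The whole process of generating a nondeterministic finite automaton from a regular expression and then converting it into a deterministic finite automaton", Retrieved from the Internet <URL:https://blog.csdn.net/qq1187239259/article/details/103921068> *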

Similar Documents

Publication Publication Date Title
US7418440B2 (en) Method and system for extraction and organizing selected data from sources on a network
CN100485603C (en) Systems and methods for generating concept units from search queries
CN102073726B (en) Structured data import method and device for search engine system
CN105095369A (en) Website matching method and device
CN111898023A (en) Message pushing method and device, readable storage medium and computing equipment
JP7222040B2 (en) Model training, image processing method and device, storage medium, program product
CN110334343B (en) Method and system for extracting personal privacy information in contract
CN110134845A (en) Project public sentiment monitoring method, device, computer equipment and storage medium
Jiang et al. Towards reengineering web sites to web-services providers
CN104598570A (en) Resource fetching method and device
CN103399968B (en) A kind of micro-blog information acquisition method and system
CN109636303B (en) Storage method and system for semi-automatically extracting and structuring document information
CN110737432A (en) script aided design method and device based on root list
CN1367446A (en) Chinese personal biographical notes information treatment system and method
US10360208B2 (en) Method and system of process reconstruction
CN112069305B (en) Data screening method and device and electronic equipment
CN108549714A (en) A kind of data processing method and device
CN100456291C (en) Glossary shared system and method
CN113806647A (en) Method for identifying development framework and related equipment
CN114492399A (en) Contract information extraction system and method based on regular expression
CN115757995A (en) Method and device for processing characteristic-free data label, computer equipment and storage medium
Lalis Searching for Architectural Design Decisions in Open-Source Software Mailing Lists
CN103577560A (en) Method and device for inputting data base operating instructions
WO2023185377A1 (en) Multi-granularity data pattern mining method and related device
CN110633431B (en) Web request correlation analysis method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination