CN113887205A - Automatic document examination method and device and storage medium

Info

Publication number
CN113887205A
CN113887205A CN202111132732.2A CN202111132732A CN113887205A CN 113887205 A CN113887205 A CN 113887205A CN 202111132732 A CN202111132732 A CN 202111132732A CN 113887205 A CN113887205 A CN 113887205A
Authority
CN
China
Prior art keywords
document
strategy
text
examination
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111132732.2A
Other languages
Chinese (zh)
Inventor
万振华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Seczone Technology Co Ltd
Original Assignee
Seczone Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-09-27
Filing date
2021-09-27
Publication date
2022-01-04
Application filed by Seczone Technology Co Ltd
Priority to CN202111132732.2A
Publication of CN113887205A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a method, a device and a storage medium for automatically reviewing a document, and belongs to the technical field of document processing. The method comprises the following steps: acquiring a document title; selecting an inspection policy corresponding to the document title; and checking the document based on the inspection policy to generate an inspection outcome. The method and the device improve the efficiency of document review, reduce the manpower consumed in the review process, and improve review accuracy.

Description

Automatic document examination method and device and storage medium
Technical Field
The present application relates to the field of document processing technologies, and in particular, to a method and an apparatus for automatically reviewing a document, and a storage medium.
Background
Documents are an essential means of enterprise management and work recording. Electronic documents in particular have, with the development of electronic technology, brought great convenience to enterprises.
After a document is produced, it generally needs to be reviewed to ensure that its content is accurate. At present, most reviews are manual: a reviewer checks the document content item by item.
Regarding this related art, the inventor considers that manual review is inefficient and labor-intensive, and that its accuracy is low because it is easily affected by the reviewer's subjective judgment.
Disclosure of Invention
In order to improve the efficiency of document review, reduce the manpower consumed in the review process and improve the review accuracy, the application provides a document automatic review method, a document automatic review device and a storage medium.
In a first aspect, the present application provides a method for automatically reviewing a document, which adopts the following technical solutions:
an automated document review method comprising:
acquiring a document title;
selecting an inspection policy corresponding to the document title;
checking the document based on the inspection policy to generate an inspection outcome;
and outputting a review result based on the inspection outcome.
With this technical solution, the document title is acquired after the document is stored in the system and is then matched against the preset inspection policies. After a successful match, the document content is acquired and checked according to the inspection policy, and a result of inspection success or inspection failure is output according to the inspection outcome.
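The four steps above can be sketched as follows. This is a minimal illustration only, written in Python for brevity (the embodiment described later names Java and the POI toolkit). The class `InspectionPolicy`, its fields, the function names, and the message strings are all assumptions made for this example, and the keyword check stands in for whatever checks a real policy would define.

```python
from dataclasses import dataclass, field


@dataclass
class InspectionPolicy:
    # Field names are illustrative assumptions, not taken from the patent.
    title: str
    required_keywords: list = field(default_factory=list)


def select_policy(document_title, policies):
    """Return the policy whose title equals the document title, or None."""
    for policy in policies:
        if policy.title == document_title:
            return policy
    return None


def check_document(body_text, policy):
    """Build an inspection outcome: required keywords missing from the body."""
    missing = [kw for kw in policy.required_keywords if kw not in body_text]
    return {"missing_keywords": missing}


def review(document_title, body_text, policies):
    """Select a policy by title, check the document, and return a review result."""
    policy = select_policy(document_title, policies)
    if policy is None:
        return "title matching unsuccessful"
    outcome = check_document(body_text, policy)
    if outcome["missing_keywords"]:
        return "inspection failed: missing " + ", ".join(outcome["missing_keywords"])
    return "inspection successful"
```

Keeping policy selection separate from the check itself means a document whose title matches no policy is rejected before its body is read, which mirrors the order of the steps.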
Optionally, the inspection policy includes policy requirements, and checking the document based on the inspection policy to generate the inspection outcome includes the following steps:
acquiring a document text;
acquiring key requirements based on the document text;
and matching the key requirements with the policy requirements to generate the inspection outcome.
With this technical solution, the key requirements are acquired from the document text once the text has been obtained. The document title is matched with the policy title; after that match succeeds, the key requirements are matched with the policy requirements. A matching-success result is output when the match succeeds, and a matching-failure result is output when it does not. The document is thus reviewed under the double guarantee of the document title and the key requirements, which further improves review accuracy.
Optionally, acquiring the key requirements based on the document text includes the following steps:
acquiring a document type of the document;
selecting a corresponding parsing scheme according to the document type to parse the document;
obtaining paragraph contents based on the parsed document;
and acquiring the key requirements based on the paragraph contents.
With this technical solution, the document type is determined first, so that documents of different types can be identified and their content recognized more reliably, which further improves the accuracy of document recognition.
Optionally, the method further includes:
obtaining paragraph types based on the parsed document;
obtaining a document structure based on the paragraph type;
matching the document structure with a policy structure to obtain a matching result;
wherein the inspection policy further comprises the policy structure.
With this technical solution, the structure of the document is matched with the policy structure, which makes the review more rigorous and further improves review accuracy.
Optionally, the key requirements include at least one of a text length, keywords, and a design drawing.
With this technical solution, the review covers the text length, the keywords and the design drawing, which makes it relatively comprehensive and improves review accuracy.
Optionally, outputting the review result includes the following steps:
comparing the text length with a length threshold of the inspection policy;
and when the text length is greater than the length threshold, outputting the review result as an excess prompt.
With this technical solution, the text length is compared with the length threshold. When the text length is greater than the length threshold, an excess prompt is output to indicate that the document does not meet the requirement, and the prompt states the amount of the excess so that the user can conveniently adjust the document length.
Optionally, outputting the review result further includes the following steps:
judging, based on the policy requirements, whether keywords and/or design drawing names matching the policy requirements exist in the document;
if so, outputting the review result as a successful match;
if not, outputting as the review result the keywords and/or design drawing names that were not successfully matched.
With this technical solution, keywords and/or design drawing names are matched. When the match succeeds, a success prompt indicates that the user's document meets the standard. When it fails, the unmatched keywords and/or design drawing names are output, pointing out the non-compliant parts of the document under review. This determines whether the document meets the review standard and facilitates targeted modification of non-compliant documents.
Optionally, the method further includes: outputting statistical data based on the comparison result, wherein the statistical data comprise the ratio of each type of review result to the total review results.
With this technical solution, the output statistical data make it easy to see which parts of a document are compliant and which are not, and to obtain the proportions of compliant and non-compliant documents among all checked documents, which in turn helps the user manage the subsequent distribution of documents.
In a second aspect, the present application provides an automatic document review device, which adopts the following technical solutions:
an automated document review apparatus comprising:
a memory storing an intelligent processing program;
a processor which, when running the intelligent processing program, performs the steps of any of the methods described above.
With this technical solution, the memory stores information and the processor retrieves that information and issues control instructions, which ensures that the program executes in order and achieves the effect of the solution.
In a third aspect, the present application provides a computer-readable storage medium, which adopts the following technical solutions:
a computer readable storage medium storing a computer program capable of being loaded by a processor and performing any of the methods described above.
With this technical solution, once the computer-readable storage medium is loaded into a computer, that computer can execute the automatic document review method provided by the application.
In summary, the present application includes at least one of the following beneficial technical effects:
1. The system acquires a document title and an inspection policy and matches the document title with the inspection policy. After a successful match, the document content is acquired and checked according to the inspection policy, and a result of inspection success or inspection failure is output according to the inspection outcome. Reviewing documents in this way improves review efficiency, reduces the manpower consumed in the review process, and improves review accuracy;
2. The system acquires the document text and the key requirements based on the document text, and matches the document title with the policy title. If the title match is unsuccessful, the document fails the review. If it is successful, the key requirements are matched with the policy requirements and a review result is output according to the matching result. This further determines whether the document passes the review and further improves review accuracy;
3. The system acquires the type of the document and parses the document with different tools according to that type to obtain the type and content of the paragraphs in the document, and then acquires the key requirements from the paragraph content. The system can therefore adapt to and parse documents of different types, which makes the review more comprehensive and more accurate.
Drawings
FIG. 1 is a flowchart of the method in an embodiment of the present application;
FIG. 2 is a detailed flowchart of step S4, outputting the review result based on the inspection outcome, in the embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to figures 1-2.
The embodiment of the application discloses an automatic document examination method.
Referring to fig. 1, the document automated review method includes:
S1: acquiring a document title;
S2: selecting an inspection policy corresponding to the document title;
S3: checking the document based on the inspection policy;
S4: and outputting the review result based on the inspection outcome.
Specifically, the inspection policy includes a policy title, policy requirements, a policy structure, and a length threshold. All steps of the method are carried out by a computer system. In operation, an operator inputs the document into the system, and the computer system reads the document title. The operator inputs the inspection policy into the computer system in advance; the system then matches the document title with the policy title in the inspection policy and outputs a review result based on the matching result.
Specifically, when the match succeeds, a success message is output. For example, "title matching successful" is displayed on the display screen of the computer system to tell the operator that the document title is the same as the policy title and that the preliminary review has passed. When the match fails, a failure message is output. For example, "title matching unsuccessful" is displayed on the display screen, meaning the document title is wrong and the preliminary review has not passed, and the document with the wrong title is moved to another path in the computer system. For example, the computer system creates a folder named "error title" and moves documents with wrong titles into it, which makes them easy to find and correct later.
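The title check and the "error title" folder can be sketched as below. This is an assumption-laden illustration in Python, not the embodiment's implementation: the function name `route_by_title`, exact string equality as the matching rule, and returning the message instead of showing it on a display are all choices made for the example.

```python
import shutil
from pathlib import Path


def route_by_title(doc_path, document_title, policy_title, error_dir):
    """Return a status message; move the file into error_dir when titles differ."""
    if document_title == policy_title:
        return "title matching successful"
    # Create the "error title" folder on first use, then move the document there.
    error_dir = Path(error_dir)
    error_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(doc_path), str(error_dir / Path(doc_path).name))
    return "title matching unsuccessful"
```

A call such as `route_by_title("report.docx", "Repotr", "Report", "error title")` would leave the document in the `error title` folder, where it can be found and corrected later.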
When the document title is successfully matched, the system acquires the content of the document, checks it against the requirements of the inspection policy, and generates an inspection outcome indicating whether the content meets those requirements. When the content meets the policy requirements, a review result of inspection success is output according to the inspection outcome; when it does not, a review result of inspection failure is output.
Specifically, referring to FIGS. 1 and 2, step S4 includes the following sub-steps:
S41: acquiring a document text;
S42: acquiring key requirements based on the document text;
S43: matching the key requirements with the policy requirements;
S44: generating an inspection outcome.
Specifically, the type of the document is determined first, and the document is then parsed with the parsing method that corresponds to its type to obtain the text content of the document, that is, the document text. The key requirements are then acquired from the document text. The key requirements comprise: text length, keywords, and design drawing names. It should be noted that the policy requirements include a keyword requirement and a design drawing requirement, and the system extracts keywords and design drawing names from the document body based on these two requirements. The system then matches the key requirements with the policy requirements to generate the inspection outcome.
As noted above, the document type must be obtained first, and a corresponding parsing scheme is then selected to parse the document. Specifically, the system first identifies the type of the document entered into the system, which is preferably a Word document. Word documents currently come in two main types, "docx" and "doc". The system first determines the type of the Word document using the POI toolkit for the Java language. When the type is "docx", the system uses the XWPFDocument tool class to obtain XWPFParagraph objects and then obtains the paragraph type and the paragraph content through the getStyle() and getText() methods, respectively. When the type is "doc", the system uses the HWPFDocument tool class to obtain Paragraph objects and obtains the paragraph type and the paragraph content through the getStyleIndex() and text() methods of the Paragraph class, respectively.
After the system has obtained the paragraph contents in either of these two ways, the document text is available. The system then analyzes the content of the document text to obtain the key requirements.
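The type-based choice of parsing scheme can be sketched as a dispatch table. This Python sketch is an assumption throughout: it decides the type from the file extension, and `parse_docx` and `parse_doc` are hypothetical placeholders for the roles that the POI classes XWPFDocument and HWPFDocument play in the text. Each parser is assumed to yield (paragraph type, paragraph content) pairs.

```python
from pathlib import Path


def parse_docx(path):
    # Hypothetical placeholder for the "docx" scheme (XWPFDocument in the text).
    raise NotImplementedError("plug in a .docx parser here")


def parse_doc(path):
    # Hypothetical placeholder for the "doc" scheme (HWPFDocument in the text).
    raise NotImplementedError("plug in a .doc parser here")


# Assumed mapping from file extension to parsing scheme.
PARSERS = {".docx": parse_docx, ".doc": parse_doc}


def select_parser(path, parsers=PARSERS):
    """Pick the parsing scheme for a document based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in parsers:
        raise ValueError("unsupported document type: " + suffix)
    return parsers[suffix]


def body_text(paragraphs):
    """Join the content of (paragraph_type, paragraph_content) pairs into the document text."""
    return "\n".join(content for _ptype, content in paragraphs)
```

Supporting a further document type would then only require registering one more entry in `PARSERS`.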
After the document title is successfully matched, the system judges, based on the keyword requirement, whether keywords matching the keyword requirement exist in the document text; that is, the system extracts keywords from the document text according to the keyword requirement. If matching keywords exist, a matching-success result is output; for example, "keyword matching is successful" is displayed on the display screen. If the keywords cannot be extracted, a matching-failure result is output together with the keywords that were not matched; for example, "keyword not successfully matched" is displayed on the display screen along with the keywords from the keyword requirement that were not matched, prompting the user to revise the corresponding keywords.
Meanwhile, the system judges, based on the design drawing requirement, whether a design drawing matching the design drawing requirement exists in the document text; that is, the system extracts design drawing names from the document text according to the design drawing requirement. If a matching design drawing exists, a matching-success result is output; for example, "the design drawing is successfully matched" is displayed on the display screen. If the design drawing cannot be extracted, a matching-failure result is output together with the name of the unmatched design drawing; for example, "the design drawing is not successfully matched" is displayed on the display screen along with the names from the design drawing requirement that were not matched, prompting the user to revise the corresponding design drawing.
When only one of the keywords and the design drawing names fails to match, the system outputs that one failure result on its own. When both the keywords and the design drawing names fail to match, the system outputs both failure results, which gives the user a more specific prompt. Checking in this way further improves both the accuracy and the efficiency of document review.
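A minimal sketch of the keyword and design drawing matching follows, in Python. It assumes that each required keyword or drawing name is checked individually by plain substring search in the document text; the function name and the exact message wording are illustrative, not taken from the source.

```python
def match_requirements(body_text, required_keywords, required_drawings):
    """Return display messages for keyword and design drawing matching."""
    # Required items that do not appear anywhere in the document text.
    missing_keywords = [kw for kw in required_keywords if kw not in body_text]
    missing_drawings = [name for name in required_drawings if name not in body_text]

    messages = []
    if missing_keywords:
        messages.append("keywords not successfully matched: " + ", ".join(missing_keywords))
    else:
        messages.append("keyword matching is successful")
    if missing_drawings:
        messages.append("design drawings not successfully matched: " + ", ".join(missing_drawings))
    else:
        messages.append("design drawing matching is successful")
    return messages
```

Because the two checks are independent, one failure message is produced when only one check fails and two when both fail, as described above.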
In another embodiment, the method further comprises comparing the text length with the length threshold. The system reads the text length of the document and compares it with the length threshold. When the text length is greater than the length threshold, the system outputs an excess prompt and states the amount of the excess. For example, "length excess" is displayed on the display screen together with the specific excess value, a positive integer such as 1, 2 or 3. The operator is thus informed that the document length is wrong and by how much, which further improves review accuracy and makes it easier to correct the document afterwards.
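The length comparison reduces to a single subtraction, sketched here in Python. The unit of length (pages, characters, or something else) is not specified in the text, so the sketch simply assumes that both values are integers in the same unit; the function name and message format are illustrative.

```python
def check_length(text_length, length_threshold):
    """Return an excess prompt when the text is too long, otherwise None."""
    if text_length > length_threshold:
        excess = text_length - length_threshold
        return "length excess: " + str(excess)
    return None
```

For example, `check_length(12, 10)` yields a prompt reporting an excess of 2, while a document exactly at the threshold produces no prompt.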
In another embodiment, after the system obtains the paragraph types, it derives the document structure from them, that is, the overall structure of the paragraphs in the document body. The system then matches the document structure against the policy structure and outputs matching information according to the matching result.
Specifically, the policy structure may be a general-to-specific type, a parallel type, a specific-to-general type, and so on. Taking a policy structure that is currently the general-to-specific type as an example, the system judges from the overall paragraph structure and the text content of the document body whether the document structure is general-to-specific. If so, the policy structure is judged to match the document structure and matching-success information is output; for example, "structure matching successful" is shown on the display. If not, the document structure is judged not to match the policy structure and matching-failure information is output; for example, "structure matching unsuccessful" is shown on the display. At the same time, the system moves the document whose structure did not match into a separate folder, which further improves review accuracy and makes it easier to find and correct documents that did not pass.
The structure judgment can be implemented through paragraph numbering. For the general-to-specific structure, for example, the "general" paragraph is numbered "1" and its sub-paragraphs are numbered "1.1", "1.2" or "1-1", "1-2", and so on. When the system recognizes these number formats in the document and "1" comes before "1.1", "1.2" or "1-1", "1-2", the document structure is judged to be general-to-specific and to match the policy structure; otherwise the match fails. Other structures can be identified by numbering in the same way. If only "1", "2", "3" or other numbers of the same level appear, the structure is judged to be the parallel type. If "1" comes after "1.1", "1.2" or "1-1", "1-2", the structure is judged to be the specific-to-general type.
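The numbering-based judgment can be sketched as follows in Python. The sketch assumes the paragraph numbers have already been extracted into a list of strings in document order; the function name, the two regular expressions, and the "unknown" fallback for mixed or unnumbered documents are assumptions added for the example.

```python
import re

# Top-level numbers such as "1" and sub-level numbers such as "1.1" or "1-1".
TOP_LABEL = re.compile(r"^\d+$")
SUB_LABEL = re.compile(r"^\d+[.-]\d+$")


def classify_structure(labels):
    """Classify a document structure from its paragraph numbers, in document order."""
    top = [i for i, label in enumerate(labels) if TOP_LABEL.match(label)]
    sub = [i for i, label in enumerate(labels) if SUB_LABEL.match(label)]
    if top and not sub:
        return "parallel"  # only same-level numbers: 1, 2, 3
    if top and sub:
        if top[0] < sub[0]:
            return "general-to-specific"  # "1" comes before "1.1", "1.2"
        if top[0] > sub[-1]:
            return "specific-to-general"  # "1" comes after "1.1", "1.2"
    return "unknown"
```

Comparing the returned value with the structure named in the inspection policy then gives the structure matching result.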
Finally, after all documents have been reviewed, the system outputs statistical data according to the judgment results and the comparison results. The statistical data comprise the ratio of each type of review result to the total review results; that is, the system calculates the proportion of documents that were not successfully matched among all reviewed documents and outputs it to the display. In addition, for a single document, the system separately calculates and outputs the proportion of all review items accounted for by each kind of failure: structure review not passed, keyword matching unsuccessful, design drawing matching unsuccessful, text length matching unsuccessful, and document title matching unsuccessful. This makes it easier for reviewers to assess the statistical results and to evaluate document authors accordingly, which improves document management.
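The ratio of each result type to the total can be computed as in the Python sketch below. It assumes each review item has already been reduced to a result-type string; the function name and the type labels used in the example are illustrative.

```python
from collections import Counter


def review_statistics(results):
    """Map each result type to its share of all review results."""
    total = len(results)
    if total == 0:
        return {}
    counts = Counter(results)
    return {kind: count / total for kind, count in counts.items()}
```

For instance, the list `["passed", "title mismatch", "passed", "keyword mismatch"]` gives a share of 0.5 for "passed" and 0.25 for each of the two failure types.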
The implementation principle of the automatic document review method in the embodiment of the application is as follows: the system acquires a document title and an inspection policy and matches the document title with the policy title in the inspection policy. If the match is unsuccessful, the document is screened out as not passing. If the match is successful, the system acquires the key requirements from the document body and matches them with the policy requirements in the inspection policy; if that match is unsuccessful, the document is likewise screened out as not passing. This improves review accuracy and review efficiency and reduces the labor required in the review process.
The embodiment of the application also discloses an automatic document examination device.
The automatic document review device includes a memory and a processor. The memory stores the intelligent processing program. The processor executes the steps of the method when running the intelligent processing program. The intelligent processing program may use a known processing program to perform steps such as identification, judgment and screening on the document, thereby realizing automatic review of the document.
The embodiment of the present application further discloses a computer-readable storage medium, which stores a computer program that can be loaded by a processor to execute the automatic document review method described above. The computer-readable storage medium includes, for example, various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are preferred embodiments of the present application and do not limit its protection scope. All equivalent changes made according to the structure, shape and principle of the present application shall therefore be covered by the protection scope of the present application.

Claims (10)

1. An automated document review method, comprising:
acquiring a document title;
selecting an inspection policy corresponding to the document title;
checking the document based on the inspection policy to generate an inspection outcome;
and outputting a review result based on the inspection outcome.
2. The method of claim 1, wherein the inspection policy includes policy requirements, and wherein checking the document based on the inspection policy to generate the inspection outcome comprises:
acquiring a document text;
acquiring key requirements based on the document text;
and matching the key requirements with the policy requirements to generate the inspection outcome.
3. The method of claim 2, wherein acquiring the key requirements based on the document text comprises:
acquiring a document type of the document;
selecting a corresponding parsing scheme according to the document type to parse the document;
obtaining paragraph contents based on the parsed document;
and acquiring the key requirements based on the paragraph contents.
4. The method of claim 3, further comprising:
obtaining paragraph types based on the parsed document;
obtaining a document structure based on the paragraph type;
matching the document structure with a policy structure to obtain a matching result;
wherein the inspection policy further comprises the policy structure.
5. The method of claim 2, wherein: the key requirements include at least one of a text length, a keyword, and a design drawing.
6. The method of claim 5, wherein outputting the review result comprises:
comparing the text length with a length threshold of the inspection policy;
and when the text length is greater than the length threshold, outputting the review result as an excess prompt.
7. The method of claim 5, wherein outputting the review result further comprises:
judging, based on the policy requirements, whether keywords and/or design drawing names matching the policy requirements exist in the document;
if so, outputting the review result as a successful match;
if not, outputting as the review result the keywords and/or design drawing names that were not successfully matched.
8. The method of claim 6 or 7, further comprising: outputting statistical data based on the comparison result, wherein the statistical data comprise the ratio of each type of review result to the total review results.
9. An automated document review apparatus, comprising:
a memory storing an intelligent processing program;
a processor which, when running the intelligent processing program, performs the steps of the method of any of claims 1-8.
10. A computer-readable storage medium, characterized by storing a computer program which can be loaded by a processor to perform the method according to any of claims 1-8.
CN202111132732.2A 2021-09-27 2021-09-27 Automatic document examination method and device and storage medium Pending CN113887205A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111132732.2A CN113887205A (en) 2021-09-27 2021-09-27 Automatic document examination method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111132732.2A CN113887205A (en) 2021-09-27 2021-09-27 Automatic document examination method and device and storage medium

Publications (1)

Publication Number Publication Date
CN113887205A (en) 2022-01-04

Family

ID=79006886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111132732.2A Pending CN113887205A (en) 2021-09-27 2021-09-27 Automatic document examination method and device and storage medium

Country Status (1)

Country Link
CN (1) CN113887205A (en)

Similar Documents

Publication Publication Date Title
US9633010B2 (en) Converting data into natural language form
US8577884B2 (en) Automated analysis and summarization of comments in survey response data
US20120158625A1 (en) Creating and Processing a Data Rule
JP4860903B2 (en) How to automatically index documents
US7647317B2 (en) Search techniques for page-based document layouts
US20160063062A1 (en) Code searching and ranking
CN111553137A (en) Report generation method and device, storage medium and computer equipment
CN111444718A (en) Insurance product demand document processing method and device and electronic equipment
US10782942B1 (en) Rapid onboarding of data from diverse data sources into standardized objects with parser and unit test generation
CN110109678B (en) Code audit rule base generation method, device, equipment and medium
CN110414806B (en) Employee risk early warning method and related device
WO2003021472A1 (en) System, method and computer program product for creating a description for a document of a remote network data source for later identification of the document and identifying the document utilizing a description
CN113626558B (en) Intelligent recommendation-based field standardization method and system
CN114676231A (en) Target information detection method, device and medium
CN114462383B (en) Method, system, storage medium and equipment for obtaining design specification of building drawing
CN113887205A (en) Automatic document examination method and device and storage medium
US20080162165A1 (en) Method and system for analyzing non-patent references in a set of patents
CN112925874B (en) Similar code searching method and system based on case marks
Chang et al. Validating halstead metrics for scratch program using process data
CN113095794A (en) Production problem checking method and device based on Markov chain
US20050235266A1 (en) System and method for business rule identification and classification
WO2021104027A1 (en) Code performance testing method, apparatus and device, and storage medium
CN113051156B (en) Software defect positioning method based on block chain traceability and information retrieval
JP2001312419A (en) Software overlap degree evaluating device and recording medium with recorded software overlap degree evaluating program
CN117609095A (en) Code large model-oriented evaluation set quality detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination