CN111046142A - Text examination method and device, electronic equipment and computer storage medium - Google Patents

Publication number
CN111046142A
Authority
CN
China
Prior art keywords
sentence
similarity
text
document
single sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911289135.3A
Other languages
Chinese (zh)
Inventor
范有文
谭江龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Original Assignee
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd filed Critical Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority to CN201911289135.3A priority Critical patent/CN111046142A/en
Publication of CN111046142A publication Critical patent/CN111046142A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Technology Law (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, and in particular to a text review method, a text review device, electronic equipment and a computer storage medium. The method comprises the following steps: acquiring a review document together with its document type and document format; calling a system database in which key review terms are stored; splitting the review document into a plurality of single sentences; inputting each single sentence into the system database to be matched against the key review terms, so as to obtain the sentence similarity of each single sentence; and judging whether each sentence similarity is greater than a set threshold, and marking the single sentence when its sentence similarity is greater than the set threshold. The method calls the system database that corresponds to the document type and document format, calculates the sentence similarity between each single sentence of the text to be reviewed and the key review terms in that database, and marks sentences according to their similarity, so that risk prompts can be given, the risk in text review is reduced, and review efficiency is improved.

Description

Text examination method and device, electronic equipment and computer storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a text review method, a text review device, electronic equipment and a computer storage medium.
Background
In traditional document review, judging the legal risk of each clause depends mainly on professionals, and the process is time-consuming and labor-intensive. This not only creates a heavy workload for legal personnel, but also makes it difficult for less experienced legal workers to identify the risky clauses in a document, and the same clause can easily be reviewed inconsistently, which reduces review efficiency.
Given the large volume of document review work, there is an urgent need for an intelligent platform tool that removes the need to check texts manually one by one.
Disclosure of Invention
The embodiment of the invention provides a text review method, a text review device, electronic equipment and a computer storage medium, which are used to solve one or more problems in the prior art, namely that the review of clauses is prone to inaccuracy and that review efficiency is low.
A first aspect of an embodiment of the present invention provides a text review method, including: acquiring an inspection document, and the document type and the document format of the inspection document; calling a system database corresponding to the document type and the document format, wherein key auditing terms are stored in the system database; sentence-dividing processing is carried out on the examination document to form a plurality of single sentences; inputting each single sentence into a system database to be matched with the key auditing clauses to obtain the sentence similarity of each single sentence; and judging whether the similarity of each sentence is greater than a set threshold, and marking the single sentence when the similarity of each sentence is greater than the set threshold.
Optionally, inputting each single sentence into the system database to be matched with the key review terms, so as to obtain the sentence similarity of each single sentence, includes: performing word segmentation on each single sentence to form a text sequence consisting of a plurality of word segments; calculating the attribute similarity of each single sentence according to the text sequence; calculating the grammar similarity of each single sentence according to the text sequence; and performing a weighted calculation based on the attribute similarity and the grammar similarity to obtain the sentence similarity of each single sentence.
Optionally, calculating the attribute similarity of each single sentence according to the text sequence consisting of a plurality of word segments includes: extracting the word segmentation attributes of the word segments in the text sequence and of the word segments in the key review term, respectively, wherein the word segmentation attributes include: nouns, adjectives, place names, and person names; matching the extracted word segmentation attributes of the text sequence with those of the key review term, and determining the successfully matched word segmentation attributes; and determining the attribute similarity between each single sentence and the key review term based on the successfully matched word segmentation attributes.
Optionally, calculating the grammar similarity of each single sentence according to the text sequence consisting of a plurality of word segments includes: mapping each word segment in the text sequence into a fixed-length vector; and calculating the vector similarity of the word segments using the cosine similarity formula to obtain the grammar similarity of each single sentence.
Optionally, the method further comprises: searching forbidden words in the text sequence according to a preset forbidden word dictionary; and deleting forbidden words in the text sequence.
A second aspect of an embodiment of the present invention provides a text review apparatus, including: the acquisition module is used for acquiring an inspection document, and the document type and the document format of the inspection document; the calling module is used for calling a system database corresponding to the document type and the document format, and key auditing terms are stored in the system database; the segmentation module is used for carrying out sentence segmentation processing on the examination document to form a plurality of single sentences; the matching module is used for inputting each single sentence into a system database to be matched with the key examination clauses to obtain the sentence similarity of each single sentence; and the judging module is used for judging whether the similarity of each sentence is greater than a set threshold value or not, and marking the single sentence when the similarity of each sentence is greater than the set threshold value.
Optionally, the matching module comprises: a segmentation unit, used for performing word segmentation on each single sentence to form a text sequence consisting of a plurality of word segments; a first calculation unit, used for calculating the attribute similarity of each single sentence according to the text sequence; a second calculation unit, used for calculating the grammar similarity of each single sentence according to the text sequence; and a third calculation unit, used for performing a weighted calculation based on the attribute similarity and the grammar similarity to obtain the sentence similarity of each single sentence.
Optionally, the apparatus further comprises: the search module is used for searching forbidden words in the text sequence according to a preset forbidden word dictionary; and the deleting module is used for deleting forbidden words in the text sequence.
A third aspect of the embodiments of the present invention provides an electronic device for text examination, including a processor, a memory, a network interface, and a system bus, where the processor, the memory, and the network interface complete communication with each other through the system bus; the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the steps of the text examination method.
A fourth aspect of the embodiments of the present invention provides a computer storage medium, where at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to execute the steps of the text review method described above.
Building on the comparison, search, replacement and marking functions that the Word office tool already provides, the invention solves the problem that such work is performed manually, item by item, with low efficiency. That is, the method calls the system database corresponding to the document type and document format, calculates the sentence similarity between each single sentence of the text to be reviewed and the key review terms in the system database, and marks the corresponding sentences according to the similarity, so that risk prompts can be given, the risk in text review is reduced, and review efficiency is improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of an electronic device provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a text review method provided by an embodiment of the present invention;
FIG. 3 is a flowchart of step 240 in FIG. 2 provided by an embodiment of the present invention;
FIG. 4 is a functional block diagram of a text review device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The hardware environment of the text review method is described first. Referring to FIG. 1, FIG. 1 is a schematic structural diagram of an electronic device 100 according to an embodiment of the present invention. The electronic device 100 may be a computer, a cluster of computers, a mainframe computer, or a computing device dedicated to providing online content.
As shown in fig. 1, the electronic device 100 includes: a processor 102, memory and network interface 105 connected by a system bus 101; the memory may include, among other things, a non-volatile storage medium 103 and an internal memory 104.
In the embodiment of the present invention, the Processor 102 may be a Central Processing Unit (CPU), and the Processor 102 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field-Programmable Gate arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, etc. according to the type of hardware used. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The number of processors 102 may be one or more, and the one or more processors 102 may execute sequences of computer program instructions to perform various text review methods that will be described in more detail below.
The computer program instructions are stored in, accessed from, and read from the non-volatile storage medium 103 to be executed by the processor 102, thereby implementing the text review method disclosed in the following embodiments of the present invention. For example, the non-volatile storage medium 103 stores a software application that executes the text review method described below. Further, the non-volatile storage medium 103 may store the entire software application or only the portion of the software application that is executed by the processor 102. It should be noted that although only one block is shown in FIG. 1, the non-volatile storage medium 103 may comprise a plurality of physical devices installed on a central processing device or on different computing devices.
The network interface 105 is used for network communication, such as providing transmission of data information. Those skilled in the art will appreciate that the structure shown in fig. 1 is a block diagram of only a portion of the structure associated with the inventive arrangements, and does not constitute a limitation on the electronic device 100 to which the inventive arrangements are applied, and that a particular electronic device 100 may include more or less components than those shown, or combine certain components, or have a different arrangement of components.
The embodiment of the invention also provides a nonvolatile computer storage medium. The non-volatile computer storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the text review method disclosed by embodiments of the invention. The computer program product is embodied on one or more computer readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer program code.
In the case where the electronic device 100 is implemented in software, fig. 2 shows a flowchart of a text review method according to an embodiment, and the method in fig. 2 is described in detail below. Referring to fig. 2, the method includes the following steps:
Step 210: obtaining a review document, and identifying the document type and the document format of the review document.
In the embodiment of the present invention, the review document is a document that needs to be subjected to file review, and may be a document that can be identified in any suitable format and in any suitable type.
Depending on its function, the document type of the review document may be a contract, an agreement, an execution document, and so on; the document format may be Word, txt, a picture format, and so on.
In some embodiments, when the document type or document format of the review document cannot be identified, the review document may be converted into a format that meets the identification requirements. For a review document in a picture format, the characters in the picture can be recognized using character recognition techniques commonly used in the field, so that word segmentation can be performed on the document in the following steps.
Step 220: calling a system database corresponding to the document type and the document format, wherein key review terms are stored in the system database.
In the embodiment of the invention, the key review terms are risky clauses in contracts, agreements, execution documents and the like. These clauses need further manual review during the review process, so as to reduce the risk of the document under review.
The system database stores a series of key review terms in advance; these terms are provided by professional legal staff.
In the embodiment of the present invention, system databases are divided according to the document type and the document format. For example, key review terms with the same textual meaning may be stored in different system databases because their document types or document formats differ.
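The lookup in step 220 can be sketched as a mapping keyed by document type and document format. This is only an illustration under stated assumptions, not the implementation of the invention: the in-memory dictionary `SYSTEM_DATABASES`, the function `call_system_database`, and the sample terms are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for the system databases: each
# (document type, document format) pair has its own list of key review terms.
SYSTEM_DATABASES: Dict[Tuple[str, str], List[str]] = {
    ("contract", "word"): ["Do not disclose company trade secrets."],
    ("agreement", "txt"): ["Do not use company business channels for personal profit."],
}


def call_system_database(doc_type: str, doc_format: str) -> List[str]:
    """Return the key review terms stored for this document type and format."""
    key = (doc_type.lower(), doc_format.lower())
    if key not in SYSTEM_DATABASES:
        raise KeyError(f"no system database for type={doc_type!r}, format={doc_format!r}")
    return SYSTEM_DATABASES[key]
```

Keying on the pair rather than on the type alone mirrors the statement that terms with the same meaning may live in different databases when the type or format differs.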
Step 230: performing sentence segmentation on the review document to form a plurality of single sentences.
In this step, the text to be reviewed is preliminarily segmented: sentence boundaries are recognized from punctuation marks, and the document is divided into a plurality of single sentences so that the similarity of each single sentence can be calculated in the following steps.
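Punctuation-based sentence splitting of this kind can be sketched with a regular expression. The set of sentence-ending marks and the function name `split_sentences` are assumptions made for illustration; the description does not specify which punctuation marks are used.

```python
import re
from typing import List

# Assumed sentence-ending marks (Chinese and Western). Because the ASCII
# period is included, decimals such as "3.5" would also be split; a real
# system would need a more careful rule.
_SENTENCE = re.compile(r"[^。！？!?；;.]+[。！？!?；;.]?")


def split_sentences(text: str) -> List[str]:
    """Split text into single sentences, keeping the closing punctuation mark."""
    parts = _SENTENCE.findall(text)
    return [part.strip() for part in parts if part.strip()]
```

For instance, `split_sentences("甲方应付款。乙方应交货；")` yields two single sentences, one per clause.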
Step 240: inputting each single sentence into the system database to be matched with the key review terms, so as to obtain the sentence similarity of each single sentence.
Sentence similarity can be calculated for each single sentence along multiple dimensions. In a specific embodiment of the present invention, it is calculated along two dimensions: the attributes of the words in the single sentence and the grammatical structure of the single sentence.
That is, as shown in FIG. 3, step 240 includes:
Step 241: performing word segmentation on each single sentence to form a text sequence consisting of a plurality of word segments.
When each single sentence is input into the system database, its sentence similarity can be obtained by extracting the keywords in the sentence and processing them with the corresponding algorithm model. An open-source word segmentation tool, such as ICTCLAS or SCWS, can be used to segment each single sentence. In the embodiment of the invention, each single sentence can be segmented using a string matching method so as to extract keywords. It will be understood by those skilled in the art that other suitable word segmentation methods may also be used, and the present invention is not particularly limited in this respect.
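One common string-matching approach to word segmentation is forward maximum matching against a dictionary. The sketch below assumes that technique for illustration; the description does not say which string matching method is used, and the function name, the vocabulary, and the `max_len` parameter are hypothetical.

```python
from typing import List, Set


def forward_max_match(sentence: str, vocabulary: Set[str], max_len: int = 4) -> List[str]:
    """Segment a sentence by greedily taking the longest dictionary word at each position."""
    tokens: List[str] = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first; fall back to a single character
        # when no dictionary word starts at position i.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens
```

With a vocabulary containing "公司", "业务" and "渠道", the string "公司业务渠道" is segmented into those three word segments.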
In a specific application scenario, some of the word segments in the text sequence formed in step 241 may not be keywords that take part in algorithm matching, and such word segments need corresponding processing. Specifically, the method further includes:
1. searching forbidden words in the text sequence according to a preset forbidden word dictionary;
2. and deleting forbidden words in the text sequence.
This step uses a preset forbidden word dictionary to find and delete forbidden words in the text sequence, removing unnecessary words that would otherwise interfere with the accuracy of the algorithm.
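The search-and-delete step can be sketched as a simple filter over the text sequence. The dictionary contents and the names `FORBIDDEN_WORDS` and `remove_forbidden_words` are hypothetical, and treating forbidden words like ordinary stop words is an assumption of this sketch.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical preset forbidden word dictionary (assumed to behave like a stop-word list).
FORBIDDEN_WORDS: FrozenSet[str] = frozenset({"的", "等", "the", "of", "etc"})


def remove_forbidden_words(
    tokens: Iterable[str], forbidden: FrozenSet[str] = FORBIDDEN_WORDS
) -> List[str]:
    """Return the text sequence with every forbidden word deleted, order preserved."""
    return [token for token in tokens if token.lower() not in forbidden]
```

Applied to `["the", "company", "of", "secrets"]`, the filter keeps only `["company", "secrets"]`.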
Step 242: calculating the attribute similarity of each single sentence according to the text sequence consisting of the plurality of word segments.
This specifically comprises the following steps:
1. Extracting the word segmentation attributes of the word segments in the text sequence and in the key review term, respectively, wherein the word segmentation attributes include: nouns, adjectives, place names, person names, and the like.
Nouns, place names and person names are important keywords that relate to, for example, the subject and the object, while adjectives define the specific application scenario of the clause. This step extracts the relevant keywords by means of the word segmentation attributes. For example, a single sentence is: "do not use the company's business channels, intellectual property rights, etc. to engage in profit-making activities for oneself or others". The keywords it involves include nouns and person names such as business channels, intellectual property rights, the company, and others.
2. Matching the extracted word segmentation attributes of the text sequence with the word segmentation attributes of the key review term, and determining the successfully matched word segmentation attributes.
For example, in the system database, a key review term is stored as: "obey the company's confidentiality regulations, do not use the company's business channels, intellectual property rights, etc. to engage in profit-making activities for oneself or others, and do not disclose the company's trade secrets". The keywords it involves include nouns such as business channels, trade secrets, intellectual property rights, and the company. The successfully matched word segmentation attributes include: nouns (business channels, intellectual property rights) and person names (the company).
3. Determining the attribute similarity between each single sentence and the key review term based on the successfully matched word segmentation attributes.
The attribute similarity between each single sentence and the key review term is obtained according to the matching rules. Specifically, similarity matching can be performed using a matching model commonly used in the art, and details are not repeated here.
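The description leaves the matching model open. As one possible illustration, the sketch below scores attribute similarity as the share of the key review term's tagged keywords that also appear, with the same attribute, in the single sentence. This overlap ratio, the tag names, and the function `attribute_similarity` are assumptions of the sketch, not the matching rule of the invention.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (word segment, attribute), e.g. ("company", "noun")

# Attributes named in the description; the tag spellings are hypothetical.
KEY_ATTRIBUTES = {"noun", "adjective", "place_name", "person_name"}


def attribute_similarity(sentence: List[Tagged], key_term: List[Tagged]) -> float:
    """Fraction of the key term's keywords matched by word and attribute in the sentence."""
    sentence_keywords = {pair for pair in sentence if pair[1] in KEY_ATTRIBUTES}
    term_keywords = {pair for pair in key_term if pair[1] in KEY_ATTRIBUTES}
    if not term_keywords:
        return 0.0  # nothing to match against
    return len(sentence_keywords & term_keywords) / len(term_keywords)
```

Using the example above, a sentence tagged with business channel, intellectual property and company matches three of the four nouns in the stored term (trade secret is missing), giving 0.75 under this assumed rule.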
Step 243: calculating the grammar similarity of each single sentence according to the text sequence consisting of the plurality of word segments.
In this step, each word segment in the text sequence is mapped into a fixed-length vector. The word segments of a single sentence form a vector space in which each word segment is a point. The vector similarity of the word segments is then calculated using the cosine similarity formula, which finally gives the grammar similarity of each single sentence. In one embodiment, the cosine similarity calculation formula is as follows:
$$\cos(\theta) = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$
wherein cos(θ) is the similarity; i is a positive integer from 1 to n, where n is the number of word segments; A is the word segments of the single sentence and B is the word segments of the key review term; $A_i$ represents a certain word segment in the single sentence, and $B_i$ represents a certain word segment in the key review term.
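A minimal sketch of the cosine similarity calculation on two equal-length vectors follows. How word segments are mapped to vectors is not specified in the description, so the inputs here are plain lists of numbers; the function name and the choice to return 0.0 for a zero vector are assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)
```

Parallel vectors such as `[1, 2, 3]` and `[2, 4, 6]` score close to 1, while orthogonal vectors such as `[1, 0]` and `[0, 1]` score 0.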
Step 244: performing a weighted calculation based on the attribute similarity and the grammar similarity to obtain the sentence similarity of each single sentence.
The sentence similarity of each single sentence is obtained by the following formula:
P1 = w*p + (1-w)*d,
wherein P1 is the sentence similarity, w is the weight coefficient, p is the attribute similarity, and d is the grammar similarity.
With this formula, once a suitable value of w is given, the sentence similarity of each single sentence can be obtained from its attribute similarity and grammar similarity. In this way, a text to be reviewed whose attribute similarity is high but whose grammar similarity is low is not judged on attribute similarity alone, which improves review accuracy.
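The weighted formula P1 = w*p + (1-w)*d can be written directly as a small function. Restricting w to the range 0 to 1 is an assumption of this sketch; the description only asks for a suitable value of w.

```python
def sentence_similarity(p: float, d: float, w: float) -> float:
    """Weighted sentence similarity P1 = w*p + (1-w)*d.

    p: attribute similarity, d: grammar similarity, w: weight coefficient.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w is assumed to lie in [0, 1]")
    return w * p + (1.0 - w) * d
```

As a worked example, with p = 0.75, d = 0.5 and w = 0.5, the sentence similarity is 0.5*0.75 + 0.5*0.5 = 0.625. A sentence with p = 1.0 but d = 0.0 scores only 0.5 at the same weight, which shows how a low grammar similarity pulls the score down.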
Step 250: judging whether the sentence similarity of each single sentence is greater than a set threshold, and marking the single sentence when its sentence similarity is greater than the set threshold.
If the sentence similarity of a single sentence is greater than the set threshold, the sentence is considered to involve a risky clause and needs further review by legal professionals; the single sentence is therefore marked.
Further, the marked single sentence can be all displayed in a WEB page (or other display medium, such as a word document), and finally reviewed manually, or in some embodiments, the marked single sentence can be directly programmed to perform a next well-defined operation (such as directly replacing some sentences or key words).
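The threshold test and marking in step 250 can be sketched as follows. The function name `mark_sentences`, the tuple layout, and the sample scores are hypothetical; only the strict "greater than the set threshold" comparison comes from the description.

```python
from typing import List, Tuple


def mark_sentences(
    scored: List[Tuple[str, float]], threshold: float
) -> List[Tuple[str, float, bool]]:
    """Attach a flag to each (sentence, similarity) pair; True means marked for manual review."""
    return [(sentence, score, score > threshold) for sentence, score in scored]
```

A caller could then render only the flagged sentences on a web page, for example with `[s for s, _, flagged in mark_sentences(scored, 0.7) if flagged]`, where 0.7 is an arbitrary illustrative threshold.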
An embodiment of the present invention further provides a text review device corresponding to the text review method in the foregoing embodiment. Referring to FIG. 4, which shows a structural block diagram of the text review device, the text review device 400 includes: an obtaining module 410, a calling module 420, a segmentation module 430, a matching module 440, and a judging module 450.
The obtaining module 410 is used for obtaining a review document, and a document type and a document format of the review document; the calling module 420 is configured to call a system database corresponding to the document type and the document format, where key review terms are stored in the system database; the segmentation module 430 is configured to perform sentence segmentation on the review document to form a plurality of single sentences; the matching module 440 is configured to input each single sentence into a system database to be matched with the key review term, so as to obtain a sentence similarity of each single sentence; the judging module 450 is configured to judge whether the similarity of each sentence is greater than a set threshold, and mark the single sentence when the similarity of each sentence is greater than the set threshold.
In some embodiments, the matching module 440 includes: a segmentation unit, a first calculation unit, a second calculation unit and a third calculation unit.
The segmentation unit is used for performing word segmentation on each single sentence to form a text sequence consisting of a plurality of word segments; the first calculation unit is used for calculating the attribute similarity of each single sentence according to the text sequence; the second calculation unit is used for calculating the grammar similarity of each single sentence according to the text sequence; and the third calculation unit is used for performing a weighted calculation based on the attribute similarity and the grammar similarity to obtain the sentence similarity of each single sentence.
In some embodiments, the apparatus further comprises: a search module and a delete module (not shown). The search module is used for searching forbidden words in the text sequence according to a preset forbidden word dictionary; and the deleting module is used for deleting forbidden words in the text sequence.
It will be clear to those skilled in the art that the algorithms or displays provided by the present invention are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A text review method, the method comprising:
acquiring a review document, and the document type and the document format of the review document;
calling a system database corresponding to the document type and the document format, wherein key review terms are stored in the system database;
performing sentence segmentation on the review document to form a plurality of single sentences;
inputting each single sentence into the system database to be matched with the key review terms to obtain the sentence similarity of each single sentence;
and judging whether the sentence similarity of each single sentence is greater than a set threshold, and marking the single sentence when its sentence similarity is greater than the set threshold.
2. The text review method of claim 1, wherein the step of inputting each single sentence into the system database to be matched with the key review terms to obtain the sentence similarity of each single sentence comprises:
performing word segmentation on each single sentence to form a text sequence consisting of a plurality of word segments;
calculating the attribute similarity of each single sentence according to the text sequence formed by the plurality of word segments;
calculating the grammar similarity of each single sentence according to the text sequence formed by the plurality of word segments;
and performing a weighted calculation based on the attribute similarity and the grammar similarity to obtain the sentence similarity of each single sentence.
3. The text review method of claim 2, wherein calculating the attribute similarity of each single sentence according to the text sequence formed by the plurality of word segments comprises:
respectively extracting the word segmentation attribute of each word segment in the text sequence and the word segmentation attributes of the word segments in the key review terms, wherein the word segmentation attributes comprise: nouns, adjectives, place names, and person names;
matching the extracted word segmentation attributes of the plurality of word segments with the word segmentation attributes of the key review terms, and determining the successfully matched word segmentation attributes;
and determining the attribute similarity between each single sentence and the key review terms based on the successfully matched word segmentation attributes.
4. The text review method of claim 2, wherein calculating the grammar similarity of each single sentence according to the text sequence formed by the plurality of word segments comprises:
mapping each word segment in the text sequence into a vector with a fixed length;
and calculating the vector similarity of the plurality of word segments by using a cosine similarity formula to obtain the grammar similarity of each single sentence.
5. The text review method of claim 2, further comprising:
searching forbidden words in the text sequence according to a preset forbidden word dictionary;
and deleting forbidden words in the text sequence.
6. A text review apparatus, comprising:
an acquisition module, configured to acquire a review document, and the document type and the document format of the review document;
a calling module, configured to call a system database corresponding to the document type and the document format, wherein key review terms are stored in the system database;
a segmentation module, configured to perform sentence segmentation on the review document to form a plurality of single sentences;
a matching module, configured to input each single sentence into the system database to be matched with the key review terms to obtain the sentence similarity of each single sentence;
and a judging module, configured to judge whether the sentence similarity of each single sentence is greater than a set threshold, and to mark the single sentence when its sentence similarity is greater than the set threshold.
7. The text review device of claim 6, wherein the matching module comprises:
a word segmentation unit, configured to perform word segmentation on each single sentence to form a text sequence consisting of a plurality of word segments;
a first calculating unit, configured to calculate the attribute similarity of each single sentence according to the text sequence formed by the plurality of word segments;
a second calculating unit, configured to calculate the grammar similarity of each single sentence according to the text sequence formed by the plurality of word segments;
and a third calculating unit, configured to perform a weighted calculation based on the attribute similarity and the grammar similarity to obtain the sentence similarity of each single sentence.
8. The text review device of claim 7, further comprising:
the search module is used for searching forbidden words in the text sequence according to a preset forbidden word dictionary;
and the deleting module is used for deleting forbidden words in the text sequence.
9. An electronic device for text review, comprising a processor, a memory, a network interface and a system bus, wherein the processor, the memory and the network interface communicate with each other through the system bus;
the memory is configured to store at least one executable instruction that causes the processor to perform the steps of the text review method of any one of claims 1-5.
10. A computer storage medium having stored therein at least one executable instruction that causes a processor to perform the steps of the text review method of any of claims 1-5.
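
By way of illustration only, the attribute matching of claim 3 and the cosine similarity of claim 4 might be sketched in Python as follows. The claims do not state how matched attributes are turned into a score or how the per-word vectors are combined into one sentence-level value; the ratio of matched (word, attribute) pairs and the averaging of word vectors used here are assumptions, and the tags and vectors in the example are invented.

```python
import math
from typing import Dict, List, Sequence, Tuple

Tagged = Tuple[str, str]  # (word segment, attribute such as "noun" or "place name")


def attribute_similarity(sentence: Sequence[Tagged], term: Sequence[Tagged]) -> float:
    """Assumed score: share of the key term's tagged words found in the sentence."""
    term_set = set(term)
    if not term_set:
        return 0.0
    return len(set(sentence) & term_set) / len(term_set)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(tokens: Sequence[str], embeddings: Dict[str, List[float]], dim: int) -> List[float]:
    """Assumed aggregation: average the fixed-length vectors of known word segments."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


if __name__ == "__main__":
    sentence = [("penalty", "noun"), ("late", "adjective")]
    term = [("penalty", "noun"), ("Shenzhen", "place name")]
    print(attribute_similarity(sentence, term))

    toy = {"penalty": [1.0, 0.0], "fee": [1.0, 1.0]}  # invented 2-dimensional vectors
    print(cosine(mean_vector(["penalty"], toy, 2), mean_vector(["fee"], toy, 2)))
```

In practice the attributes would come from a part-of-speech or named-entity tagger and the vectors from a trained embedding model; neither is specified in the claims.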
CN201911289135.3A 2019-12-13 2019-12-13 Text examination method and device, electronic equipment and computer storage medium Pending CN111046142A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911289135.3A CN111046142A (en) 2019-12-13 2019-12-13 Text examination method and device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911289135.3A CN111046142A (en) 2019-12-13 2019-12-13 Text examination method and device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN111046142A true CN111046142A (en) 2020-04-21

Family

ID=70236475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911289135.3A Pending CN111046142A (en) 2019-12-13 2019-12-13 Text examination method and device, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN111046142A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111625646A (en) * 2020-05-22 2020-09-04 泰康保险集团股份有限公司 Method and device for processing insurance policy, electronic equipment and storage medium
CN111652117A (en) * 2020-05-29 2020-09-11 上海深杳智能科技有限公司 Method and medium for segmenting multi-document image
CN112100373A (en) * 2020-08-25 2020-12-18 南方电网深圳数字电网研究院有限公司 Contract text analysis method and system based on deep neural network
CN112163585A (en) * 2020-11-10 2021-01-01 平安普惠企业管理有限公司 Text auditing method and device, computer equipment and storage medium
CN112307101A (en) * 2020-10-24 2021-02-02 上海东方投资监理有限公司 Project pricing auditing method, device, computer equipment and system
CN112463931A (en) * 2020-12-11 2021-03-09 中国人寿保险股份有限公司 Intelligent analysis method for insurance product clauses and related equipment
CN112597768A (en) * 2020-12-08 2021-04-02 北京百度网讯科技有限公司 Text auditing method and device, electronic equipment, storage medium and program product
CN112685126A (en) * 2021-01-11 2021-04-20 维沃移动通信有限公司 Document content display method and device
CN113837113A (en) * 2021-09-27 2021-12-24 中国平安财产保险股份有限公司 Document verification method, device, equipment and medium based on artificial intelligence
CN116150323A (en) * 2023-04-23 2023-05-23 天津市普迅电力信息技术有限公司 Text language data processing method based on artificial intelligence
CN117236328A (en) * 2023-11-10 2023-12-15 深圳市泰铼科技有限公司 Financial text intelligent checking system based on data analysis

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059924A (en) * 2019-03-13 2019-07-26 平安城市建设科技(深圳)有限公司 Checking method, device, equipment and the computer readable storage medium of contract terms
CN110163478A (en) * 2019-04-18 2019-08-23 平安科技(深圳)有限公司 A kind of the risk checking method and device of contract terms

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111625646A (en) * 2020-05-22 2020-09-04 泰康保险集团股份有限公司 Method and device for processing insurance policy, electronic equipment and storage medium
CN111625646B (en) * 2020-05-22 2023-04-21 泰康保险集团股份有限公司 Method, device, electronic equipment and storage medium for processing insurance policy
CN111652117A (en) * 2020-05-29 2020-09-11 上海深杳智能科技有限公司 Method and medium for segmenting multi-document image
CN111652117B (en) * 2020-05-29 2023-07-04 上海深杳智能科技有限公司 Method and medium for segmenting multiple document images
CN112100373A (en) * 2020-08-25 2020-12-18 南方电网深圳数字电网研究院有限公司 Contract text analysis method and system based on deep neural network
CN112307101A (en) * 2020-10-24 2021-02-02 上海东方投资监理有限公司 Project pricing auditing method, device, computer equipment and system
CN112163585A (en) * 2020-11-10 2021-01-01 平安普惠企业管理有限公司 Text auditing method and device, computer equipment and storage medium
CN112163585B (en) * 2020-11-10 2023-11-10 上海七猫文化传媒有限公司 Text auditing method and device, computer equipment and storage medium
CN112597768A (en) * 2020-12-08 2021-04-02 北京百度网讯科技有限公司 Text auditing method and device, electronic equipment, storage medium and program product
CN112597768B (en) * 2020-12-08 2022-06-28 北京百度网讯科技有限公司 Text auditing method, device, electronic equipment, storage medium and program product
CN112463931B (en) * 2020-12-11 2024-05-28 中国人寿保险股份有限公司 Intelligent analysis method and related equipment for insurance product clauses
CN112463931A (en) * 2020-12-11 2021-03-09 中国人寿保险股份有限公司 Intelligent analysis method for insurance product clauses and related equipment
CN112685126A (en) * 2021-01-11 2021-04-20 维沃移动通信有限公司 Document content display method and device
CN112685126B (en) * 2021-01-11 2023-05-02 维沃移动通信有限公司 Document content display method and device
CN113837113A (en) * 2021-09-27 2021-12-24 中国平安财产保险股份有限公司 Document verification method, device, equipment and medium based on artificial intelligence
CN116150323B (en) * 2023-04-23 2023-06-23 天津市普迅电力信息技术有限公司 Text language data processing method based on artificial intelligence
CN116150323A (en) * 2023-04-23 2023-05-23 天津市普迅电力信息技术有限公司 Text language data processing method based on artificial intelligence
CN117236328A (en) * 2023-11-10 2023-12-15 深圳市泰铼科技有限公司 Financial text intelligent checking system based on data analysis
CN117236328B (en) * 2023-11-10 2024-01-30 深圳市泰铼科技有限公司 Financial text intelligent checking system based on data analysis

Similar Documents

Publication Publication Date Title
CN111046142A (en) Text examination method and device, electronic equipment and computer storage medium
CN110163478B (en) Risk examination method and device for contract clauses
CN112818093B (en) Evidence document retrieval method, system and storage medium based on semantic matching
CN112417885A (en) Answer generation method and device based on artificial intelligence, computer equipment and medium
CN111191275A (en) Sensitive data identification method, system and device
CN110532352B (en) Text duplication checking method and device, computer readable storage medium and electronic equipment
US9348901B2 (en) System and method for rule based classification of a text fragment
CN110442872B (en) Text element integrity checking method and device
CN110929520A (en) Non-named entity object extraction method and device, electronic equipment and storage medium
CN110941951A (en) Text similarity calculation method, text similarity calculation device, text similarity calculation medium and electronic equipment
CN115687621A (en) Short text label labeling method and device
US11941565B2 (en) Citation and policy based document classification
CN110347805A (en) Petroleum industry security risk key element extracting method, device, server and storage medium
CN112395866B (en) Customs clearance sheet data matching method and device
CN116402166B (en) Training method and device of prediction model, electronic equipment and storage medium
CN109766527B (en) Text similarity calculation method and related equipment
CN109871540B (en) Text similarity calculation method and related equipment
CN114842982B (en) Knowledge expression method, device and system for medical information system
CN115906817A (en) Keyword matching method and device for cross-language environment and electronic equipment
CN115481599A (en) Document processing method and device, electronic equipment and storage medium
CN111611340A (en) Information extraction method and device, computer equipment and storage medium
CN112529743B (en) Contract element extraction method, device, electronic equipment and medium
CN114580398A (en) Text information extraction model generation method, text information extraction method and device
CN114492446A (en) Legal document processing method and device, electronic equipment and storage medium
CN113836297A (en) Training method and device for text emotion analysis model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination