CN111274782A - Text auditing method and device, computer equipment and readable storage medium - Google Patents



Publication number
CN111274782A
Authority
CN
China
Prior art keywords
text
audit
auditing
audited
rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010116229.7A
Other languages
Chinese (zh)
Other versions
CN111274782B (en)
Inventor
张晶莹
罗先贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010116229.7A
Publication of CN111274782A
Priority to PCT/CN2020/111641 (WO2021169208A1)
Application granted
Publication of CN111274782B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a text auditing method comprising the following steps: receiving a text to be audited sent by a user terminal, and matching the text against the text templates of a plurality of text types to determine its text type; acquiring the classification model corresponding to that text type from a preset classification model library, using the classification model to divide the text into a plurality of audit segments, and adding a corresponding topic label to each audit segment; acquiring, from the rule base corresponding to the text type, the audit rule corresponding to the topic label of each audit segment; and judging, according to the audit rule, whether risk element content exists in the corresponding audit segment and, if so, sending the risk element content to the user terminal as a risk prompt. The invention can improve both the accuracy and the speed of text review.

Description

Text auditing method and device, computer equipment and readable storage medium
Technical Field
The invention relates to the field of Internet technology, and in particular to a text auditing method and device, computer equipment, and a readable storage medium.
Background
With the continuous development of Internet technology, more and more information is spread over the Internet, and text is one of its most important carriers. Text may contain sensitive or harmful information, so to prevent sensitive information from leaking and harmful information from spreading, auditors must manually review texts for risky content. However, each text is long, its content is complex and its wording varies widely, so manual review consumes a large amount of labor, is inefficient, and cannot guarantee accuracy. How to improve the efficiency and accuracy of text review has therefore become a pressing technical problem.
Disclosure of Invention
The invention aims to provide a text auditing method, a text auditing device, computer equipment and a readable storage medium, which can improve the accuracy and speed of auditing texts.
According to an aspect of the present invention, a text auditing method is provided, which specifically includes the following steps:
receiving a text to be audited sent by a user terminal, and matching the text to be audited against the text templates of a plurality of text types to determine its text type;
acquiring the classification model corresponding to the text type from a preset classification model library, using the classification model to divide the text to be audited into a plurality of audit segments, and adding a corresponding topic label to each audit segment;
acquiring, according to the topic label of each audit segment, the audit rule corresponding to each topic label from the rule base corresponding to the text type; and
judging, according to the audit rule, whether risk element content exists in the corresponding audit segment, and if so, sending the risk element content to the user terminal as a risk prompt.
Optionally, before the classification model corresponding to the text type is acquired from the preset classification model library and used to split the text to be audited into audit segments with topic labels, the method further includes:
for each text type, acquiring a training sample set corresponding to that text type, where the training sample set includes a set number of historical texts, the segment information of each historical text, and the topic label of each segment;
determining, from the topic labels contained in each historical text in the training sample set, the topic labels that appear in all of the historical texts as the necessary topic labels of the text type; and
training a preset model on the training sample set to obtain the classification model corresponding to the text type.
Optionally, training the preset model on the training sample set to obtain the classification model corresponding to the text type specifically includes:
for each topic label in the training sample set, acquiring the segment corresponding to that topic label in each historical text; performing word segmentation on each acquired segment and extracting its nouns; selecting, from the nouns of all these segments, a set number of significant nouns that represent the topic label, and calculating a significant coefficient for each significant noun, so as to form the significant word set corresponding to the topic label; and
aggregating the significant word sets of all topic labels in the training sample set to serve as the classification model corresponding to the text type.
Optionally, splitting the text to be audited into a plurality of audit segments by using the classification model and adding a corresponding topic label to each audit segment specifically includes:
determining each title contained in the text to be audited, and splitting the text to be audited into a plurality of audit segments according to the determined titles, where each audit segment includes a title portion and a body portion;
performing word segmentation on each audit segment and extracting its nouns; and
for each audit segment, determining the target significant words in each significant word set, a target significant word being a noun that appears both in the significant word set and in the audit segment; summing, for each significant word set, the significant coefficients of its target significant words; and adding to the audit segment the topic label of the significant word set with the largest sum.
Optionally, acquiring, according to the topic label of each audit segment, the audit rule corresponding to each topic label from the rule base corresponding to the text type includes:
judging whether the topic labels of the text to be audited include all the necessary topic labels of the text type; if so, acquiring the audit rule corresponding to each topic label from the rule base corresponding to the text type according to the topic label of each audit segment; if not, sending information identifying the missing necessary topic labels to the user terminal.
Optionally, the audit rule includes audit elements and audit sub-rules, each audit element corresponding to one audit sub-rule;
judging, according to the audit rule, whether risk element content exists in the corresponding audit segment, and if so, sending the risk element content to the user terminal as a risk prompt, specifically includes:
extracting, from the audit segment, the element content corresponding to each audit element in the audit rule; and
for the element content of each audit element, judging whether the element content satisfies the audit sub-rule corresponding to that audit element, and if not, sending the element content to the user terminal as risk element content.
Optionally, after judging, according to the audit rule, whether risk element content exists in the corresponding audit segment and, if so, sending it to the user terminal as a risk prompt, the method further includes:
receiving audit result information sent by the user terminal, and judging from it whether the determined risk element content is correct; if so, incrementing the accuracy value of the audit rule corresponding to the risk element content by one; if not, decrementing that accuracy value by one; and
sending any audit rule whose accuracy value is smaller than a preset threshold to the user terminal so that the user can modify the audit rule.
According to another aspect of the present invention, a text auditing apparatus is further provided, which specifically includes the following components:
a receiving module, configured to receive a text to be audited sent by a user terminal and match it against the text templates of a plurality of text types to determine its text type;
a splitting module, configured to acquire the classification model corresponding to the text type from a preset classification model library, split the text to be audited into a plurality of audit segments by using the classification model, and add a corresponding topic label to each audit segment;
an acquisition module, configured to acquire, according to the topic label of each audit segment, the audit rule corresponding to each topic label from the rule base corresponding to the text type; and
a judging module, configured to judge, according to the audit rule, whether risk element content exists in the corresponding audit segment, and if so, send the risk element content to the user terminal as a risk prompt.
According to another aspect of the present invention, there is also provided a computer device, specifically including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the text auditing method when executing the program.
According to another aspect of the present invention, there is also provided a computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the above-mentioned text auditing method.
With the text auditing method and apparatus, computer equipment and readable storage medium of the invention, the text to be audited is divided into a plurality of audit segments, and a corresponding audit rule is set for each audit segment. Auditing each segment against its own rule allows risks to be checked in a targeted manner, which improves the accuracy of text auditing. In addition, because the audit segments of a text can be audited in parallel, the efficiency of text auditing is also improved.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
Fig. 1 is an optional flowchart of the text auditing method according to the first embodiment;
Fig. 2 is a schematic diagram of optional program modules of the text auditing apparatus according to the second embodiment;
Fig. 3 is a schematic diagram of an optional hardware architecture of the computer device according to the third embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
The embodiment of the invention provides a text auditing method, which specifically comprises the following steps as shown in figure 1:
Step S101: receiving a text to be audited sent by a user terminal, and matching the text to be audited against the text templates of a plurality of text types to determine its text type.
Preferably, the text in this embodiment may be a contract. Because a contract affects the interests of a company or an individual, its content must be reviewed in real business scenarios to safeguard the rights and obligations of both parties. Therefore, in step S101, when a contract to be audited is received, its contract structure is analyzed to determine its contract type.
In this embodiment, each contract to be audited is generated from one of several types of contract templates, and each template type has a corresponding contract structure. By analyzing the contract structure of the contract to be audited, the type of template it was generated from, and hence its contract type, can be determined.
Specifically, the contract types include: a purchase class, a sales class, an intent-to-collaborate class, and a privacy class.
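The patent says only that the contract structure is analyzed and matched against the templates. As a minimal sketch, assuming each template is summarized by the set of section titles it contains and that top-level titles are numbered lines such as `2. Fees and Payment`, the contract type can be chosen by the Jaccard similarity between title sets. The template contents, the title regex and all function names are hypothetical.

```python
import re

# Hypothetical templates: each contract type is summarized by the set of
# section titles its template contains (illustrative values, not from the patent).
TEMPLATES: dict[str, set[str]] = {
    "purchase": {"rights and obligations", "fees and payment",
                 "liability for breach", "intellectual property"},
    "sales": {"rights and obligations", "fees and payment",
              "delivery and acceptance", "warranty"},
}

# Assumed title format: a top-level numbered line such as "2. Fees and Payment"
# (the negative lookahead skips sub-clause numbers such as "2.1").
TITLE_LINE = re.compile(r"^[ \t]*\d+[.、](?!\d)[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def extract_titles(text: str) -> set[str]:
    """Collect the normalized section titles of a contract."""
    return {m.group(1).lower() for m in TITLE_LINE.finditer(text)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two title sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_text_type(text: str, templates: dict[str, set[str]] = TEMPLATES) -> str:
    """Return the template type whose title set best matches the text's titles."""
    titles = extract_titles(text)
    return max(templates, key=lambda t: jaccard(titles, templates[t]))


contract = "1. Rights and Obligations\n...\n2. Fees and Payment\n...\n3. Intellectual Property\n..."
print(match_text_type(contract))  # purchase
```

Jaccard similarity is only one possible structure measure; comparing the ordered title sequence (for example with an edit distance) would also take section order into account.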
Step S102: acquiring the classification model corresponding to the text type from a preset classification model library, using the classification model to divide the text to be audited into a plurality of audit segments, and adding a corresponding topic label to each audit segment.
Specifically, before step S102, the method further includes:
Step A1: for each text type, acquiring a training sample set corresponding to that text type, where the training sample set includes a set number of historical texts, the segment information of each historical text, and the topic label of each segment.
A contract typically consists of multiple parts, each with its own title and body. When auditors review contracts manually, they check the contract part by part to determine whether each part complies with the relevant legal and other requirements. Following this review habit, each historical contract in the training sample set is divided into a plurality of segments by title and body, and a topic label is added to each segment according to its content.
For example, a purchase-class contract to be audited is divided into the following segments: rights and obligations of both parties; fees and payment; liability for breach and limitation of liability; third-party rights guarantees; independence and severability; agreement changes and termination; contract subject matter and product/service standards; intellectual property; contract effectiveness and expiration; and best-effort treatment.
Step A2: determining, from the topic labels contained in each historical text in the training sample set, the topic labels that appear in all of the historical texts as the necessary topic labels of the text type.
Step A3: training a preset model on the training sample set to obtain the classification model corresponding to the text type.
Further, training the preset model on the training sample set to obtain the classification model corresponding to the text type specifically includes:
Step A31: for each topic label in the training sample set, acquiring the segment corresponding to that topic label in each historical text;
Step A32: performing word segmentation on each acquired segment and extracting its nouns;
Step A33: selecting, from the nouns of all these segments, a set number of significant nouns that represent the topic label, and calculating a significant coefficient for each significant noun, so as to form the significant word set corresponding to the topic label;
Step A34: aggregating the significant word sets of all topic labels in the training sample set to serve as the classification model corresponding to the text type.
It should be noted that each significant noun in a significant word set has a corresponding significant coefficient; the larger the significant coefficient of a significant noun, the more representative that noun is of the corresponding topic label.
Preferably, in practical applications, in step A33 the nouns are sorted in descending order of their occurrence probability in the segments, a set number of top-ranked nouns are taken as the significant nouns, and the significant coefficient of each significant noun is calculated from its occurrence probability.
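Steps A31 to A34 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a real system would segment Chinese text and keep only nouns with a part-of-speech tagger, whereas the hypothetical `extract_nouns` below treats every English word outside a small stop list as a noun. The significant coefficient is taken to be the noun's occurrence probability among all nouns of that topic label, which is one reading of step A33 rather than a formula the patent states, and `top_k` stands in for the "set number".

```python
import re
from collections import Counter

# Hypothetical stand-in for word segmentation + noun extraction (step A32).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on",
              "shall", "be", "by", "for", "within"}


def extract_nouns(text: str) -> list[str]:
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def train_classification_model(samples, top_k=5):
    """samples: historical texts, each a list of (topic_label, segment_text) pairs.

    Returns {topic_label: {significant_noun: significant_coefficient}}.
    """
    noun_counts: dict[str, Counter] = {}
    for historical_text in samples:
        for label, segment in historical_text:  # A31: segments grouped by topic label
            noun_counts.setdefault(label, Counter()).update(extract_nouns(segment))  # A32
    model = {}
    for label, counts in noun_counts.items():
        total = sum(counts.values())
        # A33: rank nouns by occurrence probability and keep the top_k of them;
        # the probability itself serves as the significant coefficient (assumption).
        model[label] = {noun: n / total for noun, n in counts.most_common(top_k)}
    return model  # A34: the aggregated word sets form the classification model


samples = [
    [("fees and payment", "The buyer shall pay the fee and tax within the payment period."),
     ("intellectual property", "Patent and trademark rights remain with the seller.")],
    [("fees and payment", "Payment of the fee is due on the invoice date."),
     ("intellectual property", "The seller owns the patent.")],
]
model = train_classification_model(samples)
print(round(model["fees and payment"]["fee"], 3))  # 0.182
```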
In addition, in practical applications, the preset model may instead be a naive Bayes classification model, which is trained on the training sample set to obtain the classification model corresponding to the text type.
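For the naive Bayes alternative, a minimal multinomial naive Bayes with add-one (Laplace) smoothing is sketched below. It treats each topic label as a class and each training segment as a document. The tokenizer is a hypothetical stand-in for Chinese word segmentation, and the class and method names are illustrative only; the patent does not specify which naive Bayes variant is used.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical stand-in for word segmentation / noun extraction.
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesSegmentClassifier:
    """Multinomial naive Bayes over segment words, one class per topic label."""

    def fit(self, labelled_segments):
        """labelled_segments: iterable of (topic_label, segment_text) pairs."""
        self.label_counts = Counter()
        self.word_counts = {}
        for label, text in labelled_segments:
            self.label_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_segments = sum(self.label_counts.values())
        return self

    def log_posterior(self, label: str, words: list[str]) -> float:
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
        score = math.log(self.label_counts[label] / self.n_segments)  # log prior
        for w in words:
            if w in self.vocab:  # ignore words never seen in training
                score += math.log((counts[w] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        words = tokenize(text)
        return max(self.label_counts, key=lambda label: self.log_posterior(label, words))


clf = NaiveBayesSegmentClassifier().fit([
    ("fees and payment", "The buyer shall pay the fee and tax within the payment period."),
    ("fees and payment", "Payment of the fee is due on the invoice date."),
    ("intellectual property", "Patent and trademark rights remain with the seller."),
    ("intellectual property", "The seller owns the patent."),
])
print(clf.predict("The invoice states the fee and the payment date"))  # fees and payment
```

Working in log space avoids floating-point underflow when a segment contains many words.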
Further, step S102 includes:
Step B1: determining each title contained in the text to be audited, and splitting the text to be audited into a plurality of audit segments according to the determined titles, where each audit segment includes a title portion and a body portion;
Step B2: performing word segmentation on each audit segment and extracting its nouns;
Step B3: for each audit segment, determining the target significant words in each significant word set, a target significant word being a noun that appears both in the significant word set and in the audit segment; summing, for each significant word set, the significant coefficients of its target significant words; and adding to the audit segment the topic label of the significant word set with the largest sum.
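Steps B1 to B3 can be sketched as below, with the classification model represented as `{topic_label: {significant_noun: significant_coefficient}}`. Assumptions: titles are top-level numbered lines, `extract_nouns` is a hypothetical English stand-in for Chinese word segmentation and noun extraction, and a segment that shares no significant word with any set is given `None` instead of a label (the patent does not say how zero scores or ties are handled).

```python
import re

TITLE_LINE = re.compile(r"^[ \t]*\d+[.、](?!\d)[ \t]*(.+?)[ \t]*$", re.MULTILINE)
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "shall", "be"}


def extract_nouns(text: str) -> set[str]:
    # Hypothetical stand-in for word segmentation + noun extraction (step B2).
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def split_into_segments(text: str) -> list[tuple[str, str]]:
    """Step B1: cut the text at each title; each segment is (title, body)."""
    matches = list(TITLE_LINE.finditer(text))
    segments = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        segments.append((m.group(1), text[m.end():end].strip()))
    return segments


def label_segment(title: str, body: str, model: dict[str, dict[str, float]]):
    """Step B3: pick the label whose target significant words score highest."""
    nouns = extract_nouns(title + " " + body)
    scores = {
        # target significant words: in both the word set and the segment
        label: sum(coef for word, coef in words.items() if word in nouns)
        for label, words in model.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


model = {  # illustrative coefficients
    "fees and payment": {"fee": 0.2, "payment": 0.2, "tax": 0.1},
    "intellectual property": {"patent": 0.25, "trademark": 0.1},
}
contract = ("1. Fees and Payment\nThe buyer pays the fee and tax.\n"
            "2. Intellectual Property\nThe seller owns the patent.\n")
for title, body in split_into_segments(contract):
    print(title, "->", label_segment(title, body, model))
# Fees and Payment -> fees and payment
# Intellectual Property -> intellectual property
```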
Step S103: acquiring, according to the topic label of each audit segment, the audit rule corresponding to each topic label from the rule base corresponding to the text type.
Specifically, step S103 includes:
judging whether the topic labels of the text to be audited include all the necessary topic labels of the text type; if so, acquiring the audit rule corresponding to each topic label from the rule base corresponding to the text type according to the topic label of each audit segment; if not, sending information identifying the missing necessary topic labels to the user terminal.
In this embodiment, the completeness of the contract to be audited is checked first: whether the contract lacks necessary content is determined from the topic labels it contains, and a reminder is issued when any necessary topic label is missing.
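The completeness check of step S103, together with the necessary topic labels it relies on from step A2, might look like the following minimal sketch. The data shapes (historical texts as lists of `(topic_label, segment_text)` pairs, rule bases as a dict keyed by text type and then by topic label) and all names are assumptions for illustration.

```python
from functools import reduce


def necessary_topic_labels(samples) -> set[str]:
    """Step A2: topic labels that occur in every historical text of a text type."""
    label_sets = [{label for label, _ in text} for text in samples]
    return reduce(set.intersection, label_sets) if label_sets else set()


def fetch_audit_rules(text_type, segment_labels, necessary, rule_bases):
    """Step S103: check completeness, then look up one audit rule per topic label.

    Returns (rules, missing). When `missing` is non-empty, `rules` is None and the
    missing labels are what would be reported to the user terminal.
    """
    missing = sorted(set(necessary) - set(segment_labels))
    if missing:
        return None, missing
    rule_base = rule_bases[text_type]
    return {label: rule_base.get(label) for label in segment_labels}, []


samples = [
    [("fees and payment", "..."), ("intellectual property", "...")],
    [("fees and payment", "..."), ("termination", "...")],
]
necessary = necessary_topic_labels(samples)  # {"fees and payment"}
rule_bases = {"purchase": {"fees and payment": "fee rule", "termination": "termination rule"}}
print(fetch_audit_rules("purchase", ["termination"], necessary, rule_bases))
# (None, ['fees and payment'])
```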
In this embodiment, a rule base is set in advance for each contract type. The rule base contains the audit rules corresponding to the different topic labels, so every audit segment of the contract to be audited has a corresponding audit rule. Checking each audit segment for risks against its own audit rule makes the check targeted and improves the accuracy of contract auditing.
Specifically, the audit rule includes audit elements and audit sub-rules, each audit element corresponding to one audit sub-rule. An audit element is the smallest unit of text auditing, and an audit sub-rule is the judgment rule used to audit that element for risk.
For example, when the contract type is the purchase class and the topic label of the audit segment is fees and payment, the audit elements of the corresponding audit rule include the payment deadline, the billing period, the fee and the tax. For the audit element fee, the audit sub-rule is: judge whether the fee states both an amount and the unit of that amount; if it does not, flag it as a risk.
Step S104: judging, according to the audit rule, whether risk element content exists in the corresponding audit segment, and if so, sending the risk element content to the user terminal as a risk prompt.
Specifically, step S104 includes:
Step C1: extracting, from the audit segment, the element content corresponding to each audit element in the audit rule;
Step C2: for the element content of each audit element, judging whether the element content satisfies the audit sub-rule corresponding to that audit element, and if not, sending the element content to the user terminal as risk element content.
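Steps C1 and C2 can be sketched with the fee example given for the fees-and-payment rule, where the audit element fee must state an amount together with its unit. The patent does not specify how element content is extracted, so the sentence-based extractor, the accepted currency units and the `AuditElement` structure below are hypothetical; reporting a missing element as a risk is also an assumption.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AuditElement:
    name: str
    extract: Callable[[str], Optional[str]]  # step C1: element content from the segment
    sub_rule: Callable[[str], bool]          # step C2: True if the content passes


def sentence_containing(keyword: str) -> Callable[[str], Optional[str]]:
    """Hypothetical extractor: the first sentence that mentions the keyword."""
    def extract(segment: str) -> Optional[str]:
        for sentence in re.split(r"(?<=[.;])\s+", segment):
            if keyword in sentence.lower():
                return sentence
        return None
    return extract


def has_amount_and_unit(content: str) -> bool:
    """Sub-rule for 'fee': an amount followed by a (hypothetical) currency unit."""
    return re.search(r"\d[\d,]*(?:\.\d+)?\s*(?:yuan|rmb|usd)\b", content.lower()) is not None


def audit_segment(segment: str, elements: list[AuditElement]) -> list[tuple[str, Optional[str]]]:
    """Return (element name, risk element content) for every failed sub-rule."""
    risks = []
    for element in elements:
        content = element.extract(segment)
        # Assumption: an element that cannot be found is reported as a risk too.
        if content is None or not element.sub_rule(content):
            risks.append((element.name, content))
    return risks


fee_rule = [AuditElement("fee", sentence_containing("fee"), has_amount_and_unit)]
print(audit_segment("The fee is 5,000 yuan. Payment is due in 30 days.", fee_rule))  # []
print(audit_segment("The fee shall be agreed later.", fee_rule))
# [('fee', 'The fee shall be agreed later.')]
```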
Further, judging whether the element content satisfies the audit sub-rule corresponding to the audit element includes:
judging whether the element content contains preset keywords; or
judging whether the element content is consistent with preset content; or
judging whether the uppercase (written-out) and lowercase (numeric) forms of the currency or amount contained in the element content are consistent.
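The three kinds of sub-rule check listed above can be sketched as follows. The third check most likely refers to the Chinese contract convention of writing an amount both in uppercase financial numerals (for example 壹万贰仟叁佰肆拾伍元陆角柒分) and in lowercase Arabic figures (12,345.67). The parser below is a simplified illustration: it handles the common numerals and the 元/角/分 units but not every formatting variant, it compares only the first uppercase amount and the first figure found in the content, and all function names are hypothetical.

```python
import re
from decimal import Decimal

DIGITS = {"零": 0, "壹": 1, "贰": 2, "叁": 3, "肆": 4,
          "伍": 5, "陆": 6, "柒": 7, "捌": 8, "玖": 9}
SMALL_UNITS = {"拾": 10, "佰": 100, "仟": 1000}


def contains_keywords(content: str, keywords: list[str]) -> bool:
    """Check 1: every preset keyword occurs in the element content (case-insensitive)."""
    return all(k.lower() in content.lower() for k in keywords)


def matches_preset(content: str, preset: str) -> bool:
    """Check 2: the element content equals the preset text, ignoring extra whitespace."""
    return content.split() == preset.split()


def _parse_integer_words(words: str) -> int:
    """Parse uppercase numerals such as 壹万贰仟叁佰肆拾伍 (no 元/角/分)."""
    for big_unit, factor in (("亿", 100_000_000), ("万", 10_000)):
        if big_unit in words:
            high, low = words.split(big_unit, 1)
            return _parse_integer_words(high) * factor + _parse_integer_words(low)
    value, digit = 0, 0
    for ch in words:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            value += (digit or 1) * SMALL_UNITS[ch]  # a bare 拾 means 10
            digit = 0
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return value + digit


def parse_amount_in_words(words: str) -> Decimal:
    """Parse an uppercase amount such as 人民币壹万贰仟叁佰肆拾伍元陆角柒分."""
    s = words.replace("人民币", "").rstrip("整正")
    integer_words, fraction = s.split("元", 1) if "元" in s else ("", s)
    cents, digit = 0, 0
    for ch in fraction:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch == "角":
            cents, digit = cents + 10 * digit, 0
        elif ch == "分":
            cents, digit = cents + digit, 0
    return Decimal(_parse_integer_words(integer_words)) + Decimal(cents) / 100


UPPERCASE_AMOUNT = re.compile(r"[零壹贰叁肆伍陆柒捌玖拾][零壹贰叁肆伍陆柒捌玖拾佰仟万亿元角分整]*")
FIGURE_AMOUNT = re.compile(r"\d[\d,]*(?:\.\d+)?")


def amounts_consistent(content: str) -> bool:
    """Check 3: the uppercase amount and the figure amount denote the same value."""
    words, figures = UPPERCASE_AMOUNT.search(content), FIGURE_AMOUNT.search(content)
    if not (words and figures):
        return False
    return parse_amount_in_words(words.group()) == Decimal(figures.group().replace(",", ""))


print(amounts_consistent("¥12,345.67元（大写：人民币壹万贰仟叁佰肆拾伍元陆角柒分）"))  # True
print(amounts_consistent("¥12,345.67（大写：人民币壹万贰仟叁佰肆拾元整）"))  # False
```

`Decimal` is used instead of `float` so that amounts such as 12,345.67 compare exactly.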
In this embodiment, the contract to be audited is divided into a plurality of audit segments that can be audited in parallel, which improves the efficiency of contract auditing; in addition, each audit segment has its own audit rule, so the contract can be audited in a targeted manner with higher accuracy.
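Because each audit segment is checked against its own rule and shares no state with the other segments, the per-segment audits can be dispatched concurrently. A minimal sketch using the standard library's `concurrent.futures`; `audit_one` is a hypothetical per-segment audit function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def audit_all_segments(segments: list[str], audit_one: Callable[[str], T],
                       max_workers: int = 4) -> list[T]:
    """Run audit_one on every segment concurrently; results keep segment order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(audit_one, segments))


segments = ["The fee is 5,000 yuan.", "The fee shall be agreed later."]
print(audit_all_segments(segments, lambda s: any(c.isdigit() for c in s)))  # [True, False]
```

Threads suit audits that wait on I/O, such as calls to a remote rule service. For CPU-bound checks in CPython, `ProcessPoolExecutor` sidesteps the global interpreter lock, but it requires a picklable, module-level audit function rather than a lambda.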
Further, after step S104, the method further includes:
Step D1: receiving audit result information sent by the user terminal, and judging from it whether the determined risk element content is correct; if so, incrementing the accuracy value of the audit rule corresponding to the risk element content by one; if not, decrementing that accuracy value by one.
In this embodiment, each audit rule is assigned an accuracy value, and all audit rules start with the same initial accuracy value. After the risk element content is sent to the user terminal, the user checks it manually based on their own professional knowledge and feeds back the audit result information, from which the accuracy value of each audit rule is adjusted.
Step D2: sending any audit rule whose accuracy value is smaller than a preset threshold to the user terminal so that the user can modify the audit rule.
In this embodiment, the audit result information is used to keep revising the audit rules, so that they are continuously improved.
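Steps D1 and D2 amount to a per-rule counter. A minimal sketch follows; the initial accuracy value of 0, the default threshold, the rule identifiers and the class name are hypothetical choices, since the patent only requires that all rules start from the same value and that rules below a preset threshold are sent back for revision.

```python
from dataclasses import dataclass, field


@dataclass
class RuleAccuracyTracker:
    initial_value: int = 0   # same starting value for every audit rule
    threshold: int = -3      # hypothetical preset threshold
    values: dict[str, int] = field(default_factory=dict)

    def record_feedback(self, rule_id: str, flagged_correctly: bool) -> None:
        """Step D1: +1 if the user confirms the flagged risk, -1 if the user rejects it."""
        current = self.values.get(rule_id, self.initial_value)
        self.values[rule_id] = current + (1 if flagged_correctly else -1)

    def rules_to_revise(self) -> list[str]:
        """Step D2: audit rules whose accuracy value fell below the threshold."""
        return sorted(r for r, v in self.values.items() if v < self.threshold)


tracker = RuleAccuracyTracker(threshold=-1)
for confirmed in (False, False):
    tracker.record_feedback("purchase/fees and payment/fee", confirmed)
tracker.record_feedback("purchase/fees and payment/tax", True)
print(tracker.rules_to_revise())  # ['purchase/fees and payment/fee']
```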
Example two
An embodiment of the present invention provides a text auditing apparatus, which specifically includes, as shown in fig. 2:
the receiving module 201 is configured to receive a text to be audited sent by a user terminal, and match the text structure of the text to be audited against the text templates of a plurality of text types to determine its text type;
the splitting module 202 is configured to acquire the classification model corresponding to the text type from a preset classification model library, split the text to be audited into a plurality of audit segments by using the classification model, and add a corresponding topic label to each audit segment;
the obtaining module 203 is configured to acquire, according to the topic label of each audit segment, the audit rule corresponding to each topic label from the rule base corresponding to the text type;
the judging module 204 is configured to judge, according to the audit rule, whether risk element content exists in the corresponding audit segment, and if so, send the risk element content to the user terminal as a risk prompt.
Specifically, the apparatus further comprises:
the training module is configured to, for each text type and before the classification model corresponding to the text type is acquired from the preset classification model library and used to split the text to be audited into audit segments with topic labels: acquire a training sample set corresponding to the text type, where the training sample set includes a set number of historical texts, the segment information of each historical text and the topic label of each segment; determine, from the topic labels contained in each historical text in the training sample set, the topic labels that appear in all of the historical texts as the necessary topic labels of the text type; and train a preset model on the training sample set to obtain the classification model corresponding to the text type.
Further, when training the preset model on the training sample set to obtain the classification model corresponding to the text type, the training module specifically:
for each topic label in the training sample set, acquires the segment corresponding to that topic label in each historical text; performs word segmentation on each acquired segment and extracts its nouns; selects, from the nouns of all these segments, a set number of significant nouns that represent the topic label and calculates a significant coefficient for each, so as to form the significant word set corresponding to the topic label; and aggregates the significant word sets of all topic labels in the training sample set to serve as the classification model corresponding to the text type.
In addition, the splitting module 202 is specifically configured to:
determine each title contained in the text to be audited, and split the text to be audited into a plurality of audit segments according to the determined titles, where each audit segment includes a title portion and a body portion; perform word segmentation on each audit segment and extract its nouns; for each audit segment, determine the target significant words in each significant word set, a target significant word being a noun that appears both in the significant word set and in the audit segment; sum, for each significant word set, the significant coefficients of its target significant words; and add to the audit segment the topic label of the significant word set with the largest sum.
The obtaining module 203 is specifically configured to:
judge whether the topic labels of the text to be audited include all the necessary topic labels of the text type; if so, acquire the audit rule corresponding to each topic label from the rule base corresponding to the text type according to the topic label of each audit segment; if not, send information identifying the missing necessary topic labels to the user terminal.
Further, the audit rule includes audit elements and audit sub-rules, each audit element corresponding to one audit sub-rule;
in addition, the judging module 204 is specifically configured to:
extract, from the audit segment, the element content corresponding to each audit element in the audit rule; and, for the element content of each audit element, judge whether the element content satisfies the audit sub-rule corresponding to that audit element, and if not, send the element content to the user terminal as risk element content.
Still further, the apparatus further comprises:
the correction module is configured to, after the judging module has judged according to the audit rule whether risk element content exists in the corresponding audit segment and has sent any such content to the user terminal as a risk prompt, receive audit result information sent by the user terminal and judge from it whether the determined risk element content is correct; if so, increment the accuracy value of the audit rule corresponding to the risk element content by one; if not, decrement that accuracy value by one; and send any audit rule whose accuracy value is smaller than the preset threshold to the user terminal so that the user can modify the audit rule.
EXAMPLE III
This embodiment also provides a computer device capable of executing programs, such as a smartphone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, or a tower server (including a standalone server or a server cluster composed of multiple servers). As shown in FIG. 3, the computer device 30 of this embodiment includes at least, but is not limited to, a memory 301 and a processor 302 communicatively coupled to each other via a system bus. It should be noted that FIG. 3 shows only the computer device 30 with components 301 and 302; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In this embodiment, the memory 301 (i.e., the readable storage medium) includes flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 301 may be an internal storage unit of the computer device 30, such as its hard disk or internal memory. In other embodiments, the memory 301 may be an external storage device of the computer device 30, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 30. The memory 301 may also include both an internal storage unit and an external storage device of the computer device 30. In this embodiment, the memory 301 is generally used to store the operating system and the various application software installed on the computer device 30, such as the program code of the text auditing apparatus of the second embodiment. The memory 301 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 302 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 302 is generally used to control the overall operation of the computer device 30.
Specifically, in this embodiment, the processor 302 is configured to execute a text auditing program stored in the memory 301, and the program, when executed, implements the following steps:
receiving a text to be audited sent by a user terminal, and matching the text to be audited against text templates of a plurality of text types to determine the text type of the text to be audited;
acquiring a classification model corresponding to the text type from a preset classification model library, splitting the text to be audited into a plurality of audit fragments using the classification model, and adding a corresponding topic label to each audit fragment;
acquiring, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from the rule base corresponding to the text type; and
judging, according to the audit rules, whether risk element content exists in the corresponding audit fragment, and if so, sending the risk element content to the user terminal for a risk prompt.
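The first step, matching the text against templates to determine its type, is not specified further. As one possible reading, the sketch below picks the template with the highest Jaccard similarity between word sets; the similarity measure, the `determine_text_type` name and the template strings are all assumptions, and Chinese text would need a proper word segmenter in place of the `\w+` tokenizer.

```python
import re
from typing import Dict, Set


def tokens(text: str) -> Set[str]:
    # Lower-cased word tokens; Chinese text would need a segmenter instead of \w+ runs.
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def determine_text_type(text: str, templates: Dict[str, str]) -> str:
    """Return the text type whose template shares the most vocabulary with `text`."""
    doc = tokens(text)
    return max(templates, key=lambda text_type: jaccard(doc, tokens(templates[text_type])))
```

`max` raises `ValueError` on an empty template set and returns the first type on ties; a production matcher would likely also need a minimum-similarity cutoff for texts that fit no template.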
For the specific implementation of the above method steps, refer to the first embodiment; the details are not repeated here.
EXAMPLE IV
This embodiment also provides a computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, a server, or an app store, on which a computer program is stored that, when executed by a processor, implements the following method steps:
receiving a text to be audited sent by a user terminal, and matching the text to be audited against text templates of a plurality of text types to determine the text type of the text to be audited;
acquiring a classification model corresponding to the text type from a preset classification model library, splitting the text to be audited into a plurality of audit fragments using the classification model, and adding a corresponding topic label to each audit fragment;
acquiring, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from the rule base corresponding to the text type; and
judging, according to the audit rules, whether risk element content exists in the corresponding audit fragment, and if so, sending the risk element content to the user terminal for a risk prompt.
For the specific implementation of the above method steps, refer to the first embodiment; the details are not repeated here.
It should be noted that, in this document, the terms "comprise", "include", and any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope. Any equivalent structural or process modification made using the contents of this specification and the accompanying drawings, and any direct or indirect application thereof in other related technical fields, likewise falls within the scope of protection of the present invention.

Claims (10)

1. A text auditing method, characterized in that the method comprises:
receiving a text to be audited sent by a user terminal, and matching the text to be audited against text templates of a plurality of text types to determine the text type of the text to be audited;
acquiring a classification model corresponding to the text type from a preset classification model library, splitting the text to be audited into a plurality of audit fragments using the classification model, and adding a corresponding topic label to each audit fragment;
acquiring, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from the rule base corresponding to the text type; and
judging, according to the audit rules, whether risk element content exists in the corresponding audit fragment, and if so, sending the risk element content to the user terminal for a risk prompt.
2. The text auditing method according to claim 1, wherein, before acquiring a classification model corresponding to the text type from a preset classification model library, splitting the text to be audited into a plurality of audit fragments using the classification model, and adding a corresponding topic label to each audit fragment, the method further comprises:
for each text type, acquiring a training sample set corresponding to the text type, wherein the training sample set comprises a set number of historical texts, the fragment information of each historical text, and the topic label of each fragment;
determining, according to the topic labels contained in each historical text in the training sample set, the topic labels contained in every historical text as the necessary topic labels of the text type; and
training a preset model on the training sample set to obtain the classification model corresponding to the text type.
3. The text auditing method according to claim 2, wherein training a preset model on the training sample set to obtain the classification model corresponding to the text type specifically comprises:
for each topic label in the training sample set, acquiring the fragment corresponding to that topic label in each historical text; performing word segmentation on each acquired fragment and extracting the nouns of each fragment; determining, from the nouns of all these fragments, a set number of significant nouns that represent the topic label, and calculating a significant coefficient for each significant noun, to form the significant word set corresponding to the topic label; and
aggregating the significant word sets of all the topic labels in the training sample set to serve as the classification model corresponding to the text type.
4. The text auditing method according to claim 3, wherein splitting the text to be audited into a plurality of audit fragments using the classification model and adding a corresponding topic label to each audit fragment specifically comprises:
determining each title contained in the text to be audited, and splitting the text to be audited into a plurality of audit fragments according to the determined titles, wherein each audit fragment comprises a title portion and a body portion;
performing word segmentation on each audit fragment and extracting the nouns of each audit fragment; and
for each audit fragment, determining the target significant words in each significant word set, the target significant words being the nouns that appear both in that significant word set and in the audit fragment; calculating, for each significant word set, the sum of the significant coefficients of its target significant words; and adding to the audit fragment the topic label corresponding to the significant word set with the largest sum of significant coefficients.
5. The text auditing method according to claim 2, wherein acquiring, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from the rule base corresponding to the text type comprises:
judging whether the topic labels of the text to be audited include all the necessary topic labels of the text type; if so, acquiring, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from the rule base corresponding to the text type; if not, sending information identifying the missing necessary topic labels to the user terminal.
6. The text auditing method according to claim 1, wherein the audit rule comprises audit elements and audit sub-rules, each audit element corresponding to one audit sub-rule; and
judging, according to the audit rules, whether risk element content exists in the corresponding audit fragment, and if so, sending the risk element content to the user terminal for a risk prompt, specifically comprises:
extracting from the audit fragment, according to each audit element in the audit rule, the element content corresponding to that audit element; and
for the element content of each audit element, judging whether the element content satisfies the audit sub-rule corresponding to that audit element, and if not, sending the element content to the user terminal as risk element content.
7. The text auditing method according to claim 1, wherein, after judging, according to the audit rules, whether risk element content exists in the corresponding audit fragment and, if so, sending the risk element content to the user terminal for a risk prompt, the method further comprises:
receiving audit result information sent by the user terminal, and judging, according to the audit result information, whether the determined risk element content is correct; if so, incrementing by one the accuracy value of the audit rule corresponding to the risk element content; if not, decrementing that accuracy value by one; and
sending each audit rule whose accuracy value is less than a preset threshold to the user terminal, so that the user terminal can modify the audit rule.
8. A text auditing apparatus, characterized in that the apparatus comprises:
a receiving module, configured to receive a text to be audited sent by a user terminal and match the text to be audited against text templates of a plurality of text types to determine the text type of the text to be audited;
a splitting module, configured to acquire a classification model corresponding to the text type from a preset classification model library, split the text to be audited into a plurality of audit fragments using the classification model, and add a corresponding topic label to each audit fragment;
an acquisition module, configured to acquire, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from the rule base corresponding to the text type; and
a judging module, configured to judge, according to the audit rules, whether risk element content exists in the corresponding audit fragment, and if so, send the risk element content to the user terminal for a risk prompt.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
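Claim 3 does not say how the significant nouns are chosen or how their significant coefficients are computed. The sketch below substitutes a TF-IDF-style score in which each topic label plays the role of a document; this scoring is an assumption, not the patent's formula. Word segmentation is assumed to be done already (each training fragment is a list of nouns), `top_k` stands in for the "set number" of significant nouns, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def build_classification_model(
    fragments_by_label: Dict[str, List[List[str]]], top_k: int = 20
) -> Dict[str, Dict[str, float]]:
    """Map each topic label to {significant noun: significant coefficient}."""
    # Noun frequencies across all training fragments carrying each topic label.
    tf = {label: Counter(noun for frag in frags for noun in frag)
          for label, frags in fragments_by_label.items()}
    n_labels = len(tf)
    # Number of topic labels whose fragments contain each noun.
    df = Counter(noun for counts in tf.values() for noun in counts)
    model: Dict[str, Dict[str, float]] = {}
    for label, counts in tf.items():
        total = sum(counts.values()) or 1
        scores = {
            noun: (c / total) * (math.log((1 + n_labels) / (1 + df[noun])) + 1)
            for noun, c in counts.items()
        }
        # Keep the top_k nouns; ties are broken alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        model[label] = dict(ranked)
    return model
```

The smoothed inverse term down-weights nouns shared by many topic labels (such as "amount" in both payment and liability clauses), which is what lets a significant word set discriminate between labels.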

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010116229.7A CN111274782B (en) 2020-02-25 2020-02-25 Text auditing method and device, computer equipment and readable storage medium
PCT/CN2020/111641 WO2021169208A1 (en) 2020-02-25 2020-08-27 Text review method and apparatus, and computer device, and readable storage medium

Publications (2)

Publication Number Publication Date
CN111274782A 2020-06-12
CN111274782B 2023-10-20

Family

ID=71000343

Country Status (2)

Country Link
CN (1) CN111274782B (en)
WO (1) WO2021169208A1 (en)

Also Published As

Publication number Publication date
WO2021169208A1 (en) 2021-09-02
CN111274782B (en) 2023-10-20

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40030949; country of ref document: HK)
SE01 Entry into force of request for substantive examination
GR01 Patent grant