CN111274782B - Text auditing method and device, computer equipment and readable storage medium - Google Patents


Info

Publication number
CN111274782B
Authority
CN
China
Prior art keywords
text
audit
auditing
salient
checked
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010116229.7A
Other languages
Chinese (zh)
Other versions
CN111274782A (en)
Inventor
张晶莹 (Zhang Jingying)
罗先贤 (Luo Xianxian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010116229.7A
Publication of CN111274782A
Priority to PCT/CN2020/111641 (WO2021169208A1)
Application granted
Publication of CN111274782B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a text auditing method comprising the following steps: receiving a text to be audited sent by a user terminal, and matching the text structure of the text to be audited against text templates of a plurality of text types to determine its text type; obtaining a classification model corresponding to the text type from a preset classification model library, using the classification model to split the text to be audited into a plurality of audit fragments, and adding a corresponding topic label to each audit fragment; obtaining, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from a rule base corresponding to the text type; and judging, according to the audit rules, whether risk element content exists in the corresponding audit fragments, and if so, sending the risk element content to the user terminal as a risk prompt. The invention can improve the accuracy and speed of text auditing.

Description

Text auditing method and device, computer equipment and readable storage medium
Technical Field
The invention relates to the field of Internet technology, and in particular to a text auditing method, a text auditing device, computer equipment and a readable storage medium.
Background
With the continuous development of Internet technology, more and more information is transmitted over the Internet, and text is one of its most important carriers. Because texts may contain sensitive or harmful information, auditors must manually review texts for risky content to prevent sensitive information from leaking and harmful information from spreading. However, each text is long, its content is complex, and its wording varies, so manual review is labor-intensive and inefficient, and its accuracy cannot be guaranteed. How to improve the efficiency and accuracy of text auditing has therefore become a technical problem to be solved.
Disclosure of Invention
The invention aims to provide a text auditing method, a text auditing device, computer equipment and a readable storage medium, which can improve the accuracy and speed of auditing texts.
According to one aspect of the invention, there is provided a text auditing method, comprising the steps of:
receiving a text to be audited sent by a user terminal, and matching the text structure of the text to be audited against text templates of a plurality of text types to determine the text type of the text to be audited;
obtaining a classification model corresponding to the text type from a preset classification model library, using the classification model to split the text to be audited into a plurality of audit fragments, and adding a corresponding topic label to each audit fragment;
obtaining, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from a rule base corresponding to the text type;
judging, according to the audit rules, whether risk element content exists in the corresponding audit fragments, and if so, sending the risk element content to the user terminal as a risk prompt.
Optionally, before obtaining the classification model corresponding to the text type from the preset classification model library, splitting the text to be audited into a plurality of audit fragments with the classification model, and adding a corresponding topic label to each audit fragment, the method further includes:
for a given text type, acquiring a training sample set corresponding to that text type, wherein the training sample set comprises a set number of historical texts, the fragment information of each historical text, and the topic label of each fragment;
determining, according to the topic labels contained in each historical text of the training sample set, the topic labels that are contained in all of the historical texts as the necessary topic labels of the text type;
training a preset model on the training sample set to obtain the classification model corresponding to the text type.
Optionally, training the preset model on the training sample set to obtain the classification model corresponding to the text type specifically includes:
for each topic label in the training sample set, obtaining the fragments corresponding to that topic label in each historical text; performing word segmentation on each obtained fragment and extracting its nouns; determining, from the nouns of all these fragments, a set number of salient nouns that represent the topic label, and calculating a salience coefficient for each salient noun, so as to form the salient word set corresponding to the topic label;
and aggregating the salient word sets of all topic labels in the training sample set as the classification model corresponding to the text type.
Optionally, using the classification model to split the text to be audited into a plurality of audit fragments and adding a corresponding topic label to each audit fragment specifically includes:
determining each title contained in the text to be audited, and splitting the text into a plurality of audit fragments according to the determined titles, wherein each audit fragment includes a title portion and a body portion;
performing word segmentation on each audit fragment and extracting its nouns;
for each audit fragment, determining the target salient words in each salient word set, a target salient word being a noun that appears both in the salient word set and in the audit fragment; summing, for each salient word set, the salience coefficients of its target salient words; and adding to the audit fragment the topic label of the salient word set with the largest sum.
Optionally, obtaining, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from the rule base corresponding to the text type specifically includes:
judging whether the topic labels of the text to be audited include all of the necessary topic labels of the text type; if so, obtaining the audit rule corresponding to each topic label from the rule base corresponding to the text type according to the topic label of each audit fragment; if not, sending information identifying the missing necessary topic labels to the user terminal.
Optionally, the audit rule includes audit elements and audit sub-rules, each audit element corresponding to one audit sub-rule;
judging, according to the audit rule, whether risk element content exists in the corresponding audit fragment, and if so, sending the risk element content to the user terminal as a risk prompt, specifically includes:
extracting, from the audit fragment, the element content corresponding to each audit element in the audit rule;
for the element content of each audit element, judging whether the element content satisfies the audit sub-rule corresponding to that audit element; if not, sending the element content to the user terminal as risk element content.
Optionally, after judging, according to the audit rule, whether risk element content exists in the corresponding audit fragment and, if so, sending the risk element content to the user terminal as a risk prompt, the method further includes:
receiving audit result information sent by the user terminal, and judging, according to the audit result information, whether the determined risk element content is correct; if so, adding one to the accuracy value of the audit rule corresponding to the risk element content; if not, subtracting one from that accuracy value;
and sending each audit rule whose accuracy value is smaller than a preset threshold to the user terminal, so that the user terminal can modify the audit rule.
According to another aspect of the present invention, there is also provided a text auditing apparatus, which specifically includes the following components:
a receiving module, configured to receive a text to be audited sent by a user terminal and to match the text structure of the text to be audited against text templates of a plurality of text types, so as to determine the text type of the text to be audited;
a splitting module, configured to obtain a classification model corresponding to the text type from a preset classification model library, use the classification model to split the text to be audited into a plurality of audit fragments, and add a corresponding topic label to each audit fragment;
an obtaining module, configured to obtain, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from the rule base corresponding to the text type;
and a judging module, configured to judge, according to the audit rule, whether risk element content exists in the corresponding audit fragment, and if so, to send the risk element content to the user terminal as a risk prompt.
According to another aspect of the present invention, there is also provided a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the text auditing method described above.
According to another aspect of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the text auditing method described above.
In the text auditing method, apparatus, computer device, and readable storage medium provided by the invention, the text to be audited is split into a plurality of audit fragments, and a corresponding audit rule is set for each audit fragment; each audit fragment is then audited with its own rule, so that risk checks are targeted and the accuracy of text auditing is improved. In addition, the audit fragments of a text can be audited in parallel, which improves auditing efficiency.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic flow chart of an optional text auditing method according to the first embodiment;
FIG. 2 is a schematic diagram of optional program modules of the text auditing apparatus according to the second embodiment;
FIG. 3 is a schematic diagram of an optional hardware architecture of a computer device according to the third embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
The embodiment of the invention provides a text auditing method which, as shown in FIG. 1, specifically comprises the following steps:
Step S101: receiving a text to be audited sent by a user terminal, and matching the text structure of the text to be audited against text templates of a plurality of text types to determine the text type of the text to be audited.
Preferably, the text in this embodiment is a contract. Because a contract affects the interests of companies or individuals, its content must be reviewed in real business scenarios to safeguard the rights and obligations of both parties. Accordingly, in step S101, when a contract to be audited is received, its contract type is determined by analyzing its contract structure.
In this embodiment, each contract to be audited is generated from one of several types of contract template, and each type of template has a corresponding contract structure. By analyzing the structure of the contract to be audited, the type of template it was generated from, and therefore its contract type, can be determined.
Specifically, the contract types include purchase contracts, sales contracts, cooperation-intent contracts, and confidentiality contracts.
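The patent does not say how structural matching is scored. The following minimal sketch assumes that each template is represented only by its list of section titles and that the contract type is the template whose titles are most completely present in the contract; `match_text_type`, `TEMPLATES`, and the title strings are hypothetical illustrations, not the patent's implementation.

```python
from typing import Dict, List, Optional


def match_text_type(
    contract_titles: List[str],
    templates: Dict[str, List[str]],
) -> Optional[str]:
    """Return the text type whose template titles best match the contract.

    Assumed score: the fraction of a template's titles that also appear in
    the contract. Returns None if no template title matches at all.
    """
    present = {t.strip() for t in contract_titles}
    best_type, best_score = None, 0.0
    for text_type, template_titles in templates.items():
        if not template_titles:
            continue
        hits = sum(1 for t in template_titles if t in present)
        score = hits / len(template_titles)
        if score > best_score:
            best_type, best_score = text_type, score
    return best_type


# Hypothetical templates: titles invented for illustration only.
TEMPLATES = {
    "purchase": ["Fees and Payment", "Liability for Breach", "Intellectual Property"],
    "confidentiality": ["Confidential Information", "Confidentiality Obligations", "Term"],
}

if __name__ == "__main__":
    titles = ["Fees and Payment", "Liability for Breach", "Term"]
    print(match_text_type(titles, TEMPLATES))  # purchase
```

Here the purchase template scores 2/3 and the confidentiality template 1/3. A production system would likely also compare title order and fuzzy title variants, which this sketch ignores.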
Step S102: obtaining a classification model corresponding to the text type from a preset classification model library, using the classification model to split the text to be audited into a plurality of audit fragments, and adding a corresponding topic label to each audit fragment.
Specifically, before step S102, the method further includes:
Step A1: for a given text type, acquiring a training sample set corresponding to the text type, wherein the training sample set comprises a set number of historical texts, the fragment information of each historical text, and the topic label of each fragment;
A contract typically includes several parts, each with its own title and body. When auditing contracts manually, auditors review a contract part by part to confirm whether each part complies with the corresponding legal provisions. Following this habit, each historical contract in the training sample set is divided into fragments by title and body, and each fragment is given a topic label according to its content.
For example, a purchase contract is divided into the following fragments: rights and obligations of both parties, fees and payment, liability for breach and limitation of liability, third-party rights warranties, independence and severability, agreement modification and termination, contractual and product/service standards, intellectual property rights, contract effectiveness and term, and best-at treatments.
Step A2: determining, according to the topic labels contained in each historical text of the training sample set, the topic labels that are contained in all of the historical texts as the necessary topic labels of the text type.
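Step A2 amounts to intersecting the topic-label sets of all historical texts of one type. A minimal sketch, with the hypothetical function name `necessary_topic_labels` and illustrative labels:

```python
from typing import Iterable, List, Set


def necessary_topic_labels(label_sets: List[Iterable[str]]) -> Set[str]:
    """Return the topic labels that occur in every historical text.

    Each element of label_sets holds the topic labels annotated on the
    fragments of one historical text of the same text type.
    """
    if not label_sets:
        return set()
    # Intersect the label sets of all historical texts.
    return set.intersection(*(set(labels) for labels in label_sets))


if __name__ == "__main__":
    history = [
        ["fees and payment", "liability for breach", "intellectual property"],
        ["fees and payment", "liability for breach"],
        ["fees and payment", "liability for breach", "contract term"],
    ]
    print(sorted(necessary_topic_labels(history)))
    # ['fees and payment', 'liability for breach']
```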
Step A3: training the preset model on the training sample set to obtain the classification model corresponding to the text type.
Further, training the preset model on the training sample set to obtain the classification model corresponding to the text type specifically includes:
Step A31: for one topic label in the training sample set, obtaining the fragments corresponding to that topic label in each historical text;
Step A32: performing word segmentation on each obtained fragment and extracting its nouns;
Step A33: determining, from the nouns of all these fragments, a set number of salient nouns that represent the topic label, and calculating a salience coefficient for each salient noun, so as to form the salient word set corresponding to the topic label;
Step A34: aggregating the salient word sets of all topic labels in the training sample set as the classification model corresponding to the text type.
It should be noted that each salient noun in a salient word set has a corresponding salience coefficient; the larger a salient noun's salience coefficient, the better that noun represents the corresponding topic label.
Preferably, in practical applications, in step A33 the nouns are sorted in descending order of their occurrence probability in the fragments, the top preset number of nouns are taken as the salient nouns, and the salience coefficient of each salient noun is calculated from its occurrence probability.
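A minimal sketch of steps A31 to A34 under two assumptions: nouns have already been extracted by a word segmenter, and the salience coefficient is taken to be the noun's occurrence probability among all nouns of the label's fragments (the patent only says the coefficient is calculated from that probability). `train_salient_word_sets` and `top_n` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List


def train_salient_word_sets(
    fragments_by_label: Dict[str, List[List[str]]],
    top_n: int = 5,
) -> Dict[str, Dict[str, float]]:
    """Build {topic_label: {salient_noun: salience_coefficient}}.

    fragments_by_label maps each topic label to the noun lists of its
    fragments across all historical texts (segmentation assumed done).
    """
    model: Dict[str, Dict[str, float]] = {}
    for label, fragments in fragments_by_label.items():
        counts = Counter(noun for nouns in fragments for noun in nouns)
        total = sum(counts.values())
        if total == 0:
            model[label] = {}
            continue
        # most_common() sorts nouns by descending count, i.e. descending
        # occurrence probability; keep the top_n as salient nouns.
        # Assumed coefficient: the occurrence probability itself.
        model[label] = {noun: c / total for noun, c in counts.most_common(top_n)}
    return model


if __name__ == "__main__":
    data = {"fees and payment": [["payment", "invoice", "payment"], ["payment", "tax"]]}
    print(train_salient_word_sets(data, top_n=2))
```

The aggregated dictionary is the "classification model" of step A34 in this sketch; other coefficient choices (for example, TF-IDF weights) would fit the same structure.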
In addition, in practical applications, the preset model may also be a naive Bayes classification model, which is trained on the training sample set to obtain the classification model corresponding to the text type.
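For the naive Bayes alternative, a minimal multinomial naive Bayes with add-one (Laplace) smoothing over fragment nouns could look like this. The patent does not specify the naive Bayes variant or smoothing; the class name and training pairs are illustrative.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


class NaiveBayesTopicClassifier:
    """Multinomial naive Bayes over fragment nouns, with add-one smoothing."""

    def fit(self, samples: List[Tuple[List[str], str]]) -> "NaiveBayesTopicClassifier":
        # samples: (nouns of one training fragment, its topic label)
        self.label_counts = Counter(label for _, label in samples)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for nouns, label in samples:
            self.word_counts[label].update(nouns)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        self.n_samples = len(samples)
        return self

    def predict(self, nouns: List[str]) -> Optional[str]:
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / self.n_samples)  # log prior P(label)
            denom = self.total_words[label] + vocab_size
            for w in nouns:
                # log P(w | label) with Laplace (add-one) smoothing
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    clf = NaiveBayesTopicClassifier().fit([
        (["payment", "tax"], "fees"),
        (["payment", "invoice"], "fees"),
        (["breach", "damages"], "liability"),
        (["damages", "penalty"], "liability"),
    ])
    print(clf.predict(["payment"]))             # fees
    print(clf.predict(["damages", "breach"]))   # liability
```

Working in log space avoids underflow when a fragment contains many nouns.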
Further, step S102 includes:
Step B1: determining each title contained in the text to be audited, and splitting the text into a plurality of audit fragments according to the determined titles, wherein each audit fragment includes a title portion and a body portion;
Step B2: performing word segmentation on each audit fragment and extracting its nouns;
Step B3: for each audit fragment, determining the target salient words in each salient word set, a target salient word being a noun that appears both in the salient word set and in the audit fragment; summing, for each salient word set, the salience coefficients of its target salient words; and adding to the audit fragment the topic label of the salient word set with the largest sum.
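Steps B1 and B3 can be sketched as follows, assuming that titles are lines of the form `3. Fees and Payment` (real contracts may use other heading styles) and that the classification model is a mapping from each topic label to its salient nouns and salience coefficients. Noun extraction (step B2) is left to a word segmenter. `split_by_titles`, `assign_topic_label`, and `TITLE_RE` are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed title format: a numbered line such as "3. Fees and Payment".
TITLE_RE = re.compile(r"^\s*\d+\.\s+\S")


def split_by_titles(text: str) -> List[Tuple[str, str]]:
    """Step B1: split text into (title, body) audit fragments."""
    fragments: List[Tuple[str, List[str]]] = []
    for line in text.splitlines():
        if TITLE_RE.match(line):
            fragments.append((line.strip(), []))
        elif fragments:
            fragments[-1][1].append(line)
        # Lines before the first title (e.g. a preamble) are ignored here.
    return [(title, "\n".join(body).strip()) for title, body in fragments]


def assign_topic_label(
    nouns: List[str], model: Dict[str, Dict[str, float]]
) -> Optional[str]:
    """Step B3: pick the label whose salient word set has the largest sum.

    Target salient words are nouns present both in the fragment and in the
    label's salient word set; each is counted once (an assumption).
    """
    present = set(nouns)
    best_label, best_sum = None, 0.0
    for label, salient in model.items():
        total = sum(coef for word, coef in salient.items() if word in present)
        if total > best_sum:
            best_label, best_sum = label, total
    return best_label


if __name__ == "__main__":
    text = "1. Fees and Payment\nThe buyer pays within 30 days.\n2. Liability for Breach\nDamages apply."
    print(split_by_titles(text))
    model = {"fees": {"payment": 0.6, "invoice": 0.2}, "liability": {"damages": 0.5, "breach": 0.3}}
    print(assign_topic_label(["damages", "breach", "payment"], model))  # liability
```

A fragment sharing no noun with any salient word set gets `None`, which a caller could route to manual review.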
Step S103: obtaining, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from the rule base corresponding to the text type.
Specifically, step S103 includes:
judging whether the topic labels of the text to be audited include all of the necessary topic labels of the text type; if so, obtaining the audit rule corresponding to each topic label from the rule base corresponding to the text type according to the topic label of each audit fragment; if not, sending information identifying the missing necessary topic labels to the user terminal.
In this embodiment, the integrity of the contract to be audited is checked first: whether the contract lacks necessary content is determined from the topic labels it contains, and a reminder is issued when a necessary topic label is missing.
In this embodiment, a rule base is set in advance for each contract type, and each rule base contains the audit rules corresponding to different topic labels. Each audit fragment of the contract to be audited therefore has its own audit rule, and risk checks are targeted through the audit rule of each fragment, which improves the accuracy of contract auditing.
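A minimal sketch of the integrity check followed by rule lookup, assuming the rule base of one text type is a dictionary from topic label to a list of rule identifiers. `select_audit_rules` and the rule identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def select_audit_rules(
    fragment_labels: List[str],
    necessary_labels: Set[str],
    rule_base: Dict[str, List[str]],
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (missing_labels, rules_by_label) for one text to be audited.

    If any necessary topic label is missing, the missing labels are returned
    (to be reported to the user terminal) and no rules are selected.
    """
    missing = sorted(necessary_labels - set(fragment_labels))
    if missing:
        return missing, {}
    # Every necessary label is present: look up the rules of each label.
    rules = {label: rule_base.get(label, []) for label in fragment_labels}
    return [], rules


if __name__ == "__main__":
    rule_base = {
        "fees and payment": ["fee_amount_and_unit"],
        "liability for breach": ["penalty_cap"],
    }
    necessary = {"fees and payment", "liability for breach"}
    print(select_audit_rules(["fees and payment"], necessary, rule_base))
    # (['liability for breach'], {})
```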
Specifically, the audit rule includes audit elements and audit sub-rules, each audit element corresponding to one audit sub-rule; an audit element is the minimum unit of text auditing, and an audit sub-rule is the judgment rule used to assess the risk of that audit element.
For example, when the contract type is purchase and the topic label of the audit fragment is fees and payment, the audit elements of the corresponding audit rule include payment period, accounting period, fee, and tax; for the audit element fee, the audit sub-rule is: judge whether an amount and its monetary unit are stated, and if not, a risk exists.
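The fee sub-rule in this example could be approximated with two regular expressions, one for a numeric amount and one for a monetary unit. `fee_has_risk`, the patterns, and the unit list are assumptions for illustration; they are case-sensitive and would need extending for amounts written out in words.

```python
import re

# Hypothetical patterns: a numeric amount and a currency unit or symbol.
AMOUNT_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")
UNIT_RE = re.compile(r"(?:RMB|CNY|USD|yuan|dollars?|¥|\$|元)")


def fee_has_risk(element_content: str) -> bool:
    """Return True (risk) if the fee text lacks an amount or a monetary unit."""
    has_amount = AMOUNT_RE.search(element_content) is not None
    has_unit = UNIT_RE.search(element_content) is not None
    return not (has_amount and has_unit)


if __name__ == "__main__":
    print(fee_has_risk("The fee is RMB 50,000."))               # False
    print(fee_has_risk("The fee is 50,000."))                   # True
    print(fee_has_risk("The fee shall be agreed separately."))  # True
```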
Step S104: judging, according to the audit rule, whether risk element content exists in the corresponding audit fragment, and if so, sending the risk element content to the user terminal as a risk prompt.
Specifically, step S104 includes:
Step C1: extracting from the audit fragment the element content corresponding to each audit element in the audit rule;
Step C2: for the element content of each audit element, judging whether the element content satisfies the audit sub-rule corresponding to that audit element; if not, sending the element content to the user terminal as risk element content.
Further, judging whether the element content satisfies the audit sub-rule corresponding to the audit element includes:
judging whether the element content contains a preset keyword; or
judging whether the element content is consistent with preset content; or
judging whether the currencies or monetary amounts contained in the element content are consistent with each other.
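The three kinds of sub-rule judgment listed above, together with step C2, might be sketched as below. The rule representation (`AuditSubRule` with a `kind` field), the kind names, and the currency list are hypothetical. The consistency check assumes that "consistent" means every currency code and every numeric amount found in the element content is the same, so a stray non-monetary number such as a day count would count as an inconsistency.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical patterns for currency codes and numeric amounts.
CURRENCY_RE = re.compile(r"\b(?:RMB|CNY|USD|EUR)\b")
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


@dataclass
class AuditSubRule:
    element: str                     # audit element, e.g. "fee"
    kind: str                        # "keyword", "equals" or "consistent_amounts"
    expected: Optional[str] = None   # preset keyword or preset content


def satisfies(rule: AuditSubRule, content: str) -> bool:
    """Judge whether element content satisfies one audit sub-rule."""
    if rule.kind == "keyword":
        return rule.expected is not None and rule.expected in content
    if rule.kind == "equals":
        return content.strip() == (rule.expected or "").strip()
    if rule.kind == "consistent_amounts":
        currencies = set(CURRENCY_RE.findall(content))
        # Strip thousands separators before comparing amounts numerically.
        amounts = {float(n.replace(",", "")) for n in NUMBER_RE.findall(content)}
        return len(currencies) <= 1 and len(amounts) <= 1
    raise ValueError(f"unknown sub-rule kind: {rule.kind}")


def risk_element_contents(
    rules: List[AuditSubRule], element_contents: Dict[str, str]
) -> Dict[str, str]:
    """Step C2: collect the element content that fails its sub-rule."""
    risks = {}
    for rule in rules:
        content = element_contents.get(rule.element, "")
        if not satisfies(rule, content):
            risks[rule.element] = content
    return risks


if __name__ == "__main__":
    rules = [AuditSubRule("fee", "consistent_amounts"), AuditSubRule("term", "keyword", "days")]
    contents = {"fee": "RMB 50,000, i.e. USD 50,000", "term": "within 30 days"}
    print(risk_element_contents(rules, contents))
    # {'fee': 'RMB 50,000, i.e. USD 50,000'}
```

Step C1 (locating each element's content inside the fragment) is assumed to have produced `element_contents` already.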
In this embodiment, the contract to be audited is split into a plurality of audit fragments that can be audited in parallel, which improves the efficiency of contract auditing; in addition, a corresponding audit rule is set for each audit fragment, so contract auditing is targeted and more accurate.
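Because each audit fragment is checked independently, the fragments can be dispatched concurrently. A minimal sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`; `audit_in_parallel` is a hypothetical name, and `audit_fragment` stands in for any per-fragment check. Threads suit I/O-bound checks (for example, calls to a rule service); CPU-heavy pure-Python checks would likely gain more from `ProcessPoolExecutor` because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# (topic_label, fragment_text) pairs produced by the splitting step.
Fragment = Tuple[str, str]


def audit_in_parallel(
    fragments: List[Fragment],
    audit_fragment: Callable[[str, str], List[str]],
    max_workers: int = 4,
) -> List[List[str]]:
    """Audit every fragment concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the input iterable.
        return list(pool.map(lambda frag: audit_fragment(*frag), fragments))


if __name__ == "__main__":
    def audit_fragment(label: str, text: str) -> List[str]:
        # Toy check: flag any fragment that still contains a placeholder.
        return [f"{label}: {text}"] if "TBD" in text else []

    frags = [("fees and payment", "Fee: TBD"), ("contract term", "Two years")]
    print(audit_in_parallel(frags, audit_fragment))
    # [['fees and payment: Fee: TBD'], []]
```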
Still further, after step S104, the method further includes:
Step D1: receiving audit result information sent by the user terminal, and judging, according to the audit result information, whether the determined risk element content is correct; if so, adding one to the accuracy value of the audit rule corresponding to the risk element content; if not, subtracting one from that accuracy value;
In this embodiment, an accuracy value is set for each audit rule, and all audit rules start with the same initial accuracy value. After the risk element content is sent to the user terminal, the user manually checks it using their professional knowledge and feeds back audit result information; the accuracy value of each audit rule is then adjusted according to this information.
Step D2: sending each audit rule whose accuracy value is smaller than a preset threshold to the user terminal, so that the user terminal can modify the audit rule;
In this embodiment, the audit rules are continually revised using the audit result information, so that they are progressively refined.
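The accuracy-value bookkeeping in steps D1 and D2 could be kept in a small tracker like the one below. The initial value and threshold are placeholders, since the patent only says that all rules start from the same value and that rules below a preset threshold are sent back for revision; `RuleAccuracyTracker` and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RuleAccuracyTracker:
    initial_value: int = 0   # assumed common initial accuracy value
    threshold: int = -3      # hypothetical preset threshold
    values: Dict[str, int] = field(default_factory=dict)

    def record_feedback(self, rule_id: str, flagged_correctly: bool) -> None:
        """Step D1: +1 if the flagged content was a real risk, else -1."""
        current = self.values.get(rule_id, self.initial_value)
        self.values[rule_id] = current + (1 if flagged_correctly else -1)

    def rules_to_revise(self) -> List[str]:
        """Step D2: rules whose accuracy value fell below the threshold.

        Only rules that have received feedback are tracked here.
        """
        return sorted(r for r, v in self.values.items() if v < self.threshold)


if __name__ == "__main__":
    tracker = RuleAccuracyTracker(initial_value=0, threshold=-1)
    tracker.record_feedback("fee_amount_and_unit", False)
    tracker.record_feedback("fee_amount_and_unit", False)
    tracker.record_feedback("tax_rate", True)
    print(tracker.rules_to_revise())  # ['fee_amount_and_unit']
```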
Example 2
The embodiment of the invention provides a text auditing device which, as shown in FIG. 2, specifically comprises the following components:
the receiving module 201 is configured to receive a text to be checked sent from a user terminal, and match the text to be checked with text templates of a plurality of text types to determine the text type of the text to be checked;
the splitting module 202 is configured to obtain a classification model corresponding to the text type from a preset classification model library, split the text to be checked into a plurality of audit fragments by using the classification model, and add a corresponding topic label to each audit fragment;
the obtaining module 203 is configured to obtain, according to the topic label of each audit segment, audit rules corresponding to each topic label from a rule base corresponding to the text type;
and the judging module 204 is configured to judge whether risk element content exists in the corresponding audit segment according to the audit rule, and if so, send the risk element content to the user terminal for risk prompting.
Specifically, the device further comprises:
the training module is configured to perform the following before the classification model corresponding to the text type is obtained from the preset classification model library, the text to be audited is split into audit fragments with it, and a topic label is added to each audit fragment: acquire, for a given text type, a training sample set corresponding to the text type, wherein the training sample set comprises a set number of historical texts, the fragment information of each historical text, and the topic label of each fragment; determine, according to the topic labels contained in each historical text of the training sample set, the topic labels contained in all of the historical texts as the necessary topic labels of the text type; and train a preset model on the training sample set to obtain the classification model corresponding to the text type.
Further, when training the preset model on the training sample set to obtain the classification model corresponding to the text type, the training module is specifically configured to:
for each topic label in the training sample set, obtain the fragments corresponding to that topic label in each historical text; perform word segmentation on each obtained fragment and extract its nouns; determine, from the nouns of all these fragments, a set number of salient nouns that represent the topic label, and calculate a salience coefficient for each salient noun, so as to form the salient word set corresponding to the topic label; and aggregate the salient word sets of all topic labels in the training sample set as the classification model corresponding to the text type.
In addition, the splitting module 202 is specifically configured to:
determine each title contained in the text to be audited, and split the text into a plurality of audit fragments according to the determined titles, wherein each audit fragment includes a title portion and a body portion; perform word segmentation on each audit fragment and extract its nouns; for each audit fragment, determine the target salient words in each salient word set, a target salient word being a noun that appears both in the salient word set and in the audit fragment; sum, for each salient word set, the salience coefficients of its target salient words; and add to the audit fragment the topic label of the salient word set with the largest sum.
The obtaining module 203 is specifically configured to:
judge whether the topic labels of the text to be audited include all of the necessary topic labels of the text type; if so, obtain the audit rule corresponding to each topic label from the rule base corresponding to the text type according to the topic label of each audit fragment; if not, send information identifying the missing necessary topic labels to the user terminal.
Further, the audit rule includes audit elements and audit sub-rules, each audit element corresponding to one audit sub-rule;
in addition, the judging module 204 is specifically configured to:
extract from the audit fragment the element content corresponding to each audit element in the audit rule; for the element content of each audit element, judge whether it satisfies the audit sub-rule corresponding to that audit element; if not, send the element content to the user terminal as risk element content.
Still further, the apparatus further comprises:
the correction module is configured to perform the following after risk element content found in an audit fragment according to the audit rule has been sent to the user terminal as a risk prompt: receive audit result information sent by the user terminal and judge, according to the audit result information, whether the determined risk element content is correct; if so, add one to the accuracy value of the audit rule corresponding to the risk element content; if not, subtract one from that accuracy value; and send each audit rule whose accuracy value is smaller than a preset threshold to the user terminal, so that the user terminal can modify the audit rule.
Example 3
The present embodiment also provides a computer device capable of executing programs, such as a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server, or cabinet server (including an independent server or a server cluster formed by a plurality of servers). As shown in FIG. 3, the computer device 30 of this embodiment includes at least, but is not limited to, a memory 301 and a processor 302, which may be communicatively connected to each other via a system bus. It should be noted that FIG. 3 only shows the computer device 30 with components 301-302; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In this embodiment, the memory 301 (i.e., readable storage medium) includes flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 301 may be an internal storage unit of the computer device 30, such as a hard disk or memory of the computer device 30. In other embodiments, the memory 301 may also be an external storage device of the computer device 30, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the computer device 30. Of course, the memory 301 may also include both internal storage units of the computer device 30 and external storage devices. In this embodiment, the memory 301 is typically used to store an operating system and various types of application software installed on the computer device 30, such as program codes of the text auditing apparatus of the second embodiment. In addition, the memory 301 can also be used to temporarily store various types of data that have been output or are to be output.
The processor 302 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 302 is generally used to control the overall operation of the computer device 30.
Specifically, in this embodiment, the processor 302 is configured to execute a program of the text auditing method stored in the memory 301. When executed, the program implements the following steps:
receiving a text to be checked sent by a user terminal, and performing text structure matching between the text to be checked and text templates of a plurality of text types to determine the text type of the text to be checked;
obtaining a classification model corresponding to the text type from a preset classification model library, splitting the text to be checked into a plurality of audit fragments by using the classification model, and adding a corresponding topic label to each audit fragment;
acquiring, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from a rule base corresponding to the text type;
judging, according to the audit rule, whether risk element content exists in the corresponding audit fragment, and if so, sending the risk element content to the user terminal as a risk prompt.
For the specific implementation of the above method steps, refer to the first embodiment; it is not repeated here.
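The first step leaves "text structure matching" unspecified. Below is a minimal Python sketch. It assumes a text's structure is represented by its set of section titles, and it scores each template by Jaccard overlap with those titles. The template names and titles are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two title sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def determine_text_type(titles: List[str], templates: Dict[str, List[str]]) -> str:
    """Return the text type whose template titles best match the document's titles."""
    doc = set(titles)
    # max() keeps the first template listed when scores tie
    return max(templates, key=lambda t: jaccard(doc, set(templates[t])))


templates = {
    "loan_contract": ["Parties", "Loan Amount", "Interest", "Repayment", "Default"],
    "lease_contract": ["Parties", "Premises", "Rent", "Term", "Default"],
}
doc_titles = ["Parties", "Loan Amount", "Interest", "Repayment", "Signatures"]
print(determine_text_type(doc_titles, templates))  # loan_contract
```

Jaccard overlap is only one possible choice. A real system might also weigh title order or fuzzy title matches.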
Example IV
This embodiment also provides a computer-readable storage medium. Examples include a flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, or an app store. A computer program is stored on the medium, and when executed by a processor, the program implements the following method steps:
receiving a text to be checked sent by a user terminal, and performing text structure matching between the text to be checked and text templates of a plurality of text types to determine the text type of the text to be checked;
obtaining a classification model corresponding to the text type from a preset classification model library, splitting the text to be checked into a plurality of audit fragments by using the classification model, and adding a corresponding topic label to each audit fragment;
acquiring, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from a rule base corresponding to the text type;
judging, according to the audit rule, whether risk element content exists in the corresponding audit fragment, and if so, sending the risk element content to the user terminal as a risk prompt.
For the specific implementation of the above method steps, refer to the first embodiment; it is not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion. A process, method, article, or apparatus that comprises a list of elements therefore includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The serial numbers of the foregoing embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art will understand that the method of the above embodiments may be implemented by software together with the necessary general-purpose hardware platform, or by hardware alone. In many cases the former is the preferred implementation.
The foregoing describes only preferred embodiments of the present invention and does not thereby limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of this specification and the accompanying drawings likewise falls within the scope of patent protection of the present invention, whether it is applied directly or indirectly in other related technical fields.

Claims (7)

1. A text auditing method, the method comprising:
receiving a text to be checked sent by a user terminal, and performing text structure matching between the text to be checked and text templates of a plurality of text types to determine the text type of the text to be checked;
obtaining a classification model corresponding to the text type from a preset classification model library, splitting the text to be checked into a plurality of audit fragments by using the classification model, and adding a corresponding topic label to each audit fragment;
acquiring, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from a rule base corresponding to the text type;
judging, according to the audit rule, whether risk element content exists in the corresponding audit fragment, and if so, sending the risk element content to the user terminal as a risk prompt;
wherein obtaining the classification model corresponding to the text type from the preset classification model library, splitting the text to be checked into a plurality of audit fragments by using the classification model, and adding a corresponding topic label to each audit fragment comprises:
for a text type, acquiring a training sample set corresponding to the text type, wherein the training sample set comprises a set number of historical texts, fragment information of each historical text, and the topic label of each fragment;
determining, according to the topic labels contained in each historical text in the training sample set, the topic labels contained in every historical text as the necessary topic labels of the text type;
for one topic label in the training sample set, obtaining the fragments corresponding to that topic label in each historical text; performing word segmentation on each obtained fragment and extracting the nouns of each fragment; determining, from the nouns of all the fragments, a set number of salient nouns representing the topic label; and calculating a salient coefficient for each salient noun to form a salient word set corresponding to the topic label;
taking the collection of the salient word sets of all topic labels in the training sample set as the classification model corresponding to the text type;
determining each title contained in the text to be checked, and splitting the text to be checked into a plurality of audit fragments according to the determined titles, wherein each audit fragment comprises a title portion and a body portion;
performing word segmentation on each audit fragment and extracting the nouns of each audit fragment;
for an audit fragment, determining target salient words from each salient word set, a target salient word being a noun that appears both in the salient word set and in the audit fragment; calculating, for each salient word set, the sum of the salient coefficients of its target salient words; and adding to the audit fragment the topic label corresponding to the salient word set with the largest sum of salient coefficients.
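The training and labelling steps of claim 1 can be sketched in Python as below. The patent does not give a formula for the salient coefficient, so the coverage-times-specificity score used here is an assumed stand-in. Word segmentation and noun extraction are treated as already done; for Chinese text, a POS-tagging segmenter would supply the noun sets. All names and sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set

# Classification model: topic label -> {salient noun: salient coefficient}
Model = Dict[str, Dict[str, float]]


def build_model(training: Dict[str, List[Set[str]]], top_k: int) -> Model:
    """Build one salient word set per topic label.

    training maps each topic label to the noun sets of its historical fragments.
    The coefficient formula is an assumption, not taken from the patent:
    coverage (share of the label's fragments containing the noun) multiplied by
    specificity (share of all fragments containing the noun that carry the label).
    """
    overall = Counter(n for frags in training.values() for nouns in frags for n in nouns)
    model: Model = {}
    for label, frags in training.items():
        local = Counter(n for nouns in frags for n in nouns)
        scores = {n: (c / len(frags)) * (c / overall[n]) for n, c in local.items()}
        # Keep the top_k nouns; ties are broken alphabetically for determinism
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        model[label] = dict(ranked)
    return model


def assign_topic_label(fragment_nouns: Set[str], model: Model) -> Optional[str]:
    """Return the label whose target salient words have the largest coefficient sum.

    Returns None if no salient word occurs in the fragment (an assumed fallback).
    """
    best_label, best_sum = None, 0.0
    for label, salient in model.items():
        total = sum(coef for noun, coef in salient.items() if noun in fragment_nouns)
        if total > best_sum:
            best_label, best_sum = label, total
    return best_label


training = {
    "interest": [{"rate", "interest", "loan"}, {"interest", "rate", "annual"}],
    "repayment": [{"repayment", "date", "loan"}, {"repayment", "installment", "date"}],
}
model = build_model(training, top_k=3)
print(model["interest"])                                      # {'interest': 1.0, 'rate': 1.0, 'annual': 0.5}
print(assign_topic_label({"annual", "rate", "date"}, model))  # interest
```

The dictionary returned by `build_model` plays the role of the classification model for one text type. In the example, "loan" appears under both labels, so its low specificity keeps it out of either salient word set.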
2. The text auditing method according to claim 1, wherein acquiring, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from the rule base corresponding to the text type comprises:
judging whether the topic labels of the text to be checked include all of the necessary topic labels of the text type; if so, acquiring, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from the rule base corresponding to the text type; if not, sending information identifying the missing necessary topic labels to the user terminal.
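The sketch below shows the necessary-label check of claim 2. It also includes the claim 1 step that derives the necessary labels as the labels common to every historical text. The rule base is modelled as a hypothetical mapping from topic label to rule identifiers.

```python
from typing import Dict, List, Set, Tuple


def necessary_topic_labels(history_labels: List[Set[str]]) -> Set[str]:
    """Topic labels that appear in every historical text of the training sample set."""
    return set.intersection(*history_labels) if history_labels else set()


def fetch_audit_rules(
    fragment_labels: List[str],
    necessary: Set[str],
    rule_base: Dict[str, List[str]],
) -> Tuple[Set[str], Dict[str, List[str]]]:
    """Return (missing necessary labels, audit rules keyed by topic label).

    If any necessary label is missing, no rules are fetched; the missing labels
    would be reported to the user terminal instead.
    """
    missing = necessary - set(fragment_labels)
    if missing:
        return missing, {}
    return set(), {label: rule_base.get(label, []) for label in fragment_labels}


history = [
    {"parties", "amount", "interest", "default"},
    {"parties", "amount", "interest", "guarantee"},
    {"parties", "amount", "default"},
]
necessary = necessary_topic_labels(history)  # the labels "parties" and "amount"
rule_base = {"parties": ["names_present"], "amount": ["amount_positive"], "interest": ["rate_cap"]}

print(fetch_audit_rules(["parties", "interest"], necessary, rule_base))  # ({'amount'}, {})
missing, rules = fetch_audit_rules(["parties", "amount", "interest"], necessary, rule_base)
print(rules["interest"])  # ['rate_cap']
```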
3. The text auditing method according to claim 1, wherein the audit rule comprises audit elements and audit sub-rules, each audit element corresponding to one audit sub-rule;
judging, according to the audit rule, whether risk element content exists in the corresponding audit fragment, and if so, sending the risk element content to the user terminal as a risk prompt, specifically comprises:
extracting, from the audit fragment, the element content corresponding to each audit element in the audit rule;
for the element content of each audit element, judging whether the element content satisfies the audit sub-rule corresponding to that audit element; and if not, sending the element content to the user terminal as risk element content.
4. The text auditing method according to claim 1, wherein after judging, according to the audit rule, whether risk element content exists in the corresponding audit fragment and, if so, sending the risk element content to the user terminal as a risk prompt, the method further comprises:
receiving audit result information sent by the user terminal, and judging, according to the audit result information, whether the determined risk element content is correct; if so, adding one to the accuracy value of the audit rule corresponding to the risk element content; if not, subtracting one from the accuracy value of that audit rule;
and sending each audit rule whose accuracy value is smaller than a preset threshold to the user terminal so that the user terminal can modify the audit rule.
5. A text auditing device, the device comprising:
the receiving module is configured to receive the text to be checked sent by the user terminal, and to perform text structure matching between the text to be checked and the text templates of a plurality of text types to determine the text type of the text to be checked;
the splitting module is configured to obtain a classification model corresponding to the text type from a preset classification model library, split the text to be checked into a plurality of audit fragments by using the classification model, and add a corresponding topic label to each audit fragment;
the obtaining module is configured to acquire, according to the topic label of each audit fragment, the audit rule corresponding to each topic label from the rule base corresponding to the text type;
the judging module is configured to judge, according to the audit rule, whether risk element content exists in the corresponding audit fragment, and if so, to send the risk element content to the user terminal as a risk prompt;
the splitting module is further configured to:
for a text type, acquire a training sample set corresponding to the text type, wherein the training sample set comprises a set number of historical texts, fragment information of each historical text, and the topic label of each fragment;
determine, according to the topic labels contained in each historical text in the training sample set, the topic labels contained in every historical text as the necessary topic labels of the text type;
for one topic label in the training sample set, obtain the fragments corresponding to that topic label in each historical text; perform word segmentation on each obtained fragment and extract the nouns of each fragment; determine, from the nouns of all the fragments, a set number of salient nouns representing the topic label; and calculate a salient coefficient for each salient noun to form a salient word set corresponding to the topic label;
take the collection of the salient word sets of all topic labels in the training sample set as the classification model corresponding to the text type;
determine each title contained in the text to be checked, and split the text to be checked into a plurality of audit fragments according to the determined titles, wherein each audit fragment comprises a title portion and a body portion;
perform word segmentation on each audit fragment and extract the nouns of each audit fragment;
for an audit fragment, determine target salient words from each salient word set, a target salient word being a noun that appears both in the salient word set and in the audit fragment; calculate, for each salient word set, the sum of the salient coefficients of its target salient words; and add to the audit fragment the topic label corresponding to the salient word set with the largest sum of salient coefficients.
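The splitting module's title-based split is sketched below. The sketch assumes titles are lines in a numbered "1. Title" style. Real documents would need a title detector suited to their own heading conventions. The sample contract and the `AuditFragment` name are hypothetical, and text before the first title is kept as a fragment with an empty title, which is also an assumption.

```python
import re
from dataclasses import dataclass
from typing import List

TITLE_PATTERN = re.compile(r"^\d+\.\s+\S")  # assumed heading style: "1. Title"


@dataclass
class AuditFragment:
    title: str  # title portion
    body: str   # body portion


def split_by_titles(text: str) -> List[AuditFragment]:
    """Split text into fragments, each a title line plus the body lines after it."""
    fragments: List[AuditFragment] = []
    title, body = "", []
    for line in text.splitlines():
        if TITLE_PATTERN.match(line):
            # Close the previous fragment (or a non-empty preamble) before starting anew
            if title or any(b.strip() for b in body):
                fragments.append(AuditFragment(title, "\n".join(body).strip()))
            title, body = line.strip(), []
        else:
            body.append(line)
    if title or any(b.strip() for b in body):
        fragments.append(AuditFragment(title, "\n".join(body).strip()))
    return fragments


contract = """Loan Agreement
1. Loan Amount
The lender provides 100000 yuan.
2. Interest
The annual interest rate is 4.5%.
The rate is fixed."""

for f in split_by_titles(contract):
    print(repr(f.title), "->", repr(f.body))
```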
6. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 4.
7. A computer readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010116229.7A CN111274782B (en) 2020-02-25 2020-02-25 Text auditing method and device, computer equipment and readable storage medium
PCT/CN2020/111641 WO2021169208A1 (en) 2020-02-25 2020-08-27 Text review method and apparatus, and computer device, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010116229.7A CN111274782B (en) 2020-02-25 2020-02-25 Text auditing method and device, computer equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111274782A CN111274782A (en) 2020-06-12
CN111274782B true CN111274782B (en) 2023-10-20

Family

ID=71000343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010116229.7A Active CN111274782B (en) 2020-02-25 2020-02-25 Text auditing method and device, computer equipment and readable storage medium

Country Status (2)

Country Link
CN (1) CN111274782B (en)
WO (1) WO2021169208A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344382A (en) * 2018-10-23 2019-02-15 出门问问信息科技有限公司 Method, apparatus, electronic equipment and the computer readable storage medium of audit contract
WO2019184217A1 (en) * 2018-03-26 2019-10-03 平安科技(深圳)有限公司 Hotspot event classification method and apparatus, and storage medium
WO2019196224A1 (en) * 2018-04-09 2019-10-17 平安科技(深圳)有限公司 Regulation information processing method and apparatus, computer device and storage medium
CN110442842A (en) * 2019-06-20 2019-11-12 平安科技(深圳)有限公司 The extracting method and device of treaty content, computer equipment, storage medium
CN110705952A (en) * 2019-08-15 2020-01-17 平安信托有限责任公司 Contract auditing method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10521464B2 (en) * 2015-12-10 2019-12-31 Agile Data Decisions, Llc Method and system for extracting, verifying and cataloging technical information from unstructured documents
CN110362822A (en) * 2019-06-18 2019-10-22 中国平安财产保险股份有限公司 Text marking method, apparatus, computer equipment and storage medium for model training
CN111274782B (en) * 2020-02-25 2023-10-20 平安科技(深圳)有限公司 Text auditing method and device, computer equipment and readable storage medium

Also Published As

Publication number Publication date
WO2021169208A1 (en) 2021-09-02
CN111274782A (en) 2020-06-12

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40030949; Country of ref document: HK)
SE01 Entry into force of request for substantive examination
GR01 Patent grant