CN112732895A - Method and device for auditing text, electronic equipment and storage medium

Info

Publication number: CN112732895A
Application number: CN202110117670.1A
Authority: CN
Inventors: 林小虎; 胡陆杰; 卢建章; 梁梓健; 吴哲慧
Original assignee: Guangzhou Huya Information Technology Co Ltd
Current assignee: Guangzhou Huya Information Technology Co Ltd
Priority date: 2018-03-26
Filing date: 2018-03-26
Publication date: 2021-04-30
Anticipated expiration: 2038-03-26
Also published as: CN108491518A; CN112732895B; CN108491518B

Abstract

The embodiment of the invention discloses a method and a device for auditing texts, electronic equipment and a storage medium. The method comprises the following steps: acquiring text information sent by two or more auditing demanders; determining whether the text content comprises words in a word bank corresponding to the identifier of the auditing demander; and if the text content comprises the words in the word bank corresponding to the identifier of the auditing demander, executing an information processing strategy matched with the attribute of the word bank. The technical scheme of the embodiment of the invention solves the technical defects of high software and hardware resource occupancy rate, large workload and high work repetition rate in the process of text audit caused by using different text audit methods to audit different types of texts in the prior art, realizes high-efficiency and accurate audit of different types of text information by using the same audit process, greatly improves the work efficiency of text audit and reduces the occupancy rate of software and hardware resources in the process of text audit.

Description

Method and device for auditing text, electronic equipment and storage medium

The present patent application is a divisional application of the Chinese invention patent application filed on March 26, 2018, with application number 201810253141.2 and entitled 'method, device, electronic equipment and storage medium for auditing texts'.

Technical Field

The embodiment of the invention relates to the technical field of text information processing, in particular to a method and a device for auditing a text, electronic equipment and a storage medium.

Background

With the continuous development of internet technology, people increasingly depend on the internet to spread information of all kinds. Text is an important carrier of information on the network, but networks today are full of uncivil expressions, which are scattered across text information such as articles, titles, announcements, nicknames and bullet screens.

Because different kinds of text information call for different audit contents and different audit severities, the prior art configures a separate and different auditing process for each kind of text information, such as articles, titles, announcements, nicknames and bullet screens. Generally, an auditing process designed for one kind of text information is not suitable for auditing the others.

In the process of implementing the invention, the inventor finds that the prior art has the following defects: because different text information is respectively configured with independent and different auditing processes, the software and hardware resource occupancy rate is high, the workload is large, and the work repetition rate is high in the process of auditing the text information.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and an apparatus for auditing texts, an electronic device, and a storage medium, so as to optimize an existing text auditing manner and improve auditing efficiency for at least two types of texts.

In a first aspect, an embodiment of the present invention provides a method for reviewing a text, including:

acquiring text information sent by two or more auditing demanders, wherein the text information comprises an auditing demander identifier and text content;

determining whether the text content comprises words in a word bank corresponding to the identification of the auditing demander;

and if the text content comprises words in a word bank corresponding to the identification of the auditing demander, executing an information processing strategy matched with the attribute of the word bank.

In a second aspect, an embodiment of the present invention provides an apparatus for reviewing a text, including:

the information acquisition module is used for acquiring text information sent by two or more auditing demanders, wherein the text information comprises an auditing demander identifier and text content;

a content determining module, configured to determine whether the text content includes a word in a lexicon corresponding to the identifier of the auditing demander;

and the information processing strategy executing module is used for executing an information processing strategy matched with the attribute of the word bank if the text content comprises the words in the word bank corresponding to the identification of the auditing demander.

In a third aspect, an embodiment of the present invention provides an electronic device, including:

one or more processors;

storage means for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method for reviewing text as described in any of the embodiments of the invention.

In a fourth aspect, embodiments of the present invention provide a storage medium containing computer-executable instructions that, when executed by a computer processor, perform a method of reviewing text as described in any of the embodiments of the present invention.

The embodiment of the invention provides a method, a device, electronic equipment and a storage medium for auditing texts. Text information sent by two or more auditing demanders is first acquired; it is then determined whether the acquired text content comprises words in the word bank corresponding to the respective auditing demander identifier; if so, an information processing strategy matched with the attribute of that word bank is executed. This overcomes the technical defects of the prior art, in which different types of texts are audited with different text auditing methods, so that the software and hardware resource occupancy rate is high, the workload is large and the work repetition rate is high during text auditing. Different types of text information can be audited efficiently and accurately with the same auditing process, which greatly improves the efficiency of text auditing and reduces the software and hardware resource occupancy rate during text auditing.

Drawings

Fig. 1 is a flowchart of a method for reviewing a text according to a first embodiment of the present invention;

Fig. 2 is a flowchart of a method for reviewing a text according to a second embodiment of the present invention;

Fig. 3a is a flowchart of a method for reviewing a text according to a third embodiment of the present invention;

Fig. 3b is a diagram of a penalty template provided in the third embodiment of the present invention;

Fig. 3c is a schematic diagram of a first data format according to the third embodiment of the present invention;

Fig. 4 is a block diagram of an apparatus for reviewing a text according to a fourth embodiment of the present invention;

Fig. 5 is a structural diagram of an electronic device according to a fifth embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention.

It should be further noted that, for the convenience of description, only some but not all of the relevant aspects of the present invention are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.

Example one

Fig. 1 is a flowchart of a method for reviewing a text according to a first embodiment of the present invention. The method may be performed by an apparatus for reviewing a text; the apparatus may be implemented by hardware and/or software, and may generally be integrated in an electronic device that performs text review. The method of the embodiment specifically includes:

S101, acquiring text information sent by two or more auditing demanders, wherein the text information comprises an auditing demander identifier and text content.

In this embodiment, the auditing demander specifically refers to a service system with a text information audit requirement, and typically may be a video playing APP (application program), a microblog platform, and the like. The text information specifically refers to information including the text content to be audited and an auditing demander identifier. The text content to be audited may specifically be a bullet screen, a title of an article, a nickname of a user, article content, and the like.

In this embodiment, the auditing demander identifier may be used to uniquely identify one auditing demander, or to uniquely identify one auditing demander together with one to-be-audited text type of that auditing demander. Typically, the auditing demander identifier may be the name or a code of the auditing demander. It can be understood that the text content to be audited for one auditing demander may be of only one type or of multiple types. When it is of only one type, the auditing demander generally corresponds to only one group of word banks; the identifier then only needs to uniquely identify the auditing demander, and once the auditing demander is determined, the word banks to be used for auditing the text content are determined as well. When it is of multiple types, the auditing demander may correspond to multiple word banks, each corresponding to one to-be-audited text type; the identifier then needs to identify both the auditing demander and the to-be-audited text type, and once the auditing demander and the type of the current text content are determined, the word bank to be used is determined. Of course, if the word banks for the multiple to-be-audited text types of one auditing demander are all the same, the identifier only needs to uniquely identify the auditing demander.
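The two identifier schemes described above can be illustrated with a small lookup. This is a minimal sketch only: the registry `WORD_BANKS`, the function `resolve_word_banks`, and all demander, text type and word bank names are hypothetical and do not come from the patent.

```python
from typing import List, Optional

# Hypothetical registry. A key is (demander, text_type); a text_type of None
# means the demander uses one group of word banks for every text type.
WORD_BANKS = {
    ("video_app", "bullet_screen"): ["video_bullet_intercept", "video_bullet_audit"],
    ("video_app", "nickname"): ["video_nickname_intercept"],
    ("microblog", None): ["microblog_intercept", "microblog_audit"],
}


def resolve_word_banks(demander: str, text_type: Optional[str] = None) -> List[str]:
    """Return the word bank names for an identifier, most specific entry first."""
    specific = WORD_BANKS.get((demander, text_type))
    if specific is not None:
        return specific
    # Fall back to the demander-wide group when no per-type entry exists.
    return WORD_BANKS.get((demander, None), [])
```

Here `video_app` needs an identifier that carries the text type, while `microblog` is resolved from the demander alone, matching the two cases in the paragraph above.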

In this embodiment, the acquired text information is sent by two or more auditing demanders, and the text information may be the same type of text information or different types of text information, that is, the method for auditing the text in this embodiment may be interfaced with two or more auditing demanders at the same time to perform text auditing, and may audit two or more different types of text contents at the same time.

S102, determining whether the text content comprises words in the word bank corresponding to the auditing demander identifier.

In this embodiment, a word bank specifically refers to a set of words corresponding to one auditing demander identifier, or to one to-be-audited text type under one auditing demander identifier; the words in the word bank are used to audit the text content in the text information sent by the auditing demander. The word bank corresponding to one auditing demander identifier, or to one to-be-audited text type under one auditing demander identifier, may be a single word bank or a group of word banks. When it is a group of word banks, no two word banks in the group include the same word. In addition, the word banks corresponding to different auditing demander identifiers may be the same or different.

Further, in this embodiment, each word bank has its own attribute, and the attribute determines how text content that includes words in the word bank is processed. For example, if the attribute of the word bank is interception, text content including words in the word bank should be intercepted; if the attribute of the word bank is audit, text content including words in the word bank should be audited further.

Further, if the word bank corresponding to one auditing demander identifier, or to one to-be-audited text type under one auditing demander identifier, is a group of word banks, that is, a plurality of word banks, these word banks should generally have different attributes. The word banks corresponding to the multiple to-be-audited text types under one auditing demander identifier may have the same attributes.

In this embodiment, whether the text content includes words in the word bank corresponding to the auditing demander identifier may specifically be determined by matching the text content against the words in that word bank one by one. If the word bank corresponding to the auditing demander identifier is a group of word banks, the word banks in the group may be selected in any order for matching, or may be selected in turn according to a set order corresponding to the word bank attributes.
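As an illustration of matching a group of word banks in a set order, the sketch below tries the interception word bank before the audit word bank. The `WordBank` class, the `ATTRIBUTE_ORDER` priority and the substring test are assumptions made for the example; the patent only states that an order corresponding to the attributes may be used.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class WordBank:
    name: str
    attribute: str            # "intercept" or "audit"
    words: FrozenSet[str]


# Assumed priority: interception word banks are tried before audit word banks.
ATTRIBUTE_ORDER = {"intercept": 0, "audit": 1}


def first_matching_bank(text: str, banks: Iterable[WordBank]) -> Optional[WordBank]:
    """Return the first word bank, in attribute order, with a word found in text."""
    for bank in sorted(banks, key=lambda b: ATTRIBUTE_ORDER[b.attribute]):
        # Match the text against the words of this bank one by one.
        if any(word in text for word in bank.words):
            return bank
    return None
```

Because no two word banks in a group share a word, the order only decides which bank is checked first, not which bank a given word belongs to.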

S103, if the text content comprises words in the word bank corresponding to the identifier of the auditing demander, executing an information processing strategy matched with the attribute of the word bank.

In this embodiment, when the text content includes a word in the word bank corresponding to the auditing demander identifier, the text content is processed according to the information processing strategy matched with the attribute of that word bank. For example, when the attribute of the word bank is interception, an interception instruction for the text content is sent to the auditing demander; when the attribute of the word bank is audit, the text content is audited further, for example by manual audit.
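Steps S101 to S103 can be sketched as one function that serves several auditing demanders. Everything named here (`BANKS`, `STRATEGIES`, `review`, the dictionary keys and the sample words) is hypothetical, and the strategies only return labels instead of sending instructions.

```python
from typing import Optional

# Hypothetical word banks per auditing demander identifier: (attribute, words).
BANKS = {
    "video_app": [("intercept", {"casino"}), ("audit", {"discount"})],
    "microblog": [("audit", {"casino"})],
}


def send_interception(content: str) -> str:
    return "intercept"        # stand-in for sending an interception instruction


def queue_manual_review(content: str) -> str:
    return "manual_review"    # stand-in for handing the text to a reviewer


# One information processing strategy per word bank attribute.
STRATEGIES = {"intercept": send_interception, "audit": queue_manual_review}


def review(text_info: dict) -> Optional[str]:
    """S101: take text info; S102: match word banks; S103: run the strategy."""
    content = text_info["content"]
    for attribute, words in BANKS.get(text_info["demander_id"], []):
        if any(word in content for word in words):
            return STRATEGIES[attribute](content)
    return None  # no word bank matched
```

The same content can lead to different outcomes for different demanders, because only the word banks, not the process, depend on the identifier.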

The embodiment of the invention provides a method for auditing texts. Text information sent by two or more auditing demanders is first obtained; it is then determined whether the obtained text content comprises words in the word bank corresponding to the respective auditing demander identifier; if so, an information processing strategy matched with the attribute of that word bank is executed. This overcomes the technical defects of the prior art, in which different types of texts are audited with different text auditing methods, so that the software and hardware resource occupancy rate is high, the workload is large and the work repetition rate is high during text auditing. Different types of text information can be audited efficiently and accurately with the same auditing process, which greatly improves the efficiency of text auditing and reduces the software and hardware resource occupancy rate during text auditing.

Example two

Fig. 2 is a flowchart of a method for reviewing a text according to a second embodiment of the present invention. This embodiment provides a specific implementation in which the information processing strategy is refined into audit or interception, and in which the text information further includes a user report identifier.

Correspondingly, the method of the embodiment specifically includes:

S201, acquiring text information sent by two or more auditing demanders through a pre-constructed standard interface.

In this embodiment, the standard interface specifically refers to a data transmission protocol capable of performing data communication with two or more auditing demanders at the same time, and the standard interface is associated with the two or more auditing demanders and defines a unified data transmission standard in advance in the standard interface.

It can be understood that if different data transmission protocols are used to communicate with different auditing demanders, the development and use costs of software and hardware increase, and data transmission efficiency suffers.

In this embodiment, the text information includes not only the auditing demander identifier and the text content but also a user report identifier, which indicates whether the text content in the text information has been reported by a user. Generally, to reduce operating cost, an auditing demander may choose not to send text content from users with a good reputation to the apparatus for auditing texts.
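A unified message format for the standard interface might look like the sketch below. JSON as the wire format and the field names `demander_id`, `content` and `reported` are assumptions for illustration; the patent does not specify the protocol or its fields beyond the three pieces of information named above.

```python
import json
from dataclasses import dataclass


@dataclass
class TextInfo:
    demander_id: str        # auditing demander identifier
    content: str            # text content to be audited
    reported: bool = False  # user report identifier


REQUIRED_FIELDS = ("demander_id", "content")


def parse_text_info(raw: str) -> TextInfo:
    """Parse one message; every auditing demander sends the same shape."""
    payload = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return TextInfo(
        demander_id=str(payload["demander_id"]),
        content=str(payload["content"]),
        reported=bool(payload.get("reported", False)),
    )
```

With one shared format, adding a new auditing demander requires no new parsing code on the auditing side.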

S202, determining, according to the user report identifier, whether the text information is user-reported information; if so, executing step 204, and if not, executing step 203.

In this embodiment, the way in which the text content is matched against the word banks corresponding to the auditing demander identifier is determined by the user report identifier carried in the text information.

S203, determining whether the text content comprises words in the word bank that corresponds to the auditing demander identifier and whose attribute is interception; if so, executing step 207, and if not, executing step 204.

When the text information is determined, according to the user report identifier, not to be user-reported information, the text content is matched against the word bank that corresponds to the auditing demander identifier and whose attribute is interception, to judge whether the text content comprises words in that word bank.

It should be noted that the word bank whose attribute is interception should be the word bank that corresponds to the auditing demander identifier, corresponds to the type of the text content, and has the attribute interception. Similarly, the word bank whose attribute is audit in step 204 should be the word bank that corresponds to the auditing demander identifier, corresponds to the type of the text content, and has the attribute audit. The same applies in this embodiment and in the other embodiments.

S204, determining whether the text content comprises words in the word bank that corresponds to the auditing demander identifier and whose attribute is audit; if so, executing step 205, and if not, executing step 206.

When the text information is determined, according to the user report identifier, to be user-reported information, the text content is matched directly against the word bank that corresponds to the auditing demander identifier and whose attribute is audit, to judge whether the text content comprises words in that word bank.

In addition, when the text content in text information that was not reported by a user does not include any word in the word bank whose attribute is interception, that text content is further matched against the word bank that corresponds to the auditing demander identifier and whose attribute is audit, to judge whether it comprises words in that word bank.

S205, displaying the text content so that staff can audit it, and feeding back the audit result to the auditing demander.

In this embodiment, when the text content includes a word in the word bank that corresponds to the auditing demander identifier and whose attribute is audit, the text content is displayed so that staff can audit it. Specifically, when the text content is displayed, the words it contains from that word bank may be marked, for example by coloring them red or underlining them, so that staff can grasp the text content more intuitively and reach an audit decision faster and more accurately.
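Marking the matched words for the reviewer can be sketched as follows. The function `mark_matches` and the `[[` and `]]` markers are hypothetical stand-ins for whatever styling (red text, underline) the display layer applies.

```python
import re
from typing import Iterable


def mark_matches(content: str, audit_words: Iterable[str],
                 left: str = "[[", right: str = "]]") -> str:
    """Wrap every occurrence of an audit word in visible markers."""
    # Longest words first, so a longer word wins over a word it contains.
    words = sorted((w for w in audit_words if w), key=len, reverse=True)
    if not words:
        return content
    pattern = "|".join(re.escape(w) for w in words)
    return re.sub(pattern, lambda m: f"{left}{m.group(0)}{right}", content)
```

A front end could replace the markers with an HTML span or an underline style before showing the text to staff.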

Further, after the staff finish auditing the text content, the audit result for the text information containing that text content is sent to the auditing demander through the standard interface, so that the auditing demander can process the text information according to the audit result.

S206, executing the text processing mode corresponding to the auditing demander identifier.

In this embodiment, when the text content does not include a word in any word bank corresponding to the auditing demander identifier, the text processing mode corresponding to the auditing demander identifier is executed. The text processing mode is either to generate a display instruction and send it to the auditing demander, or to display the text content so that staff can audit it and feed back the audit result to the auditing demander. When the auditing demander receives the display instruction, it displays the text content.

S207, generating an interception instruction and sending the interception instruction to the auditing demander.

In this embodiment, when it is determined that the text content includes a word in the word bank that corresponds to the auditing demander identifier and whose attribute is interception, an interception instruction is generated and sent to the auditing demander, so that the auditing demander can intercept the text content.
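The branching of steps S202 to S207 can be condensed into one function. This is a sketch under assumptions: the function name `decide`, the returned labels and the use of substring matching are illustrative, and `default_action` stands for the demander-specific text processing mode of S206.

```python
from typing import Set


def decide(content: str, reported: bool, intercept_words: Set[str],
           audit_words: Set[str], default_action: str = "display") -> str:
    """Return the action chosen by the S202 to S207 flow."""
    # S202/S203: only text that was NOT reported by a user is checked
    # against the interception word bank.
    if not reported and any(w in content for w in intercept_words):
        return "intercept"          # S207
    # S204: reported text, and unreported text that passed S203, is checked
    # against the audit word bank.
    if any(w in content for w in audit_words):
        return "manual_review"      # S205
    return default_action           # S206
```

Note that, as the flow is written, user-reported text goes directly to S204 and is therefore never matched against the interception word bank.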

The embodiment of the invention provides a method for auditing texts that makes the information processing strategy matched with the word bank attribute concrete: according to the attribute of the word bank, the processing mode of the text content is determined to be interception, manual audit or display, so the processing mode can be determined quickly, simply and accurately. The embodiment also makes the acquisition of the text information concrete: acquiring the text information through a pre-constructed standard interface improves data transmission efficiency and reduces the development and use costs of software and hardware. In addition, a user report identifier is added, and the process of matching the text content against the word banks is determined according to it, which makes the auditing process for the text content more reasonable and effective.

Example three

Fig. 3a is a flowchart of a method for reviewing a text according to a third embodiment of the present invention. This embodiment provides a specific implementation in which the original word banks corresponding to the auditing demanders are first obtained and then stored as a public word bank and audit requirement word banks, the text content is first segmented into words and then matched against the word banks stored in the cache in a first data format, and the penalty level of the text content is determined according to the penalty template corresponding to the auditing demander.

Correspondingly, the method of the embodiment specifically includes:

S301, acquiring two or more original word banks corresponding to the auditing demander identifiers.

It can be understood that the content of the word banks used to audit text content is generally determined by the auditing demander, according to factors such as its service type, its service requirements and how strictly it controls wording.

Therefore, in this embodiment, two or more original word banks corresponding to the auditing demander identifiers need to be obtained first, that is, the auditing demanders' own word banks are obtained from the auditing demanders.

S302, storing in a public word bank those words in the obtained original word banks that have the same word category and the same word bank attribute, and storing the words in the obtained original word banks that are not stored in the public word bank in the audit requirement word banks corresponding to the respective auditing demander identifiers.

In this embodiment, the original word banks obtained in step 301 are not each stored separately and independently. Because a large number of words are repeated many times across these word banks, this embodiment provides a public word bank in order to save storage space.

It should be noted that, in this embodiment, the words in the word banks are classified; that is, the words in one word bank may belong to different categories, or they may all belong to the same category. Word banks with different attributes may include the same word categories.

In this embodiment, among all the obtained original word banks (that is, all the original word banks corresponding to the respective auditing demander identifiers), the words that have the same word category and the same word bank attribute are stored in the public word bank, where they are classified and stored according to word bank attribute and word category. For example, the public word bank may store as one unit all words whose word bank attribute is interception and whose word category is "blocking confidence"; store as one unit all words whose word bank attribute is interception and whose word category is common advertisements; store as one unit all words whose word bank attribute is audit and whose word category is "built up confidence"; and so on.

In this embodiment, the words in the obtained original word banks that are not stored in the public word bank are stored in the audit requirement word bank corresponding to the respective auditing demander identifier. The audit requirement word banks corresponding to one auditing demander identifier are generally equal in number to, and in one-to-one correspondence with, the original word banks corresponding to that identifier. However, when all the words in one original word bank corresponding to an auditing demander identifier have been moved into the public word bank, the number of audit requirement word banks corresponding to that identifier is smaller than the number of its original word banks.
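One possible reading of step S302 is sketched below: a word goes to the public word bank when the same word, with the same word bank attribute and word category, appears in the original word banks of more than one auditing demander. That sharing rule is an interpretation, not a statement from the patent, and `split_word_banks` and the sample data layout are hypothetical.

```python
from collections import defaultdict


def split_word_banks(original):
    """original: {demander_id: {(attribute, category): set_of_words}}.

    Returns (public, private), where public maps (attribute, category) to the
    shared words and private maps demander_id to its remaining word banks.
    """
    owners = defaultdict(set)
    for demander, banks in original.items():
        for (attribute, category), words in banks.items():
            for word in words:
                owners[(attribute, category, word)].add(demander)

    public = defaultdict(set)
    private = defaultdict(lambda: defaultdict(set))
    for (attribute, category, word), demanders in owners.items():
        if len(demanders) > 1:
            public[(attribute, category)].add(word)      # stored once, shared
        else:
            (demander,) = demanders
            private[demander][(attribute, category)].add(word)
    return dict(public), {d: dict(b) for d, b in private.items()}
```

An original word bank whose words are all shared leaves no audit requirement word bank behind, which matches the case described above where a demander ends up with fewer audit requirement word banks than original word banks.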

S303, obtaining penalty templates respectively corresponding to two or more auditing demander identifiers.

In this embodiment, each auditing demander identifier corresponds to a penalty template, and the penalty template includes the penalty policies corresponding to different penalty levels. The penalty template specifically stores the penalty level and penalty mode corresponding to each word category in the audit word bank, so that in this embodiment the penalty level and penalty mode for a word can be determined from the word category to which the word belongs.

Fig. 3b is a schematic diagram of a penalty template. As shown in Fig. 3b, the penalty template corresponds to text content of the video bullet screen type for the auditing demander. The word category is not shown in the penalty template; only the violation degree (that is, the penalty level) corresponding to the word category is displayed, and the violation degree in the figure is "serious". The penalty template also sets items such as "clear or not" and "forbidden to speak" one by one.
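A penalty template of this kind could be represented as two small mappings, from word category to penalty level and from penalty level to policy. All names and values below are invented for illustration, including the level names other than "serious", the mute duration, and the rule of applying the most severe level when several categories match.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class PenaltyPolicy:
    clear_content: bool   # the "clear or not" setting
    mute_minutes: int     # the "forbidden to speak" setting; 0 means no mute


# Assumed ordering of violation degrees, least severe first.
SEVERITY = ["minor", "general", "serious"]

# Hypothetical template for one demander's video bullet screen text.
VIDEO_BULLET_SCREEN_TEMPLATE = {
    "category_levels": {"common_ads": "general", "abuse": "serious"},
    "level_policies": {
        "general": PenaltyPolicy(clear_content=True, mute_minutes=0),
        "serious": PenaltyPolicy(clear_content=True, mute_minutes=60),
    },
}


def penalty_for(categories: Iterable[str],
                template: dict) -> Optional[Tuple[str, PenaltyPolicy]]:
    """Return (level, policy) for the most severe matched word category."""
    known = template["category_levels"]
    levels = [known[c] for c in categories if c in known]
    if not levels:
        return None
    worst = max(levels, key=SEVERITY.index)
    return worst, template["level_policies"][worst]
```

Since each auditing demander supplies its own template, the same word category can lead to different penalties for different demanders.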

It can be understood that the manner of penalizing text content that includes words in a word bank is generally determined by the auditing demander according to its own circumstances. Therefore, in this embodiment, the penalty template is also obtained from the auditing demander.

S304, acquiring text information sent by two or more auditing demanders.

In this embodiment, the original lexicon and the penalty template corresponding to the identifier of the auditing demander and the text information sent by the auditing demander can be obtained through a preset standard interface or different interfaces, and the different interfaces correspond to different auditing demanders.

S305, according to the public word bank and the audit requirement word banks, storing the following in a cache space in advance, in a first data format: the auditing demander; the word bank corresponding to the auditing demander identifier; the association among the words in that word bank, the categories of the words and the identifiers of the words; and the words in that word bank together with their identifiers.

In this embodiment, before the text content is matched against the words in the word banks, the words in the word banks are stored in the cache in the first data format, so as to improve the speed and efficiency of word matching.

The first data format is a Redis cache format with a hash structure. It comprises the auditing demander, the word bank corresponding to the auditing demander identifier, the association among the words in that word bank, the word categories and the word identifiers, and the words in that word bank together with their identifiers. A group of data stored in the first data format corresponds to one word category in one word bank corresponding to one auditing demander identifier. The word identifier is used to uniquely identify a word. It should be noted that if a word belongs both to the public word bank and to an audit requirement word bank, or to several different audit requirement word banks, the word has multiple identifiers, and the word category and word bank attribute to which the word belongs can be accurately determined from each identifier.

As shown in Fig. 3c, in the first data format, the auditing demander identifier, the identifier of the word bank corresponding to the auditing demander, and the category identifier of the words in that word bank form the key of the hash-structured Redis cache format; all the words corresponding to that combination of auditing demander identifier, word bank identifier and word category identifier, and the identifiers of those words, are respectively the fields and the values of the hash-structured Redis cache format.
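The key, field and value layout described for the first data format can be mimicked with nested dictionaries, which keeps the sketch runnable without a Redis server. The colon-separated key, the function names and the sample rows are assumptions; against a real Redis instance the corresponding write and read would be the `HSET` and `HGET` commands.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Tuple[str, str, str, str, str]  # demander, bank, category, word, word id


def bank_key(demander_id: str, bank_id: str, category_id: str) -> str:
    """Hash key: one key per (demander, word bank, word category)."""
    return f"{demander_id}:{bank_id}:{category_id}"


def build_cache(rows: Iterable[Row]) -> Dict[str, Dict[str, str]]:
    """Emulate the hash layout: key -> {field (word): value (word identifier)}."""
    cache: Dict[str, Dict[str, str]] = {}
    for demander_id, bank_id, category_id, word, word_id in rows:
        key = bank_key(demander_id, bank_id, category_id)
        cache.setdefault(key, {})[word] = word_id   # like HSET key word word_id
    return cache


def lookup(cache: Dict[str, Dict[str, str]], demander_id: str, bank_id: str,
           category_id: str, word: str) -> Optional[str]:
    """Return the word identifier, or None if absent (like HGET key word)."""
    return cache.get(bank_key(demander_id, bank_id, category_id), {}).get(word)
```

Checking whether a segmented word belongs to a given word bank and category is then a single hash lookup rather than a scan over the word list.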

S306, performing word segmentation processing on the text content by using the word segmentation word bank.

In this embodiment, before the text content is matched with the words in the word bank, the word segmentation word bank is used to perform word segmentation processing on the text content. The word segmentation word bank comprises all the word banks corresponding to the two or more auditing demander identifiers, and the types of words in the word segmentation word bank comprise Chinese words, Chinese phrases, English words, English phrases and English abbreviations. The word segmentation word bank therefore contains no individual Chinese characters or individual English letters, so any individual Chinese character or individual English letter that appears in the word segmentation result of the text content is not a word in the word segmentation word bank.

Further, before word segmentation is carried out on the text content, the word segmentation word bank can be stored in the cache, and then words in the word segmentation word bank stored in the cache are directly called to carry out word segmentation on the text content, so that the word segmentation speed can be greatly improved.

S307, determining whether those word segmentation results of the text content whose word type is the same as a word type in the word segmentation word bank include a word in the word bank, stored in the first data format, that corresponds to the auditing demander identifier and has the attribute of interception; if so, executing step S312, and if not, executing step S308.

Take text information in Chinese sent by the auditing demander as an example. Based on step S306, in this embodiment every word segmentation result of the text content other than an individual Chinese character has the same type as the words in the word segmentation word bank. Therefore, this step actually determines whether the word segmentation results other than individual Chinese characters include a word in the word bank, stored in the first data format, that corresponds to the auditing demander identifier and has the attribute of interception. That is to say, in this embodiment the individual Chinese characters in the word segmentation result are not matched against the words in the word bank, which improves matching efficiency and speed.

In this embodiment, the process of matching the word segmentation result of the text content with the words in the word bank, stored in the first data format, that corresponds to the auditing demander identifier and has the attribute of interception includes the following. First, every group of data stored in the first data format that meets the following two conditions is selected: the first condition is that the auditing demander identifier is the same as the auditing demander identifier corresponding to the text content, and the second condition is that the word bank identifier is the identifier of a word bank whose attribute is interception. Then, a word segmentation result of the text content that is not an individual Chinese character is matched one by one against the words in one selected group of data; if no identical word is found, matching continues one by one against the words in another selected group of data, until an identical word is found or until matching against all the selected data is completed.
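The two-stage lookup described above (select the groups, then match within them) might look like the sketch below. It assumes a cache shaped as a dictionary from a colon-separated key to a word-to-identifier mapping; that key layout and the names `select_groups` and `first_match` are hypothetical.

```python
def select_groups(cache, demander_id, lexicon_ids):
    """Keep only groups of the given demander whose word bank is in lexicon_ids."""
    selected = {}
    for key, words in cache.items():
        key_demander, key_lexicon, _category = key.split(":")
        if key_demander == demander_id and key_lexicon in lexicon_ids:
            selected[key] = words
    return selected

def first_match(tokens, groups):
    """Return (group key, word) for the first hit, or None if nothing matches."""
    # Standalone characters are never matched against the word bank.
    candidates = [t for t in tokens if len(t) > 1]
    for key, words in groups.items():
        for token in candidates:
            if token in words:
                return key, token
    return None

cache = {
    "A:intercept_lex:abuse": {"XX": "w1"},
    "A:audit_lex:advert": {"YY": "w2"},
    "B:intercept_lex:abuse": {"ZZ": "w3"},
}
groups = select_groups(cache, "A", {"intercept_lex"})
hit = first_match(["你", "好", "XX"], groups)
```

Because each group is a hash, the membership test per token is a constant-time lookup rather than a scan over the words.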

S308, determining whether those word segmentation results of the text content whose word type is the same as a word type in the word segmentation word bank include a word in the word bank, stored in the first data format, that corresponds to the auditing demander identifier and has the attribute of audit; if so, executing step S309, and if not, executing step S311.

Similarly, in this embodiment, determining whether those word segmentation results of the text content whose word type is the same as a word type in the word segmentation word bank include a word in the word bank, stored in the first data format, that corresponds to the auditing demander identifier and has the attribute of audit amounts to determining whether the word segmentation results other than individual Chinese characters and individual English letters include such a word. The matching process in step S308 is similar to the matching process in step S307 and will not be described in detail here.

S309, determining a penalty level corresponding to the text content according to the category of the words in the word bank included in the text content and a penalty template corresponding to the identifier of the auditing demander.

In this embodiment, when the word segmentation result of the text content includes a word in the word bank that corresponds to the auditing demander identifier and has the attribute of audit, the penalty level corresponding to the text content may be determined according to the category of that word and the penalty template corresponding to the auditing demander identifier.

Illustratively, the word segmentation result of the text content is "you, good, XX", where the word "XX" is the same as the word "XX" in the word bank that corresponds to the auditing demander identifier and has the attribute of audit. The category of the word "XX" is "offending term", and the penalty level corresponding to the word category "offending term" in the penalty template corresponding to the auditing demander identifier is "severe"; the penalty level of "hello XX" is therefore determined to be "severe".
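The lookup in this example can be sketched as a mapping from word category to penalty level. The level names other than "severe", their ordering, and the rule of taking the most severe level when several categories are hit are all assumptions made for illustration; the text only gives the single-category case.

```python
# Hypothetical ordering of penalty levels, from lightest to heaviest.
PENALTY_ORDER = ["mild", "moderate", "severe"]

def penalty_level(matched_categories, template):
    """Map matched word categories to a penalty level via the demander's template."""
    levels = [template[c] for c in matched_categories if c in template]
    if not levels:
        return None
    # Assumption: when several categories match, the heaviest level applies.
    return max(levels, key=PENALTY_ORDER.index)

# Hypothetical penalty template for one auditing demander.
template = {"offending term": "severe", "advert": "mild"}
level = penalty_level(["offending term"], template)
```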

S310, displaying the words in the word bank that are included in the text content, the text content, and the penalty level corresponding to the text content, so that a worker can audit the text content and feed back the audit result to the auditing demander.

In this embodiment, when the text content is displayed to the staff for auditing, the words in the lexicon included in the text content, and the penalty level corresponding to the text content are displayed at the same time to help the auditor make a correct audit decision.

S311, executing the text processing mode corresponding to the identifier of the auditing demander.

S312, generating an interception instruction and sending the interception instruction to the auditing demander.
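The branching of steps S307 to S312 can be summarized in a short sketch. The function name, the returned action labels and the sample word banks are hypothetical; `default_mode` stands for the text processing mode configured for the auditing demander.

```python
def audit_text(tokens, intercept_words, audit_words, default_mode):
    """Return (action, matched words) following the S307 to S312 branching."""
    # Standalone characters are not matched against the word banks.
    candidates = [t for t in tokens if len(t) > 1]
    # S307: the interception word bank is checked first.
    if any(t in intercept_words for t in candidates):
        return ("intercept", [])            # S312
    # S308: the audit word bank is checked next.
    hits = [t for t in candidates if t in audit_words]
    if hits:
        return ("manual_review", hits)      # S309 and S310
    # S311: fall back to the demander's configured processing mode.
    return (default_mode, [])

result = audit_text(["你", "好", "XX"], set(), {"XX"}, "display")
```

Checking the interception word bank first means that text which must be blocked never reaches a worker's review queue.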

The embodiment of the invention provides a method for auditing a text which adds a word segmentation process for the text content and embodies the matching of the text content with the word bank as the matching of the word segmentation result of the text content with the word bank. It also adds the process of storing the word bank corresponding to the auditing demander identifier in the cache in the first data format, which improves the speed and efficiency of matching the text content with the word bank. It further adds the acquisition and storage of the original word banks, so that the text information sent by a plurality of auditing demanders is audited effectively while the storage efficiency of the system's own data is improved as far as possible. Finally, it specifies the content displayed to the worker during auditing and adds the acquisition of the penalty template, so that the worker can better grasp how the text content violates the rules.

Example four

Fig. 4 is a block diagram of an apparatus for reviewing a text according to a fourth embodiment of the present invention. As shown in fig. 4, the apparatus includes: an information acquisition module 401, a content determination module 402, and an information processing policy execution module 403, wherein:

the information acquisition module 401 is configured to acquire text information sent by two or more auditing demanders, where the text information includes an auditing demander identifier and text content;

a content determining module 402, configured to determine whether the text content includes a word in a lexicon corresponding to the identifier of the auditing demander;

and an information processing policy executing module 403, configured to execute an information processing policy matched with the attribute of the thesaurus if the text content includes a word in the thesaurus corresponding to the identifier of the auditing demander.

The embodiment of the invention provides a device for auditing texts, which first acquires text information sent by two or more auditing demanders through an information acquisition module 401, then determines whether the text content comprises words in a lexicon corresponding to an auditing demander identifier through a content determination module 402, and finally executes an information processing strategy matched with the attribute of the lexicon through an information processing strategy execution module 403 if the text content comprises the words in the lexicon corresponding to the auditing demander identifier.

The device for auditing the texts solves the technical defects that different text auditing methods are used for auditing different types of texts in the prior art, so that the software and hardware resource occupancy rate is high, the workload is large and the work repetition rate is high in the text auditing process, the high-efficiency and accurate auditing of different types of text information is realized by using the same auditing process, the text auditing work efficiency is greatly improved, and the software and hardware resource occupancy rate in the text auditing process is reduced.

On the basis of the above embodiments, the information processing policy execution module 403 may include:

the audit strategy execution unit is used for displaying the text content if the text content comprises words in a word bank and the attribute of the word bank is audit, and is used for auditing the text content by workers and feeding back an audit result to an audit demander;

the interception policy execution unit is used for generating an interception instruction and sending the interception instruction to the auditing demander if the text content comprises words in the word bank and the attribute of the word bank is interception;

a policy setting execution unit, configured to execute a text processing mode corresponding to the identifier of the auditing demander if the text content does not include the words in the thesaurus; and the text processing mode is to generate a display instruction and send the display instruction to the auditing demander, or display the text content, and is used for the staff to audit the text content and feed back an audit result to the auditing demander.

On the basis of the foregoing embodiments, the information obtaining module 401 may specifically be configured to:

acquiring text information sent by two or more auditing demanders through a pre-constructed standard interface;

the standard interface is associated with two or more auditing demanders, and a unified data transmission standard is predefined in the standard interface.
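One way to picture a unified data transmission standard is a single message shape that every auditing demander must use. The sketch below assumes JSON as the wire format and the field names `demander_id` and `content`; neither is specified in the text.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class TextInfo:
    demander_id: str  # identifier of the auditing demander
    content: str      # text content to be audited

def parse_submission(raw: str) -> TextInfo:
    """Parse one submission; a missing required field raises KeyError."""
    data = json.loads(raw)
    return TextInfo(demander_id=str(data["demander_id"]),
                    content=str(data["content"]))

info = parse_submission('{"demander_id": "A", "content": "hello XX"}')
```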

On the basis of the above embodiments, the text information may further include: the user reports the identification;

the apparatus for reviewing a text may further include:

the user reporting information determining module is used for determining whether the text information is the user reporting information according to the user reporting identification before determining whether the text content comprises the words in the word bank corresponding to the identification of the auditing demander;

the content determination module 402 may include:

the first content determining unit is used for, if the text information is not user report information, first determining whether the text content comprises words in the word bank that corresponds to the identifier of the auditing demander and has the attribute of interception, and, if it does not, continuing to determine whether the text content comprises words in the word bank that corresponds to the identifier of the auditing demander and has the attribute of audit;

and the second content determining unit is used for, if the text information is user report information, only determining whether the text content comprises words in the word bank that corresponds to the identifier of the auditing demander and has the attribute of audit.
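The difference between the two content determining units comes down to which word banks are consulted and in what order. A minimal sketch, with a hypothetical function name and sample word banks:

```python
def matched_attribute(tokens, lexicons, user_reported):
    """Return the attribute of the first word bank that matches, or None."""
    # User reports skip the interception word bank and go straight to audit.
    order = ["audit"] if user_reported else ["intercept", "audit"]
    for attribute in order:
        if any(t in lexicons[attribute] for t in tokens):
            return attribute
    return None

# Hypothetical word banks of one auditing demander, keyed by attribute.
lexicons = {"intercept": {"XX"}, "audit": {"YY"}}
normal = matched_attribute(["XX", "YY"], lexicons, user_reported=False)
reported = matched_attribute(["XX", "YY"], lexicons, user_reported=True)
```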

The apparatus for reviewing a text may further include:

the word segmentation module is used for performing word segmentation processing on the text content by using a word segmentation word bank before determining whether the text content comprises words in the word bank corresponding to the verification demand party identification, wherein the word segmentation word bank comprises all word banks corresponding to two or more verification demand party identifications, and the types of the words in the word segmentation word bank comprise Chinese words, Chinese phrases, English words, English phrases and English abbreviations;

the content determination module 402 may be specifically configured to:

and determining whether the word segmentation result with the same word type as the word type in the word segmentation word bank in the word segmentation result of the text content comprises the word in the word bank corresponding to the identifier of the auditing requiring party.

The apparatus for reviewing a text may further include:

the data storage module is used for storing in advance, in a cache space and according to a first data format, the association among the auditing demander identifier, the word bank corresponding to the auditing demander identifier, the category of the words in that word bank, and the words in that word bank together with their word identifiers, before determining whether the text content comprises words in the word bank corresponding to the auditing demander identifier, wherein one group of data stored in the first data format corresponds to the words of one category in one word bank corresponding to the auditing demander identifier;

the first data format is a redis cache format with a hash structure, in which the identifier of the auditing demander, the identifier of the word bank corresponding to the auditing demander identifier and the category identifier of the words in that word bank form the key; the words and the identifiers of the words that correspond to that auditing demander identifier, word bank identifier and category identifier are, respectively, the field and the value of the hash-structured redis cache format;

the content determination module 402 may be specifically configured to:

and determining whether the word segmentation result with the same word type as the word type in the word segmentation word bank in the word segmentation result of the text content comprises the word in the word bank corresponding to the verification demand party identifier stored in the first data format.

The apparatus for reviewing a text may further include:

the word bank obtaining module is used for obtaining the original word banks corresponding to the two or more identification of the auditing demander before obtaining the text information sent by the two or more auditing demanders;

the word bank storage module is used for storing the words with the same category of the words and the same attribute of the word bank in the original word bank corresponding to the obtained identifier of the auditing demander into a public word bank; storing the words which are not stored in the public word bank in the word bank corresponding to the obtained identifier of the auditing demander into the word bank corresponding to the identifier of the auditing demander;

the data storage module may specifically be configured to:

storing in advance in the cache space, on the basis of the public word bank and the auditing demand word bank and according to the first data format, the association among the auditing demander identifier, the word bank corresponding to the auditing demander identifier, the category of the words in that word bank, and the words in that word bank together with their word identifiers.
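The split into a public word bank and per-demander word banks can be sketched as follows. The sketch assumes that an entry counts as shared when at least two auditing demanders supply the same word with the same category and the same word bank attribute; the text does not state the exact threshold, and the function name and sample data are hypothetical.

```python
from collections import Counter

def split_lexicons(original):
    """original: {demander_id: set of (word, category, attribute)}."""
    # Count in how many demanders' original word banks each entry appears.
    counts = Counter(entry for entries in original.values() for entry in entries)
    # Assumption: entries supplied by two or more demanders are stored once, publicly.
    public = {entry for entry, n in counts.items() if n >= 2}
    # Everything else stays in the demander's own word bank.
    own = {d: entries - public for d, entries in original.items()}
    return public, own

original = {
    "A": {("XX", "abuse", "intercept"), ("YY", "advert", "audit")},
    "B": {("XX", "abuse", "intercept"), ("ZZ", "advert", "audit")},
}
public, own = split_lexicons(original)
```

Storing shared entries once is what avoids keeping duplicate copies of the same word for every auditing demander.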

The audit policy enforcement unit may include:

a penalty grade determining subunit, configured to determine, if the text content includes words in a word bank, and an attribute of the word bank is audit, a penalty grade corresponding to the text content according to a category of the words in the word bank included in the text content and a penalty template corresponding to an identifier of an audit demander;

the audit data display subunit is used for displaying words and phrases in the word bank included in the text content, the text content and the penalty level corresponding to the text content, and is used for auditing the text content by a worker and feeding back an audit result to an audit demander;

wherein the penalty template comprises penalty policies corresponding to different penalty levels.

The apparatus for reviewing a text may further include:

and the penalty template acquisition module is used for acquiring penalty templates respectively corresponding to the two or more identification of the auditing demander before acquiring the text information sent by the two or more auditing demanders.

The device for auditing the text provided by the embodiment of the invention can be used for executing the method for auditing the text provided by any embodiment of the invention, has corresponding functional modules and realizes the same beneficial effect.

Example five

Fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention, as shown in fig. 5, the electronic device includes a processor 50, a memory 51, an input device 52, and an output device 53; the number of the processors 50 in the electronic device may be one or more, and one processor 50 is taken as an example in fig. 5; the processor 50, the memory 51, the input device 52 and the output device 53 in the electronic apparatus may be connected by a bus or other means, and the bus connection is exemplified in fig. 5.

The memory 51, which is a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as modules corresponding to the method of reviewing a text in the embodiment of the present invention (for example, the information acquisition module 401, the content determination module 402, and the information processing policy execution module 403). The processor 50 executes various functional applications and data processing of the electronic device by executing software programs, instructions and modules stored in the memory 51, namely, implements the method for auditing texts.

The memory 51 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 51 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 51 may further include memory located remotely from the processor 50, which may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 52 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic apparatus. The output device 53 may include a display device such as a display screen.

Example six

An embodiment of the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method for reviewing text, the method comprising:

Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the method for reviewing a text provided by any embodiment of the present invention.

From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, an electronic device, or a network device) to execute the methods according to the embodiments of the present invention.

It should be noted that, in the embodiment of the apparatus for reviewing a text, the included units and modules are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A method for reviewing text, comprising:

and if the text content comprises words in the word bank and the attribute of the word bank is audit, displaying the text content for the staff to audit the text content and feed back the audit result to the audit demander.

2. The method of claim 1, further comprising:

and if the text content comprises the words in the word bank and the attribute of the word bank is interception, generating an interception instruction and sending the interception instruction to the auditing demander.

3. The method of claim 1, further comprising:

if the text content does not include the words in the word bank, executing a text processing mode corresponding to the identification of the auditing demander;

and the text processing mode is to generate a display instruction and send the display instruction to the auditing demander, or display the text content, and is used for the staff to audit the text content and feed back an audit result to the auditing demander.

4. The method according to claim 1, wherein the acquiring the text information sent by two or more auditing demanders comprises:

wherein the standard interface is associated with the two or more audit demanders and a unified data transmission standard is predefined in the standard interface.

5. The method according to any one of claims 1-4, wherein the text message further comprises a user report identifier;

before the determining whether the text content includes the word in the word bank corresponding to the identifier of the auditing demander, the method further includes:

determining whether the text information is user reporting information according to the user reporting identification;

the determining whether the text content includes a word in a word bank corresponding to the identifier of the auditing demander includes:

if the text information is not user report information, first determining whether the text content comprises words in the word bank that corresponds to the auditing demander identifier and has the attribute of interception, and, if it does not, continuing to determine whether the text content comprises words in the word bank that corresponds to the auditing demander identifier and has the attribute of audit;

and if the text information is user report information, only determining whether the text content comprises words in the word bank that corresponds to the auditing demander identifier and has the attribute of audit.

6. The method according to any one of claims 1 to 4, wherein if the text content includes words in the thesaurus and the attribute of the thesaurus is audit, displaying the text content for a worker to audit the text content and feed back an audit result to the audit demander, including:

if the text content comprises words in the word bank and the attribute of the word bank is audit, determining a penalty level corresponding to the text content according to the category of the words in the word bank and a penalty template corresponding to the identifier of the audit demander, wherein the category of the words in the word bank is contained in the text content;

displaying words in the word bank, the text content and the penalty level corresponding to the text content, wherein the words are included in the text content, and the penalty level is used for a worker to review the text content and feed back a review result to the review demander;

7. The method of claim 6, before acquiring the text messages sent by two or more audit demanders, further comprising:

and acquiring penalty templates respectively corresponding to the two or more identifiers of the auditing demander.

8. An apparatus for reviewing text, comprising:

and the information processing strategy execution module is used for displaying the text content if the text content comprises words in the word bank and the attribute of the word bank is audit, and is used for auditing the text content by workers and feeding back the audit result to the audit demander.

9. An electronic device, characterized in that the electronic device comprises:

one or more processors;

storage means for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method of reviewing text as recited in any one of claims 1-7.

10. A storage medium containing computer executable instructions for performing a method of reviewing text as recited in any one of claims 1-7 when executed by a computer processor.