CN110674255B

CN110674255B - Text content auditing method and device

Info

Publication number: CN110674255B
Application number: CN201910904584.8A
Authority: CN
Inventors: 吴红; 张亦驰; 向钊豫; 欧阳潘义
Original assignee: Hunan MgtvCom Interactive Entertainment Media Co Ltd
Current assignee: Hunan MgtvCom Interactive Entertainment Media Co Ltd
Priority date: 2019-09-24
Filing date: 2019-09-24
Publication date: 2022-08-26
Anticipated expiration: 2039-09-24
Also published as: CN110674255A

Abstract

The invention provides a text content auditing method comprising the following steps: when a text auditing request is received, the text content is acquired, and each item of word data in the text content is matched against each item of keyword data in a database. If any word data matches, the text content is determined to be negative text, and a first auditing result is generated and sent to the client. If no word data matches, the text content is input into an AI auditing model for auditing, the auditing parameters output by the AI auditing model are acquired, and a second auditing result is generated from the auditing parameters and sent to the client; the auditing parameters are used to determine whether the text content is negative text. By applying the method, deep auditing by the AI auditing model supplements matching against the keyword data in the database, which improves the auditing precision for text content and makes it possible to determine accurately whether the text content is negative text.

Description

Text content auditing method and device

Technical Field

The invention relates to the technical field of information processing, and in particular to a text content auditing method and device.

Background

With the development of the internet and the growing number of network users, people increasingly rely on the internet to spread all kinds of information, such as news comments, video bullet comments (danmaku), forum posts, microblogs, and blogs. Users can speak on a wide variety of network platforms. However, as the network has become more popular, some uncivil users spread abusive negative comments online, or publish "irrigation" text (flooding or spam posts) that has no reference or reading value. Such content can lead other users to follow suit in spreading negative comments, and it degrades the network environment.

In the prior art, large network media operators review the various text contents published by users, usually judging by keyword matching whether a user's text is negative text or irrigation text. However, keyword matching alone cannot fully recognize the negative words carried in text content. When making a statement, a user may replace a negative phrase with a homophonic phrase (for example, "WeChat" written as its homophone "Wei Xin"), or may separate the characters of a single phrase with special symbols. In such cases, the auditing process cannot correctly determine whether the text content carries negative words.

Disclosure of Invention

In view of this, the present invention provides a text content auditing method that matches each item of word data in the text content against each item of keyword data in a database to determine whether the text content contains a negative word or phrase. If no word data matches, an AI auditing model performs a deep audit of the text content, which improves auditing accuracy and makes it possible to determine accurately whether the text content is negative text.

The invention also provides a text content auditing device for implementing and applying the method in practice.

A text content auditing method comprises the following steps:

when a text auditing request sent by a client is received, acquiring text contents contained in the text auditing request;

determining the word data corresponding to the text content, and matching each item of word data against each item of keyword data in a pre-established database, wherein the word data comprise the words and phrases in the text content, the word pinyin corresponding to each word, and the phrase pinyin corresponding to each phrase, and the keyword data are the preset keywords and the keyword pinyin corresponding to each keyword;

when any word data matches keyword data in the database, determining that the text content is negative text, generating a first auditing result corresponding to the negative text, and sending the first auditing result to the client;

when no word data matches keyword data in the database, inputting the text content into a pre-trained AI auditing model, and triggering the AI auditing model to audit the text content;

and upon receiving the auditing parameters output by the AI auditing model for the text content, generating a second auditing result according to the auditing parameters and sending the second auditing result to the client, wherein the auditing parameters comprise an auditing parameter for normal text, an auditing parameter for negative text, and an auditing parameter for irrigation text.
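The two-stage flow claimed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `audit_text`, `AuditResult`, and the integer labels are hypothetical, the word data and keyword data are assumed to be precomputed plain strings (words, phrases, and their pinyin), and the AI auditing model is stood in for by any callable that returns an auditing parameter.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable, Set

# Hypothetical integer labels for the three auditing parameters in the claim.
NORMAL, NEGATIVE, IRRIGATION = 1, 2, 3


@dataclass
class AuditResult:
    stage: str       # "keyword" -> first auditing result, "model" -> second
    text_type: int   # NORMAL, NEGATIVE, or IRRIGATION
    audited_at: str  # ISO-8601 time at which the text was audited


def audit_text(
    text: str,
    word_data: Iterable[str],
    keyword_data: Set[str],
    ai_model: Callable[[str], int],
) -> AuditResult:
    """Keyword/pinyin matching first; fall back to the model only on no hit."""
    now = datetime.now(timezone.utc).isoformat()
    # Stage 1: any word, phrase, or pinyin equal to a keyword or keyword pinyin.
    if any(item in keyword_data for item in word_data):
        return AuditResult("keyword", NEGATIVE, now)
    # Stage 2: no literal or pinyin hit, so defer to the (stand-in) AI model.
    return AuditResult("model", ai_model(text), now)
```

For instance, with keyword data `{"笨蛋", "bendan"}`, the word data of "你笨蛋" hit in stage 1, while "你好" falls through to the model. Keeping the cheap set lookup first means the model is only invoked for texts the keyword database cannot settle.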

Optionally, determining the word data corresponding to the text content and matching each item of word data against each item of keyword data in the pre-established database includes:

calling a preset word segmentation module to extract a plurality of words and phrases from the text content;

determining word pinyin corresponding to each word and phrase pinyin corresponding to each phrase;

and matching each word and each phrase with each keyword in the database respectively, and matching the word pinyin corresponding to each word and the phrase pinyin corresponding to each phrase with the keyword pinyin in the database respectively.

Optionally, matching the words and phrases against the keywords in the database, and matching the word pinyin of each word and the phrase pinyin of each phrase against the keyword pinyin in the database, includes:

sorting the words, phrases, word pinyin, and phrase pinyin according to their order in the text content;

matching each word and phrase against the keywords in the database one by one in the sorted order;

determining, for each word or phrase being matched, whether it successfully matches a keyword;

and if none of the words or phrases successfully matches a keyword, matching the word pinyin and phrase pinyin against the keyword pinyin in the database one by one in the sorted order.

Optionally, inputting the text content into the pre-trained AI auditing model and triggering the AI auditing model to audit the text content includes:

deleting the symbols carried in the text content to obtain a text to be audited corresponding to the text content;

inputting the text to be audited into a preset vector converter and triggering the vector converter to convert it, obtaining a coding vector corresponding to the text to be audited;

and acquiring a first auditing model and a second auditing model in the AI auditing model, inputting the coding vector into each of them, and triggering the first and second auditing models to audit the coding vector and generate first auditing data and second auditing data respectively, so that the AI auditing model generates the auditing parameters for the text content from the first auditing data and the second auditing data.
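The three sub-steps above can be sketched as follows. Everything beyond the step order is an assumption for illustration: the document does not specify the vector converter, the architecture of the two sub-models, or how their outputs are combined, so `encode` is a toy character-id encoder, the sub-models are arbitrary callables returning three scores, and the outputs are fused by simple averaging.

```python
from typing import Callable, Dict, List, Sequence

# Scores from one sub-model, ordered as (normal, negative, irrigation).
SubModel = Callable[[List[int]], Sequence[float]]


def strip_symbols(text: str) -> str:
    """Delete punctuation, spaces, and other symbols; keep letters, digits, CJK."""
    return "".join(ch for ch in text if ch.isalnum())


def encode(text: str, vocab: Dict[str, int], max_len: int = 8) -> List[int]:
    """Hypothetical vector converter: character ids, truncated and zero-padded."""
    ids = [vocab.get(ch, 1) for ch in text[:max_len]]  # 1 = unknown character
    return ids + [0] * (max_len - len(ids))            # 0 = padding


def ai_audit(text: str, vocab: Dict[str, int], first: SubModel, second: SubModel) -> int:
    """Return an auditing parameter: 1 normal, 2 negative, 3 irrigation."""
    vector = encode(strip_symbols(text), vocab)
    first_data = first(vector)    # first auditing data
    second_data = second(vector)  # second auditing data
    # Assumed fusion rule: average the two score vectors, then take the argmax.
    combined = [(a + b) / 2 for a, b in zip(first_data, second_data)]
    return combined.index(max(combined)) + 1
```

Averaging is only one possible fusion rule; a weighted sum or a small learned layer over both outputs would fit the same description.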

The above method, optionally, further includes:

adding the first auditing result or the second auditing result to a preset task list, wherein the task list comprises the auditing results of previously audited historical text contents;

when a task query request sent by a user is received, sending the task list to a preset display interface, so that the user can view the auditing result of each historical text content in the task list through the display interface.
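A minimal sketch of the task list, assuming an in-memory store; a production system would more likely persist the records in a database. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskList:
    """Hypothetical in-memory task list of historical auditing results."""
    records: List[Dict[str, str]] = field(default_factory=list)

    def update(self, text: str, result: str) -> None:
        # Append the first or second auditing result for one audited text.
        self.records.append({"text": text, "result": result})

    def query(self) -> List[Dict[str, str]]:
        # Handle a task query request: return copies for the display interface,
        # so the caller cannot alter the stored history.
        return [dict(r) for r in self.records]
```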

Optionally, the above method further includes a training process for the AI auditing model, comprising:

acquiring a pre-stored training data set, wherein the training data set comprises labeled training data;

applying the training data in sequence to train an initial auditing model until the model parameters of the initial auditing model meet a preset training condition;

wherein, each time training data is input into the initial auditing model: obtaining the current auditing parameters corresponding to the training data currently input; calling a preset loss function to compute a loss function value from the current auditing parameters and the label of that training data; determining, according to the loss function value, whether the model parameters of the initial auditing model meet the training condition; if not, adjusting the model parameters of the initial auditing model according to the loss function value; and if so, taking the initial auditing model as the AI auditing model.
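The training loop above can be illustrated with a deliberately small stand-in. The document describes a deep neural network; to keep the sketch self-contained, a softmax-regression classifier over numeric feature vectors takes its place, cross-entropy is assumed as the preset loss function, and the training condition is assumed to be an average loss below a threshold, checked once per pass over the data (with an epoch cap). Labels 0, 1, and 2 correspond to auditing parameters 1, 2, and 3.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, label 0..n_classes-1)


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def scores(w: List[List[float]], x: Sequence[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def train(data: List[Example], n_features: int, n_classes: int = 3,
          lr: float = 0.5, target_loss: float = 0.1, max_epochs: int = 500):
    """Softmax-regression stand-in for the initial auditing model."""
    w = [[0.0] * n_features for _ in range(n_classes)]  # model parameters
    for _ in range(max_epochs):
        total = 0.0
        for x, y in data:                      # input training data in sequence
            p = softmax(scores(w, x))          # current auditing parameters
            total += -math.log(p[y])           # cross-entropy against the label
            for c in range(n_classes):         # adjust parameters by the gradient
                g = p[c] - (1.0 if c == y else 0.0)
                for j in range(n_features):
                    w[c][j] -= lr * g * x[j]
        if total / len(data) < target_loss:    # training condition met
            break
    return w


def predict(w: List[List[float]], x: Sequence[float]) -> int:
    s = scores(w, x)
    return s.index(max(s))
```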

A text content auditing apparatus comprising:

an acquisition unit, configured to acquire the text content contained in a text auditing request when the text auditing request sent by a client is received;

a matching unit, configured to determine the word data corresponding to the text content and match each item of word data against each item of keyword data in a pre-established database, wherein the word data comprise the words and phrases in the text content, the word pinyin corresponding to each word, and the phrase pinyin corresponding to each phrase, and the keyword data are the preset keywords and the keyword pinyin corresponding to each keyword;

a first generating unit, configured to determine that the text content is negative text when any word data matches keyword data in the database, generate a first auditing result corresponding to the negative text, and send the first auditing result to the client;

an auditing unit, configured to input the text content into a pre-trained AI auditing model when no word data matches keyword data in the database, and trigger the AI auditing model to audit the text content;

and a second generating unit, configured to generate a second auditing result according to the auditing parameters upon receiving the auditing parameters output by the AI auditing model for the text content, and send the second auditing result to the client, wherein the auditing parameters comprise an auditing parameter for normal text, an auditing parameter for negative text, and an auditing parameter for irrigation text.

Optionally, in the above apparatus, the matching unit includes:

an extraction subunit, configured to call a preset word segmentation module to extract a plurality of words and phrases from the text content;

a determining subunit, configured to determine the word pinyin corresponding to each word and the phrase pinyin corresponding to each phrase;

and a first matching subunit, configured to match each word and each phrase against each keyword in the database, and match the word pinyin of each word and the phrase pinyin of each phrase against the keyword pinyin in the database.

Optionally, in the above apparatus, the matching unit includes:

a sorting subunit, configured to sort the words, phrases, word pinyin, and phrase pinyin according to their order in the text content;

a second matching subunit, configured to match each word and each phrase against the keywords in the database one by one in the sorted order;

a judging subunit, configured to determine whether each word or phrase being matched successfully matches a keyword;

and a third matching subunit, configured to match the word pinyin and phrase pinyin against the keyword pinyin in the database one by one in the sorted order if none of the words or phrases successfully matches a keyword.

Optionally, in the above apparatus, the auditing unit includes:

a deleting subunit, configured to delete the symbols carried in the text content to obtain a text to be audited corresponding to the text content;

a conversion subunit, configured to input the text to be audited into a preset vector converter and trigger the vector converter to convert it, obtaining a coding vector corresponding to the text to be audited;

and an auditing subunit, configured to acquire a first auditing model and a second auditing model in the AI auditing model, input the coding vector into each of them, and trigger the first and second auditing models to audit the coding vector and generate first auditing data and second auditing data respectively, so that the AI auditing model generates the auditing parameters for the text content from the first auditing data and the second auditing data.

A storage medium, comprising stored instructions, wherein when the instructions are executed, a device on which the storage medium is located is controlled to execute the above text content auditing method.

An electronic device, comprising a memory and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform the above text content auditing method.

Compared with the prior art, the invention has the following advantages:

the invention provides a text content auditing method comprising the following steps: when a text auditing request sent by a client is received, the text content contained in the request is acquired, and each item of word data in the text content is matched against each item of keyword data in a database. If any word data matches, the text content is determined to be negative text, and a first auditing result is generated and sent to the client. If no word data matches, the text content is input into an AI auditing model for auditing, the auditing parameters output by the AI auditing model are acquired, and a second auditing result is generated from them and sent to the client; the auditing parameters are used to determine whether the text content is negative text. With the method provided by the invention, deep auditing by the AI auditing model supplements matching against the keyword data in the database, which improves auditing accuracy and makes it possible to determine accurately whether the text content is negative text.

Drawings

To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a text content auditing method according to an embodiment of the present invention;

Fig. 2 is a flowchart of another text content auditing method according to an embodiment of the present invention;

Fig. 3 is a flowchart of yet another text content auditing method according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a text content auditing method according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of another text content auditing method according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of yet another text content auditing method according to an embodiment of the present invention;

Fig. 7 is a structural diagram of a text content auditing device according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

In this application, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions, and the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The invention can operate in numerous general-purpose or special-purpose computing device environments or configurations, for example personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor apparatus, and distributed computing environments that include any of the above devices or equipment.

An embodiment of the invention provides a text content auditing method that can be applied to various system platforms. The execution subject of the method may be a computer terminal or the processor of a scheduling cluster in various mobile devices. The flowchart of the method is shown in Fig. 1, and the method specifically comprises the following steps:

S101: when a text auditing request sent by a client is received, acquiring the text content contained in the text auditing request;

In the embodiment of the invention, when a user publishes text content such as a comment or an original creation in the client, the client sends the published text content to the processor in the form of a text auditing request. When the processor receives the text auditing request sent by the client, it acquires the text content contained in the request.

It should be noted that the text content may be a single word, a phrase, or a text composed of multiple words and phrases.

S102: determining the word data corresponding to the text content, and matching each item of word data against each item of keyword data in a pre-established database, wherein the word data comprise the words and phrases in the text content, the word pinyin corresponding to each word, and the phrase pinyin corresponding to each phrase, and the keyword data are the preset keywords and the keyword pinyin corresponding to each keyword;

In the embodiment of the present invention, the word data corresponding to the text content are determined. The word data include each word and phrase of the text content, the word pinyin corresponding to each word, and the phrase pinyin corresponding to each phrase. For example, if the text content is "you are really excellent", the word data are "you", "really", "very", "excellent", "ni", "zhende", "hen", "youxiu". After the word data are determined, each item of word data is matched against each item of keyword data in the database. The keyword data in the database likewise comprise a plurality of keywords and the keyword pinyin corresponding to each keyword.

It should be noted that each keyword in the database may be a negative word. Matching against these keywords determines whether the text content sent by the client contains a negative word, and hence whether it is negative text. Negative text contains abusive, insulting, or uncivil words.

S103: when any word data matches keyword data in the database, determining that the text content is negative text, generating a first auditing result corresponding to the negative text, and sending the first auditing result to the client;

In the embodiment of the present invention, if during matching some word data matches keyword data in the database, the text content contains negative words. The text content is therefore determined to be negative text, and a first auditing result is generated; the first auditing result indicates that the text content is negative text and includes parameters such as the time at which the text content was audited. The first auditing result is sent to the client, which can then revoke or block the text content.

S104: when no word data matches keyword data in the database, inputting the text content into a pre-trained AI auditing model, and triggering the AI auditing model to audit the text content;

In the embodiment of the present invention, if after matching is completed none of the word data matches keyword data in the database, it may be preliminarily concluded that the text content contains no negative vocabulary. The text content is then input into the pre-trained AI auditing model, which is triggered to perform a deep audit to determine whether negative words or negative semantics are present in the text content.

S105: upon receiving the auditing parameters output by the AI auditing model for the text content, generating a second auditing result according to the auditing parameters and sending the second auditing result to the client, wherein the auditing parameters comprise an auditing parameter for normal text, an auditing parameter for negative text, and an auditing parameter for irrigation text.

In the embodiment of the invention, when the auditing parameters output by the AI auditing model are received, a second auditing result is generated from them, and the text content is determined to be normal text, negative text, or irrigation text according to the auditing parameters. Optionally, in addition to negative, normal, and irrigation text, the AI auditing model may recognize other text types, such as foreign-language text and numeric text.

In the text content auditing method provided by the embodiment of the invention, when a text auditing request sent by a client is received, the text content is acquired, and each item of word data corresponding to the text content is matched against each item of keyword data in the pre-established database. The word data comprise the words and phrases of the text content, the word pinyin of each word, and the phrase pinyin of each phrase. Likewise, the database stores a large amount of keyword data, namely keywords and the keyword pinyin corresponding to each keyword. Because every keyword is a negative word, matching the word data against the keyword data determines whether the text content contains negative vocabulary. If it does, that is, if any word data matches keyword data in the database, the text content is determined to be negative text. A first auditing result is generated and sent to the client, so that the client can revoke or block the text content according to it. If no word data matches any keyword data in the database, the text content probably does not carry negative vocabulary. However, the keyword data in the database may not be comprehensive; the individual words may not be negative while the semantics of the text content are; or the text content may be irrigation text with no practical meaning. For example, the text content ",,. . ." has no practical meaning and is therefore irrigation text.

When none of the word data corresponding to the text content matches, the text content is input into the pre-trained AI auditing model, which is triggered to audit the text content and determine its text type, for example whether it is negative text. When the AI auditing model outputs the auditing parameters, a second auditing result is generated according to them. The auditing parameters indicate whether the text content is normal text, negative text, or irrigation text, and comprise an auditing parameter for normal text, one for negative text, and one for irrigation text. For example, an auditing parameter of 1 indicates normal text, 2 indicates negative text, and 3 indicates irrigation text. The second auditing result indicates the text type of the text content and includes parameters such as the time at which the text content was audited.
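Using the example encoding given above (1, 2, 3), generating the second auditing result reduces to a lookup plus a timestamp. The function and field names below are hypothetical; only the 1/2/3 mapping and the inclusion of the audit time come from the description.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

# Example encoding from the description: 1 normal, 2 negative, 3 irrigation.
TEXT_TYPES: Dict[int, str] = {1: "normal", 2: "negative", 3: "irrigation"}


def second_audit_result(audit_parameter: int,
                        now: Optional[datetime] = None) -> Dict[str, str]:
    """Turn the model's auditing parameter into a result for the client."""
    if audit_parameter not in TEXT_TYPES:
        raise ValueError(f"unknown auditing parameter: {audit_parameter}")
    audited_at = now or datetime.now(timezone.utc)  # injectable for testing
    return {"text_type": TEXT_TYPES[audit_parameter],
            "audited_at": audited_at.isoformat()}
```

Rejecting unknown parameters explicitly keeps a model that emits an unexpected label (for example, a future "foreign-language" class) from being silently reported as one of the three known types.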

It should be noted that the AI auditing model may be a deep neural network model. Through deep learning, the deep neural network model can classify various text contents and determine the text type of each. The deep neural network model implements a natural language processing (NLP) auditing algorithm, through which it audits the various text contents.

It should also be noted that the text content may be short text. Optionally, when the text content is long, it may be split into multiple short texts that are audited one by one; if any of the short texts is negative text, the corresponding long text is also negative text.
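A sketch of the long-text rule, assuming short texts are obtained by splitting at sentence-ending punctuation (the document does not say how to split) and that the hypothetical `is_negative` callable wraps the full short-text audit. Splitting at punctuation, rather than at a fixed character count, keeps ordinary sentences intact.

```python
import re
from typing import Callable, List

# Assumed split points: Chinese and ASCII sentence-ending punctuation and newlines.
_SENTENCE_END = re.compile(r"[。！？!?；;\n]+")


def split_long_text(text: str) -> List[str]:
    """Split a long text into short texts, dropping empty pieces."""
    return [piece for piece in _SENTENCE_END.split(text) if piece.strip()]


def long_text_is_negative(text: str, is_negative: Callable[[str], bool]) -> bool:
    # any() stops at the first negative short text, so later pieces are skipped.
    return any(is_negative(short) for short in split_long_text(text))
```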

Optionally, the pre-established database contains a plurality of items of keyword data, namely the keywords and the keyword pinyin corresponding to each keyword. The database can be updated in real time with the latest keywords, storing the keyword pinyin of each newly added keyword. The keywords may specifically be words or phrases that carry abusive or uncivil semantics, that is, negative words. Keeping the keyword data in the database up to date raises the match rate of the word data and allows negative text to be identified effectively.

Based on the method provided by the above embodiment, the text content auditing process can be applied when a user posts a video bullet comment (danmaku). A specific example follows:

When a user watching video A sends the bullet comment "This actor's acting is very good", the client sends a text auditing request corresponding to the bullet comment to the processor. After receiving the request, the processor acquires the bullet comment "This actor's acting is very good" and determines its word data: "this", "actor", "acting skill", "very", "good", "zhege", "yanyuan", "yanji", "hen", "hao". It then matches each item of word data against each item of keyword data in the database. Since none of them matches, the bullet comment is input into the AI auditing model. When the AI auditing model outputs auditing parameter 1, representing normal text, the bullet comment sent by the user is determined to be normal text. An auditing result is generated from the auditing parameter and sent to the client, which can then display the user's bullet comment in the bullet comment display area of video A according to the auditing result.

By applying the method provided by the embodiment of the invention, after the text content is acquired, the word data corresponding to the text content are matched against the keyword data in the database to make a preliminary determination of whether the text content is negative text. When no word data matches, the text content is audited by the AI auditing model, which further determines whether it is normal text, negative text, or irrigation text. Auditing the text content through the AI auditing model improves auditing accuracy and makes it possible to determine correctly whether the text content is negative text.
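A minimal sketch of a keyword database that can be updated at runtime, assuming keyword pinyin is stored as toneless syllables joined without spaces (matching the description's examples such as "zhende"). The pinyin converter is injected; in practice it might come from a third-party library such as pypinyin, but the class name and the toy table used with it are purely illustrative.

```python
from typing import Callable, Set


class KeywordDatabase:
    """Hypothetical updatable store of keywords and their keyword pinyin."""

    def __init__(self, to_pinyin: Callable[[str], str]) -> None:
        self._to_pinyin = to_pinyin
        self.keywords: Set[str] = set()
        self.keyword_pinyins: Set[str] = set()

    def add(self, keyword: str) -> None:
        # Store a new keyword together with its pinyin so both lookups stay in sync.
        self.keywords.add(keyword)
        self.keyword_pinyins.add(self._to_pinyin(keyword))

    def matches_word(self, item: str) -> bool:
        return item in self.keywords

    def matches_pinyin(self, item: str) -> bool:
        return item in self.keyword_pinyins
```

Removal is deliberately left out: two different keywords can share the same toneless pinyin, so deleting one keyword's pinyin safely would require reference counting.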

In the method provided by the embodiment of the present invention, the process in step S102 of determining, after the text content is acquired, the word data corresponding to the text content and matching each item of word data against each item of keyword data in the database is shown in Fig. 2, and specifically includes:

S201: calling a preset word segmentation module to extract a plurality of words and phrases from the text content;

In the embodiment of the invention, after the text content is acquired, a preset word segmentation module extracts a plurality of words and phrases from it; that is, the text content is divided into a plurality of words and phrases.

It should be noted that the word segmentation module may be the jieba Chinese word segmentation module, which divides a Chinese sentence into a plurality of words and phrases.
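jieba's default mode builds a graph of candidate words from a prefix dictionary and picks the most probable segmentation, using an HMM for words not in its dictionary. To keep the example dependency-free, the sketch below substitutes a much simpler technique, forward maximum matching against a small hypothetical dictionary. It is not how jieba works; it only illustrates what the segmentation step produces.

```python
from typing import List, Set


def fmm_segment(text: str, dictionary: Set[str], max_word_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest dictionary word at each position."""
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words
```

With a dictionary containing 这个, 演员, and 演技, the bullet comment 这个演员演技很好 segments into 这个 / 演员 / 演技 / 很 / 好, the same units whose pinyin (zhege, yanyuan, yanji, hen, hao) appears in the bullet comment example.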

S202: determining word pinyin corresponding to each word and phrase pinyin corresponding to each phrase;

In the embodiment of the invention, after the words and phrases in the text content are extracted by the word segmentation module, each word and phrase is annotated with its pinyin, which determines the word pinyin of each word and the phrase pinyin of each phrase.

S203: matching each word and each phrase against each keyword in the database, and matching the word pinyin of each word and the phrase pinyin of each phrase against the keyword pinyin in the database.

In the embodiment of the invention, each word and phrase is matched against each keyword in the database, and likewise the word pinyin and phrase pinyin are matched against each keyword pinyin in the database.

In the text content auditing method provided by the embodiment of the invention, the word segmentation module extracts the words and phrases, and the word pinyin and phrase pinyin of each are determined. Each word and phrase is matched against the keywords in the database, and the word pinyin and phrase pinyin are matched against the keyword pinyin of each keyword. This prevents an uncivil user from evading the audit by replacing a negative word with a homophone, so that the text content can be audited accurately.
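The homophone defence can be illustrated with a homophone pair like the one in the background: 威信 used as a stand-in for 微信 (WeChat), both read "weixin". The pinyin table below is a toy, hypothetical one, and pinyin is assumed to be toneless syllables joined without spaces; a real system needs a complete dictionary.

```python
from typing import Dict, Iterable, List, Set

# Toy, hypothetical pinyin table covering only the characters used here.
PINYIN: Dict[str, str] = {"微": "wei", "威": "wei", "信": "xin", "加": "jia", "我": "wo"}


def to_pinyin(word: str) -> str:
    """Toneless pinyin, syllables joined without spaces (e.g. 威信 -> weixin)."""
    return "".join(PINYIN.get(ch, ch) for ch in word)


def build_word_data(words: Iterable[str]) -> List[str]:
    words = list(words)
    return words + [to_pinyin(w) for w in words]


def build_keyword_data(keywords: Iterable[str]) -> Set[str]:
    keywords = list(keywords)  # materialize so an iterator is not consumed twice
    return set(keywords) | {to_pinyin(k) for k in keywords}


def hits(words: Iterable[str], keywords: Iterable[str]) -> List[str]:
    keyword_data = build_keyword_data(keywords)
    return [item for item in build_word_data(words) if item in keyword_data]
```

Here `hits(["加", "我", "威信"], ["微信"])` finds no literal match but hits on "weixin". The trade-off of toneless matching is that innocent words sharing a keyword's pinyin are flagged too, which is one reason the AI model stage remains useful.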

In the method provided in the embodiment of the present invention, based on step S203, when matching each word and each phrase with each keyword in the database, and matching the word pinyin corresponding to each word and the phrase pinyin corresponding to each phrase with the keyword pinyin in the database, the method specifically includes:

sequentially matching each word and each phrase with each keyword in the database in the sorted order;

and if none of the words and phrases matches any keyword, sequentially matching the word pinyin and the phrase pinyin with each keyword pinyin in the database in the sorted order.

In the text content auditing method provided by the embodiment of the invention, after the words and phrases have been extracted by the word segmentation module and the word pinyin of each word and the phrase pinyin of each phrase have been determined, the words, phrases, word pinyin and phrase pinyin are sorted. The order follows their order in the text content, with the words and phrases placed first and the word pinyin and phrase pinyin after them. The sorted words and phrases are first matched one by one against each keyword in the database, and after each comparison it is judged whether the current word or phrase matched successfully. If none of the words and phrases matches, the word pinyin and phrase pinyin are matched one by one, in the sorted order, against the keyword pinyin in the database.

Optionally, based on the method provided in the foregoing embodiment, after the preset word segmentation module has extracted the words and phrases from the text content, the words and phrases may first be sorted and matched in order against each keyword in the database. Only if no word or phrase matches are the word pinyin and phrase pinyin of each word and phrase determined, and the sorted word pinyin and phrase pinyin are then matched in order against each keyword pinyin in the database.

Further, while each word and each phrase is being matched against the keywords, as soon as any word or phrase matches a keyword in the database, the text content can be directly determined to be a negative text. Matching of the remaining words and phrases then stops, and the word pinyin and phrase pinyin do not need to be matched at all.

By applying the method provided by the embodiment of the invention, the words and phrases are matched first and the word pinyin and phrase pinyin afterwards, so that once any word or phrase matches, all subsequent matching is skipped, which speeds up the auditing of the text content.
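The two-pass, early-exit matching described above can be sketched as follows. The keyword, its pinyin and the homophone substitution are hypothetical examples, and exact string equality is assumed as the matching rule; set membership is used as an equivalent to comparing against each keyword in turn.

```python
from typing import List, Optional, Set, Tuple

def first_keyword_hit(
    tokens: List[Tuple[str, str]],  # (word or phrase, its pinyin), in text order
    keywords: Set[str],
    keyword_pinyins: Set[str],
) -> Optional[Tuple[str, str]]:
    """Return (matched token, pass name) for the first hit, or None."""
    # Pass 1: the words and phrases themselves, in the order they occur.
    for token, _ in tokens:
        if token in keywords:
            return token, "text"  # stop at once: the text is negative
    # Pass 2: reached only when no word or phrase matched; comparing pinyin
    # catches homophones substituted for a keyword.
    for token, pinyin in tokens:
        if pinyin in keyword_pinyins:
            return token, "pinyin"
    return None

keywords = {"笨蛋"}            # hypothetical keyword ("idiot")
keyword_pinyins = {"ben dan"}  # its tone-less pinyin
print(first_keyword_hit([("你", "ni"), ("本旦", "ben dan")], keywords, keyword_pinyins))
# ('本旦', 'pinyin')
```

A non-`None` result corresponds to the "negative text" branch; `None` means the text is passed on to the AI audit model.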

In the method provided in the embodiment of the present invention, based on step S104, when there is no keyword data matching with the word data in the database, the text content is input into an AI audit model that is trained in advance, and a process of auditing the text content by the AI audit model is triggered, as shown in fig. 3, specifically including:

S301: deleting each text symbol carried in the text content to obtain the text to be audited corresponding to the text content;

In the embodiment of the invention, each text symbol carried in the text content is deleted before the text content is input into the AI auditing model. The text symbols may specifically include punctuation marks such as "!", "@", "#", "%", "……" and "()". Deleting every such symbol from the text content yields the text to be audited. For example, if the text content is "Tian-he may really be bad!", the text to be audited is "Tianhe may really be bad".
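One way to implement the symbol deletion is to drop every character whose Unicode general category is punctuation (`P*`), which covers both full-width Chinese marks and ASCII marks. Treating "text symbol" as "Unicode punctuation" is an assumption of this sketch; a deployment might maintain an explicit symbol list instead.

```python
import unicodedata

def strip_text_symbols(text: str) -> str:
    """Delete every Unicode punctuation character (general categories P*).

    This covers Chinese marks such as '，' '！' '（）' '……' as well as ASCII
    ones such as '!' '@' '#' '%' '()' '-'.
    """
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))

print(strip_text_symbols("天呐，可真坏！"))  # 天呐可真坏
```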

S302: inputting the text to be audited into a preset vector converter, triggering the vector converter to convert the text to be audited, and obtaining a coding vector corresponding to the text to be audited;

in the embodiment of the present invention, the text to be audited is input into a vector converter, so that the vector converter converts the text to be audited into a coding vector.

It should be noted that the conversion into the coding vector may specifically map each word or phrase in the text to be audited to a 60-dimensional vector, truncate text content longer than 400 characters to its first 400, and pad shorter text content with zeros. The text to be audited is thus converted into a 400 × 60 matrix, which is the coding vector corresponding to the text to be audited.
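The truncation and zero-padding that produce the 400 × 60 matrix can be sketched as below. The embedding table is a hypothetical random one, truncation is applied per token rather than per character, and out-of-vocabulary tokens are mapped to zero vectors. All three are assumptions, since the description does not specify how the 60-dimensional vectors are obtained.

```python
import random
from typing import Dict, List, Sequence

MAX_LEN = 400  # longest text kept, per the description
EMB_DIM = 60   # size of each token vector, per the description

def toy_embeddings(vocab: Sequence[str], seed: int = 0) -> Dict[str, List[float]]:
    """Hypothetical embedding table: one fixed random 60-d vector per token."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(EMB_DIM)] for tok in vocab}

def encode(tokens: Sequence[str], table: Dict[str, List[float]]) -> List[List[float]]:
    """Return a MAX_LEN x EMB_DIM matrix: truncate long input, zero-pad short input."""
    zero = [0.0] * EMB_DIM
    # Unknown tokens become zero vectors (an assumption of this sketch).
    rows = [list(table.get(tok, zero)) for tok in tokens[:MAX_LEN]]
    rows.extend([0.0] * EMB_DIM for _ in range(MAX_LEN - len(rows)))
    return rows

tokens = ["天呐", "可", "真", "坏"]
table = toy_embeddings(tokens)
matrix = encode(tokens, table)
print(len(matrix), len(matrix[0]))  # 400 60
```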

S303: and acquiring a first audit model and a second audit model in the AI audit model, inputting the coding vectors into the first audit model and the second audit model respectively, triggering the first audit model and the second audit model to audit the coding vectors respectively, and then generating first audit data and second audit data corresponding to the coding vectors respectively so that the AI audit model generates audit parameters corresponding to the text content according to the first audit data and the second audit data.

In the embodiment of the present invention, the AI audit model includes two models: a first audit model and a second audit model. Both are obtained from the AI audit model, and the coding vector is input into each of them for auditing. After auditing the coding vector, the first audit model and the second audit model output the first audit data and the second audit data respectively, from which the AI audit model generates the audit parameters of the text content.

It should be noted that the first audit model may specifically be a convolutional neural network (CNN) model. Applied to the coding vector of the text to be audited, the CNN captures context information between the words of the text, performs matrix computations on that context information, and outputs a feature matrix for the text content together with its feature dimension, namely the first audit data mentioned in the above embodiment. Taking three-layer convolution as an example, fig. 4 shows how the feature dimension of the text content is output. The embedding layer holds the matrix of the coding vector; conv denotes a convolution kernel; the bn layer is a batch normalization layer, which speeds up training and reduces overfitting; the max-pooling (maxpool) layer fixes the output length; and finally, the fusion layer fuses the outputs of the maxpool layers to obtain the first audit data.
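A NumPy sketch of the conv → bn → maxpool → fusion pipeline follows. Since fig. 4 is not reproduced here, the "three-layer convolution" is interpreted as three parallel branches with kernel sizes 2, 3 and 4 (an assumption). The weights are random rather than trained, batch normalization uses batch statistics with no learned scale or shift, and fusion is taken to be concatenation.

```python
import numpy as np

def batch_norm(z: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalise each channel over the batch and time axes (no learned scale/shift)."""
    mean = z.mean(axis=(0, 1), keepdims=True)
    var = z.var(axis=(0, 1), keepdims=True)
    return (z - mean) / np.sqrt(var + eps)

def conv1d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Valid 1-D convolution over time. x: (B, L, E), w: (k, E, F) -> (B, L-k+1, F)."""
    k, e, f = w.shape
    flat_w = w.reshape(k * e, f)
    steps = [x[:, t:t + k, :].reshape(x.shape[0], -1) @ flat_w
             for t in range(x.shape[1] - k + 1)]
    return np.stack(steps, axis=1)

def cnn_features(x: np.ndarray, kernels) -> np.ndarray:
    """Per branch: conv -> bn -> ReLU -> max-pool over time; then concatenate (fusion)."""
    pooled = []
    for w in kernels:
        z = np.maximum(batch_norm(conv1d(x, w)), 0.0)
        pooled.append(z.max(axis=1))  # (B, F): fixed length whatever the text length
    return np.concatenate(pooled, axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 400, 60))  # a batch of 2 encoded texts (400 x 60 each)
kernels = [rng.normal(scale=0.1, size=(k, 60, 16)) for k in (2, 3, 4)]
first_audit_data = cnn_features(x, kernels)
print(first_audit_data.shape)  # (2, 48)
```

Max-pooling over the time axis is what "fixes the output length": each branch yields one value per filter regardless of how many positions the convolution produced.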

The second audit model may specifically be a Long Short-Term Memory (LSTM) network model, a special kind of Recurrent Neural Network (RNN). The LSTM model also captures context information across the whole text content; a bidirectional LSTM reads the text content both from front to back and from back to front, which yields more complete text features, namely the second audit data mentioned in the above embodiments. As shown in fig. 5, the coding vector is read from front to back to obtain the forward LSTM output and from back to front to obtain the backward LSTM output; the two are fused and passed to a bn layer for normalization, giving the second audit data.
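The bidirectional reading can be sketched in NumPy with a standard LSTM cell run once front to back and once back to front. Assumptions: random untrained weights, a hidden size of 32, only the final hidden state of each direction is kept, fusion is concatenation, and the bn layer uses batch statistics without learned scale or shift.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(x, W, U, b):
    """One LSTM read over time. x: (B, L, E), W: (E, 4H), U: (H, 4H), b: (4H,).
    Returns the final hidden state, shape (B, H)."""
    batch, length, _ = x.shape
    H = U.shape[0]
    h = np.zeros((batch, H))
    c = np.zeros((batch, H))
    for t in range(length):
        z = x[:, t, :] @ W + h @ U + b
        i = sigmoid(z[:, :H])            # input gate
        f = sigmoid(z[:, H:2 * H])       # forget gate
        g = np.tanh(z[:, 2 * H:3 * H])   # candidate cell update
        o = sigmoid(z[:, 3 * H:])        # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def bilstm_features(x, fwd, bwd, eps=1e-5):
    h_fwd = lstm_pass(x, *fwd)              # front to back
    h_bwd = lstm_pass(x[:, ::-1, :], *bwd)  # back to front
    fused = np.concatenate([h_fwd, h_bwd], axis=1)
    # bn layer: normalise each feature over the batch (no learned scale or shift)
    return (fused - fused.mean(axis=0)) / np.sqrt(fused.var(axis=0) + eps)

def init_lstm(rng, emb_dim, hidden):
    return (rng.normal(scale=0.1, size=(emb_dim, 4 * hidden)),
            rng.normal(scale=0.1, size=(hidden, 4 * hidden)),
            np.zeros(4 * hidden))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 400, 60))  # a batch of 4 encoded texts
second_audit_data = bilstm_features(x, init_lstm(rng, 60, 32), init_lstm(rng, 60, 32))
print(second_audit_data.shape)  # (4, 64)
```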

Finally, as shown in fig. 6, the AI audit model fuses the three-layer convolution features (the first audit data) with the bidirectional LSTM features (the second audit data) and outputs the audit parameters of the text content through a fully connected layer.
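The final fusion and fully connected step can be sketched as concatenation followed by one dense layer and a softmax over the three text types. Reading the "audit parameters" as per-class probabilities is an assumption, as are the feature sizes (48 and 64) and the random stand-in features and weights.

```python
import numpy as np

LABELS = ("normal", "negative", "irrigation")

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def audit_parameters(cnn_feat, lstm_feat, W, b):
    """Fuse (concatenate) both feature vectors, then apply one fully connected layer."""
    fused = np.concatenate([cnn_feat, lstm_feat], axis=-1)
    return softmax(fused @ W + b)

rng = np.random.default_rng(0)
cnn_feat = rng.normal(size=(2, 48))   # stand-in for the first audit data
lstm_feat = rng.normal(size=(2, 64))  # stand-in for the second audit data
W = rng.normal(scale=0.1, size=(48 + 64, len(LABELS)))
b = np.zeros(len(LABELS))
params = audit_parameters(cnn_feat, lstm_feat, W, b)
print([LABELS[i] for i in params.argmax(axis=1)])
```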

In the text content auditing method provided by the embodiment of the invention, the text content is preprocessed before it is input into the AI auditing model: each text symbol, including Chinese and English punctuation marks, is deleted to obtain the text to be audited. The text to be audited is then converted into a coding vector and input into the first audit model and the second audit model, each of which audits the coding vector, captures the context information in the text content, and outputs the first audit data and the second audit data respectively. Auditing the text content in depth in this way improves the auditing accuracy and determines more reliably whether the text content is a negative text.

In the method provided by the embodiment of the invention, the text content is audited by using the AI audit model. The AI audit model is a deep neural network model trained in advance, wherein the training process of the AI audit model specifically comprises:

acquiring a pre-stored training data set containing training data that carry labels, and sequentially applying the training data to train an initial auditing model until its network parameters meet the preset training condition, wherein: each time training data is input into the initial auditing model, obtaining the current auditing parameters that the model outputs for that training data; calling a preset loss function to compute a loss function value from the current auditing parameters and the label of the training data currently input into the initial auditing model; judging, according to the loss function value, whether the model parameters of the initial auditing model meet the training condition; if not, adjusting the model parameters of the initial auditing model according to the loss function value; and if so, determining the initial auditing model as the AI auditing model.

In the text content auditing method provided by the embodiment of the invention, a pre-stored training data set is obtained, which contains training data carrying labels; the label of each piece of training data may be set manually. For example, if texts are divided into normal texts, negative texts and irrigation texts (low-value spam or flooding posts without any reference or reading value), each piece of training data is a text content, and normal texts are labeled as positive samples, negative texts as negative samples, and irrigation texts as irrigation samples. A preset initial auditing model is trained with the training data in the set until its network parameters meet the preset condition, and the initial auditing model meeting that condition is determined as the AI auditing model. During training, the training data are input into the initial auditing model one after another, and for each input the model outputs current auditing parameters. The loss function is called to compute, from the current auditing parameters and the label of the training data currently input, the loss function value. It is then judged whether the loss function value satisfies the training condition, that is, whether the model's training on the training data has reached a given standard. For example, if the training condition requires the audit parameters output for the training data to be 95% accurate, a computed accuracy of 92% does not satisfy the condition, whereas 97% does.
When the training condition is not met, the model parameters of the initial auditing model are adjusted so as to improve its accuracy in the next round of training. When the training condition is met, the initial auditing model is determined as the AI auditing model.
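A minimal sketch of this loop, with loudly simplified stand-ins: a softmax-regression classifier replaces the CNN/BiLSTM audit model, three synthetic 2-D clusters replace the labeled texts, cross-entropy is assumed as the preset loss function, and the 95% accuracy threshold from the example above serves as the training condition.

```python
import numpy as np

def train_until(X, y, n_classes=3, target_acc=0.95, lr=0.1, max_epochs=500, seed=0):
    """Forward pass -> cross-entropy loss -> check the training condition ->
    if not met, adjust the parameters with one gradient-descent step."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for epoch in range(max_epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)    # current audit parameters
        loss = -np.mean(np.sum(onehot * np.log(probs + 1e-12), axis=1))
        acc = np.mean(probs.argmax(axis=1) == y)
        if acc >= target_acc:                        # training condition met
            return W, b, epoch, loss, acc
        grad = (probs - onehot) / len(y)             # d(loss)/d(logits)
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b, max_epochs, loss, acc

rng = np.random.default_rng(1)
# Synthetic stand-ins for normal (0), negative (1) and irrigation (2) samples.
centers = np.array([[0.0, 5.0], [5.0, 0.0], [-5.0, -5.0]])
y = np.repeat(np.arange(3), 50)
X = centers[y] + rng.normal(size=(150, 2))
W, b, epochs, loss, acc = train_until(X, y)
print(epochs, round(float(acc), 3))
```

Stopping on accuracy mirrors the 95% example; a real setup would usually judge the condition on a held-out set rather than on the training data itself.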

It should be noted that before the initial audit model is trained, a text case may be generated in advance that records the manual audit result for each text content. After the initial audit model audits each text content in the text case, its audit result is also recorded there, so the text case reflects the consistency between manual audit and model audit.

Optionally, the AI audit model may be trained every day, and a daily text report may be generated from that day's text case, recording the number of texts audited, the accuracy, the proportion of negative texts, and so on.
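The daily report can be sketched as an aggregation over text-case rows. The record layout is hypothetical. Two further assumptions apply: "accuracy" is taken to mean agreement between the model verdict and the manual verdict, and the negative ratio is computed from the model's verdicts.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CaseRecord:
    """One hypothetical text-case row: a manual verdict and a model verdict."""
    day: str
    manual_label: str
    model_label: str

def daily_report(cases: List[CaseRecord]) -> Dict[str, dict]:
    by_day: Dict[str, List[CaseRecord]] = defaultdict(list)
    for case in cases:
        by_day[case.day].append(case)
    report = {}
    for day, rows in sorted(by_day.items()):
        n = len(rows)
        agree = sum(r.manual_label == r.model_label for r in rows)
        negative = sum(r.model_label == "negative" for r in rows)
        report[day] = {"audited": n, "accuracy": agree / n, "negative_ratio": negative / n}
    return report

cases = [
    CaseRecord("2019-09-24", "normal", "normal"),
    CaseRecord("2019-09-24", "negative", "negative"),
    CaseRecord("2019-09-24", "irrigation", "normal"),
    CaseRecord("2019-09-24", "normal", "normal"),
]
print(daily_report(cases))
# {'2019-09-24': {'audited': 4, 'accuracy': 0.75, 'negative_ratio': 0.25}}
```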

By applying the method provided by the embodiment of the invention, the AI auditing model is obtained by training the initial auditing model, so that the accuracy of auditing the text contents is improved when the AI auditing model audits the texts of the text contents.

In the method provided in the embodiment of the present invention, after generating the first audit result or the second audit result, the method specifically further includes:

updating the first audit result or the second audit result to a preset task list, wherein the task list comprises audit results of the audited historical text contents;

In the text content auditing method provided by the embodiment of the invention, whenever the first audit result or the second audit result is generated, it is added to the preset task list. The task list records the historical audit result of each text content audited, specifically including the text number of the text content, the text creation time, the audit time, the result, operation parameters, and the like. When a user sends a task query request, the task list is sent to the user; specifically, it is shown on a preset display interface through which the user can view the audit result of each text content.
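The task list can be sketched as an append-only, in-memory history whose query returns rows ready for display. The class names, the "operation" values and sorting by most recent audit are hypothetical choices; a real system would persist the list in a database.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta
from typing import List

@dataclass
class TaskRecord:
    text_id: str          # text number of the audited content
    created_at: datetime  # text creation time
    audited_at: datetime  # audit time
    result: str           # e.g. "negative", "normal", "irrigation"
    operation: str        # hypothetical operation parameter, e.g. "block" or "pass"

class TaskList:
    """Hypothetical in-memory task list; every audit appends one history entry."""

    def __init__(self) -> None:
        self._records: List[TaskRecord] = []

    def update(self, record: TaskRecord) -> None:
        self._records.append(record)

    def query(self) -> List[dict]:
        """Rows for a display interface, most recent audit first."""
        ordered = sorted(self._records, key=lambda r: r.audited_at, reverse=True)
        return [asdict(r) for r in ordered]

t0 = datetime(2019, 9, 24, 10, 0)
tasks = TaskList()
tasks.update(TaskRecord("t1", t0, t0 + timedelta(minutes=1), "normal", "pass"))
tasks.update(TaskRecord("t2", t0, t0 + timedelta(minutes=2), "negative", "block"))
print([row["text_id"] for row in tasks.query()])  # ['t2', 't1']
```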

By applying the method provided by the embodiment of the invention, the first audit result or the second audit result generated each time can be saved in the task list, and the user can know the audit condition of each text content by accessing the task list.

The specific implementation procedures and derivatives thereof of the above embodiments are within the scope of the present invention.

Corresponding to the method described in fig. 1, an embodiment of the present invention further provides a text content auditing apparatus, which is used for implementing the method in fig. 1 specifically, the text content auditing apparatus provided in the embodiment of the present invention may be applied to a computer terminal or various mobile devices, and a schematic structural diagram of the text content auditing apparatus is shown in fig. 7, and specifically includes:

an obtaining unit 701, configured to obtain text content included in a text audit request when the text audit request sent by a client is received;

a matching unit 702, configured to determine word data corresponding to the text content, and match each word data with each keyword data in a pre-established database, where each word data includes a word and a phrase in the text content, a word pinyin corresponding to each word and a phrase pinyin corresponding to each phrase, and the keyword data is a preset keyword and a keyword pinyin corresponding to each keyword;

a first generating unit 703, configured to determine that the text content is a negative text when word data matches keyword data in the database, generate a first audit result corresponding to the negative text, and send the first audit result to the client;

an auditing unit 704, configured to, when there is no matching between word data and keyword data in the database, input the text content into an AI auditing model that is trained in advance, and trigger the AI auditing model to audit the text content;

a second generating unit 705, configured to generate a second audit result according to the audit parameter when receiving the audit parameter output by the AI audit model according to the text content, and send the second audit result to the client, where the audit parameter includes an audit parameter of a normal text, an audit parameter of a negative text, and an audit parameter of an irrigation text.

In the device provided by the embodiment of the invention, when a text auditing request sent by a client is received, the acquisition unit acquires the text content to be audited, and after the word data have been determined, the matching unit matches each word data with each keyword data in the database. If word data matches keyword data, the text content is determined to be a negative text, and the first generating unit generates the first audit result and sends it to the client. If no word data matches keyword data, the auditing unit inputs the text content into the AI auditing model and triggers it to audit the text content; when the AI auditing model outputs the audit parameters, the second generating unit generates the second audit result and sends it to the client.
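The control flow across the units can be sketched as one dispatcher function. The AI model is a hypothetical stub, and the returned dictionaries are an illustrative format, not the patent's message layout. Word data and keyword data are passed as flat collections that already include the pinyin forms, matching the definitions given for the matching unit.

```python
from typing import Callable, Dict, Iterable, Set

AuditParams = Dict[str, float]  # text type -> score (normal / negative / irrigation)

def audit_text(
    text: str,
    word_data: Iterable[str],   # words, phrases, word pinyin and phrase pinyin
    keyword_data: Set[str],     # keywords and keyword pinyin
    ai_model: Callable[[str], AuditParams],
) -> dict:
    """Keyword screening first; the AI model is consulted only when nothing matches."""
    # Matching unit + first generating unit: any hit makes the text negative.
    if any(item in keyword_data for item in word_data):
        return {"result": "first", "label": "negative"}
    # Auditing unit + second generating unit: turn audit parameters into a result.
    params = ai_model(text)
    return {"result": "second", "label": max(params, key=params.get), "parameters": params}

def stub_model(text: str) -> AuditParams:
    """Hypothetical stand-in for the trained AI audit model."""
    return {"normal": 0.7, "negative": 0.2, "irrigation": 0.1}

print(audit_text("今天天气真好", ["今天", "jin tian"], {"笨蛋", "ben dan"}, stub_model)["label"])  # normal
```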

By applying the device provided by the embodiment of the invention, the matching unit matches the word data of the text content and the auditing unit audits the text content with the AI auditing model, so that whether the text content is a negative text can be determined more accurately, and uncivil users are prevented from publishing negative text on the network.

In the apparatus provided in the embodiment of the present invention, the matching unit 702 includes:

the sorting subunit, configured to sort the words, phrases, word pinyin and phrase pinyin according to their order in the text content;

In the apparatus provided in the embodiment of the present invention, the auditing unit 704 includes:

and the auditing subunit is configured to obtain a first auditing model and a second auditing model in the AI auditing model, input the coding vectors into the first auditing model and the second auditing model respectively, trigger the first auditing model and the second auditing model to audit the coding vectors respectively, and then generate first auditing data and second auditing data corresponding to the coding vectors respectively, so that the AI auditing model generates auditing parameters corresponding to the text content according to the first auditing data and the second auditing data.

The device provided by the embodiment of the invention further comprises:

the updating unit is used for updating the first auditing result or the second auditing result to a preset task list, and the task list comprises auditing results of the audited historical text contents;

and the sending unit is used for sending the task list to a preset display interface when a task query request sent by a user is received, so that the user can check the audit result of each historical text content in the task list through the display interface.

The device provided by the embodiment of the invention further comprises:

the training unit, configured to acquire a pre-stored training data set comprising training data that carry labels; sequentially apply the training data to train an initial auditing model until the network parameters of the initial auditing model meet preset training conditions; each time training data is input into the initial auditing model, obtain the current auditing parameters corresponding to that training data; call a preset loss function to compute a loss function value from the current auditing parameters and the label of the training data currently input into the initial auditing model; judge, according to the loss function value, whether the model parameters of the initial auditing model meet the training conditions; if not, adjust the model parameters of the initial auditing model according to the loss function value; and if so, determine the initial auditing model as the AI auditing model.

The specific working processes of each unit and sub-unit of the text content auditing device disclosed in the above embodiment of the present invention can refer to the corresponding contents in the text content auditing method disclosed in the above embodiment of the present invention, and are not described herein again.

The embodiment of the invention also provides a storage medium, which comprises a stored instruction, wherein when the instruction runs, the device where the storage medium is located is controlled to execute the text content auditing method.

An electronic device is provided in an embodiment of the present invention, and the structural diagram of the electronic device is shown in fig. 8, which specifically includes a memory 801 and one or more instructions 802, where the one or more instructions 802 are stored in the memory 801 and configured to be executed by the one or more processors 803 to perform the following operations:

when a text auditing request sent by a client is received, acquiring text content contained in the text auditing request;

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, the system or system embodiments are substantially similar to the method embodiments and therefore are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described system and system embodiments are only illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both.

To clearly illustrate this interchangeability of hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A text content auditing method is characterized by comprising the following steps:

when the word data are matched with the keyword data in the database, determining the text content as a negative text, generating a first checking result corresponding to the negative text, and sending the first checking result to the client so that the client can cancel or shield the text content;

when receiving the auditing parameters output by the AI auditing model according to the text content, generating a second auditing result according to the auditing parameters, and sending the second auditing result to the client, wherein the auditing parameters comprise auditing parameters of normal texts, auditing parameters of negative texts and auditing parameters of irrigation texts;

the inputting the text content into a preset AI auditing model, triggering the AI auditing model to audit the text content, including:

and acquiring a first audit model and a second audit model in the AI audit model, inputting the coding vectors into the first audit model and the second audit model respectively, triggering the first audit model and the second audit model to audit the coding vectors respectively, and then generating first audit data and second audit data corresponding to the coding vectors respectively so that the AI audit model generates audit parameters corresponding to the text contents according to the first audit data and the second audit data.

2. The method of claim 1, wherein determining respective word data corresponding to the text content and matching each of the word data with respective keyword data in a pre-established database comprises:

3. The method of claim 2, wherein the matching each word and phrase with each keyword in the database, and the matching word pinyin corresponding to each word and phrase pinyin corresponding to each phrase with keyword pinyins in the database respectively comprises:

4. The method of claim 1, further comprising:

5. The method according to claim 1, wherein the training process of the AI audit model comprises:

6. A text content auditing apparatus, characterized by comprising:

the first generating unit is used for determining the text content as a negative text when the word data are matched with the keyword data in the database, generating a first examination result corresponding to the negative text, and sending the first examination result to the client so that the client can cancel or shield the text content;

the second generation unit is used for generating a second audit result according to the audit parameter when receiving the audit parameter output by the AI audit model according to the text content, and sending the second audit result to the client, wherein the audit parameter comprises an audit parameter of a normal text, an audit parameter of a negative text and an audit parameter of an irrigation text;

the auditing unit comprises:

7. The apparatus of claim 6, wherein the matching unit comprises:

8. The apparatus of claim 6, wherein the matching unit comprises: