CN113010708A - Verification method and system for illegal friend circle content and illegal chat content - Google Patents


Info

Publication number
CN113010708A
CN113010708A (application CN202110265325.2A); granted as CN113010708B
Authority
CN
China
Prior art keywords
illegal
content
contents
chat
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110265325.2A
Other languages
Chinese (zh)
Other versions
CN113010708B (en)
Inventor
尤成成
李俊锋
龚哲
陈建勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maitang Information Technology Co ltd
Original Assignee
Shanghai Maitang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maitang Information Technology Co ltd filed Critical Shanghai Maitang Information Technology Co ltd
Priority to CN202110265325.2A priority Critical patent/CN113010708B/en
Publication of CN113010708A publication Critical patent/CN113010708A/en
Application granted granted Critical
Publication of CN113010708B publication Critical patent/CN113010708B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/483 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a system for auditing illegal friend-circle content and illegal chat content. The method comprises the following steps: acquiring content data to be published or chat content; if the content data to be published or the chat content contains a keyword from a preset violation-content database, defining the content data to be published or the chat content as violating content; and if the user corresponding to the content data to be published or the chat content is a user in a preset violating-user database, auditing the content data to be published or the chat content according to a preset similarity-audit algorithm. By double-auditing content data to be published, by content and by user, the method improves auditing accuracy and can effectively audit violating content in friend circles, chat messages, live-stream bullet comments, personalized signatures, and the like.

Description

Verification method and system for illegal friend circle content and illegal chat content
Technical Field
The invention belongs to the technical field of media transmission, and in particular relates to a method and a system for auditing illegal friend-circle content and illegal chat content.
Background
At present, with the widespread use of interactive platforms such as WeChat and Douyin (TikTok), more and more users publish content through friend circles, chat messages, live-stream bullet comments, personalized signatures, and the like. The resulting content is extremely varied and spans a great many topics, such as travel, food, music, and sports. Auditors must check whether published content, chat messages, bullet comments, and personalized signatures contain violating material, which greatly increases the pressure of content auditing and screening. There is currently no effective automated supervision method that can reliably screen violating content across friend circles, chat messages, live-stream bullet comments, and personalized signatures.
Disclosure of Invention
To address the above deficiencies in the prior art, the invention provides a method and a system for auditing illegal friend-circle content and illegal chat content, which can effectively audit violating content in friend circles, chat messages, live-stream bullet comments, personalized signatures, and the like.
In a first aspect, an auditing method for illegal friend-circle content and illegal chat content comprises the following steps:
acquiring content data to be published or chat content;
if the content data to be published or the chat content contains a keyword from a preset violation-content database, defining the content data to be published or the chat content as violating content;
and if the user corresponding to the content data to be published or the chat content is a user in a preset violating-user database, auditing the content data to be published or the chat content according to a preset similarity-audit algorithm.
Preferably, the violating-user database is constructed as follows:
when the number of times the user has published violating content exceeds a preset violation cap, judging whether the violating-user database exists;
if it exists, adding the user to the violating-user database;
and if it does not exist, creating the violating-user database and adding the user to it.
Preferably, auditing the content data to be published or the chat content according to the preset similarity-audit algorithm specifically comprises:
acquiring the user's historical published content data or historical chat content;
segmenting the historical published content data or historical chat content into words, and extracting the N most frequently used words;
removing from the N words the keywords in the violation-content database and the common words in a preset common-word database, to obtain the high-frequency vocabulary;
and when a word from the high-frequency vocabulary appears in the content data to be published or the chat content, defining the content data to be published or the chat content as violating content.
Preferably, the high-frequency vocabulary comprises Chinese characters, pinyin, or English letters.
Preferably, after the high-frequency vocabulary is obtained, the method further comprises:
extracting some of the high-frequency words of some of the users in the violating-user database, to obtain a check vocabulary;
receiving a manually entered verification result, the verification result comprising the words that remain after non-violating words are manually removed from the check vocabulary;
inputting the check vocabulary and the verification result into a preset neural network model for machine learning;
and inputting all high-frequency words of all users in the violating-user database into the trained neural network model, to filter the high-frequency words of all users.
In a second aspect, an auditing system for illegal friend-circle content and illegal chat content comprises:
a collection unit: configured to acquire content data to be published or chat content and to transmit it to the detection unit;
a detection unit: configured to define the content data to be published or the chat content as violating content when it contains a keyword from a preset violation-content database; and, when the user corresponding to the content data to be published or the chat content is a user in a preset violating-user database, to audit the content data to be published or the chat content according to a preset similarity-audit algorithm.
Preferably, the violating-user database is constructed as follows:
when the number of times the user has published violating content exceeds a preset violation cap, judging whether the violating-user database exists;
if it exists, adding the user to the violating-user database;
and if it does not exist, creating the violating-user database and adding the user to it.
Preferably, the detection unit is specifically configured to:
acquire the user's historical published content data or historical chat content;
segment the historical published content data or historical chat content into words, and extract the N most frequently used words;
remove from the N words the keywords in the violation-content database and the common words in a preset common-word database, to obtain the high-frequency vocabulary;
and, when a word from the high-frequency vocabulary appears in the content data to be published or the chat content, define the content data to be published or the chat content as violating content.
Preferably, the high-frequency vocabulary comprises Chinese characters, pinyin, or English letters.
Preferably, the detection unit is further configured to:
extract some of the high-frequency words of some of the users in the violating-user database, to obtain a check vocabulary;
receive a manually entered verification result, the verification result comprising the words that remain after non-violating words are manually removed from the check vocabulary;
input the check vocabulary and the verification result into a preset neural network model for machine learning;
and input all high-frequency words of all users in the violating-user database into the trained neural network model, to filter the high-frequency words of all users.
According to the above technical scheme, the method and system for auditing illegal friend-circle content and illegal chat content provided by the invention double-audit content data to be published, by content and by user, which improves auditing accuracy and enables effective auditing of violating content in friend circles, chat messages, live-stream bullet comments, personalized signatures, and the like.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
Fig. 1 is a flowchart of an auditing method for illegal friend circle content and illegal chat content according to an embodiment of the present invention.
Fig. 2 is a block diagram of modules of an auditing system for illegal friend circle content and illegal chat content according to a second embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only examples, and the protection scope of the present invention is not limited thereby. It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Embodiment one:
An auditing method for illegal friend-circle content and illegal chat content, referring to fig. 1, comprises the following steps:
acquiring content data to be published or chat content;
if the content data to be published or the chat content contains a keyword from a preset violation-content database, defining the content data to be published or the chat content as violating content;
and if the user corresponding to the content data to be published or the chat content is a user in a preset violating-user database, auditing the content data to be published or the chat content according to a preset similarity-audit algorithm.
Specifically, when the content data to be published or the chat content is found to be violating, the method prohibits its publication or sending. The audit considers two dimensions: content and user. If the content data to be published or the chat content contains a violating keyword, it is identified directly as violating content. If the content comes from a user in the violating-user database, it is further deep-audited with the similarity-audit algorithm: such users publish violating content frequently, are familiar with the keywords in the violation-content database, and often swap in similar or substitute words so that no listed keyword appears and the content escapes the audit. Content submitted by a violating user therefore requires a further deep audit to prevent violating content from being published. By double-auditing content data by content and by user, the method improves auditing accuracy and can effectively audit violating content in friend circles, chat messages, live-stream bullet comments, personalized signatures, and the like.
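The two-dimensional audit described above can be sketched as follows. The keyword set, user set, and the `deep_audit` placeholder are illustrative assumptions, not values from the patent:

```python
# Sketch of the double audit: first by content (keyword match), then by
# user (deep similarity audit for known violators). The keyword list and
# user set below are illustrative placeholders.

VIOLATION_KEYWORDS = {"forbidden_word_a", "forbidden_word_b"}  # preset violation-content database
violating_users = {"user_42"}                                  # preset violating-user database

def deep_audit(user_id: str, text: str) -> bool:
    """Placeholder for the similarity-audit algorithm (sketched in a later section)."""
    return False

def audit(user_id: str, text: str) -> str:
    # Dimension 1: content. Any preset keyword marks the post as violating.
    if any(kw in text for kw in VIOLATION_KEYWORDS):
        return "violating"
    # Dimension 2: user. Known violators get a deeper similarity audit,
    # since they tend to swap in substitute words to evade the keyword check.
    if user_id in violating_users and deep_audit(user_id, text):
        return "violating"
    return "allowed"

print(audit("user_1", "hello forbidden_word_a"))  # violating (keyword hit)
print(audit("user_1", "hello world"))             # allowed
```

Note that a post by a known violator that passes both checks is still allowed; the deep audit only tightens the screen for that user group.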
Preferably, the violating-user database is constructed as follows:
when the number of times the user has published violating content exceeds a preset violation cap, judging whether the violating-user database exists;
if it exists, adding the user to the violating-user database;
and if it does not exist, creating the violating-user database and adding the user to it.
Specifically, when a user is detected to have published violating content many times, the user is regarded as a violating user who publishes violating content frequently. The method builds the violating-user database from such users.
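A minimal sketch of this construction, with an assumed cap of 3 and hypothetical user identifiers:

```python
# Sketch of violating-user database construction: once a user's violation
# count exceeds a preset cap, the user is added to the database, which is
# created on first use. The cap value and user names are illustrative.

VIOLATION_CAP = 3
violation_counts = {}
violating_user_db = None   # None models "the database does not exist yet"

def record_violation(user_id):
    """Count one violation and register the user once the cap is exceeded."""
    global violating_user_db
    violation_counts[user_id] = violation_counts.get(user_id, 0) + 1
    if violation_counts[user_id] > VIOLATION_CAP:
        if violating_user_db is None:      # database absent: create it first
            violating_user_db = set()
        violating_user_db.add(user_id)     # then add the violator

for _ in range(4):                         # 4 violations exceed the cap of 3
    record_violation("user_42")
print(violating_user_db)  # {'user_42'}
```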
Preferably, auditing the content data to be published or the chat content according to the preset similarity-audit algorithm specifically comprises:
acquiring the user's historical published content data or historical chat content;
segmenting the historical published content data or historical chat content into words, and extracting the N most frequently used words;
removing from the N words the keywords in the violation-content database and the common words in a preset common-word database, to obtain the high-frequency vocabulary;
and when a word from the high-frequency vocabulary appears in the content data to be published or the chat content, defining the content data to be published or the chat content as violating content.
Specifically, in the similarity audit, the method first acquires the historical published content data or historical chat content of the user corresponding to the content data to be published or the chat content, and analyzes it to obtain the high-frequency words the user employs. Because a violating user cannot use the words in the violation-content database when publishing content or chatting, the user inevitably substitutes other words for them, and the deep audit must judge these substitute words as violating. To extract a user's high-frequency vocabulary, the method first extracts the N most frequent words in the historical published content data or historical chat content, then removes the known keywords and common words; what remains are the high-frequency words with which the user replaces the keywords. If such a high-frequency word appears in the content to be published or the chat content, the content is treated as violating.
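The similarity-audit steps above can be sketched as follows. A production system would use a proper Chinese word segmenter (e.g. jieba); whitespace splitting, the word lists, and N = 5 here are stand-in assumptions:

```python
from collections import Counter

# Sketch of the similarity-audit algorithm: segment the violator's history,
# take the N most frequent words, drop known violation keywords and common
# words, and treat the remainder as the user's substitute ("high-frequency")
# vocabulary. Whitespace split stands in for real Chinese segmentation;
# all word lists are illustrative.

VIOLATION_KEYWORDS = {"badword"}                  # preset violation-content database
COMMON_WORDS = {"the", "a", "is", "and", "to"}    # preset common-word database
N = 5

def high_frequency_vocab(history):
    tokens = [w for text in history for w in text.split()]   # "segmentation"
    top_n = [w for w, _ in Counter(tokens).most_common(N)]   # N most frequent
    return {w for w in top_n
            if w not in VIOLATION_KEYWORDS and w not in COMMON_WORDS}

def deep_audit(history, new_text):
    vocab = high_frequency_vocab(history)
    return any(w in new_text.split() for w in vocab)

history = ["the cheap pills cheap pills",
           "cheap pills to buy",
           "badword badword badword"]
print(high_frequency_vocab(history))          # {'cheap', 'pills'} (order may vary)
print(deep_audit(history, "get pills here"))  # True
```

The filter order matters: the keyword "badword" is frequent in the history but is removed, since it is already caught by the direct keyword check; only the substitute words survive.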
Preferably, the high-frequency vocabulary comprises Chinese characters, pinyin, or English letters.
Specifically, a high-frequency word may be a synonym or substitute for a keyword, the pinyin of a keyword, or an English abbreviation or substitute letters for a keyword. A high-frequency word is a symbol that expresses the same meaning as the keyword.
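As a toy illustration of such substitute symbols, a hand-written variant table can be checked against incoming text. The table entries below are made up; a real system might derive pinyin and abbreviation forms automatically:

```python
# Toy illustration of keyword substitutes: a violating keyword may appear
# as a synonym, as its pinyin, or as a letter abbreviation. The variant
# table is hand-written for illustration only.

KEYWORD_VARIANTS = {
    "赌博": {"du bo", "dubo", "db88", "菠菜"},   # keyword -> assumed substitute symbols
}

def matches_variant(text):
    """Return True if any known substitute symbol appears in the text."""
    for variants in KEYWORD_VARIANTS.values():
        if any(v in text for v in variants):
            return True
    return False

print(matches_variant("join dubo tonight"))  # True (pinyin substitute)
print(matches_variant("hello"))              # False
```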
Preferably, after the high-frequency vocabulary is obtained, the method further comprises:
extracting some of the high-frequency words of some of the users in the violating-user database, to obtain a check vocabulary;
receiving a manually entered verification result, the verification result comprising the words that remain after non-violating words are manually removed from the check vocabulary;
inputting the check vocabulary and the verification result into a preset neural network model for machine learning;
and inputting all high-frequency words of all users in the violating-user database into the trained neural network model, to filter the high-frequency words of all users.
Specifically, to further improve the accuracy of the high-frequency vocabulary, the method combines manual review with machine learning to screen the high-frequency words. It first extracts part of the obtained high-frequency vocabulary, which may be all the high-frequency words of some users or some of the high-frequency words of some users, and builds a check vocabulary from the extracted words. The check vocabulary is then verified manually: non-violating words are removed, and the remaining words form the verification result. The check vocabulary before removal and the verification result after removal are fed into the neural network model, which thereby learns the reviewers' screening rules. Finally, the trained model filters the high-frequency words of all users, yielding a high-frequency vocabulary of higher accuracy.
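A deliberately minimal stand-in for this human-in-the-loop step is sketched below. The patent specifies a neural network model; a single-layer perceptron over two toy features is used here purely to keep the sketch self-contained, and every word and label is invented:

```python
# Sketch of the human-in-the-loop filter: a spot-checked sample of
# high-frequency words is labelled by reviewers, a small model learns the
# labels, and the trained model then filters every user's vocabulary.
# The perceptron and its toy features stand in for the patent's neural
# network; all words and labels below are invented.

def features(word):
    # Toy features: bias, contains-a-digit flag, scaled length. A real
    # system would feed richer representations to a neural network.
    return [1.0,
            1.0 if any(c.isdigit() for c in word) else 0.0,
            len(word) / 10.0]

def train_perceptron(samples, labels, epochs=20, lr=0.5):
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = features(x)
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0
            err = y - pred
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
    return w

def is_violating(w, word):
    f = features(word)
    return sum(wi * fi for wi, fi in zip(w, f)) > 0

# Reviewer-labelled check vocabulary (1 = violating substitute, 0 = benign)
check_words  = ["v1agra", "c4sino", "holiday", "recipe"]
check_labels = [1, 1, 0, 0]

w = train_perceptron(check_words, check_labels)
all_vocab = ["b3t", "music", "l0ttery", "garden"]  # all users' high-frequency words
filtered = [word for word in all_vocab if is_violating(w, word)]
print(filtered)  # ['b3t', 'l0ttery']
```

The design point the sketch preserves is the data flow: reviewers label only a sample, the model generalizes their rule, and the full vocabulary never needs manual review.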
Embodiment two:
An auditing system for illegal friend-circle content and illegal chat content, referring to fig. 2, comprises:
a collection unit: configured to acquire content data to be published or chat content and to transmit it to the detection unit;
a detection unit: configured to define the content data to be published or the chat content as violating content when it contains a keyword from a preset violation-content database; and, when the user corresponding to the content data to be published or the chat content is a user in a preset violating-user database, to audit the content data to be published or the chat content according to a preset similarity-audit algorithm.
Specifically, when the system finds the content data to be published or the chat content to be violating, it prohibits its publication or sending. The audit considers two dimensions: content and user. If the content contains a violating keyword, it is identified directly as violating content. If the content comes from a user in the violating-user database, it is further deep-audited with the similarity-audit algorithm, because such users publish violating content frequently, are familiar with the keywords in the violation-content database, and often swap in similar or substitute words to evade the audit. Content submitted by a violating user therefore requires a further deep audit to prevent violating content from being published. By double-auditing content data by content and by user, the system improves auditing accuracy and can effectively audit violating content in friend circles, chat messages, live-stream bullet comments, personalized signatures, and the like.
Preferably, the violating-user database is constructed as follows:
when the number of times the user has published violating content exceeds a preset violation cap, judging whether the violating-user database exists;
if it exists, adding the user to the violating-user database;
and if it does not exist, creating the violating-user database and adding the user to it.
Preferably, the detection unit is specifically configured to:
acquire the user's historical published content data or historical chat content;
segment the historical published content data or historical chat content into words, and extract the N most frequently used words;
remove from the N words the keywords in the violation-content database and the common words in a preset common-word database, to obtain the high-frequency vocabulary;
and, when a word from the high-frequency vocabulary appears in the content data to be published or the chat content, define the content data to be published or the chat content as violating content.
Preferably, the high-frequency vocabulary comprises Chinese characters, pinyin, or English letters.
Preferably, the detection unit is further configured to:
extract some of the high-frequency words of some of the users in the violating-user database, to obtain a check vocabulary;
receive a manually entered verification result, the verification result comprising the words that remain after non-violating words are manually removed from the check vocabulary;
input the check vocabulary and the verification result into a preset neural network model for machine learning;
and input all high-frequency words of all users in the violating-user database into the trained neural network model, to filter the high-frequency words of all users.
For brevity, details of the system provided by this embodiment that are not described here may be found in the corresponding content of the foregoing method embodiment.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications and substitutions do not depart from the spirit and scope of the present invention as defined by the claims.

Claims (10)

1. An auditing method for illegal friend-circle content and illegal chat content, characterized by comprising the following steps:
acquiring content data to be published or chat content;
if the content data to be published or the chat content contains a keyword from a preset violation-content database, defining the content data to be published or the chat content as violating content;
and if the user corresponding to the content data to be published or the chat content is a user in a preset violating-user database, auditing the content data to be published or the chat content according to a preset similarity-audit algorithm.
2. The auditing method for illegal friend-circle content and illegal chat content according to claim 1, wherein the violating-user database is constructed as follows:
when the number of times the user has published violating content exceeds a preset violation cap, judging whether the violating-user database exists;
if it exists, adding the user to the violating-user database;
and if it does not exist, creating the violating-user database and adding the user to it.
3. The auditing method for illegal friend-circle content and illegal chat content according to claim 1, wherein auditing the content data to be published or the chat content according to the preset similarity-audit algorithm specifically comprises:
acquiring the user's historical published content data or historical chat content;
segmenting the historical published content data or historical chat content into words, and extracting the N most frequently used words;
removing from the N words the keywords in the violation-content database and the common words in a preset common-word database, to obtain the high-frequency vocabulary;
and when a word from the high-frequency vocabulary appears in the content data to be published or the chat content, defining the content data to be published or the chat content as violating content.
4. The auditing method for illegal friend-circle content and illegal chat content according to claim 3, wherein the high-frequency vocabulary comprises Chinese characters, pinyin, or English letters.
5. The auditing method for illegal friend-circle content and illegal chat content according to claim 4, further comprising, after the high-frequency vocabulary is obtained:
extracting some of the high-frequency words of some of the users in the violating-user database, to obtain a check vocabulary;
receiving a manually entered verification result, the verification result comprising the words that remain after non-violating words are manually removed from the check vocabulary;
inputting the check vocabulary and the verification result into a preset neural network model for machine learning;
and inputting all high-frequency words of all users in the violating-user database into the trained neural network model, to filter the high-frequency words of all users.
6. An auditing system for illegal friend-circle content and illegal chat content, characterized by comprising:
a collection unit: configured to acquire content data to be published or chat content and to transmit it to the detection unit;
a detection unit: configured to define the content data to be published or the chat content as violating content when it contains a keyword from a preset violation-content database; and, when the user corresponding to the content data to be published or the chat content is a user in a preset violating-user database, to audit the content data to be published or the chat content according to a preset similarity-audit algorithm.
7. The system for auditing illegal friend circle content and illegal chat content according to claim 6, wherein the illegal user database is constructed by the following method:
when the number of times that a user publishes illegal content exceeds a preset upper limit of violations, judging whether the illegal user database exists;
if the illegal user database exists, adding the user to the illegal user database; and
if the illegal user database does not exist, creating the illegal user database and adding the user to the illegal user database.
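The database-construction logic of claim 7 can be sketched as follows. The in-memory set and the limit value are illustrative stand-ins for real persistent storage and the preset upper limit of violations.

```python
VIOLATION_LIMIT = 3  # hypothetical preset upper limit of violations

class IllegalUserRegistry:
    def __init__(self):
        self.violation_counts = {}
        self.illegal_user_db = None  # created lazily, as in the claim

    def record_violation(self, user_id):
        count = self.violation_counts.get(user_id, 0) + 1
        self.violation_counts[user_id] = count
        if count > VIOLATION_LIMIT:
            if self.illegal_user_db is None:   # database does not exist yet
                self.illegal_user_db = set()   # create it first
            self.illegal_user_db.add(user_id)  # then add the user

    def is_illegal_user(self, user_id):
        return (self.illegal_user_db is not None
                and user_id in self.illegal_user_db)
```

Users below the limit are never entered, so the stricter similarity auditing of claim 6 applies only to repeat violators.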
8. The system for auditing illegal friend circle content and illegal chat content according to claim 6, wherein the detection unit is specifically configured to:
acquire historical published content data or historical chat content of the user;
perform word segmentation on the historical published content data or the historical chat content, and extract the N words with the highest publication frequency;
remove keywords in the illegal content database and common words in a preset common-word database from the N words to obtain high-frequency words; and
define the to-be-published content data or the chat content as illegal content when any of the high-frequency words appears in the to-be-published content data or the chat content.
9. The auditing system for illegal friend circle content and illegal chat content according to claim 8, wherein
the high-frequency words comprise Chinese characters, pinyin, or English letters.
10. The system for auditing illegal friend circle content and illegal chat content according to claim 9, wherein the detection unit is further configured to:
extract some of the high-frequency words of some of the users in the illegal user database to obtain a verification vocabulary;
receive a manually input verification result, wherein the verification result comprises the words remaining after non-illegal words are manually removed from the verification vocabulary;
input the verification vocabulary and the verification result into a preset neural network model for machine learning; and
input all high-frequency words of all users in the illegal user database into the trained neural network model, and filter the high-frequency words of all the users.
CN202110265325.2A 2021-03-11 2021-03-11 Method and system for auditing illegal friend circle content and illegal chat content Active CN113010708B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110265325.2A CN113010708B (en) 2021-03-11 2021-03-11 Method and system for auditing illegal friend circle content and illegal chat content


Publications (2)

Publication Number Publication Date
CN113010708A true CN113010708A (en) 2021-06-22
CN113010708B CN113010708B (en) 2023-08-25

Family

ID=76405146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110265325.2A Active CN113010708B (en) 2021-03-11 2021-03-11 Method and system for auditing illegal friend circle content and illegal chat content

Country Status (1)

Country Link
CN (1) CN113010708B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117793043A (en) * 2024-01-06 2024-03-29 广州微阿信息技术有限公司 Chat content auditing and processing method and system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090034786A1 (en) * 2007-06-02 2009-02-05 Newell Steven P Application for Non-Display of Images Having Adverse Content Categorizations
US20120239663A1 (en) * 2011-03-18 2012-09-20 Citypulse Ltd. Perspective-based content filtering
CN106445998A (en) * 2016-05-26 2017-02-22 达而观信息科技(上海)有限公司 Text content auditing method and system based on sensitive word
CN107846343A (en) * 2016-09-18 2018-03-27 郭荆玮 A kind of flexible real-time purification method in chatroom and chat system
CN109831682A (en) * 2018-12-28 2019-05-31 广州华多网络科技有限公司 Signal auditing method, device, electronic equipment and storage medium
CN110377900A (en) * 2019-06-17 2019-10-25 深圳壹账通智能科技有限公司 Checking method, device, computer equipment and the storage medium of Web content publication
CN111522785A (en) * 2020-04-17 2020-08-11 上海中通吉网络技术有限公司 Data extraction auditing method, device and equipment
CN112231484A (en) * 2020-11-19 2021-01-15 湖南红网新媒体集团有限公司 News comment auditing method, system, device and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZOU LAN; XU FANG: "Research and Design of a Text Content Information Filtering System" (文本内容信息过滤系统的研究与设计), Computer Knowledge and Technology, no. 34, pages 8187 - 8191 *



Similar Documents

Publication Publication Date Title
CN105577660B (en) DGA domain name detection method based on random forest
US9781139B2 (en) Identifying malware communications with DGA generated domains by discriminative learning
CN106357618B (en) Web anomaly detection method and device
CN111639177B (en) Text extraction method and device
US9471712B2 (en) Approximate matching of strings for message filtering
KR101715432B1 (en) Word pair acquisition device, word pair acquisition method, and recording medium
US8606795B2 (en) Frequency based keyword extraction method and system using a statistical measure
US20170063893A1 (en) Learning detector of malicious network traffic from weak labels
CN111831824B (en) Public opinion positive and negative surface classification method
CN108874777A (en) A kind of method and device of text anti-spam
CN109446404A (en) A kind of the feeling polarities analysis method and device of network public-opinion
US20170289082A1 (en) Method and device for identifying spam mail
CN103313248B (en) Method and device for identifying junk information
EP3433762A1 (en) Method, system and tool for content moderation
CN113590764B (en) Training sample construction method and device, electronic equipment and storage medium
CN111935097B (en) Method for detecting DGA domain name
CN103984703A (en) Mail classification method and device
CN111079029A (en) Sensitive account detection method, storage medium and computer equipment
CN110830607A (en) Domain name analysis method and device and electronic equipment
CN111523317B (en) Voice quality inspection method and device, electronic equipment and medium
CN113010637A (en) Text auditing method and device
CN115758183A (en) Training method and device for log anomaly detection model
CN113542252A (en) Detection method, detection model and detection device for Web attack
CN111431884B (en) Host computer defect detection method and device based on DNS analysis
CN113010708A (en) Verification method and system for illegal friend circle content and illegal chat content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant