CN113010708A - Verification method and system for illegal friend circle content and illegal chat content - Google Patents


Info

Publication number
CN113010708A
CN113010708A (application CN202110265325.2A); granted as CN113010708B
Authority
CN
China
Prior art keywords
illegal
content
contents
chat
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110265325.2A
Other languages
Chinese (zh)
Other versions
CN113010708B (en)
Inventor
尤成成
李俊锋
龚哲
陈建勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maitang Information Technology Co ltd
Original Assignee
Shanghai Maitang Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maitang Information Technology Co ltd filed Critical Shanghai Maitang Information Technology Co ltd
Priority to CN202110265325.2A priority Critical patent/CN113010708B/en
Publication of CN113010708A publication Critical patent/CN113010708A/en
Application granted granted Critical
Publication of CN113010708B publication Critical patent/CN113010708B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/483 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a system for auditing illegal friend-circle content and illegal chat content. The method comprises the following steps: acquiring content data to be published or chat content; if the content data to be published or the chat content contains a keyword from a preset violation-content database, defining the content data to be published or the chat content as violating content; and if the user corresponding to the content data to be published or the chat content is a user in a preset violating-user database, auditing the content data to be published or the chat content according to a preset similarity-audit algorithm. By double-auditing content data to be published, by content and by user, the method improves auditing accuracy and can effectively audit violating content in friend circles, chat messages, live-stream bullet comments, personalized signatures, and the like.

Description

Verification method and system for illegal friend circle content and illegal chat content
Technical Field
The invention belongs to the technical field of media transmission, and in particular relates to a method and a system for auditing illegal friend-circle content and illegal chat content.
Background
At present, with the widespread use of interactive platforms such as WeChat and Douyin (TikTok), more and more users publish content through friend circles, chat messages, live-stream bullet comments, personalized signatures, and the like. The resulting content is extremely varied and spans a great many topics, such as travel, food, music, and sports. Auditors must check whether published content, chat messages, bullet comments, and personalized signatures contain violating material, which greatly increases the pressure of content auditing and screening. There is currently no effective automated supervision method that can reliably screen violating content across friend circles, chat messages, live-stream bullet comments, and personalized signatures.
Disclosure of Invention
To address the above deficiencies in the prior art, the invention provides a method and a system for auditing illegal friend-circle content and illegal chat content, which can effectively audit violating content in friend circles, chat messages, live-stream bullet comments, personalized signatures, and the like.
In a first aspect, an auditing method for illegal friend-circle content and illegal chat content comprises the following steps:
acquiring content data to be published or chat content;
if the content data to be published or the chat content contains a keyword from a preset violation-content database, defining the content data to be published or the chat content as violating content;
and if the user corresponding to the content data to be published or the chat content is a user in a preset violating-user database, auditing the content data to be published or the chat content according to a preset similarity-audit algorithm.
Preferably, the violating-user database is constructed as follows:
when the number of times the user has published violating content exceeds a preset violation cap, judging whether the violating-user database exists;
if it exists, adding the user to the violating-user database;
and if it does not exist, creating the violating-user database and adding the user to it.
Preferably, auditing the content data to be published or the chat content according to the preset similarity-audit algorithm specifically comprises:
acquiring the user's historical published content data or historical chat content;
segmenting the historical published content data or historical chat content into words, and extracting the N most frequently used words;
removing from the N words the keywords in the violation-content database and the common words in a preset common-word database, to obtain the high-frequency vocabulary;
and when a word from the high-frequency vocabulary appears in the content data to be published or the chat content, defining the content data to be published or the chat content as violating content.
Preferably, the high-frequency vocabulary comprises Chinese characters, pinyin, or English letters.
Preferably, after the high-frequency vocabulary is obtained, the method further comprises:
extracting some of the high-frequency words of some of the users in the violating-user database, to obtain a check vocabulary;
receiving a manually entered verification result, the verification result comprising the words that remain after non-violating words are manually removed from the check vocabulary;
inputting the check vocabulary and the verification result into a preset neural network model for machine learning;
and inputting all high-frequency words of all users in the violating-user database into the trained neural network model, to filter the high-frequency words of all users.
In a second aspect, an auditing system for illegal friend-circle content and illegal chat content comprises:
a collection unit: configured to acquire content data to be published or chat content and to transmit it to the detection unit;
a detection unit: configured to define the content data to be published or the chat content as violating content when it contains a keyword from a preset violation-content database; and, when the user corresponding to the content data to be published or the chat content is a user in a preset violating-user database, to audit the content data to be published or the chat content according to a preset similarity-audit algorithm.
Preferably, the violating-user database is constructed as follows:
when the number of times the user has published violating content exceeds a preset violation cap, judging whether the violating-user database exists;
if it exists, adding the user to the violating-user database;
and if it does not exist, creating the violating-user database and adding the user to it.
Preferably, the detection unit is specifically configured to:
acquire the user's historical published content data or historical chat content;
segment the historical published content data or historical chat content into words, and extract the N most frequently used words;
remove from the N words the keywords in the violation-content database and the common words in a preset common-word database, to obtain the high-frequency vocabulary;
and, when a word from the high-frequency vocabulary appears in the content data to be published or the chat content, define the content data to be published or the chat content as violating content.
Preferably, the high-frequency vocabulary comprises Chinese characters, pinyin, or English letters.
Preferably, the detection unit is further configured to:
extract some of the high-frequency words of some of the users in the violating-user database, to obtain a check vocabulary;
receive a manually entered verification result, the verification result comprising the words that remain after non-violating words are manually removed from the check vocabulary;
input the check vocabulary and the verification result into a preset neural network model for machine learning;
and input all high-frequency words of all users in the violating-user database into the trained neural network model, to filter the high-frequency words of all users.
According to the above technical scheme, the method and system for auditing illegal friend-circle content and illegal chat content provided by the invention double-audit content data to be published, by content and by user, which improves auditing accuracy and enables effective auditing of violating content in friend circles, chat messages, live-stream bullet comments, personalized signatures, and the like.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
Fig. 1 is a flowchart of an auditing method for illegal friend circle content and illegal chat content according to an embodiment of the present invention.
Fig. 2 is a block diagram of modules of an auditing system for illegal friend circle content and illegal chat content according to a second embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only examples, and the protection scope of the present invention is not limited thereby. It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Embodiment one:
An auditing method for illegal friend-circle content and illegal chat content, referring to fig. 1, comprises the following steps:
acquiring content data to be published or chat content;
if the content data to be published or the chat content contains a keyword from a preset violation-content database, defining the content data to be published or the chat content as violating content;
and if the user corresponding to the content data to be published or the chat content is a user in a preset violating-user database, auditing the content data to be published or the chat content according to a preset similarity-audit algorithm.
Specifically, when the content data to be published or the chat content is found to be violating, the method prohibits its publication or sending. The audit considers two dimensions: content and user. If the content data to be published or the chat content contains a violating keyword, it is identified directly as violating content. If the content comes from a user in the violating-user database, it is further deep-audited with the similarity-audit algorithm: such users publish violating content frequently, are familiar with the keywords in the violation-content database, and often swap in similar or substitute words so that no listed keyword appears and the content escapes the audit. Content submitted by a violating user therefore requires a further deep audit to prevent violating content from being published. By double-auditing content data by content and by user, the method improves auditing accuracy and can effectively audit violating content in friend circles, chat messages, live-stream bullet comments, personalized signatures, and the like.
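The two-dimensional audit described above can be sketched as follows. The keyword set, user set, and the `deep_audit` placeholder are illustrative assumptions, not values from the patent:

```python
# Sketch of the double audit: first by content (keyword match), then by
# user (deep similarity audit for known violators). The keyword list and
# user set below are illustrative placeholders.

VIOLATION_KEYWORDS = {"forbidden_word_a", "forbidden_word_b"}  # preset violation-content database
violating_users = {"user_42"}                                  # preset violating-user database

def deep_audit(user_id: str, text: str) -> bool:
    """Placeholder for the similarity-audit algorithm (sketched in a later section)."""
    return False

def audit(user_id: str, text: str) -> str:
    # Dimension 1: content. Any preset keyword marks the post as violating.
    if any(kw in text for kw in VIOLATION_KEYWORDS):
        return "violating"
    # Dimension 2: user. Known violators get a deeper similarity audit,
    # since they tend to swap in substitute words to evade the keyword check.
    if user_id in violating_users and deep_audit(user_id, text):
        return "violating"
    return "allowed"

print(audit("user_1", "hello forbidden_word_a"))  # violating (keyword hit)
print(audit("user_1", "hello world"))             # allowed
```

Note that a post by a known violator that passes both checks is still allowed; the deep audit only tightens the screen for that user group.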
Preferably, the violating-user database is constructed as follows:
when the number of times the user has published violating content exceeds a preset violation cap, judging whether the violating-user database exists;
if it exists, adding the user to the violating-user database;
and if it does not exist, creating the violating-user database and adding the user to it.
Specifically, when a user is detected to have published violating content many times, the user is regarded as a violating user who publishes violating content frequently. The method builds the violating-user database from such users.
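A minimal sketch of this construction, with an assumed cap of 3 and hypothetical user identifiers:

```python
# Sketch of violating-user database construction: once a user's violation
# count exceeds a preset cap, the user is added to the database, which is
# created on first use. The cap value and user names are illustrative.

VIOLATION_CAP = 3
violation_counts = {}
violating_user_db = None   # None models "the database does not exist yet"

def record_violation(user_id):
    """Count one violation and register the user once the cap is exceeded."""
    global violating_user_db
    violation_counts[user_id] = violation_counts.get(user_id, 0) + 1
    if violation_counts[user_id] > VIOLATION_CAP:
        if violating_user_db is None:      # database absent: create it first
            violating_user_db = set()
        violating_user_db.add(user_id)     # then add the violator

for _ in range(4):                         # 4 violations exceed the cap of 3
    record_violation("user_42")
print(violating_user_db)  # {'user_42'}
```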
Preferably, auditing the content data to be published or the chat content according to the preset similarity-audit algorithm specifically comprises:
acquiring the user's historical published content data or historical chat content;
segmenting the historical published content data or historical chat content into words, and extracting the N most frequently used words;
removing from the N words the keywords in the violation-content database and the common words in a preset common-word database, to obtain the high-frequency vocabulary;
and when a word from the high-frequency vocabulary appears in the content data to be published or the chat content, defining the content data to be published or the chat content as violating content.
Specifically, in the similarity audit, the method first acquires the historical published content data or historical chat content of the user corresponding to the content data to be published or the chat content, and analyzes it to obtain the high-frequency words the user employs. Because a violating user cannot use the words in the violation-content database when publishing content or chatting, the user inevitably substitutes other words for them, and the deep audit must judge these substitute words as violating. To extract a user's high-frequency vocabulary, the method first extracts the N most frequent words in the historical published content data or historical chat content, then removes the known keywords and common words; what remains are the high-frequency words with which the user replaces the keywords. If such a high-frequency word appears in the content to be published or the chat content, the content is treated as violating.
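The similarity-audit steps above can be sketched as follows. A production system would use a proper Chinese word segmenter (e.g. jieba); whitespace splitting, the word lists, and N = 5 here are stand-in assumptions:

```python
from collections import Counter

# Sketch of the similarity-audit algorithm: segment the violator's history,
# take the N most frequent words, drop known violation keywords and common
# words, and treat the remainder as the user's substitute ("high-frequency")
# vocabulary. Whitespace split stands in for real Chinese segmentation;
# all word lists are illustrative.

VIOLATION_KEYWORDS = {"badword"}                  # preset violation-content database
COMMON_WORDS = {"the", "a", "is", "and", "to"}    # preset common-word database
N = 5

def high_frequency_vocab(history):
    tokens = [w for text in history for w in text.split()]   # "segmentation"
    top_n = [w for w, _ in Counter(tokens).most_common(N)]   # N most frequent
    return {w for w in top_n
            if w not in VIOLATION_KEYWORDS and w not in COMMON_WORDS}

def deep_audit(history, new_text):
    vocab = high_frequency_vocab(history)
    return any(w in new_text.split() for w in vocab)

history = ["the cheap pills cheap pills",
           "cheap pills to buy",
           "badword badword badword"]
print(high_frequency_vocab(history))          # {'cheap', 'pills'} (order may vary)
print(deep_audit(history, "get pills here"))  # True
```

The filter order matters: the keyword "badword" is frequent in the history but is removed, since it is already caught by the direct keyword check; only the substitute words survive.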
Preferably, the high-frequency vocabulary comprises Chinese characters, pinyin, or English letters.
Specifically, a high-frequency word may be a synonym or substitute for a keyword, the pinyin of a keyword, or an English abbreviation or substitute letters for a keyword. A high-frequency word is a symbol that expresses the same meaning as the keyword.
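As a toy illustration of such substitute symbols, a hand-written variant table can be checked against incoming text. The table entries below are made up; a real system might derive pinyin and abbreviation forms automatically:

```python
# Toy illustration of keyword substitutes: a violating keyword may appear
# as a synonym, as its pinyin, or as a letter abbreviation. The variant
# table is hand-written for illustration only.

KEYWORD_VARIANTS = {
    "赌博": {"du bo", "dubo", "db88", "菠菜"},   # keyword -> assumed substitute symbols
}

def matches_variant(text):
    """Return True if any known substitute symbol appears in the text."""
    for variants in KEYWORD_VARIANTS.values():
        if any(v in text for v in variants):
            return True
    return False

print(matches_variant("join dubo tonight"))  # True (pinyin substitute)
print(matches_variant("hello"))              # False
```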
Preferably, after the high-frequency vocabulary is obtained, the method further comprises:
extracting some of the high-frequency words of some of the users in the violating-user database, to obtain a check vocabulary;
receiving a manually entered verification result, the verification result comprising the words that remain after non-violating words are manually removed from the check vocabulary;
inputting the check vocabulary and the verification result into a preset neural network model for machine learning;
and inputting all high-frequency words of all users in the violating-user database into the trained neural network model, to filter the high-frequency words of all users.
Specifically, to further improve the accuracy of the high-frequency vocabulary, the method combines manual review with machine learning to screen the high-frequency words. It first extracts part of the obtained high-frequency vocabulary, which may be all the high-frequency words of some users or some of the high-frequency words of some users, and builds a check vocabulary from the extracted words. The check vocabulary is then verified manually: non-violating words are removed, and the remaining words form the verification result. The check vocabulary before removal and the verification result after removal are fed into the neural network model, which thereby learns the reviewers' screening rules. Finally, the trained model filters the high-frequency words of all users, yielding a high-frequency vocabulary of higher accuracy.
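A deliberately minimal stand-in for this human-in-the-loop step is sketched below. The patent specifies a neural network model; a single-layer perceptron over two toy features is used here purely to keep the sketch self-contained, and every word and label is invented:

```python
# Sketch of the human-in-the-loop filter: a spot-checked sample of
# high-frequency words is labelled by reviewers, a small model learns the
# labels, and the trained model then filters every user's vocabulary.
# The perceptron and its toy features stand in for the patent's neural
# network; all words and labels below are invented.

def features(word):
    # Toy features: bias, contains-a-digit flag, scaled length. A real
    # system would feed richer representations to a neural network.
    return [1.0,
            1.0 if any(c.isdigit() for c in word) else 0.0,
            len(word) / 10.0]

def train_perceptron(samples, labels, epochs=20, lr=0.5):
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = features(x)
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0
            err = y - pred
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
    return w

def is_violating(w, word):
    f = features(word)
    return sum(wi * fi for wi, fi in zip(w, f)) > 0

# Reviewer-labelled check vocabulary (1 = violating substitute, 0 = benign)
check_words  = ["v1agra", "c4sino", "holiday", "recipe"]
check_labels = [1, 1, 0, 0]

w = train_perceptron(check_words, check_labels)
all_vocab = ["b3t", "music", "l0ttery", "garden"]  # all users' high-frequency words
filtered = [word for word in all_vocab if is_violating(w, word)]
print(filtered)  # ['b3t', 'l0ttery']
```

The design point the sketch preserves is the data flow: reviewers label only a sample, the model generalizes their rule, and the full vocabulary never needs manual review.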
Embodiment two:
An auditing system for illegal friend-circle content and illegal chat content, referring to fig. 2, comprises:
a collection unit: configured to acquire content data to be published or chat content and to transmit it to the detection unit;
a detection unit: configured to define the content data to be published or the chat content as violating content when it contains a keyword from a preset violation-content database; and, when the user corresponding to the content data to be published or the chat content is a user in a preset violating-user database, to audit the content data to be published or the chat content according to a preset similarity-audit algorithm.
Specifically, when the system finds the content data to be published or the chat content to be violating, it prohibits its publication or sending. The audit considers two dimensions: content and user. If the content contains a violating keyword, it is identified directly as violating content. If the content comes from a user in the violating-user database, it is further deep-audited with the similarity-audit algorithm, because such users publish violating content frequently, are familiar with the keywords in the violation-content database, and often swap in similar or substitute words to evade the audit. Content submitted by a violating user therefore requires a further deep audit to prevent violating content from being published. By double-auditing content data by content and by user, the system improves auditing accuracy and can effectively audit violating content in friend circles, chat messages, live-stream bullet comments, personalized signatures, and the like.
Preferably, the violating-user database is constructed as follows:
when the number of times the user has published violating content exceeds a preset violation cap, judging whether the violating-user database exists;
if it exists, adding the user to the violating-user database;
and if it does not exist, creating the violating-user database and adding the user to it.
Preferably, the detection unit is specifically configured to:
acquire the user's historical published content data or historical chat content;
segment the historical published content data or historical chat content into words, and extract the N most frequently used words;
remove from the N words the keywords in the violation-content database and the common words in a preset common-word database, to obtain the high-frequency vocabulary;
and, when a word from the high-frequency vocabulary appears in the content data to be published or the chat content, define the content data to be published or the chat content as violating content.
Preferably, the high-frequency vocabulary comprises Chinese characters, pinyin, or English letters.
Preferably, the detection unit is further configured to:
extract some of the high-frequency words of some of the users in the violating-user database, to obtain a check vocabulary;
receive a manually entered verification result, the verification result comprising the words that remain after non-violating words are manually removed from the check vocabulary;
input the check vocabulary and the verification result into a preset neural network model for machine learning;
and input all high-frequency words of all users in the violating-user database into the trained neural network model, to filter the high-frequency words of all users.
For brevity, details of the system provided by this embodiment that are not described here may be found in the corresponding content of the foregoing method embodiment.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications and substitutions do not depart from the spirit and scope of the present invention as defined by the claims.

Claims (10)

1. An auditing method for illegal friend-circle content and illegal chat content, characterized by comprising the following steps:
acquiring content data to be published or chat content;
if the content data to be published or the chat content contains a keyword from a preset violation-content database, defining the content data to be published or the chat content as violating content;
and if the user corresponding to the content data to be published or the chat content is a user in a preset violating-user database, auditing the content data to be published or the chat content according to a preset similarity-audit algorithm.
2. The auditing method for illegal friend-circle content and illegal chat content according to claim 1, wherein the violating-user database is constructed as follows:
when the number of times the user has published violating content exceeds a preset violation cap, judging whether the violating-user database exists;
if it exists, adding the user to the violating-user database;
and if it does not exist, creating the violating-user database and adding the user to it.
3. The auditing method for illegal friend-circle content and illegal chat content according to claim 1, wherein auditing the content data to be published or the chat content according to the preset similarity-audit algorithm specifically comprises:
acquiring the user's historical published content data or historical chat content;
segmenting the historical published content data or historical chat content into words, and extracting the N most frequently used words;
removing from the N words the keywords in the violation-content database and the common words in a preset common-word database, to obtain the high-frequency vocabulary;
and when a word from the high-frequency vocabulary appears in the content data to be published or the chat content, defining the content data to be published or the chat content as violating content.
4. The auditing method for illegal friend-circle content and illegal chat content according to claim 3, wherein the high-frequency vocabulary comprises Chinese characters, pinyin, or English letters.
5. The auditing method for illegal friend-circle content and illegal chat content according to claim 4, further comprising, after the high-frequency vocabulary is obtained:
extracting some of the high-frequency words of some of the users in the violating-user database, to obtain a check vocabulary;
receiving a manually entered verification result, the verification result comprising the words that remain after non-violating words are manually removed from the check vocabulary;
inputting the check vocabulary and the verification result into a preset neural network model for machine learning;
and inputting all high-frequency words of all users in the violating-user database into the trained neural network model, to filter the high-frequency words of all users.
6. An auditing system for illegal friend-circle content and illegal chat content, characterized by comprising:
a collection unit: configured to acquire content data to be published or chat content and to transmit it to the detection unit;
a detection unit: configured to define the content data to be published or the chat content as violating content when it contains a keyword from a preset violation-content database; and, when the user corresponding to the content data to be published or the chat content is a user in a preset violating-user database, to audit the content data to be published or the chat content according to a preset similarity-audit algorithm.
7. The system for auditing illegal friend circle content and illegal chat content according to claim 6, wherein the illegal user database is constructed by the following method:
when the number of times that a user publishes illegal content exceeds a preset upper limit of violations, judging whether the illegal user database exists;
if the illegal user database exists, adding the user to the illegal user database; and
if the illegal user database does not exist, creating the illegal user database and adding the user to the illegal user database.
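The database-construction logic of claim 7 can be sketched as follows. The in-memory set and the limit value are illustrative stand-ins for real persistent storage and the preset upper limit of violations.

```python
VIOLATION_LIMIT = 3  # hypothetical preset upper limit of violations

class IllegalUserRegistry:
    def __init__(self):
        self.violation_counts = {}
        self.illegal_user_db = None  # created lazily, as in the claim

    def record_violation(self, user_id):
        count = self.violation_counts.get(user_id, 0) + 1
        self.violation_counts[user_id] = count
        if count > VIOLATION_LIMIT:
            if self.illegal_user_db is None:   # database does not exist yet
                self.illegal_user_db = set()   # create it first
            self.illegal_user_db.add(user_id)  # then add the user

    def is_illegal_user(self, user_id):
        return (self.illegal_user_db is not None
                and user_id in self.illegal_user_db)
```

Users below the limit are never entered, so the stricter similarity auditing of claim 6 applies only to repeat violators.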
8. The system for auditing illegal friend circle content and illegal chat content according to claim 6, wherein the detection unit is specifically configured to:
acquire historical published content data or historical chat content of the user;
perform word segmentation on the historical published content data or the historical chat content, and extract the N words with the highest publication frequency;
remove keywords in the illegal content database and common words in a preset common-word database from the N words to obtain high-frequency words; and
define the to-be-published content data or the chat content as illegal content when any of the high-frequency words appears in the to-be-published content data or the chat content.
9. The auditing system for illegal friend circle content and illegal chat content according to claim 8, wherein
the high-frequency words comprise Chinese characters, pinyin, or English letters.
10. The system for auditing illegal friend circle content and illegal chat content according to claim 9, wherein the detection unit is further configured to:
extract some of the high-frequency words of some of the users in the illegal user database to obtain a verification vocabulary;
receive a manually input verification result, wherein the verification result comprises the words remaining after non-illegal words are manually removed from the verification vocabulary;
input the verification vocabulary and the verification result into a preset neural network model for machine learning; and
input all high-frequency words of all users in the illegal user database into the trained neural network model, and filter the high-frequency words of all the users.
CN202110265325.2A 2021-03-11 2021-03-11 Method and system for auditing illegal friend circle content and illegal chat content Active CN113010708B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110265325.2A CN113010708B (en) 2021-03-11 2021-03-11 Method and system for auditing illegal friend circle content and illegal chat content


Publications (2)

Publication Number Publication Date
CN113010708A true CN113010708A (en) 2021-06-22
CN113010708B CN113010708B (en) 2023-08-25

Family

ID=76405146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110265325.2A Active CN113010708B (en) 2021-03-11 2021-03-11 Method and system for auditing illegal friend circle content and illegal chat content

Country Status (1)

Country Link
CN (1) CN113010708B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117793043A (en) * 2024-01-06 2024-03-29 广州微阿信息技术有限公司 Chat content auditing and processing method and system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090034786A1 (en) * 2007-06-02 2009-02-05 Newell Steven P Application for Non-Display of Images Having Adverse Content Categorizations
US20120239663A1 (en) * 2011-03-18 2012-09-20 Citypulse Ltd. Perspective-based content filtering
CN106445998A (en) * 2016-05-26 2017-02-22 达而观信息科技(上海)有限公司 Text content auditing method and system based on sensitive word
CN107846343A (en) * 2016-09-18 2018-03-27 郭荆玮 A kind of flexible real-time purification method in chatroom and chat system
CN109831682A (en) * 2018-12-28 2019-05-31 广州华多网络科技有限公司 Signal auditing method, device, electronic equipment and storage medium
CN110377900A (en) * 2019-06-17 2019-10-25 深圳壹账通智能科技有限公司 Checking method, device, computer equipment and the storage medium of Web content publication
CN111522785A (en) * 2020-04-17 2020-08-11 上海中通吉网络技术有限公司 Data extraction auditing method, device and equipment
CN112231484A (en) * 2020-11-19 2021-01-15 湖南红网新媒体集团有限公司 News comment auditing method, system, device and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZOU LAN; XU FANG: "Research and Design of a Text Content Information Filtering System" (文本内容信息过滤系统的研究与设计), Computer Knowledge and Technology, no. 34, pages 8187 - 8191 *



Similar Documents

Publication Publication Date Title
CN105577660B (en) DGA domain name detection method based on random forest
US9781139B2 (en) Identifying malware communications with DGA generated domains by discriminative learning
CN106357618B (en) Web anomaly detection method and device
CN111639177B (en) Text extraction method and device
US9471712B2 (en) Approximate matching of strings for message filtering
KR101715432B1 (en) Word pair acquisition device, word pair acquisition method, and recording medium
US8606795B2 (en) Frequency based keyword extraction method and system using a statistical measure
US20170063893A1 (en) Learning detector of malicious network traffic from weak labels
CN111831824B (en) Public opinion positive and negative surface classification method
CN108874777A (en) A kind of method and device of text anti-spam
CN109446404A (en) A kind of the feeling polarities analysis method and device of network public-opinion
US20170289082A1 (en) Method and device for identifying spam mail
CN103313248B (en) Method and device for identifying junk information
EP3433762A1 (en) Method, system and tool for content moderation
CN113590764B (en) Training sample construction method and device, electronic equipment and storage medium
CN111935097B (en) Method for detecting DGA domain name
CN103984703A (en) Mail classification method and device
CN111079029A (en) Sensitive account detection method, storage medium and computer equipment
CN110830607A (en) Domain name analysis method and device and electronic equipment
CN111523317B (en) Voice quality inspection method and device, electronic equipment and medium
CN113010637A (en) Text auditing method and device
CN115758183A (en) Training method and device for log anomaly detection model
CN113542252A (en) Detection method, detection model and detection device for Web attack
CN111431884B (en) Host computer defect detection method and device based on DNS analysis
CN113010708A (en) Verification method and system for illegal friend circle content and illegal chat content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant