CN111191139A - Brush detection method and system based on feature model - Google Patents

Info

Publication number: CN111191139A
Authority: CN (China)
Prior art keywords: data, brush, characteristic, model, text
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202010003255.9A
Other languages: Chinese (zh)
Inventors: 王力, 李一文
Current Assignee: Hunan Yingke Mutual Entertainment Network Information Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Hunan Yingke Mutual Entertainment Network Information Co Ltd
Application filed by: Hunan Yingke Mutual Entertainment Network Information Co Ltd
Priority date: 2020-01-02 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2020-01-02
Publication date: 2020-05-22

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures

Abstract

A brush detection method based on a feature model comprises the following steps: type detection, in which the type of the input data is detected and text detection is performed if the input data is text data; and text detection, in which the text data is acquired, irrelevant information is removed from it, the text is analyzed along the Chinese-character, letter and digit dimensions to obtain a feature form, the feature data is matched against the features of the model library data, and UID data is output when a match is found. A brush detection system based on a feature model comprises: a type detection module for detecting the type of the input data and triggering text detection if the input data is text data; and a text detection module for acquiring the text data, removing irrelevant information from it, analyzing it along the Chinese-character, letter and digit dimensions to obtain a feature form, matching the feature data against the features of the model library data, and outputting UID data when a match is found. The invention can reduce the review effort spent on content generated by brushes, "wool parties" and the like, and improve review efficiency.

Description

Brush detection method and system based on feature model
Technical Field
The invention relates to the field of network technology, and in particular to a brush detection method and system based on a feature model.
Background
A platform that serves content producers generates a large amount of UGC (user-generated content). To keep network content clean and safe, this content must pass a validity review before it can circulate on the platform. However, with black-market operations and "wool parties" rampant, a large amount of content is generated that is non-repetitive yet follows certain regular patterns. This poses a huge challenge to platform content review: when brushes arrive, review manpower is insufficient and tasks pile up.
The foregoing description is provided for general background information and is not admitted to be prior art.
Disclosure of Invention
The invention aims to provide a brush detection method and system based on a feature model that can perform a preliminary identification of brushes.
The invention provides a brush detection method based on a feature model, comprising the following steps: type detection, in which the type of the input data is detected and text detection is performed if the input data is text data; and text detection, in which the text data is acquired, irrelevant information is removed from it, the text is analyzed along the Chinese-character, letter and digit dimensions to obtain a feature form, the feature data is matched against the features of the model library data, and UID data is output when a match is found.
Further, the type detection step includes performing picture detection if the input data is picture data; the brush detection method further comprises: picture detection, in which the binary data of the picture is acquired and downloaded locally, a feature hash value is generated using a perceptual hashing algorithm, the hash value is compared with the hashes in the brush hash library, and UID data is output when the picture matches the feature.
Further, in the picture detection step, a hash threshold is set, and when the comparison between the hash value and a hash in the brush hash library reaches the hash threshold, the picture is considered to match the feature.
Further, the model library is constructed as follows: text features and group sample data submitted from the user side are acquired; the sample data is analyzed to obtain the features common to the samples, which are described as brush features; and the brush features are stored in the model library.
Further, describing the common features as brush features comprises: when the sample data conforms to the first model feature, taking the first model feature as the brush feature; when the sample data conforms to the second model feature, taking the second model feature as the brush feature. The first model feature means that every phrase is split into its individual characters, yielding one array per phrase, and the intersection of these arrays is taken to obtain the set of characters they share. The second model feature means that a character-type analysis is performed on all characters of each phrase, and the common feature points are found from the number of Chinese characters, the number of digits and the number of letters in the phrase.
The invention also provides a brush detection system based on a feature model, comprising: a type detection module for detecting the type of the input data and triggering text detection if the input data is text data; and a text detection module for acquiring the text data, removing irrelevant information from it, analyzing it along the Chinese-character, letter and digit dimensions to obtain a feature form, matching the feature data against the features of the model library data, and outputting UID data when a match is found.
Further, the type detection module triggers picture detection if it detects that the input data is picture data; the brush detection system further comprises: a picture detection module for acquiring the binary data of the picture, downloading it locally, generating a feature hash value using a perceptual hashing algorithm, comparing the hash value with the hashes in the brush hash library, and outputting UID data when the picture matches the feature.
Further, a hash threshold is set in the picture detection module, and when the comparison between the hash value and a hash in the brush hash library reaches the hash threshold, the picture detection module considers the picture to match the feature.
Further, the model library is constructed as follows: text features and group sample data submitted from the user side are acquired; the sample data is analyzed to obtain the features common to the samples, which are described as brush features; and the brush features are stored in the model library.
Further, describing the common features as brush features comprises: when the sample data conforms to the first model feature, taking the first model feature as the brush feature; when the sample data conforms to the second model feature, taking the second model feature as the brush feature. The first model feature means that every phrase is split into its individual characters, yielding one array per phrase, and the intersection of these arrays is taken to obtain the set of characters they share. The second model feature means that a character-type analysis is performed on all characters of each phrase, and the common feature points are found from the number of Chinese characters, the number of digits and the number of letters in the phrase.
With the brush detection method and system based on a feature model provided by the invention, text data is analyzed along the Chinese-character, letter and digit dimensions to obtain a feature form, and the feature data is matched against the features of the model library data to identify brushes preliminarily. This reduces the review effort spent on content generated by brushes, "wool parties" and the like, and improves review efficiency.
Drawings
FIG. 1 is a flowchart illustrating a brush detection method according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
As shown in FIG. 1, in this embodiment, the brush detection method based on the feature model includes the following steps:
Type detection: the type of the input data (such as task library data) is detected to determine whether it is text data or picture data. If the input data is text data, text detection is performed; if it is picture data, picture detection is performed.
Text detection: the text data is acquired; irrelevant information, such as emoticons and symbols that do not contribute to the literal meaning, is removed from it; the text is analyzed along the Chinese-character, letter and digit dimensions to obtain a feature form; the feature data is matched against the features of the model library data; and UID (user identification) data is output when a match is found.
Picture detection: the binary data of the picture is acquired and downloaded locally, a feature hash value is generated using a perceptual hashing algorithm, the hash value is compared with the hashes in the brush hash library (a similarity comparison), and UID data is output when the picture matches the feature. Whether the feature is matched can be decided as follows: a hash threshold is set, and when the comparison score between the hash value and a hash in the brush hash library reaches the hash threshold, the picture is considered to match the feature.
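The picture detection step can be illustrated with a minimal sketch. The source does not say which perceptual hashing algorithm is used, so the widely known average hash (aHash) is swapped in here as an assumption. The function names, the similarity score (one minus the normalized Hamming distance) and the default threshold of 0.9 are all hypothetical. Decoding the downloaded picture and shrinking it to a small grayscale grid would need an imaging library and is omitted; the sketch starts from that grid.

```python
from typing import Iterable, Sequence


def average_hash(gray: Sequence[Sequence[int]]) -> int:
    """Average hash: one bit per pixel, set when the pixel is >= the mean brightness."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits


def similarity(h1: int, h2: int, n_bits: int) -> float:
    """Comparison score in [0, 1]: 1 minus the normalized Hamming distance."""
    distance = bin(h1 ^ h2).count("1")
    return 1.0 - distance / n_bits


def matches_brush_library(
    h: int, library: Iterable[int], n_bits: int, threshold: float = 0.9
) -> bool:
    """True when the score against any stored hash reaches the (assumed) threshold."""
    return any(similarity(h, known, n_bits) >= threshold for known in library)
```

In practice the grid would typically be 8x8, giving a 64-bit hash; the threshold trades false positives against missed near-duplicates and would need tuning on real data.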
Of course, in other embodiments, only text data may be detected, in which case the picture detection step need not be provided.
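As a rough illustration of the type detection step and of the cleaning stage of text detection, the sketch below routes a payload by its Python type and strips everything that is not a Chinese character, a letter or a digit. The function names are hypothetical, and treating `str` as text and `bytes` as picture data is an assumption; the source does not specify how the data type is determined. Restricting "Chinese" to the CJK Unified Ideographs block U+4E00 to U+9FFF is likewise an assumption.

```python
import re
from typing import Union

# Anything outside CJK unified ideographs, ASCII letters and ASCII digits
# is treated as irrelevant (emoticons, punctuation, whitespace, symbols).
_IRRELEVANT = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]")


def detect_type(payload: Union[str, bytes, bytearray]) -> str:
    """Classify the input as 'text' or 'picture' (assumed convention)."""
    if isinstance(payload, str):
        return "text"
    if isinstance(payload, (bytes, bytearray)):
        return "picture"
    raise TypeError(f"unsupported input type: {type(payload).__name__}")


def clean_text(text: str) -> str:
    """Remove characters that do not carry literal meaning for the analysis."""
    return _IRRELEVANT.sub("", text)
```

A text-only deployment, as described above, would simply never route to a picture detector.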
In this embodiment, the model library is constructed as follows: text features and group sample data submitted from the user side are acquired; the sample data is analyzed to obtain the features common to the samples, which are described as brush features; and the brush features are stored in the model library.
Describing the common features as brush features may proceed as follows: when the sample data conforms to the first model feature, the first model feature is taken as the brush feature; when the sample data conforms to the second model feature, the second model feature is taken as the brush feature.
The first model feature means that every phrase is split into its individual characters, yielding one array per phrase, and the intersection of these arrays is taken to obtain the set of characters they share. For example, analysis of the texts "user 3735145832", "user 5747134863" and "user 5977056607" shows that they share the characters {"0": "with", "1": "user", "3": "7", "5": "5", "8": "5"}, and so on (the values at keys "0" and "1" appear to be the two characters of the original Chinese word for "user", translated one character at a time).
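The first model feature can be sketched as below. The function name is hypothetical, and keying each shared character by its position in the first sample is an interpretation inferred from the key/value pairs in the example above, not something the source states.

```python
from typing import Dict, List


def common_characters(samples: List[str]) -> Dict[int, str]:
    """Split each sample into characters and intersect the resulting arrays.

    The result keeps every character of the first sample that also occurs in
    all other samples, keyed by its position in the first sample.
    """
    if not samples:
        return {}
    first, rest = samples[0], samples[1:]
    shared = set(first)
    for sample in rest:
        shared &= set(sample)  # characters present in every sample so far
    return {i: ch for i, ch in enumerate(first) if ch in shared}
```

Assuming the original names began with the two-character Chinese word 用户 rather than the translated prefix "user ", calling the function on the three names yields `{0: '用', 1: '户', 3: '7', 5: '5', 8: '5'}`, which agrees with the keys listed above. With the translated prefix the same characters are found, but the digit positions shift by three.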
The second model feature means that a character-type analysis is performed on all characters of each phrase, and the common feature points are found from the number of Chinese characters, the number of digits and the number of letters in the phrase. For example, analysis of the texts ["Happy snow lambkin _83648", "Jazz person heart separated _66450", "Huatian person double separated _53721"] shows that they share the feature "Chinese (6), digits (5)", that is, each original name contains six Chinese characters and five digits.
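A minimal sketch of the second model feature follows. The function names are hypothetical, "Chinese" is assumed to mean the CJK Unified Ideographs block U+4E00 to U+9FFF, and letters and digits are assumed to be ASCII. A count is kept as a feature point only when every sample agrees on it.

```python
from typing import Dict, List


def char_type_counts(text: str) -> Dict[str, int]:
    """Count Chinese characters, ASCII letters and ASCII digits in the text."""
    counts = {"chinese": 0, "letter": 0, "digit": 0}
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            counts["chinese"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["letter"] += 1
        elif ch.isascii() and ch.isdigit():
            counts["digit"] += 1
    return counts


def common_counts(samples: List[str]) -> Dict[str, int]:
    """Keep only the character-type counts on which all samples agree."""
    if not samples:
        return {}
    profiles = [char_type_counts(s) for s in samples]
    first = profiles[0]
    return {k: v for k, v in first.items() if all(p[k] == v for p in profiles[1:])}


def matches_feature(text: str, feature: Dict[str, int]) -> bool:
    """True when the text has exactly the counts recorded in a non-empty feature."""
    counts = char_type_counts(text)
    return bool(feature) and all(counts[k] == v for k, v in feature.items())
```

For three hypothetical nicknames of six Chinese characters, an underscore and five digits each, `common_counts` returns `{'chinese': 6, 'letter': 0, 'digit': 5}`, and `matches_feature` then accepts any new name with the same shape.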
In this embodiment, the brush detection system based on the feature model includes:
a type detection module, for detecting the type of the input data, triggering text detection if the input data is text data and picture detection if the input data is picture data;
a text detection module, for acquiring the text data, removing irrelevant information from it, analyzing it along the Chinese-character, letter and digit dimensions to obtain a feature form, matching the feature data against the features of the model library data, and outputting UID data when a match is found;
and a picture detection module, for acquiring the binary data of the picture, downloading it locally, generating a feature hash value using a perceptual hashing algorithm, comparing the hash value with the hashes in the brush hash library, and outputting UID data when the picture matches the feature.
In this embodiment, the picture detection module decides whether a picture matches the feature as follows: a hash threshold is set, and when the comparison score between the hash value and a hash in the brush hash library reaches the hash threshold, the picture is considered to match the feature.
Likewise, in other embodiments, only text data may be detected, in which case the picture detection module need not be included.
In this embodiment, the model library is constructed as follows: text features and group sample data submitted from the user side are acquired; the sample data is analyzed to obtain the features common to the samples, which are described as brush features; and the brush features are stored in the model library.
In this embodiment, describing the common features as brush features comprises: when the sample data conforms to the first model feature, taking the first model feature as the brush feature; when the sample data conforms to the second model feature, taking the second model feature as the brush feature. The first model feature means that every phrase is split into its individual characters, yielding one array per phrase, and the intersection of these arrays is taken to obtain the set of characters they share. The second model feature means that a character-type analysis is performed on all characters of each phrase, and the common feature points are found from the number of Chinese characters, the number of digits and the number of letters in the phrase.
With the brush detection method and system based on the feature model, text data is analyzed along the Chinese-character, letter and digit dimensions to obtain a feature form, and the feature data is matched against the features of the model library data to identify brushes preliminarily. This reduces the review effort spent on content generated by brushes, "wool parties" and the like, and improves review efficiency.
In implementation, the system is built on a B/S (browser/server) architecture. Sample data in the model library is discovered and submitted by front-line reviewers. Both model matching and task isolation run asynchronously, so the brush detection system cleans the tasks without affecting the review service, thereby detecting and isolating brushes. The main workflow is as follows: the app service generates the task library, and the review system stores the task data. While reviewing, a front-line reviewer selects more than 3 text samples and submits them at the sample submission port; after submission, the samples are verified and a feature model is generated. The brush detection system reads the model library and compares it against the task library data: text data is processed by the text detection module and picture data by the picture detection module. Data that matches a feature model is written into the brush library. The brush isolation module reads the brush library, deletes the corresponding data from the task library and writes it into the brush partition, completing brush detection and isolation.
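The workflow above can be sketched with in-memory dictionaries standing in for the task library, brush library and brush partition. All names are hypothetical, a feature is modeled as a plain predicate over text, and the asynchronous scheduling described in the source is omitted, so this shows only the data flow, not the actual system.

```python
from typing import Callable, Dict, List

Feature = Callable[[str], bool]  # assumed representation of one feature model
Task = Dict[str, str]            # assumed task shape: {"uid": ..., "text": ...}


def submit_samples(
    samples: List[str],
    build_feature: Callable[[List[str]], Feature],
    model_library: List[Feature],
) -> None:
    """Validate a reviewer's submission and store the generated feature model."""
    if len(samples) <= 3:
        raise ValueError("more than 3 text samples are required")
    model_library.append(build_feature(samples))


def detect_brushes(
    task_library: Dict[str, Task], model_library: List[Feature]
) -> Dict[str, str]:
    """Build the brush library: task id -> UID for tasks matching any feature."""
    return {
        task_id: task["uid"]
        for task_id, task in task_library.items()
        if any(feature(task["text"]) for feature in model_library)
    }


def isolate(
    task_library: Dict[str, Task],
    brush_library: Dict[str, str],
    brush_partition: Dict[str, Task],
) -> None:
    """Move every task listed in the brush library into the brush partition."""
    for task_id in list(brush_library):
        if task_id in task_library:
            brush_partition[task_id] = task_library.pop(task_id)
```

Keeping detection and isolation as separate steps mirrors the description, in which matched data is first written to the brush library and only then removed from the review queue.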
As used herein, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, including not only those elements listed, but also other elements not expressly listed.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A brush detection method based on a feature model, characterized by comprising the following steps: type detection, in which the type of the input data is detected and text detection is performed if the input data is text data; and text detection, in which the text data is acquired, irrelevant information is removed from it, the text is analyzed along the Chinese-character, letter and digit dimensions to obtain a feature form, the feature data is matched against the features of the model library data, and UID data is output when a match is found.
2. The brush detection method according to claim 1, wherein the type detection includes performing picture detection if the input data is picture data; the brush detection method further comprises: picture detection, in which the binary data of the picture is acquired and downloaded locally, a feature hash value is generated using a perceptual hashing algorithm, the hash value is compared with the hashes in the brush hash library, and UID data is output when the picture matches the feature.
3. The brush detection method according to claim 1, wherein in the picture detection step, a hash threshold is set, and when the comparison between the hash value and a hash in the brush hash library reaches the hash threshold, the picture is considered to match the feature.
4. The brush detection method according to claim 1, wherein the model library is constructed by a method comprising: acquiring text features and group sample data submitted from the user side; analyzing the sample data to obtain the features common to the samples, and describing the common features as brush features; and storing the brush features in the model library.
5. The brush detection method according to claim 4, wherein describing the common features as brush features comprises: when the sample data conforms to the first model feature, taking the first model feature as the brush feature; when the sample data conforms to the second model feature, taking the second model feature as the brush feature; the first model feature means that every phrase is split into its individual characters, yielding one array per phrase, and the intersection of these arrays is taken to obtain the set of characters they share; the second model feature means that a character-type analysis is performed on all characters of each phrase, and the common feature points are found from the number of Chinese characters, the number of digits and the number of letters in the phrase.
6. A brush detection system based on a feature model, comprising: a type detection module for detecting the type of the input data and triggering text detection if the input data is text data; and a text detection module for acquiring the text data, removing irrelevant information from it, analyzing it along the Chinese-character, letter and digit dimensions to obtain a feature form, matching the feature data against the features of the model library data, and outputting UID data when a match is found.
7. The brush detection system according to claim 6, wherein the type detection module triggers picture detection if it detects that the input data is picture data; the brush detection system further comprises: a picture detection module for acquiring the binary data of the picture, downloading it locally, generating a feature hash value using a perceptual hashing algorithm, comparing the hash value with the hashes in the brush hash library, and outputting UID data when the picture matches the feature.
8. The brush detection system according to claim 6, wherein a hash threshold is set in the picture detection module, and when the comparison between the hash value and a hash in the brush hash library reaches the hash threshold, the picture is considered to match the feature.
9. The brush detection system according to claim 6, wherein the model library is constructed by a method comprising: acquiring text features and group sample data submitted from the user side; analyzing the sample data to obtain the features common to the samples, and describing the common features as brush features; and storing the brush features in the model library.
10. The brush detection system according to claim 9, wherein describing the common features as brush features comprises: when the sample data conforms to the first model feature, taking the first model feature as the brush feature; when the sample data conforms to the second model feature, taking the second model feature as the brush feature; the first model feature means that every phrase is split into its individual characters, yielding one array per phrase, and the intersection of these arrays is taken to obtain the set of characters they share; the second model feature means that a character-type analysis is performed on all characters of each phrase, and the common feature points are found from the number of Chinese characters, the number of digits and the number of letters in the phrase.
Application CN202010003255.9A (priority date 2020-01-02, filing date 2020-01-02): Brush detection method and system based on feature model. Status: Pending. Publication: CN111191139A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010003255.9A CN111191139A (en) 2020-01-02 2020-01-02 Brush detection method and system based on feature model

Publications (1)

Publication Number Publication Date
CN111191139A (en) 2020-05-22

Family

ID=70708101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010003255.9A Pending CN111191139A (en) 2020-01-02 2020-01-02 Brush detection method and system based on feature model

Country Status (1)

Country Link
CN (1) CN111191139A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2275972A1 (en) * 2009-07-06 2011-01-19 Kaspersky Lab Zao System and method for identifying text-based spam in images
CN102200987A (en) * 2011-01-27 2011-09-28 北京开心人信息技术有限公司 Method and system for searching sock puppet identification number based on behavioural analysis of user identification numbers
CN102571484A (en) * 2011-12-14 2012-07-11 上海交通大学 Method for detecting and finding online water army
CN108874777A (en) * 2018-06-11 2018-11-23 北京奇艺世纪科技有限公司 A kind of method and device of text anti-spam
CN109241379A (en) * 2017-07-11 2019-01-18 北京交通大学 A method of across Modal detection network navy
CN110162620A (en) * 2019-01-10 2019-08-23 腾讯科技(深圳)有限公司 Black detection method, device, server and the storage medium for producing advertisement
CN110569509A (en) * 2019-09-12 2019-12-13 广州荔支网络技术有限公司 risk group identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination