CN101446970A - Method for censoring and process text contents issued by user and device thereof - Google Patents


Info

Publication number
CN101446970A
Authority
CN
China
Prior art keywords
content
text
user
issue
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2008102200098A
Other languages
Chinese (zh)
Other versions
CN101446970B (en
Inventor
刘怀军
刘昌毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shiji Guangsu Information Technology Co., Ltd.
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN2008102200098A priority Critical patent/CN101446970B/en
Publication of CN101446970A publication Critical patent/CN101446970A/en
Application granted granted Critical
Publication of CN101446970B publication Critical patent/CN101446970B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for censoring and processing text contents issued by a user and a device thereof. The method comprises the following steps: receiving the text contents issued by the user and judging user information according to a list rule database; if the user information belongs neither to a white list or a white rule nor to a black list or a black rule, calculating a first similarity between a first characteristic vector of the text contents issued by the user and a second characteristic vector of pre-established spam sample contents, and judging whether the text contents issued by the user are qualified contents according to the first similarity; if the text contents are qualified contents, publishing the text contents issued by the user; otherwise, sending the text contents issued by the user for manual censoring. The method and the device can censor and filter the user information and the text contents issued by the user without manually censoring all of the information issued by users, thus greatly reducing manual censoring time, saving human resources and correspondingly enhancing censoring efficiency.

Description

A method and device for reviewing and processing text content posted by users
Technical field
The present invention relates to the field of communications, and in particular to a method and device for reviewing and processing text content posted by users.
Background art
At present, the Wenwen community (http://wenwen.soso.com), like Baidu Zhidao, Sina iAsk and similar services, is a question-and-answer service: on its pages a user can ask a question or answer questions raised by others, which makes it much easier for users to obtain information. The Wenwen community now receives more than 200,000 new questions every day, and all information submitted by its users is reviewed manually. This consumes a large amount of manual review time, wastes human resources, and keeps review efficiency low.
Summary of the invention
The invention provides a method and device for reviewing and processing text content posted by users, which saves a large amount of manual review time and improves review efficiency.
The technical solution of the present invention is a method for reviewing and processing text content posted by a user, comprising the steps of:
receiving the text content posted by the user, and judging the user information against a list-and-rule database, the list-and-rule database comprising a blacklist, black rules, a whitelist and white rules;
if the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, performing format conversion on the text content posted by the user and extracting the content words in the text content;
calculating the inverse document frequency (IDF) weight of each extracted content word in a pre-established document database, to obtain a first feature vector composed of these IDF weights;
calculating a first similarity between the first feature vector and a second feature vector of pre-established spam sample content, and judging from the first similarity whether the text content posted by the user is qualified content; if it is qualified content, publishing the text content posted by the user.
The invention also discloses a device for reviewing and processing text content posted by a user, comprising: a review module, configured to receive the text content posted by the user and to judge the user information against a list-and-rule database, the list-and-rule database comprising a blacklist, black rules, a whitelist and white rules;
a conversion module, configured to perform format conversion on the text content posted by the user and extract the content words in the text content when the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule;
a computation module, configured to calculate the inverse document frequency (IDF) weight of each extracted content word in a pre-established document database, to obtain a first feature vector composed of these IDF weights, and also to calculate a first similarity between the first feature vector and a second feature vector of pre-established spam sample content;
a judgment module, configured to judge from the first similarity whether the text content posted by the user is qualified content and, if it is qualified content, to publish the text content posted by the user.
With the method and device of the present invention, content filtering is applied only to text content posted by users who belong neither to the whitelist or a white rule nor to the blacklist or a black rule. Text content posted by users who fall under a black rule or the blacklist, and text content judged unqualified, can be sent for manual review; text content posted by users who fall under a white rule or the whitelist, and text content judged qualified, is published directly. Not all user-posted information has to go through manual review, which saves a large amount of manual review time and human resources and correspondingly improves review efficiency.
Description of the drawings
Fig. 1 is a flowchart of the method of the present invention for reviewing and processing text content posted by a user;
Fig. 2 is structural block diagram (1) of the device of the present invention for reviewing and processing text content posted by a user;
Fig. 3 is structural block diagram (2) of the device of the present invention for reviewing and processing text content posted by a user;
Fig. 4 is structural block diagram (3) of the device of the present invention for reviewing and processing text content posted by a user.
Embodiments
With the method and device of the present invention, content filtering is applied only to text content posted by users who belong neither to the whitelist or a white rule nor to the blacklist or a black rule. Text content posted by users who fall under a black rule or the blacklist, and text content judged unqualified, is sent for manual review; text content posted by users who fall under a white rule or the whitelist, and text content judged qualified, is published directly. Not all user-posted information has to go through manual review, which saves a large amount of manual review time and human resources and correspondingly improves review efficiency.
The invention is described in detail below with reference to the drawings and specific embodiments.
The method of the present invention for reviewing and processing text content posted by a user can be applied to question-and-answer services such as the Wenwen community, Baidu Zhidao and Sina iAsk.
As shown in Fig. 1, the method of the present invention for reviewing and processing text content posted by a user comprises the following steps.
S100: receive the text content posted by the user. S101: judge the user information against the list-and-rule database, which comprises a blacklist, black rules, a whitelist and white rules. In one embodiment, the blacklist may be a list of users who are highly likely to post spam, and the whitelist a list of users who are highly likely to post legitimate information. Black rules are set according to the user's level or credit rating and indicate that the user's level or credit rating is low; white rules are likewise set according to the user's level or credit rating and indicate that the user's level or credit rating is high.
S102: if the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, perform format conversion on the text content posted by the user and extract the content words in the text content. In one embodiment, format conversion may include converting traditional Chinese characters to simplified characters, converting full-width characters to half-width characters, removing redundant spaces, and so on. Content words are the core words of the text content; function words are not treated as core words.
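The format conversion and content-word extraction of step S102 can be sketched roughly as follows. This is only an illustration under stated assumptions: the names normalize_text, extract_content_words and FUNCTION_WORDS are hypothetical, the function-word list is a toy English one, whitespace splitting stands in for real word segmentation, and traditional-to-simplified conversion is omitted because it needs a dedicated conversion table or library.

```python
import re
import unicodedata
from typing import List

# Hypothetical function-word list; the patent does not specify one.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "is", "and"}


def normalize_text(text: str) -> str:
    """Fold full-width characters to half-width and collapse redundant spaces."""
    # NFKC maps full-width letters, digits and the ideographic space
    # to their ordinary half-width equivalents.
    folded = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", folded).strip()


def extract_content_words(text: str) -> List[str]:
    """Return the words of the normalized text that are not function words."""
    # Assumption: whitespace tokenization; Chinese text would need a segmenter.
    words = normalize_text(text).lower().split()
    return [w for w in words if w not in FUNCTION_WORDS]
```

Unicode NFKC normalization is used here simply because it already performs the full-width to half-width folding; any equivalent mapping table would serve.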
S103: calculate the inverse document frequency (IDF) weight of each extracted content word in a pre-established document database, to obtain a first feature vector composed of these IDF weights. In one embodiment, the document database may consist of the text content posted by all users. Specifically, the IDF weight of each content word may be calculated by the formula
$wgt = t_f \times \lg\frac{U}{V}$
where $wgt$ is the IDF weight, $t_f$ is the frequency with which the content word occurs in the user's text content, $U$ is the total number of documents in the document database, and $V$ is the number of documents in which the content word occurs.
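As a minimal sketch of the weight calculation in step S103, the function below computes wgt = t_f × lg(U/V) for each content word, taking lg as the base-10 logarithm. The function name idf_weights and the doc_freq mapping (word to number of documents containing it) are hypothetical, and skipping words that never occur in the document database is an assumption, since the text does not say how V = 0 is handled.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def idf_weights(content_words: Iterable[str],
                doc_freq: Dict[str, int],
                total_docs: int) -> Dict[str, float]:
    """Return {word: t_f * log10(U / V)} for the words of one post."""
    term_freq = Counter(content_words)          # t_f for each content word
    weights = {}
    for word, tf in term_freq.items():
        v = doc_freq.get(word, 0)               # V: documents containing the word
        if v == 0:
            continue                            # assumption: ignore unseen words
        weights[word] = tf * math.log10(total_docs / v)
    return weights
```

For example, with 1000 documents in the database, a word that occurs twice in the post and in 10 documents receives the weight 2 × lg(100) = 4.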
S104: calculate the first similarity between the first feature vector and the second feature vector of the pre-established spam sample content. The second feature vector can be obtained in advance by the same process as the first: take a spam sample, perform format conversion on it, extract its content words, calculate the IDF weight of each content word in the document database, and form the second feature vector from these weights. In one embodiment, the first similarity is calculated by the formula
$\mathrm{Cos}(X,Y)=\dfrac{\sum_{\alpha=1,\beta=1}^{\alpha=m,\beta=n} x_\alpha y_\beta}{\sqrt{\sum_{\alpha=1}^{m} x_\alpha^2 \sum_{\beta=1}^{n} y_\beta^2}}$
where $\mathrm{Cos}(X,Y)$ denotes the first similarity, and $X=\{x_1,\ldots,x_m\}$ and $Y=\{y_1,\ldots,y_n\}$ denote the first feature vector and the second feature vector respectively.
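The first similarity of step S104 can be illustrated with a short sketch. It assumes that both feature vectors are stored as word-to-weight mappings and that only words present in both vectors contribute to the numerator, which is the usual reading of a cosine between term vectors; the name cosine_similarity and the convention of returning 0 for an empty vector are assumptions, not part of the source.

```python
import math
from typing import Dict


def cosine_similarity(x: Dict[str, float], y: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse word-weight vectors."""
    # Numerator: products of weights for words shared by both vectors.
    dot = sum(weight * y[word] for word, weight in x.items() if word in y)
    norm_x = math.sqrt(sum(weight * weight for weight in x.values()))
    norm_y = math.sqrt(sum(weight * weight for weight in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm_x * norm_y)
```

With non-negative weights the result lies between 0 and 1, which makes a fixed threshold comparison straightforward.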
S105: judge from the first similarity whether the text content posted by the user is qualified content. The judgment can be made in many ways and can be configured as needed. In one embodiment, a predetermined threshold is set: if the first similarity is greater than the threshold, the text content posted by the user is judged to be unqualified content; otherwise it is judged to be qualified content.
If the content is qualified, step S107 is carried out to publish the text content posted by the user; otherwise, in one embodiment, step S106 may be carried out to send the text content posted by the user for manual review.
In one embodiment, the following steps may also follow step S101. S102: if the user information belongs to the blacklist or a black rule, send the text content posted by the user for manual review. S103: if the user information belongs to the whitelist or a white rule, publish the text content posted by the user.
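The routing described so far (list and rule check first, then the threshold on the first similarity) can be sketched as below. All names and numbers are hypothetical: the source does not give concrete level boundaries for the black and white rules, a threshold value, or a precedence when a user matches both a black and a white entry, so the sketch simply checks the black side first.

```python
from enum import Enum


class Route(Enum):
    PUBLISH = "publish"
    MANUAL_REVIEW = "manual_review"
    CONTENT_CHECK = "content_check"


def triage_user(user_id, level, blacklist, whitelist, low_level=1, high_level=8):
    """Route a post by user information before any content analysis."""
    # Assumption: black entries take precedence over white entries.
    if user_id in blacklist or level <= low_level:      # blacklist or black rule
        return Route.MANUAL_REVIEW
    if user_id in whitelist or level >= high_level:     # whitelist or white rule
        return Route.PUBLISH
    return Route.CONTENT_CHECK


def route_by_first_similarity(first_similarity, threshold=0.8):
    """Similarity above the threshold means the post resembles spam."""
    if first_similarity > threshold:
        return Route.MANUAL_REVIEW
    return Route.PUBLISH
```

Only posts that come back as CONTENT_CHECK need the feature-vector computation, which is where the saving in manual review effort comes from.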
To judge more comprehensively and accurately whether the content posted by the user is qualified, and to reduce the probability of misjudgment, in one embodiment, when the user information is judged to belong neither to the whitelist or a white rule nor to the blacklist or a black rule, the method may further comprise: detecting a second similarity between the text content posted by the user and a pre-established feature database containing telephone-number formats, web-page formats, Martian-language formats and the like, and judging from the second similarity and the first similarity whether the text content posted by the user is qualified content. For this judgment, weights may be assigned to the first similarity and the second similarity respectively, and the weighted sum compared with a predetermined value: if it is greater than the predetermined value, the text content posted by the user may be judged to be unqualified content; otherwise it is qualified content. Alternatively, the second similarity alone may be compared with a predetermined value, and if it is greater, the text content posted by the user may be judged directly to be unqualified content.
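The source does not define how the second similarity is measured, so the following is only one possible reading: the feature database is modelled as a small set of regular expressions, and the score is the fraction of patterns found in the text. The patterns themselves (an 11-digit number starting with 1, and a URL prefix) and the names FEATURE_PATTERNS and second_similarity are hypothetical; Martian-language formats are left out because they would need a character mapping table.

```python
import re

# Hypothetical feature database of suspicious formats.
FEATURE_PATTERNS = {
    # 11-digit number starting with 1, not embedded in a longer digit run.
    "phone": re.compile(r"(?<!\d)1\d{10}(?!\d)"),
    # Web address introduced by a scheme or by "www.".
    "url": re.compile(r"(https?://|www\.)\S+", re.IGNORECASE),
}


def second_similarity(text: str) -> float:
    """Fraction of feature patterns that occur at least once in the text."""
    hits = sum(1 for pattern in FEATURE_PATTERNS.values() if pattern.search(text))
    return hits / len(FEATURE_PATTERNS)
```

A score of this kind can either be compared with its own predetermined value or fed into the weighted sum together with the first similarity.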
For the same purpose, namely to judge more comprehensively and accurately whether the content posted by the user is qualified and to reduce the probability of misjudgment, in one embodiment, when the user information is judged to belong neither to the whitelist or a white rule nor to the blacklist or a black rule, the method may further comprise: counting the number of characters in the text content posted by the user, and judging from this character count, the first similarity and the second similarity whether the text content posted by the user is qualified content. For this judgment, weights may be assigned to the character count, the first similarity and the second similarity respectively, and the weighted sum compared with a predetermined value: if it is greater than the predetermined value, the text content posted by the user may be judged to be unqualified content; otherwise it is qualified content. Alternatively, a predetermined value may be set for the character count alone: if the character count is found to be less than this predetermined value, the text content posted by the user may be judged directly to be unqualified content.
For the same purpose, namely to judge more comprehensively and accurately whether the content posted by the user is qualified and to reduce the probability of misjudgment, in one embodiment, when the user information is judged to belong neither to the whitelist or a white rule nor to the blacklist or a black rule, the method may further comprise: detecting a third similarity between the text content posted by the user and a pre-established database of unpublishable words (a collection of special words and phrases, content that has recently been required to be blocked, or other configured items), and judging from this third similarity, the character count, the first similarity and the second similarity whether the text content posted by the user is qualified content. For this judgment, weights may be assigned to the third similarity, the character count, the first similarity and the second similarity respectively, and the weighted sum compared with a predetermined value: if it is greater than the predetermined value, the text content posted by the user may be judged to be unqualified content; otherwise it is qualified content. Alternatively, the third similarity alone may be compared with a predetermined value, and if it is greater, the text content posted by the user may be judged to be unqualified content.
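The weighted judgment over the four signals can be sketched as follows. The weights, the predetermined value and the minimum length are made-up numbers, and turning the character count into a score of 1.0 for very short posts is an assumption: the source only says that a weight is assigned to the character count and that a count below a predetermined value can mark a post as unqualified on its own.

```python
def is_unqualified(first_sim, second_sim, third_sim, char_count,
                   weights=(0.5, 0.2, 0.2, 0.1),
                   min_chars=5,
                   predetermined_value=0.5):
    """Return True if the weighted sum of the signals exceeds the predetermined value."""
    # Assumption: very short posts contribute a full score, others none.
    short_score = 1.0 if char_count < min_chars else 0.0
    w_first, w_second, w_third, w_short = weights
    total = (w_first * first_sim + w_second * second_sim
             + w_third * third_sim + w_short * short_score)
    return total > predetermined_value
```

A post judged unqualified in this way would be sent for manual review rather than rejected outright, consistent with the flow described above.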
The present invention also discloses a device for reviewing and processing text content posted by a user which, as shown in Fig. 2, comprises a review module, a conversion module, a computation module and a judgment module connected in sequence.
The review module is configured to receive the text content posted by the user and to judge the user information against the list-and-rule database, which comprises a blacklist, black rules, a whitelist and white rules. In one embodiment, the blacklist may be a list of users who are highly likely to post spam, and the whitelist a list of users who are highly likely to post legitimate information. Black rules are set according to the user's level or credit rating and indicate that the user's level or credit rating is low; white rules are likewise set according to the user's level or credit rating and indicate that the user's level or credit rating is high.
The conversion module is configured to perform format conversion on the text content posted by the user and to extract the content words in the text content when the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule. In one embodiment, format conversion may include converting traditional Chinese characters to simplified characters, converting full-width characters to half-width characters, removing redundant spaces, and so on. Content words are the core words of the text content; function words are not treated as core words.
The computation module is configured to calculate the inverse document frequency (IDF) weight of each extracted content word in a pre-established document database, to obtain a first feature vector composed of these IDF weights, and also to calculate the first similarity between the first feature vector and the second feature vector of the pre-established spam sample content. In one embodiment, the document database may consist of the text content posted by all users. Specifically, the IDF weight of each content word may be calculated by the formula
$wgt = t_f \times \lg\frac{U}{V}$
where $wgt$ is the IDF weight, $t_f$ is the frequency with which the content word occurs in the user's text content, $U$ is the total number of documents in the document database, and $V$ is the number of documents in which the content word occurs. The second feature vector of the spam sample content can be obtained in advance by the same process as the first: take a spam sample, perform format conversion on it, extract its content words, calculate the IDF weight of each content word in the document database, and form the second feature vector from these weights. In one embodiment, the first similarity is calculated by the formula
$\mathrm{Cos}(X,Y)=\dfrac{\sum_{\alpha=1,\beta=1}^{\alpha=m,\beta=n} x_\alpha y_\beta}{\sqrt{\sum_{\alpha=1}^{m} x_\alpha^2 \sum_{\beta=1}^{n} y_\beta^2}}$
where $\mathrm{Cos}(X,Y)$ denotes the first similarity, and $X=\{x_1,\ldots,x_m\}$ and $Y=\{y_1,\ldots,y_n\}$ denote the first feature vector and the second feature vector respectively.
The judgment module is configured to judge from the first similarity whether the text content posted by the user is qualified content and, if it is qualified content, to publish the text content posted by the user. In one embodiment, if the text content is judged to be unqualified content, the judgment module sends the text content posted by the user for manual review.
In one embodiment, when the user information belongs to the blacklist or a black rule, the review module sends the text content posted by the user for manual review; when the user information belongs to the whitelist or a white rule, it publishes the text content posted by the user.
To judge more comprehensively and accurately whether the content posted by the user is qualified, and to reduce the probability of misjudgment, as shown in Fig. 3, a detection module is also connected between the review module and the judgment module. When the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, the detection module detects a second similarity between the text content posted by the user and a pre-established feature database containing telephone-number formats, web-page formats and Martian-language formats, and/or detects a third similarity between the text content posted by the user and a pre-established database of unpublishable words, and sends the second similarity and/or the third similarity to the judgment module. The judgment module judges from the first similarity, the second similarity and/or the third similarity whether the text content posted by the user is qualified content. For this judgment, weights may be assigned to the first similarity, the second similarity and/or the third similarity respectively, and the weighted sum compared with a predetermined value: if it is greater than the predetermined value, the text content posted by the user may be judged to be unqualified content; otherwise it is qualified content.
For the same purpose, namely to judge more comprehensively and accurately whether the content posted by the user is qualified and to reduce the probability of misjudgment, as shown in Fig. 4, a statistics module is also connected between the review module and the judgment module. When the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, the statistics module counts the number of characters in the text content posted by the user and sends this character count to the judgment module. The judgment module judges from the character count, the first similarity, the second similarity and/or the third similarity whether the text content posted by the user is qualified content. For this judgment, weights may be assigned to the character count, the first similarity, the second similarity and/or the third similarity respectively, and the weighted sum compared with a predetermined value: if it is greater than the predetermined value, the text content posted by the user may be judged to be unqualified content; otherwise it is qualified content.
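One way to picture how the modules of Fig. 2 are chained is the sketch below, in which each module is a plain callable. The class name ReviewDevice, the string labels and the callable signatures are all hypothetical and chosen only for illustration; the optional detection and statistics modules of Figs. 3 and 4 are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReviewDevice:
    review: Callable[[str], str]            # user id -> "black", "white" or "unknown"
    convert: Callable[[str], List[str]]     # text -> content words
    compute: Callable[[List[str]], float]   # content words -> first similarity
    judge: Callable[[float], bool]          # first similarity -> True if qualified

    def handle(self, user_id: str, text: str) -> str:
        """Run one post through the modules in sequence."""
        status = self.review(user_id)
        if status == "black":
            return "manual_review"
        if status == "white":
            return "publish"
        # Only unlisted users reach the conversion, computation and judgment modules.
        first_similarity = self.compute(self.convert(text))
        return "publish" if self.judge(first_similarity) else "manual_review"
```

A toy wiring might look like ReviewDevice(review=lambda u: "unknown", convert=str.split, compute=lambda words: 0.1, judge=lambda s: s <= 0.8), for which handle returns "publish".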
In summary, the method and device of the present invention for reviewing and processing text content posted by a user can review and filter both the user information and the text content posted by the user. Text content posted by users who fall under a black rule or the blacklist, and text content judged unqualified, is sent for manual review; text content posted by users who fall under a white rule or the whitelist, and text content judged qualified, is published directly. Not all user-posted information has to go through manual review, which saves a large amount of manual review time and human resources and correspondingly improves review efficiency.
The embodiments of the present invention described above do not limit the scope of protection of the present invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (10)

1. A method for reviewing and processing text content posted by a user, characterized in that it comprises the steps of:
receiving the text content posted by the user, and judging the user information against a list-and-rule database, the list-and-rule database comprising a blacklist, black rules, a whitelist and white rules;
if the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, performing format conversion on the text content posted by the user and extracting the content words in the text content;
calculating the inverse document frequency weight of each extracted content word in a pre-established document database, to obtain a first feature vector composed of these inverse document frequency weights;
calculating a first similarity between the first feature vector and a second feature vector of pre-established spam sample content, and judging from the first similarity whether the text content posted by the user is qualified content; if it is qualified content, publishing the text content posted by the user.
2. The method for reviewing and processing text content posted by a user according to claim 1, characterized in that: when the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, the method further comprises the step of detecting a second similarity between the text content posted by the user and a pre-established feature database containing telephone-number formats, web-page formats and Martian-language formats, and judging from the second similarity and the first similarity whether the text content posted by the user is qualified content.
3. The method for reviewing and processing text content posted by a user according to claim 2, characterized in that: when the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, the method further comprises the step of counting the number of characters in the text content posted by the user, and judging from this character count, the first similarity and the second similarity whether the text content posted by the user is qualified content.
4. The method for reviewing and processing text content posted by a user according to claim 3, characterized in that: when the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule, the method further comprises the step of detecting a third similarity between the text content posted by the user and a pre-established database containing unpublishable words, and judging from this third similarity, the character count, the first similarity and the second similarity whether the text content posted by the user is qualified content.
5. The method for reviewing and processing text content posted by a user according to any one of claims 1 to 4, characterized in that calculating the inverse document frequency weight of each extracted content word in the pre-established document database is specifically: calculating the inverse document frequency weight of each content word by the formula
$wgt = t_f \times \lg\frac{U}{V}$
where $wgt$ is the inverse document frequency weight, $t_f$ is the frequency with which the content word occurs in the user's text content, $U$ is the total number of documents in the document database, and $V$ is the number of documents in which the content word occurs.
6. The method for reviewing and processing user information and text content according to claim 5, characterized in that calculating the first similarity between the first feature vector and the second feature vector of the pre-established spam sample content is specifically: calculating the first similarity by the formula
$\mathrm{Cos}(X,Y)=\dfrac{\sum_{\alpha=1,\beta=1}^{\alpha=m,\beta=n} x_\alpha y_\beta}{\sqrt{\sum_{\alpha=1}^{m} x_\alpha^2 \sum_{\beta=1}^{n} y_\beta^2}}$
where $\mathrm{Cos}(X,Y)$ denotes the first similarity, and $X=\{x_1,\ldots,x_m\}$ and $Y=\{y_1,\ldots,y_n\}$ denote the first feature vector and the second feature vector respectively.
7. The method for reviewing and processing user information and text content according to claim 4, characterized in that judging from the third similarity, the character count, the first similarity and the second similarity whether the text content posted by the user is qualified content is specifically: assigning corresponding weights to the third similarity, the character count, the first similarity and the second similarity respectively, and detecting whether the weighted sum is greater than a predetermined value; if so, the text content posted by the user is judged to be unqualified content; otherwise the text content posted by the user is qualified content.
8. A device for reviewing and processing text content posted by a user, characterized in that it comprises:
a review module, configured to receive the text content posted by the user and to judge the user information against a list-and-rule database, the list-and-rule database comprising a blacklist, black rules, a whitelist and white rules;
a conversion module, configured to perform format conversion on the text content posted by the user and extract the content words in the text content when the user information belongs neither to the whitelist or a white rule nor to the blacklist or a black rule;
a computation module, configured to calculate the inverse document frequency weight of each extracted content word in a pre-established document database, to obtain a first feature vector composed of these inverse document frequency weights, and also to calculate a first similarity between the first feature vector and a second feature vector of pre-established spam sample content;
a judgment module, configured to judge from the first similarity whether the text content posted by the user is qualified content and, if it is qualified content, to publish the text content posted by the user.
9. The device for censoring and processing text content issued by a user according to claim 8, characterized by further comprising a detection module configured to, when the user information belongs to neither the white list or a white rule nor the black list or a black rule, detect a second similarity between the text content issued by the user and a pre-established feature library comprising a phone number format, a web page format and a Martian-script format; and/or detect a third similarity between the text content of the user and a pre-established database of words that may not be published; and send the second similarity and/or the third similarity to the judging module, the judging module judging, according to the first similarity, the second similarity and/or the third similarity, whether the text content issued by the user is qualified content.
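A rough Python sketch of the detection module in claim 9 follows. The claim does not say how the second and third similarities are computed, so the hit-ratio scoring below is an assumption, as are the regular expressions (an 11-digit mainland-China mobile number and a simple URL prefix) and the placeholder word list. Detection of the Martian-script format is omitted because the claim gives no pattern for it.

```python
import re

# Hypothetical feature library: one pattern per format named in the claim.
FORMAT_PATTERNS = {
    "phone": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "url": re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE),
}

# Placeholder entries for the database of words that may not be published.
BANNED_WORDS = {"casino", "lottery"}


def second_similarity(text: str) -> float:
    """Fraction of format patterns that occur at least once in the text."""
    hits = sum(1 for pattern in FORMAT_PATTERNS.values() if pattern.search(text))
    return hits / len(FORMAT_PATTERNS)


def third_similarity(text: str) -> float:
    """Fraction of banned words that occur in the text (case-insensitive)."""
    lowered = text.lower()
    hits = sum(1 for word in BANNED_WORDS if word in lowered)
    return hits / len(BANNED_WORDS)
```

Both values lie in [0, 1], so they can be passed to the judging module alongside the first similarity.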
10. The device for censoring and processing user information and text content according to claim 9, characterized by further comprising a statistics module configured to, when the user information belongs to neither the white list or a white rule nor the black list or a black rule, count the number of characters of the text content and send the number of characters to the judging module, the judging module judging, according to the number of characters, the second similarity, the third similarity and the first similarity, whether the text content issued by the user is qualified content.
CN2008102200098A 2008-12-15 2008-12-15 Method for censoring and process text contents issued by user and device thereof Active CN101446970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008102200098A CN101446970B (en) 2008-12-15 2008-12-15 Method for censoring and process text contents issued by user and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008102200098A CN101446970B (en) 2008-12-15 2008-12-15 Method for censoring and process text contents issued by user and device thereof

Publications (2)

Publication Number Publication Date
CN101446970A true CN101446970A (en) 2009-06-03
CN101446970B CN101446970B (en) 2012-07-04

Family

ID=40742648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008102200098A Active CN101446970B (en) 2008-12-15 2008-12-15 Method for censoring and process text contents issued by user and device thereof

Country Status (1)

Country Link
CN (1) CN101446970B (en)

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102681979A (en) * 2012-05-15 2012-09-19 北京师范大学 Content editing intelligent verifying method facing to open knowledge community
CN102801640A (en) * 2011-05-23 2012-11-28 腾讯科技(深圳)有限公司 Information auditing method and device
CN102880636A (en) * 2012-08-03 2013-01-16 深圳证券信息有限公司 Bad information detection method and server
CN102982011A (en) * 2011-09-07 2013-03-20 百度在线网络技术(北京)有限公司 Method and device for identifying out-of-sequence texts
WO2013056513A1 (en) * 2011-10-18 2013-04-25 成都竟创科技有限公司 Interactive time-sharing and segmented surface participation method based on spreading media
CN103634283A (en) * 2012-08-24 2014-03-12 腾讯科技(深圳)有限公司 Feedback method of audit result and cloud server
CN103647753A (en) * 2013-11-19 2014-03-19 北京奇虎科技有限公司 LAN file security management method, server and system
CN103778226A (en) * 2014-01-23 2014-05-07 北京奇虎科技有限公司 Method for establishing language information recognition model and language information recognition device
CN104301341A (en) * 2013-07-16 2015-01-21 腾讯科技(深圳)有限公司 Information processing method, apparatus, and system for information releasing platform
CN104580529A (en) * 2015-02-03 2015-04-29 郑州悉知信息技术有限公司 Information checking method and device
CN104572393A (en) * 2013-10-24 2015-04-29 世纪禾光科技发展(北京)有限公司 Buyer and seller login monitoring method and buyer and seller login monitoring system
CN105376199A (en) * 2014-08-25 2016-03-02 腾讯科技(北京)有限公司 Information processing method, system, server and client
CN105763555A (en) * 2016-03-31 2016-07-13 世纪禾光科技发展(北京)有限公司 Website risk control server and method and client
CN106372057A (en) * 2016-08-25 2017-02-01 乐视控股(北京)有限公司 Content auditing method and apparatus
CN106504082A (en) * 2016-10-21 2017-03-15 百望股份有限公司 A kind of answering method for tax control field
JP2017173881A (en) * 2016-03-18 2017-09-28 ヤフー株式会社 Advertising review support device, advertising review support method, and advertising review support program
CN107578268A (en) * 2017-07-31 2018-01-12 上海与德科技有限公司 The dispensing content auditing method and server and jettison system of shared billboard
CN108932283A (en) * 2018-05-21 2018-12-04 平安科技(深圳)有限公司 Customer information screening method, system, computer equipment and storage medium
CN109271768A (en) * 2018-10-26 2019-01-25 Oppo广东移动通信有限公司 Release news management method, device, storage medium and terminal
CN109862062A (en) * 2018-10-24 2019-06-07 平安科技(深圳)有限公司 Content uploading management method and device, electronic equipment and storage medium
CN110334181A (en) * 2019-06-05 2019-10-15 上海易点时空网络有限公司 Original content based on similarity detection declares method and device
CN110377900A (en) * 2019-06-17 2019-10-25 深圳壹账通智能科技有限公司 Checking method, device, computer equipment and the storage medium of Web content publication
CN110659386A (en) * 2019-09-12 2020-01-07 北京达佳互联信息技术有限公司 Digital resource processing method and device, electronic equipment and storage medium
CN110929055A (en) * 2019-11-15 2020-03-27 北京达佳互联信息技术有限公司 Multimedia quality detection method and device, electronic equipment and storage medium
CN111126928A (en) * 2018-10-29 2020-05-08 阿里巴巴集团控股有限公司 Method and device for auditing release content
CN111651981A (en) * 2019-02-19 2020-09-11 阿里巴巴集团控股有限公司 Data auditing method, device and equipment
CN113228583A (en) * 2018-12-17 2021-08-06 微软技术许可有限责任公司 Session maturity model with trusted sources
CN114598699A (en) * 2020-12-07 2022-06-07 国家广播电视总局广播电视科学研究院 File content auditing method and device and electronic equipment
CN116739512A (en) * 2023-06-07 2023-09-12 哈尔滨融美科技有限公司 Data analysis management system and method based on artificial intelligent cloud platform

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100535895C (en) * 2004-08-23 2009-09-02 富士施乐株式会社 Test search apparatus and method
CN101159704A (en) * 2007-10-23 2008-04-09 浙江大学 Microcontent similarity based antirubbish method

Cited By (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102801640B (en) * 2011-05-23 2016-06-01 腾讯科技(深圳)有限公司 A kind of method and apparatus of message examination & verification
CN102801640A (en) * 2011-05-23 2012-11-28 腾讯科技(深圳)有限公司 Information auditing method and device
CN102982011A (en) * 2011-09-07 2013-03-20 百度在线网络技术(北京)有限公司 Method and device for identifying out-of-sequence texts
CN102982011B (en) * 2011-09-07 2017-05-31 百度在线网络技术(北京)有限公司 A kind of method and apparatus for recognizing out-of-sequence text
WO2013056513A1 (en) * 2011-10-18 2013-04-25 成都竟创科技有限公司 Interactive time-sharing and segmented surface participation method based on spreading media
CN102681979A (en) * 2012-05-15 2012-09-19 北京师范大学 Content editing intelligent verifying method facing to open knowledge community
CN102681979B (en) * 2012-05-15 2015-04-22 北京师范大学 Content editing intelligent verifying method facing to open knowledge community
CN102880636A (en) * 2012-08-03 2013-01-16 深圳证券信息有限公司 Bad information detection method and server
CN103634283A (en) * 2012-08-24 2014-03-12 腾讯科技(深圳)有限公司 Feedback method of audit result and cloud server
CN103634283B (en) * 2012-08-24 2017-11-28 腾讯科技(深圳)有限公司 The feedback method and cloud server of a kind of auditing result
CN104301341A (en) * 2013-07-16 2015-01-21 腾讯科技(深圳)有限公司 Information processing method, apparatus, and system for information releasing platform
CN104301341B (en) * 2013-07-16 2019-01-29 腾讯科技(深圳)有限公司 Information processing method, the apparatus and system of information publishing platform
CN104572393A (en) * 2013-10-24 2015-04-29 世纪禾光科技发展(北京)有限公司 Buyer and seller login monitoring method and buyer and seller login monitoring system
CN103647753A (en) * 2013-11-19 2014-03-19 北京奇虎科技有限公司 LAN file security management method, server and system
CN103778226A (en) * 2014-01-23 2014-05-07 北京奇虎科技有限公司 Method for establishing language information recognition model and language information recognition device
CN105376199A (en) * 2014-08-25 2016-03-02 腾讯科技(北京)有限公司 Information processing method, system, server and client
CN105376199B (en) * 2014-08-25 2019-09-13 腾讯科技(北京)有限公司 A kind of information processing method and system, server, client
CN104580529A (en) * 2015-02-03 2015-04-29 郑州悉知信息技术有限公司 Information checking method and device
CN104580529B (en) * 2015-02-03 2018-03-23 郑州悉知信息科技股份有限公司 A kind of signal auditing method and device
JP2017173881A (en) * 2016-03-18 2017-09-28 ヤフー株式会社 Advertising review support device, advertising review support method, and advertising review support program
CN105763555A (en) * 2016-03-31 2016-07-13 世纪禾光科技发展(北京)有限公司 Website risk control server and method and client
CN106372057A (en) * 2016-08-25 2017-02-01 乐视控股(北京)有限公司 Content auditing method and apparatus
CN106504082A (en) * 2016-10-21 2017-03-15 百望股份有限公司 A kind of answering method for tax control field
CN107578268A (en) * 2017-07-31 2018-01-12 上海与德科技有限公司 The dispensing content auditing method and server and jettison system of shared billboard
CN108932283A (en) * 2018-05-21 2018-12-04 平安科技(深圳)有限公司 Customer information screening method, system, computer equipment and storage medium
CN108932283B (en) * 2018-05-21 2024-03-05 平安科技(深圳)有限公司 Customer information screening method, system, computer device and storage medium
CN109862062A (en) * 2018-10-24 2019-06-07 平安科技(深圳)有限公司 Content uploading management method and device, electronic equipment and storage medium
CN109271768A (en) * 2018-10-26 2019-01-25 Oppo广东移动通信有限公司 Release news management method, device, storage medium and terminal
CN111126928B (en) * 2018-10-29 2024-03-22 阿里巴巴集团控股有限公司 Method and device for auditing release content
CN111126928A (en) * 2018-10-29 2020-05-08 阿里巴巴集团控股有限公司 Method and device for auditing release content
CN113228583A (en) * 2018-12-17 2021-08-06 微软技术许可有限责任公司 Session maturity model with trusted sources
CN113228583B (en) * 2018-12-17 2023-08-15 微软技术许可有限责任公司 Session maturity model with trusted sources
CN111651981B (en) * 2019-02-19 2023-04-21 阿里巴巴集团控股有限公司 Data auditing method, device and equipment
CN111651981A (en) * 2019-02-19 2020-09-11 阿里巴巴集团控股有限公司 Data auditing method, device and equipment
CN110334181A (en) * 2019-06-05 2019-10-15 上海易点时空网络有限公司 Original content based on similarity detection declares method and device
WO2020253350A1 (en) * 2019-06-17 2020-12-24 深圳壹账通智能科技有限公司 Network content publication auditing method and apparatus, computer device and storage medium
CN110377900A (en) * 2019-06-17 2019-10-25 深圳壹账通智能科技有限公司 Checking method, device, computer equipment and the storage medium of Web content publication
CN110659386A (en) * 2019-09-12 2020-01-07 北京达佳互联信息技术有限公司 Digital resource processing method and device, electronic equipment and storage medium
CN110929055A (en) * 2019-11-15 2020-03-27 北京达佳互联信息技术有限公司 Multimedia quality detection method and device, electronic equipment and storage medium
CN114598699A (en) * 2020-12-07 2022-06-07 国家广播电视总局广播电视科学研究院 File content auditing method and device and electronic equipment
CN114598699B (en) * 2020-12-07 2023-07-28 国家广播电视总局广播电视科学研究院 File content auditing method and device and electronic equipment
CN116739512A (en) * 2023-06-07 2023-09-12 哈尔滨融美科技有限公司 Data analysis management system and method based on artificial intelligent cloud platform

Also Published As

Publication number Publication date
CN101446970B (en) 2012-07-04

Similar Documents

Publication Publication Date Title
CN101446970B (en) Method for censoring and process text contents issued by user and device thereof
CN105005594B (en) Abnormal microblog users recognition methods
CN103336766B (en) Short text garbage identification and modeling method and device
KR101716905B1 (en) Method for calculating entity similarities
CN107291780A (en) A kind of user comment information methods of exhibiting and device
CN101784022A (en) Method and system for filtering and classifying short messages
CN103064987A (en) Bogus transaction information identification method
CN105389389A (en) Network public opinion transmission situation media linked analysis method
Hirst et al. Party status as a confound in the automatic classification of political speech by ideology
CN105912645A (en) Intelligent question and answer method and apparatus
CN102298587A (en) Satisfaction investigating method and system
CN107341157B (en) Customer service conversation clustering method and device
CN111783449A (en) Method and device for extracting elements of judgment result in judgment document
CN111309855A (en) Text information processing method and system
CN103108290A (en) Short message handling method and device
CN111199208A (en) Head portrait gender identification method and system based on deep learning framework
CN101594313A (en) A kind of spam judgement, classification, filter method and system based on potential semantic indexing
CN107992473B (en) Fraud information feature word extraction method and system based on point-to-point mutual information technology
CN104871201A (en) Forensic system, forensic method, and forensic program
CN116257627A (en) Method and system for evaluating privacy policy text
CN105681523A (en) Method and apparatus for sending birthday blessing short message automatically
CN112835810B (en) Interface testing method and device based on log analysis
CN112115236A (en) Method and device for constructing tobacco scientific and technical literature data deduplication model
CN107885706A (en) A kind of system of data similarity detection
Martyr et al. The Catholics in Australia Survey 3-Believing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: SHENZHEN SHIJI LIGHT SPEED INFORMATION TECHNOLOGY

Free format text: FORMER OWNER: TENGXUN SCI-TECH (SHENZHEN) CO., LTD.

Effective date: 20131015

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518044 SHENZHEN, GUANGDONG PROVINCE TO: 518057 SHENZHEN, GUANGDONG PROVINCE

TR01 Transfer of patent right

Effective date of registration: 20131015

Address after: A Tencent Building in Shenzhen Nanshan District City, Guangdong streets in Guangdong province science and technology 518057 16

Patentee after: Shenzhen Shiji Guangsu Information Technology Co., Ltd.

Address before: Room 403, East Block 2, SEG Science Park, Zhenxing Road, Futian District, Shenzhen, Guangdong Province 518044

Patentee before: Tencent Technology (Shenzhen) Co., Ltd.