CN103324617A - Identification method and system for history waste information - Google Patents


Publication number
CN103324617A
Authority
CN
China
Prior art keywords
web page
named web
characteristic information
content characteristic
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012100744065A
Other languages
Chinese (zh)
Inventor
周斌
刘婷婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN2012100744065A priority Critical patent/CN103324617A/en
Publication of CN103324617A publication Critical patent/CN103324617A/en
Pending legal-status Critical Current

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention belongs to the technical field of the Internet and provides an identification method and system for history waste information. The method comprises the following steps: when a request to browse an appointed webpage is received, extracting content feature information of the appointed webpage; according to a feature recognition algorithm stored in a preset knowledge base, matching the content feature information of the appointed webpage against the feature information stored in the preset knowledge base to acquire an identification result; and, according to the identification result, identifying whether the information in the appointed webpage is history waste information. Because history waste information in a webpage is identified by a read-check, that is, a check performed when the page is read, the identification cost is reduced, and the identification rate, real-time performance, and adaptability are improved.

Description

Method and system for recognizing historical rubbish messages
Technical field
The invention belongs to the field of Internet technology, and in particular relates to a method, apparatus, and system for recognizing historical rubbish messages.
Background art
To make the technical solution of the present invention easier to understand, the following terms are defined:
PV (page views): PV is the abbreviation of Page View, the web page browsing amount. It is the number of pages of a website that a visitor accesses within 24 hours (from 0:00 to 24:00). Repeated views of the same page by the same visitor are not counted in the PV value.
Write operation: in network applications where users contribute content, such as blogs, forums, message boards, and comments, the operation by which a user publishes or updates content such as text, links, video, and pictures.
Read operation: in network applications where users contribute content, such as blogs, forums, message boards, and comments, the operation by which a user browses a web page and produces a PV (page view).
Write audit: in network applications where users contribute content, such as blogs, forums, message boards, and comments, the examination and filtering of the content that a user writes. The write audit is triggered when the user updates content.
Knowledge base: the set of rules obtained through system training, using machine learning algorithms and the like, when content such as text and links in network applications such as blogs, forums, message boards, and comments is filtered for rubbish messages.
Historical rubbish message: when content such as text and links in network applications such as blogs, forums, message boards, and comments is filtered for rubbish messages, a rubbish message that was not identified in time after the user posted it because updates to the knowledge base lag behind.
As networks become more widespread, network applications in which users contribute content, such as blog posts, comments, and messages, attract more and more attention from Internet users and product developers. Against this background, some offenders use these applications to publish rubbish messages of politically reactionary, pornographic, commercial, and other categories.
The existing technology mainly identifies rubbish messages by means of the write audit. This approach uses an automatic recognition algorithm to examine and filter a message when the user updates content; such algorithms include keyword recognition, probability statistics, machine learning, and the like. However, the form of rubbish messages in network applications changes frequently. Whatever automatic recognition algorithm is used, a knowledge base that is updated in real time must be maintained so that new forms of rubbish message do not slip past the recognition logic and normal messages are not misidentified. Because rubbish messages on the network vary over time and with the strength of countermeasures, the learning process tends to lag. For the historical rubbish messages caused by this lag, the prior art usually scans the data of all web pages, also called historical data, in a manual or semi-automatic manner to identify them. This approach is costly, slow to react, and poorly adaptive.
Summary of the invention
The purpose of the embodiments of the invention is to provide a method and system for recognizing historical rubbish messages, intended to solve the problem that the prior art cannot automatically identify the historical rubbish messages left behind after the write audit, which results in high recognition cost, a low recognition rate, and poor real-time performance and adaptivity.
The embodiments of the invention are realized as follows. A method for recognizing historical rubbish messages comprises the following steps:
When a request to browse a named web page is received, extracting the content characteristic information of the named web page;
According to the feature recognition algorithm stored in a default knowledge base, matching the content characteristic information of the named web page against the characteristic information stored in the default knowledge base to obtain a recognition result;
According to the recognition result, identifying whether the information in the named web page belongs to historical rubbish messages.
Another purpose of the embodiments of the invention is to provide a system for recognizing historical rubbish messages, the system comprising:
A feature extraction unit, used to extract the content characteristic information of a named web page when a request to browse the named web page is received;
A coupling recognition unit, used to match, according to the feature recognition algorithm stored in a default knowledge base, the content characteristic information of the named web page against the characteristic information stored in the default knowledge base to obtain a recognition result; and
A recognition unit, used to identify, according to the recognition result, whether the information in the named web page belongs to historical rubbish messages.
In the embodiments of the invention, when a request to browse a named web page is received, the content characteristic information of the named web page is extracted in real time. According to the feature recognition algorithm stored in the default knowledge base, this content characteristic information is matched against the characteristic information stored in the default knowledge base, and the recognition result obtained is used to identify whether the information in the named web page belongs to historical rubbish messages. The prior art removes only part of the rubbish messages at the stage when content is contributed, and must scan the historical data comprehensively and clear historical rubbish messages manually or semi-automatically, which results in high recognition cost, a low recognition rate, and poor real-time performance and adaptivity. The embodiments solve this problem: they reduce the recognition cost, improve the recognition rate, and achieve better real-time performance and adaptivity.
Description of drawings
Fig. 1 is a flowchart of the method for recognizing historical rubbish messages provided by the first embodiment of the invention;
Fig. 2 is a structural diagram of the system for recognizing historical rubbish messages provided by the second embodiment of the invention.
Embodiment
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
In the embodiments of the invention, each time a web page is to be browsed, the historical rubbish message recognition method audits that page. That is, after the web page is created and the information in it has been write-audited, read-audit recognition is performed once or several times more, which makes the filtering of historical rubbish messages more effective. Compared with existing methods such as manual review, the recognition cost is lower, and real-time performance and adaptivity are higher.
The specific implementation of the invention is described in detail below with reference to specific embodiments:
Read audit: the counterpart of the write audit. It refers to the manner in which content such as text and links contributed by users in network applications such as blogs, forums, message boards, and comments is examined and filtered automatically. The read audit is triggered automatically when a web page produces a PV (page view), including when an operation such as a click made while browsing the current page triggers new content.
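As a rough illustration of the trigger described above, the following Python sketch runs an audit callback every time a page is served, so that only pages that are actually viewed get audited. The names `PageServer`, `read_audit`, and the in-memory page store are hypothetical and are not taken from the patent.

```python
from typing import Callable, Dict, List


class PageServer:
    """Toy page store that runs a read audit on every page view (PV)."""

    def __init__(self, read_audit: Callable[[str], bool]) -> None:
        self.pages: Dict[str, str] = {}
        self.read_audit = read_audit      # returns True if the content looks like spam
        self.audit_log: List[str] = []    # page ids audited, one entry per view

    def publish(self, page_id: str, content: str) -> None:
        # The write audit is assumed to have happened before this call.
        self.pages[page_id] = content

    def view(self, page_id: str) -> str:
        content = self.pages[page_id]
        self.audit_log.append(page_id)    # the view itself triggers the audit
        if self.read_audit(content):
            return "[content withheld]"
        return content


# Hypothetical audit rule: a single keyword test stands in for the knowledge base.
server = PageServer(read_audit=lambda text: "buy pills" in text.lower())
server.publish("p1", "A normal blog post")
server.publish("p2", "BUY PILLS now")
server.publish("p3", "Never viewed, so never audited")
print(server.view("p1"))
print(server.view("p2"))
```

Page `p3` is never requested in this sketch, so it never appears in `audit_log`; this mirrors the idea that audit effort follows page views.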
Embodiment one:
Fig. 1 shows the flow of the method for recognizing historical rubbish messages provided by the first embodiment of the invention, detailed as follows:
In step S101, when a request to browse a named web page is received, the content characteristic information of the named web page is extracted.
Before step S101 is executed, the named web page needs to be created and write-audited using the prior art; the named web page is thus a web page that has already passed the write audit.
Specifically, in network applications where users contribute content, such as blogs, forums, message boards, and comments, a user can publish and update content. When the user triggers or begins publishing or updating content, the content written by the user needs to be examined and filtered, that is, write-audited, to prevent the user from publishing malicious or otherwise undesirable rubbish messages. The content is examined and filtered using the prior art: based on the knowledge base and related algorithms, characteristic information such as the text, links, pictures, and video published by the user's write operation is identified. At the same time, according to the quality of the recognition results, an algorithm with a higher recognition rate for this characteristic information, or the newest recognition algorithm, can be selected and stored in the knowledge base, which drives the knowledge base to update. In addition, because some rubbish messages cannot be distinguished by automatic recognition algorithms, the knowledge base can also be updated with features and rules extracted after manual review. In practice, however, malicious users tend to transform the rubbish messages they publish, and updates to the knowledge base tend to lag behind these changes. Rubbish messages published before the knowledge base learns to recognize the new variant therefore cannot be handled automatically; that is, the original knowledge base often cannot examine and filter newly produced rubbish messages. The rubbish messages that are not filtered out or not recognized become historical rubbish messages.
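The lag described above can be made concrete with a small Python sketch. It assumes, purely for illustration, that the knowledge base is a set of keywords and that matching is a substring test; the variable names and sample strings are hypothetical.

```python
from typing import Set


def contains_spam(text: str, keywords: Set[str]) -> bool:
    """Return True if any knowledge-base keyword occurs in the text."""
    lowered = text.lower()
    return any(k in lowered for k in keywords)


# Knowledge base as it was when the post was written.
kb_at_write_time = {"cheap loans"}
# The same knowledge base after a later update added the transformed variant.
kb_at_read_time = kb_at_write_time | {"ch3ap l0ans"}

post = "Get ch3ap l0ans today"   # transformed spam, published before the update

passed_write_audit = not contains_spam(post, kb_at_write_time)
caught_by_read_audit = contains_spam(post, kb_at_read_time)
print(passed_write_audit, caught_by_read_audit)
```

The post slips through the write audit and becomes a historical rubbish message; an audit run later against the updated knowledge base can still catch it.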
Rubbish messages on the network generally have, but are not limited to, the following characteristics:
1) They contain illegal links, meaning links to websites that carry advertising, pornographic, politically reactionary, or similar content;
2) They contain obvious rubbish message keywords, such as pornographic keywords, politically reactionary keywords, and fraud keywords;
3) They contain characters that clearly do not fit normal messages, such as the special characters ← ↑ ↓.
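The three characteristics listed above could be checked along the following lines. This is a minimal sketch under stated assumptions: the domain list, keyword list, and character set are hypothetical placeholders, and the URL pattern is deliberately simple.

```python
import re
from urllib.parse import urlparse

BLOCKED_DOMAINS = {"spam.example"}   # hypothetical illegal-link list
SPAM_KEYWORDS = {"free money"}       # hypothetical keyword list
ODD_CHARS = set("←↑↓")               # characters that are rare in normal messages

URL_RE = re.compile(r"https?://[^\s]+")


def spam_signals(text: str) -> dict:
    """Report which of the three listed characteristics the text shows."""
    hosts = {urlparse(u).hostname for u in URL_RE.findall(text)}
    lowered = text.lower()
    return {
        "illegal_link": bool(hosts & BLOCKED_DOMAINS),
        "spam_keyword": any(k in lowered for k in SPAM_KEYWORDS),
        "odd_characters": any(ch in ODD_CHARS for ch in text),
    }


print(spam_signals("Free money ↓↓↓ http://spam.example/win"))
print(spam_signals("See you at the meeting tomorrow"))
```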
In a specific implementation, a web page generally contains rubbish message content from which characteristic information is not easy to extract. To avoid missing rubbish messages that should be identified, the content of the named web page can also be preprocessed after the request to browse the named web page is received and before the content characteristic information of the named web page is extracted. For example, a text preprocessing method may be used, which includes: removing spaces and newlines, converting English uniformly to lowercase, Chinese character encoding conversion, FJZ, converting Japanese characters to spaces, converting special symbols to spaces, converting full-width digits to half-width digits, converting double-byte characters to ASCII characters, converting Chinese numerals to Arabic digits, and so on. This allows the content characteristic information of the named web page to be extracted comprehensively. The content characteristic information of the named web page includes one or more of characteristic information such as web page links, key words, pictures, and video. Further, step S101 extracts content characteristics such as the text and links that users contribute in network applications such as blogs, forums, message boards, and comments. This extraction can be triggered automatically when the web page produces a PV (page view), including when an operation such as clicking content in the page while browsing the named web page triggers new content. Recognition is thus carried out as soon as a request to browse the web page is detected, in other words before the user views the page, which improves the real-time performance of recognition.
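Several of the text preprocessing steps named above can be sketched with the Python standard library. This is only an approximation under stated assumptions: Unicode NFKC normalization stands in for the full-width to half-width conversions, Japanese is approximated by the kana ranges, the encoding conversion and "FJZ" steps are omitted, and the helper name `preprocess` is hypothetical.

```python
import re
import unicodedata

# Single-character mapping only; positional numerals such as 十 or 百 are not handled.
CN_DIGITS = str.maketrans("零一二三四五六七八九", "0123456789")
KANA_RE = re.compile(r"[\u3040-\u30ff]")   # hiragana and katakana blocks
SYMBOL_RE = re.compile(r"[^\w\s]")         # anything that is not a word character or whitespace


def preprocess(text: str) -> str:
    """Normalize text so that disguised spam is easier to match."""
    text = unicodedata.normalize("NFKC", text)  # full-width letters and digits -> ASCII forms
    text = text.lower()                         # unify English to lowercase
    text = text.translate(CN_DIGITS)            # Chinese numerals -> Arabic digits
    text = KANA_RE.sub(" ", text)               # Japanese kana -> space
    text = SYMBOL_RE.sub(" ", text)             # special symbols -> space
    return re.sub(r"\s+", "", text)             # finally drop spaces and newlines


print(preprocess("加 ＱＱ：一二三\n４５６ ↓の"))   # -> 加qq123456
```

Mapping every occurrence of a Chinese numeral character to a digit also rewrites ordinary words that contain those characters, so a production system would likely need a more careful rule.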
In step S102, according to the feature recognition algorithm stored in the default knowledge base, the content characteristic information of the named web page is matched against the characteristic information stored in the default knowledge base to obtain a recognition result.
Here, the default knowledge base is the knowledge base obtained by updating the knowledge base that was used when the named web page was write-audited. The update may include characteristic information of new rubbish messages, new feature recognition algorithms, and the like. The characteristic information of new rubbish messages may be new key characters or words, special characters, a library of illegal network links, and so on, and the feature recognition algorithm may be a recognition algorithm such as machine learning, Bayes, or support vector machine. A concrete update method may be to extract characteristic information from rubbish messages found by manual review, extract rules that can be used by the automatic recognition algorithm, and store them at a specific location in the default knowledge base.
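One possible shape for such an updatable knowledge base is sketched below. The class `KnowledgeBase`, its fields, and the `version` counter are assumptions made for illustration; the patent does not specify a storage layout.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class KnowledgeBase:
    """Hypothetical container for spam characteristics used by both audits."""
    keywords: Set[str] = field(default_factory=set)
    special_chars: Set[str] = field(default_factory=set)
    illegal_links: Set[str] = field(default_factory=set)
    version: int = 1

    def update_from_manual_review(self, keywords=(), special_chars=(), illegal_links=()):
        """Merge characteristics extracted from manually reviewed spam."""
        self.keywords.update(keywords)
        self.special_chars.update(special_chars)
        self.illegal_links.update(illegal_links)
        self.version += 1   # each update yields a newer knowledge base


kb = KnowledgeBase(keywords={"lottery"})
kb.update_from_manual_review(keywords={"l0ttery"}, illegal_links={"spam.example"})
print(kb.version, sorted(kb.keywords))
```

A read audit would then consult the latest version, which may contain characteristics that did not exist when the page was first written.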
In a specific implementation, based on the feature recognition algorithms or recognition rules for rubbish messages stored in advance in the default knowledge base, the content characteristic information of the named web page is matched against the characteristic information stored in the default knowledge base. For example, the key words in the content characteristic information of the named web page are matched against all key words or characters stored in advance in the default knowledge base, or the web page links in the content characteristic information of the named web page are matched against all web page links stored in advance in the default knowledge base. The matching determines whether the default knowledge base contains key words, links, and so on that are identical or that satisfy a certain matching condition, and yields the recognition result. The recognition result includes the total number of items of content characteristic information of the named web page that were matched successfully, the number of successfully matched items of each type, and so on; for example, the number of successful matches in each of the types key word, web page link, and picture for the named web page.
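The recognition result described above, a total count plus a count per feature type, might be computed as in the following sketch. Exact set intersection stands in for "identical or satisfying a certain matching condition"; the function name, the dictionary layout, and the sample values are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def match_features(page: Dict[str, Set[str]], kb: Dict[str, Set[str]]) -> Dict[str, object]:
    """Count page features that also appear in the knowledge base, per type and in total."""
    per_type = Counter()
    for feature_type, values in page.items():
        # Exact matching only; a real system could use fuzzier conditions.
        per_type[feature_type] = len(values & kb.get(feature_type, set()))
    return {"per_type": dict(per_type), "total": sum(per_type.values())}


page_features = {
    "keyword": {"lottery", "holiday", "winner"},
    "link": {"spam.example", "blog.example"},
    "picture": {"hash-aa11"},
}
knowledge_base = {
    "keyword": {"lottery", "winner"},
    "link": {"spam.example"},
}
result = match_features(page_features, knowledge_base)
print(result)
```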
In step S103, according to the recognition result, it is identified whether the information in the named web page belongs to historical rubbish messages.
In a specific implementation, step S103 is as follows: judge whether the number of successfully matched items of content characteristic information of a specified type in the named web page exceeds a first predetermined threshold, and/or whether the total number of successfully matched items of content characteristic information of the named web page exceeds a second predetermined threshold. If so, the information in the named web page is judged to belong to historical rubbish messages; otherwise, it is judged not to belong to historical rubbish messages. The first predetermined threshold and the second predetermined threshold may be the same or different; each is a value set in advance by the user according to actual conditions. The specified type may be one or more designated types of content characteristic information. For example, when the number of successfully matched pictures of a certain class exceeds the preset first predetermined threshold, or when the numbers of successfully matched pictures and videos of a certain class both exceed the first predetermined threshold, the information in the named web page is considered to belong to historical rubbish messages. Further, the historical rubbish messages identified by this method, or the named web page itself, can then be processed, so as to reduce the rubbish messages in the named web page or to forbid opening the named web page altogether.
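The threshold test can be written down directly. In this sketch the "and/or" of the text is read as a logical OR when both thresholds are supplied, which is one possible interpretation; the function name and parameter names are hypothetical.

```python
from typing import Dict, Optional


def is_historical_spam(
    per_type: Dict[str, int],
    total: int,
    specified_type: Optional[str] = None,
    first_threshold: Optional[int] = None,
    second_threshold: Optional[int] = None,
) -> bool:
    """Apply the per-type test, the total test, or both (combined with OR)."""
    type_hit = (
        specified_type is not None
        and first_threshold is not None
        and per_type.get(specified_type, 0) > first_threshold
    )
    total_hit = second_threshold is not None and total > second_threshold
    return type_hit or total_hit


# Four matched pictures exceed a first threshold of 3.
print(is_historical_spam({"picture": 4}, 4, "picture", 3, 10))
# One matched keyword exceeds neither threshold.
print(is_historical_spam({"keyword": 1}, 1, "keyword", 3, 10))
# Only the total test is configured here.
print(is_historical_spam({"keyword": 2, "link": 9}, 11, second_threshold=10))
```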
In the embodiments of the invention, when a request to browse a named web page is received, the method extracts the content characteristic information of the named web page, matches it against the characteristic information stored in the default knowledge base according to the feature recognition algorithm stored there, and then uses the recognition result obtained to identify whether the information in the named web page belongs to historical rubbish messages. The prior art removes only part of the rubbish messages by the write audit; for the historical rubbish messages left behind after the write audit, it must scan all historical data and identify and remove them manually or semi-automatically, and cannot identify them automatically, so the recognition cost is high, the recognition rate is low, and real-time performance and adaptivity are poor. The method solves this problem by auditing only the web pages that users browse; web pages that are not viewed generally attract no attention and need not be audited. Rubbish messages are thereby identified at lower cost and with a higher recognition rate, better real-time performance, and better adaptivity.
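Putting the three steps together, a browse request could be handled roughly as follows. This is a simplified sketch, not the patented implementation: features are limited to lowercase ASCII words and link hosts, matching is exact, only the total-count threshold is applied, and all function names are hypothetical.

```python
import re
from typing import Dict, Set

URL_RE = re.compile(r"https?://([^/\s]+)")   # captures the host part of a link
WORD_RE = re.compile(r"[a-z0-9]+")


def extract_features(content: str) -> Dict[str, Set[str]]:
    """Step S101: pull keywords and link hosts out of the page text."""
    lowered = content.lower()
    return {"link": set(URL_RE.findall(lowered)), "keyword": set(WORD_RE.findall(lowered))}


def match(features: Dict[str, Set[str]], kb: Dict[str, Set[str]]) -> Dict[str, int]:
    """Step S102: count features that also occur in the knowledge base."""
    return {t: len(v & kb.get(t, set())) for t, v in features.items()}


def handle_browse_request(content: str, kb: Dict[str, Set[str]], total_threshold: int = 1) -> bool:
    """Step S103: flag the page when the total number of matches exceeds the threshold."""
    counts = match(extract_features(content), kb)
    return sum(counts.values()) > total_threshold


kb = {"keyword": {"lottery", "winner"}, "link": {"spam.example"}}
print(handle_browse_request("Lottery WINNER! http://spam.example/claim", kb))
print(handle_browse_request("Notes from my holiday", kb))
```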
One of ordinary skill in the art will appreciate that all or part of the steps of the method in the above embodiment can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
Embodiment two:
Fig. 2 shows the structure of the system for recognizing historical rubbish messages provided by the second embodiment of the invention. For convenience of explanation, only the parts relevant to the embodiment of the invention are shown.
The system for recognizing historical rubbish messages comprises feature extraction unit 21, coupling recognition unit 22, and recognition unit 23, wherein:
Feature extraction unit 21 is used to extract the content characteristic information of the named web page when a request to browse the named web page is received.
In the embodiments of the invention, before feature extraction unit 21 is triggered, the named web page needs to be created and write-audited using the prior art. The named web page is thus a web page that has passed the write audit, and the default knowledge base is the knowledge base obtained by updating the knowledge base that was used when the named web page was write-audited. The content characteristic information of the named web page includes one or more of web page link, key word, picture, and video information. Because the content characteristic information of the named web page is extracted, and rubbish messages are then identified, as soon as a request to browse the web page is detected, in other words before the user views the page, the real-time performance of rubbish message recognition is improved.
In addition, the system for recognizing historical rubbish messages also comprises a preprocessing unit, used to preprocess the content of the named web page before its content characteristic information is extracted. For example, a text preprocessing method may be used, which includes: removing spaces and newlines, converting English uniformly to lowercase, Chinese character encoding conversion, FJZ, converting Japanese characters to spaces, converting special symbols to spaces, converting full-width digits to half-width digits, converting double-byte characters to ASCII characters, converting Chinese numerals to Arabic digits, and so on. This prevents rubbish messages that are hard to identify from being missed and allows the content characteristic information of the named web page to be extracted comprehensively.
Coupling recognition unit 22 is used to match, according to the feature recognition algorithm stored in the default knowledge base, the content characteristic information of the named web page against the characteristic information stored in the default knowledge base to obtain a recognition result.
Here, the default knowledge base is the knowledge base obtained by updating the knowledge base that was used when the named web page was write-audited. The update may include characteristic information of new rubbish messages, new feature recognition algorithms, and the like. The characteristic information of new rubbish messages may be new key characters or words, special characters, a library of illegal network links, and so on, and the feature recognition algorithm may be a recognition algorithm such as machine learning, Bayes, or support vector machine. A concrete update method may be to extract characteristic information from rubbish messages found by manual review, extract rules that can be used by the automatic recognition algorithm, and store them at a specific location in the default knowledge base. The recognition result includes the total number of successfully matched items of content characteristic information of the named web page and/or the number of successfully matched items of each type.
Recognition unit 23 is used to identify, according to the recognition result, whether the information in the named web page belongs to historical rubbish messages.
Recognition unit 23 specifically comprises coupling recognition unit 231 and identifying unit 232, wherein:
Coupling recognition unit 231 is used to judge whether the number of successfully matched items of content characteristic information of a specified type in the named web page exceeds the first predetermined threshold, and/or whether the total number of successfully matched items of content characteristic information of the named web page exceeds the second predetermined threshold; and
Identifying unit 232 is used to judge, when the output of coupling recognition unit 231 is yes, that the information in the named web page belongs to historical rubbish messages.
In the embodiments of the invention, an existing feature recognition algorithm, such as a machine learning algorithm, can be used together with the rubbish message information stored in advance in the knowledge base to identify the content characteristic information in the named web page and obtain a recognition result, for example the total number of identified items of content characteristic information that belong to rubbish messages and the number of such items of each type. The rubbish messages identified in the named web page here are the historical rubbish messages left behind by the write audit. Coupling recognition unit 231 then evaluates the recognition result, that is, it judges whether the number of successfully matched items of content characteristic information of the specified type in the named web page exceeds the first predetermined threshold, and/or whether the total number of successfully matched items exceeds the second predetermined threshold. When the output is yes, identifying unit 232 judges that the information in the named web page belongs to historical rubbish messages; otherwise, the information in the named web page does not belong to historical rubbish messages.
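The division of recognition unit 23 into a threshold-checking sub-unit and a deciding sub-unit could be mirrored in code as follows. The class names, the OR reading of "and/or", and the returned labels are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ThresholdCheck:
    """Plays the role of sub-unit 231: compares counts with the two thresholds."""
    specified_type: str
    first_threshold: int
    second_threshold: int

    def exceeds(self, per_type: Dict[str, int]) -> bool:
        total = sum(per_type.values())
        return (per_type.get(self.specified_type, 0) > self.first_threshold
                or total > self.second_threshold)


class Verdict:
    """Plays the role of sub-unit 232: turns the check output into a judgment."""

    @staticmethod
    def decide(check_output: bool) -> str:
        return "historical rubbish message" if check_output else "normal"


@dataclass
class RecognitionUnit:
    """Plays the role of unit 23: wires the two sub-units together."""
    check: ThresholdCheck

    def recognize(self, per_type: Dict[str, int]) -> str:
        return Verdict.decide(self.check.exceeds(per_type))


unit = RecognitionUnit(ThresholdCheck("link", first_threshold=0, second_threshold=5))
print(unit.recognize({"link": 1, "keyword": 0}))
print(unit.recognize({"link": 0, "keyword": 2}))
```

Keeping the threshold check separate from the verdict makes it straightforward to swap in a different decision rule without touching the counting logic.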
In the embodiments of the invention, the system for recognizing historical rubbish messages matches the content characteristic information of the named web page, extracted by feature extraction unit 21, against the characteristic information stored in the default knowledge base according to the feature recognition algorithm stored there, and obtains a recognition result. Recognition unit 23 then identifies, according to the recognition result, whether the information in the named web page belongs to historical rubbish messages. That is, after the web page is created and the information in it has been write-audited, read-audit recognition is performed again, which makes the filtering of historical rubbish messages more effective. Compared with existing methods such as manual review, the recognition cost is lower, and real-time performance and adaptivity are higher.
The embodiments of the invention extract the content characteristic information of a named web page that is to be browsed, perform historical rubbish message recognition on this content characteristic information, and judge whether the messages in the named web page belong to historical rubbish messages, so that the named web page can be processed accordingly. The prior art usually scans all historical data and identifies historical rubbish messages manually or semi-automatically; it cannot identify them automatically, which leads to high recognition cost, slow reaction, poor adaptivity, and similar problems. The embodiments solve this problem and improve adaptivity, real-time performance, and the recognition rate without raising the recognition cost.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. the recognition methods of a historical rubbish message is characterized in that, described method comprises the steps:
When receiving the request of browsing named web page, extract the content characteristic information of described named web page;
According to the feature recognition algorithms of storing in the default knowledge base, the characteristic information of storing in the content characteristic information of described named web page and the described default knowledge base is mated identification, obtain recognition result;
According to described recognition result, whether the information of identifying in the described named web page belongs to historical rubbish message.
2. the method for claim 1 is characterized in that, described named web page is write audit webpage afterwards for process, and described default knowledge base is the knowledge base after the knowledge base of use was upgraded when described named web page was write audit.
3. method as claimed in claim 2 is characterized in that, one or more in the content characteristic packets of information purse rope page or leaf link of described named web page, key word, picture, the video information.
4. method as claimed in claim 3, it is characterized in that, described recognition result comprises total number of mating the content characteristic information of identifying described named web page successfully and/or the number of mating each type content characteristic information in the content characteristic information of identifying described named web page successfully.
5. The method of claim 4, wherein the step of identifying, according to the recognition result, whether the information in the specified web page is historical spam information specifically comprises:
Determining whether the number of successfully matched items of a specified type of content characteristic information of the specified web page exceeds a first preset threshold, and/or whether the total number of successfully matched items of content characteristic information of the specified web page exceeds a second preset threshold;
If so, determining that the information in the specified web page is historical spam information; if not, determining that the information in the specified web page is not historical spam information.
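The counting and threshold test of claims 4 and 5 can be illustrated with a short sketch. The threshold values, the choice of `"link"` as the specified type, and the function names are hypothetical. The claim allows either condition or both ("and/or"); this sketch assumes the "or" reading.

```python
from collections import Counter


def count_matches(matched_features):
    """Build the recognition result from (feature_type, value) pairs that matched.

    Returns the per-type match counts and the total match count.
    """
    per_type = Counter(ftype for ftype, _ in matched_features)
    return per_type, sum(per_type.values())


def is_historical_spam(per_type, total, specified_type="link",
                       first_threshold=2, second_threshold=5):
    # First test: matches of the specified type exceed the first threshold.
    exceeds_type = per_type.get(specified_type, 0) > first_threshold
    # Second test: total matches exceed the second threshold.
    exceeds_total = total > second_threshold
    # Assumed "or" combination; an "and" variant would be equally consistent.
    return exceeds_type or exceeds_total


matched = [("link", "a"), ("link", "b"), ("link", "c"), ("keyword", "x")]
per_type, total = count_matches(matched)
flagged = is_historical_spam(per_type, total)
```

Keeping the per-type counts alongside the total lets a single feature type, such as links, trigger the decision even when the overall number of matches is small.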
6. A system for identifying historical spam information, wherein the system comprises:
A feature extraction unit, configured to extract content characteristic information of a specified web page when a request to browse the specified web page is received;
A matching recognition unit, configured to match, according to a feature recognition algorithm stored in a preset knowledge base, the content characteristic information of the specified web page against the characteristic information stored in the preset knowledge base, to obtain a recognition result; and
A recognition unit, configured to identify, according to the recognition result, whether the information in the specified web page is historical spam information.
7. The system of claim 6, wherein the specified web page is a web page that has already passed a write-time audit, and the preset knowledge base is an updated version of the knowledge base that was used when the specified web page underwent the write-time audit.
8. The system of claim 7, wherein the content characteristic information of the specified web page comprises one or more of web page links, keywords, pictures, and video information.
9. The system of claim 8, wherein the recognition result comprises the total number of successfully matched items of content characteristic information of the specified web page, and/or the number of successfully matched items of each type of content characteristic information of the specified web page.
10. The system of claim 9, wherein the recognition unit specifically comprises:
A matching recognition unit, configured to determine whether the number of successfully matched items of a specified type of content characteristic information of the specified web page exceeds a first preset threshold, and/or whether the total number of successfully matched items of content characteristic information of the specified web page exceeds a second preset threshold; and
A determination unit, configured to determine that the information in the specified web page is historical spam information when the output of the matching recognition unit is yes.
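The three units of the system claims, and the effect of the knowledge-base update described in claim 7, can be sketched as cooperating classes. All class names, the pre-parsed page format, and the set-based knowledge base are assumptions made for illustration only. The example shows a page that passes review with the older knowledge base and is flagged on a later browse request once the knowledge base has been updated.

```python
class FeatureExtractionUnit:
    def extract(self, page):
        # page is a hypothetical pre-parsed page: {"link": [...], "keyword": [...]}
        return [(ftype, v) for ftype, values in page.items() for v in values]


class MatchingRecognitionUnit:
    def __init__(self, knowledge_base):
        # Assumed knowledge base: a mutable set of (feature_type, value) pairs.
        self.knowledge_base = knowledge_base

    def match(self, features):
        return [f for f in features if f in self.knowledge_base]


class RecognitionUnit:
    def __init__(self, total_threshold):
        self.total_threshold = total_threshold

    def identify(self, matched):
        return len(matched) > self.total_threshold


class ReadTimeReviewSystem:
    """Wires the units together; runs once per browse request."""

    def __init__(self, knowledge_base, total_threshold=0):
        self.extractor = FeatureExtractionUnit()
        self.matcher = MatchingRecognitionUnit(knowledge_base)
        self.recognizer = RecognitionUnit(total_threshold)

    def on_browse_request(self, page):
        features = self.extractor.extract(page)
        matched = self.matcher.match(features)
        return self.recognizer.identify(matched)


kb = set()  # knowledge base as it stood at write-time audit (empty here)
system = ReadTimeReviewSystem(kb)
page = {"link": ["http://spam.example"], "keyword": ["hello"]}

before_update = system.on_browse_request(page)
kb.add(("link", "http://spam.example"))  # knowledge base updated later
after_update = system.on_browse_request(page)
```

Because the check runs at read time against the current knowledge base, a page stored before the update does not need to be rescanned in bulk; it is re-evaluated when someone requests it.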
CN2012100744065A 2012-03-20 2012-03-20 Identification method and system for history waste information Pending CN103324617A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012100744065A CN103324617A (en) 2012-03-20 2012-03-20 Identification method and system for history waste information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012100744065A CN103324617A (en) 2012-03-20 2012-03-20 Identification method and system for history waste information

Publications (1)

Publication Number Publication Date
CN103324617A true CN103324617A (en) 2013-09-25

Family

ID=49193365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012100744065A Pending CN103324617A (en) 2012-03-20 2012-03-20 Identification method and system for history waste information

Country Status (1)

Country Link
CN (1) CN103324617A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104391694A (en) * 2014-11-05 2015-03-04 工业和信息化部电子科学技术情报研究所 Intelligent mobile terminal software public service support platform system
WO2015101353A1 (en) * 2014-01-06 2015-07-09 Tencent Technology (Shenzhen) Company Limited Method and apparatus for processing text information
CN105320851A (en) * 2014-08-05 2016-02-10 腾讯科技(深圳)有限公司 Safety detection method and device for webpage
CN105553918A (en) * 2014-10-28 2016-05-04 广州华多网络科技有限公司 Method and apparatus for recognizing malicious information
CN106933863A (en) * 2015-12-30 2017-07-07 华为技术有限公司 Data clearing method and device
CN108833962A (en) * 2018-05-25 2018-11-16 咪咕音乐有限公司 A kind of display information processing method and device and storage medium
CN110020035A (en) * 2017-09-06 2019-07-16 腾讯科技(北京)有限公司 Data identification method and device, storage medium and electronic device
CN110874730A (en) * 2018-09-04 2020-03-10 Oppo广东移动通信有限公司 Information processing method, information processing device and mobile terminal
CN112634090A (en) * 2020-12-15 2021-04-09 深圳市彬讯科技有限公司 Home decoration information reporting management method, system, computer device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1696943A (en) * 2004-05-13 2005-11-16 上海极软软件技术有限公司 Self-adaptive method for filtering out garbage E-mails safely
US7624274B1 (en) * 2004-02-11 2009-11-24 AOL LLC, a Delaware Limited Company Decreasing the fragility of duplicate document detecting algorithms
CN102158428A (en) * 2011-04-18 2011-08-17 柳州职业技术学院 Rapid and high-accuracy junk mail filtering method

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10387460B2 (en) 2014-01-06 2019-08-20 Tencent Technology (Shenzhen) Company Limited Method and apparatus for processing text information
WO2015101353A1 (en) * 2014-01-06 2015-07-09 Tencent Technology (Shenzhen) Company Limited Method and apparatus for processing text information
US11151176B2 (en) 2014-01-06 2021-10-19 Tencent Technology (Shenzhen) Company Limited Method and apparatus for processing text information
CN105320851A (en) * 2014-08-05 2016-02-10 腾讯科技(深圳)有限公司 Safety detection method and device for webpage
CN105553918A (en) * 2014-10-28 2016-05-04 广州华多网络科技有限公司 Method and apparatus for recognizing malicious information
CN105553918B (en) * 2014-10-28 2019-07-02 广州华多网络科技有限公司 A kind of method and device identifying fallacious message
CN104391694B (en) * 2014-11-05 2018-04-03 工业和信息化部电子科学技术情报研究所 Intelligent mobile terminal software public service support platform system
CN104391694A (en) * 2014-11-05 2015-03-04 工业和信息化部电子科学技术情报研究所 Intelligent mobile terminal software public service support platform system
CN106933863A (en) * 2015-12-30 2017-07-07 华为技术有限公司 Data clearing method and device
CN106933863B (en) * 2015-12-30 2019-04-19 华为技术有限公司 Data clearing method and device
CN110020035A (en) * 2017-09-06 2019-07-16 腾讯科技(北京)有限公司 Data identification method and device, storage medium and electronic device
CN110020035B (en) * 2017-09-06 2023-05-12 腾讯科技(北京)有限公司 Data identification method and device, storage medium and electronic device
CN108833962A (en) * 2018-05-25 2018-11-16 咪咕音乐有限公司 A kind of display information processing method and device and storage medium
CN110874730A (en) * 2018-09-04 2020-03-10 Oppo广东移动通信有限公司 Information processing method, information processing device and mobile terminal
CN112634090A (en) * 2020-12-15 2021-04-09 深圳市彬讯科技有限公司 Home decoration information reporting management method, system, computer device and storage medium

Similar Documents

Publication Publication Date Title
CN103324617A (en) Identification method and system for history waste information
CN107437038B (en) Webpage tampering detection method and device
CN110297988A (en) Hot topic detection method based on weighting LDA and improvement Single-Pass clustering algorithm
CN105868332A (en) hot topic recommendation method and device
CN103020207B (en) Browser label page grouping management method and device
CN104462509A (en) Review spam detection method and device
CN102870116B (en) Method and apparatus for content matching
CN104484407A (en) Method and system for recognizing fraud information
CN108038173B (en) Webpage classification method and system and webpage classification equipment
CN105446572A (en) Text-editing method and device used for screen display device
CN107203574A (en) Data management and the polymerization of data analysis
CN102567473A (en) Network information retrieval system and retrieval method
CN104009964A (en) Network link detection method and system
CN112132710A (en) Legal element processing method and device, electronic equipment and storage medium
CN114915468B (en) Intelligent analysis and detection method for network crime based on knowledge graph
JP4957796B2 (en) Difference calculation program, difference calculation device, and difference calculation method
CN112307318B (en) Content publishing method, system and device
CN114357335A (en) Information acquisition method, medium, device and computing equipment
CN103605742A (en) Method and device for recognizing network resource entity content page
CN113032001A (en) Intelligent contract classification method and device
US11755958B1 (en) Systems and methods for detecting cryptocurrency wallet artifacts in a file system
CN104182479A (en) Method and device for processing information
CN102467537A (en) Method and device for deleting vocabulary
CN103093213A (en) Video file classification method and terminal
CN113741864B (en) Automatic semantic service interface design method and system based on natural language processing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130925
