CN103886016B - Method and apparatus for determining rubbish text information in a page - Google Patents

Method and apparatus for determining rubbish text information in a page

Info

Publication number
CN103886016B
Authority
CN
China
Prior art keywords
information
rubbish text
candidate
text information
rubbish
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410058591.8A
Other languages
Chinese (zh)
Other versions
CN103886016A (en)
Inventor
施鹏
牛章鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201410058591.8A
Publication of CN103886016A
Application granted
Publication of CN103886016B
Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/335 - Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An object of the invention is to provide a method and apparatus for determining rubbish text information in a page. Specifically: an initial page to be processed is obtained; one or more pieces of candidate rubbish text information corresponding to the initial page are determined; cheating degree information corresponding to the candidate rubbish text information is determined; and, according to the cheating degree information, one or more pieces of rubbish text information corresponding to the initial page are determined from the one or more pieces of candidate rubbish text information. Compared with the prior art, the invention determines the cheating degree information of the candidate rubbish text information corresponding to the initial page and uses it to screen the candidates, so that the rubbish text information in the initial page is identified effectively. This improves both the security and the efficiency with which users obtain information and, correspondingly, improves the user's search and browsing experience.

Description

Method and apparatus for determining rubbish text information in a page
Technical field
The present invention relates to the field of Internet technology, and in particular to a technique for determining rubbish text information in a page.
Background
With the development of Internet technology and the penetration of Internet applications into users' study, work and life, people increasingly obtain information over the network, for example by entering keywords in a search engine's search bar to express their needs and obtain corresponding search results. When the website corresponding to a search result may carry a security risk, the search engine or browser can indicate the website's security status to the user, for example by warning the user of the possible risk. Usually, however, not all pages of a website carry a security risk; rather, some information in some pages does. For example, when a site as a whole poses no security risk but some of its pages contain rubbish text information, a security prompt issued at the coarse granularity of the whole site cannot detect the rubbish text information in those pages. This impairs the security and efficiency with which users obtain information and degrades the user's search and browsing experience.
Summary of the invention
An object of the invention is to provide a method and apparatus for determining rubbish text information in a page.
According to one aspect of the invention, a method for determining rubbish text information in a page is provided, wherein the method comprises the following steps:
a) obtaining an initial page to be processed;
b) determining one or more pieces of candidate rubbish text information corresponding to the initial page;
c) determining cheating degree information corresponding to the candidate rubbish text information;
d) determining, according to the cheating degree information, one or more pieces of rubbish text information corresponding to the initial page from the one or more pieces of candidate rubbish text information.
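The four steps above can be sketched as a small pipeline. This is only an illustration: the function name `determine_rubbish_text`, the callback parameters and the threshold rule in the last step are assumptions of the sketch. The text at this point says only that the selection is made according to the cheating degree information, not how.

```python
from typing import Callable, Dict, List


def determine_rubbish_text(
    page_text: str,
    find_candidates: Callable[[str], List[str]],
    cheating_degree: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Apply steps b) to d) to a page that step a) has already obtained."""
    # Step b): candidate rubbish text information for the page.
    candidates = find_candidates(page_text)
    # Step c): cheating degree information for each candidate.
    degrees: Dict[str, float] = {c: cheating_degree(c) for c in candidates}
    # Step d): keep candidates whose cheating degree reaches the threshold.
    # The threshold comparison is an assumption of this sketch.
    return [c for c in candidates if degrees[c] >= threshold]


# Toy stand-ins with hypothetical names and values.
known_terms = ["brand X", "brand Y"]
scores = {"brand X": 100.0, "brand Y": 12.5}
result = determine_rubbish_text(
    "try brand X or brand Y",
    find_candidates=lambda text: [t for t in known_terms if t in text],
    cheating_degree=lambda c: scores[c],
    threshold=50.0,
)
print(result)  # ['brand X']
```

Passing the candidate finder and the scoring function as parameters keeps the sketch independent of which of the determination modes described in the detailed description is used.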
According to another aspect of the invention, rubbish text determining equipment for determining rubbish text information in a page is also provided, wherein the rubbish text determining equipment comprises:
an acquisition device for obtaining an initial page to be processed;
a candidate determining device for determining one or more pieces of candidate rubbish text information corresponding to the initial page;
a cheating degree determining device for determining cheating degree information corresponding to the candidate rubbish text information;
a rubbish determining device for determining, according to the cheating degree information, one or more pieces of rubbish text information corresponding to the initial page from the one or more pieces of candidate rubbish text information.
Compared with the prior art, the invention determines the cheating degree information of the one or more pieces of candidate rubbish text information corresponding to the initial page and, according to that cheating degree information, determines from those candidates the one or more pieces of rubbish text information corresponding to the initial page. The candidates are thus screened by cheating degree information and the rubbish text information in the initial page is identified effectively, which improves both the security and the efficiency with which users obtain information and, correspondingly, the user's search and browsing experience. Moreover, the invention can also generate a target page corresponding to the initial page, wherein the target page includes display identification information for at least one of the one or more pieces of rubbish text information, and provide it to the user. Marking the rubbish text information in the initial page alerts the user, which further improves the security and efficiency with which users obtain information and the user's search and browsing experience. In addition, when determining the cheating degree information, the invention can combine the presentation probability information corresponding to the candidate rubbish text information with the storehouse frequency information and user presentation ratio information corresponding to it, so that the resulting cheating degree information is more accurate, which again improves the security and efficiency with which users obtain information and the user's search and browsing experience.
Brief description of the drawings
Other features, objects and advantages of the invention will become more apparent on reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
Fig. 1 shows a schematic diagram of equipment for determining rubbish text information in a page according to one aspect of the invention;
Fig. 2 shows a schematic diagram of an obtained initial page to be processed;
Fig. 3 shows a schematic diagram of a generated target page corresponding to the initial page shown in Fig. 2, wherein the target page includes display identification information of rubbish text information;
Fig. 4 shows a schematic diagram of equipment for determining rubbish text information in a page according to a preferred embodiment of the invention;
Fig. 5 shows a flow chart of a method for determining rubbish text information in a page according to another aspect of the invention;
Fig. 6 shows a flow chart of a method for determining rubbish text information in a page according to a preferred embodiment of the invention.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed description of the embodiments
The invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows rubbish text determining equipment 1 for determining rubbish text information in a page according to one aspect of the invention, wherein the rubbish text determining equipment 1 includes an acquisition device 11, a candidate determining device 12, a cheating degree determining device 13 and a rubbish determining device 14. Specifically, the acquisition device 11 obtains an initial page to be processed; the candidate determining device 12 determines one or more pieces of candidate rubbish text information corresponding to the initial page; the cheating degree determining device 13 determines the cheating degree information corresponding to the candidate rubbish text information; and the rubbish determining device 14 determines, according to the cheating degree information, one or more pieces of rubbish text information corresponding to the initial page from the one or more pieces of candidate rubbish text information. Here, the rubbish text determining equipment 1 includes, but is not limited to, Internet platforms through which users can display or provide their own original content to other users, such as: i) network platforms or terminal platforms that provide information storage space for logged-in users so that they can upload and share content such as documents, videos and pictures, and that also let users read online, download and exchange content shared by other users, for example Baidu Wenku, Docin and Sina iAsk, where the terminal platform includes but is not limited to user equipment such as mobile terminals and PCs; ii) network platforms or terminal platforms that provide logged-in users with information access, information sharing, information publishing or synchronization, for example social networking sites, post bars, forums, knowledge Q&A sharing platforms, personal spaces, blogs, microblogs and other third-party websites.
Here, the rubbish text determining equipment 1 can be implemented by network equipment, by user equipment, or by equipment formed by integrating network equipment and user equipment through a network. The network equipment includes, but is not limited to, implementations such as a network host, a single network server, a set of multiple network servers, or a cloud-computing-based set of computers. Here, the cloud consists of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The user equipment can be any electronic product capable of human-computer interaction with a user via a keyboard, mouse, touch pad, touch screen, handwriting device or the like, such as a computer, mobile phone, PDA, palmtop computer (PPC) or tablet computer. The network includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPNs, wireless ad hoc networks, etc.
Those skilled in the art will understand that the above rubbish text determining equipment 1 is only an example; other existing or future network equipment or user equipment, if applicable to the invention, should also fall within the scope of protection of the invention and is incorporated herein by reference. Here, the network equipment and user equipment include electronic equipment that can automatically perform numerical calculation and information processing according to instructions set or stored in advance, whose hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, etc.
For example, when the rubbish text determining equipment 1 is implemented by user equipment, it can obtain, through the browser on the user equipment, the page access request submitted by the user, so as to obtain the initial page to be processed; it then determines one or more pieces of candidate rubbish text information corresponding to the initial page; next, it determines the cheating degree information corresponding to the candidate rubbish text information; finally, according to the cheating degree information, it determines one or more pieces of rubbish text information corresponding to the initial page from the one or more pieces of candidate rubbish text information, and provides the rubbish text information to the user through the browser on the user equipment.
For example, when the rubbish text determining equipment 1 is implemented by network equipment, it can receive the page access request sent by the user through user equipment, forward the page access request to a page server, and receive the page returned by the page server in response to the request, so as to obtain the initial page to be processed; it then determines one or more pieces of candidate rubbish text information corresponding to the initial page; next, it determines the cheating degree information corresponding to the candidate rubbish text information; finally, according to the cheating degree information, it determines one or more pieces of rubbish text information corresponding to the initial page from the one or more pieces of candidate rubbish text information, and sends the rubbish text information to the user equipment, where it is shown to the user, for example through the browser on the user equipment.
For example, when the rubbish text determining equipment 1 is implemented by user equipment and network equipment together, the user equipment can first obtain the initial page to be processed and send it to the corresponding network equipment. The network equipment then determines one or more pieces of candidate rubbish text information corresponding to the initial page, determines the cheating degree information corresponding to the candidate rubbish text information, and, according to the cheating degree information, determines one or more pieces of rubbish text information corresponding to the initial page from the one or more pieces of candidate rubbish text information. The network equipment then sends the rubbish text information to the user equipment, which provides it to the user. As a further example, the user equipment can itself obtain the initial page to be processed and determine the one or more pieces of candidate rubbish text information corresponding to it, and then send the candidate rubbish text information to the network equipment. The network equipment determines the cheating degree information corresponding to the candidate rubbish text information and, according to the cheating degree information, determines one or more pieces of rubbish text information corresponding to the initial page from the candidates; it then sends the rubbish text information to the user equipment, which provides it to the user. Those skilled in the art will understand that, when user equipment and network equipment cooperate to implement the rubbish text determining equipment 1, the division of labor between the user equipment and the network equipment can be changed in any appropriate way, and all such changes fall within the scope of protection of the invention.
Specifically, the acquisition device 11 obtains the initial page to be processed through an application programming interface (API) provided by a third-party facility such as a browser or search engine; or it obtains, by means of dynamic web page technologies such as JSP and ASP, a query operation submitted by the user through user equipment, such as clicking a link in a page, and obtains the page the link points to through the API provided by the browser, as the initial page to be processed; or it obtains, by means of dynamic web page technologies such as JSP and ASP, the query sequence entered by the user through user equipment, submits the query sequence to a search engine, and receives the search results fed back by the search engine for that query sequence, as initial pages to be processed; or it obtains the initial page to be processed via an agreed communication mode such as http or https. For example, user A, using a PC, enters the keyword "The baby digests milk powder badly, what should I do?" in the search bar of a search engine such as Baidu Knows and clicks the search button. The acquisition device 11 then obtains the query sequence entered by user A by means of dynamic web page technologies such as ASP and JSP, sends the search engine a search request based on that query sequence, and, through the API provided by the search engine, obtains the one or more search results that the search engine found by matching against the keyword, such as search result1: "The baby digests milk powder badly, what should I do?_Parenting Q&A_Baby Tree", search result2: "What to do when a baby digests milk powder badly - Parenting Q&A - Parenting Net", search result3: "The baby digests milk powder badly, what should I do?_Baidu Knows", search result4: "What causes indigestion in a nursing baby?_Baidu Knows", etc., as initial pages to be processed.
Those skilled in the art will understand that the above ways of obtaining the initial page to be processed are only examples; other existing or future ways of obtaining the initial page to be processed, if applicable to the invention, should also fall within the scope of protection of the invention and are incorporated herein by reference.
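Of the acquisition options listed above, retrieval over http or https is the simplest to illustrate. The following is a minimal sketch using Python's standard `urllib.request`; the function name `fetch_initial_page`, the timeout and the UTF-8 fallback are assumptions of the sketch, not details taken from the patent.

```python
from urllib.request import urlopen


def fetch_initial_page(url: str, timeout: float = 10.0) -> str:
    """Return the body of the page at `url` as text."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when no charset is declared (an assumption).
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# A data: URL keeps the example self-contained and offline; a real caller
# would pass an http:// or https:// address instead.
demo_url = "data:text/html;charset=utf-8,%3Cp%3Ehello%3C%2Fp%3E"
page = fetch_initial_page(demo_url)
print(page)
```

The browser and search-engine API routes are not sketched here because the text does not name concrete interfaces for them.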
The candidate determining device 12 determines one or more pieces of candidate rubbish text information corresponding to the initial page. Here, rubbish text information refers to unimportant information, risky information and the like present in a page. For example, when a user answering another user's question recommends a certain target object that does not necessarily answer the question, that target object is rubbish text information. A target object is any article or service offered to the market that can satisfy some need of a consumer or user. The ways in which the candidate determining device 12 determines the candidate rubbish text information include, but are not limited to, at least any one of the following:
1) Determining the one or more pieces of candidate rubbish text information corresponding to the initial page according to user operating characteristic information corresponding to character strings in the page content information of the initial page. Specifically, the candidate determining device 12 first obtains the page content information of the initial page, for example by HTML tag analysis of the initial page or by a wrapper-based extraction method; it then performs semantic analysis on the page content information to obtain the character strings contained in it; and then, according to the user operating characteristic information corresponding to those character strings, it determines the one or more pieces of candidate rubbish text information corresponding to the initial page, that is, which character strings are candidate rubbish text. Here, the user operating characteristic information includes, but is not limited to: i) user repetition behavior information corresponding to the character string; ii) user deletion behavior information corresponding to the character string.
For example, suppose the initial page to be processed obtained by the acquisition device 11 is search result2: "What to do when a baby digests milk powder badly - Parenting Q&A - Parenting Net". The candidate determining device 12 first performs HTML tag analysis on it and then performs semantic analysis on the page content information of initial page search result2, obtaining character strings such as "give the baby some probiotics", "my baby has always drunk Heng Shi milk powder and has had no indigestion; parents can give it a try", "you can try Switzerland think precious milk powder; it is absorbed well, does not cause internal heat, and strengthens the baby's intestines and immunity", "it is not a milk powder problem" and "if this happens, the milk powder does not suit the baby; try changing brands". Suppose the answers containing the character string "Heng Shi milk powder" all come from the same user, and that user has repeatedly posted answers containing "Heng Shi milk powder". This indicates that the user is likely to be maliciously recommending "Heng Shi milk powder", so the candidate determining device 12 can take "Heng Shi milk powder" as candidate rubbish text information. As another example, suppose the user whose answer contains the character string "Switzerland think precious milk powder" has repeatedly posted answers and then deleted them. This indicates that "Switzerland think precious milk powder" is under strong suspicion of cheating, so the candidate determining device 12 can take "Switzerland think precious milk powder" as candidate rubbish text information.
As another example, suppose the initial page to be processed obtained by the acquisition device 11 is search result3 shown in Fig. 2: "The baby digests milk powder badly, what should I do?_Baidu Knows", and the page contains the following answers I to IV to the question:
Ⅰ: The baby's poor digestion may be due to the milk powder. Try Jia Beiaite milk powder; I have seen others use it with good results;
Ⅱ: The stomach and intestines absorb slowly. To solve a baby's indigestion, pediatric specialists often recommend newborn good shellfish probiotics. Newborn good shellfish probiotics make the stomach and intestines produce a variety of organic acids and digestive enzymes, which help the baby digest food and improve appetite; the lactose, acetic acid and so on that are produced strengthen the baby's intestinal peristalsis and promote digestion.
Ⅲ: First try changing the milk powder; you can also regularly give the baby some probiotics to improve the stomach and intestines and aid digestion;
Ⅳ: My baby uses Jia Beiaite milk powder and has had no indigestion; parents can give it a try.
The candidate determining device 12 first performs semantic analysis on answers I to IV above, obtaining character strings such as "try Jia Beiaite milk powder", "to solve a baby's indigestion, pediatric specialists often recommend newborn good shellfish probiotics", "first try changing the milk powder; you can also regularly give the baby some probiotics to improve the stomach and intestines and aid digestion" and "my baby uses Jia Beiaite milk powder and has had no indigestion; parents can give it a try". Suppose answers I and IV come from the same user, and that user also recommends "Jia Beiaite" milk powder in other answers to questions about babies digesting milk powder badly. This indicates that the user is likely to be maliciously recommending "Jia Beiaite" milk powder, so the candidate determining device 12 can take "Jia Beiaite" as candidate rubbish text information. As another example, suppose the user who posted answer II, which contains the text "newborn good shellfish probiotics", has repeatedly posted answers and then deleted them. This indicates that the character string "newborn good shellfish probiotics" is under strong suspicion of cheating, so the candidate determining device 12 can take "newborn good shellfish probiotics" as candidate rubbish text information.
Those skilled in the art will understand that the above user operating characteristic information is only an example; other existing or future user operating characteristic information, if applicable to the invention, should also fall within the scope of protection of the invention and is incorporated herein by reference.
2) Performing a matching query in a rubbish text information library according to the character strings in the page content information of the initial page, to obtain one or more pieces of candidate rubbish text information corresponding to the initial page. For example, continuing the example above, the candidate determining device 12 can take the character strings in the page content information of initial page search result2, such as "give the baby some probiotics", "my baby has always drunk Heng Shi milk powder and has had no indigestion; parents can give it a try", "you can try Switzerland think precious milk powder; it is absorbed well, does not cause internal heat, and strengthens the baby's intestines and immunity", "it is not a milk powder problem" and "if this happens, the milk powder does not suit the baby; try changing brands", and perform a matching query in the rubbish text information library, obtaining one or more pieces of candidate rubbish text information corresponding to the initial page, such as "Heng Shi milk powder" and "Switzerland think precious milk powder". Here, the rubbish text information library can be located in the rubbish text determining equipment 1, or in other equipment, such as a server, connected to the rubbish text determining equipment 1 through a network.
Those skilled in the art will understand that the above ways of determining the one or more pieces of candidate rubbish text information corresponding to the initial page are only examples; other existing or future ways of doing so, if applicable to the invention, should also fall within the scope of protection of the invention and are incorporated herein by reference.
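The two candidate determination modes described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `(user, text)` tuple layout, the `min_repeats` cut-off and plain substring matching are all hypothetical choices made for the sketch. The patent text does not specify how repetition is counted or how the library is queried, and deletion behavior could be counted in the same way as repetition.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# (user id, text of one answer posted by that user); toy layout for the sketch.
Answer = Tuple[str, str]


def candidates_by_repetition(
    answers: Iterable[Answer], terms: Iterable[str], min_repeats: int = 2
) -> Set[str]:
    """Mode 1: terms that a single user mentions in at least `min_repeats` answers."""
    terms = list(terms)
    counts: Counter = Counter()
    for user, text in answers:
        for term in terms:
            if term in text:
                counts[(user, term)] += 1
    return {term for (_, term), n in counts.items() if n >= min_repeats}


def candidates_by_library(strings: Iterable[str], library: Iterable[str]) -> List[str]:
    """Mode 2: library entries that occur in any character string of the page."""
    strings = list(strings)
    return [entry for entry in library if any(entry in s for s in strings)]


answers = [
    ("u1", "Try Jia Beiaite milk powder"),
    ("u1", "My baby uses Jia Beiaite milk powder"),
    ("u2", "Give the baby some probiotics"),
]
repeated = candidates_by_repetition(answers, ["Jia Beiaite", "probiotics"])

page_strings = [
    "my baby has always drunk Heng Shi milk powder",
    "it is not a milk powder problem",
]
library = ["Heng Shi milk powder", "Switzerland think precious milk powder"]
matched = candidates_by_library(page_strings, library)
print(repeated, matched)
```

In this toy data only user u1 repeats a term, so mode 1 yields "Jia Beiaite", and only one library entry appears in the page strings, so mode 2 yields "Heng Shi milk powder".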
The cheating degree determining device 13 determines the cheating degree information corresponding to the candidate rubbish text information. Here, the cheating degree information reflects the degree to which the candidate rubbish text information is unimportant information and/or the degree to which it carries risk: the larger the cheating degree information corresponding to the candidate rubbish text information, the more it counts as unimportant information and/or the higher its degree of risk. The ways in which the cheating degree determining device 13 determines the cheating degree information corresponding to the candidate rubbish text information include, but are not limited to, at least any one of the following:
1) Determining the cheating degree information according to the storehouse frequency information and user presentation ratio information corresponding to the candidate rubbish text information. Here, the ways in which the cheating degree determining device 13 does so include, but are not limited to, at least any one of the following:
A) cheating degree determining device 13 can be according to equation below(1)Determine the cheating degree information:
Where C denotes the storehouse frequency information corresponding to the candidate rubbish text information, for example the number of posts in the knowledge base that contain the candidate rubbish text information. Here, the knowledge base comprises the website text database corresponding to the initial page; for a forum or question-and-answer type of web page, for example, the knowledge base is the text database of the corresponding website that contains the posts published and the reply posts made by users. B_i denotes the user presentation ratio information of user i for the candidate rubbish text information, for example the proportion of all texts published by user i that contain the candidate rubbish text information; it expresses the user's presentation wish degree for the candidate rubbish text information. When the value of the summation is smaller, the presentation wish is greater and, correspondingly, B_i is larger; when the value of the summation is larger, the presentation wish is smaller and, correspondingly, B_i is smaller. n denotes the total number of users who have published text containing the candidate rubbish text information, and y' denotes the cheating degree information. For example, suppose the candidate determining device 12 determines that the candidate rubbish text information of the initial page search result3 shown in Fig. 2, "Baby has poor digestion with milk powder, what to do_Baidu Knows", is "Jia Beiaite" and "newborn good shellfish probiotics", and suppose the website text database (that is, the knowledge base) corresponding to the initial page search result3 is post database3. The C value corresponding to the candidate rubbish text information "Jia Beiaite" is 1000, "Jia Beiaite" appears in the posts of 3 users, and the corresponding B_i values are 1/2, 1/3 and 1/5; the C value corresponding to the candidate rubbish text information "newborn good shellfish probiotics" is 500, "newborn good shellfish probiotics" appears in the posts of 2 users, and the corresponding B_i values are 1/15 and 1/25. The cheating degree determining device 13 can then calculate, according to formula (1) above, that the cheating degree information corresponding to the candidate rubbish text information "Jia Beiaite" and "newborn good shellfish probiotics" is 100 and 12.5, respectively.
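The worked example above can be checked with a short sketch. It assumes formula (1) has the form y' = C / Σ(1/B_i), which reproduces the stated values 100 and 12.5; the function and parameter names are hypothetical and do not come from the patent.

```python
from fractions import Fraction


def cheating_degree(storehouse_frequency, presentation_ratios):
    """Assumed form of formula (1): y' = C / sum(1 / B_i).

    storehouse_frequency: C, the number of posts in the knowledge base
        that contain the candidate text.
    presentation_ratios: one B_i per user who published the candidate text.
    """
    if not presentation_ratios:
        raise ValueError("at least one publishing user is required")
    if any(b <= 0 for b in presentation_ratios):
        raise ValueError("each presentation ratio must be positive")
    # Exact rational arithmetic avoids rounding noise in the reciprocals.
    reciprocal_sum = sum(1 / Fraction(b) for b in presentation_ratios)
    return float(storehouse_frequency / reciprocal_sum)


jia_beiaite = cheating_degree(1000, [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)])
probiotics = cheating_degree(500, [Fraction(1, 15), Fraction(1, 25)])
```

Under this assumed form, a user who mentions the candidate text in a large share of their posts contributes a small reciprocal, so the denominator shrinks and the cheating degree rises.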
B) The cheating degree information is determined according to the storehouse frequency information and the user presentation ratio information corresponding to the candidate rubbish text information, in combination with the page subject matter information of the initial page. Specifically, the cheating degree determining device 13 first performs word segmentation on the page content information of the initial page to obtain a plurality of keywords corresponding to the initial page, and then performs statistical processing on those keywords, for example taking the keyword that occurs most often as the page subject matter information of the initial page. Next, the cheating degree determining device 13 determines the subject type information corresponding to the page subject matter information, for example whether the page is a presentation page for goods and/or services, in order to determine the adjustment parameter of the page subject matter information with respect to the cheating degree information. For example, when the page subject matter information relates to a presentation page for goods and/or services, the adjustment parameter is d = 1; when it does not, the adjustment parameter is d ∈ (0, 1). Here, the value of the adjustment parameter d can be predetermined or obtained by machine learning. Then, the cheating degree determining device 13 can determine the cheating degree according to the following formula (2):
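Formula (2) is not reproduced in this text. A form consistent with the worked example that follows (100 × 0.1 = 10 and 12.5 × 0.1 = 1.25), offered as a reconstruction in which y' is the value given by formula (1) and y is the adjusted cheating degree, is:

```latex
y = d \cdot y' \tag{2}
```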
For example, suppose the cheating degree determining device 13 determines that the page subject matter information of the initial page search result3, "Baby has poor digestion with milk powder, what to do_Baidu Knows", is "reasons why a baby digests milk powder poorly", which does not relate to a presentation page for goods and/or services. The cheating degree determining device 13 can then determine the adjustment parameter of the page subject matter information with respect to the cheating degree information, for example d = 0.1, and can calculate according to formula (2) above that the cheating degree information corresponding to the candidate rubbish text information "Jia Beiaite" and "newborn good shellfish probiotics" is 100 × 0.1 = 10 and 12.5 × 0.1 = 1.25, respectively.
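Mode B can be sketched in two steps: picking the most frequent keyword as the page subject, then scaling the base cheating degree by the adjustment parameter d. The sketch assumes formula (2) is y = d · y', as the worked example suggests; the function names, the keyword list and the default d = 0.1 are illustrative assumptions only.

```python
from collections import Counter


def page_subject(keywords):
    """Return the most frequent keyword as the page subject (first one wins ties)."""
    if not keywords:
        raise ValueError("the page produced no keywords")
    return Counter(keywords).most_common(1)[0][0]


def adjust_for_subject(base_degree, is_goods_or_service_page, d_other=0.1):
    """Assumed form of formula (2): y = d * y'.

    d is 1 for a goods/service presentation page and a value in (0, 1)
    otherwise; d_other stands in for a predetermined or learned value.
    """
    if not 0 < d_other < 1:
        raise ValueError("d for other pages must lie in (0, 1)")
    d = 1.0 if is_goods_or_service_page else d_other
    return base_degree * d


# Hypothetical keywords from segmenting the page content.
subject = page_subject(["milk powder", "digestion", "milk powder", "baby"])
adjusted = adjust_for_subject(100, is_goods_or_service_page=False)
```

How the subject type is classified as a goods or service page is left open here, since the text does not specify it.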
C) The cheating degree information is determined according to the storehouse frequency information and the user presentation ratio information corresponding to the candidate rubbish text information, in combination with the user deletion ratio information corresponding to the candidate rubbish text information. Here, the user deletion ratio information refers to the number of times content published by a user and containing the candidate rubbish text information has been deleted, as a proportion of the number of all items of content published by that user that contain the candidate rubbish text information. For example, if a user has published m posts in total that contain candidate rubbish text information such as candidate garbage text, and n of those m posts have been deleted, then the user deletion ratio information corresponding to the candidate rubbish text information candidate garbage text is n/m. Specifically, the cheating degree determining device 13 can determine the cheating degree information according to the following formula (3):
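Formula (3) is likewise missing from this text. A form consistent with the worked example that follows (100 × (1 + 0.5) = 150 and 12.5 × (1 + 0.3) = 16.25), offered as a reconstruction with y' taken from formula (1), is:

```latex
y = y' \cdot (1 + d') \tag{3}
```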
Where d' denotes the user deletion ratio information corresponding to the candidate rubbish text information. For example, suppose the user deletion ratio information corresponding to the candidate rubbish text information "Jia Beiaite" and "newborn good shellfish probiotics" is 0.5 and 0.3, respectively. The cheating degree determining device 13 can then calculate according to formula (3) above that the corresponding cheating degree information is 100 × (1 + 0.5) = 150 and 12.5 × (1 + 0.3) = 16.25, respectively.
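A minimal sketch of mode C follows, assuming formula (3) is y = y' · (1 + d') as the arithmetic above indicates. The helper names are hypothetical.

```python
def deletion_ratio(deleted_posts, total_posts):
    """d' = n / m: deleted posts containing the candidate text over all such posts."""
    if total_posts <= 0 or not 0 <= deleted_posts <= total_posts:
        raise ValueError("need 0 <= deleted_posts <= total_posts and total_posts > 0")
    return deleted_posts / total_posts


def adjust_for_deletion(base_degree, ratio):
    """Assumed form of formula (3): y = y' * (1 + d')."""
    return base_degree * (1 + ratio)


# 5 of 10 posts deleted gives d' = 0.5, the value used for "Jia Beiaite".
jia_beiaite = adjust_for_deletion(100, deletion_ratio(5, 10))
```

Because d' lies in [0, 1], this adjustment can at most double the base cheating degree.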
D) The cheating degree information is determined according to the storehouse frequency information and the user presentation ratio information corresponding to the candidate rubbish text information, in combination with the presentation probability information corresponding to the candidate rubbish text information. Here, the presentation probability information refers to the probability that the candidate rubbish text information occurs in the vocabulary of the aforementioned knowledge base. Specifically, the cheating degree determining device 13 can determine the cheating degree information by the following formula (4):
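Formula (4) does not survive in this text either. A form consistent with the worked example that follows (100 × (1/0.1) = 1000 and 12.5 × (1/0.5) = 25), offered as a reconstruction with y' taken from formula (1), is:

```latex
y = y' \cdot \frac{1}{\alpha} \tag{4}
```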
Where α denotes the presentation probability information corresponding to the candidate rubbish text information, that is, the probability that the candidate rubbish text information occurs in the vocabulary of the knowledge base to which the storehouse frequency information C refers. If a user's wish to present the candidate rubbish text information is strong, the user is more likely to use uncommon words, and the α corresponding to the candidate rubbish text information is correspondingly smaller. y denotes the cheating degree information. For example, suppose the presentation probability information corresponding to the candidate rubbish text information "Jia Beiaite" and "newborn good shellfish probiotics" is 0.1 and 0.5, respectively. The cheating degree determining device 13 can then calculate according to formula (4) above that the corresponding cheating degree information is 100 × (1/0.1) = 1000 and 12.5 × (1/0.5) = 25, respectively.
2) Word segmentation is first performed on each candidate rubbish text information to obtain one or more segments (word segmentation information) corresponding to it; the cheating degree information is then determined from the cheating degree information corresponding to those segments. For example, suppose the candidate determining device 12 determines that the candidate rubbish text information corresponding to an initial page such as initial web is "Chongqing red building hospital". The cheating degree determining device 13 first performs word segmentation on the candidate rubbish text information and obtains the corresponding segments, such as word1 "Chongqing red building" and word2 "red building hospital". Suppose the cheating degree determining device 13 has previously determined that the cheating degree information corresponding to the segments word1 "Chongqing red building" and word2 "red building hospital" is y1 and y2, respectively. The cheating degree determining device 13 can then determine the cheating degree information corresponding to the candidate rubbish text information "Chongqing red building hospital" from that of word1 and word2, for example by taking the average of the two values, so that the cheating degree information corresponding to "Chongqing red building hospital" is (y1 + y2)/2.
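The averaging step can be sketched as a lookup into previously determined segment values. The dictionary contents (80 and 40) are made-up stand-ins for y1 and y2, and the function name is hypothetical; the text only gives the average as one possible combination.

```python
def degree_from_segments(segments, known_degrees):
    """Average the previously determined cheating degrees of a candidate's segments."""
    missing = [s for s in segments if s not in known_degrees]
    if not segments or missing:
        raise ValueError(f"no known cheating degree for: {missing}")
    return sum(known_degrees[s] for s in segments) / len(segments)


# Hypothetical prior values standing in for y1 and y2.
known = {"Chongqing red building": 80.0, "red building hospital": 40.0}
result = degree_from_segments(
    ["Chongqing red building", "red building hospital"], known
)
```

With these stand-in values the result is (80 + 40) / 2; a new candidate is scored without recomputing formula (1) for it.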
Here, the present invention can determine the cheating degree information of candidate rubbish text information from the cheating degree information corresponding to its one or more segments. This makes it possible to use a prior collocation library and the cheating degree information already associated with rubbish words, so that when new candidate rubbish text information is encountered, its cheating degree information does not have to be determined in the way used for other candidate rubbish text information. This achieves the beneficial effect of simplifying the determination of the cheating degree information of candidate rubbish text information.
Those skilled in the art will understand that the above ways of determining the cheating degree information corresponding to the candidate rubbish text information are only examples. Other existing or future ways of determining the cheating degree information corresponding to the candidate rubbish text information, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
The rubbish determining device 14 determines, according to the cheating degree information, one or more rubbish text information corresponding to the initial page from the one or more candidate rubbish text information, for example by selecting from the candidates those whose cheating degree information meets a predetermined threshold and taking them as the rubbish text information corresponding to the initial page. For example, suppose the cheating degree determining device 13 determines that, for the initial page search result3 shown in Fig. 2, "Baby has poor digestion with milk powder, what to do_Baidu Knows", the cheating degree information corresponding to the candidate rubbish text information "Jia Beiaite" and "newborn good shellfish probiotics" is 100 × (1/0.1) = 1000 and 12.5 × (1/0.5) = 25, respectively. The rubbish determining device 14 can then determine, according to the cheating degree information, the rubbish text information corresponding to the initial page search result3 from these two candidates, for example by taking the candidate rubbish text information "Jia Beiaite", whose cheating degree information meets a predetermined threshold such as 100, as the rubbish text information.
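The threshold screening described above amounts to a simple filter. In this sketch, "meets the predetermined threshold" is assumed to mean greater than or equal to it, which the text does not state explicitly; the function name is hypothetical.

```python
def select_rubbish(candidate_degrees, threshold):
    """Keep candidates whose cheating degree meets the threshold (assumed: >=)."""
    return [text for text, degree in candidate_degrees.items() if degree >= threshold]


candidates = {"Jia Beiaite": 1000, "newborn good shellfish probiotics": 25}
rubbish = select_rubbish(candidates, threshold=100)
```

With the example values, only "Jia Beiaite" passes the threshold of 100.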
Those skilled in the art will understand that the above way of determining the one or more rubbish text information corresponding to the initial page is only an example. Other existing or future ways of determining the one or more rubbish text information corresponding to the initial page, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
The devices of the rubbish text determining equipment 1 work continuously with one another. Specifically, the acquisition device 11 continuously obtains pending initial pages; the candidate determining device 12 continuously determines the one or more candidate rubbish text information corresponding to each initial page; the cheating degree determining device 13 continuously determines the cheating degree information corresponding to the candidate rubbish text information; and the rubbish determining device 14 continuously determines, according to the cheating degree information, the one or more rubbish text information corresponding to the initial page from the one or more candidate rubbish text information. Here, those skilled in the art will understand that "continuously" means that the devices of the rubbish text determining equipment 1 keep carrying out, respectively, the acquisition of initial pages, the determination of candidate rubbish text information, the determination of cheating degree information and the determination of rubbish text information, until the rubbish text determining equipment 1 stops acquiring initial pages for a relatively long period.
Preferably, the rubbish text determining equipment 1 also includes a page generating device (not shown) and a providing device (not shown). Specifically, the page generating device generates a target page corresponding to the initial page, where the target page contains display identification information for at least one of the one or more rubbish text information; the providing device provides the target page to the corresponding user.
Specifically, the page generating device generates the target page corresponding to the initial page, where the target page contains display identification information for at least one of the one or more rubbish text information. Here, the display identification information includes but is not limited to the background color, font color, font size and display mode corresponding to the rubbish text information, for example marking with a background color and/or a box that is distinguishable from the initial page, or marking with floating-layer text. Specifically, the page generating device first determines the display identification information corresponding to the rubbish text information, for example determining the display identification information of the rubbish text information in the target page according to its cheating degree information: if the cheating degree information corresponding to the rubbish text information is large, for example above a predetermined threshold of 90, it is identified in red and marked with a box; if the cheating degree information is in the interval (50, 90], it is identified in orange. Then, according to the display identification information corresponding to the rubbish text information, the page generating device updates the initial page, for example identifying the rubbish text information with its corresponding display identification information, and generates the target page corresponding to the initial page, where the target page contains display identification information for at least one of the one or more rubbish text information.
For example, suppose the rubbish determining device 14 determines that the rubbish text information corresponding to the initial page search result3 is "Jia Beiaite", and suppose the page background color of the initial page search result3 is light gray. The page generating device can then determine that the display identification information corresponding to the rubbish text information "Jia Beiaite" is marking in a color distinguishable from the initial page, such as dark gray. Alternatively, the page generating device can determine the display identification information according to the cheating degree information corresponding to the rubbish text information "Jia Beiaite": if that cheating degree information is 100, which exceeds the predetermined threshold of 90, the page generating device can determine that the display identification information corresponding to "Jia Beiaite" is identification in red with a box. Then, according to the display identification information corresponding to the rubbish text information "Jia Beiaite", for example identification in dark gray, the page generating device updates the initial page search result3, identifying the rubbish text information "Jia Beiaite" with its corresponding display identification information, and generates the target page corresponding to the initial page, as shown in Fig. 3, where the target page contains display identification information for at least one of the one or more rubbish text information.
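The mapping from cheating degree to display identification can be sketched as follows, using the thresholds 90 and (50, 90] from the text. Rendering the identification as an inline-styled HTML span is an assumption made for illustration, as are the function names; the text does not prescribe a markup.

```python
from html import escape


def display_style(degree):
    """Map a cheating degree to a CSS style, or None if no marking applies."""
    if degree > 90:
        return "color: red; border: 1px solid red;"  # red with a box
    if 50 < degree <= 90:
        return "color: orange;"
    return None


def mark_rubbish(page_text, rubbish_text, degree):
    """Return HTML-escaped page text with the rubbish text wrapped in a styled span."""
    safe_page = escape(page_text)
    style = display_style(degree)
    if style is None or not rubbish_text:
        return safe_page
    safe_term = escape(rubbish_text)
    marked = f'<span style="{style}">{safe_term}</span>'
    return safe_page.replace(safe_term, marked)


target = mark_rubbish("try Jia Beiaite milk powder", "Jia Beiaite", 100)
```

Escaping both the page text and the term before substitution keeps the generated fragment well-formed even if the post contains markup characters.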
Those skilled in the art will understand that the above way of determining the display identification information corresponding to the rubbish text information is only an example. Other existing or future ways of determining the display identification information corresponding to the rubbish text information, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Those skilled in the art will understand that the above way of generating the target page corresponding to the initial page is only an example. Other existing or future ways of generating the target page corresponding to the initial page, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
The providing device provides the target page to the corresponding user, in order to prompt the user, by means of dynamic web page technologies such as ASP, JSP or PHP, or by other agreed communication means, such as communication protocols like HTTP or HTTPS.
More preferably, the page generating device includes a presentation pattern determining unit (not shown) and a page generating unit (not shown). Specifically, the presentation pattern determining unit determines, according to the cheating degree information corresponding to at least one of the one or more rubbish text information, the presentation pattern corresponding to that at least one rubbish text information; the page generating unit generates, according to the presentation pattern, the target page corresponding to the initial page, where the target page contains display identification information, corresponding to the presentation pattern, for at least one of the one or more rubbish text information.
Specifically, the presentation pattern determining unit determines, according to the cheating degree information corresponding to at least one of the one or more rubbish text information, the presentation pattern corresponding to that at least one rubbish text information. For example, rubbish text information with different cheating degree information is handled differently; rubbish text information whose cheating degree information exceeds a predetermined threshold can be deleted and not displayed. Here, the presentation pattern includes but is not limited to the presentation position, presentation color and presentation mode corresponding to the rubbish text information. For example, suppose the rubbish determining device 14 determines that the rubbish text information corresponding to the initial page search result3 is "Jia Beiaite", and the cheating degree determining device 13 determines that its cheating degree information is 100. The presentation pattern determining unit can then determine that the corresponding presentation pattern is to mark it in a color distinguishable from the initial page search result3, such as dark gray, and to display it. As another example, suppose the cheating degree determining device 13 determines that the cheating degree information of the rubbish text information "Jia Beiaite" is 1000, which exceeds a predetermined threshold such as 500. The presentation pattern determining unit then determines that the corresponding presentation pattern is to delete it and not display it.
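The two cases above (mark and display at 100, delete at 1000 with a threshold of 500) can be sketched as below. Square brackets stand in for the dark gray marking, and the function names and the string labels "delete" and "mark" are illustrative assumptions.

```python
def presentation_pattern(degree, delete_threshold=500):
    """Choose how rubbish text is presented: deleted above the threshold, else marked."""
    return "delete" if degree > delete_threshold else "mark"


def apply_pattern(page_text, rubbish_text, degree, delete_threshold=500):
    """Apply the chosen pattern; brackets are a plain-text stand-in for color marking."""
    if presentation_pattern(degree, delete_threshold) == "delete":
        return page_text.replace(rubbish_text, "")
    return page_text.replace(rubbish_text, f"[{rubbish_text}]")


marked = apply_pattern("try Jia Beiaite milk powder", "Jia Beiaite", 100)
deleted = apply_pattern("try Jia Beiaite milk powder", "Jia Beiaite", 1000)
```

Separating the pattern decision from its application mirrors the split between the presentation pattern determining unit and the page generating unit.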
Those skilled in the art will understand that the above presentation patterns are only examples. Other existing or future presentation patterns, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Those skilled in the art will understand that the above way of determining the presentation pattern is only an example. Other existing or future ways of determining the presentation pattern, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
The page generating unit generates, according to the presentation pattern, the target page corresponding to the initial page, where the target page contains display identification information, corresponding to the presentation pattern, for at least one of the one or more rubbish text information. Here, the way in which the page generating unit generates the target page corresponding to the initial page according to the presentation pattern is the same or substantially the same as the way in which the aforementioned page generating device generates the target page; for brevity it is not repeated here and is incorporated herein by reference.
In another preferred embodiment, the above rubbish text determining equipment 1 for determining rubbish text information in a page can be combined with an existing browser to form a new browser. Existing browsers include, for example, the IE browser of Microsoft, the Netscape browser of Netscape, the Firefox browser of Mozilla, the Chrome browser of Google, the Maxthon browser of Maxthon, the Opera browser of Opera, the 360 browser of 360, the Sogou browser of Sohu.com Inc., and the Tencent TT browser of Tencent.
In another preferred embodiment, the above rubbish text determining equipment 1 for determining rubbish text information in a page can be combined with an existing browser plug-in to form a new browser plug-in. Existing browser plug-ins include, for example, the Flash plug-in, the RealPlayer plug-in, the MMS plug-in, the MIDI staff-notation plug-in and ActiveX plug-ins.
In another preferred embodiment, the above rubbish text determining equipment 1 for determining rubbish text information in a page can be combined with an existing mobile phone browser app to form a new mobile phone browser app. Existing mobile phone browser apps include, for example, UC Browser, UCmobile, UEWEB, the Baidu mobile phone browser and QQ Browser.
In another preferred embodiment, the above rubbish text determining equipment 1 for determining rubbish text information in a page can be combined with an existing search engine to form a new search engine. Existing search engines include but are not limited to, for example, the Google search engine of Google, the Baidu search engine of Baidu, and Baidu Knows.
In another preferred embodiment, the above rubbish text determining equipment 1 for determining rubbish text information in a page can be combined with an existing search engine plug-in to form a new search engine plug-in. Existing search engine plug-ins include but are not limited to, for example, the Google ToolBar of Google, the Baidu Sou Ba toolbar of Baidu, and the MSN ToolBar of Microsoft.
Fig. 4 shows a schematic diagram of equipment for determining rubbish text information in a page in accordance with a preferred embodiment of the present invention, where the rubbish text determining equipment 1 includes an acquisition device 11', a candidate determining device 12', a cheating degree determining device 13' and a rubbish determining device 14'. Specifically, the acquisition device 11' obtains a pending initial page; the candidate determining device 12' detects, in the initial page, character strings that meet a predetermined rubbish feature, and takes the character strings that meet the predetermined rubbish feature as one or more candidate rubbish text information; the cheating degree determining device 13' determines the cheating degree information corresponding to the candidate rubbish text information; and the rubbish determining device 14' determines, according to the cheating degree information, one or more rubbish text information corresponding to the initial page from the one or more candidate rubbish text information. Here, the acquisition device 11', the cheating degree determining device 13' and the rubbish determining device 14' are the same or substantially the same as the corresponding devices in the embodiment of Fig. 1; for brevity they are not described again here and are incorporated herein by reference.
Specifically, the candidate determining device 12' detects, in the initial page, character strings that meet a predetermined rubbish feature, and takes the character strings that meet the predetermined rubbish feature as one or more candidate rubbish text information. Here, the predetermined rubbish feature includes but is not limited to: 1) character strings that match a predetermined phrase pattern, such as recommendation sentence patterns like "I used the XXX product, it is pretty good, you should try it too", "XXX works very well for losing weight", or "Try XXX milk powder, I have seen others use it with good results"; 2) character strings that meet a predetermined prefix feature and/or suffix feature, such as character strings whose prefix and/or suffix is a place name; 3) character strings at a predetermined rubbish text position, such as the beginning or end of a paragraph; 4) character strings that match a predetermined part-of-speech combination, such as several consecutive nouns. For example, suppose the pending initial page obtained by the acquisition device 11' is search result3 shown in Fig. 2, "Baby has poor digestion with milk powder, what to do_Baidu Knows", and the page includes answers I to IV to the question. Answer I contains a character string that matches the predetermined phrase pattern "Try XXX milk powder, I have seen others use it with good results", namely "The baby's poor digestion may be a milk powder problem; try Jia Beiaite milk powder, I have seen others use it with good results", and answer IV contains a character string that matches a predetermined phrase pattern, namely "My baby uses Jia Beiaite milk powder and has had no indigestion; parents can give it a try". When the candidate determining device 12' performs semantic analysis on the page content information of the initial page search result3, it can detect that the page contains a character string meeting the predetermined rubbish feature, such as "Jia Beiaite", and it then takes the character string "Jia Beiaite" as candidate rubbish text information of the initial page search result3. As another example, suppose the pending initial page obtained by the acquisition device 11' is initial web, and the initial page contains the character string "there is a Chongqing red building hospital". When the candidate determining device 12' performs syntactic analysis on the page content information of the initial page initial web, it can detect that the character string "there is a Chongqing red building hospital" matches a predetermined part-of-speech combination, such as several consecutive nouns, and it then takes the character string "there is a Chongqing red building hospital" as candidate rubbish text information of the initial page initial web.
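Matching a predetermined phrase pattern (feature 1 above) can be approximated with templates in which XXX marks the slot for the promoted name. This is a regular-expression stand-in for the semantic analysis the text mentions, not the patent's method; the English templates and the function names are hypothetical.

```python
import re


def template_to_regex(template, placeholder="XXX"):
    """Compile a phrase template into a regex that captures the placeholder slot."""
    literal_parts = [re.escape(part) for part in template.split(placeholder)]
    return re.compile("(.+?)".join(literal_parts))


def find_candidates(text, templates):
    """Return the strings captured by any template, in order of discovery."""
    found = []
    for template in templates:
        for match in template_to_regex(template).finditer(text):
            found.extend(g.strip() for g in match.groups() if g.strip())
    return found


# Hypothetical English renderings of recommendation sentence patterns.
templates = ["try using XXX milk powder", "My baby uses XXX milk powder"]
answer = ("Poor digestion may be a milk powder problem; "
          "try using Jia Beiaite milk powder, others have used it well")
candidates = find_candidates(answer, templates)
```

The lazy capture relies on literal text on both sides of the slot; a template that begins or ends with XXX would capture too much or too little, so a fuller implementation would bound the slot using word segmentation.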
Those skilled in the art will understand that the above predetermined rubbish features are only examples. Other existing or future predetermined rubbish features, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In a preferred embodiment (with reference to Fig. 4), the rubbish text determining equipment 1 includes an acquisition device 11', a candidate determining device 12', a cheating degree determining device 13', a rubbish determining device 14' and a pretreatment unit (not shown). This preferred embodiment is described below with reference to Fig. 4. Specifically, the acquisition device 11' obtains a pending initial page; the candidate determining device 12' detects, in the initial page, character strings that match a predetermined part-of-speech combination, and takes those character strings as one or more candidate rubbish text information; the pretreatment unit pre-processes the one or more candidate rubbish text information according to the grammar feature information corresponding to the candidate rubbish text information, to obtain one or more pre-processed candidate rubbish text information; the cheating degree determining device 13' determines the cheating degree information corresponding to the pre-processed candidate rubbish text information; and the rubbish determining device 14' determines, according to the cheating degree information corresponding to the pre-processed candidate rubbish text information, one or more rubbish text information corresponding to the initial page from the one or more pre-processed candidate rubbish text information. Here, the acquisition device 11' is the same or substantially the same as the corresponding device in the embodiment of Fig. 1; for brevity it is not described again here and is incorporated herein by reference.
Specifically, the candidate determining device 12' detects, in the initial page, character strings that match a predetermined part-of-speech combination, and takes those character strings as one or more candidate rubbish text information. For example, suppose the pending initial page obtained by the acquisition device 11' is initial web, and the initial page contains the character string "there is a Chongqing red building hospital". When the candidate determining device 12' performs syntactic analysis on the page content information of the initial page initial web, it can detect that the character string "there is a Chongqing red building hospital" matches a predetermined part-of-speech combination, such as several consecutive nouns, and it then takes the character string "there is a Chongqing red building hospital" as candidate rubbish text information of the initial page initial web.
The pretreatment unit pre-processes the one or more candidate rubbish text information according to the grammar feature information corresponding to the candidate rubbish text information, to obtain one or more pre-processed candidate rubbish text information. Here, the grammar feature information refers to whether the position of the candidate rubbish text information within the whole sentence to which it belongs fits the corresponding syntactic structure: for a verb-object construction, whether the candidate rubbish text information fits the object of that construction; for a subject-predicate phrase, whether it fits the subject or object of that phrase. Here, the pre-processing includes but is not limited to cutting or trimming the rubbish text information. For example, continuing the example above, suppose the candidate rubbish text "there is a Chongqing red building hospital" in the initial page initial web determined by the candidate determining device 12' should be the object of a verb-object construction in the sentence to which it belongs, but was cut out of the sentence with part of the surrounding syntax attached. The pretreatment unit then needs to trim the candidate rubbish text "there is a Chongqing red building hospital", for example splitting it into "there is a / Chongqing red building hospital", so that the trimmed candidate rubbish text information is "Chongqing red building hospital".
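The trimming step can be sketched on part-of-speech tagged tokens: leading tokens that cannot be part of the object (here, anything not tagged as a noun) are dropped. The tag set ("v", "q", "n"), the pre-tagged input and the function name are all assumptions; a real system would obtain tags from a segmenter and tagger.

```python
def trim_candidate(tagged_tokens, keep_tags=("n",)):
    """Drop leading tokens whose tag is not in keep_tags and join the rest."""
    start = 0
    while start < len(tagged_tokens) and tagged_tokens[start][1] not in keep_tags:
        start += 1
    return " ".join(word for word, _tag in tagged_tokens[start:])


# Hypothetical tagging of "there is a Chongqing red building hospital".
tokens = [
    ("there is", "v"),
    ("a", "q"),
    ("Chongqing", "n"),
    ("red building", "n"),
    ("hospital", "n"),
]
trimmed = trim_candidate(tokens)
```

Only the leading verb and measure word are removed, which matches the split "there is a / Chongqing red building hospital" in the example.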
Those skilled in the art will understand that the above manner of preprocessing the one or more pieces of candidate rubbish text information is only an example. Other existing or future manners of preprocessing the one or more pieces of candidate rubbish text information, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
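The trimming in the example can be sketched as follows. This is a simplification: instead of a full syntactic analysis of the verb-object construction, it assumes the candidate is already tokenized and tagged, and it drops leading and trailing tokens that are not nouns. The function name `trim_candidate` and the tag letters are hypothetical.

```python
from typing import List, Tuple


def trim_candidate(tagged: List[Tuple[str, str]], keep_tag: str = "n") -> str:
    """Strip leading and trailing tokens whose tag is not `keep_tag`.

    Assumption: the object of the verb-object construction is the noun run
    inside the candidate, so everything outside that run is trimmed away.
    """
    start = 0
    while start < len(tagged) and tagged[start][1] != keep_tag:
        start += 1
    end = len(tagged)
    while end > start and tagged[end - 1][1] != keep_tag:
        end -= 1
    return " ".join(word for word, _ in tagged[start:end])


candidate = [("there is a", "v"), ("Chongqing", "n"), ("red building", "n"), ("hospital", "n")]
print(trim_candidate(candidate))
```

A candidate with no noun at all trims to the empty string, which a caller could treat as "discard this candidate".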
The cheating degree determining device 13' determines the cheating degree information corresponding to the preprocessed candidate rubbish text. Here, the manner in which the cheating degree determining device 13' does so is identical or substantially identical to the manner in which the cheating degree determining device 13 in Fig. 1 determines the cheating degree information corresponding to candidate rubbish text. For brevity, it is not repeated here and is incorporated herein by reference.
The rubbish determining device 14' determines, according to the cheating degree information corresponding to the preprocessed candidate rubbish text, one or more pieces of rubbish text information corresponding to the initial page from among the one or more pieces of preprocessed candidate rubbish text information. Here, the manner in which the rubbish determining device 14' does so is identical or substantially identical to the manner in which the rubbish determining device 14 in Fig. 1 determines one or more pieces of rubbish text information corresponding to the initial page from among the one or more pieces of candidate rubbish text information. For brevity, it is not repeated here and is incorporated herein by reference.
Fig. 5 shows a flowchart of a method for determining rubbish text information in a page according to a further aspect of the present invention.
Specifically, in step S1, the rubbish text determining equipment 1 obtains a pending initial page; in step S2, the rubbish text determining equipment 1 determines one or more pieces of candidate rubbish text information corresponding to the initial page; in step S3, the rubbish text determining equipment 1 determines the cheating degree information corresponding to the candidate rubbish text information; in step S4, the rubbish text determining equipment 1 determines, according to the cheating degree information, one or more pieces of rubbish text information corresponding to the initial page from among the one or more pieces of candidate rubbish text information. Here, the rubbish text determining equipment 1 includes, but is not limited to, internet platforms through which users can display their own original content or provide it to other users, such as: i) network platforms or terminal platforms that provide logged-in users with information storage space so that they can upload and share content such as documents, videos, and pictures, and that also let users read online, download, and exchange content shared by other users, such as Baidu Wenku (Baidu Library), Docin, and Sina iAsk, where the terminal platforms include but are not limited to user equipment such as mobile terminals and PCs; ii) network platforms or terminal platforms that provide logged-in users with information access, information sharing, information publishing, or synchronization, such as third-party websites including social networking sites, mhkc, forums, knowledge question-and-answer sharing platforms, spaces, blogs, and microblogs. Here, the rubbish text determining equipment 1 can be implemented by a network device, by user equipment, or by a device formed by integrating a network device and user equipment through a network.
Here, the network device includes, but is not limited to, a network host, a single network server, a set of multiple network servers, or a set of computers based on cloud computing. Here, the cloud consists of a large number of hosts or network servers based on cloud computing (Cloud Computing), where cloud computing is a kind of distributed computing: one super virtual computer composed of a group of loosely coupled computers. Here, the user equipment can be any electronic product that can interact with a user through a keyboard, mouse, touch pad, touch screen, handwriting device, or the like, for example a computer, mobile phone, PDA, Pocket PC (PPC), or tablet computer. The network includes, but is not limited to, the internet, wide area networks, metropolitan area networks, local area networks, VPNs, and wireless ad hoc networks (Ad Hoc networks). Those skilled in the art will understand that the above rubbish text determining equipment 1 is only an example. Other existing or future network devices or user equipment, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference. Here, both the network device and the user equipment include an electronic device that can automatically perform numerical computation and information processing according to instructions that are set or stored in advance, whose hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), and embedded devices.
For example, when the rubbish text determining equipment 1 is implemented by user equipment, it can obtain the page access request submitted by the user through the browser on the user equipment, so as to obtain the pending initial page. It then determines one or more pieces of candidate rubbish text information corresponding to the initial page, and next determines the cheating degree information corresponding to the candidate rubbish text information. According to the cheating degree information, it determines one or more pieces of rubbish text information corresponding to the initial page from among the one or more pieces of candidate rubbish text information, and provides the rubbish text information to the user through the browser of the user equipment.
For example, when the rubbish text determining equipment 1 is implemented by a network device, it can receive the page access request sent by the user through user equipment, send the page access request to a page server, and receive the page that the page server returns for that request, so as to obtain the pending initial page. It then determines one or more pieces of candidate rubbish text information corresponding to the initial page, and next determines the cheating degree information corresponding to the candidate rubbish text information. According to the cheating degree information, it determines one or more pieces of rubbish text information corresponding to the initial page from among the one or more pieces of candidate rubbish text information, and sends the rubbish text information to the user equipment, where it is presented to the user, for example through the browser on the user equipment.
For example, when the rubbish text determining equipment 1 is implemented by user equipment and a network device in cooperation, the user equipment can first obtain the pending initial page and send it to the corresponding network device. The network device determines one or more pieces of candidate rubbish text information corresponding to the initial page, determines the cheating degree information corresponding to the candidate rubbish text information, and, according to the cheating degree information, determines one or more pieces of rubbish text information corresponding to the initial page from among the one or more pieces of candidate rubbish text information. The network device then sends the rubbish text information to the user equipment, which provides it to the user. As another example, when the rubbish text determining equipment 1 is implemented by user equipment and a network device in cooperation, the user equipment can also first obtain the pending initial page and itself determine the one or more pieces of candidate rubbish text information corresponding to the initial page. The user equipment then sends the candidate rubbish text information to the network device, which determines the cheating degree information corresponding to the candidate rubbish text information and, according to the cheating degree information, determines one or more pieces of rubbish text information corresponding to the initial page from among the one or more pieces of candidate rubbish text information. The network device then sends the rubbish text information to the user equipment, which provides it to the user. Here, those skilled in the art will understand that, when user equipment and a network device cooperate to implement the rubbish text determining equipment 1, the division of labor between the user equipment and the network device can be changed in any appropriate way, and all such changes fall within the scope of protection of the present invention.
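However the work is divided between user equipment and network device, steps S2 to S4 compose in the same order on a page obtained in step S1. A minimal sketch, assuming the candidate finder and the cheating degree function are supplied as callables (so either could run on either device) and that "meeting the threshold" means greater than or equal to it; the function and parameter names are hypothetical.

```python
from typing import Callable, Iterable, List


def determine_rubbish_text(
    page: str,
    find_candidates: Callable[[str], Iterable[str]],
    cheating_degree: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Run steps S2 to S4 on a page that step S1 has already obtained."""
    candidates = list(find_candidates(page))                      # step S2
    degrees = {text: cheating_degree(text) for text in candidates}  # step S3
    # Step S4: keep the candidates whose degree meets the threshold
    # (">=" is an assumption; the text only says "meets").
    return [text for text in candidates if degrees[text] >= threshold]


# Toy stand-ins for the two pluggable steps, using figures from the worked example.
known_degrees = {"Jia Beiaite": 1000.0, "newborn good shellfish probiotics": 25.0}
page_text = "try Jia Beiaite milk powder, or newborn good shellfish probiotics"
result = determine_rubbish_text(
    page_text,
    lambda page: [text for text in known_degrees if text in page],
    lambda text: known_degrees[text],
    100.0,
)
print(result)
```

Passing the steps in as callables mirrors the point made above: which device performs which step is a deployment choice, not part of the logic.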
Specifically, in step S1, the rubbish text determining equipment 1 obtains the pending initial page through an application programming interface (API) provided by a third-party device such as a browser or a search engine; or, through dynamic web page technologies such as JSP and ASP, it obtains a query operation submitted by the user through user equipment, such as clicking a link in a page, and uses the application programming interface provided by the browser to obtain the page the link points to as the pending initial page; or, through dynamic web page technologies such as JSP and ASP, it obtains the query sequence input by the user through user equipment, submits the query sequence to a search engine, and receives the search results that the search engine returns for that query sequence as the pending initial page; or it obtains the pending initial page through agreed communication methods such as http and https. For example, user A, on a PC, enters the keywords "What should I do if my baby digests milk powder poorly?" in the search box of a search engine such as Baidu Knows and clicks the search button. In step S1, the rubbish text determining equipment 1 obtains the query sequence input by user A through dynamic web page technologies such as ASP and JSP, submits a search request to the search engine based on that query sequence, and, through the application programming interface (API) provided by the search engine, obtains the one or more search results that the search engine matched to the keywords "What should I do if my baby digests milk powder poorly?", such as: search result1: "What to do if a baby digests milk powder poorly? _ Parenting Q&A _ Babytree"; search result2: "What to do if a baby digests milk powder poorly? - Parenting Q&A - Parenting Net"; search result3: "What to do if a baby digests milk powder poorly? _ Baidu Knows"; search result4: "What causes indigestion in a milk-fed baby? _ Baidu Knows"; and so on, as the pending initial pages.
Those skilled in the art will understand that the above manner of obtaining the pending initial page is only an example. Other existing or future manners of obtaining the pending initial page, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S2, the rubbish text determining equipment 1 determines one or more pieces of candidate rubbish text information corresponding to the initial page. Here, rubbish text information refers to unimportant information, risky information, and the like present in the page. For example, if a user recommends a certain target object when answering another user's question, and the target object does not necessarily answer the question, then the target object is rubbish text information. Here, a target object is any article or service that people offer to the market and that can satisfy some need of consumers or users. Here, in step S2, the manner in which the rubbish text determining equipment 1 determines the candidate rubbish text information includes, but is not limited to, at least any one of the following:
1) Determining one or more pieces of candidate rubbish text information corresponding to the initial page according to the user operation feature information corresponding to character strings in the page content information of the initial page. Specifically, in step S2, the rubbish text determining equipment 1 first obtains the page content information of the initial page, for example by analyzing the HTML tags of the initial page or by a wrapper-based extraction method. It then performs semantic analysis on the page content information to obtain the character strings contained in the page content information of the initial page. Next, according to the user operation feature information corresponding to those character strings, it determines one or more pieces of candidate rubbish text information corresponding to the initial page, that is, which character strings are candidate rubbish text. Here, the user operation feature information includes, but is not limited to: i) user repetition behavior information corresponding to the character string; ii) user deletion behavior information corresponding to the character string.
For example, suppose that in step S1 the pending initial page obtained by the rubbish text determining equipment 1 is search result2: "What to do if a baby digests milk powder poorly? - Parenting Q&A - Parenting Net". In step S2, the rubbish text determining equipment 1 first analyzes its HTML tags and performs semantic analysis on the page content information of search result2, obtaining character strings such as "give the baby some probiotics", "my baby has always drunk Heng Shi milk powder and has had no indigestion; parents can give it a try", "you can try Switzerland Think precious milk powder; it is absorbed well, does not cause excess internal heat, and strengthens the baby's intestines and immunity", "it is not a milk powder problem", and "a baby in this condition shows that the milk powder is unsuitable; try changing brands". Suppose the answers containing the character string "Heng Shi milk powder" all come from the same user, and that user has repeatedly posted answers containing "Heng Shi milk powder". This suggests the user is probably recommending "Heng Shi milk powder" maliciously, so in step S2 the rubbish text determining equipment 1 can take "Heng Shi milk powder" as candidate rubbish text information. As another example, suppose the user who posted the answer containing the character string "Switzerland Think precious milk powder" has repeatedly posted answers and then deleted them. This suggests the character string "Switzerland Think precious milk powder" is under strong suspicion of cheating, so in step S2 the rubbish text determining equipment 1 can take "Switzerland Think precious milk powder" as candidate rubbish text information.
As another example, suppose that in step S1 the pending initial page obtained by the rubbish text determining equipment 1 is search result3 shown in Fig. 2: "What to do if a baby digests milk powder poorly? _ Baidu Knows", and the page contains the following answers I to IV to the question:
Ⅰ: The baby's poor digestion may be caused by the milk powder. Try Jia Beiaite milk powder; I have seen others use it with good results.
Ⅱ: The stomach and intestines absorb slowly. To solve a baby's indigestion, pediatric specialists often recommend newborn good shellfish probiotics. Newborn good shellfish probiotics make the stomach and intestines produce a variety of organic acids and digestive enzymes, help the baby digest and absorb food, and improve appetite; the lactose, acetic acid, and so on that are produced can strengthen the baby's intestinal peristalsis and promote digestion.
Ⅲ: Try changing the milk powder first; you can also regularly give the baby some probiotics to improve the stomach and intestines and aid digestion.
Ⅳ: My baby uses Jia Beiaite milk powder and has had no indigestion; parents can give it a try.
In step S2, the rubbish text determining equipment 1 first performs semantic analysis on answers I to IV above, obtaining character strings such as "try Jia Beiaite milk powder", "to solve a baby's indigestion, pediatric specialists often recommend newborn good shellfish probiotics", "try changing the milk powder first; you can also regularly give the baby some probiotics to improve the stomach and intestines and aid digestion", and "my baby uses Jia Beiaite milk powder and has had no indigestion; parents can give it a try". Suppose answers I and IV come from the same user, and that user has also recommended "Jia Beiaite" milk powder in other answers to questions about babies digesting milk powder poorly. This suggests the user is probably recommending "Jia Beiaite" milk powder maliciously, so in step S2 the rubbish text determining equipment 1 can take "Jia Beiaite" as candidate rubbish text information. As another example, suppose the user who posted answer II, which contains the text "newborn good shellfish probiotics", has repeatedly posted answers and then deleted them. This suggests the character string "newborn good shellfish probiotics" is under strong suspicion of cheating, so in step S2 the rubbish text determining equipment 1 can take "newborn good shellfish probiotics" as candidate rubbish text information.
Those skilled in the art will understand that the above user operation feature information is only an example. Other existing or future user operation feature information, if applicable to the present invention, should also fall within the scope of protection of the present invention and is incorporated herein by reference.
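The two user operation features, repetition and deletion, can be sketched as simple counts per user and character string. Everything here is an assumption for illustration: the `Post` record, the thresholds `min_repeats` and `min_deletions`, and the use of substring matching to decide that a post contains a character string.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Set


class Post(NamedTuple):
    user: str
    text: str
    deleted: bool  # True if the user later deleted this post


def candidates_from_user_behaviour(
    posts: Iterable[Post],
    strings: List[str],
    min_repeats: int = 2,
    min_deletions: int = 2,
) -> Set[str]:
    """Flag strings that one user keeps posting, or keeps posting and deleting."""
    live = Counter()     # (user, string) -> posts still visible
    removed = Counter()  # (user, string) -> posts the user deleted
    for post in posts:
        for string in strings:
            if string in post.text:
                (removed if post.deleted else live)[(post.user, string)] += 1
    flagged = set()
    for counts, minimum in ((live, min_repeats), (removed, min_deletions)):
        for (_, string), count in counts.items():
            if count >= minimum:
                flagged.add(string)
    return flagged


posts = [
    Post("u1", "try Jia Beiaite milk powder", False),
    Post("u1", "my baby uses Jia Beiaite milk powder", False),
    Post("u2", "specialists recommend newborn good shellfish probiotics", True),
    Post("u2", "newborn good shellfish probiotics aid digestion", True),
    Post("u3", "try changing the milk powder first", False),
]
strings = ["Jia Beiaite", "newborn good shellfish probiotics", "changing the milk powder"]
print(sorted(candidates_from_user_behaviour(posts, strings)))
```

In this toy data, user u1 repeats "Jia Beiaite" and user u2 posts and deletes "newborn good shellfish probiotics" twice, so both are flagged, while the single mention by u3 is not.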
2) Performing a matching query in a rubbish text information library according to the character strings in the page content information of the initial page, to obtain one or more pieces of candidate rubbish text information corresponding to the initial page. For example, continuing the example above, in step S2 the rubbish text determining equipment 1 can take the character strings in the page content information of search result2, such as "give the baby some probiotics", "my baby has always drunk Heng Shi milk powder and has had no indigestion; parents can give it a try", "you can try Switzerland Think precious milk powder; it is absorbed well, does not cause excess internal heat, and strengthens the baby's intestines and immunity", "it is not a milk powder problem", and "a baby in this condition shows that the milk powder is unsuitable; try changing brands", and perform a matching query in the rubbish text information library, to obtain one or more pieces of candidate rubbish text information corresponding to the initial page, such as "Heng Shi milk powder" and "Switzerland Think precious milk powder". Here, the rubbish text information library can be located in the rubbish text determining equipment 1, or in another device connected to the rubbish text determining equipment 1 through a network, such as a server.
Those skilled in the art will understand that the above manners of determining one or more pieces of candidate rubbish text information corresponding to the initial page are only examples. Other existing or future manners of determining one or more pieces of candidate rubbish text information corresponding to the initial page, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
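The library lookup in manner 2) can be sketched as a substring match of each library entry against the character strings extracted from the page. The function name `match_library`, the in-memory list used as the library, and plain substring matching are assumptions; a real library could sit on a separate server and use a different matching rule.

```python
from typing import Iterable, List


def match_library(page_strings: Iterable[str], library: Iterable[str]) -> List[str]:
    """Return library entries that occur in any page string, without duplicates."""
    entries = list(library)
    found: List[str] = []
    for page_string in page_strings:
        for entry in entries:
            if entry in page_string and entry not in found:
                found.append(entry)
    return found


page_strings = [
    "give the baby some probiotics",
    "my baby has always drunk Heng Shi milk powder and has had no indigestion",
    "you can try Switzerland Think precious milk powder; it is absorbed well",
    "it is not a milk powder problem",
]
library = ["Switzerland Think precious milk powder", "Heng Shi milk powder"]
print(match_library(page_strings, library))
```

The result follows the order in which entries are first seen on the page, which keeps the output stable for later scoring.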
In step S3, the rubbish text determining equipment 1 determines the cheating degree information corresponding to the candidate rubbish text information. Here, the cheating degree information reflects the degree to which the candidate rubbish text information is unimportant information and/or the degree to which it is risky: the larger the cheating degree information corresponding to the candidate rubbish text information, the more it is unimportant information and/or the higher its risk. Here, in step S3, the manner in which the rubbish text determining equipment 1 determines the cheating degree information corresponding to the candidate rubbish text information includes, but is not limited to, at least any one of the following:
1) Determining the cheating degree information according to the library frequency information and the user presentation ratio information corresponding to the candidate rubbish text information. Here, in step S3, the manner in which the rubbish text determining equipment 1 determines the cheating degree information from the library frequency information and the user presentation ratio information includes, but is not limited to, at least any one of the following:
A) In step S3, the rubbish text determining equipment 1 can determine the cheating degree information according to the following formula (5):
$$Y' = \frac{C}{\sum_{i=1}^{N} \frac{1}{B_i}} \tag{5}$$
Here, C denotes the library frequency information corresponding to the candidate rubbish text information, such as the number of posts in the knowledge base that contain the candidate rubbish text information. The knowledge base includes the website text database corresponding to the initial page; for a forum or question-and-answer type web page, for example, the corresponding knowledge base is the text database of the corresponding website, containing the posts published by users and the reply posts. B_i denotes the user presentation ratio information of the candidate rubbish text information for user i, such as the proportion of texts containing the candidate rubbish text information among all texts published by user i; it expresses how willing the user is to present the candidate rubbish text information. When the value of the summation is smaller, the willingness to present is greater and, correspondingly, B_i is larger; when the value of the summation is larger, the willingness to present is smaller and, correspondingly, B_i is smaller. N denotes the total number of users who have published text containing the candidate rubbish text information. Y' denotes the cheating degree information. For example, suppose that in step S2 the rubbish text determining equipment 1 determined that the candidate rubbish text information of search result3 shown in Fig. 2, "What to do if a baby digests milk powder poorly? _ Baidu Knows", is "Jia Beiaite" and "newborn good shellfish probiotics", and that the website text database (that is, the knowledge base) corresponding to search result3 is post database3. Suppose the C value for "Jia Beiaite" is 1000 and "Jia Beiaite" appears in the posts of 3 users with B_i values of 1/2, 1/3, and 1/5, while the C value for "newborn good shellfish probiotics" is 500 and it appears in the posts of 2 users with B_i values of 1/15 and 1/25. Then in step S3, according to formula (5), the rubbish text determining equipment 1 calculates the cheating degree information of "Jia Beiaite" and "newborn good shellfish probiotics" as 1000/(2+3+5) = 100 and 500/(15+25) = 12.5, respectively.
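The worked numbers above are consistent with dividing the library frequency C by the sum of the reciprocals of the B_i values. A minimal sketch under that assumption; the function name `base_cheating_degree` is hypothetical, and the formula is inferred from the example figures rather than taken from an authoritative statement.

```python
from typing import Sequence


def base_cheating_degree(library_frequency: float, presentation_ratios: Sequence[float]) -> float:
    """Compute Y' as C divided by the sum of 1/B_i over the N users.

    Assumption: every B_i is strictly positive, since a user counted in N
    has published at least one text containing the candidate.
    """
    if not presentation_ratios:
        raise ValueError("at least one user presentation ratio is required")
    if any(ratio <= 0 for ratio in presentation_ratios):
        raise ValueError("presentation ratios must be positive")
    return library_frequency / sum(1.0 / ratio for ratio in presentation_ratios)


jia_beiaite = base_cheating_degree(1000, [1 / 2, 1 / 3, 1 / 5])
probiotics = base_cheating_degree(500, [1 / 15, 1 / 25])
print(round(jia_beiaite, 6), round(probiotics, 6))
```

With the example inputs the two values come out at about 100 and 12.5, matching the figures in the text up to floating-point rounding.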
B) Determining the cheating degree information according to the library frequency information and the user presentation ratio information corresponding to the candidate rubbish text information, combined with the page subject information of the initial page. Specifically, in step S3, the rubbish text determining equipment 1 first performs word segmentation on the page content information of the initial page to obtain multiple keywords corresponding to the initial page, and then performs statistics on those keywords, for example taking the most frequent keyword as the page subject information of the initial page. Then, in step S3, the rubbish text determining equipment 1 determines the subject type information corresponding to the page subject information, such as whether the page is a presentation page about articles and/or services, in order to determine the adjustment parameter of the page subject information for the cheating degree information. For example, when the page subject information indicates a presentation page about articles and/or services, the adjustment parameter is d = 1; when it does not, the adjustment parameter is d ∈ (0, 1). Here, the value of the adjustment parameter d can be predetermined or obtained by machine learning. Then, in step S3, the rubbish text determining equipment 1 can determine the cheating degree according to the following formula (6), where Y' is the value given by formula (5):
$$Y = Y' \cdot d \tag{6}$$
For example, suppose that in step S3 the rubbish text determining equipment 1 determines that the page subject information of search result3, "What to do if a baby digests milk powder poorly? _ Baidu Knows", is "reasons why a baby digests milk powder poorly", which is not a presentation page about articles and/or services. Then in step S3 the rubbish text determining equipment 1 can determine the adjustment parameter of the page subject information for the cheating degree information, for example d = 0.1, and according to formula (6) it can calculate the cheating degree information of "Jia Beiaite" and "newborn good shellfish probiotics" as 100*0.1 = 10 and 12.5*0.1 = 1.25, respectively.
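Manner B) combines two small computations: picking the page subject as the most frequent keyword, and scaling the base value by the adjustment parameter d. A minimal sketch, assuming word segmentation has already produced the keyword list and that the decision "is this a presentation page about articles and/or services" is supplied by the caller; the function names and the default `d_other=0.1` are illustrative, with 0.1 taken from the example.

```python
from collections import Counter
from typing import Sequence


def page_subject(keywords: Sequence[str]) -> str:
    """Take the most frequent keyword as the page subject information."""
    if not keywords:
        raise ValueError("no keywords to choose from")
    return Counter(keywords).most_common(1)[0][0]


def adjust_for_subject(base_degree: float, is_presentation_page: bool, d_other: float = 0.1) -> float:
    """Scale Y' by d: d = 1 for presentation pages, otherwise a value in (0, 1)."""
    if not 0 < d_other < 1:
        raise ValueError("d for non-presentation pages must lie in (0, 1)")
    d = 1.0 if is_presentation_page else d_other
    return base_degree * d


subject = page_subject(["milk powder", "digestion", "milk powder", "baby", "milk powder"])
print(subject, adjust_for_subject(100.0, False), adjust_for_subject(12.5, False))
```

Keeping the page classification outside these functions reflects the text, which leaves open whether d is fixed in advance or learned.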
C) Determining the cheating degree information according to the library frequency information and the user presentation ratio information corresponding to the candidate rubbish text information, combined with the user deletion ratio information corresponding to the candidate rubbish text information. Here, the user deletion ratio information is the proportion of the contents containing the candidate rubbish text information published by a user that were deleted, out of all contents containing the candidate rubbish text information that the user published. For example, if a user published m posts containing the candidate rubbish text information in total and n of those m posts were deleted, then the user deletion ratio information corresponding to the candidate rubbish text information is n/m. Specifically, in step S3, the rubbish text determining equipment 1 can determine the cheating degree information according to the following formula (7), where Y' is the value given by formula (5):
$$Y = Y' \cdot (1 + d') \tag{7}$$
Here, d' denotes the user deletion ratio information corresponding to the candidate rubbish text information. For example, suppose the user deletion ratio information of "Jia Beiaite" and "newborn good shellfish probiotics" is 0.5 and 0.3, respectively. Then in step S3, according to formula (7), the rubbish text determining equipment 1 can calculate the cheating degree information of "Jia Beiaite" and "newborn good shellfish probiotics" as 100*(1+0.5) = 150 and 12.5*(1+0.3) = 16.25, respectively.
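The deletion ratio n/m and the adjustment by (1 + d') can be sketched directly from the example figures. The function names are hypothetical, and returning 0 when a user has published nothing is an assumption made so the sketch never divides by zero.

```python
def deletion_ratio(deleted_posts: int, published_posts: int) -> float:
    """Return n/m: the share of posts containing the candidate that were deleted."""
    if deleted_posts < 0 or deleted_posts > published_posts:
        raise ValueError("deleted_posts must be between 0 and published_posts")
    if published_posts == 0:
        return 0.0  # assumption: no posts means no deletion signal
    return deleted_posts / published_posts


def adjust_for_deletions(base_degree: float, d_prime: float) -> float:
    """Scale Y' by (1 + d'), so a higher deletion ratio raises the cheating degree."""
    return base_degree * (1.0 + d_prime)


print(deletion_ratio(1, 2), adjust_for_deletions(100.0, 0.5), adjust_for_deletions(12.5, 0.3))
```

Because d' lies between 0 and 1, this adjustment can at most double the base value and never lowers it.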
D) Determining the cheating degree information according to the library frequency information and the user presentation ratio information corresponding to the candidate rubbish text information, combined with the presentation probability information corresponding to the candidate rubbish text information. Here, the presentation probability information is the probability that the candidate rubbish text information occurs in the vocabulary of the aforementioned knowledge base. Specifically, in step S3, the rubbish text determining equipment 1 can determine the cheating degree information by the following formula (8), where Y' is the value given by formula (5):
$$Y = Y' \cdot \frac{1}{\alpha} \tag{8}$$
Here, α denotes the presentation probability information corresponding to the candidate rubbish text information, that is, the probability that the candidate rubbish text information occurs in the vocabulary of the knowledge base to which the library frequency information C refers. If a user is highly willing to present the candidate rubbish information, the user is more likely to use uncommon words, and correspondingly the α of the candidate rubbish text information is smaller. Y denotes the cheating degree information. For example, suppose the presentation probability information of "Jia Beiaite" and "newborn good shellfish probiotics" is 0.1 and 0.5, respectively. Then in step S3, according to formula (8), the rubbish text determining equipment 1 can calculate the cheating degree information of "Jia Beiaite" and "newborn good shellfish probiotics" as 100*(1/0.1) = 1000 and 12.5*(1/0.5) = 25, respectively.
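Manner D) rewards rare terms: the smaller α is, the larger the result. A minimal sketch, assuming α is estimated as the candidate's occurrence count divided by the total number of vocabulary occurrences in the knowledge base; that estimate, and the function names, are assumptions for illustration only.

```python
def presentation_probability(candidate_occurrences: int, vocabulary_occurrences: int) -> float:
    """Estimate alpha as a relative frequency (an assumed estimator)."""
    if vocabulary_occurrences <= 0:
        raise ValueError("vocabulary_occurrences must be positive")
    if not 0 <= candidate_occurrences <= vocabulary_occurrences:
        raise ValueError("candidate_occurrences must be between 0 and the vocabulary total")
    return candidate_occurrences / vocabulary_occurrences


def adjust_for_rarity(base_degree: float, alpha: float) -> float:
    """Scale Y' by 1/alpha, so rarer candidates get a higher cheating degree."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    return base_degree / alpha


print(presentation_probability(1, 10), adjust_for_rarity(100.0, 0.1), adjust_for_rarity(12.5, 0.5))
```

Since the result grows without bound as α approaches zero, a practical implementation would probably want a floor on α; the text does not specify one.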
2) First performing word segmentation on each piece of candidate rubbish text information, to obtain one or more word segments corresponding to the candidate rubbish text information; then determining the cheating degree information according to the cheating degree information corresponding to those word segments. For example, suppose that in step S2 the rubbish text determining equipment 1 determined that the candidate rubbish text information corresponding to an initial page such as initial web is "Chongqing red building hospital". Then in step S3 the rubbish text determining equipment 1 first performs word segmentation on the candidate rubbish text information, obtaining word segments such as word1 "Chongqing red building" and word2 "red building hospital". Suppose the rubbish text determining equipment 1 has previously determined that the cheating degree information of word1 "Chongqing red building" and word2 "red building hospital" is y1 and y2, respectively. Then in step S3 the rubbish text determining equipment 1 can determine the cheating degree information of "Chongqing red building hospital" from the cheating degree information of word1 and word2, for example by taking their average, so that the cheating degree information corresponding to the candidate rubbish text information "Chongqing red building hospital" is (y1+y2)/2.
Here, the present invention can determine the cheating degree information of candidate rubbish text information according to the cheating degree information corresponding to its one or more word segments. This makes it possible to reuse an a priori collocation library and the cheating degree information of known rubbish words, so that when new candidate rubbish text information is encountered, its cheating degree information need not be computed in the way used for other candidate rubbish text information. This simplifies the determination of the cheating degree information of candidate rubbish text information.
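The averaging described in way 2) can be sketched as follows. This is only an illustrative sketch: the function name, the dictionary of previously determined cheating degrees, the numeric values standing in for y1 and y2, and the choice to skip segments with no known value are all assumptions, not part of the described method.

```python
from statistics import mean

def cheating_degree_from_segments(segments, known_degrees):
    """Average the previously determined cheating degrees of a candidate's word segments.

    Segments without a known cheating degree are skipped (an assumption of this sketch).
    Returns None when no segment has a known value.
    """
    degrees = [known_degrees[s] for s in segments if s in known_degrees]
    if not degrees:
        return None  # no prior knowledge; another way of scoring would be needed
    return mean(degrees)

# Hypothetical values for y1 and y2; the source only names them symbolically.
known = {"Chongqing red building": 80.0, "red building hospital": 60.0}
y = cheating_degree_from_segments(
    ["Chongqing red building", "red building hospital"], known
)
print(y)  # (y1 + y2) / 2
```

With these hypothetical inputs the result is simply the mean of the two known values, matching the (y1+y2)/2 rule in the text.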
Those skilled in the art will understand that the above ways of determining the cheating degree information corresponding to candidate rubbish text information are only examples; other existing or future ways of determining the cheating degree information corresponding to candidate rubbish text information, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S4, the rubbish text determining device 1 determines, according to the cheating degree information, one or more pieces of rubbish text information corresponding to the initial page from the one or more pieces of candidate rubbish text information, for example by selecting from the candidates those whose cheating degree information meets a predetermined threshold and taking them as the rubbish text information corresponding to the initial page. For example, assume that in step S3 the device determines that, for the initial page search result3 shown in Fig. 2 ("Baby digests milk powder poorly, what should I do_Baidu Knows"), the cheating degree information corresponding to the candidate rubbish text information "Jia Beiaite" and "newborn good shellfish probiotics" is 100*(1/0.1)=1000 and 12.5*(1/0.5)=25 respectively. Then, in step S4, the device can determine, according to the cheating degree information, the rubbish text information corresponding to the initial page search result3 from these two candidates, for example by taking "Jia Beiaite", whose cheating degree information meets a predetermined threshold such as 100, as the rubbish text information.
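The numbers in this example can be reproduced with a short sketch. It assumes that the values 100 and 12.5 are the cheating degrees before the presentation probability α is applied, and that "meets the threshold" means greater than or equal to it; the function names are hypothetical.

```python
def adjust_by_presentation_probability(base_degree, alpha):
    """Scale a base cheating degree by 1/alpha, as in the worked example."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha is a probability and must lie in (0, 1]")
    return base_degree * (1 / alpha)

def select_rubbish(degrees, threshold):
    """Keep the candidates whose cheating degree meets the threshold (>= is assumed)."""
    return [text for text, y in degrees.items() if y >= threshold]

# Values taken from the example in the text.
base = {"Jia Beiaite": 100.0, "newborn good shellfish probiotics": 12.5}
alpha = {"Jia Beiaite": 0.1, "newborn good shellfish probiotics": 0.5}

degrees = {t: adjust_by_presentation_probability(base[t], alpha[t]) for t in base}
rubbish = select_rubbish(degrees, threshold=100)
print(degrees, rubbish)
```

Only "Jia Beiaite" survives the threshold of 100, as in the example.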
Those skilled in the art will understand that the above way of determining the one or more pieces of rubbish text information corresponding to the initial page is only an example; other existing or future ways of determining the one or more pieces of rubbish text information corresponding to the initial page, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
The steps of the rubbish text determining device 1 operate continuously with one another. Specifically, in step S1, the rubbish text determining device 1 continuously obtains initial pages to be processed; in step S2, it continuously determines one or more pieces of candidate rubbish text information corresponding to the initial page; in step S3, it continuously determines the cheating degree information corresponding to the candidate rubbish text information; in step S4, it continuously determines, according to the cheating degree information, one or more pieces of rubbish text information corresponding to the initial page from the one or more pieces of candidate rubbish text information. Here, those skilled in the art will understand that "continuously" means that the steps of the rubbish text determining device 1 keep performing, respectively, the acquisition of initial pages, the determination of candidate rubbish text information, the determination of cheating degree information and the determination of rubbish text information, until the device stops obtaining initial pages for a relatively long time.
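One way to picture the continuous operation of steps S1 to S4 is a generator that processes pages as they arrive. This is a sketch only: `run_pipeline`, the callables passed to it, and the toy data are hypothetical stand-ins for the steps described above, and the threshold comparison (>=) is an assumption.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

def run_pipeline(
    pages: Iterable[str],
    find_candidates: Callable[[str], List[str]],
    score: Callable[[str], float],
    threshold: float,
) -> Iterator[Tuple[str, List[str]]]:
    """Process pages one by one for as long as the page source keeps yielding them."""
    for page in pages:                                   # S1: obtain an initial page
        candidates = find_candidates(page)               # S2: candidate rubbish text
        degrees: Dict[str, float] = {c: score(c) for c in candidates}  # S3: cheating degree
        yield page, [c for c, y in degrees.items() if y >= threshold]  # S4: select rubbish text

# Toy stand-ins for the real steps.
toy_scores = {"Jia Beiaite": 1000.0}
pages = ["try Jia Beiaite milk powder", "nothing suspicious here"]
results = list(run_pipeline(
    pages,
    find_candidates=lambda p: [t for t in toy_scores if t in p],
    score=toy_scores.__getitem__,
    threshold=100,
))
print(results)
```

Because the pipeline is lazy, it stops only when the page source is exhausted, which mirrors the "until the device stops obtaining initial pages" condition.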
Preferably, the method further includes step S5 (not shown) and step S6 (not shown). Specifically, in step S5, the rubbish text determining device 1 generates a target page corresponding to the initial page, wherein the target page includes display identification information for at least one of the one or more pieces of rubbish text information; in step S6, the rubbish text determining device 1 provides the target page to the corresponding user.
Specifically, in step S5, the rubbish text determining device 1 generates a target page corresponding to the initial page, wherein the target page includes display identification information for at least one of the one or more pieces of rubbish text information. Here, the display identification information includes but is not limited to the background color, font color, font size and display manner corresponding to the rubbish text information; for example, the rubbish text information may be marked with a background color and/or a box that is distinguishable from the initial page, or identified with floating-layer text. Specifically, in step S5, the device first determines the display identification information corresponding to the rubbish text information, for example according to the cheating degree information corresponding to the rubbish text information: if the cheating degree information is relatively large, for example exceeding a predetermined threshold of 90, the rubbish text information is identified in red and marked with a box; if the cheating degree information lies in the interval (50, 90], it is identified in orange. Then, in step S5, the device updates the initial page according to the display identification information corresponding to the rubbish text information, for example by identifying the rubbish text information with its corresponding display identification information, thereby generating the target page corresponding to the initial page.
For example, assume that in step S4 the rubbish text determining device 1 determines that the rubbish text information corresponding to the initial page search result3 is "Jia Beiaite", and that the page background color of the initial page search result3 is light grey. Then, in step S5, the device can determine that the display identification information corresponding to "Jia Beiaite" is a mark in a color distinguishable from the initial page, such as dark grey. Alternatively, in step S5, the device can determine the display identification information according to the cheating degree information corresponding to "Jia Beiaite": if that cheating degree information is 100, which exceeds the predetermined threshold of 90, the device can determine that "Jia Beiaite" is to be identified in red and marked with a box. Then, in step S5, the device updates the initial page search result3 according to the display identification information corresponding to "Jia Beiaite", for example a dark grey mark, by identifying "Jia Beiaite" with its corresponding display identification information, and thereby generates the target page corresponding to the initial page, as shown in Fig. 3, wherein the target page includes display identification information for at least one of the one or more pieces of rubbish text information.
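The threshold-based choice of display identification information can be sketched as below. The thresholds 90 and (50, 90] come from the text; the CSS strings, the use of an HTML `span`, the function names, and the decision to leave values of 50 or less unmarked are assumptions of this sketch.

```python
from html import escape

def display_style(degree):
    """Map a cheating degree to an inline style; None means no special marking."""
    if degree > 90:
        return "color: red; border: 1px solid red;"  # red mark plus a box
    if 50 < degree <= 90:
        return "color: orange;"                       # orange mark
    return None  # the source does not say what happens at or below 50

def mark_term(page_text, term, degree):
    """Return HTML for the page text with the rubbish term wrapped in a styled span."""
    safe_text, safe_term = escape(page_text), escape(term)
    style = display_style(degree)
    if style is None or not safe_term:
        return safe_text
    return safe_text.replace(safe_term, f'<span style="{style}">{safe_term}</span>')

print(mark_term("try Jia Beiaite milk powder", "Jia Beiaite", 100))
```

A real target page would be generated from the initial page's markup rather than from plain text; the string replacement here only illustrates where the identification is applied.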
Those skilled in the art will understand that the above way of determining the display identification information corresponding to the rubbish text information is only an example; other existing or future ways of determining the display identification information corresponding to the rubbish text information, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Those skilled in the art will understand that the above way of generating the target page corresponding to the initial page is only an example; other existing or future ways of generating the target page corresponding to the initial page, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S6, the rubbish text determining device 1 provides the target page to the corresponding user by means of dynamic web page technologies such as ASP, JSP or PHP, or by other agreed communication means such as the http or https communication protocols, so as to alert the user.
More preferably, step S5 includes step S51 (not shown) and step S52 (not shown). Specifically, in step S51, the rubbish text determining device 1 determines, according to the cheating degree information corresponding to at least one of the one or more pieces of rubbish text information, the presentation mode corresponding to that at least one piece of rubbish text information; in step S52, the rubbish text determining device 1 generates, according to the presentation mode, the target page corresponding to the initial page, wherein the target page includes display identification information that corresponds to the presentation mode and applies to at least one of the one or more pieces of rubbish text information.
Specifically, in step S51, the rubbish text determining device 1 determines, according to the cheating degree information corresponding to at least one of the one or more pieces of rubbish text information, the presentation mode corresponding to that at least one piece of rubbish text information. For example, rubbish text information with different cheating degree information is handled in different ways; rubbish text information whose cheating degree information exceeds a predetermined threshold may be deleted and not displayed. Here, the presentation mode includes but is not limited to the presentation position, presentation color and presentation manner corresponding to the rubbish text information. For example, assume that in step S4 the device determines that the rubbish text information corresponding to the initial page search result3 is "Jia Beiaite", and that in step S3 the device determined its cheating degree information to be 100. Then, in step S51, the device can determine that the corresponding presentation mode is to mark it in a color distinguishable from the initial page search result3, such as dark grey, and display it. As another example, assume that in step S3 the device determines that the cheating degree information of the rubbish text information "Jia Beiaite" is 1000, which exceeds a predetermined threshold such as 500. Then, in step S51, the device determines that the corresponding presentation mode is to delete it and not display it.
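The two examples above (cheating degree 100 is marked and shown, cheating degree 1000 exceeds the threshold 500 and is deleted) can be sketched as a small decision function. The mode names, the bracket notation used for marking, and the function names are hypothetical; only the threshold of 500 and the two outcomes come from the text.

```python
def presentation_mode(degree, delete_above=500):
    """Choose how a piece of rubbish text is presented, based on its cheating degree."""
    return "delete" if degree > delete_above else "highlight"

def apply_mode(text, term, mode):
    """Apply the presentation mode to every occurrence of the term in the text."""
    if mode == "delete":
        return text.replace(term, "")       # remove the term, do not display it
    return text.replace(term, f"[{term}]")  # stand-in for a distinguishable color mark

page = "try Jia Beiaite milk powder"
print(apply_mode(page, "Jia Beiaite", presentation_mode(100)))
print(apply_mode(page, "Jia Beiaite", presentation_mode(1000)))
```

In step S52 the chosen mode would drive the generation of the target page; the bracket marking is used here only so the sketch stays independent of any page markup.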
Those skilled in the art will understand that the above presentation modes are only examples; other existing or future presentation modes, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
Those skilled in the art will understand that the above way of determining the presentation mode is only an example; other existing or future ways of determining the presentation mode, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S52, the rubbish text determining device 1 generates, according to the presentation mode, the target page corresponding to the initial page, wherein the target page includes display identification information that corresponds to the presentation mode and applies to at least one of the one or more pieces of rubbish text information. Here, the way in which the device generates the target page according to the presentation mode in step S52 is identical or substantially identical to the way in which it generates the target page in step S5 described above; for brevity, it is not repeated here and is incorporated herein by reference.
Fig. 6 shows a flowchart of a method for determining rubbish text information in a page according to a preferred embodiment of the present invention.
The method includes step S1', step S2', step S3' and step S4'. Specifically, in step S1', the rubbish text determining device 1 obtains an initial page to be processed; in step S2', the device detects in the initial page character strings that meet a predetermined rubbish feature and takes those character strings as one or more pieces of candidate rubbish text information; in step S3', the device determines the cheating degree information corresponding to the candidate rubbish text information; in step S4', the device determines, according to the cheating degree information, one or more pieces of rubbish text information corresponding to the initial page from the one or more pieces of candidate rubbish text information. Here, steps S1', S3' and S4' are identical or substantially identical to the corresponding steps of the embodiment of Fig. 5; for brevity, they are not repeated here and are incorporated herein by reference.
Specifically, in step S2', the rubbish text determining device 1 detects in the initial page character strings that meet a predetermined rubbish feature and takes those character strings as one or more pieces of candidate rubbish text information. Here, the predetermined rubbish feature includes but is not limited to: 1) character strings that match a predetermined sentence pattern, for example recommendation sentence patterns such as "I have used XXX products, they are quite good, you can try them too", "XXX works very well for weight loss", or "try XXX milk powder, I have seen others use it with good results"; 2) character strings that have a predetermined prefix feature and/or suffix feature, for example character strings whose prefix and/or suffix is a place name; 3) character strings located at a predetermined rubbish text position, for example at the beginning or end of a paragraph; 4) character strings that match a predetermined part-of-speech combination, for example several consecutive nouns. For example, assume that in step S1' the initial page to be processed obtained by the device is search result3 as shown in Fig. 2 ("Baby digests milk powder poorly, what should I do_Baidu Knows"), and that the page includes answers I to IV to the question. Answer I contains a character string matching the predetermined sentence pattern "try XXX milk powder, I have seen others use it with good results", namely "The baby's poor digestion may be a milk powder problem; try Jia Beiaite milk powder, I have seen others use it with good results"; answer IV contains a character string matching a predetermined sentence pattern, namely "My baby uses Jia Beiaite milk powder and has had no indigestion; you can give it a try". Then, in step S2', when the device performs semantic analysis on the page content information of the initial page search result3, it can detect that the initial page search result3 contains a character string meeting the predetermined rubbish feature, such as "Jia Beiaite", and the device takes the character string "Jia Beiaite" as candidate rubbish text information of the initial page search result3. As another example, assume that in step S1' the initial page to be processed obtained by the device is initial web, and that the initial page includes the character string "there is a Chongqing red building hospital". Then, in step S2', when the device performs syntactic analysis on the page content information of the initial page initial web, it can detect that the character string "there is a Chongqing red building hospital" matches a predetermined part-of-speech combination such as several consecutive nouns, and the device takes that character string as candidate rubbish text information of the initial page initial web.
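Feature 1), matching a predetermined sentence pattern with an XXX placeholder, can be approximated with regular expressions. This is a simplified sketch, not the semantic analysis the text refers to: the shortened English templates, the placeholder handling, and the function names are assumptions made for illustration.

```python
import re

# Shortened, hypothetical English renderings of the recommendation patterns.
TEMPLATES = [
    "try XXX milk powder",
    "XXX works very well for weight loss",
]

def compile_template(template):
    """Turn a template with an XXX placeholder into a regex with a named group.

    The placeholder may not span clause punctuation, so a match stays inside one clause.
    """
    pattern = re.escape(template).replace("XXX", r"(?P<name>[^,.;!?]+?)")
    return re.compile(pattern)

PATTERNS = [compile_template(t) for t in TEMPLATES]

def find_candidates(text):
    """Return the distinct placeholder values found by any pattern, in order of discovery."""
    found = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            name = match.group("name").strip()
            if name and name not in found:
                found.append(name)
    return found

answer = ("The baby's poor digestion may be a milk powder problem; "
          "try Jia Beiaite milk powder, I have seen others use it with good results")
print(find_candidates(answer))
```

Pattern matching of this kind only covers the sentence-pattern feature; the prefix/suffix, position and part-of-speech features would need their own detectors.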
Those skilled in the art will understand that the above predetermined rubbish features are only examples; other existing or future predetermined rubbish features, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In a preferred embodiment (with reference to Fig. 6), the method includes step S1', step S2', step S3', step S4' and step S7' (not shown). The preferred embodiment is described below with reference to Fig. 6. Specifically, in step S1', the rubbish text determining device 1 obtains an initial page to be processed; in step S2', the device detects in the initial page character strings that match a predetermined part-of-speech combination and takes those character strings as one or more pieces of candidate rubbish text information; in step S7', the device pre-processes the one or more pieces of candidate rubbish text information according to the grammatical feature information corresponding to the candidate rubbish text information, to obtain one or more pieces of pre-processed candidate rubbish text information; in step S3', the device determines the cheating degree information corresponding to the pre-processed candidate rubbish text information; in step S4', the device determines, according to the cheating degree information corresponding to the pre-processed candidate rubbish text information, one or more pieces of rubbish text information corresponding to the initial page from the one or more pieces of pre-processed candidate rubbish text information. Here, step S1' is identical or substantially identical to the corresponding step of the embodiment of Fig. 5; for brevity, it is not repeated here and is incorporated herein by reference.
Specifically, in step S2', the rubbish text determining device 1 detects in the initial page character strings that match a predetermined part-of-speech combination and takes those character strings as one or more pieces of candidate rubbish text information. For example, assume that in step S1' the initial page to be processed obtained by the device is initial web, and that the initial page includes the character string "there is a Chongqing red building hospital". Then, in step S2', when the device performs syntactic analysis on the page content information of the initial page initial web, it can detect that the character string "there is a Chongqing red building hospital" matches a predetermined part-of-speech combination such as several consecutive nouns, and the device takes that character string as candidate rubbish text information of the initial page initial web.
In step S7', the rubbish text determining device 1 pre-processes the one or more pieces of candidate rubbish text information according to the grammatical feature information corresponding to the candidate rubbish text information, to obtain one or more pieces of pre-processed candidate rubbish text information. Here, the grammatical feature information refers to whether the position of the candidate rubbish text information in the whole sentence to which it belongs conforms to a corresponding syntactic structure: for a verb-object structure, for example, whether the candidate rubbish text information corresponds to the object of that structure; for a subject-predicate structure, whether it corresponds to the subject or object of that structure. Here, the pre-processing includes but is not limited to segmenting or trimming the rubbish text information. For example, continuing the example above, assume that the candidate rubbish text "there is a Chongqing red building hospital" determined by the device in step S2' for the initial page initial web should be the object of a verb-object structure in the sentence to which it belongs, but that within that sentence it is split by the syntactic structure. Then, in step S7', the device needs to trim the candidate rubbish text "there is a Chongqing red building hospital", for example by splitting it into "there is a / Chongqing red building hospital", so that the trimmed candidate rubbish text information is "Chongqing red building hospital".
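The trimming in this example can be sketched with part-of-speech tags. The tag names, the tokenization of the example string, and the rule "drop non-noun tokens at either end" are assumptions of this sketch; the text itself only says that the candidate is trimmed according to its grammatical feature information.

```python
# Hypothetical noun tags (common noun, place name, organization name).
NOUN_TAGS = {"n", "ns", "nt"}

def trim_to_noun_phrase(tagged_tokens):
    """Drop leading and trailing tokens that are not nouns and join the rest."""
    start = 0
    while start < len(tagged_tokens) and tagged_tokens[start][1] not in NOUN_TAGS:
        start += 1
    end = len(tagged_tokens)
    while end > start and tagged_tokens[end - 1][1] not in NOUN_TAGS:
        end -= 1
    return " ".join(token for token, _ in tagged_tokens[start:end])

# Hypothetical tagging of the candidate from the example.
candidate = [
    ("there is a", "v"),
    ("Chongqing", "ns"),
    ("red building", "n"),
    ("hospital", "n"),
]
print(trim_to_noun_phrase(candidate))
```

Here the verb part "there is a" is removed and the object "Chongqing red building hospital" is kept, which corresponds to the split "there is a / Chongqing red building hospital" in the text.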
Those skilled in the art will understand that the above way of pre-processing the one or more pieces of candidate rubbish text information is only an example; other existing or future ways of pre-processing the one or more pieces of candidate rubbish text information, if applicable to the present invention, should also fall within the scope of protection of the present invention and are incorporated herein by reference.
In step S3', the rubbish text determining device 1 determines the cheating degree information corresponding to the pre-processed candidate rubbish text information. Here, the way in which the device determines the cheating degree information corresponding to the pre-processed candidate rubbish text information in step S3' is identical or substantially identical to the way in which it determines the cheating degree information corresponding to the candidate rubbish text information in step S3 of Fig. 5; for brevity, it is not repeated here and is incorporated herein by reference.
In step S4', the rubbish text determining device 1 determines, according to the cheating degree information corresponding to the pre-processed candidate rubbish text information, one or more pieces of rubbish text information corresponding to the initial page from the one or more pieces of pre-processed candidate rubbish text information. Here, the way in which the device does so in step S4' is identical or substantially identical to the way in which, in step S4 of Fig. 5, it determines the one or more pieces of rubbish text information corresponding to the initial page from the one or more pieces of candidate rubbish text information; for brevity, it is not repeated here and is incorporated herein by reference.
It should be noted that the present invention can be implemented in software and/or in a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer or any other similar hardware device. In one embodiment, the software program of the present invention can be executed by a processor to implement the steps or functions described above. Similarly, the software program of the present invention (including related data structures) can be stored in a computer-readable recording medium, for example RAM memory, a magnetic or optical drive, a floppy disk or similar devices. In addition, some steps or functions of the present invention can be implemented in hardware, for example as circuitry that cooperates with a processor to perform the individual steps or functions.
In addition, part of the present invention can be applied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of that computer. The program instructions that invoke the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted via a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the present invention includes an apparatus that comprises a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the present invention.
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-limiting. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced in the present invention. No reference sign in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (14)

1. A method for determining rubbish text information in a page, wherein the method comprises the following steps:
a. obtaining an initial page to be processed;
b. determining one or more pieces of candidate rubbish text information corresponding to the initial page;
c. determining the cheating degree information corresponding to the candidate rubbish text information according to the storehouse frequency information and the user presentation ratio information corresponding to the candidate rubbish text information;
d. determining, according to the cheating degree information, one or more pieces of rubbish text information corresponding to the initial page from the one or more pieces of candidate rubbish text information;
wherein step c comprises:
determining the cheating degree information according to the following formula:
$$y = \frac{C}{\alpha \cdot \sum_{i=1}^{n} \frac{1}{B_i}}$$
wherein C denotes the storehouse frequency information corresponding to the candidate rubbish text information, B_i denotes the user presentation ratio information of user i with respect to the candidate rubbish text information, n denotes the total number of users who have published text containing the candidate rubbish text information, α denotes the presentation probability information corresponding to the candidate rubbish text information, and y denotes the cheating degree information.
2. according to the method described in claim 1, wherein, the step b includes:
- in the initial page detection meet the character string of predetermined characteristics of spam, meet predetermined characteristics of spam by described Character string is used as one or more candidate's rubbish text information.
3. method according to claim 2, wherein, the step b includes:
- character string for meeting predetermined part of speech combination is detected in the initial page, by the character for meeting predetermined part of speech String is used as one or more candidate's rubbish text information;
Wherein, this method also includes:
- grammar property the information according to corresponding to candidate's rubbish text information, to one or more of candidate's rubbish text This information is pre-processed, to obtain pretreated one or more candidate's rubbish text information;
Wherein, the step c includes:
Percent information is presented in-storehouse frequency information according to corresponding to candidate's rubbish text information and user, it is determined that after pretreatment Candidate's rubbish text corresponding to cheating degree information;
Wherein, the step d includes:
- cheating degree the information according to corresponding to pretreated candidate's rubbish text, from pretreated one or One or more rubbish text information corresponding to the initial page are determined in multiple candidate's rubbish text information.
4. according to the method described in claim 1, wherein, the step c includes:
- storehouse frequency information according to corresponding to candidate's rubbish text information and user are presented percent information, and with reference to it is described just The page subject matter information of the beginning page, determines the cheating degree information.
5. The method according to claim 1 or 2, wherein step c comprises:
- performing word segmentation on each piece of candidate rubbish text information, to obtain one or more pieces of word segment information corresponding to the candidate rubbish text information;
- determining the cheating degree information according to the cheating degree information corresponding to the one or more pieces of word segment information of the candidate rubbish text information.
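As an informal illustration outside the claim language, the segment-based variant might look like the sketch below. The claim does not state how the per-segment cheating degrees are combined, so the arithmetic mean used here is an assumption, as are the names `cheating_degree_from_segments`, `segment` and `segment_scores`.

```python
def cheating_degree_from_segments(candidate, segment, segment_scores, default=0.0):
    """Combine per-segment cheating degrees into one value for the candidate.

    segment:        callable that splits the candidate into word segments
                    (any word segmenter; str.split is used in the demo).
    segment_scores: mapping from word segment to its known cheating degree.
    default:        score assumed for segments with no known cheating degree.
    The mean is an assumed combination rule, not one stated in the claim.
    """
    tokens = segment(candidate)
    if not tokens:
        return default
    total = sum(segment_scores.get(token, default) for token in tokens)
    return total / len(tokens)


if __name__ == "__main__":
    scores = {"cheap": 0.5, "loans": 1.0}
    print(cheating_degree_from_segments("cheap loans today", str.split, scores))
```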
6. The method according to any one of claims 1 to 4, wherein the method further comprises the steps of:
m. generating a target page corresponding to the initial page, wherein the target page contains display identification information for at least one of the one or more pieces of rubbish text information;
- providing the target page to the corresponding user.
7. The method according to claim 6, wherein step m comprises:
- determining, according to the cheating degree information corresponding to at least one of the one or more pieces of rubbish text information, a presentation style corresponding to the at least one of the one or more pieces of rubbish text information;
- generating, according to the presentation style, the target page corresponding to the initial page, wherein the target page contains display identification information that corresponds to the presentation style and identifies the at least one of the one or more pieces of rubbish text information.
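As an informal illustration outside the claim language, choosing a presentation style from the cheating degree and marking the text in a generated page might be sketched as follows. The thresholds, the CSS class names and the function names are all assumptions; the claims do not fix any particular scale or markup.

```python
import html


def presentation_style(cheating_degree, high=0.8, low=0.5):
    """Map a cheating degree to a hypothetical CSS class, or None for no marking."""
    if cheating_degree >= high:
        return "spam-strong"
    if cheating_degree >= low:
        return "spam-weak"
    return None


def build_target_page(initial_text, spam_scores):
    """Return escaped page text with each rubbish string wrapped in a marker span.

    spam_scores maps rubbish text to its cheating degree. Plain string
    replacement is used for brevity; it does not handle rubbish strings that
    overlap each other or that collide with the inserted markup.
    """
    page = html.escape(initial_text)
    for text, score in spam_scores.items():
        style = presentation_style(score)
        if style is None:
            continue
        escaped = html.escape(text)
        page = page.replace(escaped, f'<span class="{style}">{escaped}</span>')
    return page


if __name__ == "__main__":
    print(build_target_page("buy cheap pills now", {"cheap pills": 0.9}))
```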
8. A rubbish text determination device for determining rubbish text information in a page, wherein the rubbish text determination device comprises:
an acquisition device for obtaining an initial page to be processed;
a candidate determining device for determining one or more pieces of candidate rubbish text information corresponding to the initial page;
a cheating degree determining device for determining cheating degree information corresponding to the candidate rubbish text information according to library frequency information and user presentation ratio information corresponding to the candidate rubbish text information;
a rubbish determining device for determining, according to the cheating degree information, one or more pieces of rubbish text information corresponding to the initial page from the one or more pieces of candidate rubbish text information;
wherein the cheating degree determining device is configured to determine the cheating degree information by the following formula:
$$ y = \frac{C}{\alpha \cdot \sum_{i=1}^{n} \frac{1}{B_i}} $$
where C represents the library frequency information corresponding to the candidate rubbish text information, B_i represents the user presentation ratio information of user i with respect to the candidate rubbish text information, n represents the total number of users who have published text containing the candidate rubbish text information, α represents the presentation probability information corresponding to the candidate rubbish text information, and y represents the cheating degree information.
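As an informal illustration outside the claim language, the formula can be evaluated directly. The function name `cheating_degree` and the input validation are assumptions added for the example; the arithmetic follows the formula term by term.

```python
def cheating_degree(library_frequency, user_ratios, alpha):
    """Evaluate y = C / (alpha * sum(1 / B_i for i = 1..n)).

    library_frequency: C, the library frequency of the candidate text.
    user_ratios:       B_1..B_n, one presentation ratio per publishing user.
    alpha:             presentation probability of the candidate text.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not user_ratios:
        raise ValueError("at least one user ratio is required")
    if any(ratio <= 0 for ratio in user_ratios):
        raise ValueError("user ratios must be positive")
    denominator = alpha * sum(1.0 / ratio for ratio in user_ratios)
    return library_frequency / denominator


if __name__ == "__main__":
    # C = 12, B = (0.5, 0.25), alpha = 0.5: 12 / (0.5 * (2 + 4)) = 4.0
    print(cheating_degree(12, [0.5, 0.25], 0.5))
```

Because each term 1/B_i grows as B_i shrinks, users who present the candidate text only rarely contribute most to the denominator and therefore pull y down.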
9. The rubbish text determination device according to claim 8, wherein the candidate determining device is configured to:
- detect, in the initial page, character strings that meet a predetermined rubbish feature, and use the character strings that meet the predetermined rubbish feature as the one or more pieces of candidate rubbish text information.
10. The rubbish text determination device according to claim 9, wherein the candidate determining device is configured to:
- detect, in the initial page, character strings that match a predetermined part-of-speech combination, and use the character strings that match the predetermined part-of-speech combination as the one or more pieces of candidate rubbish text information;
wherein the rubbish text determination device further comprises:
a pre-processing device for pre-processing the one or more pieces of candidate rubbish text information according to grammatical feature information corresponding to the candidate rubbish text information, to obtain one or more pieces of pre-processed candidate rubbish text information;
wherein the cheating degree determining device is configured to:
- determine, according to the library frequency information and the user presentation ratio information corresponding to the candidate rubbish text information, the cheating degree information corresponding to the pre-processed candidate rubbish text information;
wherein the rubbish determining device is configured to:
- determine, according to the cheating degree information corresponding to the pre-processed candidate rubbish text information, one or more pieces of rubbish text information corresponding to the initial page from the one or more pieces of pre-processed candidate rubbish text information.
11. The rubbish text determination device according to claim 8, wherein the cheating degree determining device is configured to:
- determine the cheating degree information according to the library frequency information and the user presentation ratio information corresponding to the candidate rubbish text information, in combination with page topic information of the initial page.
12. The rubbish text determination device according to claim 8 or 9, wherein the cheating degree determining device is configured to:
- perform word segmentation on each piece of candidate rubbish text information, to obtain one or more pieces of word segment information corresponding to the candidate rubbish text information;
- determine the cheating degree information according to the cheating degree information corresponding to the one or more pieces of word segment information of the candidate rubbish text information.
13. The rubbish text determination device according to any one of claims 8 to 11, wherein the rubbish text determination device further comprises:
a page generating device for generating a target page corresponding to the initial page, wherein the target page contains display identification information for at least one of the one or more pieces of rubbish text information;
a providing device for providing the target page to the corresponding user.
14. The rubbish text determination device according to claim 13, wherein the page generating device comprises:
a presentation style determining unit for determining, according to the cheating degree information corresponding to at least one of the one or more pieces of rubbish text information, a presentation style corresponding to the at least one of the one or more pieces of rubbish text information;
a page generating unit for generating, according to the presentation style, the target page corresponding to the initial page, wherein the target page contains display identification information that corresponds to the presentation style and identifies the at least one of the one or more pieces of rubbish text information.
CN201410058591.8A 2014-02-20 2014-02-20 Method and apparatus for determining rubbish text information in a page Active CN103886016B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410058591.8A CN103886016B (en) 2014-02-20 2014-02-20 Method and apparatus for determining rubbish text information in a page

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410058591.8A CN103886016B (en) 2014-02-20 2014-02-20 Method and apparatus for determining rubbish text information in a page

Publications (2)

Publication Number Publication Date
CN103886016A CN103886016A (en) 2014-06-25
CN103886016B CN103886016B (en) 2017-11-03

Family

ID=50954908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410058591.8A Active CN103886016B (en) 2014-02-20 2014-02-20 Method and apparatus for determining rubbish text information in a page

Country Status (1)

Country Link
CN (1) CN103886016B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104408087A (en) * 2014-11-13 2015-03-11 百度在线网络技术(北京)有限公司 Method and system for identifying cheating text
CN105704005B (en) * 2014-11-28 2020-07-24 深圳市腾讯计算机系统有限公司 Malicious user reporting method and device, and reported information processing method and device
CN106411988A (en) * 2016-03-31 2017-02-15 北京金山安全软件有限公司 Garbage treatment method and device and mobile terminal
CN107544967B (en) * 2016-06-23 2022-03-25 北京搜狗科技发展有限公司 Network access method and device and electronic equipment
CN108804413B (en) * 2018-04-28 2022-03-22 百度在线网络技术(北京)有限公司 Text cheating identification method and device
CN111460110B (en) * 2019-01-22 2023-04-25 阿里巴巴集团控股有限公司 Abnormal text detection method, abnormal text sequence detection method and device
CN110688540B (en) * 2019-10-08 2022-06-10 腾讯科技(深圳)有限公司 Cheating account screening method, device, equipment and medium

Also Published As

Publication number Publication date
CN103886016A (en) 2014-06-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant