CN105183894A - Method and device for filtering internal chains of website - Google Patents


Info

Publication number
CN105183894A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510633911.2A
Other languages
Chinese (zh)
Other versions
CN105183894B (en
Inventor
王波
门阳阳
陈琳
李�浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510633911.2A
Publication of CN105183894A
Application granted
Publication of CN105183894B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The invention discloses a method and device for filtering the internal links of a website. The method comprises: extracting preset features of the internal links of a specified website; inputting the preset features of the internal links into a trained preset machine learning model to classify the internal links; and filtering the internal links according to the classification result. The method and device make it possible to filter out the low-quality internal links of the website and retain the high-quality ones, which improves the internal link quality of the specified website and the user experience.

Description

Method and device for filtering internal links of a website
Technical field
Embodiments of the present invention relate to internal link optimization technology, and in particular to a method and device for filtering the internal links of a website.
Background
Internal links are links between content pages under the domain name of the same website. A reasonable internal link structure can improve a website's indexing by search engines and its weight, increase average daily visits, and raise overall traffic. At the same time, building internal links requires respect for the user experience and attention to link relevance: highly relevant links help search engine indexing and the user experience, and in turn raise the website's page views. Conversely, meaningless internal links with low relevance have little effect on clicks or on the topological relationship between pages, but they harm the user experience. Such links are low-quality internal links and reduce the internal link quality of the website.
Summary of the invention
In view of this, embodiments of the present invention provide a method and device for filtering the internal links of a website, so as to improve the internal link quality of the website.
In a first aspect, an embodiment of the present invention provides a method for filtering the internal links of a website, the method comprising:
extracting preset features of the internal links of a specified website;
inputting the preset features of the internal links into a trained preset machine learning model to classify the internal links;
filtering the internal links according to the classification result.
In a second aspect, an embodiment of the present invention further provides a device for filtering the internal links of a website, the device comprising:
a feature extraction module, configured to extract preset features of the internal links of a specified website;
an internal link classification module, configured to input the preset features of the internal links into a trained preset machine learning model to classify the internal links;
an internal link filtering module, configured to filter the internal links according to the classification result.
The method and device provided by the embodiments of the present invention extract preset features of the internal links of a specified website, input those features into a trained preset machine learning model to classify the internal links, and filter the internal links according to the classification result. The high-quality internal links of the website can be retained according to the filtering result, which improves the internal link quality of the specified website.
Brief description of the drawings
Fig. 1 is a flowchart of a method for filtering the internal links of a website provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a method for filtering the internal links of a website provided by Embodiment 2 of the present invention;
Fig. 3 is a flowchart of a method for filtering the internal links of a website provided by Embodiment 3 of the present invention;
Fig. 4 is an example diagram of a decision tree in the random forest used in the method for filtering the internal links of a website provided by the embodiments of the present invention;
Fig. 5 is a schematic structural diagram of a device for filtering the internal links of a website provided by Embodiment 4 of the present invention.
Detailed description of embodiments
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the full content.
Embodiment 1
Fig. 1 is a flowchart of a method for filtering the internal links of a website provided by Embodiment 1 of the present invention. This embodiment is applicable to filtering the internal links of a specified website. The method can be performed by a computer and specifically comprises the following steps:
S110, extract preset features of the internal links of the specified website.
Preset features are extracted from the internal links of a specified website (such as Baidupedia). The preset features may include features related to the internal link text, features related to the link URL (Uniform Resource Locator) of the internal link, and so on. In other words, the preset features should be properties that reflect how high or low the quality of an internal link is.
The preset features preferably include: a named-entity recognition feature, an average daily visits feature of the internal link page, a tf-idf feature of the internal link text, a classification feature of the internal link text, a link URL verification feature of the internal link, and an entity similarity feature. These features can be extracted from the internal link text or the internal link URL, or can be other statistics based on the internal link.
The named-entity recognition feature refers to identifying entities with a specific meaning in the internal link text, mainly person names, place names, organization names, and the like. Table 1 shows examples of recognition by a named-entity recognition tool. Internal link texts recognized as NOR or PHRASE are more likely to be low-quality internal links. Table 2 shows the named-entity recognition results for some internal link texts.
Table 1: Example output of the named-entity recognition tool
Internal link text    Recognition result
High round            PER (person name)
Beijing               LOC (place name)
Means                 NOR (not a named entity)
On a large scale      PHRASE (phrase)
Table 2: Named-entity recognition results for some internal link texts
The average daily visits feature of the internal link page reflects how much attention people pay to the linked page. In general, an internal link page with low average daily visits is often a low-quality internal link that attracts little attention. Table 3 shows the average daily visits feature of some internal links.
Table 3: Average daily visits feature of some internal links
tf-idf feature of the internal link text: tf-idf is a statistical method for assessing how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to its frequency across the corpus. The main idea of tf-idf is that if a word or phrase appears frequently in one article and rarely in other articles, it is considered to discriminate well between categories and to be suitable for classification. tf-idf is in fact the product tf*idf, where tf (term frequency) is the frequency with which a term appears in a document, and idf (inverse document frequency) is larger the fewer documents contain the term, which indicates that the term discriminates well between categories. A higher tf-idf value indicates stronger discriminating power. Table 4 shows the tf-idf feature of some internal links.
Table 4: tf-idf feature of some internal links
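The tf*idf product described above can be illustrated with a minimal sketch. The patent does not give an exact formula, so the logarithmic idf with a +1 in the denominator, the function name `tf_idf`, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(term, document, corpus):
    """Return tf * idf of `term` for one tokenized document in a tokenized corpus."""
    counts = Counter(document)
    # tf: share of the document's tokens that are this term.
    tf = counts[term] / len(document) if document else 0.0
    # df: number of documents that contain the term at least once.
    df = sum(1 for doc in corpus if term in doc)
    # Assumed smoothing: +1 in the denominator avoids division by zero for unseen terms.
    idf = math.log(len(corpus) / (1 + df))
    return tf * idf

# Hypothetical corpus of tokenized pages.
corpus = [
    ["random", "forest", "classifier"],
    ["forest", "park", "beijing"],
    ["link", "quality", "classifier"],
    ["link", "text", "beijing"],
]
score_rare = tf_idf("random", corpus[0], corpus)    # appears in one document
score_common = tf_idf("forest", corpus[0], corpus)  # appears in two documents
print(score_rare > score_common)
```

With equal term frequency in the first document, the term that occurs in fewer documents receives the higher score, which is the discriminating behaviour the paragraph describes.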
Classification feature of the internal link text: the category of the internal link text can serve as a one-dimensional feature. In general, internal links whose text belongs to more popular categories (categories with practical meaning, such as idioms or ancient official posts) are high-quality internal links. When a specific internal link text is classified, the category can be taken from the classification result of a dictionary. Table 5 shows the classification feature of the internal link text for some internal links.
Table 5: Classification feature of the internal link text for some internal links
Link URL verification feature of the internal link: whether the link URL of an internal link really exists is determined by checking whether the URL is present in the website's internal URL list and whether it is unique. If the link URL of an internal link is not present in the website's internal URL list, the internal link is judged to be low quality. If the link URL is not unique (for example, in Baidupedia the link URL of some internal links corresponds to multiple senses of an entry), the internal link is also judged to be low quality. Table 6 shows the link URL verification feature of some internal links.
Table 6: Link URL verification feature of some internal links
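The two checks behind this feature (existence in the site's URL list, and uniqueness) could be sketched as follows. The function name, the `sense_counts` mapping, and the example.com URLs are hypothetical; the patent does not describe how the URL list or the sense information is stored.

```python
def url_verification_feature(link_url, site_urls, sense_counts):
    """Return 1 if the link target exists on the site and is unambiguous, else 0."""
    # Check 1: the URL must appear in the site's internal URL list.
    exists = link_url in site_urls
    # Check 2: a target that maps to more than one sense is treated as not unique.
    unique = sense_counts.get(link_url, 1) == 1
    return 1 if exists and unique else 0

# Hypothetical site data.
site_urls = {"https://example.com/item/a", "https://example.com/item/b"}
sense_counts = {"https://example.com/item/b": 3}  # target "b" has three senses

feature_a = url_verification_feature("https://example.com/item/a", site_urls, sense_counts)
feature_b = url_verification_feature("https://example.com/item/b", site_urls, sense_counts)
feature_c = url_verification_feature("https://example.com/item/c", site_urls, sense_counts)
print(feature_a, feature_b, feature_c)  # 1 0 0
```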
Entity similarity feature: the similarity between the internal link entity and the source page entity. This similarity reflects how relevant the internal link is to the source page, and an internal link with low relevance can be considered low quality. Here, the internal link entity is the internal link text, and the source page entity is the source page text. Table 7 shows the entity similarity feature of some internal links:
Table 7: Entity similarity feature of some internal links
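The patent does not say which similarity measure is used. As one possible stand-in, the sketch below computes the Jaccard similarity of character bigrams between the internal link text and the source page text; the measure, the function names, and the sample strings are assumptions.

```python
def char_bigrams(text):
    """Return the set of lowercase character bigrams; fall back to the whole text if too short."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}

def entity_similarity(link_text, source_text):
    """Jaccard similarity of character bigrams, in the range [0, 1]."""
    a, b = char_bigrams(link_text), char_bigrams(source_text)
    return len(a & b) / len(a | b)

same = entity_similarity("random forest", "random forest")
related = entity_similarity("random forest", "forest model")
unrelated = entity_similarity("random forest", "weather")
print(same, related > unrelated)
```

Any measure that yields a score comparable against a threshold would fit the description equally well, for example cosine similarity over term vectors.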
S120, input the preset features of the internal links into a trained preset machine learning model to classify the internal links.
Based on the preset features of the internal links, the preset machine learning model classifies each internal link as high quality or low quality. Before this, the preset machine learning model is trained on a large number of samples with known results, yielding a trained model, so that the model's classification results are as good as possible.
The preset machine learning model preferably includes a random forest model or an SVM (Support Vector Machine) model. In machine learning, a random forest is a classifier that comprises multiple decision trees, and its output class is the mode of the classes output by the individual trees. An SVM is a supervised learning model commonly used for pattern recognition, classification, and regression analysis.
Before the preset features of an internal link are input into the trained preset machine learning model, the feature values of those features must first be obtained. For the named-entity recognition feature, the different recognition results can be mapped to different numerical values and a threshold set, with values on one side of the threshold indicating a low-quality internal link. For the classification feature of the internal link text, the different categories can likewise be mapped to different numerical values with a corresponding threshold, with values on one side of the threshold indicating a low-quality internal link. For the average daily visits feature of the internal link page, a threshold is set, and links above it are high-quality internal links. The tf-idf feature of the internal link text can be represented by the tf-idf value with a threshold, and links above it are high-quality internal links. The link URL verification feature can be represented as a binary value, for example 0 for a low-quality internal link and 1 otherwise. For the entity similarity feature, a similarity threshold is set, and links above it are high-quality internal links.
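One way to turn the six features into numeric feature values is sketched below. The `InternalLink` record, the numeric codes assigned to recognition results and categories, and the sample values are all hypothetical; the patent only states that different results are mapped to different numbers.

```python
from dataclasses import dataclass

@dataclass
class InternalLink:
    ner_label: str            # recognition result such as PER, LOC, NOR, PHRASE
    daily_visits: float       # average daily visits of the linked page
    tfidf: float              # tf-idf value of the link text
    category: str             # category of the link text
    url_verified: bool        # result of the link URL verification
    entity_similarity: float  # similarity to the source page entity

# Assumed encodings: named entities score higher than phrases and non-entities.
NER_CODES = {"PER": 2, "LOC": 2, "PHRASE": 1, "NOR": 0}
CATEGORY_CODES = {"idiom": 1, "ancient official post": 1}  # other categories map to 0

def to_feature_vector(link):
    """Encode one internal link as a list of six numeric feature values."""
    return [
        float(NER_CODES.get(link.ner_label, 0)),
        float(link.daily_visits),
        float(link.tfidf),
        float(CATEGORY_CODES.get(link.category, 0)),
        1.0 if link.url_verified else 0.0,
        float(link.entity_similarity),
    ]

sample = InternalLink("PER", 1500, 0.42, "idiom", True, 0.8)
vector = to_feature_vector(sample)
print(vector)  # [2.0, 1500.0, 0.42, 1.0, 1.0, 0.8]
```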
S130, filter the internal links according to the classification result.
The internal links are filtered according to the classification result into low-quality and high-quality internal links, so that the high-quality internal links of the website are retained and the internal link quality of the website is improved.
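The filtering step itself amounts to partitioning the links by predicted label. The sketch below assumes labels of 0 for high quality and 1 for low quality; the function name and sample data are illustrative only.

```python
def filter_links(links, labels):
    """Split links by predicted label: 0 = high quality (keep), 1 = low quality (remove)."""
    kept, removed = [], []
    for link, label in zip(links, labels):
        (kept if label == 0 else removed).append(link)
    return kept, removed

links = ["Beijing", "means", "random forest", "on a large scale"]
labels = [0, 1, 0, 1]  # hypothetical classifier output
kept, removed = filter_links(links, labels)
print(kept)     # ['Beijing', 'random forest']
print(removed)  # ['means', 'on a large scale']
```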
This embodiment extracts preset features of the internal links of a specified website, inputs those features into a trained preset machine learning model to classify the internal links, and filters the internal links according to the classification result. The low-quality internal links of the website can be filtered out according to the filtering result and the high-quality ones retained, which improves the internal link quality of the specified website and the user experience.
Embodiment 2
Fig. 2 is a flowchart of a method for filtering the internal links of a website provided by Embodiment 2 of the present invention, which specifically comprises the following steps:
S210, extract preset features of the internal links of the specified website.
S220, input the preset features of the internal links into a trained preset machine learning model to classify the internal links.
S230, filter the internal links according to the classification result.
S240, apply rule-based filtering to the low-quality internal links that were filtered out, picking out those whose similarity between the internal link entity and the source page entity exceeds a set threshold.
The low-quality internal links that were filtered out are filtered again by rule, that is, by the entity similarity feature. Internal links whose similarity between the internal link entity and the source page entity exceeds the set threshold are picked out, so that high-quality internal links are separated from the low-quality ones and are not deleted from the website's internal links.
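The second pass of step S240 can be sketched as a simple threshold rule over the links that the classifier rejected. The threshold of 0.5, the function name, and the similarity scores are assumptions; the patent only refers to a set threshold.

```python
def rescue_similar_links(removed, similarity, threshold=0.5):
    """Return (rescued, discarded): rejected links above the similarity threshold are rescued."""
    rescued = [link for link in removed if similarity.get(link, 0.0) > threshold]
    discarded = [link for link in removed if similarity.get(link, 0.0) <= threshold]
    return rescued, discarded

# Hypothetical links that the classifier marked as low quality, with similarity scores.
removed = ["decision tree", "means", "on a large scale"]
similarity = {"decision tree": 0.82, "means": 0.05, "on a large scale": 0.10}
rescued, discarded = rescue_similar_links(removed, similarity)
print(rescued)    # ['decision tree']
print(discarded)  # ['means', 'on a large scale']
```

Applying the rule only to rejected links keeps the classifier's positive decisions untouched and limits the rule to correcting false rejections.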
This embodiment extracts preset features of the internal links of a specified website, inputs those features into a trained preset machine learning model to classify the internal links, filters the internal links according to the classification result, and then applies rule-based filtering to the low-quality internal links that were filtered out, picking out those whose similarity between the internal link entity and the source page entity exceeds a set threshold. This improves the internal link quality of the specified website and, compared with Embodiment 1, avoids deleting high-quality internal links from the website's internal links.
Embodiment 3
Fig. 3 is a flowchart of a method for filtering the internal links of a website provided by Embodiment 3 of the present invention. This embodiment builds on Embodiment 1, with the preset machine learning model being a random forest model, and specifically comprises the following steps:
S310, extract preset features of the internal links of the specified website.
S320, input the preset features of each internal link into the trained random forest model to obtain the classification result of that internal link.
In machine learning, a random forest model consists of many decision trees. Because these trees are built by a randomized method, they are also called random decision trees. The decision trees in a random forest are independent of one another. When test data enters the random forest, each decision tree classifies it separately, and the class that receives the most votes across all trees is the final result. A random forest is therefore a classifier comprising multiple decision trees whose output class is the mode of the classes output by the individual trees. A random forest can handle attributes with discrete values as well as attributes with continuous values.
A decision tree is in effect a method of partitioning space with hyperplanes: each split divides the current space in two, as in the decision tree shown in Fig. 4. Fig. 4 is an example diagram of a decision tree in the random forest used in the method for filtering the internal links of a website provided by the embodiments of the present invention. As shown in Fig. 4, the classification result of each decision tree is obtained from a leaf node of the tree, while the root node and intermediate nodes represent preset features of the internal link.
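The structure just described, with feature tests at the root and intermediate nodes and class labels at the leaves, can be sketched as a nested dictionary walked from root to leaf. The tree below is hand-written for illustration and is not the tree of Fig. 4; the feature names, thresholds, and label convention (0 = high quality, 1 = low quality) are assumptions.

```python
def classify(node, features):
    """Walk from the root to a leaf, taking one binary split per internal node."""
    while "label" not in node:
        # Each internal node compares one feature against a threshold,
        # which splits the current region of feature space in two.
        branch = "left" if features[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

# Hypothetical tree: internal nodes test features, leaves carry the class label.
tree = {
    "feature": "url_verified", "threshold": 0.5,
    "left": {"label": 1},  # URL failed verification: low quality
    "right": {
        "feature": "entity_similarity", "threshold": 0.3,
        "left": {"label": 1},   # low similarity to the source page: low quality
        "right": {"label": 0},  # verified and similar: high quality
    },
}

good = classify(tree, {"url_verified": 1, "entity_similarity": 0.8})
bad = classify(tree, {"url_verified": 0, "entity_similarity": 0.9})
print(good, bad)  # 0 1
```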
Advantages of the random forest model: it is well suited to multi-class classification problems; training and prediction are fast; it tolerates faulty training data well and is an effective method for estimating missing data, keeping its accuracy even when a large proportion of the data set is missing; it can process large data sets efficiently; it can detect interactions between features and their importance; and it is simple to implement and easy to parallelize.
When a random forest model is used to classify the internal links of a specified website, the feature values of all preset features of one internal link are first input into the random forest model to obtain the classification result of that link. The feature values of all preset features of the next internal link are then input to obtain its classification result. Each internal link is classified in turn in this way until the classification results of all internal links of the specified website have been obtained.
The classification result of each decision tree can use 0 and 1 to represent a high-quality internal link and a low-quality internal link respectively, and the final classification result is obtained by tallying the classification results across the whole random forest model.
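The vote tally across trees can be sketched as taking the most common label. In the sketch below, three hand-written one-rule functions stand in for trained decision trees; they, the thresholds, and the feature names are hypothetical, and the labels follow the convention above (0 = high quality, 1 = low quality).

```python
from collections import Counter

def forest_predict(trees, features):
    """Return the label that most trees vote for (the mode of the per-tree outputs)."""
    votes = Counter(tree(features) for tree in trees)
    return votes.most_common(1)[0][0]

# Stand-ins for trained decision trees; an odd count avoids ties with two classes.
trees = [
    lambda f: 0 if f["url_verified"] else 1,
    lambda f: 0 if f["entity_similarity"] > 0.3 else 1,
    lambda f: 0 if f["daily_visits"] > 100 else 1,
]

# Classify each internal link in turn, as the text describes.
link_features = [
    {"url_verified": True, "entity_similarity": 0.7, "daily_visits": 20},
    {"url_verified": False, "entity_similarity": 0.1, "daily_visits": 500},
]
results = [forest_predict(trees, f) for f in link_features]
print(results)  # [0, 1]
```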
S330, filter the internal links according to the classification result.
This embodiment extracts preset features of the internal links of a specified website, inputs the preset features of each internal link into the trained random forest model to obtain the classification result of that link, and filters the internal links according to the classification result. The low-quality internal links of the website can be filtered out according to the filtering result and the high-quality ones retained, which improves the internal link quality of the specified website, and classifying the internal links with a random forest model makes the classification results more accurate. Taking Baidupedia internal link data as an example, an online test was run on part of the low-quality internal links obtained with the method of this embodiment: the overall click-through rate fell by 1% relative to before these low-quality internal links were removed. After the low-quality internal links in Baidupedia were deleted, the share of low-quality internal links fell from 25.7% to 7.6%, improving the quality of Baidupedia's internal links.
Embodiment 4
Fig. 5 is a schematic structural diagram of a device for filtering the internal links of a website provided by Embodiment 4 of the present invention. As shown in Fig. 5, the device provided by this embodiment comprises: a feature extraction module 510, an internal link classification module 520, and an internal link filtering module 530.
The feature extraction module 510 is configured to extract preset features of the internal links of a specified website;
the internal link classification module 520 is configured to input the preset features of the internal links into a trained preset machine learning model to classify the internal links;
the internal link filtering module 530 is configured to filter the internal links according to the classification result.
Preferably, the device further comprises:
a rule-based filtering module, configured to, after the internal links have been filtered, apply rule-based filtering to the low-quality internal links that were filtered out, picking out those whose similarity between the internal link entity and the source page entity exceeds a set threshold.
The preset features preferably include: a named-entity recognition feature, an average daily visits feature of the internal link page, a tf-idf feature of the internal link text, a classification feature of the internal link text, a link URL verification feature of the internal link, and an entity similarity feature.
The preset machine learning model preferably includes a random forest model or an SVM model.
Preferably, the preset machine learning model is a random forest model;
the internal link classification module is specifically configured to:
input the preset features of each internal link into the trained random forest model to obtain the classification result of that internal link.
The above product can perform the method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the method performed.
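The module layout of the device can be sketched as one class that composes three callables, with the rule-based module optional. The class name, the callable interfaces, and the sample links are hypothetical; module numbers 510 to 530 from Fig. 5 are noted in comments only for orientation.

```python
class LinkFilterDevice:
    """Composes feature extraction, classification, filtering, and an optional rule pass."""

    def __init__(self, extract_features, classify, rule_filter=None):
        self.extract_features = extract_features  # role of feature extraction module 510
        self.classify = classify                  # role of classification module 520
        self.rule_filter = rule_filter            # optional rule-based filtering module

    def run(self, links):
        # Role of filtering module 530: label 0 keeps a link, label 1 removes it.
        labels = [self.classify(self.extract_features(link)) for link in links]
        kept = [link for link, y in zip(links, labels) if y == 0]
        removed = [link for link, y in zip(links, labels) if y == 1]
        if self.rule_filter is not None:
            # Move rejected links that pass the rule back into the kept set.
            kept = kept + [link for link in removed if self.rule_filter(link)]
            removed = [link for link in removed if not self.rule_filter(link)]
        return kept, removed

# Hypothetical links carrying precomputed feature values.
links = [
    {"text": "Beijing", "url_verified": True, "similarity": 0.6},
    {"text": "means", "url_verified": False, "similarity": 0.1},
    {"text": "decision tree", "url_verified": False, "similarity": 0.9},
]
device = LinkFilterDevice(
    extract_features=lambda link: link,
    classify=lambda f: 0 if f["url_verified"] else 1,
    rule_filter=lambda link: link["similarity"] > 0.5,
)
kept, removed = device.run(links)
print([link["text"] for link in kept])     # ['Beijing', 'decision tree']
print([link["text"] for link in removed])  # ['means']
```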
Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept. The scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for filtering the internal links of a website, characterized in that the method comprises:
extracting preset features of the internal links of a specified website;
inputting the preset features of the internal links into a trained preset machine learning model to classify the internal links;
filtering the internal links according to the classification result.
2. The method according to claim 1, characterized in that, after filtering the internal links, the method further comprises:
applying rule-based filtering to the low-quality internal links that were filtered out, picking out those whose similarity between the internal link entity and the source page entity exceeds a set threshold.
3. The method according to claim 1 or 2, characterized in that the preset features comprise: a named-entity recognition feature, an average daily visits feature of the internal link page, a term frequency-inverse document frequency (tf-idf) feature of the internal link text, a classification feature of the internal link text, a link Uniform Resource Locator (URL) verification feature of the internal link, and an entity similarity feature.
4. The method according to claim 3, characterized in that the preset machine learning model comprises a random forest model or a support vector machine (SVM) model.
5. The method according to claim 4, characterized in that the preset machine learning model is a random forest model;
inputting the preset features of the internal links into a trained preset machine learning model to classify the internal links comprises:
inputting the preset features of each internal link into the trained random forest model to obtain the classification result of that internal link.
6. A device for filtering the internal links of a website, characterized in that the device comprises:
a feature extraction module, configured to extract preset features of the internal links of a specified website;
an internal link classification module, configured to input the preset features of the internal links into a trained preset machine learning model to classify the internal links;
an internal link filtering module, configured to filter the internal links according to the classification result.
7. The device according to claim 6, characterized in that the device further comprises:
a rule-based filtering module, configured to, after the internal links have been filtered, apply rule-based filtering to the low-quality internal links that were filtered out, picking out those whose similarity between the internal link entity and the source page entity exceeds a set threshold.
8. The device according to claim 6 or 7, characterized in that the preset features comprise: a named-entity recognition feature, an average daily visits feature of the internal link page, a tf-idf feature of the internal link text, a classification feature of the internal link text, a link URL verification feature of the internal link, and an entity similarity feature.
9. The device according to claim 8, characterized in that the preset machine learning model comprises a random forest model or an SVM model.
10. The device according to claim 9, characterized in that the preset machine learning model is a random forest model;
the internal link classification module is specifically configured to:
input the preset features of each internal link into the trained random forest model to obtain the classification result of that internal link.
CN201510633911.2A 2015-09-29 2015-09-29 Method and device for filtering website internal links Active CN105183894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510633911.2A CN105183894B (en) 2015-09-29 2015-09-29 Method and device for filtering website internal links

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510633911.2A CN105183894B (en) 2015-09-29 2015-09-29 Method and device for filtering website internal links

Publications (2)

Publication Number Publication Date
CN105183894A true CN105183894A (en) 2015-12-23
CN105183894B CN105183894B (en) 2020-03-10

Family

ID=54905975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510633911.2A Active CN105183894B (en) 2015-09-29 2015-09-29 Method and device for filtering website internal links

Country Status (1)

Country Link
CN (1) CN105183894B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102004764A (en) * 2010-11-04 2011-04-06 中国科学院计算机网络信息中心 Internet bad information detection method and system
WO2012083892A1 (en) * 2010-12-24 2012-06-28 北大方正集团有限公司 Method and device for filtering harmful information
CN102654875A (en) * 2011-03-04 2012-09-05 北京百度网讯科技有限公司 Method and device for automatically processing inner link of web text
CN103116638A (en) * 2013-02-19 2013-05-22 人民搜索网络股份公司 Webpage screening method and device thereof

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110138794A (en) * 2019-05-22 2019-08-16 杭州安恒信息技术股份有限公司 A kind of counterfeit website identification method, device, equipment and readable storage medium storing program for executing
CN113919347A (en) * 2021-12-14 2022-01-11 山东捷瑞数字科技股份有限公司 Method and device for extracting and matching internal link words of text data
CN113919347B (en) * 2021-12-14 2022-04-05 山东捷瑞数字科技股份有限公司 Method and device for extracting and matching internal link words of text data

Also Published As

Publication number Publication date
CN105183894B (en) 2020-03-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant