CN105183894A - Method and device for filtering internal chains of website - Google Patents
Method and device for filtering internal chains of website
- Publication number
- CN105183894A
- Authority
- CN
- China
- Prior art keywords
- chain
- interior chain
- feature
- setting
- interior
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9558—Details of hyperlinks; Management of linked annotations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Abstract
The invention discloses a method and device for filtering the internal links (internal chains) of a website. The method comprises: extracting preset features of the internal links of a specified website; inputting the preset features of the internal links into a trained preset machine learning model to classify the internal links; and filtering the internal links according to the classification result. With this method and device, low-quality internal links in the website can be filtered out and high-quality internal links retained according to the filtering result, which improves the quality of the internal links of the specified website and improves the user experience.
Description
Technical field
The embodiments of the present invention relate to internal link optimization technology, and in particular to a method and device for filtering the internal links of a website.
Background technology
An internal link is a link between content pages under the domain name of the same website. A reasonable internal link structure can improve a website's indexing by search engines and its weight, increase average daily visits, and raise overall traffic. At the same time, building internal links requires respect for the user experience and attention to link relevance: highly relevant links help search engine indexing and help the user experience, and thereby raise the website's page views. Conversely, meaningless internal links of low relevance have little effect on clicks or on the page topology, yet they harm the user experience. They are low-quality internal links and reduce the internal link quality of the website.
Summary of the invention
In view of this, the embodiments of the present invention provide a method and device for filtering the internal links of a website, so as to improve the website's internal link quality.
In a first aspect, an embodiment of the present invention provides a method for filtering the internal links of a website, the method comprising:
extracting preset features of the internal links of a specified website;
inputting the preset features of the internal links into a trained preset machine learning model to classify the internal links; and
filtering the internal links according to the classification result.
In a second aspect, an embodiment of the present invention further provides a device for filtering the internal links of a website, the device comprising:
a feature extraction module, configured to extract preset features of the internal links of a specified website;
an internal link classification module, configured to input the preset features of the internal links into a trained preset machine learning model to classify the internal links; and
an internal link filtering module, configured to filter the internal links according to the classification result.
The method and device for filtering the internal links of a website provided by the embodiments of the present invention extract the preset features of the internal links of a specified website, input those preset features into a trained preset machine learning model to classify the internal links, and filter the internal links according to the classification result. The high-quality internal links in the website can be retained according to the filtering result, which improves the internal link quality of the specified website.
Brief description of the drawings
Fig. 1 is a flowchart of a method for filtering the internal links of a website provided by Embodiment one of the present invention;
Fig. 2 is a flowchart of a method for filtering the internal links of a website provided by Embodiment two of the present invention;
Fig. 3 is a flowchart of a method for filtering the internal links of a website provided by Embodiment three of the present invention;
Fig. 4 is an example of a decision tree in the random forest used in the method for filtering the internal links of a website provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a device for filtering the internal links of a website provided by Embodiment four of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire content.
Embodiment one
Fig. 1 is a flowchart of a method for filtering the internal links of a website provided by Embodiment one of the present invention. This embodiment applies to filtering the internal links of a specified website, and the method can be performed by a computer. It specifically comprises the following steps:
S110: extract the preset features of the internal links of the specified website.
The preset features of the internal links of the specified website (for example, Baidupedia) are extracted. The preset features may include features related to the internal link text, features related to the link URL (Uniform Resource Locator) of the internal link, and so on. In other words, the preset features should be properties that reflect how good the quality of an internal link is.
The preset features preferably include: a proper name recognition feature, an average daily visits feature of the internal link page, a tfidf feature of the internal link text, a category feature of the internal link text, a link URL verification feature of the internal link, and an entity similarity feature. These features may be extracted from the internal link text or the internal link URL, or may be other statistics based on the internal link.
The proper name recognition feature refers to identifying entities with a specific meaning in the internal link text, mainly person names, place names, organization names, and the like. Table 1 gives examples of recognition with a proper name recognition tool. Internal link texts whose recognition result is NOR or PHRASE are more likely to be low-quality internal links. Table 2 shows the proper name recognition results for some internal link texts.
Table 1 Examples of recognition with a proper name recognition tool
Internal link text | Recognition result |
---|---|
High round | PER (person name) |
Beijing | LOC (place) |
Means | NOR (not a proper name) |
On a large scale | PHRASE (phrase) |
Table 2 Proper name recognition results for some internal link texts
The average daily visits feature of the internal link page reflects how much attention people pay to the linked page. An internal link whose page has low average daily visits is generally a low-quality internal link that people pay little attention to. Table 3 shows the average daily visits feature of some internal links.
Table 3 Average daily visits feature of some internal links
The tfidf feature of the internal link text: tfidf is a statistical method for evaluating how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to its frequency across the corpus. The main idea of tfidf is that if a word or phrase appears frequently in one article and rarely in other articles, it is considered to have good class-discriminating power and to be suitable for classification. tfidf is in fact tf*idf, where tf (term frequency) is the frequency with which a term appears in a document, and idf (inverse document frequency) is based on the idea that the fewer documents contain a term, the larger its idf and the better its class-discriminating power. A higher tfidf value therefore indicates greater class-discriminating power. Table 4 shows the tfidf feature of some internal links.
Table 4 tfidf feature of some internal links
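The tf*idf computation described above can be sketched as follows. The patent does not give an exact formula, so the unsmoothed logarithmic idf used here is an assumption, and the function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(term, document, corpus):
    """Return tf * idf for `term` in `document`.

    `document` is a list of tokens and `corpus` is a list of such documents.
    """
    if not document:
        return 0.0
    # tf: relative frequency of the term within this document.
    tf = Counter(document)[term] / len(document)
    # df: number of corpus documents that contain the term at least once.
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    # idf grows as fewer documents contain the term
    # (unsmoothed log variant, chosen here as an assumption).
    idf = math.log(len(corpus) / df)
    return tf * idf


# Toy corpus: "beijing" appears in every document, "city" in only one.
corpus = [["beijing", "city"], ["beijing", "means"], ["beijing", "scale"]]
common = tf_idf("beijing", corpus[0], corpus)
rare = tf_idf("city", corpus[0], corpus)
```

In this toy corpus the term that occurs in every document scores zero, while the term confined to one document scores higher, which matches the class-discrimination intuition in the text.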
The category feature of the internal link text: the category of the internal link text can serve as a one-dimensional feature. In general, internal links in relatively popular categories (categories with practical meaning, such as idioms or ancient official posts) are high-quality internal links. When the internal link text is classified, the classification result can be obtained from a dictionary. Table 5 shows the category feature of the internal link text for some internal links.
Table 5 Category feature of the internal link text for some internal links
The link URL verification feature of the internal link: whether the link URL of an internal link really exists is determined by checking whether the link URL is present in the website's internal URL list and whether the link URL is unique. If the link URL of an internal link is not present in the website's internal URL list, the internal link is judged to be low-quality. If the link URL of an internal link is not unique (for example, in Baidupedia the link URL of some internal links corresponds to multiple senses of an entry), the internal link is also judged to be low-quality. Table 6 shows the link URL verification feature of some internal links.
Table 6 Link URL verification feature of some internal links
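A minimal sketch of the two checks described above (presence in the site's internal URL list, and uniqueness) might look like this. The function name, the `sense_counts` mapping, the example.com URLs, and the choice of 0 for a verified link and 1 otherwise are all assumptions made for illustration.

```python
def url_verification_feature(link_url, site_urls, sense_counts):
    """Return 0 if the link URL is verified, 1 otherwise.

    site_urls:    set of URLs known to exist inside the website (hypothetical input).
    sense_counts: dict mapping a URL to the number of entry senses it resolves to
                  (hypothetical input).
    """
    # Check 1: the URL must be present in the site's internal URL list.
    exists = link_url in site_urls
    # Check 2: the URL must resolve to exactly one sense, i.e. be unique.
    unique = sense_counts.get(link_url, 0) == 1
    return 0 if exists and unique else 1


site_urls = {"https://example.com/item/a", "https://example.com/item/b"}
sense_counts = {"https://example.com/item/a": 1, "https://example.com/item/b": 3}

verified = url_verification_feature("https://example.com/item/a", site_urls, sense_counts)
ambiguous = url_verification_feature("https://example.com/item/b", site_urls, sense_counts)
missing = url_verification_feature("https://example.com/item/zzz", site_urls, sense_counts)
```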
The entity similarity feature refers to the similarity between the internal link entity and the source page entity. The degree of similarity between the two reflects how relevant the internal link is to the source page, and an internal link with low relevance can be considered a low-quality internal link. Here the internal link entity is the internal link text, and the source page entity is the source page text. Table 7 shows the entity similarity feature of some internal links:
Table 7 Entity similarity feature of some internal links
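The patent does not say how the similarity between the two entities is measured. As a stand-in only, the sketch below uses Jaccard similarity over character bigrams; this measure and the names `char_bigrams` and `entity_similarity` are assumptions, not the patented method.

```python
def char_bigrams(text):
    """Return the set of character bigrams of `text` (the whole text if shorter than 2)."""
    text = text.strip().lower()
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}


def entity_similarity(link_text, source_text):
    """Jaccard similarity of character bigrams, in the range [0, 1]."""
    a, b = char_bigrams(link_text), char_bigrams(source_text)
    return len(a & b) / len(a | b)


same = entity_similarity("Beijing", "Beijing")
partial = entity_similarity("Beijing University", "Beijing")
unrelated = entity_similarity("abc", "xyz")
```

Any other text or entity similarity measure could be substituted, as long as a higher score means the internal link is more relevant to the source page.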
S120: input the preset features of the internal links into the trained preset machine learning model to classify the internal links.
According to the preset features of the internal links, the preset machine learning model classifies the internal links into high-quality internal links and low-quality internal links. Beforehand, the preset machine learning model is trained on a large number of samples with known results to obtain the trained model, so that its classification results are as good as possible.
The preset machine learning model preferably includes a random forest model or an SVM (Support Vector Machine) model. In machine learning, a random forest model is a classifier comprising multiple decision trees, whose output class is the mode of the classes output by the individual trees. An SVM is a supervised learning model commonly used for pattern recognition, classification, and regression analysis.
Before the preset features of an internal link are input into the trained preset machine learning model, the feature values of those preset features must first be obtained. For the proper name recognition feature, the different recognition results can be defined as different numeric values and a threshold set, with values above or below the threshold indicating a low-quality internal link. For the category feature of the internal link text, different categories can likewise be defined as different numeric values and a corresponding threshold set, with values above or below the threshold indicating a low-quality internal link. For the average daily visits feature of the internal link page, a threshold is set, and links exceeding it are high-quality internal links. The tfidf feature of the internal link text can be represented by the tfidf value with a threshold set, and links exceeding it are high-quality internal links. The feature value of the link URL verification feature can be represented by 0 or 1, where 0 indicates a high-quality internal link and 1 is used otherwise. For the entity similarity feature, a similarity threshold is set, and links exceeding it are high-quality internal links.
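Turning the six preset features into numeric feature values could be sketched as below. The numeric codes in `NER_CODES`, the `InternalLink` record, and its field names are hypothetical; the text only says that recognition results and categories are mapped to different numbers.

```python
from dataclasses import dataclass

# Hypothetical numeric codes for the recognition results listed in Table 1.
NER_CODES = {"PER": 0, "LOC": 1, "NOR": 2, "PHRASE": 3}


@dataclass
class InternalLink:
    text: str            # internal link text
    ner_tag: str         # proper name recognition result, e.g. "LOC"
    daily_visits: float  # average daily visits of the linked page
    tfidf: float         # tfidf value of the link text
    category_id: int     # numeric id of the link text category
    url_check: int       # link URL verification value (0 or 1)
    similarity: float    # similarity between link entity and source page entity


def to_feature_vector(link):
    """Encode one internal link as a list of six numeric feature values."""
    return [
        float(NER_CODES[link.ner_tag]),
        float(link.daily_visits),
        float(link.tfidf),
        float(link.category_id),
        float(link.url_check),
        float(link.similarity),
    ]


example = InternalLink("Beijing", "LOC", 1200.0, 0.42, 3, 0, 0.8)
vector = to_feature_vector(example)
```

A vector of this shape is what would be passed to the trained classifier, one vector per internal link.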
S130: filter the internal links according to the classification result.
According to the classification result, the internal links are filtered to obtain the low-quality or high-quality internal links, so that the high-quality internal links in the website are retained and the website's internal link quality is improved.
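The filtering step amounts to splitting the links by predicted label. The sketch below assumes labels of 0 for high-quality and 1 for low-quality internal links; the function name and sample data are hypothetical.

```python
def filter_internal_links(links, labels):
    """Split `links` into (kept, removed) using one predicted label per link.

    Label 0 is treated as high-quality (kept); any other label is removed.
    """
    if len(links) != len(labels):
        raise ValueError("links and labels must have the same length")
    kept = [link for link, label in zip(links, labels) if label == 0]
    removed = [link for link, label in zip(links, labels) if label != 0]
    return kept, removed


kept, removed = filter_internal_links(["Beijing", "Means", "High round"], [0, 1, 0])
```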
In this embodiment, the preset features of the internal links of the specified website are extracted, the preset features are input into the trained preset machine learning model to classify the internal links, and the internal links are filtered according to the classification result. Low-quality internal links in the website can thus be filtered out and high-quality internal links retained according to the filtering result, which improves the internal link quality of the specified website and the user experience.
Embodiment two
Fig. 2 is a flowchart of a method for filtering the internal links of a website provided by Embodiment two of the present invention. It specifically comprises the following steps:
S210: extract the preset features of the internal links of the specified website.
S220: input the preset features of the internal links into the trained preset machine learning model to classify the internal links.
S230: filter the internal links according to the classification result.
S240: apply rule-based filtering to the low-quality internal links that were filtered out, picking out those whose similarity between the internal link entity and the source page entity exceeds a set threshold.
The low-quality internal links that were filtered out are further subjected to rule-based filtering. That is, they are filtered again by the entity similarity feature to pick out the internal links whose similarity between the internal link entity and the source page entity exceeds the set threshold. High-quality internal links are thereby separated out of the low-quality set, which avoids deleting high-quality internal links from the website.
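The second, rule-based pass can be sketched as a threshold test over the links already marked low-quality. The threshold value 0.6, the function name, and the sample similarity scores are hypothetical; the patent only says that a threshold is set.

```python
def rescue_similar_links(low_quality_links, similarity, threshold=0.6):
    """Split low-quality links into (rescued, still_low) by entity similarity.

    similarity: dict mapping each link to its similarity with the source page.
    threshold:  hypothetical cut-off; links strictly above it are rescued.
    """
    rescued = [l for l in low_quality_links if similarity[l] > threshold]
    still_low = [l for l in low_quality_links if similarity[l] <= threshold]
    return rescued, still_low


scores = {"Means": 0.1, "Capital city": 0.9, "On a large scale": 0.6}
rescued, still_low = rescue_similar_links(list(scores), scores)
```

Only the links in `still_low` would then be deleted from the website, so a link the classifier misjudged can still be retained if it is closely related to its source page.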
In this embodiment, the preset features of the internal links of the specified website are extracted, the preset features are input into the trained preset machine learning model to classify the internal links, the internal links are filtered according to the classification result, and rule-based filtering is then applied to the low-quality internal links that were filtered out, picking out those whose similarity between the internal link entity and the source page entity exceeds the set threshold. This improves the internal link quality of the specified website and, compared with Embodiment one, avoids deleting high-quality internal links from the website.
Embodiment three
Fig. 3 is a flowchart of a method for filtering the internal links of a website provided by Embodiment three of the present invention. This embodiment builds on Embodiment one, with a random forest model as the preset machine learning model. It specifically comprises the following steps:
S310: extract the preset features of the internal links of the specified website.
S320: input the preset features of each internal link into the trained random forest model to obtain the classification result of that internal link.
In machine learning, a random forest model consists of many decision trees. Because these decision trees are built by a random method, they are also called random decision trees. The decision trees in a random forest model are not correlated with one another. When test data enters the random forest model, each decision tree classifies the data separately, and the class that receives the most votes among all the decision trees is taken as the final result. A random forest model is therefore a classifier comprising multiple decision trees, whose output class is the mode of the classes output by the individual trees. A random forest model can handle attributes with discrete values as well as attributes with continuous values.
A decision tree is in effect a method of partitioning space with hyperplanes: each split divides the current space in two, as in the decision tree shown in Fig. 4. Fig. 4 is an example of a decision tree in the random forest used in the method for filtering the internal links of a website provided by an embodiment of the present invention. As shown in Fig. 4, the classification result of each decision tree is obtained from its leaf nodes, while the root node and intermediate nodes represent the preset features of the internal link.
The advantages of the random forest model are as follows: it is well suited to multi-class problems; training and prediction are fast; it tolerates faults in the training data well, being an effective method for estimating missing data that maintains accuracy even when a large proportion of the data set is missing; it handles large data sets effectively; it can detect interactions between features and their importance; and it is simple to implement and easy to parallelize.
When the random forest model is used to classify the internal links of the specified website, the feature values of all preset features of one internal link are first input into the random forest model to obtain that link's classification result. The feature values of all preset features of the next internal link are then input to obtain its classification result, and so on. Each internal link is classified in turn until classification results have been obtained for all internal links of the specified website.
The classification result of each decision tree can be represented by 0 and 1 for high-quality and low-quality internal links respectively, and the final classification result is obtained by tallying the classification results across the whole random forest model.
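The voting procedure above can be illustrated with a toy forest. The tuple-based tree encoding, the thresholds, and the two-feature vectors (`[similarity, daily_visits]`) are hypothetical, and the trees are written by hand rather than trained; this shows only how per-tree 0/1 results are tallied, not how the forest is built.

```python
from collections import Counter

# A leaf is an int label (0 = high-quality, 1 = low-quality). An internal node is
# a tuple (feature_index, threshold, left_subtree, right_subtree): go left when
# the feature value is <= threshold, otherwise go right.


def predict_tree(tree, features):
    """Walk one decision tree from the root down to a leaf and return its label."""
    node = tree
    while not isinstance(node, int):
        index, threshold, left, right = node
        node = left if features[index] <= threshold else right
    return node


def predict_forest(trees, features):
    """Return the label that the most trees voted for (the mode of the votes)."""
    votes = Counter(predict_tree(tree, features) for tree in trees)
    return votes.most_common(1)[0][0]


def classify_all(trees, feature_vectors):
    """Classify each internal link in turn, one feature vector per link."""
    return [predict_forest(trees, features) for features in feature_vectors]


# Three hand-written trees over features [similarity, daily_visits].
trees = [
    (0, 0.5, 1, 0),
    (1, 100.0, 1, 0),
    (0, 0.2, 1, (1, 50.0, 1, 0)),
]
labels = classify_all(trees, [[0.8, 30.0], [0.9, 500.0]])
```

Using an odd number of trees avoids ties in a two-class vote.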
S330: filter the internal links according to the classification result.
In this embodiment, the preset features of the internal links of the specified website are extracted, the preset features of each internal link are input into the trained random forest model to obtain the classification result of that internal link, and the internal links are filtered according to the classification result. Low-quality internal links in the website can thus be filtered out and high-quality internal links retained, which improves the internal link quality of the specified website; and because the internal links are classified by a random forest model, the classification results are more accurate. Taking Baidupedia internal link data as an example, an online test was run on part of the low-quality internal links identified by the method of this embodiment: the overall click-through rate fell by 1% compared with the rate before this part of the low-quality internal links was removed. After the low-quality internal links in Baidupedia were deleted, the proportion of low-quality internal links fell from the original 25.7% to 7.6%, improving the quality of Baidupedia's internal links.
Embodiment four
Fig. 5 is a schematic structural diagram of a device for filtering the internal links of a website provided by Embodiment four of the present invention. As shown in Fig. 5, the device provided by this embodiment comprises a feature extraction module 510, an internal link classification module 520, and an internal link filtering module 530.
The feature extraction module 510 is configured to extract the preset features of the internal links of a specified website;
the internal link classification module 520 is configured to input the preset features of the internal links into the trained preset machine learning model to classify the internal links;
the internal link filtering module 530 is configured to filter the internal links according to the classification result.
Preferably, the device further comprises:
a rule-based filtering module, configured to apply, after the internal links have been filtered, rule-based filtering to the low-quality internal links that were filtered out, picking out those whose similarity between the internal link entity and the source page entity exceeds a set threshold.
The preset features preferably include: a proper name recognition feature, an average daily visits feature of the internal link page, a tfidf feature of the internal link text, a category feature of the internal link text, a link URL verification feature of the internal link, and an entity similarity feature.
The preset machine learning model preferably includes a random forest model or an SVM model.
Preferably, the preset machine learning model is a random forest model;
the internal link classification module is specifically configured to:
input the preset features of each internal link into the trained random forest model to obtain the classification result of that internal link.
The above product can perform the method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the method performed.
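One way to picture how the modules fit together is a small class that chains them. The class name, the callable-based interfaces, and the toy feature extractor and classifier below are hypothetical and stand in for modules 510, 520, 530 and the optional rule-based filtering module; labels of 0 for high-quality and 1 for low-quality are assumed.

```python
class InternalLinkFilterDevice:
    """Chains feature extraction, classification, filtering, and an optional rule pass."""

    def __init__(self, extract_features, classify, rule_filter=None):
        self.extract_features = extract_features  # link -> feature vector
        self.classify = classify                  # feature vector -> 0 or 1
        self.rule_filter = rule_filter            # link -> True if it should be rescued

    def run(self, links):
        """Return (kept, removed) for the given internal links."""
        kept, removed = [], []
        for link in links:
            label = self.classify(self.extract_features(link))
            (kept if label == 0 else removed).append(link)
        if self.rule_filter is not None:
            # Rule pass: move rescued links from the removed set back to the kept set.
            rescued = [link for link in removed if self.rule_filter(link)]
            removed = [link for link in removed if not self.rule_filter(link)]
            kept.extend(rescued)
        return kept, removed


# Toy stand-ins: the only feature is text length, and short texts are low-quality.
device = InternalLinkFilterDevice(
    extract_features=lambda link: [len(link)],
    classify=lambda features: 0 if features[0] > 3 else 1,
)
kept, removed = device.run(["Beijing", "of", "means"])
```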
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from its concept; the scope of the present invention is determined by the appended claims.
Claims (10)
1. A method for filtering the internal links of a website, characterized in that the method comprises:
extracting preset features of the internal links of a specified website;
inputting the preset features of the internal links into a trained preset machine learning model to classify the internal links; and
filtering the internal links according to the classification result.
2. The method according to claim 1, characterized in that, after filtering the internal links, the method further comprises:
applying rule-based filtering to the low-quality internal links that were filtered out, picking out those whose similarity between the internal link entity and the source page entity exceeds a set threshold.
3. The method according to claim 1 or 2, characterized in that the preset features comprise: a proper name recognition feature, an average daily visits feature of the internal link page, a term frequency-inverse document frequency (tfidf) feature of the internal link text, a category feature of the internal link text, a link Uniform Resource Locator (URL) verification feature of the internal link, and an entity similarity feature.
4. The method according to claim 3, characterized in that the preset machine learning model comprises a random forest model or a support vector machine (SVM) model.
5. The method according to claim 4, characterized in that the preset machine learning model is a random forest model;
inputting the preset features of the internal links into the trained preset machine learning model to classify the internal links comprises:
inputting the preset features of each internal link into the trained random forest model to obtain the classification result of that internal link.
6. A device for filtering the internal links of a website, characterized in that the device comprises:
a feature extraction module, configured to extract preset features of the internal links of a specified website;
an internal link classification module, configured to input the preset features of the internal links into a trained preset machine learning model to classify the internal links; and
an internal link filtering module, configured to filter the internal links according to the classification result.
7. The device according to claim 6, characterized in that the device further comprises:
a rule-based filtering module, configured to apply, after the internal links have been filtered, rule-based filtering to the low-quality internal links that were filtered out, picking out those whose similarity between the internal link entity and the source page entity exceeds a set threshold.
8. The device according to claim 6 or 7, characterized in that the preset features comprise: a proper name recognition feature, an average daily visits feature of the internal link page, a tfidf feature of the internal link text, a category feature of the internal link text, a link URL verification feature of the internal link, and an entity similarity feature.
9. The device according to claim 8, characterized in that the preset machine learning model comprises a random forest model or an SVM model.
10. The device according to claim 9, characterized in that the preset machine learning model is a random forest model;
the internal link classification module is specifically configured to:
input the preset features of each internal link into the trained random forest model to obtain the classification result of that internal link.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510633911.2A CN105183894B (en) | 2015-09-29 | 2015-09-29 | Method and device for filtering website internal links |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510633911.2A CN105183894B (en) | 2015-09-29 | 2015-09-29 | Method and device for filtering website internal links |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105183894A true CN105183894A (en) | 2015-12-23 |
CN105183894B CN105183894B (en) | 2020-03-10 |
Family
ID=54905975
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510633911.2A Active CN105183894B (en) | 2015-09-29 | 2015-09-29 | Method and device for filtering website internal links |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105183894B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110138794A (en) * | 2019-05-22 | 2019-08-16 | 杭州安恒信息技术股份有限公司 | A kind of counterfeit website identification method, device, equipment and readable storage medium storing program for executing |
CN113919347A (en) * | 2021-12-14 | 2022-01-11 | 山东捷瑞数字科技股份有限公司 | Method and device for extracting and matching internal link words of text data |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102004764A (en) * | 2010-11-04 | 2011-04-06 | 中国科学院计算机网络信息中心 | Internet bad information detection method and system |
WO2012083892A1 (en) * | 2010-12-24 | 2012-06-28 | 北大方正集团有限公司 | Method and device for filtering harmful information |
CN102654875A (en) * | 2011-03-04 | 2012-09-05 | 北京百度网讯科技有限公司 | Method and device for automatically processing inner link of web text |
CN103116638A (en) * | 2013-02-19 | 2013-05-22 | 人民搜索网络股份公司 | Webpage screening method and device thereof |
-
2015
- 2015-09-29 CN CN201510633911.2A patent/CN105183894B/en active Active
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110138794A (en) * | 2019-05-22 | 2019-08-16 | 杭州安恒信息技术股份有限公司 | A kind of counterfeit website identification method, device, equipment and readable storage medium storing program for executing |
CN113919347A (en) * | 2021-12-14 | 2022-01-11 | 山东捷瑞数字科技股份有限公司 | Method and device for extracting and matching internal link words of text data |
CN113919347B (en) * | 2021-12-14 | 2022-04-05 | 山东捷瑞数字科技股份有限公司 | Method and device for extracting and matching internal link words of text data |
Also Published As
Publication number | Publication date |
---|---|
CN105183894B (en) | 2020-03-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103914478B (en) | Webpage training method and system, webpage Forecasting Methodology and system | |
WO2017167067A1 (en) | Method and device for webpage text classification, method and device for webpage text recognition | |
CN102193936B (en) | Data classification method and device | |
US20170185680A1 (en) | Chinese website classification method and system based on characteristic analysis of website homepage | |
TWI689825B (en) | Method and device for obtaining document quality index | |
CN108984518A (en) | A kind of file classification method towards judgement document | |
US20180357302A1 (en) | Method and device for processing a topic | |
US20220147023A1 (en) | Method and device for identifying industry classification of enterprise and particular pollutants of enterprise | |
CN106250513A (en) | A kind of event personalization sorting technique based on event modeling and system | |
CN102495892A (en) | Webpage information extraction method | |
CN102411563A (en) | Method, device and system for identifying target words | |
CN104239485A (en) | Statistical machine learning-based internet hidden link detection method | |
CN107506472B (en) | Method for classifying browsed webpages of students | |
CN108319672B (en) | Mobile terminal bad information filtering method and system based on cloud computing | |
CN103886077B (en) | Short text clustering method and system | |
CN105447161A (en) | Data feature based intelligent information classification method | |
CN111767725A (en) | Data processing method and device based on emotion polarity analysis model | |
CN106021578A (en) | Improved text classification algorithm based on integration of cluster and membership degree | |
CN104142960A (en) | Internet data analysis system | |
CN112699232A (en) | Text label extraction method, device, equipment and storage medium | |
Dong et al. | An adult image detection algorithm based on Bag-of-Visual-Words and text information | |
CN104794209B (en) | Chinese microblogging mood sorting technique based on Markov logical network and system | |
CN103020286A (en) | Internet ranking list grasping system based on ranking website | |
Mahmoudi et al. | Web spam detection based on discriminative content and link features | |
CN110516710A (en) | Web page classification method, device, computer installation and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||