CN106339459A - Method for pre-classifying Chinese webpages based on keyword matching - Google Patents

Info

Publication number: CN106339459A
Authority: CN (China)
Prior art keywords: key word, key, value, tec, training
Legal status: Granted, active
Application number: CN201610741134.8A
Other languages: Chinese (zh)
Other versions: CN106339459B (en)
Inventors: 张云, 冯多, 木伟民, 王伟平
Current Assignee: Institute of Information Engineering of CAS
Original Assignee: Institute of Information Engineering of CAS
Priority date: 2016-08-26
Filing date: 2016-08-26
Publication date (CN106339459A): 2017-01-18
Grant publication date (CN106339459B): 2019-11-26

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Abstract

The invention relates to a method for pre-classifying Chinese webpages based on keyword matching. The method comprises the following steps: in a process of making a training set needed by a classifying algorithm, annotating keywords representing webpages in the webpages while manually annotating training webpages to generate a keyword table; extracting keywords occurring in the webpages according to the keyword table for each test webpage, and transferring a tag of the training set to the test webpage by performing keyword matching calculation with the training set; if classifying results of the training webpages are not given by a pre-classifying method, performing further classification calculation on the test webpages. By adopting the method, running time of classifying technologies with complicated calculation such as SVM, KNN and naive Bayesian classification is shortened, and meanwhile the accuracy and recall rate of the classifying results are increased.

Description

Method for pre-classifying Chinese webpages based on keyword matching
Technical field
The present invention relates to information processing in the computer field, and more particularly to a method for pre-classifying Chinese webpages based on keyword matching.
Background art
With the rapid development of the Internet, the amount of information stored as webpages is still growing explosively, so webpage classification has become an indispensable means for people to obtain useful information. The current mainstream classification algorithms are SVM, KNN and naive Bayes. Of these, SVM needs only a small training set and classifies English webpages very well. However, the precision and recall of Chinese webpage classification systems built around SVM both fall short of requirements. The reason is that English has natural word separators, whereas Chinese webpage text must first be segmented by a Chinese word segmenter before it can be vectorized. Even the best Chinese word segmenters cannot segment with complete accuracy, and this greatly degrades the results of Chinese webpage classification.
Summary of the invention
In view of the above problems, the present invention proposes a method for pre-classifying Chinese webpages based on keyword matching. The method can greatly reduce the running time of mainstream classification techniques while improving the precision and recall of the classification results.
The technical solution of the present invention is as follows:
A method for pre-classifying Chinese webpages based on keyword matching comprises the following steps:
1) annotate, for each training webpage tr in the training set trs, its class label tag and the keyword set kws that characterizes the class of the webpage, and generate the keyword table kwt;
2) extract, according to kwt, the keywords contained in each test webpage te in the test set tes to form the keyword set tek;
3) compute each 2-tuple (i.e. keyword pair) tec of tek and traverse the training set; transfer the tag of every tr whose kws contains this tec to the te corresponding to this tec, and store the tag in the tag set tags of that te;
4) count the frequency of the labels in tags and take the several most frequent labels, as required, as the pre-classification labels of the test webpage.
On the basis of the above technical solution, the present invention may also be improved as follows.
Further, step 1) also includes de-duplicating the keyword sets kws of all training webpages belonging to the same class before generating the keyword table kwt.
Further, the keyword table kwt is generated as follows:
1-1) create a new map m; take each keyword k in the kws of the first training webpage tr as a key of m, with the corresponding initial value set to 1;
1-2) for each keyword k in the kws of the second tr, first check whether m already contains k; if it does, add 1 to the value of the key-value pair whose key is k; if not, add the key-value pair <k, 1> to m;
1-3) repeat step 1-2) for the remaining tr, up to the last tr;
1-4) set a threshold s; when the value of a key-value pair is less than s, set the value to 0; otherwise set the value to 1.
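Steps 1-1) to 1-4) can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function name build_keyword_table, the sample keywords and the choice s = 2 are assumptions made for the example, and the threshold is applied to raw counts as the steps describe.

```python
from collections import Counter

def build_keyword_table(training_kws, s):
    """Count how many training webpages list each keyword, then binarise.

    training_kws: one keyword set (kws) per training webpage tr.
    s: count threshold from step 1-4).
    Returns the map m: keyword -> 1 if its count >= s, else 0.
    """
    counts = Counter()
    for kws in training_kws:
        for k in set(kws):          # steps 1-1) to 1-3): insert <k, 1> or add 1
            counts[k] += 1
    # step 1-4): values below s become 0, all others become 1
    return {k: (1 if c >= s else 0) for k, c in counts.items()}

# Hypothetical sample data
training_kws = [
    {"football", "league", "goal"},
    {"football", "transfer", "coach"},
    {"stock", "market", "index"},
]
m = build_keyword_table(training_kws, s=2)
```

With this sample data only "football" appears in two keyword sets, so it is the only keyword whose value in m is 1.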
Further, the 2-tuples tec of tek in step 2) are computed as follows:
3-1-1) for a tek containing n keywords k1, k2, ..., kn, look up keyword pairs in the following order, each pair being passed to step 3-1-2) for checking: the pairs containing k1 are <k1, k2>, <k1, k3>, ..., <k1, kn>; the pairs containing k2 are <k2, k3>, ..., <k2, kn>; and so on, up to the pair containing k(n-1), <k(n-1), kn>;
3-1-2) if at least one keyword of the current pair has the value 1 in m, traverse the training set trs with this tec; otherwise return to step 3-1-1) and look up the next keyword pair.
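The enumeration order in step 3-1-1) is the order produced by itertools.combinations on an ordered list, so the two steps can be sketched as below. The name candidate_pairs and the sample values are assumptions for illustration; keywords missing from m are treated as having value 0.

```python
from itertools import combinations

def candidate_pairs(tek, m):
    """Yield keyword pairs <ki, kj> of tek in the order of step 3-1-1),
    keeping only pairs in which at least one keyword has value 1 in m
    (the check of step 3-1-2))."""
    for ki, kj in combinations(tek, 2):
        if m.get(ki, 0) == 1 or m.get(kj, 0) == 1:
            yield (ki, kj)

# Hypothetical sample data: only "football" is an important keyword
m_example = {"football": 1, "coach": 0, "goal": 0}
tek_example = ["football", "coach", "goal"]
pairs = list(candidate_pairs(tek_example, m_example))
```

Here the pair ("coach", "goal") is skipped because neither keyword has value 1 in m_example, leaving two candidate pairs.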
Further, in step 2), at least one keyword in every 2-tuple (i.e. keyword pair) tec of tek is an important keyword, an important keyword being a keyword with the highest frequency of occurrence.
Further, step 2) also includes counting the frequency of the keywords that occur in the test webpages; when the frequency of a secondary keyword exceeds a set threshold, it may be re-marked as an important keyword, so that more 2-tuples are obtained when the keyword 2-tuples of a test webpage are computed, which improves the precision and recall of the classification results.
Here, a "secondary keyword" is a keyword whose frequency of occurrence is not very high in the first count but may change as more statistics accumulate, so that it becomes a high-frequency keyword. An "important keyword" is a keyword with a high frequency of occurrence; its level is 0. The division into secondary and important keywords is initially made by manual annotation, but a threshold is also set to distinguish them. For example, with a threshold of 0.2, a keyword marked as secondary becomes an important keyword once its counted frequency rises above 0.2.
Further, the traversal of the training set trs for each tec in step 3) specifically includes:
3-2-1) if the kws of tr contains the first keyword of tec, go to step 3-2-2); otherwise, compute the next tec of tek and start traversing the training set again;
3-2-2) if the kws of tr contains at least one important keyword of tec, add the tag of tr to the tag set tags of te: if tags already contains this tag, add 1 to the value of the key-value pair whose key is this tag; otherwise add the key-value pair <tag, 1> to tags.
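One possible reading of steps 3-2-1) and 3-2-2) is sketched below. It rests on an assumption: a training webpage whose kws lacks the first keyword of tec is simply skipped and the traversal continues with the next tr, rather than abandoning the whole tec. The names transfer_tags, training_set and m_imp are illustrative only.

```python
def transfer_tags(tec, training_set, m, tags):
    """Update tags (tag -> count) with the tags of matching training webpages.

    tec: keyword pair (first, second).
    training_set: list of (tag, kws) entries, one per training webpage tr.
    m: keyword -> 1 (important) or 0 (secondary).
    """
    first = tec[0]
    important = [k for k in tec if m.get(k, 0) == 1]
    for tag, kws in training_set:
        if first not in kws:
            continue                        # step 3-2-1) fails for this tr (assumed: skip it)
        if any(k in kws for k in important):
            tags[tag] = tags.get(tag, 0) + 1    # step 3-2-2): add <tag, 1> or add 1
    return tags

# Hypothetical sample data
training_set = [
    ("sports", {"football", "league", "goal"}),
    ("sports", {"football", "coach"}),
    ("finance", {"stock", "market"}),
]
m_imp = {"football": 1, "coach": 0, "stock": 0, "market": 0}
tags_a = transfer_tags(("football", "coach"), training_set, m_imp, {})
tags_b = transfer_tags(("coach", "football"), training_set, m_imp, {})
```

Under this reading the order of the pair matters: ("football", "coach") matches both sports pages, while ("coach", "football") matches only the page whose kws contains "coach".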
Further, step 4) also include to the test webpage te (Test Network of failure of presorting not comprising key word Page) carry out classified counting.
The invention has the beneficial effects as follows:
1., while the class label of artificial mark training set, provide the key word characterizing the category, then to all Key word carry out frequency statistics, given threshold, key word is divided into important and secondary, obtains antistop list.By key Root is divided into important and secondary two ranks according to frequency of occurrence, can make full use of the frequency information of key word, make test webpage Key word two tuple more can reflect the attribute of webpage itself, improve the accuracy rate of Chinese Web page classification.
2. every test webpage in pair test set, travels through antistop list first, obtains the keyword set that this webpage comprises; Then all two tuples of keyword set are obtained it is desirable at least one key word is important in two tuples;By with this pass Candidate's label of the test webpage that keyword obtains to coupling is more accurate, equally improves the accuracy of Web page classifying result.
3., to each two tuple after, travel through training set, if the keyword set of training webpage contains this two tuple, will The label of this training webpage is added in the tally set of this test webpage;Finally the label in the tally set of test webpage is carried out Frequency statistics, take the several label of frequency highest as needed, as the label of presorting of test webpage.Reasonably give training Webpage multi-tag improves accuracy rate and the recall rate of classification results.
4. due to training set quantity seldom, the time that whole process is consumed is with test set size linear increase.This is big Reduce greatly the run time of Chinese Web page classification, improve accuracy rate and the recall rate of classification results simultaneously.
Brief description of the drawings
Fig. 1 shows the structure of the training set and the test set.
Fig. 2 is a flowchart of the method for pre-classifying Chinese webpages based on keyword matching.
Detailed description of the embodiments
The principles and features of the present invention are described below with reference to the drawings. The examples serve only to explain the present invention and are not intended to limit its scope.
Suppose a Chinese webpage classification system has been implemented on the basis of SVM. The training set and test set provided are trs and tes respectively. As shown in Fig. 1, trs is decomposed into several tr, and trid is the number of each tr; tes is decomposed into several te, teid is the number of each te, and tecs is the set of 2-tuples tec; kwt is the keyword table, comprising each keyword kw and the value of its key-value pair.
The method for pre-classifying Chinese webpages based on keyword matching, that is, the steps by which tes is pre-classified using trs, is shown in Fig. 2 and is as follows:
Step 1: for each training webpage tr in the training set trs, annotate its class label tag and the keyword set kws characterizing the class of the webpage; repeat step 1 until the whole training set has been annotated;
Step 2: de-duplicate the kws of all training webpages and store them in the keyword table kwt; go to step 3;
Step 3: for each test webpage te in the test set tes, traverse kwt and look up the keywords contained in te to form the keyword set tek. If tek is not empty, go to step 4; if tek is empty, go to step 8;
Step 4: compute the first 2-tuple tec of tek, i.e. keyword pair; go to step 5;
Step 5: with tec, traverse the training set trs; if the kws of a tr contains tec, transfer the tag of that tr to this te by storing it in the tag set tags of this te, and count the number of occurrences of each label. Go to step 6;
Step 6: repeat steps 4 and 5 up to the last 2-tuple of tek; go to step 7;
Step 7: sort the labels in tags in descending order of occurrence count and take the top n labels (n is an integer; one or more may be taken as required) as the pre-classification labels of this te; go to step 8;
Step 8: repeat steps 3 to 7 until the last te has been pre-classified. Chinese webpages whose pre-classification failed enter the classification calculation stage, and classification is complete when that calculation ends; otherwise pre-classification ends directly.
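Steps 3 to 7 for a single test webpage can be sketched end to end as follows. This is an illustration under stated assumptions, not the patented implementation: keywords are located by plain substring search in the page text, "kws contains tec" is read as kws containing both keywords of the pair, a page whose pairs match no training webpage is also treated as a pre-classification failure, and the function name preclassify and all sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def preclassify(te_text, m, training_set, top_n=1):
    """Return up to top_n pre-classification labels, or None on failure.

    te_text: text of the test webpage te.
    m: keyword table as a map keyword -> 1 (important) or 0 (secondary).
    training_set: list of (tag, kws) entries, one per training webpage tr.
    """
    # Step 3: keywords of the table that occur in the page (substring match, an assumption)
    tek = [k for k in m if k in te_text]
    if not tek:
        return None                 # hand the page to the fallback classifier (step 8)
    tags = Counter()
    # Steps 4 to 6: visit every keyword pair tec of tek
    for k1, k2 in combinations(tek, 2):
        if m[k1] != 1 and m[k2] != 1:
            continue                # pair has no important keyword
        for tag, kws in training_set:
            if k1 in kws and k2 in kws:
                tags[tag] += 1      # step 5: transfer the tag and count it
    if not tags:
        return None                 # assumption: no match also counts as failure
    # Step 7: labels in descending order of count, top n
    return [tag for tag, _ in tags.most_common(top_n)]

# Hypothetical sample data
m_zh = {"足球": 1, "联赛": 1, "教练": 0, "股票": 1, "指数": 0}
training_zh = [
    ("sports", {"足球", "联赛"}),
    ("sports", {"足球", "教练"}),
    ("finance", {"股票", "指数"}),
]
labels = preclassify("昨晚的足球联赛中主队教练调整了阵型", m_zh, training_zh)
failed = preclassify("今天天气很好", m_zh, training_zh)
```

A return value of None marks the pages that would go on to the classification calculation stage (for example an SVM classifier), which is outside the scope of this sketch.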
The de-duplication of the kws of all training webpages and their storage in the keyword table kwt, as described in step 2, proceeds as follows:
Step 2.1: create a new map m; take each keyword k in the kws of the first training webpage tr as a key of m, with the corresponding value set to 1;
Step 2.2: for each keyword k in the kws of the second tr, first check whether m already contains k; if it does, add 1 to the value of the key-value pair whose key is k; if not, add the key-value pair <k, 1> to m;
Step 2.3: repeat step 2.2 for the remaining tr, up to the last tr;
Step 2.4: set a threshold s; when the value of a key-value pair is less than s, set the value to 0; otherwise set the value to 1;
The computation of the 2-tuples tec of tek, i.e. keyword pairs, as described in step 4, proceeds as follows:
Step 4.1: tek contains n keywords k1, k2, ..., kn; look up keyword pairs in the following order, each pair being passed to step 4.2 for checking: the pairs containing k1 are <k1, k2>, <k1, k3>, ..., <k1, kn>; the pairs containing k2 are <k2, k3>, ..., <k2, kn>; and so on, up to the pair containing k(n-1), <k(n-1), kn>;
Step 4.2: if at least one keyword of the current pair has the value 1 in m, go to step 5; otherwise return to step 4.1 and look up the next keyword pair.
The operation described in step 5, in which the tag of a tr whose kws contains tec is transferred to this te, stored in the tag set tags of this te, and the label occurrences are counted, proceeds as follows:
Step 5.1: if the kws of tr contains the first keyword of tec, go to step 5.2; otherwise go to step 6;
Step 5.2: if the kws of tr contains the second keyword of tec, add the tag of tr to the tag set tags of te: if tags already contains this tag, add 1 to the value of the key-value pair whose key is this tag; otherwise add the key-value pair <tag, 1> to tags;
The reason step 5.2 only requires te to supply two keywords is that the manually annotated keywords have been counted and a suitable threshold set to obtain the important keywords; if at least one important keyword of te appears in tr, the label of tr is considered transferable to te.
Embodiment
The method is illustrated with a training set of 200 webpages in 7 classes and a test set of 1000 webpages.
Each training webpage in the training set is annotated with one label, and 3 keywords that characterize its class are marked out (as its keyword set); all of this is stored in memory.
The threshold for identifying important keywords is set to 0.2; after the frequency calculation, a keyword table containing 30 important keywords and 40 secondary keywords is obtained.
For each test webpage in the test set, the keyword table is traversed first to find the keywords the webpage contains. On average there are 3 (not counting test webpages that contain no keywords, for which pre-classification fails), so there are at most 3 keyword 2-tuples satisfying the condition.
For each 2-tuple, the training set is traversed, and the labels of the training webpages whose keyword sets contain the 2-tuple are added to the tag set of the test webpage. Finally the tag set is sorted by frequency, and the two labels with the most occurrences are taken as the labels of the test webpage.
The method of the present invention for pre-classifying Chinese webpages based on keyword matching was tested on real Chinese webpages. Compared with Chinese webpage classification without pre-classification, the precision and recall of the final classification results improved by at least 10 and 15 percentage points respectively, bringing the classification performance of the whole Chinese webpage classification system up to the expected level. The formulas are as follows:
Recall = relevant documents retrieved by the system / all relevant documents in the system
Precision = relevant documents retrieved by the system / all documents retrieved by the system
Original method: recall = 25%, precision = 18%;
This method: recall = 40%, precision = 28%.
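The two measures can be computed as in the short sketch below. The function name precision_recall and the document identifiers are made up for illustration and are unrelated to the experiment reported above.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)            # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical sample: 4 documents retrieved, 3 relevant overall, 2 in common
p, r = precision_recall(retrieved={"a", "b", "c", "d"}, relevant={"a", "b", "e"})
```

Returning 0.0 for an empty retrieved or relevant set is a convention chosen here to avoid division by zero.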
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. A method for pre-classifying Chinese webpages based on keyword matching, comprising the following steps:
1) annotating, for each training webpage tr in the training set trs, its class label tag and the keyword set kws that characterizes the class of the webpage, and generating the keyword table kwt;
2) extracting, according to kwt, the keywords contained in each test webpage te in the test set tes to form the keyword set tek;
3) computing each 2-tuple tec of tek and traversing the training set, transferring the tag of every tr whose kws contains this tec to the te corresponding to this tec, and storing the tag in the tag set tags of that te;
4) counting the frequency of the labels in tags and taking the several most frequent labels, as required, as the pre-classification labels of the test webpage.
2. The method for pre-classifying Chinese webpages based on keyword matching according to claim 1, characterized in that step 1) further includes de-duplicating the keyword sets kws of all training webpages belonging to the same class before generating the keyword table kwt.
3. The method for pre-classifying Chinese webpages based on keyword matching according to claim 1, characterized in that the keyword table kwt is generated as follows:
1-1) create a new map m; take each keyword k in the kws of the first training webpage tr as a key of m, with the corresponding initial value set to 1;
1-2) for each keyword k in the kws of the second tr, first check whether m already contains k; if it does, add 1 to the value of the key-value pair whose key is k; if not, add the key-value pair <k, 1> to m;
1-3) repeat step 1-2) for the remaining tr, up to the last tr;
1-4) set a threshold s; when the value of a key-value pair is less than s, set the value to 0; otherwise set the value to 1.
4. The method for pre-classifying Chinese webpages based on keyword matching according to claim 3, characterized in that the 2-tuples tec of tek in step 2) are computed as follows:
3-1-1) for a tek containing n keywords k1, k2, ..., kn, look up keyword pairs in the following order, each pair being passed to step 3-1-2) for checking: the pairs containing k1 are <k1, k2>, <k1, k3>, ..., <k1, kn>; the pairs containing k2 are <k2, k3>, ..., <k2, kn>; and so on, up to the pair containing k(n-1), <k(n-1), kn>;
3-1-2) if at least one keyword of the current pair has the value 1 in m, traverse the training set trs with this tec; otherwise return to step 3-1-1) and look up the next keyword pair.
5. The method for pre-classifying Chinese webpages based on keyword matching according to claim 1, characterized in that, in step 2), at least one keyword in every 2-tuple tec of tek is an important keyword, the important keyword being a keyword with the highest frequency of occurrence.
6. The method for pre-classifying Chinese webpages based on keyword matching according to claim 5, characterized in that step 2) further includes counting the frequency of the keywords that occur in the test webpages; when the frequency of a secondary keyword exceeds a set threshold, it is marked as an important keyword.
7. The method for pre-classifying Chinese webpages based on keyword matching according to claim 1, characterized in that, in step 3), the traversal of the training set trs for each tec specifically includes:
3-2-1) if the kws of tr contains the first keyword of tec, go to step 3-2-2); otherwise, compute the next tec of tek and start traversing the training set again;
3-2-2) if the kws of tr contains at least one important keyword of tec, the important keyword being a keyword with the highest frequency of occurrence, add the tag of tr to the tag set tags of te: if tags already contains this tag, add 1 to the value of the key-value pair whose key is this tag; otherwise add the key-value pair <tag, 1> to tags.
8. The method for pre-classifying Chinese webpages based on keyword matching according to claim 1, characterized in that step 4) further includes performing classification calculation on the test webpages te that contain no keywords.
CN201610741134.8A 2016-08-26 2016-08-26 Method for pre-classifying Chinese webpages based on keyword matching Active CN106339459B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610741134.8A CN106339459B (en) 2016-08-26 2016-08-26 Method for pre-classifying Chinese webpages based on keyword matching

Publications (2)

Publication Number Publication Date
CN106339459A true CN106339459A (en) 2017-01-18
CN106339459B CN106339459B (en) 2019-11-26

Family

ID=57822407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610741134.8A Active CN106339459B (en) 2016-08-26 2016-08-26 Method for pre-classifying Chinese webpages based on keyword matching

Country Status (1)

Country Link
CN (1) CN106339459B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050114348A1 (en) * 1995-12-14 2005-05-26 Wesinger Ralph E.Jr. Method and apparatus for classifying a search by keyword
US20060282416A1 (en) * 2005-04-29 2006-12-14 William Gross Search apparatus and method for providing a collapsed search
US20090119276A1 (en) * 2007-11-01 2009-05-07 Antoine Sorel Neron Method and Internet-based Search Engine System for Storing, Sorting, and Displaying Search Results
CN101593200A (en) * 2009-06-19 2009-12-02 淮海工学院 Chinese Web page classification method based on the keyword frequency analysis
CN101814083A (en) * 2010-01-08 2010-08-25 上海复歌信息科技有限公司 Automatic webpage classification method and system
CN104424308A (en) * 2013-09-04 2015-03-18 中兴通讯股份有限公司 Web page classification standard acquisition method and device and web page classification method and device
CN105512143A (en) * 2014-09-26 2016-04-20 中兴通讯股份有限公司 Method and device for web page classification

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107545020A (en) * 2017-05-10 2018-01-05 新华三信息安全技术有限公司 A kind of determination method and device of Web page classifying
CN107506472A (en) * 2017-09-05 2017-12-22 淮阴工学院 A kind of student browses Web page classification method
CN107506472B (en) * 2017-09-05 2020-09-08 淮阴工学院 Method for classifying browsed webpages of students
CN108874996A (en) * 2018-06-13 2018-11-23 北京知道创宇信息技术有限公司 website classification method and device
CN113377467A (en) * 2021-06-29 2021-09-10 中国平安财产保险股份有限公司 Information decoupling method and device, server and storage medium
CN113377467B (en) * 2021-06-29 2022-04-01 中国平安财产保险股份有限公司 Information decoupling method and device, server and storage medium
CN113934848A (en) * 2021-10-22 2022-01-14 马上消费金融股份有限公司 Data classification method and device and electronic equipment
CN113934848B (en) * 2021-10-22 2023-04-07 马上消费金融股份有限公司 Data classification method and device and electronic equipment

Also Published As

Publication number Publication date
CN106339459B (en) 2019-11-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant