CN106339459A - Method for pre-classifying Chinese webpages based on keyword matching - Google Patents
- Publication number
- CN106339459A (application number CN201610741134.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Abstract
The invention relates to a method for pre-classifying Chinese webpages based on keyword matching. The method comprises the following steps: while the training set required by a classification algorithm is being prepared, the keywords that characterize each training webpage are annotated together with its manually assigned class label, and a keyword table is generated from them; for each test webpage, the keywords occurring in it are extracted according to the keyword table, and the tags of matching training webpages are transferred to the test webpage through a keyword matching calculation against the training set; if the pre-classification method yields no classification result for a test webpage, further classification calculation is performed on that webpage. The method shortens the running time of computationally expensive classification techniques such as SVM, KNN and naive Bayes, and at the same time increases the accuracy and recall of the classification results.
Description
Technical field
The present invention relates to information processing in the field of computing, and more particularly to a method for pre-classifying Chinese webpages based on keyword matching.
Background art
With the rapid development of the Internet, the amount of information stored as webpages is still growing explosively, so classifying webpage information has become an indispensable way for people to obtain useful information. The mainstream classification algorithms at present are SVM, KNN and naive Bayes. Of these, SVM needs only a small training set and classifies English webpages very well. However, the accuracy and recall of a Chinese webpage classification system built around SVM both fall short of requirements. This is because English has natural separators between words, whereas Chinese text must first be segmented by a Chinese word segmenter before the webpage text can be vectorized. Even the best Chinese word segmenters cannot segment text with complete accuracy, which greatly degrades the results of Chinese webpage classification.
Summary of the invention
In view of the above problems, the present invention proposes a method for pre-classifying Chinese webpages based on keyword matching. The method can substantially reduce the running time of mainstream classification techniques while improving the accuracy and recall of the classification results.
The technical solution of the present invention is as follows:
A method for pre-classifying Chinese webpages based on keyword matching comprises the following steps:
1) annotating each training webpage tr in the training set trs with its class label tag and with a keyword set kws that characterizes the category of the webpage, and generating a keyword table kwt;
2) extracting, according to kwt, the keywords contained in each test webpage te in the test set tes to form a keyword set tek;
3) computing each 2-tuple (i.e. keyword pair) tec of tek and traversing the training set, transferring the tag of every tr whose kws contains this tec to the te corresponding to this tec, and storing the tag in the tag set tags of that te;
4) performing frequency statistics on the tags in tags and taking the several most frequent tags, as required, as the pre-classification labels of the test webpage.
On the basis of the above technical solution, the present invention can also be improved as follows.
Further, step 1) also includes de-duplicating the keyword sets kws of all training webpages belonging to the same category before generating the keyword table kwt.
Further, the keyword table kwt is generated by the following specific steps:
1-1) create a new map m, take each keyword k in the kws of the first training webpage tr as a key of m, and set the corresponding initial value to 1;
1-2) for each keyword k in the kws of the second tr, first check whether m already contains k; if it does, add 1 to the value of the key-value pair whose key is k; if it does not, add the key-value pair <k, 1> to m;
1-3) repeat step 1-2) for the remaining tr, up to the last tr;
1-4) set a threshold s; when the value of a key-value pair is less than s, set the value to 0; otherwise set the value of the key-value pair to 1.
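The four sub-steps for generating the keyword table can be sketched as follows. This is an illustration under assumptions rather than the patented implementation: the threshold s is treated as a raw count, and the function name `build_keyword_table` and the sample keyword sets are hypothetical.

```python
def build_keyword_table(training_kws, s):
    """Build the map m from the keyword sets of all training webpages.

    training_kws: one keyword set (kws) per training webpage.
    s: count threshold (assumed to be a raw count of training webpages).
    Returns a dict mapping each keyword to 1 (count >= s) or 0 (count < s).
    """
    m = {}
    for kws in training_kws:          # first tr, second tr, ..., last tr
        for k in kws:
            if k in m:
                m[k] += 1             # k already present: add 1 to its value
            else:
                m[k] = 1              # otherwise add the pair <k, 1>
    # Final sub-step: binarise the counts against the threshold s.
    return {k: (0 if count < s else 1) for k, count in m.items()}


# Hypothetical keyword sets of three training webpages.
trs_kws = [
    {"足球", "联赛", "进球"},
    {"足球", "裁判", "联赛"},
    {"股票", "基金", "足球"},
]
kwt = build_keyword_table(trs_kws, s=2)
print(kwt["足球"], kwt["联赛"], kwt["股票"])  # 1 1 0
```

Storing the binarised value next to each keyword lets later steps test whether a keyword is important with a single dictionary lookup.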
Further, the 2-tuples tec of tek in step 2) are computed by the following specific steps:
3-1-1) for a tek containing n keywords k1, k2, ..., kn, search for keyword pairs in the following order, each keyword pair being passed to step 3-1-2) for checking: the keyword pairs containing k1 are <k1, k2>, <k1, k3>, ..., <k1, kn>; the keyword pairs containing k2 are <k2, k3>, ..., <k2, kn>; and so on up to the keyword pair containing k(n-1), namely <k(n-1), kn>;
3-1-2) if at least one keyword of the current keyword pair has the corresponding value 1 in m, traverse the training set trs with this tec; if not, return to step 3-1-1) and search for the next keyword pair.
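The pair enumeration and the value check can be sketched with the standard library. `itertools.combinations` yields pairs in exactly the order <k1, k2>, <k1, k3>, ..., <k(n-1), kn>; the function name `keyword_pairs` and the sample data are hypothetical.

```python
from itertools import combinations


def keyword_pairs(tek, m):
    """Yield the 2-tuples tec of tek that pass the value check.

    tek: ordered list of keywords found in one test webpage.
    m: dict mapping keyword -> 0 or 1.
    A pair is kept only if at least one of its keywords has value 1 in m.
    """
    for a, b in combinations(tek, 2):
        if m.get(a, 0) == 1 or m.get(b, 0) == 1:
            yield (a, b)


# Hypothetical data: only "足球" has value 1.
m = {"足球": 1, "联赛": 0, "进球": 0}
tek = ["足球", "联赛", "进球"]

pairs = list(keyword_pairs(tek, m))
print(pairs)  # [('足球', '联赛'), ('足球', '进球')]
```

The pair ('联赛', '进球') is dropped because neither keyword has value 1 in m.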
Further, in step 2), at least one keyword in every 2-tuple (i.e. keyword pair) tec of tek is an important keyword, an important keyword being a keyword with the highest frequency of occurrence.
Further, step 2) also includes performing frequency statistics on the keywords that occur in the test webpages; when the frequency of a secondary keyword exceeds a set threshold, it can be marked as an important keyword, so that more 2-tuples are obtained when the keyword 2-tuples of a test webpage are computed, improving the accuracy and recall of the classification results.
Here, a "secondary keyword" is a keyword whose frequency of occurrence is not very high at the first count, but whose frequency may change as the statistics grow, so that it becomes a keyword with a high frequency of occurrence. An "important keyword" is a keyword with a high frequency of occurrence, and its level is 0. Keywords are initially divided into secondary and important by manual annotation, and a threshold is also set to distinguish them. For example, with a threshold of 0.2, a keyword annotated as secondary becomes an important keyword once its counted frequency rises above 0.2.
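The promotion of secondary keywords might be sketched as below. The text does not say what the frequency is measured against, so this sketch assumes it is the fraction of test webpages seen so far that contain the keyword; it also follows the convention of the map m, in which value 1 marks an important keyword and value 0 a secondary one. The function name `promote_keywords` and the data are hypothetical.

```python
def promote_keywords(m, page_keyword_sets, threshold=0.2):
    """Return a copy of m with frequent secondary keywords marked important.

    m: dict mapping keyword -> 1 (important) or 0 (secondary).
    page_keyword_sets: the keyword sets (tek) of the test webpages seen so far.
    threshold: assumed to be a relative frequency, as in the 0.2 example.
    """
    updated = dict(m)
    total = len(page_keyword_sets)
    if total == 0:
        return updated
    for k, level in m.items():
        if level == 0:
            # Assumed definition: share of test webpages that contain k.
            freq = sum(1 for tek in page_keyword_sets if k in tek) / total
            if freq > threshold:
                updated[k] = 1
    return updated


# Hypothetical statistics over six test webpages.
m = {"足球": 1, "进球": 0, "裁判": 0}
seen = [{"足球", "进球"}, {"进球"}, {"足球"}, {"足球", "裁判"}, {"足球"}, {"足球"}]

promoted = promote_keywords(m, seen)
print(promoted)  # {'足球': 1, '进球': 1, '裁判': 0}
```

Here "进球" occurs in 2 of 6 pages (about 0.33) and is promoted, while "裁判" occurs in 1 of 6 (about 0.17) and stays secondary.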
Further, the process in step 3) by which each tec traverses the training set trs specifically includes:
3-2-1) if the kws of tr contains the first keyword of tec, go to step 3-2-2); otherwise, compute the next tec of tek and start traversing the training set again;
3-2-2) if the kws of tr contains at least one important keyword of tec, add the tag of tr to the tag set tags of te: if tags already contains this tag, add 1 to the value of the key-value pair whose key is this tag; otherwise add the key-value pair <tag, 1> to tags.
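The traversal and tag counting can be sketched with `collections.Counter`, which adds <tag, 1> or increments an existing count in one operation. One point is an interpretation: a training webpage that lacks the first keyword is simply skipped here, so the remaining training webpages are still examined. The function name `transfer_tags` and the sample data are hypothetical.

```python
from collections import Counter


def transfer_tags(tec, trs, m, tags):
    """Add to tags the label of every training webpage matched by the pair tec.

    tec: (first, second) keyword pair of a test webpage.
    trs: list of (tag, kws) training webpages.
    m: dict mapping keyword -> 1 (important) or 0 (secondary).
    tags: Counter holding the candidate labels of the test webpage.
    """
    first = tec[0]
    important = [k for k in tec if m.get(k, 0) == 1]
    for tag, kws in trs:
        if first not in kws:
            continue                      # first-keyword check fails: skip this tr
        if any(k in kws for k in important):
            tags[tag] += 1                # adds <tag, 1> or increments the count
    return tags


# Hypothetical training set and keyword values.
trs = [
    ("sports", {"足球", "联赛", "进球"}),
    ("sports", {"足球", "裁判", "球迷"}),
    ("finance", {"股票", "基金", "联赛"}),
]
m = {"足球": 1, "联赛": 0}

tags = transfer_tags(("足球", "联赛"), trs, m, Counter())
print(tags.most_common(1))  # [('sports', 2)]
```

`Counter.most_common(n)` then gives the n most frequent labels directly, which corresponds to step 4).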
Further, step 4) also includes performing classification calculation on the test webpages te that contain no keywords (the test webpages for which pre-classification failed).
The beneficial effects of the present invention are:
1. While the class labels of the training set are annotated manually, the keywords that characterize each category are also given; frequency statistics are then performed on all keywords and a threshold is set, dividing the keywords into important and secondary ones and yielding the keyword table. Dividing the keywords into the two levels, important and secondary, by frequency of occurrence makes full use of keyword frequency information, so that the keyword 2-tuples of a test webpage better reflect the attributes of the webpage itself, improving the accuracy of Chinese webpage classification.
2. For each test webpage in the test set, the keyword table is traversed first to obtain the keyword set contained in the webpage; all 2-tuples of the keyword set are then obtained, with the requirement that at least one keyword in each 2-tuple is important. The candidate labels of the test webpage obtained by matching with such keyword pairs are more accurate, which likewise improves the accuracy of the webpage classification results.
3. After each 2-tuple is obtained, the training set is traversed; if the keyword set of a training webpage contains the 2-tuple, the label of that training webpage is added to the tag set of the test webpage. Finally, frequency statistics are performed on the labels in the tag set of the test webpage, and the several most frequent labels are taken, as needed, as the pre-classification labels of the test webpage. Reasonably assigning multiple labels to a webpage improves the accuracy and recall of the classification results.
4. Because the training set is small, the time consumed by the whole process grows linearly with the size of the test set. This greatly reduces the running time of Chinese webpage classification while improving the accuracy and recall of the classification results.
Brief description of the drawings
Fig. 1 is a structure diagram of the training set and the test set.
Fig. 2 is a flowchart of the method for pre-classifying Chinese webpages based on keyword matching.
Detailed description of the embodiments
The principles and features of the present invention are described below with reference to the accompanying drawings; the examples given serve only to explain the present invention and are not intended to limit its scope.
A Chinese webpage classification system has been implemented on the basis of SVM technology. The training set and test set provided are trs and tes respectively. As shown in Fig. 1, trs is divided into several tr, and trid denotes the number of each tr; tes is divided into several te, teid denotes the number of each te, and tecs denotes the set of 2-tuples tec; ket is the keyword table, which contains each keyword kw and the value of its corresponding key-value pair.
The steps of the method for pre-classifying Chinese webpages based on keyword matching, that is, the steps of pre-classifying tes by means of trs, are shown in Fig. 2 and are as follows:
Step 1: for each training webpage tr in the training set trs, annotate its class label tag and the keyword set kws that characterizes the category of the webpage; repeat step 1 until the whole training set has been annotated;
Step 2: de-duplicate the kws of all training webpages, store them in the keyword table kwt, and go to step 3;
Step 3: for each test webpage te in the test set tes, traverse kwt and find the keywords contained in te, which form the keyword set tek. If tek is not empty, go to step 4; if tek is empty, go to step 8;
Step 4: compute the first 2-tuple tec of tek, i.e. a keyword pair, and go to step 5;
Step 5: for this tec, traverse the training set trs; if the kws of a tr contains tec, transfer the tag of that tr to this te by storing it in the tag set tags of this te, and at the same time count the number of occurrences of each tag. Go to step 6;
Step 6: repeat steps 4 and 5 up to the last 2-tuple of tek, then go to step 7;
Step 7: sort the tags in tags in descending order of occurrence count and take the top n tags (n is an integer; one or more may be taken as needed) as the pre-classification labels of this te; go to step 8;
Step 8: repeat steps 3 to 7 until the last te has been pre-classified. Chinese webpages for which pre-classification failed enter the classification calculation stage, and classification is complete once that calculation ends; if there are none, pre-classification simply ends.
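Steps 3 to 8 can be assembled into one compact sketch. It follows the embodiment's reading in which a training webpage matches when its kws contains both keywords of the pair, and it assumes substring matching for keyword extraction. The function names `preclassify` and `classify_all`, the `fallback` callable standing in for the expensive classifier (for example an SVM), and all data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def preclassify(te_text, trs, kwt, n=1):
    """Return up to n pre-classification labels for one test webpage."""
    tek = [k for k in kwt if k in te_text]             # step 3: keywords in te
    tags = Counter()
    for a, b in combinations(tek, 2):                  # steps 4 and 6: every pair
        if kwt[a] != 1 and kwt[b] != 1:                # step 4.2: need a value of 1
            continue
        for tag, kws in trs:                           # step 5: traverse trs
            if a in kws and b in kws:                  # steps 5.1 and 5.2
                tags[tag] += 1
    return [tag for tag, _ in tags.most_common(n)]     # step 7: top n labels


def classify_all(tes, trs, kwt, fallback, n=1):
    """Pre-classify every test webpage; route failures to the fallback classifier."""
    results = {}
    for teid, text in tes.items():
        labels = preclassify(text, trs, kwt, n)
        # Step 8: webpages without any label go to the classification stage.
        results[teid] = labels if labels else [fallback(text)]
    return results


# Hypothetical keyword table (keyword -> value), training set and test set.
kwt = {"足球": 1, "联赛": 1, "进球": 0, "股票": 1, "基金": 0, "利率": 0}
trs = [
    ("sports", {"足球", "联赛", "进球"}),
    ("finance", {"股票", "基金", "利率"}),
]
tes = {1: "足球联赛昨晚上演进球大战", 2: "今天天气晴朗"}

results = classify_all(tes, trs, kwt, fallback=lambda text: "other")
print(results)  # {1: ['sports'], 2: ['other']}
```

Only webpage 2, which contains no table keyword, reaches the fallback, which is where the saving in running time would come from.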
The de-duplication of the kws of all training webpages and their storage in the keyword table kwt, described in step 2, proceed as follows:
Step 2.1: create a new map m, take each keyword k in the kws of the first training webpage tr as a key of m, and set the corresponding value to 1;
Step 2.2: for each keyword k in the kws of the second tr, first check whether m already contains k; if it does, add 1 to the value of the key-value pair whose key is k; if it does not, add the key-value pair <k, 1> to m;
Step 2.3: repeat step 2.2 for the remaining tr, up to the last tr;
Step 2.4: set a threshold s; when the value of a key-value pair is less than s, set the value to 0; otherwise set the value of the key-value pair to 1.
The computation of the 2-tuples tec of tek, i.e. the keyword pairs, described in step 4, proceeds as follows:
Step 4.1: tek contains n keywords k1, k2, ..., kn; search for keyword pairs in the following order, each keyword pair being passed to step 4.2 for checking: the keyword pairs containing k1 are <k1, k2>, <k1, k3>, ..., <k1, kn>; the keyword pairs containing k2 are <k2, k3>, ..., <k2, kn>; and so on up to the keyword pair containing k(n-1), namely <k(n-1), kn>;
Step 4.2: if at least one keyword of the current keyword pair has the corresponding value 1 in m, go to step 5; if not, return to step 4.1 and search for the next keyword pair.
The operation described in step 5, in which, if the kws of tr contains tec, the tag of that tr is transferred to this te and stored in the tag set tags of this te while the tag occurrences are counted, proceeds as follows:
Step 5.1: if the kws of tr contains the first keyword of tec, go to step 5.2; otherwise, go to step 6;
Step 5.2: if the kws of tr contains the second keyword of tec, add the tag of tr to the tag set tags of te: if tags already contains this tag, add 1 to the value of the key-value pair whose key is this tag; otherwise add the key-value pair <tag, 1> to tags.
The reason step 5.2 only requires te to provide two keywords is that the manually annotated keywords have already been counted and a suitable threshold set, so the important keywords are known; if at least one important keyword in te also occurs in tr, the label of tr is considered transferable to te.
Embodiment
The method is illustrated below using a training set of 200 webpages in 7 categories and a test set of 1000 webpages.
Each training webpage in the training set is annotated with one label, together with 3 keywords that characterize its category (these form its keyword set), and everything is stored in memory.
The threshold for identifying important keywords is set to 0.2; after the frequency calculation, a keyword table containing 30 important keywords and 40 secondary keywords is obtained.
For each test webpage in the test set, the keyword table is traversed first to find the keywords the webpage contains. On average there are 3 (not counting the test webpages that contain no keywords, for which pre-classification fails), so there are at most 3 keyword 2-tuples that satisfy the condition.
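The figure of 3 follows from counting unordered pairs: n keywords give n(n-1)/2 pairs, so 3 keywords give 3. The sketch below checks this and derives a rough workload figure from the numbers of this embodiment; the workload estimate is illustrative arithmetic for the average case, not a measurement from the source.

```python
from math import comb


def max_pairs(n):
    """Number of unordered keyword pairs that n keywords can form."""
    return comb(n, 2)


print(max_pairs(3))  # 3

# Rough upper bound for the average case of this embodiment:
# 1000 test webpages x 3 pairs each x 200 training webpages to traverse.
checks = 1000 * max_pairs(3) * 200
print(checks)  # 600000
```

Each check is a pair of set membership tests, which is cheap next to vectorizing a webpage and running it through a trained classifier.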
For each 2-tuple, the training set is traversed, and the labels of the training webpages whose keyword sets contain the 2-tuple are added to the tag set of the test webpage. Finally the tag set is sorted by frequency, and the two labels with the most occurrences are taken as the labels of the test webpage.
The method of the present invention for pre-classifying Chinese webpages based on keyword matching was tested on real Chinese webpages. Compared with the Chinese webpage classification results obtained without pre-classification, the final classification results improved accuracy and recall by at least 10 and 15 percentage points respectively, bringing the classification performance of the whole Chinese webpage classification system up to the expected level. The formulas used are as follows:
Recall rate (recall) = relevant documents retrieved by the system / all relevant documents in the system
Accuracy rate (precision) = relevant documents retrieved by the system / all documents retrieved by the system
Original method: recall = 25%, precision = 18%;
Present method: recall = 40%, precision = 28%.
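The two formulas can be written out as a short sketch over sets of document identifiers. The function name `precision_recall` and the sample sets are hypothetical and unrelated to the figures reported above.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from two collections of document ids.

    retrieved: documents the system returned for a category.
    relevant: documents that truly belong to that category.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)          # relevant documents retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 truly relevant, 2 in common.
precision, recall = precision_recall({1, 2, 3, 4}, {2, 4, 5})
print(precision)  # 0.5
```

Returning 0.0 for an empty denominator is a convention chosen here to avoid a division by zero; other conventions exist.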
The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (8)
1. A method for pre-classifying Chinese webpages based on keyword matching, comprising the following steps:
1) annotating each training webpage tr in the training set trs with its class label tag and with a keyword set kws that characterizes the category of the webpage, and generating a keyword table kwt;
2) extracting, according to kwt, the keywords contained in each test webpage te in the test set tes to form a keyword set tek;
3) computing each 2-tuple tec of tek and traversing the training set, transferring the tag of every tr whose kws contains this tec to the te corresponding to this tec, and storing the tag in the tag set tags of that te;
4) performing frequency statistics on the tags in tags and taking the several most frequent tags, as required, as the pre-classification labels of the test webpage.
2. The method for pre-classifying Chinese webpages based on keyword matching as claimed in claim 1, characterized in that step 1) further includes de-duplicating the keyword sets kws of all training webpages belonging to the same category before generating the keyword table kwt.
3. The method for pre-classifying Chinese webpages based on keyword matching as claimed in claim 1, characterized in that the keyword table kwt is generated by the following specific steps:
1-1) create a new map m, take each keyword k in the kws of the first training webpage tr as a key of m, and set the corresponding initial value to 1;
1-2) for each keyword k in the kws of the second tr, first check whether m already contains k; if it does, add 1 to the value of the key-value pair whose key is k; if it does not, add the key-value pair <k, 1> to m;
1-3) repeat step 1-2) for the remaining tr, up to the last tr;
1-4) set a threshold s; when the value of a key-value pair is less than s, set the value to 0; otherwise set the value of the key-value pair to 1.
4. The method for pre-classifying Chinese webpages based on keyword matching as claimed in claim 3, characterized in that the 2-tuples tec of tek in step 2) are computed by the following specific steps:
3-1-1) for a tek containing n keywords k1, k2, ..., kn, search for keyword pairs in the following order, each keyword pair being passed to step 3-1-2) for checking: the keyword pairs containing k1 are <k1, k2>, <k1, k3>, ..., <k1, kn>; the keyword pairs containing k2 are <k2, k3>, ..., <k2, kn>; and so on up to the keyword pair containing k(n-1), namely <k(n-1), kn>;
3-1-2) if at least one keyword of the current keyword pair has the corresponding value 1 in m, traverse the training set trs with this tec; if not, return to step 3-1-1) and search for the next keyword pair.
5. The method for pre-classifying Chinese webpages based on keyword matching as claimed in claim 1, characterized in that in step 2), at least one keyword in every 2-tuple tec of tek is an important keyword, the important keyword being a keyword with the highest frequency of occurrence.
6. The method for pre-classifying Chinese webpages based on keyword matching as claimed in claim 5, characterized in that step 2) further includes performing frequency statistics on the keywords occurring in the test webpages, and when the frequency of a secondary keyword exceeds a set threshold, marking it as an important keyword.
7. The method for pre-classifying Chinese webpages based on keyword matching as claimed in claim 1, characterized in that the process in step 3) by which each tec traverses the training set trs specifically includes:
3-2-1) if the kws of tr contains the first keyword of tec, go to step 3-2-2); otherwise, compute the next tec of tek and start traversing the training set again;
3-2-2) if the kws of tr contains at least one important keyword of tec, the important keyword being a keyword with the highest frequency of occurrence, add the tag of tr to the tag set tags of te: if tags already contains this tag, add 1 to the value of the key-value pair whose key is this tag; otherwise add the key-value pair <tag, 1> to tags.
8. The method for pre-classifying Chinese webpages based on keyword matching as claimed in claim 1, characterized in that step 4) further includes performing classification calculation on the test webpages te that contain no keywords.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610741134.8A CN106339459B (en) | 2016-08-26 | 2016-08-26 | Method for pre-classifying Chinese webpages based on keyword matching |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106339459A true CN106339459A (en) | 2017-01-18 |
CN106339459B CN106339459B (en) | 2019-11-26 |
Family
ID=57822407
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610741134.8A Active CN106339459B (en) | 2016-08-26 | 2016-08-26 | Method for pre-classifying Chinese webpages based on keyword matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106339459B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107506472A (en) * | 2017-09-05 | 2017-12-22 | 淮阴工学院 | A kind of student browses Web page classification method |
CN107545020A (en) * | 2017-05-10 | 2018-01-05 | 新华三信息安全技术有限公司 | A kind of determination method and device of Web page classifying |
CN108874996A (en) * | 2018-06-13 | 2018-11-23 | 北京知道创宇信息技术有限公司 | website classification method and device |
CN113377467A (en) * | 2021-06-29 | 2021-09-10 | 中国平安财产保险股份有限公司 | Information decoupling method and device, server and storage medium |
CN113934848A (en) * | 2021-10-22 | 2022-01-14 | 马上消费金融股份有限公司 | Data classification method and device and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050114348A1 (en) * | 1995-12-14 | 2005-05-26 | Wesinger Ralph E.Jr. | Method and apparatus for classifying a search by keyword |
US20060282416A1 (en) * | 2005-04-29 | 2006-12-14 | William Gross | Search apparatus and method for providing a collapsed search |
US20090119276A1 (en) * | 2007-11-01 | 2009-05-07 | Antoine Sorel Neron | Method and Internet-based Search Engine System for Storing, Sorting, and Displaying Search Results |
CN101593200A (en) * | 2009-06-19 | 2009-12-02 | 淮海工学院 | Chinese Web page classification method based on the keyword frequency analysis |
CN101814083A (en) * | 2010-01-08 | 2010-08-25 | 上海复歌信息科技有限公司 | Automatic webpage classification method and system |
CN104424308A (en) * | 2013-09-04 | 2015-03-18 | 中兴通讯股份有限公司 | Web page classification standard acquisition method and device and web page classification method and device |
CN105512143A (en) * | 2014-09-26 | 2016-04-20 | 中兴通讯股份有限公司 | Method and device for web page classification |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107545020A (en) * | 2017-05-10 | 2018-01-05 | 新华三信息安全技术有限公司 | A kind of determination method and device of Web page classifying |
CN107506472A (en) * | 2017-09-05 | 2017-12-22 | 淮阴工学院 | A kind of student browses Web page classification method |
CN107506472B (en) * | 2017-09-05 | 2020-09-08 | 淮阴工学院 | Method for classifying browsed webpages of students |
CN108874996A (en) * | 2018-06-13 | 2018-11-23 | 北京知道创宇信息技术有限公司 | website classification method and device |
CN113377467A (en) * | 2021-06-29 | 2021-09-10 | 中国平安财产保险股份有限公司 | Information decoupling method and device, server and storage medium |
CN113377467B (en) * | 2021-06-29 | 2022-04-01 | 中国平安财产保险股份有限公司 | Information decoupling method and device, server and storage medium |
CN113934848A (en) * | 2021-10-22 | 2022-01-14 | 马上消费金融股份有限公司 | Data classification method and device and electronic equipment |
CN113934848B (en) * | 2021-10-22 | 2023-04-07 | 马上消费金融股份有限公司 | Data classification method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN106339459B (en) | 2019-11-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||