CN103425735B - A method and system for establishing website subject-term queries
- Publication number
- CN103425735B (application number CN201310223294.XA)
- Authority
- CN
- China
- Prior art keywords
- vocabulary
- website
- score
- webpage
- title
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Transfer Between Computers (AREA)
Abstract
The present invention relates to the field of information retrieval and provides a method for establishing website subject-term queries, comprising: obtaining web page data; computing the cross-site importance of terms from the web page data; extracting website subject terms from the web page data; building a stored resource dictionary from the extracted website subject-term information; and establishing a website subject query interface. The invention also provides a system for establishing website subject-term queries. The technical solution has a simple, easily implemented flow, can be updated quickly, and can be used offline and online to improve the professional-search experience.
Description
Technical field
The present invention relates to the field of information retrieval, and in particular to a method and system for establishing website subject-term queries.
Background art
With the development of information technology, the information available on the Internet has grown ever richer and has penetrated every aspect of people's lives. The emergence of search engines in particular lets users quickly find the information they need in massive amounts of data. A traditional search engine aims to satisfy users' needs in general: everyone shares one search engine, and it is considered sufficient to meet the needs of most people. As a general-purpose network tool, however, most search engines struggle to meet the needs of specific industries and specific users for particular information or services. Specialized search engines have therefore appeared, which focus on collecting the important pages related to a given subject and ensure that information in that field is indexed and updated promptly.
A search engine should not only be a tool for everyday and entertainment-oriented information; it should also serve people's broader and more specialized needs. How to make search engines more useful, more professional and more practical, so that people in every trade can use them to obtain what they need, is a problem search engines face.
Vertical search is one kind of search, and most search engines offer it. Vertical search can be understood as search within a class of specialized domains, such as novels, music, video and pictures. For example, when a user searches for a song, they can directly obtain information about the song, listen to a preview, download it and so on, which meets the user's search need directly and pleases the user. However, vertical search covers resource-type domains that lean toward daily life and entertainment, where the user's actual need can be met directly.
The fields users work in and the specialized fields they are interested in are not limited to these vertical resource domains. As Internet resources become richer and more varied, more and more professionals turn to search engines when they run into work-related, professional problems. Faced with this growing professional demand, search engines fall short and cannot provide professional results. This is not because the Internet lacks professional data, but because search engines still analyze and understand Internet data insufficiently: they merely crawl and retrieve it. Massive data needs to be mined and organized into higher-quality data.
To make search results more professional and more convincing, the most important thing is to make the engine's own data professional: to understand Internet data offline, and to have a clear understanding and organization of the data types, distribution, subjects and so on in its own retrieval database. Then, after a user enters a keyword, the system can return more professional and more authoritative website results related to the keyword, and the user genuinely benefits. Internet data therefore needs to be analyzed and understood offline: information is extracted from a large number of websites in order to understand each website's subject. Once the subject of a website is known, the purpose of the website is known, and when a user supplies keyword information, data from the websites corresponding to that keyword can be provided. The prior art does not provide an effective retrieval scheme for this.
Summary of the invention
The technical problem solved by the present invention is to provide a method and system for establishing website subject-term queries, so as to improve the effectiveness of search engines.
To solve the above problem, the invention provides a method for establishing website subject-term queries, comprising:
obtaining web page data;
computing the cross-site importance of terms from the web page data;
extracting website subject terms from the web page data;
building a stored resource dictionary from the extracted website subject-term information;
establishing a website subject query interface.
In the above method, obtaining web page data comprises:
obtaining data for the web pages indexed under a website, mainly the title and the uniform resource locator (URL) of each web page.
In the above method, computing the cross-site importance of terms from the web page data comprises:
segmenting web page titles into a series of terms, filtering the terms by part of speech, and outputting each term together with its computed inverse site frequency (ISF) value as the measure of importance.
In the above method, the inverse site frequency (ISF) value is defined as
$$ISF(T_j) = \log\frac{N}{n_j}$$
where n_j is the number of websites containing term T_j and N is the total number of websites.
In the above method, extracting website subject terms comprises:
segmenting the title of each web page to obtain a series of terms (Term);
filtering the terms by part of speech;
scoring the terms that remain after the above steps, and selecting terms as the subject terms of the website according to their scores.
In the above method, scoring the terms comprises scoring terms within the title of a web page, with the scoring formula
$$\mathrm{p\_score}(T_j) = \mathrm{index\_score}(T_j) \times \mathrm{pos\_score}(T_j)$$
where index_score(T_j) is the position score of term T_j and pos_score(T_j) is its part-of-speech score;
N is the number of terms in the title, and dpos(T_j) is the part-of-speech grade of term T_j.
Scoring further comprises scoring terms within a website,
where s_score(T_j) is the score of term T_j within a given website, page_num(T_j) is the number of web pages in that website whose titles contain term T_j, and page_num is the number of web pages the website contains.
In the above method, building the stored resource dictionary comprises:
building a structured resource dictionary from the extracted website subject-term information, the dictionary comprising a forward query module from websites to subject terms and/or an inverse query module from subject terms to websites.
In the above method, the forward query module and the inverse query module each comprise a data area and a structure area; the structure area stores the objects that are queried directly, and the data area stores the data shared by the structure area.
In the above method, the website subject query interface comprises a forward query and an inverse query; the forward query looks up the subject terms of a website and their weights by website address, and the inverse query looks up the websites covered by a subject term and their weights by subject term.
The invention also provides a system for establishing website subject-term queries, comprising:
an acquisition module for obtaining web page data;
a statistics module for computing the cross-site importance of terms from the web page data;
an extraction module for extracting website subject terms from the web page data;
a building module for building a stored resource dictionary from the extracted website subject-term information;
an interface module for establishing a website subject query interface.
With the technical solution of the invention, the cross-site importance of terms is computed from the search engine's own web page data as the ISF (inverse site frequency); important terms are extracted from the key fields of web pages and merged into the subject terms of the websites they belong to; the final result is stored as a resource dictionary, and forward and inverse subject-term query interfaces are provided so that lookups between websites and subject terms are convenient. In summary, the scheme has a simple, easily implemented flow, can be updated quickly, and can be used offline and online to improve the professional-search experience.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the invention and form part of it. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of the first embodiment of the invention;
Fig. 2 is a flow chart of the second embodiment of the invention.
Detailed description of the embodiments
To make the technical problem to be solved, the technical solution and the advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
Because search engines currently do too little to understand and analyze Internet data, the invention extracts the subject terms of websites from Internet web page data, builds a resource dictionary, and provides forward and inverse interfaces between websites and subject terms, giving search engines a basis for making their results more professional.
As shown in Fig. 1, a flow chart of the first embodiment of the invention, a method for establishing website subject-term queries is provided, comprising:
Step S101: obtain web page data.
In one embodiment, data is obtained for the web pages indexed under a website, mainly the title and the URL (uniform resource locator) of each web page.
A search engine is the hub linking users' queries to massive data, so it depends on two major data sources: query logs and web page data. Query logs can be used to analyze and understand user intent and core needs; web page data can be used to analyze and understand the engine's own back-end data, including data mining, building knowledge bases and so on.
The invention extracts the subject terms of websites. Since a website's home page usually has little content and is not well suited to content mining, and since what a search engine crawls is generally web page data, the data used is not the home page alone but the web pages indexed under the website. The search engine's actual index data is used; because the index is updated periodically, the subject terms can be updated on the same cycle. The most important fields of a web page are its title and its URL, so subject-term extraction mainly uses these two attributes: the URL is used to extract the corresponding website, and the title is used to extract key terms.
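As a minimal sketch of this step, the following groups page titles by the website derived from each URL. The rule used to derive a site key (the lower-cased host name with a leading `www.` removed) and the names `site_of` and `group_titles_by_site` are illustrative assumptions; the patent does not say how a website is derived from a URL.

```python
from collections import defaultdict
from urllib.parse import urlparse


def site_of(url: str) -> str:
    """Return a site key for a URL (assumption: host name without 'www.')."""
    host = urlparse(url).hostname or ""  # hostname is already lower-cased
    return host[4:] if host.startswith("www.") else host


def group_titles_by_site(pages):
    """pages: iterable of (url, title) pairs -> {site: [title, ...]}."""
    by_site = defaultdict(list)
    for url, title in pages:
        by_site[site_of(url)].append(title)
    return dict(by_site)


# Hypothetical input: only the URL and the title of each page are used.
pages = [
    ("http://www.example.com/news/1.html", "Example news headline"),
    ("http://example.com/about", "About Example"),
    ("http://blog.example.org/post", "A blog post"),
]
sites = group_titles_by_site(pages)
```

A production system would probably need a more careful notion of "website" (for instance, handling subdomains), which is outside what the text specifies.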
Step S102: compute the cross-site importance of terms from the web page data.
In one embodiment, web page titles are first segmented into a series of terms; stop words, punctuation marks and the like are then filtered out by part of speech; finally each term is output together with its computed inverse site frequency (ISF) value as the measure of importance.
To compute a website's subject terms, one first needs a rough picture of how terms are distributed across websites. If a word T occurs with a high frequency (TF) in one website and seldom occurs in other websites, it is considered to discriminate well between websites. This step computes an ISF (inverse site frequency), analogous to the IDF (inverse document frequency) used in text classification.
As a measure of a word's general importance, the ISF of a particular word is obtained by dividing the total number of websites by the number of websites that contain the word and taking the logarithm of the quotient.
The ISF is defined as
$$ISF(T_j) = \log\frac{N}{n_j}$$
where n_j is the number of websites containing term T_j and N is the total number of websites.
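The ISF computation can be sketched as follows, assuming each website has already been reduced to the set of terms found in its page titles. The natural logarithm is an assumption, since the text does not state the base, and the function name and sample data are illustrative.

```python
import math


def inverse_site_frequency(site_terms):
    """site_terms: {site: set of terms} -> {term: log(N / n_j)}.

    N is the total number of sites and n_j the number of sites containing
    the term. The natural logarithm is an assumption.
    """
    total_sites = len(site_terms)
    containing = {}
    for terms in site_terms.values():
        for term in terms:
            containing[term] = containing.get(term, 0) + 1
    return {term: math.log(total_sites / n) for term, n in containing.items()}


# Hypothetical data: "news" appears on every site, "sports" on only one.
site_terms = {
    "a.example": {"news", "sports"},
    "b.example": {"news", "novel"},
    "c.example": {"news", "novel"},
}
isf = inverse_site_frequency(site_terms)
```

A term present on every site gets an ISF of zero, while a term confined to a single site gets the largest value, which matches the discriminating role described above.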
Step S103: extract website subject terms.
The ISF of a term expresses its importance across websites and is a global measure. The higher the ISF, the better the term can be taken to reflect a website's subject. This step, in contrast, works locally: it analyzes the terms of each specific web page and extracts the important terms in each page title.
A web page has many parts: URL, title, content, links and so on. The title is a key field of a web page; it is the most important part and the part that best reflects the subject of the page content. Only cheating or low-quality pages have titles inconsistent with their content. The web page data used here, however, comes from the search engine's index and has already passed offline quality analysis, so it can be regarded as high-quality; dead links, cheating pages and the like are rare.
First, the title of the web page is segmented to obtain a series of terms (Term).
Second, not every word is useful for analysis. To improve efficiency, terms are filtered by part of speech to remove words that are obviously unimportant, such as auxiliary words, punctuation marks, conjunctions and prepositions.
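The part-of-speech filter can be sketched as below. The sketch assumes the title has already been segmented and tagged by some external tool; the tag names and the `filter_by_pos` helper are hypothetical, since the patent names the filtered categories but no tag set or tagger.

```python
# Hypothetical tag names; a real tagger defines its own tag set.
FILTERED_TAGS = {"auxiliary", "punctuation", "conjunction", "preposition"}


def filter_by_pos(tagged_title):
    """Keep the (term, tag) pairs whose tag is not in the filtered set."""
    return [(term, tag) for term, tag in tagged_title if tag not in FILTERED_TAGS]


# A segmented and tagged title, written out by hand for illustration.
tagged_title = [
    ("reviews", "noun"),
    ("of", "preposition"),
    ("novels", "noun"),
    ("and", "conjunction"),
    ("poetry", "noun"),
    (".", "punctuation"),
]
kept = filter_by_pos(tagged_title)
```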
Third, the remaining terms are scored. The score considers two factors: position and part of speech. Position refers to where term T sits in the web page title: front, middle or rear. The front is defined as a term position below 20% of the number of terms, the rear as a term position above 80% of the number of terms, and everything else is the middle. Terms in different positions carry different weight: in a sentence or an article, the important, subject-related terms generally sit at the front or the rear. The importance of part of speech is self-evident. Web page titles largely follow natural-language habits and logic and are fairly standardized language, so part of speech can indicate which words are important and which are not. From the standpoint of the title's subject, parts of speech are divided into three grades: grade one mainly comprises nouns, verbs, personal names, place names, organization names, proper nouns, idioms and abbreviations; grade two mainly comprises adjectives, adverbs, nouns of locality and measure words; all other parts of speech fall into grade three, which are morpheme words that play only a modifying role.
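The two classifications described here can be sketched as follows. Counting positions from 1 and treating the 20% and 80% boundaries as belonging to the middle are assumptions, as are the tag names; the text gives only the thresholds and the grade membership.

```python
# Hypothetical tag names for the grade-one and grade-two parts of speech.
GRADE_ONE = {"noun", "verb", "person_name", "place_name",
             "organization_name", "proper_noun", "idiom", "abbreviation"}
GRADE_TWO = {"adjective", "adverb", "locality_noun", "measure_word"}


def position_bucket(position: int, n_terms: int) -> str:
    """Classify a 1-based term position as 'front', 'middle' or 'rear'.

    Integer arithmetic is used so the 20% and 80% comparisons are exact.
    """
    if position * 5 < n_terms:       # position / n_terms < 20%
        return "front"
    if position * 5 > 4 * n_terms:   # position / n_terms > 80%
        return "rear"
    return "middle"


def pos_grade(tag: str) -> int:
    """Return the part-of-speech grade (1, 2 or 3) for a tag."""
    if tag in GRADE_ONE:
        return 1
    if tag in GRADE_TWO:
        return 2
    return 3
```

For a ten-term title this places position 1 at the front, positions 2 through 8 in the middle and positions 9 and 10 at the rear.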
The scoring formula is
$$\mathrm{p\_score}(T_j) = \mathrm{index\_score}(T_j) \times \mathrm{pos\_score}(T_j)$$
where index_score(T_j) is the position score of term T_j and pos_score(T_j) is its part-of-speech score.
Here N is the number of terms in the title and dpos(T_j) is the part-of-speech grade of term T_j.
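The product form of the page-level score can be illustrated with the sketch below. The formulas for index_score and pos_score are not reproduced in this text, so the two weight tables are purely hypothetical placeholders chosen so that front and rear positions and grade-one parts of speech score highest; only the multiplication itself comes from the source.

```python
# Hypothetical weights: the actual index_score and pos_score formulas
# are not available in this text.
INDEX_WEIGHT = {"front": 1.0, "middle": 0.5, "rear": 1.0}
POS_WEIGHT = {1: 1.0, 2: 0.5, 3: 0.25}


def p_score(bucket: str, grade: int) -> float:
    """Page-level score: position score multiplied by part-of-speech score."""
    return INDEX_WEIGHT[bucket] * POS_WEIGHT[grade]
```

Under these placeholder weights, a grade-one term at the front of a title scores 1.0, and a grade-two term in the middle scores 0.25.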
This yields the score of each term in a web page title. But this is only a page-level score. A term's score within a website must also take into account the number of web pages the website contains and the number of those pages that contain the term.
Here s_score(T_j) is the score of term T_j within a given website, page_num(T_j) is the number of web pages in that website whose titles contain term T_j, and page_num is the number of web pages the website contains.
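The site-level formula is not reproduced in this text, so the following is only a hypothetical illustration of how the named quantities might be combined: the share of pages containing the term, the mean page-level score on those pages, and the term's ISF are multiplied together. This combination is an assumption made for illustration and should not be read as the patent's formula.

```python
def s_score(page_scores, page_num: int, isf: float) -> float:
    """Hypothetical site-level score for one term.

    page_scores: page-level scores of the term, one per page whose title
                 contains it, so len(page_scores) plays the role of
                 page_num(T_j).
    page_num:    number of pages the website contains.
    isf:         inverse site frequency of the term.
    """
    if page_num <= 0:
        raise ValueError("page_num must be positive")
    if not page_scores:
        return 0.0
    coverage = len(page_scores) / page_num            # page_num(T_j) / page_num
    mean_page_score = sum(page_scores) / len(page_scores)
    return coverage * mean_page_score * isf
```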
This step completes the scoring of terms for a website, drawing both on the term's importance across websites and on the specific information about the web pages in the website that contain the term and the term within those pages. Because a website contains a huge number of terms and not all of them are needed, only the words ranked in the top 1% by score are saved as the website's subject terms.
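Selecting the top 1% of a website's terms can be sketched as follows. Keeping at least one term for small sites and breaking score ties alphabetically are assumptions added to make the sketch well defined; the text states only the 1% cut-off.

```python
def top_subject_terms(scores, percent: int = 1):
    """scores: {term: site-level score} -> top `percent`% as (term, score) pairs.

    At least one term is kept (an assumption for sites with few terms).
    """
    if not scores:
        return []
    keep = max(1, len(scores) * percent // 100)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:keep]


# Hypothetical site with 300 scored terms: the top 1% is three terms.
scores = {f"term{i:03d}": float(i) for i in range(300)}
subject_terms = top_subject_terms(scores)
```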
Step S104: build the stored resource dictionary.
The extracted website subject-term information is built into a structured resource dictionary for fast, convenient lookup. The dictionary consists of two modules: a forward query module from websites to subject terms, and an inverse query module from subject terms to websites.
Each module has two parts, a data area and a structure area. The structure area stores the objects that are queried directly, and the data area stores the data shared by the structure area.
The forward query module looks up the subject terms of a website by its address. The structure area stores site information, such as each website's address, the number of subject terms the website has, and the weights of those subject terms; the data area holds the subject-term information, stored as subject-term strings. For example, the two websites sina.com and sohu.com both have the subject term "portal". The structure area stores the information for the two websites, including the address, the number of subject terms, the position of each subject term in the data area and the weight of each subject term, while the data area stores words such as "portal" once, without repetition. A query for sina.com finds the subject term "portal", and a query for sohu.com finds it as well.
The inverse query module looks up the website addresses covered by a subject term. Its structure area stores subject-term information: each subject term, the positions of its websites in the data area, and its weight in each website. Its data area holds the site information, stored as website address strings. For example, a query for "portal" outputs the websites whose subject is "portal": "sina.com", "sohu.com" and so on.
This storage layout shares resources and uses space sensibly, which reduces the loading cost at use time.
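One way to picture the two-area layout is the sketch below, in which the data area is a pool that stores each distinct string once and the structure area refers to pool entries by index. The class and function names are illustrative, the subject term is written as "portal", and the weight for sohu.com is invented for the example; the patent describes the layout only in prose.

```python
class StringPool:
    """Data area: each distinct string is stored once and addressed by index."""

    def __init__(self):
        self.strings = []
        self._index = {}

    def intern(self, text: str) -> int:
        """Return the index of text, adding it to the pool on first use."""
        if text not in self._index:
            self._index[text] = len(self.strings)
            self.strings.append(text)
        return self._index[text]


def build_forward(site_terms):
    """Forward module: site -> [(index of term in data area, weight), ...]."""
    pool = StringPool()
    structure = {
        site: [(pool.intern(term), weight) for term, weight in terms]
        for site, terms in site_terms.items()
    }
    return structure, pool


def build_inverse(site_terms):
    """Inverse module: term -> [(index of site in data area, weight), ...]."""
    pool = StringPool()
    structure = {}
    for site, terms in site_terms.items():
        site_id = pool.intern(site)
        for term, weight in terms:
            structure.setdefault(term, []).append((site_id, weight))
    return structure, pool


# Illustrative data; the 0.7 weight is made up for this example.
site_terms = {
    "sina.com": [("portal", 0.8), ("Sina", 0.8)],
    "sohu.com": [("portal", 0.7)],
}
forward, term_pool = build_forward(site_terms)
inverse, site_pool = build_inverse(site_terms)
```

Because both websites refer to the same pool index for "portal", the string is held once however many sites share it, which is the space saving the text describes.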
The structure of the two dictionary modules is illustrated in Fig. 2.
Step S105: establish the website subject query interface.
The website subject-term information is saved in the resource dictionary to make lookups fast and convenient. A query interface is exposed externally; when a keyword is supplied, different query interfaces query different modules.
Following the design of the resource dictionary above, the invention provides two query interfaces, forward and inverse. A forward query looks up the subject terms of a website and their weights by website address; an inverse query looks up the websites covered by a subject term and their weights by subject term.
Both modes are provided so that different query terms can be looked up flexibly. In practice the two modes serve different purposes. Offline, when analyzing web page quality or scoring the terms of a web page, the forward query can be used, and the returned subject terms can be given special treatment. Online, when analyzing a user's query, the inverse query can be used: if the query contains a subject term, the websites covered by that subject term can be considered for preferential display, which can improve the professionalism and authority of the search results to some extent.
A brief practical example: for a forward query the query mode is 1. Given the input "sina.com", the result is "portal 0.8; Sina website 0.8; Sina 0.8" and so on. In offline web page analysis, subject terms of this kind deserve attention for pages of the sina.com website.
For an inverse query the query mode is 2. Given the input "literature", the result is "rongshuxia.com 0.4; tianyibook.com 0.4; D5wx.com 0.3" and so on. In online query analysis, when a user's query shows a "literature" intent, results from these websites can be returned or ranked higher, giving the user more authoritative results.
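The two query modes can be sketched with a small in-memory class. The mode numbers 1 and 2 and the sample weights follow the example in the text; the `SubjectDictionary` class, its `query` method and the plain-dictionary storage are illustrative assumptions rather than the patent's interface, and the sina.com subject term is written as "portal".

```python
FORWARD, INVERSE = 1, 2  # query modes named in the example


class SubjectDictionary:
    """Illustrative in-memory stand-in for the resource dictionary."""

    def __init__(self, site_terms):
        # Forward module: site -> [(subject term, weight), ...]
        self._forward = {site: list(terms) for site, terms in site_terms.items()}
        # Inverse module: subject term -> [(site, weight), ...]
        self._inverse = {}
        for site, terms in site_terms.items():
            for term, weight in terms:
                self._inverse.setdefault(term, []).append((site, weight))

    def query(self, mode: int, key: str):
        """Mode 1: site address -> subject terms; mode 2: subject term -> sites."""
        if mode == FORWARD:
            return list(self._forward.get(key, []))
        if mode == INVERSE:
            return list(self._inverse.get(key, []))
        raise ValueError(f"unknown query mode: {mode}")


dictionary = SubjectDictionary({
    "sina.com": [("portal", 0.8), ("Sina website", 0.8), ("Sina", 0.8)],
    "rongshuxia.com": [("literature", 0.4)],
    "tianyibook.com": [("literature", 0.4)],
    "d5wx.com": [("literature", 0.3)],
})
forward_result = dictionary.query(FORWARD, "sina.com")
inverse_result = dictionary.query(INVERSE, "literature")
```

An unknown key returns an empty list in this sketch, so a caller can tell "no subject terms found" apart from an invalid mode, which raises an error.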
As shown in Fig. 2, a structural diagram of the second embodiment of the invention, a system for establishing website subject-term queries is provided, comprising:
an acquisition module 201 for obtaining web page data;
a statistics module 202 for computing the cross-site importance of terms from the web page data;
an extraction module 203 for extracting website subject terms from the web page data;
a building module 204 for building a stored resource dictionary from the extracted website subject-term information;
an interface module 205 for establishing a website subject query interface.
The acquisition module obtains web page data; specifically, it obtains data for the web pages indexed under a website, mainly the title and the uniform resource locator (URL) of each web page.
The statistics module computes the cross-site importance of terms from the web page data; specifically, it segments web page titles into a series of terms, filters them by part of speech, and outputs each term together with its computed inverse site frequency (ISF) value as the measure of importance.
The extraction module extracts website subject terms from the web page data: it segments the title of each web page to obtain a series of terms (Term), filters the terms by part of speech, scores the remaining terms, and selects terms as the website's subject terms according to their scores.
The building module builds a stored resource dictionary from the extracted website subject-term information; specifically, it builds a structured resource dictionary comprising a forward query module from websites to subject terms and/or an inverse query module from subject terms to websites.
Existing search engines understand and analyze their data insufficiently, so their results cannot meet the higher demands of some professional fields. The technical solution of the invention scores each term in a website by drawing both on the term's importance across websites and on the specific information about the web pages in the website that contain the term and the term within those pages. To highlight the important terms, the words ranked in the top 1% by score in a website can be taken as that website's subject terms and saved in a clearly structured resource dictionary for use offline and online. Offline, the dictionary can inform how important terms are handled in web page analysis; online, it can be used to return results from more authoritative and more professional websites for a user's query. The method is simple to implement, can be updated quickly, and helps meet the needs of specific industries and specific users for particular information or services.
The above description shows and describes preferred embodiments of the invention. As stated earlier, however, the invention is not limited to the forms disclosed here, which should not be regarded as excluding other embodiments; it can be used in various other combinations, modifications and environments, and can be changed within the scope of the inventive concept described here through the above teachings or the skill or knowledge of the relevant art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall fall within the protection scope of the appended claims.
Claims (7)
1. A method for establishing website subject-term queries, characterized by comprising:
obtaining web page data;
computing the cross-site importance of terms from the web page data, specifically comprising: segmenting web page titles into a series of terms, filtering the terms by part of speech, and outputting each term together with its computed inverse site frequency (ISF) value as the measure of importance;
extracting website subject terms from the web page data, specifically comprising: segmenting the title of each web page to obtain a series of terms (Term); filtering the terms by part of speech; scoring the terms that remain after the above steps, and selecting terms as the subject terms of the website according to their scores; wherein scoring the terms comprises scoring terms within the title of a web page, with the scoring formula
$$\mathrm{p\_score}(T_j) = \mathrm{index\_score}(T_j) \times \mathrm{pos\_score}(T_j)$$
where index_score(T_j) is the position score of term T_j and pos_score(T_j) is its part-of-speech score;
N is the number of terms in the title, and dpos(T_j) is the part-of-speech grade of term T_j;
and further comprises scoring terms within a website,
where s_score(T_j) is the score of term T_j within a given website, page_num(T_j) is the number of web pages in that website whose titles contain term T_j, and page_num is the number of web pages the website contains;
building a stored resource dictionary from the extracted website subject-term information;
establishing a website subject query interface.
2. The method according to claim 1, characterized in that obtaining web page data comprises:
obtaining data for the web pages indexed under a website, mainly the title and the uniform resource locator (URL) of each web page.
3. The method according to claim 2, characterized in that the inverse site frequency (ISF) value is defined as
$$ISF(T_j) = \log\frac{N}{n_j}$$
where n_j is the number of websites containing term T_j and N is the total number of websites.
4. The method according to claim 3, characterized in that building the stored resource dictionary comprises:
building a structured resource dictionary from the extracted website subject-term information, the dictionary comprising a forward query module from websites to subject terms and/or an inverse query module from subject terms to websites.
5. The method according to claim 4, characterized in that the forward query module and the inverse query module each comprise a data area and a structure area; the structure area stores the objects that are queried directly, and the data area stores the data shared by the structure area.
6. The method according to claim 5, characterized in that the website subject query interface comprises a forward query and an inverse query; the forward query looks up the subject terms of a website and their weights by website address, and the inverse query looks up the websites covered by a subject term and their weights by subject term.
7. A system for establishing website subject-term queries, characterized by comprising:
an acquisition module for obtaining web page data;
a statistics module for computing the cross-site importance of terms from the web page data, specifically comprising: segmenting web page titles into a series of terms, filtering the terms by part of speech, and outputting each term together with its computed inverse site frequency (ISF) value as the measure of importance;
an extraction module for extracting website subject terms from the web page data, specifically comprising: segmenting the title of each web page to obtain a series of terms (Term); filtering the terms by part of speech; scoring the terms that remain after the above steps, and selecting terms as the subject terms of the website according to their scores; wherein scoring the terms comprises scoring terms within the title of a web page, with the scoring formula
$$\mathrm{p\_score}(T_j) = \mathrm{index\_score}(T_j) \times \mathrm{pos\_score}(T_j)$$
where index_score(T_j) is the position score of term T_j and pos_score(T_j) is its part-of-speech score;
N is the number of terms in the title, and dpos(T_j) is the part-of-speech grade of term T_j;
and further comprises scoring terms within a website,
where s_score(T_j) is the score of term T_j within a given website, page_num(T_j) is the number of web pages in that website whose titles contain term T_j, and page_num is the number of web pages the website contains;
a building module for building a stored resource dictionary from the extracted website subject-term information;
an interface module for establishing a website subject query interface.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310223294.XA CN103425735B (en) | 2013-06-06 | 2013-06-06 | A kind of method for building up and system based on website subject term inquiry |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310223294.XA CN103425735B (en) | 2013-06-06 | 2013-06-06 | A kind of method for building up and system based on website subject term inquiry |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103425735A CN103425735A (en) | 2013-12-04 |
CN103425735B true CN103425735B (en) | 2017-08-11 |
Family
ID=49650474
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310223294.XA Active CN103425735B (en) | 2013-06-06 | 2013-06-06 | A kind of method for building up and system based on website subject term inquiry |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103425735B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111488511B (en) * | 2019-01-25 | 2024-04-09 | 深信服科技股份有限公司 | Website theme extraction method and system, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101079031A (en) * | 2006-06-15 | 2007-11-28 | 腾讯科技(深圳)有限公司 | Web page subject extraction system and method |
CN102541910A (en) * | 2010-12-27 | 2012-07-04 | 上海杉达学院 | Keywords extraction method |
CN102622373A (en) * | 2011-01-31 | 2012-08-01 | 中国科学院声学研究所 | Statistic text classification system and statistic text classification method based on term frequency-inverse document frequency (TF*IDF) algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN103425735A (en) | 2013-12-04 |
Legal Events
Code | Title | Description
---|---|---
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
CP03 | Change of name, title or address | Address after: 518057, 403-409, Block C, Building 5, Software Industry Base, Nanshan District, Shenzhen, Guangdong. Patentee after: Shenzhen easou world Polytron Technologies Inc. Address before: 518026, A5501-A, Tower A, Joint Plaza, Binhe Road and Caitian Road, Futian District, Shenzhen, Guangdong. Patentee before: Shenzhen Yisou Science & Technology Development Co., Ltd.