CN103473317A - Method and equipment for extracting keywords - Google Patents


Publication number
CN103473317A
Authority
CN
China
Prior art keywords
candidate keywords
feature
keyword
type
keywords
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013104151379A
Other languages
Chinese (zh)
Inventor
路遥
陈镜
唐进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN2013104151379A
Publication of CN103473317A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and equipment for extracting keywords. The method includes: extracting candidate keywords from predetermined information, acquiring at least one type of characteristic of each candidate keyword, calculating an importance value of each candidate keyword according to the acquired characteristics and weight, acquired by training, of the characteristic of each type, and selecting the keywords from the candidate keywords according to the calculated importance values. According to the method and the equipment for extracting the keywords, the keywords better in relevance to source information can be acquired.

Description

Method and equipment for extracting keywords
Technical field
The present invention relates to the fields of information search and data mining and, more particularly, to a method and equipment for extracting keywords.
Background
With the development of information technology, search engines are used more and more widely. Through a search engine, ordinary search users (that is, ordinary Internet users) can find relevant information. In addition, promotion users (for example, advertisers) use the promotion service provided by a search engine to promote their business. When promoting, the promotion user associates the business to be promoted (for example, an advertisement) with predetermined keywords, so that when a search user enters one of these keywords, the search engine can return the business associated with it. To do so, the promotion user needs to know which keywords are relevant to the business to be promoted and must associate the business with those keywords. Then, when a search user enters such a keyword into the search engine, the corresponding business can be found.
In the prior art, a search engine can recommend some keywords in advance based on information about the business to be promoted. However, the relevance between these recommended keywords and the business to be promoted may be weak. In other words, search users may not use these predetermined keywords when searching for the business. As a result, a search user may have to change keywords repeatedly before finding the desired business.
Therefore, accurately determining the relevance between the business to be promoted and a keyword is very important. The higher this relevance, the more easily search users can find the business to be promoted.
Thus, a keyword extraction technique is needed that helps promotion users obtain highly relevant keywords and thereby improves the search experience of search users.
Summary of the invention
The object of the present invention is to provide a method and equipment for extracting keywords that can obtain keywords with better relevance to the source information (that is, the information from which the keywords are extracted).
One aspect of the present invention provides a method for extracting keywords, comprising: a) extracting candidate keywords from predetermined information; b) obtaining at least one type of feature of each candidate keyword; c) calculating an importance value of each candidate keyword according to the obtained features and the weight, obtained by training, of each type of feature; d) selecting keywords from the candidate keywords according to the calculated importance values.
Optionally, the features comprise at least one of a text feature, a language feature, a statistical feature, and a mark feature.
Optionally, the candidate keywords are classified according to promotion log information of a search engine, and candidate keywords of different classes have different mark features.
Optionally, the search log information of the search engine comprises the number of clicks on the promoted business corresponding to the predetermined information when that business is found by searching with a candidate keyword, and indication information indicating whether the candidate keyword is a purchased promotion word.
Optionally, the candidate keywords are classified into: candidate keywords that are purchased promotion words and whose number of clicks is greater than or equal to a predetermined threshold; candidate keywords that are purchased promotion words and whose number of clicks is less than the predetermined threshold; candidate keywords that are purchased promotion words and whose number of clicks is zero; and the remaining candidate keywords.
Optionally, a learning-to-rank model is trained with the features of the at least one type and the importance values of a predetermined number of sample keywords to obtain the weight of each type of feature.
Optionally, step c) comprises: calculating the weighted sum of the features of the at least one type of each candidate keyword, using the weight of each type of feature obtained by training.
Another aspect of the present invention provides equipment for extracting keywords, comprising: a word extraction unit, which extracts candidate keywords from predetermined information; a feature extraction unit, which obtains at least one type of feature of each candidate keyword; an importance calculation unit, which calculates an importance value of each candidate keyword according to the obtained features and the weight, obtained by training, of each type of feature; and a selection unit, which selects keywords from the candidate keywords according to the calculated importance values.
Optionally, the features comprise at least one of a text feature, a language feature, a statistical feature, and a mark feature.
Optionally, the candidate keywords are classified according to promotion log information of a search engine, and candidate keywords of different classes have different mark features.
Optionally, the search log information of the search engine comprises the number of clicks on the promoted business corresponding to the predetermined information when that business is found by searching with a candidate keyword, and indication information indicating whether the candidate keyword is a purchased promotion word.
Optionally, the candidate keywords are classified into: candidate keywords that are purchased promotion words and whose number of clicks is greater than or equal to a predetermined threshold; candidate keywords that are purchased promotion words and whose number of clicks is less than the predetermined threshold; candidate keywords that are purchased promotion words and whose number of clicks is zero; and the remaining candidate keywords.
Optionally, a learning-to-rank model is trained with the features of the at least one type and the importance values of a predetermined number of sample keywords to obtain the weight of each type of feature.
Optionally, the importance calculation unit calculates the weighted sum of the features of the at least one type of each candidate keyword, using the weight of each type of feature obtained by training.
With the method and equipment for extracting keywords according to embodiments of the present invention, features of the candidate keywords are extracted and used to obtain importance values of the candidate keywords, and keywords are provided according to the magnitude of the importance values. This way of extracting can assess the value of each keyword, that is, the relevance between the keyword and the promoted content, more accurately. It thereby helps promotion users by recommending reasonable keywords and ultimately makes it easier for search users (that is, Internet users) to find the desired content.
In addition, the method and equipment for extracting keywords according to embodiments of the present invention can also be applied to various other situations in which keywords need to be extracted. By using various features of the candidate keywords (especially features extracted from the log information of a search engine), keywords with better relevance to the source information can be extracted.
Brief description of the drawings
The above and other objects, features, and advantages of the present invention will become apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a method for extracting keywords according to an embodiment of the present invention;
Fig. 2 is a block diagram of equipment for extracting keywords according to an embodiment of the present invention.
Detailed description of embodiments
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for extracting keywords according to an embodiment of the present invention.
As shown in Fig. 1, in step 101, candidate keywords are extracted from predetermined information. Various existing keyword extraction techniques can be used to extract the candidate keywords, for example word segmentation or natural language processing techniques.
In one example, the predetermined information is structured information about a predetermined product in the product database of a search engine's promotion system, which may include the product name, a detailed product description, the product category, and so on.
It should be understood that the predetermined information of the present invention is not limited to the above; it may be any information from which keywords can be extracted. In other words, the present invention is not limited to extracting keywords for the promotion service of a search engine and is applicable to any situation in which keywords need to be extracted.
In step 102, at least one type of feature of each candidate keyword is obtained.
In one embodiment of the present invention, the feature may be at least one of a text feature, a language feature, a statistical feature, and a mark feature.
The text feature refers to the candidate keyword itself and/or the field information and/or position information of the candidate keyword within the predetermined information.
The language feature refers to linguistic characteristics of the candidate keyword itself, for example at least one of: its part of speech (for example, noun, verb, or adjective); whether it is a proper noun (for example, a trade name, brand name, place name, or personal name); and various kinds of linguistic information obtained by natural language processing (for example, whether it is a main component, whether it is trunk information, or specific attribute information from trunk analysis).
The statistical feature refers to statistical characteristics of the candidate keyword, for example the number of times the candidate keyword occurs in the predetermined information and/or the inverse document frequency of the candidate keyword in a predetermined corpus.
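The two statistical features named above can be illustrated with a minimal Python sketch. The function names, the token-list and set-of-tokens representations, and the smoothed form of the inverse document frequency are assumptions made for illustration only; the description does not prescribe a particular formula.

```python
import math
from collections import Counter


def term_frequency(candidate, tokens):
    """Count how often the candidate occurs in the tokenized predetermined information."""
    return Counter(tokens)[candidate]


def inverse_document_frequency(candidate, corpus):
    """Smoothed inverse document frequency over a corpus of token sets.

    The smoothing log((1 + N) / (1 + df)) + 1 is an assumed variant that
    avoids division by zero for candidates absent from the corpus.
    """
    n_docs = len(corpus)
    doc_freq = sum(1 for doc in corpus if candidate in doc)
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1.0


# Hypothetical tokenized product information and a hypothetical three-document corpus.
tokens = ["red", "running", "shoes", "shoes"]
corpus = [{"red", "shoes"}, {"blue", "hat"}, {"running", "shoes"}]

tf_shoes = term_frequency("shoes", tokens)
idf_shoes = inverse_document_frequency("shoes", corpus)
idf_hat = inverse_document_frequency("hat", corpus)
```

In this toy corpus, "hat" appears in fewer documents than "shoes" and therefore receives the larger inverse document frequency.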
The mark feature refers to the class of the candidate keyword. In the present invention, candidate keywords are classified according to promotion log information of the search engine, and candidate keywords of different classes have different mark features.
The search log information of the search engine comprises the number of clicks on the promoted business corresponding to the predetermined information when that business is found by searching with a candidate keyword, and indication information indicating whether the candidate keyword is a purchased promotion word.
In other words, the number of clicks here is the number of times the promoted business corresponding to the predetermined information was found by a search using the candidate keyword and then clicked. The indication information indicates whether the candidate keyword has been purchased by a promotion user of the search engine.
In one example, when the predetermined information is structured information about a predetermined product in the product database of the search engine's promotion system, the indication information indicates whether the candidate keyword has been purchased by the promotion user of that product.
The candidate keywords can be classified according to the number of clicks and the indication information. In one example, the candidate keywords are divided into: candidate keywords that are purchased promotion words and whose number of clicks is greater than or equal to a predetermined threshold; candidate keywords that are purchased promotion words and whose number of clicks is less than the predetermined threshold; candidate keywords that are purchased promotion words and whose number of clicks is zero; and the remaining candidate keywords. Here, the predetermined threshold is greater than 0.
It should be understood that the candidate keywords may also be classified in other ways based on the number of clicks and the indication information.
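The four-way classification in the example above might be sketched in Python as follows. The function name and the integer class codes are hypothetical, and the second class is read here as "purchased, with more than zero but fewer than the threshold number of clicks" so that it does not overlap with the zero-click class; the description itself does not state how that overlap is resolved.

```python
def mark_class(is_purchased, clicks, threshold):
    """Map a candidate keyword to one of four classes (assumed integer codes).

    3: purchased promotion word, clicks >= threshold
    2: purchased promotion word, 0 < clicks < threshold (assumed reading)
    1: purchased promotion word, clicks == 0
    0: all remaining candidate keywords
    """
    if threshold <= 0:
        raise ValueError("the predetermined threshold must be greater than 0")
    if not is_purchased:
        return 0
    if clicks >= threshold:
        return 3
    if clicks > 0:
        return 2
    return 1


# Hypothetical log records: (purchased flag, click count), with a threshold of 5.
classes = [mark_class(p, c, 5) for p, c in [(True, 10), (True, 3), (True, 0), (False, 10)]]
```

The resulting class code can then serve as the value of the mark feature for the candidate keyword.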
In a preferred embodiment of the present invention, the features include at least the mark feature.
In step 103, the importance value of each candidate keyword is calculated according to the obtained features and the weight, obtained by training, of each type of feature.
Specifically, the weight of each type of feature can be obtained by training in advance. Then, for each candidate keyword, the weighted sum of its features of each type can be calculated.
For example, the importance value of a candidate keyword can be expressed as follows:
f(x) = w × x,
where x is the feature vector representing the various types of features of the candidate keyword, and w is the vector of the weights of the various types of features.
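As a minimal sketch of the weighted sum f(x) = w × x, assuming the features have already been converted to numbers and that w and x are plain Python lists of equal length (the function name and the sample values are hypothetical):

```python
def importance_value(weights, features):
    """Return the weighted sum of the feature values, one weight per feature type."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features))


# Hypothetical weights and feature values for two feature types.
score = importance_value([0.5, 2.0], [2.0, 1.0])
```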
A learning-to-rank model can be trained with the features of each type and the importance values of a predetermined number of sample keywords to obtain the weight of each type of feature. Various existing learning-to-rank models (for example, a ranking support vector machine (Rank SVM) model, a neural-network learning-to-rank model, or a Boost learning-to-rank model) can be used to train the weights.
An example of training the feature weights with the Rank SVM model is shown below. The Rank SVM model is expressed as follows:
$$\min_{w} M(w) = \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i,$$
subject to:
$$\xi_i \ge 0, \qquad z_i \left[ w \times \left( x_i^{(1)} - x_i^{(2)} \right) \right] \ge 1 - \xi_i, \qquad 1 \le i \le n,$$
$$z_i = \begin{cases} 1, & x_i^{(1)} \ge x_i^{(2)} \\ -1, & x_i^{(1)} < x_i^{(2)} \end{cases}$$
where $w$ is the vector of the weights of each type of feature, $C$ is the penalty factor, $n$ is the number of sample keyword pairs, $i$ is the index of a sample keyword pair, $\xi_i$ is the slack variable corresponding to the $i$-th sample keyword pair, $x_i^{(1)}$ is the feature vector of the first sample keyword in the $i$-th pair, and $x_i^{(2)}$ is the feature vector of the second sample keyword in the $i$-th pair. The relation $x_i^{(1)} \ge x_i^{(2)}$ means that the importance value of the first sample keyword in the $i$-th pair is greater than or equal to that of the second, and $x_i^{(1)} < x_i^{(2)}$ means that the importance value of the first sample keyword in the $i$-th pair is less than that of the second.
In the above Rank SVM example, the weight w of each type of feature is determined by minimizing the function M(w). When training with the Rank SVM model, any two sample keywords can be picked from the predetermined number of sample keywords to form a sample keyword pair. For example, if there are m sample keywords (m being a natural number greater than 1), at most $\binom{m}{2} = m(m-1)/2$ sample keyword pairs can be obtained. Preferably, different sample keywords have different importance values.
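The pair construction and the minimization of M(w) can be sketched in Python under some loudly stated assumptions. The description does not say which solver is used; the sketch below substitutes plain subgradient descent on the equivalent hinge-loss form of the objective, and the function names, learning rate, epoch count, and sample data are all hypothetical. Pairs with equal importance values are skipped here, which is a simplification of the definition of z_i.

```python
from itertools import combinations


def make_pairs(samples):
    """Build (feature difference, z) pairs from (feature_vector, importance_value) samples."""
    pairs = []
    for (xa, ya), (xb, yb) in combinations(samples, 2):
        if ya == yb:
            continue  # assumption: tied pairs carry no ordering information
        diff = [a - b for a, b in zip(xa, xb)]
        z = 1 if ya > yb else -1
        pairs.append((diff, z))
    return pairs


def train_rank_svm(samples, C=1.0, lr=0.01, epochs=500):
    """Minimize 1/2 * ||w||^2 + C * sum(max(0, 1 - z_i * w.(x_i1 - x_i2))).

    Subgradient descent is a swapped-in technique for illustration,
    not a solver named in the description.
    """
    pairs = make_pairs(samples)
    dim = len(samples[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = list(w)  # gradient of the regularization term 1/2 * ||w||^2
        for diff, z in pairs:
            margin = z * sum(wi * di for wi, di in zip(w, diff))
            if margin < 1:  # pair violates the margin, so the hinge term is active
                for k in range(dim):
                    grad[k] -= C * z * diff[k]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Hypothetical normalized samples: (feature vector, importance value).
samples = [([1.0, 0.0], 3.0), ([0.5, 0.2], 2.0), ([0.0, 1.0], 1.0)]
weights = train_rank_svm(samples)
scores = [sum(w * x for w, x in zip(weights, fv)) for fv, _ in samples]
```

With three sample keywords of distinct importance, `make_pairs` yields the maximum of 3 × 2 / 2 = 3 pairs, and the learned weights order the samples in the same way as their importance values.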
In addition, given the features and importance values of a predetermined number of sample keywords, other learning-to-rank models (for example, a neural-network learning-to-rank model or a Boost learning-to-rank model) can also be used to train the weights of each type of feature; the present invention is not limited to the Rank SVM model.
In addition, it should be understood that in the processing of step 103, the features used in calculation and training are normalized (or converted to numerical form) in advance. Since this is prior art, it is not described further here.
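The description leaves the normalization method open. One common choice, shown here purely as an assumption, is min-max scaling of each numeric feature to the range [0, 1]; the function name and the sample click counts are hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of numeric feature values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw occurrence counts for three candidate keywords.
normalized = min_max_normalize([2.0, 4.0, 6.0])
```

Scaling every feature to a comparable range keeps a single large-valued feature, such as a raw click count, from dominating the weighted sum.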
In step 104, keywords are selected from the candidate keywords according to the calculated importance values. For example, the candidate keywords are sorted by importance value, and a predetermined number of top-ranked candidate keywords are selected as keywords.
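The selection in step 104 can be sketched as a sort followed by a cut-off. The function name, the dictionary representation, and the sample importance values below are hypothetical.

```python
def select_keywords(importance_by_candidate, k):
    """Return the k candidate keywords with the highest importance values."""
    ranked = sorted(
        importance_by_candidate.items(),
        key=lambda item: item[1],
        reverse=True,  # highest importance value first
    )
    return [keyword for keyword, _ in ranked[:k]]


# Hypothetical importance values for three candidate keywords.
selected = select_keywords({"shoes": 0.9, "red": 0.2, "running": 0.7}, 2)
```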
Fig. 2 is a block diagram of equipment for extracting keywords according to an embodiment of the present invention.
As shown in Fig. 2, the equipment 200 for extracting keywords according to the present invention comprises a word extraction unit 210, a feature extraction unit 220, an importance calculation unit 230, and a selection unit 240.
The word extraction unit 210 extracts candidate keywords from predetermined information.
The word extraction unit 210 can use various existing keyword extraction techniques to extract the candidate keywords, for example word segmentation or natural language processing techniques.
In one example, the predetermined information is structured information about a predetermined product in the product database of a search engine's promotion system, which may include the product name, a detailed product description, the product category, and so on.
It should be understood that the predetermined information of the present invention is not limited to the above; it may be any information from which keywords can be extracted. In other words, the present invention is not limited to extracting keywords for the promotion service of a search engine and is applicable to any situation in which keywords need to be extracted.
The feature extraction unit 220 obtains at least one type of feature of each candidate keyword. The feature may be at least one of the text feature, language feature, statistical feature, and mark feature described above. In a preferred embodiment of the present invention, the features include at least the mark feature.
The importance calculation unit 230 calculates the importance value of each candidate keyword according to the obtained features and the weight, obtained by training, of each type of feature.
Specifically, the weight of each type of feature can be obtained by training in advance. Then, for each candidate keyword, the importance calculation unit 230 can calculate the weighted sum of its features of each type.
A learning-to-rank model can be trained with the features of each type and the importance values of a predetermined number of sample keywords to obtain the weight of each type of feature. For example, a ranking support vector machine (Rank SVM) model, a neural-network learning-to-rank model, or a Boost learning-to-rank model can be used to train the weights of each type of feature.
The selection unit 240 selects keywords from the candidate keywords according to the calculated importance values. For example, the selection unit 240 sorts the candidate keywords by importance value and selects a predetermined number of top-ranked candidate keywords as keywords.
In addition, it should be understood that each unit in the equipment for extracting keywords according to exemplary embodiments of the present invention can be implemented as a hardware component. Those skilled in the art can implement each unit, according to the processing it is defined to perform, using for example a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
In addition, the method for extracting keywords according to the present invention may be implemented as computer code on a computer-readable recording medium. Those skilled in the art can implement the computer code according to the description of the above method. When the computer code is executed in a computer, the above method of the present invention is carried out.
With the method and equipment for extracting keywords according to embodiments of the present invention, features of the candidate keywords are extracted and used to obtain importance values of the candidate keywords, and keywords are provided according to the magnitude of the importance values. This way of extracting can assess the value of each keyword, that is, the relevance between the keyword and the promoted content, more accurately. It thereby helps promotion users by recommending reasonable keywords and ultimately makes it easier for search users (that is, Internet users) to find the desired content.
In addition, the method and equipment for extracting keywords according to embodiments of the present invention can also be applied to various other situations in which keywords need to be extracted. By using various features of the candidate keywords (especially features extracted from the log information of a search engine), keywords with better relevance to the source information can be extracted.
Although the present invention has been particularly shown and described with reference to exemplary embodiments thereof, those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the present invention as defined by the claims.

Claims (14)

1. A method for extracting keywords, comprising:
a) extracting candidate keywords from predetermined information;
b) obtaining at least one type of feature of each candidate keyword;
c) calculating an importance value of each candidate keyword according to the obtained features and the weight, obtained by training, of each type of feature;
d) selecting keywords from the candidate keywords according to the calculated importance values.
2. The method according to claim 1, wherein the features comprise at least one of a text feature, a language feature, a statistical feature, and a mark feature.
3. The method according to claim 1, wherein the candidate keywords are classified according to promotion log information of a search engine, and candidate keywords of different classes have different mark features.
4. The method according to claim 3, wherein the search log information of the search engine comprises the number of clicks on the promoted business corresponding to the predetermined information when that business is found by searching with a candidate keyword, and indication information indicating whether the candidate keyword is a purchased promotion word.
5. The method according to claim 4, wherein the candidate keywords are classified into: candidate keywords that are purchased promotion words and whose number of clicks is greater than or equal to a predetermined threshold; candidate keywords that are purchased promotion words and whose number of clicks is less than the predetermined threshold; candidate keywords that are purchased promotion words and whose number of clicks is zero; and the remaining candidate keywords.
6. The method according to claim 1, wherein a learning-to-rank model is trained with the features of the at least one type and the importance values of a predetermined number of sample keywords to obtain the weight of each type of feature.
7. The method according to claim 1, wherein step c) comprises: calculating the weighted sum of the features of the at least one type of each candidate keyword, using the weight of each type of feature obtained by training.
8. Equipment for extracting keywords, comprising:
a word extraction unit, which extracts candidate keywords from predetermined information;
a feature extraction unit, which obtains at least one type of feature of each candidate keyword;
an importance calculation unit, which calculates an importance value of each candidate keyword according to the obtained features and the weight, obtained by training, of each type of feature;
a selection unit, which selects keywords from the candidate keywords according to the calculated importance values.
9. The equipment according to claim 8, wherein the features comprise at least one of a text feature, a language feature, a statistical feature, and a mark feature.
10. The equipment according to claim 9, wherein the candidate keywords are classified according to promotion log information of a search engine, and candidate keywords of different classes have different mark features.
11. The equipment according to claim 10, wherein the search log information of the search engine comprises the number of clicks on the promoted business corresponding to the predetermined information when that business is found by searching with a candidate keyword, and indication information indicating whether the candidate keyword is a purchased promotion word.
12. The equipment according to claim 11, wherein the candidate keywords are classified into: candidate keywords that are purchased promotion words and whose number of clicks is greater than or equal to a predetermined threshold; candidate keywords that are purchased promotion words and whose number of clicks is less than the predetermined threshold; candidate keywords that are purchased promotion words and whose number of clicks is zero; and the remaining candidate keywords.
13. The equipment according to claim 8, wherein a learning-to-rank model is trained with the features of the at least one type and the importance values of a predetermined number of sample keywords to obtain the weight of each type of feature.
14. The equipment according to claim 8, wherein the importance calculation unit calculates the weighted sum of the features of the at least one type of each candidate keyword, using the weight of each type of feature obtained by training.
CN2013104151379A 2013-09-12 2013-09-12 Method and equipment for extracting keywords Pending CN103473317A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013104151379A CN103473317A (en) 2013-09-12 2013-09-12 Method and equipment for extracting keywords

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013104151379A CN103473317A (en) 2013-09-12 2013-09-12 Method and equipment for extracting keywords

Publications (1)

Publication Number Publication Date
CN103473317A true CN103473317A (en) 2013-12-25

Family

ID=49798165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013104151379A Pending CN103473317A (en) 2013-09-12 2013-09-12 Method and equipment for extracting keywords

Country Status (1)

Country Link
CN (1) CN103473317A (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104679768A (en) * 2013-11-29 2015-06-03 百度在线网络技术(北京)有限公司 Method and device for extracting keywords from documents
CN105096942A (en) * 2014-05-21 2015-11-25 清华大学 Semantic analysis method and semantic analysis device
CN105808737A (en) * 2016-03-10 2016-07-27 腾讯科技(深圳)有限公司 Information retrieval method and server
CN105989040A (en) * 2015-02-03 2016-10-05 阿里巴巴集团控股有限公司 Intelligent question-answer method, device and system
CN106649422A (en) * 2016-06-12 2017-05-10 中国移动通信集团湖北有限公司 Keyword extraction method and apparatus
CN107203548A (en) * 2016-03-17 2017-09-26 阿里巴巴集团控股有限公司 Attribute acquisition methods and device
CN107885717A (en) * 2016-09-30 2018-04-06 腾讯科技(深圳)有限公司 A kind of keyword extracting method and device
CN107944589A (en) * 2016-10-12 2018-04-20 北京奇虎科技有限公司 The Forecasting Methodology and prediction meanss of ad click rate
CN108073568A (en) * 2016-11-10 2018-05-25 腾讯科技(深圳)有限公司 keyword extracting method and device
CN108334533A (en) * 2017-10-20 2018-07-27 腾讯科技(深圳)有限公司 keyword extracting method and device, storage medium and electronic device
CN108628861A (en) * 2017-03-15 2018-10-09 百度在线网络技术(北京)有限公司 Method and apparatus for pushed information
CN109658193A (en) * 2018-12-20 2019-04-19 拉扎斯网络科技(上海)有限公司 Determination method, apparatus, electronic equipment and the storage medium of system object importance
CN109902152A (en) * 2019-03-21 2019-06-18 北京百度网讯科技有限公司 Method and apparatus for retrieving information
CN110347900A (en) * 2019-07-10 2019-10-18 腾讯科技(深圳)有限公司 A kind of importance calculation method of keyword, device, server and medium
CN107085573B (en) * 2016-02-14 2020-08-11 北京国双科技有限公司 Hotspot information acquisition method and device
CN112397163A (en) * 2019-08-16 2021-02-23 北京大数医达科技有限公司 Method, apparatus, electronic device, and medium for generating case input model
CN112417130A (en) * 2020-11-19 2021-02-26 贝壳技术有限公司 Word screening method and device, computer readable storage medium and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060085181A1 (en) * 2004-10-20 2006-04-20 Kabushiki Kaisha Toshiba Keyword extraction apparatus and keyword extraction program
CN102033919A (en) * 2010-12-07 2011-04-27 北京新媒传信科技有限公司 Method and system for extracting text key words
CN102262625A (en) * 2009-12-24 2011-11-30 华为技术有限公司 Method and device for extracting keywords of page
CN103150388A (en) * 2013-03-21 2013-06-12 天脉聚源(北京)传媒科技有限公司 Method and device for extracting key words
CN103201718A (en) * 2010-11-05 2013-07-10 乐天株式会社 Systems and methods regarding keyword extraction
CN103235773A (en) * 2013-04-26 2013-08-07 百度在线网络技术(北京)有限公司 Method and device for extracting text labels based on keywords

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
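万源: "Research on Network Public Opinion Mining Technology Based on Semantic Statistical Analysis", China Doctoral Dissertations Full-text Database, Information Science and Technology *
谢晋: "Chinese Text Keyword Extraction Based on Word Span and Its Application in Text Classification", China Master's Theses Full-text Database, Information Science and Technology *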

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104679768A (en) * 2013-11-29 2015-06-03 百度在线网络技术(北京)有限公司 Method and device for extracting keywords from documents
CN104679768B (en) * 2013-11-29 2019-08-09 百度在线网络技术(北京)有限公司 Method and device for extracting keywords from documents
CN105096942A (en) * 2014-05-21 2015-11-25 清华大学 Semantic analysis method and semantic analysis device
CN105989040A (en) * 2015-02-03 2016-10-05 阿里巴巴集团控股有限公司 Intelligent question-answer method, device and system
CN105989040B (en) * 2015-02-03 2021-02-09 创新先进技术有限公司 Intelligent question and answer method, device and system
CN107085573B (en) * 2016-02-14 2020-08-11 北京国双科技有限公司 Hotspot information acquisition method and device
CN105808737A (en) * 2016-03-10 2016-07-27 腾讯科技(深圳)有限公司 Information retrieval method and server
CN107203548A (en) * 2016-03-17 2017-09-26 阿里巴巴集团控股有限公司 Attribute acquisition method and device
CN106649422B (en) * 2016-06-12 2019-05-03 中国移动通信集团湖北有限公司 Keyword extracting method and device
CN106649422A (en) * 2016-06-12 2017-05-10 中国移动通信集团湖北有限公司 Keyword extraction method and apparatus
CN107885717A (en) * 2016-09-30 2018-04-06 腾讯科技(深圳)有限公司 Keyword extraction method and device
CN107885717B (en) * 2016-09-30 2020-12-29 腾讯科技(深圳)有限公司 Keyword extraction method and device
CN107944589A (en) * 2016-10-12 2018-04-20 北京奇虎科技有限公司 Method and device for predicting advertisement click-through rate
CN108073568B (en) * 2016-11-10 2020-09-11 腾讯科技(深圳)有限公司 Keyword extraction method and device
US10878004B2 (en) 2016-11-10 2020-12-29 Tencent Technology (Shenzhen) Company Limited Keyword extraction method, apparatus and server
CN108073568A (en) * 2016-11-10 2018-05-25 腾讯科技(深圳)有限公司 Keyword extraction method and device
CN108628861A (en) * 2017-03-15 2018-10-09 百度在线网络技术(北京)有限公司 Method and apparatus for pushing information
US11194965B2 (en) 2017-10-20 2021-12-07 Tencent Technology (Shenzhen) Company Limited Keyword extraction method and apparatus, storage medium, and electronic apparatus
CN108334533A (en) * 2017-10-20 2018-07-27 腾讯科技(深圳)有限公司 Keyword extraction method and device, storage medium and electronic device
CN108334533B (en) * 2017-10-20 2021-12-24 腾讯科技(深圳)有限公司 Keyword extraction method and device, storage medium and electronic device
CN109658193A (en) * 2018-12-20 2019-04-19 拉扎斯网络科技(上海)有限公司 Method, apparatus, electronic device and storage medium for determining system object importance
CN109902152A (en) * 2019-03-21 2019-06-18 北京百度网讯科技有限公司 Method and apparatus for retrieving information
CN110347900A (en) * 2019-07-10 2019-10-18 腾讯科技(深圳)有限公司 Keyword importance calculation method, device, server and medium
CN110347900B (en) * 2019-07-10 2022-12-27 腾讯科技(深圳)有限公司 Keyword importance calculation method, device, server and medium
CN112397163A (en) * 2019-08-16 2021-02-23 北京大数医达科技有限公司 Method, apparatus, electronic device, and medium for generating case input model
CN112397163B (en) * 2019-08-16 2024-02-02 北京大数医达科技有限公司 Method, apparatus, electronic device and medium for generating case input model
CN112417130A (en) * 2020-11-19 2021-02-26 贝壳技术有限公司 Word screening method and device, computer readable storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN103473317A (en) Method and equipment for extracting keywords
CN106649818B (en) Application search intention identification method and device, application search method and server
CN103049470B (en) Viewpoint searching method based on emotion degree of association
Pane et al. A multi-lable classification on topics of quranic verses in english translation using multinomial naive bayes
CN108255813B (en) Text matching method based on word frequency-inverse document and CRF
US20130198192A1 (en) Author disambiguation
CN103823896A (en) Subject characteristic value algorithm and subject characteristic value algorithm-based project evaluation expert recommendation algorithm
CN105279252A (en) Related word mining method, search method and search system
EP3057003A1 (en) Device for collecting contradictory expression and computer program for same
CN102411563A (en) Method, device and system for identifying target words
CN110750640A (en) Text data classification method and device based on neural network model and storage medium
CN105912716A (en) Short text classification method and apparatus
CN104199965A (en) Semantic information retrieval method
CN104317784A (en) Cross-platform user identification method and cross-platform user identification system
CN101702167A (en) Internet-based method for extracting attribute and comment words using templates
US11893537B2 (en) Linguistic analysis of seed documents and peer groups
CN104361037B (en) Microblog classification method and device
CN104881458A (en) Labeling method and device for web page topics
CN103544307B (en) Automated comparative evaluation method for multiple search engines, independent of a document library
CN103559313B (en) Searching method and device
US9652997B2 (en) Method and apparatus for building emotion basis lexeme information on an emotion lexicon comprising calculation of an emotion strength for each lexeme
CN104346326A (en) Method and device for determining emotional characteristics of emotional texts
CN105468649A (en) Method and apparatus for determining matching of to-be-displayed object
CN115329085A (en) Social robot classification method and system
CN112328469B (en) Function level defect positioning method based on embedding technology

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20131225