CN106202049A - Method and device for determining hot words - Google Patents

Method and device for determining hot words

Info

Publication number
CN106202049A
Authority
CN
China
Prior art keywords
word
hot word
determined
probability
hot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610565135.1A
Other languages
Chinese (zh)
Inventor
魏博
齐志兵
尹玉宗
姚键
潘柏宇
王冀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
1Verge Internet Technology Beijing Co Ltd
Original Assignee
1Verge Internet Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 1Verge Internet Technology Beijing Co Ltd
Priority to CN201610565135.1A
Publication of CN106202049A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

This application discloses a method and device for determining hot words. The method includes: obtaining the prior probability corresponding to hot words, the prior probability corresponding to non-hot words, and a word to be determined; determining the hot-word judgment probability of the word to be determined according to the prior probability corresponding to hot words, and determining the non-hot-word judgment probability of the word to be determined according to the prior probability corresponding to non-hot words; and determining whether the word to be determined is a hot word according to its hot-word judgment probability and its non-hot-word judgment probability. With this method, determining which words are hot words no longer depends on a person's subjective experience but is done in an objective manner, which effectively reduces the subjectivity of manually determining which words are hot words.

Description

Method and device for determining hot words
Technical field
The present application relates to the field of computer technology, and in particular to a method and device for determining hot words.
Background
With the development of network technology, obtaining information over the network has become a part of people's daily lives; for example, people can obtain video information over the network.
At present, when a user obtains video information over the network, the user searches for the desired video information by entering keywords for it in the search bar of a website. The server then typically matches the keywords against the titles of video information, arranges the search results according to a preset ordering method (e.g., in descending order of click volume), and returns the arranged search results to the user.
In addition, some third-party websites not only provide video information to users but also let users upload their own video information, which the website then provides to other users. To get their uploaded video information seen by other users, however, uploaders often exploit the server's keyword-matching principle (i.e., keywords are matched against the titles of video information) by adding to a title multiple keywords that are unrelated to the video content itself but are used by users very frequently. (In this application, a keyword that users use very frequently is defined as a hot word, and video information whose title has been given multiple frequently used keywords unrelated to the video content itself is referred to as cheating video information.) As a result, after a user enters these hot words, the server matches the video information to which multiple hot words have been added and provides it to the user. Because hot words are often used as keywords, video information with multiple added hot words is frequently provided to users, and its click volume may gradually increase. Subsequently, when the server arranges search results according to the preset ordering method (e.g., in descending order of click volume), it ranks the video information with multiple added hot words near the top, while the video information the user actually needs is ranked lower.
To effectively find video information whose title has been given multiple frequently used keywords unrelated to the video content itself, it is usual to first determine which words are hot words and then screen each piece of video information according to the determined hot words.
In the prior art, which words are hot words is judged manually, based on subjective experience.
Clearly, in the prior art, determining which words are hot words relies excessively on human experience and is highly subjective.
Summary of the invention
The embodiments of the present application provide a method and device for determining hot words, to solve the problem that, in the prior art, determining which words are hot words relies excessively on human experience and is highly subjective.
A method for determining hot words provided by an embodiment of the present application includes:
obtaining the prior probability corresponding to hot words, the prior probability corresponding to non-hot words, and a word to be determined;
determining the hot-word judgment probability of the word to be determined according to the prior probability corresponding to hot words, and determining the non-hot-word judgment probability of the word to be determined according to the prior probability corresponding to non-hot words;
determining whether the word to be determined is a hot word according to the hot-word judgment probability and the non-hot-word judgment probability of the word to be determined.
A device for determining hot words provided by an embodiment of the present application includes:
an acquisition module, configured to obtain the prior probability corresponding to hot words, the prior probability corresponding to non-hot words, and the word to be determined;
a judgment probability determination module, configured to determine the hot-word judgment probability of the word to be determined according to the prior probability corresponding to hot words, and to determine the non-hot-word judgment probability of the word to be determined according to the prior probability corresponding to non-hot words;
a hot word determination module, configured to determine whether the word to be determined is a hot word according to the hot-word judgment probability and the non-hot-word judgment probability of the word to be determined.
The embodiments of the present application provide a method and device for determining hot words. The method obtains the prior probability corresponding to hot words, the prior probability corresponding to non-hot words, and a word to be determined; determines the hot-word judgment probability of the word according to the prior probability corresponding to hot words and its non-hot-word judgment probability according to the prior probability corresponding to non-hot words; and determines whether the word is a hot word according to these two judgment probabilities. With this method, determining which words are hot words no longer depends on a person's subjective experience but is done in an objective manner, which effectively reduces the subjectivity of manually determining which words are hot words.
Brief description of the drawings
The drawings described here are provided for further understanding of the present application and form a part of it. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the hot word determination process provided by an embodiment of the present application;
Fig. 2 is a schematic structural diagram of the hot word determination device provided by an embodiment of the present application.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
Fig. 1 shows the hot word determination process provided by an embodiment of the present application, which specifically includes the following steps:
S101: Obtain the prior probability corresponding to hot words, the prior probability corresponding to non-hot words, and the word to be determined.
In practice, to effectively find video information whose title has been given multiple frequently used keywords unrelated to the video content itself, it is usual to first determine which words are hot words and then screen each piece of video information according to the determined hot words.
Further, determining which words are hot words first requires obtaining the words to be determined. In this application, search terms can be obtained from users' historical search records and taken as the words to be determined. The words to be determined may be obtained by a server or by any other device capable of data processing.
In addition, because the determination method used in this application is based on the Bayes formula, the server must, before obtaining the word to be determined, first determine the prior probability corresponding to hot words (that is, the probability that a hot word occurs in past historical data) and the prior probability corresponding to non-hot words (that is, the probability that a non-hot word occurs in past historical data). Specifically, the server obtains the titles of the pieces of video information, segments each title into words, determines the number of hot words and the number of non-hot words among the resulting segmented words, and determines the two prior probabilities from these numbers. Subsequently, the server obtains not only the word to be determined but also the prior probability corresponding to hot words and the prior probability corresponding to non-hot words.
Further, counting the hot words among the segmented words requires knowing which segmented words are hot words. In this application, therefore, the title of each piece of video information can be reviewed before the titles are obtained, to confirm whether the video information is cheating video information. If so, the title is labeled as cheating video information; if not, the title is labeled as non-cheating video information (that is, not cheating video information). After the server obtains the labeled titles, it segments each title directly. When counting hot words, every segmented word obtained from the title of cheating video information can be treated as a hot word; when counting non-hot words, every segmented word obtained from the title of non-cheating video information can be treated as a non-hot word. This is only one implementation provided by the present application; which segmented words are hot words and which are non-hot words may also be determined in other ways, for example manually.
In addition, the present application provides a way of segmenting the titles of video information: specifically, each title can be segmented according to the parts of speech of its words to obtain the segmented words.
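The labeling-and-segmentation step described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names `split_title` and `collect_tokens` are hypothetical, and whitespace splitting is only a stand-in for the part-of-speech-based segmentation, which the text does not specify in detail.

```python
from typing import Iterable, List, Tuple


def split_title(title: str) -> List[str]:
    """Placeholder segmenter (assumption): splits on whitespace.

    The source segments titles by part of speech; a real system would
    plug a proper word segmenter in here.
    """
    return title.split()


def collect_tokens(
    labeled_titles: Iterable[Tuple[str, bool]]
) -> Tuple[List[str], List[str]]:
    """Split labeled titles into hot-word and non-hot-word token lists.

    Each item is (title, is_cheating). Every token from a cheating title
    is treated as a hot word; every token from a non-cheating title is
    treated as a non-hot word. Duplicates are kept, because later steps
    count occurrences.
    """
    hot_tokens: List[str] = []
    non_hot_tokens: List[str] = []
    for title, is_cheating in labeled_titles:
        tokens = split_title(title)
        if is_cheating:
            hot_tokens.extend(tokens)
        else:
            non_hot_tokens.extend(tokens)
    return hot_tokens, non_hot_tokens
```

Note that under this scheme the same word can appear in both lists, since it may occur in both cheating and non-cheating titles.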
Further, because words can only be counted, that is, their quantity is discrete, the prior probability corresponding to hot words and the prior probability corresponding to non-hot words can be determined statistically. Specifically, the total number of segmented words is determined; the ratio of the number of hot words to this total is taken as the prior probability corresponding to hot words, and the ratio of the number of non-hot words to this total is taken as the prior probability corresponding to non-hot words.
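As a rough sketch of this counting step, the two priors are simply the shares of hot and non-hot tokens in the total. The function name `prior_probabilities` is hypothetical, and the empty-input check is an added safeguard that the source does not discuss.

```python
from typing import Sequence, Tuple


def prior_probabilities(
    hot_tokens: Sequence[str], non_hot_tokens: Sequence[str]
) -> Tuple[float, float]:
    """Return (P(hot), P(non-hot)) as ratios of token counts to the total."""
    total = len(hot_tokens) + len(non_hot_tokens)
    if total == 0:
        # Assumption: the source does not cover an empty corpus.
        raise ValueError("no segmented words to estimate priors from")
    return len(hot_tokens) / total, len(non_hot_tokens) / total
```

Because there are only two classes, the two priors always sum to 1.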
It should be noted that, in this application, words are divided into only two classes: hot words and non-hot words (that is, words that are not hot words).
For example, suppose the labeled video information titles obtained by the server are as shown in Table 1:
Title of video information | Cheating video?
most glamorous pioneering model Ma Yun Chen Anzhi Du Yunsheng Li Jiacheng | Yes
trendy VIB inflatable boat letter light assault boat line fishing boat | No
Make Progress Every Day Happy Base Camp He Gui Xie Na HNTV | Yes
Ma Yun speech | No
Happy Base Camp Ma Yun speech | No
Table 1
The server segments the titles in Table 1 according to parts of speech and, using the approach described in step S101 for determining which words are hot words, determines that the hot words are "most glamorous, pioneering model, Ma Yun, Chen Anzhi, Du Yunsheng, Li Jiacheng, Make Progress Every Day, Happy Base Camp, He Gui, Xie Na, HNTV" and that the non-hot words are "trendy, VIB, inflatable boat, letter light, assault boat, line fishing boat, Ma Yun, speech, Happy Base Camp, Ma Yun, speech".
The total number of segmented words is 22, and the number of hot words among them is 11. The ratio of the number of hot words to the total number of segmented words, 0.5, is taken as the prior probability corresponding to hot words.
The number of non-hot words among the segmented words is 11. The ratio of the number of non-hot words to the total number of segmented words, 0.5, is taken as the prior probability corresponding to non-hot words.
Subsequently, the server obtains the prior probability 0.5 corresponding to hot words, the prior probability 0.5 corresponding to non-hot words, and the word to be determined, "Ma Yun". (To keep the illustration simple, this example obtains only one word to be determined; in practice, many words need to be determined.)
S102: Determine the hot-word judgment probability of the word to be determined according to the prior probability corresponding to hot words, and determine the non-hot-word judgment probability of the word to be determined according to the prior probability corresponding to non-hot words.
S103: Determine whether the word to be determined is a hot word according to the hot-word judgment probability and the non-hot-word judgment probability of the word to be determined.
Any word to be determined may be a hot word or a non-hot word. In this application, therefore, the hot-word judgment probability is used to represent how likely the word to be determined is to be a hot word: the larger the hot-word judgment probability, the more likely the word is a hot word, and the smaller it is, the less likely. Likewise, the non-hot-word judgment probability is used to represent how likely the word to be determined is to be a non-hot word: the larger the non-hot-word judgment probability, the more likely the word is a non-hot word, and the smaller it is, the less likely.
Further, when the hot-word judgment probability and the non-hot-word judgment probability of a word are determined, the word to be determined is given. In other words, the probability that the word is a hot word, given the word (that is, the hot-word judgment probability), fits the definition of a conditional probability. In this application, therefore, the two judgment probabilities can be determined with the Bayes formula. Specifically, count the number of times the word to be determined occurs among the hot words in the segmented words, determine the ratio of this count to the number of hot words in the segmented words, and take the product of this ratio and the prior probability corresponding to hot words as the hot-word judgment probability of the word. Similarly, count the number of times the word to be determined occurs among the non-hot words in the segmented words, determine the ratio of this count to the number of non-hot words in the segmented words, and take the product of this ratio and the prior probability corresponding to non-hot words as the non-hot-word judgment probability of the word.
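The judgment probability described in this paragraph is the same computation for both classes, so it can be sketched as one helper. The name `judgment_probability` is hypothetical, and returning 0.0 for an empty class is an assumption made here to avoid division by zero; the source does not address that case.

```python
from typing import Sequence


def judgment_probability(
    word: str, class_tokens: Sequence[str], prior: float
) -> float:
    """Unnormalized judgment probability of `word` for one class.

    Computed as (occurrences of word among the class's tokens
    / number of tokens in the class) * prior of the class.
    """
    if not class_tokens:
        # Assumption: an empty class contributes no evidence.
        return 0.0
    likelihood = class_tokens.count(word) / len(class_tokens)
    return likelihood * prior
```

Calling it once with the hot-word tokens and the hot-word prior, and once with the non-hot-word tokens and the non-hot-word prior, yields the two values that step S103 compares.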
Further, because the hot-word judgment probability represents how likely the word to be determined is to be a hot word, and the non-hot-word judgment probability represents how likely it is to be a non-hot word, the two can be compared once both have been determined. If the hot-word judgment probability of the word to be determined is greater than or equal to its non-hot-word judgment probability, the word is determined to be a hot word; if the hot-word judgment probability is less than the non-hot-word judgment probability, the word is determined to be a non-hot word.
It should also be noted that the two judgment probabilities are only ever compared with each other. In the Bayes formula, both the product of the ratio and the prior probability corresponding to hot words and the product of the ratio and the prior probability corresponding to non-hot words would be divided by the same value. In this application, therefore, the product is not divided by that value; instead, the product of the ratio and the prior probability corresponding to hot words is used directly as the hot-word judgment probability of the word to be determined, and likewise the product of the ratio and the prior probability corresponding to non-hot words is used directly as its non-hot-word judgment probability.
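The reason the common divisor can be dropped can be written out as follows. The notation is introduced here for illustration and does not appear in the source: $w$ is the word to be determined, $H$ and $\bar{H}$ denote the hot and non-hot classes, $n_H(w)$ is the number of occurrences of $w$ among the hot words, and $N_H$ is the number of hot words (similarly for $\bar{H}$).

```latex
\begin{align*}
P(H \mid w) &= \frac{P(w \mid H)\,P(H)}{P(w)},
&
P(\bar{H} \mid w) &= \frac{P(w \mid \bar{H})\,P(\bar{H})}{P(w)},
\\[4pt]
P(w \mid H) &\approx \frac{n_H(w)}{N_H},
&
P(w \mid \bar{H}) &\approx \frac{n_{\bar{H}}(w)}{N_{\bar{H}}}.
\end{align*}

Since $P(w) > 0$ is the same in both posteriors,
\[
P(H \mid w) \ge P(\bar{H} \mid w)
\iff
\frac{n_H(w)}{N_H}\,P(H) \;\ge\; \frac{n_{\bar{H}}(w)}{N_{\bar{H}}}\,P(\bar{H}).
\]
```

The two sides of the last inequality are exactly the unnormalized hot-word and non-hot-word judgment probabilities, so comparing them gives the same decision as comparing the full posteriors.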
Continuing the example above, after the server obtains the prior probability 0.5 corresponding to hot words, the prior probability 0.5 corresponding to non-hot words, and the word to be determined, "Ma Yun", it counts the occurrences of the word to be determined among the hot words in the segmented words, namely 1; determines the ratio of this count (1) to the number of hot words in the segmented words (11), which is 0.09; and takes the product of this ratio (0.09) and the prior probability corresponding to hot words (0.5) as the hot-word judgment probability of the word to be determined, namely 0.045.
It then counts the occurrences of the word to be determined among the non-hot words in the segmented words, namely 2; determines the ratio of this count (2) to the number of non-hot words in the segmented words (11), which is 0.18; and takes the product of this ratio (0.18) and the prior probability corresponding to non-hot words (0.5) as the non-hot-word judgment probability of the word to be determined, namely 0.09.
After determining the hot-word judgment probability and the non-hot-word judgment probability of "Ma Yun" (the word to be determined), the server finds that the hot-word judgment probability (0.045) is less than the non-hot-word judgment probability (0.09) and therefore, under the comparison rule above, determines "Ma Yun" to be a non-hot word.
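The arithmetic of this worked example can be checked with a short script. The token spellings below are illustrative English renderings of the Table 1 tokens and are assumptions; only the counts matter. With the figures from the example (1/11 × 0.5 ≈ 0.045 and 2/11 × 0.5 ≈ 0.09), the "greater than or equal" rule of step S103 classifies the word as non-hot, since 0.045 < 0.09.

```python
# Tokens from cheating titles (treated as hot words); 11 in total.
hot_tokens = [
    "most glamorous", "pioneering model", "Ma Yun", "Chen Anzhi",
    "Du Yunsheng", "Li Jiacheng", "Make Progress Every Day",
    "Happy Base Camp", "He Gui", "Xie Na", "HNTV",
]
# Tokens from non-cheating titles (treated as non-hot words); 11 in total.
non_hot_tokens = [
    "trendy", "VIB", "inflatable boat", "letter light", "assault boat",
    "line fishing boat", "Ma Yun", "speech",
    "Happy Base Camp", "Ma Yun", "speech",
]

total = len(hot_tokens) + len(non_hot_tokens)   # 22 segmented words
p_hot = len(hot_tokens) / total                 # prior for hot words
p_non_hot = len(non_hot_tokens) / total         # prior for non-hot words

word = "Ma Yun"
# Unnormalized judgment probabilities: class-conditional frequency * prior.
hot_score = hot_tokens.count(word) / len(hot_tokens) * p_hot
non_hot_score = non_hot_tokens.count(word) / len(non_hot_tokens) * p_non_hot

# Decision rule from step S103: hot if hot_score >= non_hot_score.
is_hot = hot_score >= non_hot_score

print(f"hot={hot_score:.3f} non_hot={non_hot_score:.3f} is_hot={is_hot}")
```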
With the above method, determining which words are hot words no longer depends on a person's subjective experience but is done in an objective manner, which effectively reduces the subjectivity of manually determining which words are hot words.
The above is the hot word determination method provided by the embodiments of the present application. Based on the same idea, the embodiments of the present application also provide a hot word determination device.
As shown in Fig. 2, a hot word determination device provided by an embodiment of the present application includes:
an acquisition module 201, configured to obtain the prior probability corresponding to hot words, the prior probability corresponding to non-hot words, and the word to be determined;
a judgment probability determination module 202, configured to determine the hot-word judgment probability of the word to be determined according to the prior probability corresponding to hot words, and to determine the non-hot-word judgment probability of the word to be determined according to the prior probability corresponding to non-hot words;
a hot word determination module 203, configured to determine whether the word to be determined is a hot word according to the hot-word judgment probability and the non-hot-word judgment probability of the word to be determined.
The device further includes:
a prior probability determination module 204, configured to, before the acquisition module 201 obtains the prior probability corresponding to hot words, the prior probability corresponding to non-hot words, and the word to be determined, obtain the titles of the pieces of video information, segment each title into words, determine the number of hot words and the number of non-hot words among the segmented words, and determine the prior probability corresponding to hot words and the prior probability corresponding to non-hot words according to the number of hot words and the number of non-hot words.
The prior probability determination module 204 is specifically configured to segment the title of each piece of video information according to parts of speech.
The prior probability determination module 204 is specifically configured to determine the total number of segmented words, take the ratio of the number of hot words to the total number of segmented words as the prior probability corresponding to hot words, and take the ratio of the number of non-hot words to the total number of segmented words as the prior probability corresponding to non-hot words.
The judgment probability determination module 202 is specifically configured to count the number of times the word to be determined occurs among the hot words in the segmented words, determine the ratio of this count to the number of hot words in the segmented words, and take the product of this ratio and the prior probability corresponding to hot words as the hot-word judgment probability of the word to be determined.
The judgment probability determination module 202 is specifically configured to count the number of times the word to be determined occurs among the non-hot words in the segmented words, determine the ratio of this count to the number of non-hot words in the segmented words, and take the product of this ratio and the prior probability corresponding to non-hot words as the non-hot-word judgment probability of the word to be determined.
The hot word determination module 203 is specifically configured to compare the hot-word judgment probability of the word to be determined with its non-hot-word judgment probability; if the hot-word judgment probability is greater than or equal to the non-hot-word judgment probability, determine that the word to be determined is a hot word; and if the hot-word judgment probability is less than the non-hot-word judgment probability, determine that the word to be determined is a non-hot word.
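One way to picture how the modules fit together is a small class whose methods mirror modules 201 to 204. This is only an illustrative sketch: the class name `HotWordDeterminer` and its method names are hypothetical, the token lists are assumed to have been segmented and labeled already, and the source describes functional modules rather than any particular code structure.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class HotWordDeterminer:
    """Illustrative stand-in for the device of Fig. 2 (assumed design)."""

    hot_tokens: List[str] = field(default_factory=list)
    non_hot_tokens: List[str] = field(default_factory=list)

    def determine_priors(self) -> Tuple[float, float]:
        """Role of module 204: priors from token counts."""
        total = len(self.hot_tokens) + len(self.non_hot_tokens)
        if total == 0:
            raise ValueError("no segmented words available")
        return len(self.hot_tokens) / total, len(self.non_hot_tokens) / total

    @staticmethod
    def _score(word: str, tokens: Sequence[str], prior: float) -> float:
        # Class-conditional frequency times prior; 0.0 for an empty class
        # (assumption, not covered by the source).
        return tokens.count(word) / len(tokens) * prior if tokens else 0.0

    def judgment_probabilities(self, word: str) -> Tuple[float, float]:
        """Roles of modules 201 and 202: fetch priors, then score the word."""
        p_hot, p_non_hot = self.determine_priors()
        return (
            self._score(word, self.hot_tokens, p_hot),
            self._score(word, self.non_hot_tokens, p_non_hot),
        )

    def is_hot_word(self, word: str) -> bool:
        """Role of module 203: hot if hot score >= non-hot score."""
        hot, non_hot = self.judgment_probabilities(word)
        return hot >= non_hot
```

For instance, `HotWordDeterminer(["a", "b"], ["a", "a", "c", "d"]).is_hot_word("a")` evaluates to `False`, because "a" is relatively more frequent among the non-hot tokens. One consequence of the "greater than or equal" rule worth keeping in mind is that a word appearing in neither list ties at zero and is classified as hot; the source does not discuss this case.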
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
Memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the statement "including ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memory, CD-ROM, and optical memory) containing computer-usable program code.
The above are merely embodiments of the present application and are not intended to limit it. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (14)

1. A hot word determination method, characterized by comprising:
obtaining a prior probability corresponding to hot words, a prior probability corresponding to non-hot words, and a word to be determined;
determining, according to the prior probability corresponding to hot words, a hot-word judgment probability corresponding to the word to be determined, and determining, according to the prior probability corresponding to non-hot words, a non-hot-word judgment probability corresponding to the word to be determined;
determining, according to the hot-word judgment probability and the non-hot-word judgment probability corresponding to the word to be determined, whether the word to be determined is a hot word.
2. The method of claim 1, characterized in that, before obtaining the prior probability corresponding to hot words, the prior probability corresponding to non-hot words, and the word to be determined, the method further comprises:
obtaining the title of each piece of video information;
performing word segmentation on the title of each piece of video information;
determining the number of hot words and the number of non-hot words contained in the segmented words;
determining, according to the number of hot words and the number of non-hot words, the prior probability corresponding to hot words and the prior probability corresponding to non-hot words.
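The counting step of claim 2 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the titles have already been segmented, and it assumes a labelled set of known hot words (the hypothetical `known_hot_words` parameter), since the claim does not say how a segmented word is recognised as hot. The function name and the sample titles are likewise made up for the example.

```python
from collections import Counter

def count_hot_and_non_hot(segmented_titles, known_hot_words):
    """Count hot and non-hot segmented words across pre-segmented titles.

    `known_hot_words` is an assumed labelled set; the claim does not
    specify how hot words are identified at this stage.
    """
    hot_counts = Counter()
    non_hot_counts = Counter()
    for tokens in segmented_titles:
        for token in tokens:
            if token in known_hot_words:
                hot_counts[token] += 1
            else:
                non_hot_counts[token] += 1
    return hot_counts, non_hot_counts

# Hypothetical, already-segmented video titles.
titles = [["concert", "live", "2016"], ["concert", "highlights"], ["cooking", "live"]]
hot, non_hot = count_hot_and_non_hot(titles, known_hot_words={"concert", "live"})
num_hot = sum(hot.values())          # number of hot segmented words
num_non_hot = sum(non_hot.values())  # number of non-hot segmented words
```

Keeping per-word counters rather than only the two totals means the same data can later supply the occurrence counts of a word to be determined.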
3. The method of claim 2, characterized in that performing word segmentation on the title of each piece of video information specifically comprises:
performing word segmentation on the title of each piece of video information according to part of speech.
4. The method of claim 2, characterized in that determining, according to the number of hot words and the number of non-hot words, the prior probability corresponding to hot words and the prior probability corresponding to non-hot words specifically comprises:
determining the total number of segmented words after word segmentation;
taking the ratio of the number of hot words to the determined total number of segmented words as the prior probability corresponding to hot words;
taking the ratio of the number of non-hot words to the determined total number of segmented words as the prior probability corresponding to non-hot words.
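The two ratios of claim 4 amount to a short computation. The sketch below assumes that every segmented word is counted as either hot or non-hot, so the total is the sum of the two counts; the claim only says that the total is "determined", and the function name and the sample counts are illustrative.

```python
def prior_probabilities(num_hot, num_non_hot):
    """Return (hot prior, non-hot prior) as count / total.

    Assumes total = num_hot + num_non_hot, i.e. each segmented word
    belongs to exactly one of the two classes.
    """
    total = num_hot + num_non_hot
    if total == 0:
        raise ValueError("no segmented words to compute priors from")
    return num_hot / total, num_non_hot / total

# Example with 4 hot and 3 non-hot segmented words (made-up counts).
p_hot, p_non_hot = prior_probabilities(4, 3)
```

Under this assumption the two priors always sum to 1.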
5. The method of claim 1, characterized in that determining, according to the prior probability corresponding to hot words, the hot-word judgment probability corresponding to the word to be determined specifically comprises:
counting the number of times the word to be determined occurs among the hot words in the segmented words after word segmentation;
determining the ratio of that number of times to the number of hot words in the segmented words after word segmentation;
taking the product of the ratio and the prior probability corresponding to hot words as the hot-word judgment probability corresponding to the word to be determined.
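As a sketch of claim 5, the judgment probability is an occurrence ratio multiplied by the class prior. The helper below is written generically so the same function could serve the non-hot case; its name, its parameters, and the choice to return 0.0 for an empty class are assumptions made for the example and are not stated in the application.

```python
def judgment_probability(occurrences_in_class, class_size, class_prior):
    """(occurrences of the word in the class / class size) * class prior."""
    if class_size == 0:
        # Assumed convention: an empty class contributes a zero score.
        return 0.0
    return (occurrences_in_class / class_size) * class_prior

# A word seen 2 times among 4 hot segmented words, with hot prior 4/7.
hot_score = judgment_probability(occurrences_in_class=2, class_size=4, class_prior=4 / 7)
```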
6. The method of claim 1, characterized in that determining, according to the prior probability corresponding to non-hot words, the non-hot-word judgment probability corresponding to the word to be determined specifically comprises:
counting the number of times the word to be determined occurs among the non-hot words in the segmented words after word segmentation;
determining the ratio of that number of times to the number of non-hot words in the segmented words after word segmentation;
taking the product of the ratio and the prior probability corresponding to non-hot words as the non-hot-word judgment probability corresponding to the word to be determined.
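Read together with claims 4 and 5, the two judgment probabilities can be written compactly. The symbols are notation introduced here for illustration and do not appear in the application: $c_h(w)$ and $c_n(w)$ are the numbers of occurrences of the word to be determined $w$ among the hot and non-hot segmented words, $N_h$ and $N_n$ are the numbers of hot and non-hot words, and $N$ is the total number of segmented words.

```latex
\begin{align}
P(h) &= \frac{N_h}{N}, & P(n) &= \frac{N_n}{N},\\
S_h(w) &= \frac{c_h(w)}{N_h}\,P(h), & S_n(w) &= \frac{c_n(w)}{N_n}\,P(n).
\end{align}
```

Each score has the shape of a class-conditional frequency multiplied by a class prior, as in a naive Bayes classifier. If the class size used in the ratio is the same count used for the prior, the products reduce to $c_h(w)/N$ and $c_n(w)/N$; this simplification is an observation under that reading, not a statement from the application.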
7. The method of claim 1, characterized in that determining, according to the hot-word judgment probability and the non-hot-word judgment probability corresponding to the word to be determined, whether the word to be determined is a hot word specifically comprises:
comparing the hot-word judgment probability corresponding to the word to be determined with the non-hot-word judgment probability corresponding to the word to be determined;
if the hot-word judgment probability is greater than or equal to the non-hot-word judgment probability, determining that the word to be determined is a hot word;
if the hot-word judgment probability is less than the non-hot-word judgment probability, determining that the word to be determined is a non-hot word.
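The comparison in claim 7 is a single inequality in which ties go to "hot". A minimal sketch, with a hypothetical function name and made-up scores:

```python
def is_hot_word(hot_score, non_hot_score):
    """Hot if the hot-word score is greater than or equal to the non-hot-word score."""
    return hot_score >= non_hot_score

clearly_hot = is_hot_word(2 / 7, 0.0)      # hot score larger
tie_case = is_hot_word(0.0, 0.0)           # equal scores count as hot
clearly_non_hot = is_hot_word(0.0, 1 / 7)  # hot score smaller
```

One consequence of the "greater than or equal" rule is worth noting: if both scores are zero, for example because the word was never seen in either class, this reading labels the word hot. The claims do not say how such words should be treated.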
8. A hot word determination device, characterized by comprising:
an acquisition module, configured to obtain a prior probability corresponding to hot words, a prior probability corresponding to non-hot words, and each word to be determined;
a judgment probability determination module, configured to determine, according to the prior probability corresponding to hot words, a hot-word judgment probability corresponding to the word to be determined, and to determine, according to the prior probability corresponding to non-hot words, a non-hot-word judgment probability corresponding to the word to be determined;
a hot word determination module, configured to determine, according to the hot-word judgment probability and the non-hot-word judgment probability corresponding to the word to be determined, whether the word to be determined is a hot word.
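One way to picture the three modules of claim 8 working together is the sketch below. The class and method names, the `run_device` helper, and the sample counts are all hypothetical; the application describes the modules functionally and does not prescribe this structure. The per-class occurrence counts are assumed to be available to the judgment probability module.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class AcquisitionModule:
    """Holds the priors and the words to be determined (their source is not modelled)."""
    hot_prior: float
    non_hot_prior: float
    words: List[str]

    def acquire(self) -> Tuple[float, float, List[str]]:
        return self.hot_prior, self.non_hot_prior, self.words

@dataclass
class JudgmentProbabilityModule:
    """Computes (occurrence ratio * prior) for the hot and non-hot classes."""
    hot_counts: Dict[str, int]
    non_hot_counts: Dict[str, int]

    def scores(self, word: str, hot_prior: float, non_hot_prior: float) -> Tuple[float, float]:
        num_hot = sum(self.hot_counts.values())
        num_non_hot = sum(self.non_hot_counts.values())
        # Assumed convention: an empty class yields a ratio of 0.0.
        hot_ratio = self.hot_counts.get(word, 0) / num_hot if num_hot else 0.0
        non_hot_ratio = self.non_hot_counts.get(word, 0) / num_non_hot if num_non_hot else 0.0
        return hot_ratio * hot_prior, non_hot_ratio * non_hot_prior

class HotWordDeterminationModule:
    """Labels a word hot when its hot score is at least its non-hot score."""
    @staticmethod
    def determine(hot_score: float, non_hot_score: float) -> bool:
        return hot_score >= non_hot_score

def run_device(acquisition: AcquisitionModule, judgment: JudgmentProbabilityModule) -> Dict[str, bool]:
    hot_prior, non_hot_prior, words = acquisition.acquire()
    decider = HotWordDeterminationModule()
    return {w: decider.determine(*judgment.scores(w, hot_prior, non_hot_prior)) for w in words}

# Made-up data: 4 hot and 3 non-hot segmented words.
acquisition = AcquisitionModule(hot_prior=4 / 7, non_hot_prior=3 / 7, words=["concert", "cooking"])
judgment = JudgmentProbabilityModule(
    hot_counts={"concert": 2, "live": 2},
    non_hot_counts={"2016": 1, "highlights": 1, "cooking": 1},
)
result = run_device(acquisition, judgment)
```

With these sample counts, "concert" is labelled hot and "cooking" non-hot.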
9. The device of claim 8, characterized in that the device further comprises:
a prior probability determination module, configured to, before the acquisition module obtains the prior probability corresponding to hot words, the prior probability corresponding to non-hot words, and the word to be determined: obtain the title of each piece of video information, perform word segmentation on the title of each piece of video information, determine the number of hot words and the number of non-hot words contained in the segmented words, and determine, according to the number of hot words and the number of non-hot words, the prior probability corresponding to hot words and the prior probability corresponding to non-hot words.
10. The device of claim 9, characterized in that the prior probability determination module is specifically configured to perform word segmentation on the title of each piece of video information according to part of speech.
11. The device of claim 9, characterized in that the prior probability determination module is specifically configured to determine the total number of segmented words after word segmentation, take the ratio of the number of hot words to the determined total number of segmented words as the prior probability corresponding to hot words, and take the ratio of the number of non-hot words to the determined total number of segmented words as the prior probability corresponding to non-hot words.
12. The device of claim 8, characterized in that the judgment probability determination module is specifically configured to count the number of times the word to be determined occurs among the hot words in the segmented words after word segmentation, determine the ratio of that number of times to the number of hot words in the segmented words after word segmentation, and take the product of the ratio and the prior probability corresponding to hot words as the hot-word judgment probability corresponding to the word to be determined.
13. The device of claim 8, characterized in that the judgment probability determination module is specifically configured to count the number of times the word to be determined occurs among the non-hot words in the segmented words after word segmentation, determine the ratio of that number of times to the number of non-hot words in the segmented words after word segmentation, and take the product of the ratio and the prior probability corresponding to non-hot words as the non-hot-word judgment probability corresponding to the word to be determined.
14. The device of claim 8, characterized in that the hot word determination module is specifically configured to compare the hot-word judgment probability corresponding to the word to be determined with the non-hot-word judgment probability corresponding to the word to be determined; if the hot-word judgment probability is greater than or equal to the non-hot-word judgment probability, determine that the word to be determined is a hot word; and if the hot-word judgment probability is less than the non-hot-word judgment probability, determine that the word to be determined is a non-hot word.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610565135.1A CN106202049A (en) 2016-07-18 2016-07-18 A kind of hot word determines method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610565135.1A CN106202049A (en) 2016-07-18 2016-07-18 A kind of hot word determines method and device

Publications (1)

Publication Number Publication Date
CN106202049A true CN106202049A (en) 2016-12-07

Family

ID=57493872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610565135.1A Pending CN106202049A (en) 2016-07-18 2016-07-18 A kind of hot word determines method and device

Country Status (1)

Country Link
CN (1) CN106202049A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218368A (en) * 2012-01-20 2013-07-24 深圳市腾讯计算机系统有限公司 Method and device for discovering hot words
US20150081431A1 (en) * 2013-09-18 2015-03-19 Yahoo Japan Corporation Posterior probability calculating apparatus, posterior probability calculating method, and non-transitory computer-readable recording medium
CN104462347A (en) * 2014-12-04 2015-03-25 北京国双科技有限公司 Keyword classifying method and device
CN104615640A (en) * 2014-11-28 2015-05-13 百度在线网络技术(北京)有限公司 Method and device for providing searching keywords and carrying out searching

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Jinbo et al.: "Research on an improved naive Bayes keyword extraction algorithm", Computer Applications and Software *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840445A (en) * 2017-11-24 2019-06-04 优酷网络技术(北京)有限公司 A kind of recognition methods and system of video of practising fraud
CN109840445B (en) * 2017-11-24 2021-10-01 阿里巴巴(中国)有限公司 Method and system for identifying cheating videos
CN111930949A (en) * 2020-09-11 2020-11-13 腾讯科技(深圳)有限公司 Search string processing method and device, computer readable medium and electronic equipment
CN114938477A (en) * 2022-06-23 2022-08-23 阿里巴巴(中国)有限公司 Video topic determination method, device and equipment
CN114938477B (en) * 2022-06-23 2024-05-03 阿里巴巴(中国)有限公司 Video topic determination method, device and equipment

Similar Documents

Publication Publication Date Title
CN108920654B (en) Question and answer text semantic matching method and device
CN106649401A (en) Data writing method and device of distributed file system
US11507743B2 (en) System and method for automatic key phrase extraction rule generation
CN107451854B (en) Method and device for determining user type and electronic equipment
CN109325055A (en) The screening of business association tables of data and checking method, device, electronic equipment
CN112559895B (en) Data processing method and device, electronic equipment and storage medium
CN110162778B (en) Text abstract generation method and device
CN106202049A (en) A kind of hot word determines method and device
CN112200132A (en) Data processing method, device and equipment based on privacy protection
US10956976B2 (en) Recommending shared products
CN107391535A (en) The method and device of document is searched in document application
CN107391564B (en) Data conversion method and device and electronic equipment
CN109582834B (en) Data risk prediction method and device
CN109063967B (en) Processing method and device for wind control scene feature tensor and electronic equipment
CN108595395B (en) Nickname generation method, device and equipment
CN111144098B (en) Recall method and device for extended question
CN116910345A (en) Label recommending method, device, equipment and storage medium
CN114519529A (en) Enterprise credit rating method, device and medium based on convolution self-encoder
CN108108345A (en) For determining the method and apparatus of theme of news
CN112580915A (en) Project milestone determination method and device, storage medium and electronic equipment
CN111737554A (en) Scoring model training method, electronic book scoring method and device
CN111737461A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN104572951A (en) Ability label determining method
CN110969019A (en) Method and device for disambiguating name
CN112445973B (en) Method, device, storage medium and computer equipment for searching items

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100080 A 5 C, block A, China International Steel Plaza, 8 Haidian Avenue, Haidian District, Beijing.

Applicant after: Youku network technology (Beijing) Co., Ltd.

Address before: 100080 A 5 C, block A, China International Steel Plaza, 8 Haidian Avenue, Haidian District, Beijing.

Applicant before: 1Verge Inc.

RJ01 Rejection of invention patent application after publication

Application publication date: 20161207
