CN106202049A - Hot word determination method and device - Google Patents
Hot word determination method and device
- Publication number
- CN106202049A (application number CN201610565135.1A)
- Authority
- CN
- China
- Prior art keywords
- word
- hot word
- determined
- probability
- hot
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/7867—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Library & Information Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Abstract
This application discloses a hot word determination method and device. The method includes the following steps. First, it obtains a prior probability corresponding to hot words, a prior probability corresponding to non-hot words, and a word to be determined. Second, it determines a hot word judgment probability for the word to be determined according to the prior probability corresponding to hot words, and a non-hot word judgment probability for the word according to the prior probability corresponding to non-hot words. Third, it determines whether the word to be determined is a hot word according to these two judgment probabilities. With this method, deciding which words are hot words no longer depends on subjective human experience. The judgment is made objectively, which effectively reduces the subjectivity of deciding manually which words are hot words.
Description
Technical field
The present application relates to the field of computer technology, and in particular to a hot word determination method and device.
Background
With the development of network technology, obtaining information over the network has become an indispensable part of daily life. For example, people can obtain video information over the network.
Currently, users who want video information enter keywords into a website's search box to find the videos they need. The server typically matches these keywords against the titles of the video information. It then sorts the search results by a preset ordering rule (e.g., by click count, from high to low) and returns the sorted results to the user.
In addition, some third-party websites not only provide video information to users but also let users upload their own videos, which the website then provides to other users. Uploaders want their videos to be seen by other users. To achieve this, they commonly exploit the server's keyword-matching principle (i.e., matching keywords against video titles): they add to the title many keywords that are unrelated to the video content but are frequently used by users. In this application, keywords frequently used by users are defined as hot words. Video information whose title contains multiple added hot words unrelated to the video content is regarded as cheating video information.

As a result, when a user enters these hot words, the server matches the video information to which they were added and provides it to the user. Because users frequently use hot words as search keywords, video information with many added hot words is frequently shown to users, and its click count may grow over time. When the server later sorts search results by the preset rule (e.g., by click count, from high to low), this video information is placed near the top. The video information the user actually needs is pushed further down.
Some video titles contain many keywords that are unrelated to the video content but frequently used by users. To identify such video information effectively, the usual approach is to determine first which words are hot words and then to screen each piece of video information against the determined hot words.
In the prior art, hot words are identified manually, based on subjective experience.
Clearly, determining hot words in the prior art relies excessively on human experience and is highly subjective.
Summary of the invention
Embodiments of the present application provide a hot word determination method and device. They address the problem that, in the prior art, deciding which words are hot words relies excessively on human experience and is highly subjective.
A hot word determination method provided by an embodiment of the present application includes:
obtaining a prior probability corresponding to hot words, a prior probability corresponding to non-hot words, and a word to be determined;
determining a hot word judgment probability for the word to be determined according to the prior probability corresponding to hot words, and determining a non-hot word judgment probability for the word to be determined according to the prior probability corresponding to non-hot words;
determining whether the word to be determined is a hot word according to its hot word judgment probability and its non-hot word judgment probability.
A hot word determination device provided by an embodiment of the present application includes:
an acquisition module, configured to obtain a prior probability corresponding to hot words, a prior probability corresponding to non-hot words, and a word to be determined;
a judgment probability determination module, configured to determine a hot word judgment probability for the word to be determined according to the prior probability corresponding to hot words, and to determine a non-hot word judgment probability for the word to be determined according to the prior probability corresponding to non-hot words;
a hot word determination module, configured to determine whether the word to be determined is a hot word according to its hot word judgment probability and its non-hot word judgment probability.
Embodiments of the present application provide a hot word determination method and device. The method first obtains a prior probability corresponding to hot words, a prior probability corresponding to non-hot words, and a word to be determined. It then determines a hot word judgment probability for the word from the hot word prior, and a non-hot word judgment probability from the non-hot word prior. Finally, it decides from these two judgment probabilities whether the word is a hot word. With this method, deciding which words are hot words no longer relies on subjective human experience but on an objective procedure. This effectively reduces the subjectivity of determining hot words manually.
Brief description of the drawings
The drawings described herein provide a further understanding of the present application and constitute a part of it. The illustrative embodiments and their descriptions explain the application and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a schematic diagram of the process of the hot word determination method provided by an embodiment of the present application;
Fig. 2 is a schematic structural diagram of the hot word determination device provided by an embodiment of the present application.
Detailed description of the invention
To make the objectives, technical solutions, and advantages of the present application clearer, its technical solutions are described clearly and completely below, with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments that those of ordinary skill in the art obtain from these embodiments without creative effort fall within the protection scope of the application.
Fig. 1 shows the hot word determination process provided by an embodiment of the present application, which specifically includes the following steps:
S101: Obtain a prior probability corresponding to hot words, a prior probability corresponding to non-hot words, and a word to be determined.
In practical applications, the aim is to find video information whose title contains many added keywords that are unrelated to the video content but frequently used by users. To do this effectively, it is usually necessary first to determine which words are hot words and then to screen each piece of video information against the determined hot words.
To determine which words are hot words, the words to be determined must first be obtained. In this application, search words can be extracted from the users' historical search records and taken as the words to be determined. This step can be performed by the server or by any other device capable of data processing.
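The source does not specify how search records are stored. The following minimal sketch assumes each user's history is simply a list of search-word strings, which is a hypothetical layout. It collects the distinct search words as the words to be determined:

```python
from typing import Iterable, List


def collect_candidate_words(search_logs: Iterable[Iterable[str]]) -> List[str]:
    """Return the distinct search words found in per-user search histories.

    Assumption (not from the source): each user's history is a sequence of
    search-word strings. Surrounding whitespace is stripped, empty entries are
    skipped, and first-appearance order is kept so the output is deterministic.
    """
    seen = {}  # dict keys preserve insertion order (Python 3.7+)
    for user_history in search_logs:
        for query in user_history:
            word = query.strip()
            if word:
                seen.setdefault(word, None)
    return list(seen)


logs = [["Ma Yun", "happy base camp"], ["Ma Yun", " speech "]]
print(collect_candidate_words(logs))  # ['Ma Yun', 'happy base camp', 'speech']
```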
This application determines hot words by means of Bayes' formula. Therefore, before obtaining the word to be determined, the server must first determine two prior probabilities. One is the prior probability corresponding to hot words, that is, the probability that a hot word occurs in historical data. The other is the prior probability corresponding to non-hot words, that is, the probability that a non-hot word occurs in historical data. Specifically, the server obtains the titles of the video information and segments each title into words. It counts the number of hot words and the number of non-hot words among the segmented words, and determines the two prior probabilities from these counts. The server then obtains the word to be determined together with the prior probabilities corresponding to hot words and non-hot words.
Counting the hot words among the segmented words requires knowing which segmented words are hot words. Therefore, in this application, the titles can be reviewed in advance, before they are obtained, to confirm whether each piece of video information is cheating video information. If it is, its title is labeled as cheating video information. If not, its title is labeled as non-cheating video information (that is, video information that is not cheating). After obtaining the labeled titles, the server segments them directly. When counting hot words, every word obtained by segmenting the title of a piece of cheating video information is treated as a hot word. When counting non-hot words, every word obtained by segmenting the title of a piece of non-cheating video information is treated as a non-hot word. This is only one embodiment provided by the application. Other embodiments may decide in other ways which segmented words are hot words and which are non-hot words, for example by manual determination.
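The labeling rule above can be sketched as follows. The sketch assumes each title has already been segmented and carries its manual cheating label. The function name and input layout are illustrative only:

```python
def split_by_label(labeled_titles):
    """Split segmented title words into hot-word and non-hot-word lists.

    labeled_titles: iterable of (tokens, is_cheating) pairs, where tokens is a
    segmented title and is_cheating is its manual label. Every word of a
    cheating title counts as a hot word, and every word of a non-cheating
    title counts as a non-hot word. Repeats are kept because later steps
    work with counts.
    """
    hot_words, non_hot_words = [], []
    for tokens, is_cheating in labeled_titles:
        target = hot_words if is_cheating else non_hot_words
        target.extend(tokens)
    return hot_words, non_hot_words


hot, non_hot = split_by_label([
    (["Xie Na", "HNTV"], True),
    (["Ma Yun", "speech"], False),
])
print(hot, non_hot)  # ['Xie Na', 'HNTV'] ['Ma Yun', 'speech']
```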
In addition, the application provides one way of segmenting the titles: each title can be segmented according to the parts of speech of its words, which yields the segmented words.
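The source segments titles by part of speech but does not name a tool. As a stand-in only, the sketch below uses a different and simpler technique: dictionary-based forward maximum matching with a tiny hypothetical lexicon. A real system would more likely use a trained Chinese segmenter with part-of-speech tagging:

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching segmentation (a stand-in technique).

    At each position, take the longest lexicon entry of at most max_len
    characters that matches. If nothing matches, emit a single character.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical lexicon; "马云演讲" is a title of the "Ma Yun speech" kind.
print(forward_max_match("马云演讲", {"马云", "演讲"}))  # ['马云', '演讲']
```

Greedy longest-match is easy to follow, but it cannot resolve ambiguous segmentations. This is one reason statistical segmenters are usually preferred.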
Words are discrete: they can only be counted, not measured on a continuous scale. The prior probabilities corresponding to hot words and non-hot words can therefore be determined statistically. Specifically, the total number of segmented words is determined. The ratio of the number of hot words to this total is taken as the prior probability corresponding to hot words. The ratio of the number of non-hot words to this total is taken as the prior probability corresponding to non-hot words.
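The prior computation reduces to two ratios. The minimal sketch below assumes the hot and non-hot word lists from the labeling step are given. Because the application uses only two classes, the total number of segmented words is the sum of the two list lengths:

```python
from fractions import Fraction


def prior_probabilities(hot_words, non_hot_words):
    """Return (P(hot word), P(non-hot word)) as exact fractions.

    The total number of segmented words is the size of both lists combined,
    because every segmented word is either a hot word or a non-hot word.
    """
    total = len(hot_words) + len(non_hot_words)
    if total == 0:
        raise ValueError("no segmented words to estimate priors from")
    return Fraction(len(hot_words), total), Fraction(len(non_hot_words), total)


print(prior_probabilities(["a", "b", "c"], ["d"]))  # (Fraction(3, 4), Fraction(1, 4))
```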
Note that this application divides words into only two classes: hot words and non-hot words (that is, words that are not hot words).
For example, assume that the labeled video titles obtained by the server are as shown in Table 1:
Title of the video information | Cheating video? |
Most charismatic pioneering mode Ma Yun Chen Anzhi Du Yunsheng Li Jiacheng | Yes |
Trendy VIB rubber boat letter light assault boat line fishing boat | No |
Make progress every day happy base camp He Gui Xie Na HNTV | Yes |
Ma Yun speech | No |
Happy base camp Ma Yun speech | No |
Table 1
The server segments the titles in Table 1 according to part of speech. It then applies the method of step S101 for deciding which words are hot words. The hot words are "most charismatic, pioneering mode, Ma Yun, Chen Anzhi, Du Yunsheng, Li Jiacheng, make progress every day, happy base camp, He Gui, Xie Na, HNTV". The non-hot words are "trendy, VIB, rubber boat, letter light, assault boat, line fishing boat, Ma Yun, speech, happy base camp, Ma Yun, speech".
The total number of segmented words is 22, of which 11 are hot words. The ratio of the number of hot words to the total, 11/22 = 0.5, is taken as the prior probability corresponding to hot words.
The number of non-hot words is also 11. The ratio of the number of non-hot words to the total, 11/22 = 0.5, is taken as the prior probability corresponding to non-hot words.
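The counts in this example can be checked with a few lines of Python. The word lists below are English renderings of the segmented Table 1 titles. Exact fractions avoid rounding:

```python
from fractions import Fraction

# Words from the cheating titles (rows 1 and 3 of Table 1) are hot words.
hot_tokens = [
    "most charismatic", "pioneering mode", "Ma Yun", "Chen Anzhi",
    "Du Yunsheng", "Li Jiacheng",
    "make progress every day", "happy base camp", "He Gui", "Xie Na", "HNTV",
]
# Words from the non-cheating titles (rows 2, 4 and 5) are non-hot words.
non_hot_tokens = [
    "trendy", "VIB", "rubber boat", "letter light", "assault boat",
    "line fishing boat",
    "Ma Yun", "speech",
    "happy base camp", "Ma Yun", "speech",
]

total = len(hot_tokens) + len(non_hot_tokens)
p_hot = Fraction(len(hot_tokens), total)
p_non_hot = Fraction(len(non_hot_tokens), total)
print(total, p_hot, p_non_hot)  # 22 1/2 1/2
```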
The server then obtains three values: the prior probability corresponding to hot words (0.5), the prior probability corresponding to non-hot words (0.5), and the word to be determined, "Ma Yun". To keep the illustration simple, this example uses only one word to be determined. In practical applications, many words need to be determined.
S102: Determine a hot word judgment probability for the word to be determined according to the prior probability corresponding to hot words, and a non-hot word judgment probability for the word to be determined according to the prior probability corresponding to non-hot words.
S103: Determine whether the word to be determined is a hot word according to its hot word judgment probability and its non-hot word judgment probability.
Any word to be determined may be either a hot word or a non-hot word. This application therefore uses the hot word judgment probability to represent how likely the word is to be a hot word: the larger it is, the more likely the word is a hot word, and the smaller it is, the less likely. Likewise, the non-hot word judgment probability represents how likely the word is to be a non-hot word: the larger it is, the more likely the word is a non-hot word, and the smaller it is, the less likely.
When the hot word and non-hot word judgment probabilities of a word are determined, the word itself is given. The probability that the word is a hot word (its hot word judgment probability) is therefore a conditional probability given the word. For this reason, this application determines both judgment probabilities with Bayes' formula.

Specifically, the server counts how many times the word to be determined appears among the hot words in the segmented words. It computes the ratio of this count to the number of hot words. The product of this ratio and the prior probability corresponding to hot words is taken as the word's hot word judgment probability. Likewise, the server counts how many times the word appears among the non-hot words and computes the ratio of this count to the number of non-hot words. The product of this ratio and the prior probability corresponding to non-hot words is taken as the word's non-hot word judgment probability. In Bayesian terms, each count ratio estimates the likelihood P(word | class). Its product with the prior P(class) is the numerator of the posterior P(class | word).
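The sketch below computes the two judgment probabilities as described, assuming the hot and non-hot word lists from step S101 are available. Like the text, it returns only the Bayes numerators. The function name is illustrative:

```python
from fractions import Fraction


def judgment_probabilities(word, hot_words, non_hot_words):
    """Return (hot word judgment probability, non-hot word judgment probability).

    Each value is count(word in class) / size(class) * prior(class), which is
    the numerator of Bayes' formula for P(class | word).
    """
    total = len(hot_words) + len(non_hot_words)
    if total == 0:
        raise ValueError("no segmented words")
    scores = []
    for words in (hot_words, non_hot_words):
        if not words:  # an empty class has prior 0, so its score is 0
            scores.append(Fraction(0))
            continue
        prior = Fraction(len(words), total)
        likelihood = Fraction(words.count(word), len(words))
        scores.append(likelihood * prior)
    return scores[0], scores[1]


print(judgment_probabilities("x", ["x", "y"], ["x", "x", "z", "w"]))
# (Fraction(1, 6), Fraction(1, 3))
```

The priors here are estimated from the same counts, so count/size(class) multiplied by size(class)/total simplifies to count/total. Each score is therefore the word's share of all segmented words within that class. The longer form is kept to mirror the text.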
The hot word judgment probability represents how likely the word is to be a hot word, and the non-hot word judgment probability represents how likely it is to be a non-hot word. After both are determined, this application compares them. If the word's hot word judgment probability is greater than or equal to its non-hot word judgment probability, the word is determined to be a hot word. If it is less, the word is determined to be a non-hot word.
Note also that, once determined, the two judgment probabilities only need to be compared. In Bayes' formula, both products (each ratio multiplied by its prior) are divided by the same value: the probability of the word itself. This application therefore omits the division. The product of the ratio and the prior corresponding to hot words is used directly as the word's hot word judgment probability. Likewise, the product of the ratio and the prior corresponding to non-hot words is used directly as its non-hot word judgment probability.
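A small sketch of the S103 comparison shows why the shared denominator can be dropped. Under the two-class model, the sum of the two scores equals the Bayes denominator P(word). Dividing both scores by the same positive number never changes which one is larger. The function and its normalize flag are illustrative:

```python
from fractions import Fraction


def classify(hot_score, non_hot_score, normalize=False):
    """Apply the S103 rule: a hot word if hot score >= non-hot score.

    With normalize=True, both scores are first divided by their sum. Under
    the two-class model this sum is P(word), the Bayes denominator, so the
    scores become posteriors that add up to 1. The decision stays the same.
    """
    if normalize:
        evidence = hot_score + non_hot_score
        if evidence > 0:
            hot_score, non_hot_score = hot_score / evidence, non_hot_score / evidence
    return "hot word" if hot_score >= non_hot_score else "non-hot word"


print(classify(Fraction(1, 22), Fraction(1, 11)))                  # non-hot word
print(classify(Fraction(1, 22), Fraction(1, 11), normalize=True))  # non-hot word
```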
Continuing the previous example, the server has obtained the hot word prior (0.5), the non-hot word prior (0.5), and the word to be determined, "Ma Yun". It counts how many times the word appears among the hot words in the segmented words: once. The ratio of this count (1) to the number of hot words (11) is 0.09. The product of this ratio (0.09) and the prior corresponding to hot words (0.5), namely 0.045, is taken as the word's hot word judgment probability.
The server then counts how many times the word appears among the non-hot words: twice. The ratio of this count (2) to the number of non-hot words (11) is 0.18. The product of this ratio (0.18) and the prior corresponding to non-hot words (0.5), namely 0.09, is taken as the word's non-hot word judgment probability.
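The same arithmetic can be done with exact fractions, using the counts from the example: "Ma Yun" occurs once among the 11 hot words and twice among the 11 non-hot words. The text rounds the intermediate ratios to two decimals. That is why it reports 0.09 rather than about 0.091 for the non-hot word judgment probability:

```python
from fractions import Fraction

# Counts for "Ma Yun" from the Table 1 example.
n_hot, n_non_hot = 11, 11
p_hot = p_non_hot = Fraction(1, 2)

hot_score = Fraction(1, n_hot) * p_hot              # (1/11) * (1/2) = 1/22
non_hot_score = Fraction(2, n_non_hot) * p_non_hot  # (2/11) * (1/2) = 1/11

print(hot_score, non_hot_score)  # 1/22 1/11
print(round(float(hot_score), 3), round(float(non_hot_score), 3))  # 0.045 0.091
```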
The server now has both judgment probabilities for "Ma Yun" (the word to be determined). Its hot word judgment probability (0.045) is less than its non-hot word judgment probability (0.09). The server therefore determines that "Ma Yun" is a non-hot word.
With this method, deciding which words are hot words no longer depends on subjective human experience but is done objectively. This effectively reduces the subjectivity of determining hot words manually.
The above is the hot word determination method provided by the embodiments of the present application. Based on the same idea, the embodiments also provide a hot word determination device.
As shown in Fig. 2, a hot word determination device provided by an embodiment of the present application includes:
Acquisition module 201, configured to obtain a prior probability corresponding to hot words, a prior probability corresponding to non-hot words, and a word to be determined;
Judgment probability determination module 202, configured to determine a hot word judgment probability for the word to be determined according to the prior probability corresponding to hot words, and to determine a non-hot word judgment probability for the word to be determined according to the prior probability corresponding to non-hot words;
Hot word determination module 203, configured to determine whether the word to be determined is a hot word according to its hot word judgment probability and its non-hot word judgment probability.
The device further includes:
Prior probability determination module 204, configured to act before the acquisition module 201 obtains the prior probability corresponding to hot words, the prior probability corresponding to non-hot words, and the word to be determined. It obtains the titles of the video information and segments each title into words. It determines the number of hot words and the number of non-hot words among the segmented words, and from these numbers determines the prior probabilities corresponding to hot words and non-hot words.
The prior probability determination module 204 is specifically configured to segment the title of each piece of video information according to part of speech.
The prior probability determination module 204 is specifically configured to determine the total number of segmented words. It takes the ratio of the number of hot words to this total as the prior probability corresponding to hot words, and the ratio of the number of non-hot words to this total as the prior probability corresponding to non-hot words.
The judgment probability determination module 202 is specifically configured to count the number of times the word to be determined appears among the hot words in the segmented words. It determines the ratio of this count to the number of hot words in the segmented words, and takes the product of this ratio and the prior probability corresponding to hot words as the word's hot word judgment probability.
The judgment probability determination module 202 is specifically configured to count the number of times the word to be determined appears among the non-hot words in the segmented words. It determines the ratio of this count to the number of non-hot words in the segmented words, and takes the product of this ratio and the prior probability corresponding to non-hot words as the word's non-hot word judgment probability.
The hot word determination module 203 is specifically configured to compare the hot word judgment probability of the word to be determined with its non-hot word judgment probability. If the hot word judgment probability is greater than or equal to the non-hot word judgment probability, the word is determined to be a hot word. If it is less, the word is determined to be a non-hot word.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
Memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can implement information storage by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to:
- phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), and other types of random access memory (RAM);
- read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or other memory technologies;
- compact disc read-only memory (CD-ROM), digital versatile discs (DVD), or other optical storage;
- magnetic cassette tape, magnetic tape or disk storage, or other magnetic storage devices;
- any other non-transmission medium that can store information accessible by a computing device.
As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
Also, it should be noted term " includes ", " comprising " or its any other variant are intended to nonexcludability
Comprise, so that include that the process of a series of key element, method, commodity or equipment not only include those key elements, but also wrap
Include other key elements being not expressly set out, or also include want intrinsic for this process, method, commodity or equipment
Element.In the case of there is no more restriction, statement " including ... " key element limited, it is not excluded that including described wanting
Process, method, commodity or the equipment of element there is also other identical element.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The foregoing is merely embodiments of the present application and is not intended to limit the present application. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.
Claims (14)
1. A hot word determination method, characterized by comprising:
obtaining a prior probability corresponding to hot words, a prior probability corresponding to non-hot words, and a word to be determined;
determining, according to the prior probability corresponding to hot words, a hot-word judgment probability corresponding to the word to be determined, and determining, according to the prior probability corresponding to non-hot words, a non-hot-word judgment probability corresponding to the word to be determined;
determining, according to the hot-word judgment probability and the non-hot-word judgment probability corresponding to the word to be determined, whether the word to be determined is a hot word.
2. The method according to claim 1, characterized in that, before obtaining the prior probability corresponding to hot words, the prior probability corresponding to non-hot words, and the word to be determined, the method further comprises:
obtaining the title of each piece of video information;
segmenting the title of each piece of video information into words;
determining the number of hot words and the number of non-hot words among the segmented words;
determining, according to the number of hot words and the number of non-hot words, the prior probability corresponding to hot words and the prior probability corresponding to non-hot words.
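A minimal Python sketch of the preprocessing in claim 2. It rests on assumptions that do not come from the claims: titles are plain strings, a whitespace split stands in for a real Chinese word segmenter, and a hypothetical set `hot_words` of already known hot words decides whether a segmented token counts as a hot word (the claims do not say where that labelling comes from). The names `segment`, `label_tokens`, and `count_hot_and_non_hot` are illustrative only.

```python
from typing import Iterable, List, Set, Tuple


def segment(title: str) -> List[str]:
    # Hypothetical stand-in for word segmentation: split on whitespace.
    # A Chinese title would need a real segmenter (claim 3 mentions part of speech).
    return title.split()


def label_tokens(titles: Iterable[str], hot_words: Set[str]) -> List[Tuple[str, bool]]:
    """Segment every title and tag each token True (hot) or False (non-hot)."""
    labeled: List[Tuple[str, bool]] = []
    for title in titles:
        for token in segment(title):
            # Assumption: membership in a known hot-word set decides the label.
            labeled.append((token, token in hot_words))
    return labeled


def count_hot_and_non_hot(labeled: List[Tuple[str, bool]]) -> Tuple[int, int]:
    """Return (number of hot tokens, number of non-hot tokens)."""
    hot = sum(1 for _, is_hot in labeled if is_hot)
    return hot, len(labeled) - hot


if __name__ == "__main__":
    tokens = label_tokens(["funny cat video", "cat compilation"], {"cat"})
    print(count_hot_and_non_hot(tokens))  # (2, 3)
```

Keeping the labelled token list (rather than only the two counts) lets the later steps count how often a specific word occurs in each class.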
3. The method according to claim 2, characterized in that segmenting the title of each piece of video information specifically comprises:
segmenting the title of each piece of video information according to part of speech.
4. The method according to claim 2, characterized in that determining, according to the number of hot words and the number of non-hot words, the prior probability corresponding to hot words and the prior probability corresponding to non-hot words specifically comprises:
determining the total number of segmented words after segmentation;
taking the ratio of the number of hot words to the determined total number of segmented words as the prior probability corresponding to hot words;
taking the ratio of the number of non-hot words to the determined total number of segmented words as the prior probability corresponding to non-hot words.
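Claim 4 divides each class count by the total number of segmented words. A sketch in Python, assuming the input is a hypothetical list of `(token, is_hot)` pairs; raising an error on an empty list is an added choice the claim does not address.

```python
from typing import List, Tuple


def prior_probabilities(labeled: List[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (prior for hot words, prior for non-hot words)."""
    total = len(labeled)  # total number of segmented words
    if total == 0:
        raise ValueError("no segmented words to estimate priors from")
    hot = sum(1 for _, is_hot in labeled if is_hot)
    # Each prior is a class count divided by the total, so the two sum to 1.
    return hot / total, (total - hot) / total
```

For example, one hot token among four segmented tokens gives priors of 0.25 and 0.75.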
5. The method according to claim 1, characterized in that determining, according to the prior probability corresponding to hot words, the hot-word judgment probability corresponding to the word to be determined specifically comprises:
counting the number of times the word to be determined occurs among the hot words in the segmented words;
determining the ratio of that number of times to the number of hot words in the segmented words;
taking the product of that ratio and the prior probability corresponding to hot words as the hot-word judgment probability corresponding to the word to be determined.
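Claims 5 and 6 compute the two judgment probabilities in the same way and differ only in which class of tokens is counted, so a single hypothetical function with a `hot` flag can cover both. This sketch assumes the `(token, is_hot)` list described for the earlier steps, with a label attached to each occurrence; returning 0.0 for an empty class is an assumption, since the claims do not cover that case.

```python
from typing import List, Tuple


def judgment_probability(
    labeled: List[Tuple[str, bool]], word: str, hot: bool, prior: float
) -> float:
    """(occurrences of `word` in the class / size of the class) * class prior."""
    # Tokens belonging to the requested class (hot if hot=True, else non-hot).
    in_class = [token for token, is_hot in labeled if is_hot == hot]
    if not in_class:
        return 0.0  # assumption: an empty class contributes nothing
    return in_class.count(word) / len(in_class) * prior
```

With hot tokens `cat, dog, cat` and a hot prior of 0.75, the hot-word judgment probability of `cat` is 2/3 × 0.75 = 0.5.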
6. The method according to claim 1, characterized in that determining, according to the prior probability corresponding to non-hot words, the non-hot-word judgment probability corresponding to the word to be determined specifically comprises:
counting the number of times the word to be determined occurs among the non-hot words in the segmented words;
determining the ratio of that number of times to the number of non-hot words in the segmented words;
taking the product of that ratio and the prior probability corresponding to non-hot words as the non-hot-word judgment probability corresponding to the word to be determined.
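Writing $N$ for the total number of segmented words, $N_h$ and $N_n$ for the numbers of hot and non-hot words, and $c_h(w)$ and $c_n(w)$ for the number of times the word $w$ occurs among the hot and non-hot words (notation introduced here, not in the claims), the definitions of claims 4 to 6 combine as follows, assuming $N_h$ and $N_n$ are both non-zero:

```latex
\begin{aligned}
P(h) &= \frac{N_h}{N}, \qquad P(n) = \frac{N_n}{N} \\
J_h(w) &= \frac{c_h(w)}{N_h}\,P(h) = \frac{c_h(w)}{N} \\
J_n(w) &= \frac{c_n(w)}{N_n}\,P(n) = \frac{c_n(w)}{N} \\
J_h(w) \ge J_n(w) &\iff c_h(w) \ge c_n(w)
\end{aligned}
```

Each judgment probability has the form of the numerator of Bayes' rule, $P(\text{class})\,P(w \mid \text{class})$, with $P(w \mid \text{class})$ estimated by relative frequency, and the class prior cancels against the class size. Under these definitions, the comparison in claim 7 therefore amounts to checking whether $w$ occurs at least as often among hot words as among non-hot words.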
7. The method according to claim 1, characterized in that determining, according to the hot-word judgment probability and the non-hot-word judgment probability corresponding to the word to be determined, whether the word to be determined is a hot word specifically comprises:
comparing the hot-word judgment probability corresponding to the word to be determined with the non-hot-word judgment probability corresponding to the word to be determined;
if the hot-word judgment probability is greater than or equal to the non-hot-word judgment probability, determining that the word to be determined is a hot word;
if the hot-word judgment probability is less than the non-hot-word judgment probability, determining that the word to be determined is a non-hot word.
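The decision rule of claim 7 is a single comparison. A sketch in Python; the helper `classify` and its input dictionary of `(hot, non-hot)` judgment probabilities are hypothetical conveniences for scoring several candidate words at once.

```python
from typing import Dict, Tuple


def is_hot_word(hot_judgment: float, non_hot_judgment: float) -> bool:
    """Hot if the hot-word judgment probability is >= the non-hot one."""
    # The >= means a tie is resolved in favour of "hot word", as in claim 7.
    return hot_judgment >= non_hot_judgment


def classify(candidates: Dict[str, Tuple[float, float]]) -> Dict[str, bool]:
    """Apply the rule to each word's (hot, non-hot) judgment probabilities."""
    return {word: is_hot_word(h, n) for word, (h, n) in candidates.items()}
```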
8. A hot word determination device, characterized by comprising:
an acquisition module, configured to obtain a prior probability corresponding to hot words, a prior probability corresponding to non-hot words, and each word to be determined;
a judgment probability determination module, configured to determine, according to the prior probability corresponding to hot words, a hot-word judgment probability corresponding to the word to be determined, and to determine, according to the prior probability corresponding to non-hot words, a non-hot-word judgment probability corresponding to the word to be determined;
a hot word determination module, configured to determine, according to the hot-word judgment probability and the non-hot-word judgment probability corresponding to the word to be determined, whether the word to be determined is a hot word.
9. The device according to claim 8, characterized in that the device further comprises:
a prior probability determination module, configured to, before the acquisition module obtains the prior probability corresponding to hot words, the prior probability corresponding to non-hot words, and the word to be determined, obtain the title of each piece of video information, segment the title of each piece of video information into words, determine the number of hot words and the number of non-hot words among the segmented words, and determine, according to the number of hot words and the number of non-hot words, the prior probability corresponding to hot words and the prior probability corresponding to non-hot words.
10. The device according to claim 9, characterized in that the prior probability determination module is specifically configured to segment the title of each piece of video information according to part of speech.
11. The device according to claim 9, characterized in that the prior probability determination module is specifically configured to: determine the total number of segmented words after segmentation; take the ratio of the number of hot words to the determined total number of segmented words as the prior probability corresponding to hot words; and take the ratio of the number of non-hot words to the determined total number of segmented words as the prior probability corresponding to non-hot words.
12. The device according to claim 8, characterized in that the judgment probability determination module is specifically configured to: count the number of times the word to be determined occurs among the hot words in the segmented words; determine the ratio of that number of times to the number of hot words in the segmented words; and take the product of that ratio and the prior probability corresponding to hot words as the hot-word judgment probability corresponding to the word to be determined.
13. The device according to claim 8, characterized in that the judgment probability determination module is specifically configured to: count the number of times the word to be determined occurs among the non-hot words in the segmented words; determine the ratio of that number of times to the number of non-hot words in the segmented words; and take the product of that ratio and the prior probability corresponding to non-hot words as the non-hot-word judgment probability corresponding to the word to be determined.
14. The device according to claim 8, characterized in that the hot word determination module is specifically configured to: compare the hot-word judgment probability corresponding to the word to be determined with the non-hot-word judgment probability corresponding to the word to be determined; if the hot-word judgment probability is greater than or equal to the non-hot-word judgment probability, determine that the word to be determined is a hot word; and if the hot-word judgment probability is less than the non-hot-word judgment probability, determine that the word to be determined is a non-hot word.
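One way to picture the device of claims 8 to 14 is as a class whose methods play the roles of the modules. This is a hypothetical Python sketch, not the applicant's implementation: the class and method names are invented for illustration, it reuses the assumptions of the method sketches (whitespace segmentation, labels from a known hot-word set), and the word passed to `is_hot_word` stands in for what the acquisition module obtains.

```python
from typing import Iterable, List, Set, Tuple


class HotWordDevice:
    """Hypothetical mapping of the modules in claims 8 to 14 onto methods."""

    def __init__(self) -> None:
        self.labeled: List[Tuple[str, bool]] = []
        self.p_hot = 0.0
        self.p_non_hot = 0.0

    # Prior probability determination module (claims 9 to 11).
    def determine_priors(self, titles: Iterable[str], hot_words: Set[str]) -> None:
        # Whitespace split stands in for a real (e.g. part-of-speech based) segmenter.
        self.labeled = [(tok, tok in hot_words) for t in titles for tok in t.split()]
        total = len(self.labeled)
        if total == 0:
            raise ValueError("no segmented words to estimate priors from")
        hot = sum(1 for _, is_hot in self.labeled if is_hot)
        self.p_hot = hot / total
        self.p_non_hot = (total - hot) / total

    # Judgment probability determination module (claims 12 and 13).
    def judgment_probabilities(self, word: str) -> Tuple[float, float]:
        def one(hot: bool, prior: float) -> float:
            in_class = [tok for tok, is_hot in self.labeled if is_hot == hot]
            return in_class.count(word) / len(in_class) * prior if in_class else 0.0

        return one(True, self.p_hot), one(False, self.p_non_hot)

    # Hot word determination module (claim 14): ties go to "hot word".
    def is_hot_word(self, word: str) -> bool:
        hot_j, non_hot_j = self.judgment_probabilities(word)
        return hot_j >= non_hot_j
```

With these estimates, a word that never appears in any title gets 0 for both judgment probabilities and, under the greater-than-or-equal rule of claim 14, is reported as a hot word; the claims do not say how unseen words should be handled.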
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610565135.1A CN106202049A (en) | 2016-07-18 | 2016-07-18 | A kind of hot word determines method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106202049A (en) | 2016-12-07 |
Family
ID=57493872
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610565135.1A Pending CN106202049A (en) | 2016-07-18 | 2016-07-18 | A kind of hot word determines method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106202049A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103218368A (en) * | 2012-01-20 | 2013-07-24 | 深圳市腾讯计算机系统有限公司 | Method and device for discovering hot words |
US20150081431A1 (en) * | 2013-09-18 | 2015-03-19 | Yahoo Japan Corporation | Posterior probability calculating apparatus, posterior probability calculating method, and non-transitory computer-readable recording medium |
CN104615640A (en) * | 2014-11-28 | 2015-05-13 | 百度在线网络技术(北京)有限公司 | Method and device for providing searching keywords and carrying out searching |
CN104462347A (en) * | 2014-12-04 | 2015-03-25 | 北京国双科技有限公司 | Keyword classifying method and device |
Non-Patent Citations (1)
Title |
---|
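Wang Jinbo et al.: "Research on an Improved Naive Bayes Keyword Extraction Algorithm" (一种改进的朴素贝叶斯关键词提取算法研究), Computer Applications and Software (《计算机应用与软件》) * |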
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109840445A (en) * | 2017-11-24 | 2019-06-04 | 优酷网络技术(北京)有限公司 | A kind of recognition methods and system of video of practising fraud |
CN109840445B (en) * | 2017-11-24 | 2021-10-01 | 阿里巴巴(中国)有限公司 | Method and system for identifying cheating videos |
CN111930949A (en) * | 2020-09-11 | 2020-11-13 | 腾讯科技(深圳)有限公司 | Search string processing method and device, computer readable medium and electronic equipment |
CN114938477A (en) * | 2022-06-23 | 2022-08-23 | 阿里巴巴(中国)有限公司 | Video topic determination method, device and equipment |
CN114938477B (en) * | 2022-06-23 | 2024-05-03 | 阿里巴巴(中国)有限公司 | Video topic determination method, device and equipment |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 100080 A 5 C, block A, China International Steel Plaza, 8 Haidian Avenue, Haidian District, Beijing. Applicant after: Youku network technology (Beijing) Co., Ltd. Address before: 100080 A 5 C, block A, China International Steel Plaza, 8 Haidian Avenue, Haidian District, Beijing. Applicant before: 1Verge Inc. |
CB02 | Change of applicant information | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20161207 |
RJ01 | Rejection of invention patent application after publication | |