CN106202293B - Method and device for updating an emergency event corpus - Google Patents
- Publication number: CN106202293B (application number CN201610509717.8A)
- Authority: CN (China)
- Prior art keywords: term vector, record, list, title, token
- Legal status: Active (as listed by Google Patents; not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Abstract
The embodiments of the invention disclose a method and device for updating an emergency event corpus. The method includes: obtaining the title of a video; generating, according to the title, a first term vector corresponding to the title; updating, according to the first term vector and a preset update rule, the cluster centre used for updating the emergency event corpus; filtering the term vectors of the updated cluster centre; and updating the emergency event corpus according to the filtered term vectors. The embodiments of the present invention improve the efficiency of updating the emergency event corpus and make search results for emergency events more reasonable.
Description
Technical field
The present invention relates to the technical field of resource management, and in particular to a method and device for updating an emergency event corpus.
Background
As the number of videos grows and users rely more and more on video search engines, video search has become an important way for users to obtain information. Search requests for newly added videos about emergency events (for example news or entertainment gossip) have become an important category of request, and for such requests users expect to find the most recent videos.
In the prior art, a search engine usually scores each video on five dimensions (relevance, click data, video quality, freshness, and other factors), combines the scores with fixed weights, and outputs the higher-scoring video files to the user. If the term the user searches for relates to an emergency event (for example news or entertainment gossip), the freshness weight has to be increased for that term, and the search engine then returns results ranked with the increased freshness weight. These results contain content the user wants, such as news and entertainment gossip, but also content the user does not want, such as currently popular TV series. The search results are therefore not reasonable, which harms the user experience.
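To make the prior-art problem concrete, the following Python sketch scores videos as a fixed-weight sum of the five dimensions and then boosts the freshness weight, as would happen for an emergency-event query. The weights, the per-dimension scores, and the three sample videos are hypothetical values chosen for illustration; the source does not give any of them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VideoScores:
    """Hypothetical per-dimension scores in [0, 1] for one video."""
    title: str
    relevance: float
    clicks: float
    quality: float
    freshness: float
    other: float


# Illustrative fixed weights (not from the source); they sum to 1.
DEFAULT_WEIGHTS = {"relevance": 0.4, "clicks": 0.2, "quality": 0.2, "freshness": 0.1, "other": 0.1}
# The same weights with freshness boosted for an emergency-event query.
FRESH_WEIGHTS = {**DEFAULT_WEIGHTS, "freshness": 0.6}


def total_score(video: VideoScores, weights: dict[str, float]) -> float:
    """Fixed-weight sum over the five dimensions."""
    return sum(w * getattr(video, dim) for dim, w in weights.items())


def rank(videos: list[VideoScores], weights: dict[str, float]) -> list[str]:
    """Titles ordered by descending total score."""
    ordered = sorted(videos, key=lambda v: total_score(v, weights), reverse=True)
    return [v.title for v in ordered]


# Hypothetical sample: a news clip, an older archive video, and a frequently updated series episode.
SAMPLE = [
    VideoScores("news", relevance=0.9, clicks=0.3, quality=0.5, freshness=0.7, other=0.5),
    VideoScores("archive", relevance=0.9, clicks=0.8, quality=0.9, freshness=0.1, other=0.5),
    VideoScores("series", relevance=0.3, clicks=0.9, quality=0.8, freshness=1.0, other=0.5),
]
```

With the default weights the archive video ranks first; after the freshness boost the series episode jumps to the top, ahead of the news clip, which mirrors the unreasonable results described above.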
When a search engine needs to determine whether a search term concerns an emergency event, it can match the search term against an emergency event corpus and decide from the degree of relevance. However, current video search tools all update the emergency event corpus manually. Manual updating consumes a great deal of time and manpower, so the corpus is updated inefficiently.
In addition, when retrieving emergency events, existing search methods also surface frequently updated videos or popular TV series, which lowers the quality of emergency-event search results and makes them unreasonable.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a method and device for updating an emergency event corpus, so as to improve the efficiency of updating the emergency event corpus and make search results for emergency events more reasonable.
To achieve the above purpose, an embodiment of the invention discloses a method for updating an emergency event corpus, comprising:
obtaining the title of a video;
generating, according to the title, a first term vector corresponding to the title;
updating, according to the first term vector and a preset update rule, the cluster centre used for updating the emergency event corpus;
filtering the term vectors of the updated cluster centre;
updating the emergency event corpus according to the filtered term vectors.
Preferably, before generating the first term vector corresponding to the title according to the title, the method further comprises:
judging whether the time elapsed from the time of occurrence of the video corresponding to the title to the current time is less than a preset first duration, and judging whether the video length of the video is less than a preset second duration;
and generating the first term vector corresponding to the title according to the title comprises:
generating the first term vector corresponding to the title according to the title when the elapsed time is less than the preset first duration and the video length of the video is less than the preset second duration.
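The pre-check above amounts to two comparisons. The sketch below assumes that the time of occurrence is the video's publication timestamp, and uses three days and twenty minutes as the two preset durations; both values and the function name are hypothetical, since the source leaves them open. Presumably the length limit helps keep long TV-series episodes out of the corpus, though the source does not state the rationale.

```python
from datetime import datetime, timedelta

# Hypothetical values for the two preset durations (the source leaves them open).
FIRST_DURATION = timedelta(days=3)       # maximum age of the video
SECOND_DURATION = timedelta(minutes=20)  # maximum video length


def should_generate_vector(occurred_at: datetime, video_length: timedelta, now: datetime,
                           first_duration: timedelta = FIRST_DURATION,
                           second_duration: timedelta = SECOND_DURATION) -> bool:
    """True only if the video is newer than first_duration and shorter than second_duration."""
    is_recent = now - occurred_at < first_duration
    is_short = video_length < second_duration
    return is_recent and is_short
```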
Preferably, generating the first term vector corresponding to the title according to the title comprises:
performing word segmentation on the title to obtain at least one token corresponding to the title;
filtering the segmentation result according to a preset filtering rule;
generating the first term vector corresponding to the title according to the filtered segmentation result.
Preferably, updating, according to the first term vector and the preset update rule, the cluster centre used for updating the emergency event corpus comprises:
judging whether a first list in the cluster centre, which stores term vectors, contains a second term vector similar to the first term vector;
if not, adding the first term vector to the first list, and adding a first record corresponding to the first term vector to a second list in the cluster centre, which stores class member counts and the frequencies of the tokens in the term vectors;
if so, updating a second record, corresponding to the second term vector, in the second list;
for the first record or the second record, judging whether the quotient of the frequency of each token in that record and the class member count in that record is greater than a preset first threshold; if so, determining the token as a token to be processed; and generating a target term vector from the tokens to be processed in the first record or the second record;
judging whether, among the term vectors in the first list other than the first term vector or the second term vector, there is a term vector identical to the target term vector or a term vector containing the target term vector;
if there is, deleting the first term vector or the second term vector from the first list and/or deleting the term vector containing the target term vector; deleting from the second list the record corresponding to the term vector identical to the target term vector or to the term vector containing the target term vector; and either establishing a correspondence between the term vector identical to the target term vector and the first record or the second record, or adding the target term vector to the first list and establishing a correspondence between the target term vector and the first record or the second record;
if there is not, deleting the first term vector or the second term vector from the first list, adding the target term vector to the first list, and establishing a correspondence between the target term vector and the first record or the second record.
Preferably, filtering the term vectors of the updated cluster centre comprises:
for each term vector in the cluster centre, judging whether the class member count in the record corresponding to the term vector is greater than a preset second threshold;
if so, calculating the inverse word frequency of each token in the term vector;
calculating the average inverse word frequency of the term vector from the inverse word frequencies of its tokens;
judging whether the average is less than a preset third threshold;
if so, deleting the term vector and the record corresponding to it in the second list.
Preferably, updating the emergency event corpus according to the filtered term vectors comprises:
adding the tokens corresponding to the term vectors that remain after deletion to the emergency event corpus.
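The filtering step and the corpus update can be sketched as follows. The sketch assumes that "inverse word frequency" means the standard inverse document frequency, log(N / df), computed over some reference collection of titles, and that a token missing from that table counts as maximally rare; the source specifies neither point. For brevity each record is reduced to its class member count, and both threshold values are hypothetical.

```python
from __future__ import annotations

import math

# Hypothetical thresholds; the source only calls them the preset second and third thresholds.
SECOND_THRESHOLD = 50   # class member count above which the average check applies
THIRD_THRESHOLD = 2.0   # minimum acceptable average inverse word frequency


def idf_table(documents: list[set[str]]) -> dict[str, float]:
    """Assumed definition: idf(t) = log(N / df(t)) over a reference collection of token sets."""
    n = len(documents)
    df: dict[str, int] = {}
    for doc in documents:
        for tok in doc:
            df[tok] = df.get(tok, 0) + 1
    return {tok: math.log(n / count) for tok, count in df.items()}


def filter_centre(centre: dict[tuple[str, ...], int], idf: dict[str, float],
                  second_threshold: int = SECOND_THRESHOLD,
                  third_threshold: float = THIRD_THRESHOLD) -> None:
    """Delete, in place, large classes whose tokens are too common on average.

    `centre` maps each term vector to its class member count (records simplified).
    """
    unseen = max(idf.values(), default=0.0)  # assumption: unknown tokens count as maximally rare
    for vec, members in list(centre.items()):
        if members > second_threshold and vec:
            average = sum(idf.get(tok, unseen) for tok in vec) / len(vec)
            if average < third_threshold:
                del centre[vec]


def update_corpus(corpus: set[str], centre: dict[tuple[str, ...], int]) -> None:
    """Add the tokens of every remaining term vector to the emergency event corpus."""
    for vec in centre:
        corpus.update(vec)
```

Read this way, the step removes large clusters built mostly from common words, while small clusters are left alone until they grow past the second threshold.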
To achieve the above purpose, an embodiment of the invention further discloses a device for updating an emergency event corpus, comprising a video title obtaining module, a first term vector generation module, a cluster centre update module, a term vector filtering module, and an emergency event corpus update module, wherein:
the video title obtaining module is configured to obtain the title of a video;
the first term vector generation module is configured to generate, according to the title, a first term vector corresponding to the title;
the cluster centre update module is configured to update, according to the first term vector and a preset update rule, the cluster centre used for updating the emergency event corpus;
the term vector filtering module is configured to filter the term vectors of the updated cluster centre;
the emergency event corpus update module is configured to update the emergency event corpus according to the filtered term vectors.
Preferably, the first term vector generation module is specifically configured to:
perform word segmentation on the title to obtain at least one token corresponding to the title;
filter the obtained tokens according to a preset filtering rule;
generate the first term vector corresponding to the title according to the filtered tokens.
Preferably, the device further comprises a judgment module, wherein:
the judgment module is configured to judge whether the time elapsed from the time of occurrence of the video corresponding to the title to the current time is less than a preset first duration, and whether the video length of the video is less than a preset second duration;
the first term vector generation module is specifically configured to:
when the judgment module judges that the elapsed time is less than the preset first duration and that the video length of the video is less than the preset second duration, perform word segmentation on the title to obtain at least one token corresponding to the title; filter the obtained tokens according to a preset filtering rule; and generate the first term vector corresponding to the title according to the filtered tokens.
Preferably, the cluster centre update module is specifically configured to:
judge whether a first list in the cluster centre, which stores term vectors, contains a second term vector similar to the first term vector;
if not, add the first term vector to the first list, and add a first record corresponding to the first term vector to a second list in the cluster centre, which stores class member counts and the frequencies of the tokens in the term vectors;
if so, update a second record, corresponding to the second term vector, in the second list;
for the first record or the second record, judge whether the quotient of the frequency of each token in that record and the class member count in that record is greater than a preset first threshold; if so, determine the token as a token to be processed; and generate a target term vector from the tokens to be processed in the first record or the second record;
judge whether, among the term vectors in the first list other than the first term vector or the second term vector, there is a term vector identical to the target term vector or a term vector containing the target term vector;
if there is, delete the first term vector or the second term vector from the first list and/or delete the term vector containing the target term vector; delete from the second list the record corresponding to the term vector identical to the target term vector or to the term vector containing the target term vector; and either establish a correspondence between the term vector identical to the target term vector and the first record or the second record, or add the target term vector to the first list and establish a correspondence between the target term vector and the first record or the second record;
if there is not, delete the first term vector or the second term vector from the first list, add the target term vector to the first list, and establish a correspondence between the target term vector and the first record or the second record.
Preferably, the term vector filtering module is specifically configured to:
for each term vector in the cluster centre, judge whether the class member count in the record corresponding to the term vector is greater than a preset second threshold;
if so, calculate the inverse word frequency of each token in the term vector;
calculate the average inverse word frequency of the term vector from the inverse word frequencies of its tokens;
judge whether the average is less than a preset third threshold;
if so, delete the term vector and the record corresponding to it in the second list.
Preferably, the emergency event corpus update module is specifically configured to:
add the tokens corresponding to the term vectors that remain after deletion to the emergency event corpus.
As can be seen from the above technical solutions, the embodiments of the invention disclose a method and device for updating an emergency event corpus, including: obtaining the title of a video; generating, according to the title, a first term vector corresponding to the title; updating, according to the first term vector and a preset update rule, the cluster centre used for updating the emergency event corpus; filtering the term vectors of the updated cluster centre; and updating the emergency event corpus according to the filtered term vectors.
The method provided by the embodiments of the present invention updates the emergency event corpus automatically, eliminating the large amount of time and manpower that manual updating requires and improving the efficiency of updating the corpus. At the same time, filtering based on inverse word frequency optimizes the search results for emergency events and makes them more reasonable.
Of course, implementing any product or method of the present invention does not necessarily require achieving all of the above advantages at the same time.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the accompanying drawings in the following description show only some embodiments of the present invention, and persons of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of a method for updating an emergency event corpus according to an embodiment of the present invention;
Fig. 2 is a flow diagram of generating the first term vector according to an embodiment of the present invention;
Fig. 3 is a flow diagram of updating the cluster centre according to an embodiment of the present invention;
Fig. 4 is a flow diagram of filtering the term vectors of the updated cluster centre according to an embodiment of the present invention;
Fig. 5 is a flow diagram of another method for updating an emergency event corpus according to an embodiment of the present invention;
Fig. 6 is a structural diagram of a device for updating an emergency event corpus according to an embodiment of the present invention;
Fig. 7 is a structural diagram of another device for updating an emergency event corpus according to an embodiment of the present invention.
Detailed description of embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
To solve the problems in the prior art, the embodiments of the invention provide a method and device for updating an emergency event corpus. The method for updating an emergency event corpus provided by an embodiment of the present invention is described first.
Fig. 1 is a flow diagram of a method for updating an emergency event corpus according to an embodiment of the present invention; the method comprises:
S101: Obtain the title of a video.
Specifically, in practical applications, assume that the title of a video obtained by the search engine through searching is "The philosophical life of South Korean President Park Geun-hye".
S102: Generate, according to the title, the first term vector corresponding to the title.
Specifically, as shown in Fig. 2, S102 may include:
S102A: Perform word segmentation on the title to obtain at least one token corresponding to the title.
Specifically, in practical applications, the video title is segmented using existing word segmentation code.
S102B: Filter the segmentation result according to a preset filtering rule.
Specifically, the obtained tokens are filtered to remove tokens whose length is less than a set token length, where the set token length is configured by the user.
S102C: Generate the first term vector corresponding to the title according to the filtered segmentation result.
Step S102 is described in detail below, taking the video titled "The philosophical life of South Korean President Park Geun-hye" as an example.
First, word segmentation is performed on the title, giving the segmentation result "South Korea; Park Geun-hye; President; 's; Philosophy; Life".
Then "South Korea; Park Geun-hye; President; 's; Philosophy; Life" is filtered according to the preset filtering rule.
Specifically, in practical applications, the preset filtering rule removes tokens whose length is less than a set value, and in this step the set value is 2. Lengths are counted in characters of the original Chinese tokens: the possessive particle ("'s") is a single character, while each of the other tokens has at least two.
After filtering, "South Korea; Park Geun-hye; President; Philosophy; Life" is obtained.
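A minimal sketch of S102B and S102C follows. It starts from an already segmented title; an off-the-shelf Chinese segmenter (for example jieba's `lcut`) could produce such a list, but the source only says "existing word segmentation code". The Chinese strings are a back-translation of the example tokens, used so that the character-length rule behaves as in the example; they are not quoted from the source. Removing duplicate tokens is an added assumption.

```python
from __future__ import annotations

# Segmented example title (back-translated, not quoted from the source). English glosses:
# South Korea, Park Geun-hye, President, possessive particle "'s", Philosophy, Life.
EXAMPLE_TOKENS = ["韩国", "朴槿惠", "总统", "的", "哲学", "人生"]

MIN_TOKEN_LENGTH = 2  # the "set value" used in the example


def filter_tokens(tokens: list[str], min_length: int = MIN_TOKEN_LENGTH) -> list[str]:
    """S102B: drop tokens shorter than min_length characters, keeping title order."""
    return [t for t in tokens if len(t) >= min_length]


def first_term_vector(tokens: list[str], min_length: int = MIN_TOKEN_LENGTH) -> tuple[str, ...]:
    """S102C: the filtered tokens, de-duplicated (an assumption), in title order."""
    return tuple(dict.fromkeys(filter_tokens(tokens, min_length)))
```

Only the single-character particle is removed, leaving the five-token vector [South Korea; Park Geun-hye; President; Philosophy; Life].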
The first term vector is then generated from the filtered segmentation result.
S103: Update, according to the first term vector and the preset update rule, the cluster centre used for updating the emergency event corpus.
Specifically, the cluster centre consists of a first list and a second list. The first list stores term vectors, and the second list stores the records corresponding to the term vectors in the first list. Each record in the second list contains the class member count corresponding to its term vector and the word frequency list of the class.
Specifically, in practical applications, the word frequency list of a class contains the tokens corresponding to the term vector and the word frequency of each of those tokens.
Specifically, as shown in Fig. 3, S103 may include:
S103A: Judge whether the first list in the cluster centre, which stores term vectors, contains a second term vector similar to the first term vector. If not, execute S103B; if so, execute S103C.
Specifically, a second term vector similar to the first term vector is a term vector stored in the first list whose repetition degree with the first term vector is not less than a set threshold.
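A sketch of the S103A lookup, measuring the repetition degree as the number of distinct tokens two term vectors share. That reading is inferred from the worked example below, where [South Korea; Park Geun-hye; President; Philosophy] and [South Korea; Park Geun-hye; President; Philosophy; Life] have a repetition degree of 4; the source itself treats the similarity test as prior art. Returning the best-scoring match when several stored vectors qualify is an added assumption.

```python
from __future__ import annotations

from typing import Optional, Sequence

SIMILARITY_THRESHOLD = 4  # the set threshold used in the worked example


def repetition_degree(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of distinct tokens shared by two term vectors (assumed definition)."""
    return len(set(a) & set(b))


def find_similar(first_vec: Sequence[str], first_list: list[tuple[str, ...]],
                 threshold: int = SIMILARITY_THRESHOLD) -> Optional[tuple[str, ...]]:
    """S103A: best-overlapping stored vector with repetition degree >= threshold, else None."""
    best, best_score = None, threshold - 1
    for vec in first_list:
        score = repetition_degree(first_vec, vec)
        if score > best_score:  # ties keep the earlier vector
            best, best_score = vec, score
    return best
```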
S103B: Add the first term vector to the first list, and add a first record corresponding to the first term vector to the second list in the cluster centre, which stores class member counts and the frequencies of the tokens in the term vectors.
Specifically, when the first record is added, an initial value is assigned to its class member count, the tokens in the term vector are added to the word frequency list, and an initial value is assigned to the word frequency of each of those tokens.
S103C: Update the second record, corresponding to the second term vector, in the second list.
Specifically, the class member count in the second record is increased, and the word frequency of each token in the first term vector is increased, in each case by an amount equal to the initial value. A token that is not yet in the word frequency list is added with the initial value as its word frequency.
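The record bookkeeping in S103B and S103C can be sketched with a small dataclass. The class and function names are hypothetical, and the initial value of 1 is taken from the worked example below.

```python
from __future__ import annotations

from dataclasses import dataclass, field

INITIAL_VALUE = 1  # initial count and frequency, as in the worked example


@dataclass
class Record:
    """Hypothetical second-list entry: class member count plus the class's word frequency list."""
    members: int
    freqs: dict[str, int] = field(default_factory=dict)


def new_record(term_vector: tuple[str, ...], initial: int = INITIAL_VALUE) -> Record:
    """S103B: record for a term vector that had no similar vector in the first list."""
    return Record(members=initial, freqs={tok: initial for tok in term_vector})


def update_record(record: Record, term_vector: tuple[str, ...], step: int = INITIAL_VALUE) -> None:
    """S103C: fold a similar first term vector into an existing record, in place."""
    record.members += step
    for tok in term_vector:
        # Existing tokens grow by `step`; unseen tokens start at `step` (the initial value).
        record.freqs[tok] = record.freqs.get(tok, 0) + step
```

Applied to the record [12, (South Korea: 8, Park Geun-hye: 9, President: 6, Philosophy: 7)] and the vector [South Korea; Park Geun-hye; President; Philosophy; Life], this yields the updated record from the worked example.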
S103D: For the first record or the second record, judge whether the quotient of the frequency of each token in that record and the class member count in that record is greater than a preset first threshold; if so, execute S103E.
S103E: Determine the token as a token to be processed, and generate a target term vector from the tokens to be processed in the first record or the second record.
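S103D and S103E reduce to a ratio test over the record's word frequency list. In this sketch the record is passed as a member count plus a token-to-frequency mapping, the comparison is strict as in the text, the target term vector keeps the tokens in the record's order (an assumption), and 0.6 is the first threshold used in the worked example. The function name is hypothetical.

```python
from __future__ import annotations

FIRST_THRESHOLD = 0.6  # preset first threshold from the worked example


def target_term_vector(members: int, freqs: dict[str, int],
                       threshold: float = FIRST_THRESHOLD) -> tuple[str, ...]:
    """S103D/S103E: tokens whose frequency / class member count is strictly above threshold."""
    return tuple(tok for tok, freq in freqs.items() if freq / members > threshold)
```

For the record [13, (South Korea: 9, Park Geun-hye: 10, President: 7, Philosophy: 8, Life: 1)] this keeps South Korea (9/13), Park Geun-hye (10/13), and Philosophy (8/13), and drops President (7/13) and Life (1/13).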
S103F: Judge whether, among the term vectors in the first list other than the first term vector or the second term vector, there is a term vector identical to the target term vector or a term vector containing the target term vector. If so, execute S103G; if not, execute S103H.
Specifically, a term vector identical to the target term vector is a term vector in the first list, other than the first term vector or the second term vector, whose tokens are exactly the tokens of the target term vector. A term vector containing the target term vector is a term vector in the first list, other than the first term vector or the second term vector, that includes every token of the target term vector plus at least one other token.
S103G: Delete the first term vector or the second term vector from the first list and/or delete the term vector containing the target term vector; delete from the second list the record corresponding to the term vector identical to the target term vector or to the term vector containing the target term vector; and either establish a correspondence between the term vector identical to the target term vector and the first record or the second record, or add the target term vector to the first list and establish a correspondence between the target term vector and the first record or the second record.
Specifically, if a term vector identical to the target term vector exists among the term vectors in the first list other than the first term vector or the second term vector, the first term vector or the second term vector is deleted from the first list, the record corresponding to the identical term vector is deleted from the second list, and a correspondence is established between the identical term vector and the first record or the second record.
Specifically, if a term vector containing the target term vector exists among the term vectors in the first list other than the first term vector or the second term vector, the first term vector or the second term vector is deleted from the first list, and the term vector containing the target term vector is deleted as well; the record corresponding to the containing term vector is deleted from the second list; and the target term vector is added to the first list, and a correspondence is established between the target term vector and the first record or the second record.
S103H: Delete the first term vector or the second term vector from the first list; add the target term vector to the first list; and establish a correspondence between the target term vector and the first record or the second record.
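Taken together, S103F to S103H can be sketched as below. The sketch models the cluster centre as a single Python dict that maps each term vector (a tuple of tokens) to its record, so "establishing a correspondence" becomes assigning a dict value and deleting a record becomes deleting its key. Tokens are compared as sets, "containing" is read as a strict superset, and the early return when no token passes the first threshold is an assumption the source does not address. The function name is hypothetical and record values are placeholders.

```python
from __future__ import annotations

from typing import Any


def reconcile(centre: dict[tuple[str, ...], Any], current: tuple[str, ...],
              target: tuple[str, ...]) -> None:
    """S103F-S103H, applied in place to a cluster centre modelled as {term vector: record}.

    `current` is the first or second term vector whose record produced `target`.
    """
    if not target:
        return  # assumption: nothing passed the first threshold, so leave the centre unchanged
    record = centre.pop(current)  # the first/second term vector is deleted in every branch
    target_set = set(target)
    others = list(centre)  # term vectors other than the first/second term vector

    identical = [vec for vec in others if set(vec) == target_set]
    if identical:
        # S103G, identical case: the identical vector drops its old record and takes this one.
        centre[identical[0]] = record
        return

    for vec in others:
        if set(vec) > target_set:
            # S103G, containing case: delete the superset vector together with its record.
            del centre[vec]
    # Containing case and S103H: store the target term vector with this record.
    centre[target] = record
```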
Step S103 is described in detail below.
It should be noted that the following example is given only to help understand the embodiments of the present invention and does not limit the present invention.
Specifically, in practical applications, judging whether term vectors are similar is prior art, and the embodiments of the present invention do not repeat it here.
As an illustration, the following judges whether term vectors are similar by their repetition degree.
First, judge whether the repetition degree between the term vector [South Korea; Park Geun-hye; President; Philosophy; Life] and each term vector in the first list is less than a set threshold; in this embodiment of the present invention the threshold is assumed to be 4.
Assume that the original cluster centre is:
First list:
[Fast and Furious; leading role; traffic accident]
[Hua Qiangu; celebration party; held; Zhao Liying]
Second list:
[15, (Fast and Furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (Hua Qiangu: 20, celebration party: 21, held: 15, Zhao Liying: 8)]
Here, taking [23, (Hua Qiangu: 20, celebration party: 21, held: 15, Zhao Liying: 8)] as an example, 23 is the class member count, and the 20 in "Hua Qiangu: 20" is the frequency (word frequency) of the token "Hua Qiangu".
The judgment shows that the repetition degree between every term vector in the first list and [South Korea; Park Geun-hye; President; Philosophy; Life] is less than 4, i.e. the result of S103A is no.
Then the first term vector [South Korea; Park Geun-hye; President; Philosophy; Life] is added to the first list, and a corresponding first record is generated in the second list. Specifically, the initial class member count for the term vector [South Korea; Park Geun-hye; President; Philosophy; Life] is set to 1, and the word frequencies of the tokens South Korea, Park Geun-hye, President, Philosophy, and Life are each set to 1, so the generated first record corresponding to the first term vector is [1, (South Korea: 1, Park Geun-hye: 1, President: 1, Philosophy: 1, Life: 1)].
The new cluster centre is:
First list:
[Fast and Furious; leading role; traffic accident]
[Hua Qiangu; celebration party; held; Zhao Liying]
[South Korea; Park Geun-hye; President; Philosophy; Life]
Second list:
[15, (Fast and Furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (Hua Qiangu: 20, celebration party: 21, held: 15, Zhao Liying: 8)]
[1, (South Korea: 1, Park Geun-hye: 1, President: 1, Philosophy: 1, Life: 1)]
Next, for [1, (South Korea: 1, Park Geun-hye: 1, President: 1, Philosophy: 1, Life: 1)], the quotient of the frequency of each token and the class member count is 1, which is greater than the preset first threshold of 0.6.
"South Korea, Park Geun-hye, President, Philosophy, Life" are therefore determined as tokens to be processed, and the target term vector [South Korea; Park Geun-hye; President; Philosophy; Life] is generated from them.
It is then judged that, among the term vectors in the first list other than the first term vector [South Korea; Park Geun-hye; President; Philosophy; Life], there is no term vector identical to or containing the target term vector, i.e. the result of S103F is no.
Then the first term vector [South Korea; Park Geun-hye; President; Philosophy; Life] is deleted from the first list, the target term vector is added to the first list, and a correspondence is established between the target term vector and [1, (South Korea: 1, Park Geun-hye: 1, President: 1, Philosophy: 1, Life: 1)]. The updated cluster centre is as follows:
First list:
[Fast and Furious; leading role; traffic accident]
[Hua Qiangu; celebration party; held; Zhao Liying]
[South Korea; Park Geun-hye; President; Philosophy; Life]
Second list:
[15, (Fast and Furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (Hua Qiangu: 20, celebration party: 21, held: 15, Zhao Liying: 8)]
[1, (South Korea: 1, Park Geun-hye: 1, President: 1, Philosophy: 1, Life: 1)]
Now assume instead that the original cluster centre is:
First list:
[Fast and Furious; leading role; traffic accident]
[Hua Qiangu; celebration party; held; Zhao Liying]
[South Korea; Park Geun-hye; President; Philosophy]
[South Korea; Park Geun-hye; Philosophy]
Second list:
[15, (Fast and Furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (Hua Qiangu: 20, celebration party: 21, held: 15, Zhao Liying: 8)]
[12, (South Korea: 8, Park Geun-hye: 9, President: 6, Philosophy: 7)]
[10, (South Korea: 8, Park Geun-hye: 9, Philosophy: 6)]
First, the repetition degree between the second term vector [South Korea; Park Geun-hye; President; Philosophy] in the first list and the first term vector [South Korea; Park Geun-hye; President; Philosophy; Life] is 4, which is not less than the set threshold of 4, i.e. the result of S103A is yes.
The second record [12, (South Korea: 8, Park Geun-hye: 9, President: 6, Philosophy: 7)] is then updated: the class member count 12 is increased by 1 to 13. The token South Korea is present in the second record, so its word frequency 8 is increased by 1 to 9; likewise, the word frequency of Park Geun-hye is updated from 9 to 10, that of President from 6 to 7, and that of Philosophy from 7 to 8. The token Life is not present in the second record, so it is added to the second record with its word frequency set to the initial value 1. The updated second record is [13, (South Korea: 9, Park Geun-hye: 10, President: 7, Philosophy: 8, Life: 1)].
The new cluster centre is:
First list:
[Fast and Furious; leading role; traffic accident]
[Hua Qiangu; celebration party; held; Zhao Liying]
[South Korea; Park Geun-hye; President; Philosophy]
[South Korea; Park Geun-hye; Philosophy]
Second list:
[15, (Fast and Furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (Hua Qiangu: 20, celebration party: 21, held: 15, Zhao Liying: 8)]
[13, (South Korea: 9, Park Geun-hye: 10, President: 7, Philosophy: 8, Life: 1)]
[10, (South Korea: 8, Park Geun-hye: 9, Philosophy: 6)]
Next, for the second record [13, (South Korea: 9, Park Geun-hye: 10, President: 7, Philosophy: 8, Life: 1)], the quotient of each word's frequency and the record's class member count is compared with the preset first threshold of 0.6: the quotient for "South Korea" is 9/13 ≈ 0.69 (greater than 0.6), for "Park Geun-hye" 10/13 ≈ 0.77 (greater than 0.6), for "President" 7/13 ≈ 0.54 (less than 0.6), for "Philosophy" 8/13 ≈ 0.62 (greater than 0.6), and for "Life" 1/13 ≈ 0.08 (less than 0.6).
Based on these quotients, "South Korea", "Park Geun-hye" and "Philosophy" are determined to be the words to be processed, and together they form the target term vector [South Korea; Park Geun-hye; Philosophy].
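The quotient test can be sketched the same way. The sketch assumes a record is a (class member count, word-frequency dict) pair and keeps the record's word order in the target term vector; `word_quotients` and `target_vector` are hypothetical helpers, and 0.6 is the first threshold from the example.

```python
def word_quotients(record):
    """Frequency of each word divided by the record's class member count."""
    count, freqs = record
    return {word: freq / count for word, freq in freqs.items()}


def target_vector(record, first_threshold=0.6):
    """Words whose quotient is strictly greater than the preset first threshold."""
    return [word for word, q in word_quotients(record).items() if q > first_threshold]


record = (13, {"South Korea": 9, "Park Geun-hye": 10, "President": 7,
               "Philosophy": 8, "Life": 1})
rounded = {word: round(q, 2) for word, q in word_quotients(record).items()}
target = target_vector(record)
```

With this record, `rounded` holds the five quotients quoted above and `target` is `["South Korea", "Park Geun-hye", "Philosophy"]`.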
It is then found that, among the term vectors in the first list other than the second term vector, [South Korea; Park Geun-hye; Philosophy] is identical to the target term vector [South Korea; Park Geun-hye; Philosophy], so the result of step S103F is yes.
The second term vector [South Korea; Park Geun-hye; President; Philosophy] is therefore deleted from the first list; the record in the second list corresponding to the term vector identical to the target term vector, [10, (South Korea: 8, Park Geun-hye: 10, Philosophy: 8)], is deleted; and a correspondence is established between the term vector [South Korea; Park Geun-hye; Philosophy], which is identical to the target term vector, and the second record [13, (South Korea: 9, Park Geun-hye: 10, President: 7, Philosophy: 8, Life: 1)]. The updated cluster centre is:
First list:
[Fast & Furious; lead actor; car accident]
[The Journey of Flower; celebration party; held; Zhao Liying]
[South Korea; Park Geun-hye; Philosophy]
Second list:
[15, (Fast & Furious: 10, lead actor: 12, car accident: 9, Paul Walker: 3)]
[23, (The Journey of Flower: 20, celebration party: 21, held: 15, Zhao Liying: 8)]
[13, (South Korea: 9, Park Geun-hye: 10, President: 7, Philosophy: 8, Life: 1)]
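The merge that follows the quotient test can be sketched on a single list of (term vector, record) pairs, which stands in for the first list, the second list and the correspondence between them. In each branch of the merge, the net effect is the same: the current term vector, and any other term vector identical to or containing the target, are removed together with their records, and the target is linked to the updated record. The sketch relies on that observation. `merge_target` is a hypothetical name, "containing" is assumed to mean "a superset of the target's words", and returning the centres unchanged when the target is empty is an assumption the text does not address.

```python
def merge_target(centres, current_vector, target):
    """Merge step of S103 on a list of (term_vector, record) pairs.

    Each pair links a term vector in the first list to its record in the second list.
    current_vector is the first or second term vector whose record was just updated.
    """
    current_vector, target = tuple(current_vector), tuple(target)
    if not target:
        return centres  # assumption: no word passed the first threshold, nothing to merge
    current_record = next(rec for vec, rec in centres if vec == current_vector)
    kept = [
        (vec, rec)
        for vec, rec in centres
        # Drop the current vector, plus any other vector identical to or containing
        # the target (their records in the second list go with them).
        if vec != current_vector and not set(vec) >= set(target)
    ]
    # The target (equivalently, the identical vector) now corresponds to the updated record.
    return kept + [(target, current_record)]


fast = ("Fast & Furious", "lead actor", "car accident")
flower = ("The Journey of Flower", "celebration party", "held", "Zhao Liying")
korea4 = ("South Korea", "Park Geun-hye", "President", "Philosophy")
korea3 = ("South Korea", "Park Geun-hye", "Philosophy")
centres = [
    (fast, (15, {"Fast & Furious": 10, "lead actor": 12, "car accident": 9, "Paul Walker": 3})),
    (flower, (23, {"The Journey of Flower": 20, "celebration party": 21, "held": 15, "Zhao Liying": 8})),
    (korea4, (13, {"South Korea": 9, "Park Geun-hye": 10, "President": 7, "Philosophy": 8, "Life": 1})),
    (korea3, (10, {"South Korea": 8, "Park Geun-hye": 10, "Philosophy": 8})),
]
merged = merge_target(centres, korea4, korea3)  # identical-vector case from this example
```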
Now suppose instead that the original cluster centre is:
First list:
[Fast & Furious; lead actor; car accident]
[The Journey of Flower; celebration party; held; Zhao Liying]
[South Korea; Park Geun-hye; President; Philosophy]
[South Korea; Park Geun-hye; Philosophy]
Second list:
[15, (Fast & Furious: 10, lead actor: 12, car accident: 9, Paul Walker: 3)]
[23, (The Journey of Flower: 20, celebration party: 21, held: 15, Zhao Liying: 8)]
[12, (South Korea: 8, Park Geun-hye: 9, President: 6, Philosophy: 5)]
[10, (South Korea: 8, Park Geun-hye: 10, Philosophy: 8)]
First, the similarity between the second term vector [South Korea; Park Geun-hye; President; Philosophy] in the first list and the first term vector [South Korea; Park Geun-hye; President; Philosophy; Life] is determined to be 4, which is not less than the set threshold of 4, so the result of S103A is yes.
The second record [12, (South Korea: 8, Park Geun-hye: 9, President: 6, Philosophy: 5)] is then updated. The class member count is incremented from 12 to 13. "South Korea" is already present in the second record, so its frequency is incremented from 8 to 9; likewise, the frequency of "Park Geun-hye" is updated from 9 to 10, that of "President" from 6 to 7, and that of "Philosophy" from 5 to 6. "Life" is not present in the second record, so it is added with an initial frequency of 1. The updated second record is [13, (South Korea: 9, Park Geun-hye: 10, President: 7, Philosophy: 6, Life: 1)].
The new cluster centre is:
First list:
[Fast & Furious; lead actor; car accident]
[The Journey of Flower; celebration party; held; Zhao Liying]
[South Korea; Park Geun-hye; President; Philosophy]
[South Korea; Park Geun-hye; Philosophy]
Second list:
[15, (Fast & Furious: 10, lead actor: 12, car accident: 9, Paul Walker: 3)]
[23, (The Journey of Flower: 20, celebration party: 21, held: 15, Zhao Liying: 8)]
[13, (South Korea: 9, Park Geun-hye: 10, President: 7, Philosophy: 6, Life: 1)]
[10, (South Korea: 8, Park Geun-hye: 10, Philosophy: 8)]
Next, for the second record [13, (South Korea: 9, Park Geun-hye: 10, President: 7, Philosophy: 6, Life: 1)], the quotient of each word's frequency and the record's class member count is compared with the preset first threshold of 0.6: the quotient for "South Korea" is 9/13 ≈ 0.69 (greater than 0.6), for "Park Geun-hye" 10/13 ≈ 0.77 (greater than 0.6), for "President" 7/13 ≈ 0.54 (less than 0.6), for "Philosophy" 6/13 ≈ 0.46 (less than 0.6), and for "Life" 1/13 ≈ 0.08 (less than 0.6).
Based on these quotients, "South Korea" and "Park Geun-hye" are determined to be the words to be processed, and together they form the target term vector [South Korea; Park Geun-hye].
It is then found that, among the term vectors in the first list other than the second term vector [South Korea; Park Geun-hye; President; Philosophy], the term vector [South Korea; Park Geun-hye; Philosophy] contains the target term vector [South Korea; Park Geun-hye], so the result of step S103F is yes.
The second term vector [South Korea; Park Geun-hye; President; Philosophy] and the term vector containing the target term vector, [South Korea; Park Geun-hye; Philosophy], are therefore deleted from the first list; the record in the second list corresponding to the term vector containing the target term vector, [10, (South Korea: 8, Park Geun-hye: 10, Philosophy: 8)], is deleted; and the target term vector [South Korea; Park Geun-hye] is added to the first list and linked to the second record [13, (South Korea: 9, Park Geun-hye: 10, President: 7, Philosophy: 6, Life: 1)]. The updated cluster centre is:
First list:
[Fast & Furious; lead actor; car accident]
[The Journey of Flower; celebration party; held; Zhao Liying]
[South Korea; Park Geun-hye]
Second list:
[15, (Fast & Furious: 10, lead actor: 12, car accident: 9, Paul Walker: 3)]
[23, (The Journey of Flower: 20, celebration party: 21, held: 15, Zhao Liying: 8)]
[13, (South Korea: 9, Park Geun-hye: 10, President: 7, Philosophy: 6, Life: 1)]
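Putting the steps together, one full S103 pass for a single title might look like the sketch below, which reproduces this second example. The assumptions are the same as in any single-step sketch: similarity is the shared-word count with threshold 4, the first threshold is 0.6, one list of (term vector, record) pairs replaces the two lists, and records hold a `Counter`. `update_centres` is a hypothetical name, and returning early on an empty target is an assumption.

```python
from collections import Counter

SIMILARITY_THRESHOLD = 4  # "set threshold" from the worked examples
FIRST_THRESHOLD = 0.6     # "preset first threshold" from the worked examples


def update_centres(centres, first_vector):
    """One S103 pass over a list of (term_vector, (member_count, Counter)) pairs."""
    first_vector = tuple(first_vector)
    # S103A: find a second term vector sharing enough words (assumed similarity measure).
    similar = next((vec for vec, _ in centres
                    if len(set(vec) & set(first_vector)) >= SIMILARITY_THRESHOLD), None)
    if similar is None:
        # No similar vector: add the first term vector together with a new first record.
        current, record = first_vector, (1, Counter(first_vector))
        centres = centres + [(current, record)]
    else:
        # Similar vector found: the second record gains one member and the title's words.
        current = similar
        count, freqs = next(rec for vec, rec in centres if vec == similar)
        record = (count + 1, freqs + Counter(first_vector))
        centres = [(vec, record if vec == similar else rec) for vec, rec in centres]
    # Words whose frequency / member count exceeds the first threshold form the target.
    count, freqs = record
    target = tuple(w for w, f in freqs.items() if f / count > FIRST_THRESHOLD)
    if not target:
        return centres  # assumption: nothing to merge
    # Delete the current vector and any vector identical to or containing the target,
    # then link the target to the current record.
    kept = [(vec, rec) for vec, rec in centres
            if vec != current and not set(vec) >= set(target)]
    return kept + [(target, record)]


korea4 = ("South Korea", "Park Geun-hye", "President", "Philosophy")
korea3 = ("South Korea", "Park Geun-hye", "Philosophy")
centres = [
    (("Fast & Furious", "lead actor", "car accident"),
     (15, Counter({"Fast & Furious": 10, "lead actor": 12, "car accident": 9, "Paul Walker": 3}))),
    (("The Journey of Flower", "celebration party", "held", "Zhao Liying"),
     (23, Counter({"The Journey of Flower": 20, "celebration party": 21, "held": 15, "Zhao Liying": 8}))),
    (korea4, (12, Counter({"South Korea": 8, "Park Geun-hye": 9, "President": 6, "Philosophy": 5}))),
    (korea3, (10, Counter({"South Korea": 8, "Park Geun-hye": 10, "Philosophy": 8}))),
]
result = update_centres(centres, korea4 + ("Life",))
```

`Counter` addition sums the frequencies and appends unseen words such as "Life" after the existing ones, so the target keeps the record's word order.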
S104: Filter the term vectors of the updated cluster centre.
Specifically, as shown in Fig. 4, S104 may include:
S104A: For each term vector in the cluster centre, determine whether the class member count in the record corresponding to that term vector is greater than a preset second threshold; if so, execute S104B.
S104B: Calculate the inverse word frequency of each word in the term vector.
Specifically, the inverse word frequency of each word in the term vector is obtained by looking the word up in an existing inverse word frequency table.
In practical applications, each entry of the inverse word frequency table is generated from the frequency with which the word appears in all video titles on the entire video platform; the inverse word frequency of a word is inversely proportional to the frequency with which the word appears in the entire corpus.
S104C: Calculate, from the inverse word frequency of each word, the average inverse word frequency of the term vector.
S104D: Determine whether the average is less than a preset third threshold; if so, execute S104E.
S104E: Delete the term vector and the record in the second list corresponding to the term vector.
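One way to build the inverse word frequency table and compute the average used in S104B and S104C is sketched below. The text describes a word's value as inversely proportional to how often it appears in video titles; the sketch uses the common log(N/df) form, which decreases as a word becomes more frequent but is not strictly inversely proportional, so treat the formula as an assumption. The toy titles, the fallback value for unseen words, and the names `build_idf_table` and `average_idf` are likewise hypothetical.

```python
import math
from collections import Counter


def build_idf_table(segmented_titles):
    """Inverse word frequency for every word seen in the platform's video titles.

    Assumption: log(N / df), where N is the number of titles and df the number of
    titles containing the word; the value falls as the word becomes more frequent.
    """
    n = len(segmented_titles)
    df = Counter()
    for words in segmented_titles:
        df.update(set(words))  # count a word at most once per title
    return {word: math.log(n / d) for word, d in df.items()}


def average_idf(term_vector, idf_table, unseen_idf):
    """S104B/S104C: look up each word (unseen words get unseen_idf) and average."""
    return sum(idf_table.get(w, unseen_idf) for w in term_vector) / len(term_vector)


titles = [["a", "b"], ["a", "c"], ["a"], ["b"]]  # toy segmented titles
idf = build_idf_table(titles)
avg = average_idf(["a", "c"], idf, unseen_idf=math.log(len(titles)))
```

Here "a" appears in three of four titles and gets the lowest value, while "c" appears once and gets the highest, which is the ordering the filtering step depends on.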
Step S104 is described in detail below.
Specifically, in practical applications, the term vectors of the updated cluster centre may be filtered in a single batch after every 20,000 videos have been processed.
In practical applications, suppose the updated cluster centre obtained in step S103 is as follows:
First list:
[Fast & Furious; lead actor; car accident]
[The Journey of Flower; celebration party; held; Zhao Liying]
[South Korea; Park Geun-hye]
Second list:
[15, (Fast & Furious: 10, lead actor: 12, car accident: 9, Paul Walker: 3)]
[23, (The Journey of Flower: 20, celebration party: 21, held: 15, Zhao Liying: 8)]
[13, (South Korea: 9, Park Geun-hye: 10, President: 7, Philosophy: 6, Life: 1)]
For each record in the second list of the cluster centre, it is determined whether the class member count is greater than the preset second threshold, which here is 14. The records [15, (Fast & Furious: 10, lead actor: 12, car accident: 9, Paul Walker: 3)] and [23, (The Journey of Flower: 20, celebration party: 21, held: 15, Zhao Liying: 8)] have class member counts of 15 and 23, both greater than 14. After this step, the cluster centre is:
First list:
[Fast & Furious; lead actor; car accident]
[The Journey of Flower; celebration party; held; Zhao Liying]
Second list:
[15, (Fast & Furious: 10, lead actor: 12, car accident: 9, Paul Walker: 3)]
[23, (The Journey of Flower: 20, celebration party: 21, held: 15, Zhao Liying: 8)]
By looking up the existing inverse word frequency table, the inverse word frequency of each word in the term vectors [Fast & Furious; lead actor; car accident] and [The Journey of Flower; celebration party; held; Zhao Liying] is obtained, and the average for each term vector is calculated.
It is then determined whether each average is less than the preset third threshold, which here is 8.5.
Suppose the average inverse word frequency of [Fast & Furious; lead actor; car accident] is not less than 8.5, while that of [The Journey of Flower; celebration party; held; Zhao Liying] is less than 8.5. The term vector [The Journey of Flower; celebration party; held; Zhao Liying] is then deleted from the first list, and the record corresponding to it is deleted from the second list.
The filtered cluster centre is:
First list:
[Fast & Furious; lead actor; car accident]
Second list:
[15, (Fast & Furious: 10, lead actor: 12, car accident: 9, Paul Walker: 3)]
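The filtering in this example, followed by the S105 corpus update, can be sketched as below. The IDF values are hypothetical, chosen only so that the averages fall on the sides of 8.5 that the example assumes. Word frequencies in the records are left empty because S104 does not use them. Dropping records whose member count does not exceed the second threshold follows the worked example, since S104A itself only states what happens when the count is greater. `filter_centres` and `update_corpus` are hypothetical names.

```python
def filter_centres(centres, idf_table, second_threshold=14, third_threshold=8.5):
    """S104: keep term vectors with enough members and a high enough average IDF."""
    kept = []
    for vec, (count, freqs) in centres:
        if count <= second_threshold:
            continue  # following the worked example, small clusters are not carried forward
        avg = sum(idf_table[w] for w in vec) / len(vec)
        if avg < third_threshold:
            continue  # S104E: common words (e.g. a hit TV series), so delete vector and record
        kept.append((vec, (count, freqs)))
    return kept


def update_corpus(corpus, centres):
    """S105: add every word of the remaining term vectors to the corpus (a set here)."""
    for vec, _ in centres:
        corpus.update(vec)
    return corpus


# Hypothetical IDF values, chosen only to reproduce the outcome assumed in the text.
idf = {"Fast & Furious": 9.2, "lead actor": 8.6, "car accident": 9.0,
       "The Journey of Flower": 7.1, "celebration party": 8.0, "held": 6.5,
       "Zhao Liying": 9.4, "South Korea": 9.0, "Park Geun-hye": 9.9}
centres = [
    (("Fast & Furious", "lead actor", "car accident"), (15, {})),
    (("The Journey of Flower", "celebration party", "held", "Zhao Liying"), (23, {})),
    (("South Korea", "Park Geun-hye"), (13, {})),
]
filtered = filter_centres(centres, idf)
corpus = update_corpus(set(), filtered)
```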
In the prior art, a search engine typically scores videos on five dimensions (relevance, click data, video quality, freshness, and other factors) combined with fixed weights, and outputs the higher-scoring video files to the user. If the word the user wants to search for relates to an emergency event (such as news or entertainment gossip), the weight of freshness has to be increased for that word, and the search engine then outputs the results obtained with the increased freshness weight. These results include content the user expects, such as news and entertainment gossip, but also content the user does not expect, such as popular TV series.
In the embodiments provided by the present invention, the words corresponding to videos that have appeared frequently in the past (such as popular TV series) have relatively low values in the inverse word frequency table, because a word's inverse word frequency falls as its frequency rises. Using the embodiment of the present invention, term vectors whose average inverse word frequency is less than the third threshold are filtered out, so the words corresponding to these videos are not added to the emergency event corpus. When the search engine evaluates a search term, these words are therefore not judged to be search terms for an emergency event, these videos do not appear in emergency-event-oriented search results, and the search results are more reasonable.
S105: Update the emergency event corpus according to the filtered term vectors.
Specifically, the words corresponding to the filtered term vectors are added to the emergency event corpus.
In practical applications, the words "Fast & Furious", "lead actor" and "car accident" corresponding to the term vector [Fast & Furious; lead actor; car accident] obtained in step S104 are added to the emergency event corpus.
The method provided by the embodiment of the present invention updates the emergency event corpus automatically, eliminating the large amount of time and labour needed to update it manually and improving the efficiency of updating it. At the same time, the inverse word frequency table is used to refine the emergency-event-oriented search results, making them more reasonable.
Fig. 5 is a schematic flow diagram of another method for updating an emergency event corpus provided by an embodiment of the present invention. Building on the embodiment shown in Fig. 1, the embodiment shown in Fig. 5 adds S106 before S102: determine whether the time elapsed since the video corresponding to the title appeared is less than a preset first duration, and whether the length of the video is less than a preset second duration; if so, execute S102.
The following takes a video titled "South Korean President Park Geun-hye's Philosophy of Life" as an example.
First, it is determined whether the time elapsed since the video appeared is less than the preset first duration, and whether the length of the video is less than the preset second duration.
Suppose the preset first duration is 3 days and the preset second duration is 20 minutes. If the video appeared less than 3 days ago and is shorter than 20 minutes, the result of this step is yes.
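The S106 check amounts to two comparisons; a sketch with the example's 3-day and 20-minute limits follows. How the appearance time and video length are obtained is not specified, so they are passed in as plain `datetime`/`timedelta` values; `should_process` is a hypothetical name and the timestamps are illustrative only.

```python
from datetime import datetime, timedelta

FIRST_DURATION = timedelta(days=3)        # preset first duration in the example
SECOND_DURATION = timedelta(minutes=20)   # preset second duration in the example


def should_process(appeared_at, video_length, now):
    """S106: only recent, short videos go on to S102."""
    return (now - appeared_at) < FIRST_DURATION and video_length < SECOND_DURATION


# Illustrative timestamp, not taken from the text.
now = datetime(2024, 1, 10, 12, 0)
recent_short = should_process(now - timedelta(days=1), timedelta(minutes=5), now)
```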
With the embodiment shown in Fig. 5, the set of video titles to be processed is narrowed before the first term vector corresponding to a title is generated, which further reduces the time needed to update the emergency event corpus and improves update efficiency.
Corresponding to the method embodiments above, an embodiment of the present invention further discloses a device for updating an emergency event corpus.
Fig. 6 is a schematic structural diagram of a device for updating an emergency event corpus provided by an embodiment of the present invention. It may include a video title obtaining module 601, a first term vector generation module 602, a cluster centre update module 603, a term vector filtering module 604 and an emergency event corpus update module 605, where:
The video title obtaining module 601 is configured to obtain the title of a video.
The first term vector generation module 602 is configured to generate, from the title, the first term vector corresponding to the title.
In practical applications, the first term vector generation module 602 may specifically be configured to: perform word segmentation on the title to obtain at least one word corresponding to the title; filter the resulting words according to a preset filtering rule; and generate the first term vector corresponding to the title from the filtered words.
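The segmentation-and-filtering step of module 602 could look like the sketch below. The text names neither the segmenter nor the filtering rule, so the segmenter is passed in as a callable (here a hypothetical one that splits pre-segmented text on "/") and the rule is assumed to drop empty strings, stop words and repeats. `first_term_vector` is a hypothetical name.

```python
def first_term_vector(title, segment, stopwords):
    """Segment the title, apply a filtering rule, and keep the remaining words in order.

    segment: any callable returning a list of words (a stand-in for a Chinese word
    segmenter). Filtering rule (assumed): drop empty strings, stop words and repeats.
    """
    vector = []
    for word in segment(title):
        word = word.strip()
        if word and word not in stopwords and word not in vector:
            vector.append(word)
    return vector


# Pretend segmenter output, with "/" marking word boundaries.
title = "South Korea/President/Park Geun-hye/of/Philosophy/Life"
vector = first_term_vector(title, lambda s: s.split("/"), stopwords={"of"})
```

The resulting words match the first term vector used in the examples; word order is not significant for the similarity and containment tests described above.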
The cluster centre update module 603 is configured to update, according to the first term vector and a preset update rule, the cluster centre used to update the emergency event corpus.
In practical applications, the cluster centre update module 603 may specifically be configured to:
determine whether the first list of the cluster centre, which stores term vectors, contains a second term vector similar to the first term vector;
if not, add the first term vector to the first list, and add a first record corresponding to the first term vector to the second list of the cluster centre, which stores the class member count and the frequency of each word in the term vectors;
if so, update the second record in the second list corresponding to the second term vector;
for the first record or the second record, determine whether the quotient of the frequency of each word in that record and the record's class member count is greater than a preset first threshold; if so, determine that word to be a word to be processed; and generate a target term vector from the words to be processed in the first record or the second record;
determine whether the term vectors in the first list other than the first term vector or the second term vector include a term vector identical to the target term vector or a term vector containing the target term vector;
if so, delete the first term vector or the second term vector from the first list, together with any term vector containing the target term vector; delete from the second list the record corresponding to the term vector identical to the target term vector or to the term vector containing it; and either establish a correspondence between the term vector identical to the target term vector and the first or second record, or add the target term vector to the first list and establish a correspondence between the target term vector and the first or second record;
if not, delete the first term vector or the second term vector from the first list, add the target term vector to the first list, and establish a correspondence between the target term vector and the first or second record.
The term vector filtering module 604 is configured to filter the term vectors of the updated cluster centre.
In practical applications, the term vector filtering module 604 may specifically be configured to:
for each term vector in the cluster centre, determine whether the class member count in the record corresponding to the term vector is greater than a preset second threshold;
if so, calculate the inverse word frequency of each word in the term vector;
calculate, from the inverse word frequency of each word, the average inverse word frequency of the term vector;
determine whether the average is less than a preset third threshold;
if so, delete the term vector and the record in the second list corresponding to the term vector.
In practical applications, each entry of the inverse word frequency table is generated from the frequency with which the word appears in all video titles on the entire video platform; the inverse word frequency of a word is inversely proportional to the frequency with which the word appears in the entire corpus.
The emergency event corpus update module 605 is configured to update the emergency event corpus according to the filtered term vectors.
In practical applications, the emergency event corpus update module 605 may specifically be configured to add the words corresponding to the term vectors remaining after deletion to the emergency event corpus.
With the embodiment shown in Fig. 6, the emergency event corpus can be updated automatically, eliminating the large amount of time and labour needed to update it manually and improving the efficiency of updating it. At the same time, the inverse word frequency table is used to refine the emergency-event-oriented search results, making them more reasonable.
Fig. 7 is a schematic structural diagram of another device for updating an emergency event corpus provided by an embodiment of the present invention. Building on the embodiment shown in Fig. 6, the embodiment shown in Fig. 7 adds a video title judgment module 606, configured to determine whether the time elapsed since the video corresponding to the title appeared is less than a preset first duration and whether the length of the video is less than a preset second duration, and if so, to trigger the first term vector generation module 602.
With the device shown in Fig. 7, the set of video titles to be processed is narrowed before the first term vector corresponding to a title is generated, which further reduces the time needed to update the emergency event corpus and improves update efficiency.
It should be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, refer to the description of the method embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above method embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit its scope. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
1. A method for updating an emergency event corpus, characterized by comprising:
obtaining the title of a video;
generating, from the title, a first term vector corresponding to the title;
updating, according to the first term vector and a preset update rule, a cluster centre used to update the emergency event corpus;
filtering the term vectors of the updated cluster centre;
updating the emergency event corpus according to the filtered term vectors;
wherein updating, according to the first term vector and the preset update rule, the cluster centre used to update the emergency event corpus comprises:
determining whether a first list of the cluster centre, used for storing term vectors, contains a second term vector similar to the first term vector;
if not, adding the first term vector to the first list, and adding a first record corresponding to the first term vector to a second list of the cluster centre, used for storing the class member count and the frequency of each word in the term vectors;
if so, updating a second record in the second list corresponding to the second term vector;
for the first record or the second record, determining whether the quotient of the frequency of each word in the first record or the second record and the class member count in that record is greater than a preset first threshold; if so, determining that word to be a word to be processed; and generating a target term vector from all the words to be processed in the first record or the second record;
determining whether the term vectors in the first list other than the first term vector or the second term vector include a term vector identical to the target term vector or a term vector containing the target term vector;
if the term vectors in the first list other than the first term vector or the second term vector include a term vector identical to the target term vector, deleting the first term vector or the second term vector from the first list, deleting the record in the second list corresponding to the term vector identical to the target term vector, and establishing a correspondence between the term vector identical to the target term vector and the first record or the second record;
if the term vectors in the first list other than the first term vector or the second term vector include a term vector containing the target term vector, deleting the first term vector or the second term vector from the first list, deleting the term vector containing the target term vector, deleting the record in the second list corresponding to the term vector containing the target term vector, adding the target term vector to the first list, and establishing a correspondence between the target term vector and the first record or the second record;
if neither exists, deleting the first term vector or the second term vector from the first list, adding the target term vector to the first list, and establishing a correspondence between the target term vector and the first record or the second record.
2. The method according to claim 1, characterized in that, before generating the first term vector corresponding to the title from the title, the method further comprises:
determining whether the time elapsed between the appearance of the video corresponding to the title and the current time is less than a preset first duration, and whether the length of the video is less than a preset second duration;
and generating the first term vector corresponding to the title from the title comprises:
generating the first term vector corresponding to the title from the title when the time elapsed between the appearance of the video corresponding to the title and the current time is less than the preset first duration and the length of the video is less than the preset second duration.
3. The method according to claim 1 or 2, characterized in that generating the first term vector corresponding to the title from the title comprises:
performing word segmentation on the title to obtain at least one word corresponding to the title;
filtering the resulting words according to a preset filtering rule;
generating the first term vector corresponding to the title from the filtered words.
4. The method according to claim 1, characterized in that filtering the term vectors of the updated cluster centre comprises:
for each term vector in the cluster centre, determining whether the class member count in the record corresponding to the term vector is greater than a preset second threshold;
if so, calculating the inverse word frequency of each word in the term vector;
calculating, from the inverse word frequency of each word, the average inverse word frequency of the term vector;
determining whether the average is less than a preset third threshold;
if so, deleting the term vector and the record in the second list corresponding to the term vector.
5. The method according to claim 4, characterized in that updating the emergency event corpus according to the filtered term vectors comprises:
adding the words corresponding to the term vectors remaining after deletion to the emergency event corpus.
6. A device for updating an emergency event corpus, characterized in that the device comprises a video title obtaining module, a first term vector generation module, a cluster centre update module, a term vector filtering module and an emergency event corpus update module, wherein:
the video title obtaining module is configured to obtain the title of a video;
the first term vector generation module is configured to generate, from the title, a first term vector corresponding to the title;
the cluster centre update module is configured to update, according to the first term vector and a preset update rule, a cluster centre used to update the emergency event corpus;
the term vector filtering module is configured to filter the term vectors of the updated cluster centre;
the emergency event corpus update module is configured to update the emergency event corpus according to the filtered term vectors;
The cluster centre update module, is specifically used for:
Judge in the first list in the cluster centre for storing term vector with the presence or absence of similar to first term vector
The second term vector;
If it does not, by first term vector addition in the first list, and for depositing in the cluster centre
Store up addition in the second list of the frequency of class members's number and the participle in term vector corresponding with first term vector the
One record;
If it does, updating the second record in the corresponding second list of second term vector;
For first record or second record, each of first record or second record point are judged
Whether the quotient of the frequency of word and first record or class members's number in second record is greater than preset first threshold value;Such as
Fruit is that the participle is determined as participle to be processed;It is to be handled according to the institute in first record or second record
Participle generates target term vector;
Judge to whether there is in the term vector in the first list in addition to first term vector or second term vector
Term vector identical with the target term vector or the term vector comprising the target term vector;
If existed and institute in the term vector in the first list in addition to first term vector or second term vector
State the identical term vector of target term vector, by the first list first term vector or second term vector delete
It removes, by the corresponding record deletion of term vector identical with the target term vector in the second list;And establish with it is described
The corresponding relationship of the identical term vector of target term vector and first record or second record;
If, among the term vectors in the first list other than the first term vector or the second term vector, there is a term vector containing the target term vector, delete the first term vector or the second term vector from the first list and delete the term vector containing the target term vector; delete from the second list the record corresponding to the term vector containing the target term vector; add the target term vector to the first list, and establish a correspondence between the target term vector and the first record or the second record;
If no term vector is identical to or contains the target term vector, delete the first term vector or the second term vector from the first list, add the target term vector to the first list, and establish a correspondence between the target term vector and the first record or the second record.
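The cluster-centre update rule of claim 6 can be sketched in Python as follows. This is a minimal sketch under assumptions the claims do not state: the first list, the second list and their correspondence are modelled as one dictionary mapping each term vector (a `frozenset` of segmented words) to its record; "similar" is taken to mean a Jaccard similarity of at least `sim_threshold`; updating a record is assumed to mean incrementing its class member count and adding the new title's word frequencies; and if no word passes the first threshold the centre is left unchanged. The names `Record`, `jaccard`, `update_centre`, `sim_threshold` and `t1` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    """Entry of the second list: class member count plus per-word frequencies."""
    members: int = 0
    freq: Counter = field(default_factory=Counter)


def jaccard(a, b):
    """Assumed similarity measure between two term vectors (sets of words)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def update_centre(centre, vec, sim_threshold=0.5, t1=0.5):
    """Apply one title's term vector `vec` (a frozenset) to `centre`.

    `centre` maps each term vector (first list) to its record (second list).
    """
    # Look for an existing term vector similar to the new one.
    similar = next((v for v in centre if jaccard(v, vec) >= sim_threshold), None)
    if similar is None:
        current, record = vec, Record()  # first term vector and first record
        centre[current] = record
    else:
        current, record = similar, centre[similar]  # second term vector/record
    record.members += 1
    record.freq.update(vec)  # assumption: add the new title's word counts

    # Words whose frequency / member count exceeds t1 form the target vector.
    target = frozenset(
        w for w, f in record.freq.items() if f / record.members > t1
    )
    if not target:
        return centre  # assumption: nothing to consolidate

    others = [v for v in centre if v != current]
    del centre[current]  # drop the first/second term vector
    if target in others:
        # An identical vector exists: its old record is replaced by `record`.
        centre[target] = record
    else:
        for v in others:
            if target < v:  # v strictly contains the target vector
                del centre[v]  # removes the vector and its record together
        centre[target] = record
    return centre
```

Keying records by term vector keeps the two lists consistent by construction: removing a vector also removes its record, and re-mapping a vector to another record is a single assignment.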
7. The device according to claim 6, wherein the first term vector generation module is specifically configured to:
Perform word segmentation on the title to obtain at least one segmented word corresponding to the title;
Filter the obtained segmented words according to a preset filtering rule;
Generate, from the filtered segmented words, the first term vector corresponding to the title.
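The three steps of claim 7 (segment, filter, build the term vector) might look like the sketch below. The claims specify neither the segmenter nor the filtering rule, so both are hypothetical stand-ins: a regular-expression tokenizer in place of a real Chinese word segmenter (such as jieba), and a rule that drops an illustrative stop-word list, pure numbers and one-character tokens. The names `STOP_WORDS`, `segment`, `keep` and `title_to_term_vector` are assumptions.

```python
import re

# Hypothetical filtering rule: these stop words are illustrative only.
STOP_WORDS = frozenset({"the", "a", "of", "video", "latest"})


def segment(title):
    """Placeholder segmenter: lower-cased runs of word characters.

    A real system would call a Chinese word segmenter here instead.
    """
    return re.findall(r"\w+", title.lower())


def keep(word):
    """Assumed filtering rule: drop stop words, numbers and 1-char tokens."""
    return word not in STOP_WORDS and not word.isdigit() and len(word) > 1


def title_to_term_vector(title):
    """Segment the title, filter the words, and return the term vector."""
    return frozenset(w for w in segment(title) if keep(w))
```

Representing the term vector as a `frozenset` makes it hashable, so it can be used directly as a key when looking up its record.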
8. The device according to claim 6, wherein the device further comprises a judgment module;
The judgment module is configured to determine whether the time elapsed between the appearance time of the video corresponding to the title and the current time is less than a preset first duration, and whether the video length of the video is less than a preset second duration;
The first term vector generation module is specifically configured to:
When the judgment module determines that the time elapsed between the appearance time of the video corresponding to the title and the current time is less than the preset first duration and that the video length of the video is less than the preset second duration, perform word segmentation on the title to obtain at least one segmented word corresponding to the title; filter the obtained segmented words according to a preset filtering rule; and generate, from the filtered segmented words, the first term vector corresponding to the title.
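The gating condition of claim 8 reduces to two comparisons. In this sketch the function name `is_candidate` and the default durations (one day and ten minutes) are illustrative assumptions; the claims leave both preset durations unspecified.

```python
from datetime import datetime, timedelta


def is_candidate(appeared_at, video_length, now,
                 first_duration=timedelta(days=1),
                 second_duration=timedelta(minutes=10)):
    """True if the video is recent enough and short enough to be processed.

    Default durations are illustrative placeholders, not values from the patent.
    """
    return (now - appeared_at) < first_duration and video_length < second_duration


if __name__ == "__main__":
    now = datetime(2016, 6, 30, 12, 0)
    print(is_candidate(datetime(2016, 6, 30, 8, 0), timedelta(minutes=3), now))  # prints True
```

Only titles that pass this check would be segmented and turned into term vectors, which limits the clustering work to recent, short videos.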
9. The device according to claim 6, wherein the term vector filtering module is specifically configured to:
For each term vector in the cluster centre, determine whether the class member count in the record corresponding to the term vector is greater than a preset second threshold;
If so, calculate the inverse word frequency (IDF) of each segmented word in the term vector;
Calculate, from the IDF of each segmented word, the average IDF of the term vector;
Determine whether the average value is less than a preset third threshold;
If so, delete the term vector and the record in the second list corresponding to the term vector.
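The filtering step of claim 9 can be sketched as below. The claims do not say which collection the inverse word frequency is computed over or which formula is used, so this sketch assumes the standard IDF, log(N / df), over a hypothetical reference collection of titles (with df floored at 1 to avoid division by zero), and illustrative thresholds `t2` and `t3`. The first and second lists are modelled as one dictionary from term vector to record, so deleting a key removes the term vector and its record together. `Record`, `document_frequencies` and `filter_centre` are hypothetical names.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    members: int = 0                                # class member count
    freq: Counter = field(default_factory=Counter)  # per-word frequencies


def document_frequencies(documents):
    """Return (number of documents, Counter of documents containing each word)."""
    df = Counter()
    n = 0
    for doc in documents:
        n += 1
        df.update(set(doc))
    return n, df


def filter_centre(centre, documents, t2=3, t3=1.0):
    """Delete large clusters whose term vectors have a low average IDF.

    `centre` maps term vector -> Record; `documents` is an assumed reference
    collection of word sets used to compute IDF = log(N / df).
    """
    n, df = document_frequencies(documents)
    for vec in list(centre):  # copy the keys because we delete while looping
        if centre[vec].members <= t2 or not vec:
            continue  # not above the second threshold
        idfs = [math.log(n / max(df[w], 1)) for w in vec]
        if sum(idfs) / len(idfs) < t3:  # average IDF below the third threshold
            del centre[vec]  # removes the vector and its record
    return centre
```

A low average IDF means the cluster's words are common across many titles (for example "weather" or "today"), so such clusters are unlikely to describe a specific emergency event.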
10. The device according to claim 9, wherein the corpus update module is specifically configured to:
Add the segmented words corresponding to the term vectors remaining after the deletion to the emergency event corpus.
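Claim 10 amounts to a set union over the surviving term vectors. A minimal sketch, assuming the corpus is held in memory as a Python set of words; `update_corpus` is a hypothetical name.

```python
def update_corpus(corpus, remaining_vectors):
    """Add every segmented word of the remaining term vectors to the corpus set."""
    for vec in remaining_vectors:
        corpus.update(vec)  # set semantics: words already present are not duplicated
    return corpus
```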
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610509717.8A CN106202293B (en) | 2016-06-30 | 2016-06-30 | A kind of update method and device of emergency event corpus |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106202293A (en) | 2016-12-07 |
CN106202293B (en) | 2019-05-10 |
Family ID: 57463191
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610509717.8A Active CN106202293B (en) | 2016-06-30 | 2016-06-30 | A kind of update method and device of emergency event corpus |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106202293B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107832444B (en) * | 2017-11-21 | 2021-08-13 | 北京百度网讯科技有限公司 | Event discovery method and device based on search log |
CN110363206B (en) * | 2018-03-26 | 2023-06-27 | 阿里巴巴集团控股有限公司 | Clustering of data objects, data processing and data identification method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102968439A (en) * | 2012-10-11 | 2013-03-13 | 微梦创科网络科技(中国)有限公司 | Method and device for sending microblogs |
CN104809252A (en) * | 2015-05-20 | 2015-07-29 | 成都布林特信息技术有限公司 | Internet data extraction system |
CN104915447A (en) * | 2015-06-30 | 2015-09-16 | 北京奇艺世纪科技有限公司 | Method and device for tracing hot topics and confirming keywords |
CN105022801A (en) * | 2015-06-30 | 2015-11-04 | 北京奇艺世纪科技有限公司 | Hot video mining method and hot video mining device |
- 2016-06-30: CN201610509717.8A filed (CN106202293B, Active)
Also Published As
Publication number | Publication date |
---|---|
CN106202293A (en) | 2016-12-07 |
Similar Documents
Publication | Title |
---|---|
CN107943718B (en) | Method and device for cleaning cache file |
CN105224658B (en) | A kind of Query method in real time and system of big data |
CN109002484B (en) | Method and system for sequentially consuming data |
US20170031948A1 (en) | File synchronization method, server, and terminal |
CN111061750A (en) | Query processing method and device and computer readable storage medium |
KR101705778B1 (en) | Sliding window based frequent patterns management method for mining weighted maximal frequent patterns over data stream |
CN106469097B (en) | A kind of method and apparatus for recalling error correction candidate based on artificial intelligence |
CN105608194A (en) | Method for analyzing main characteristics in social media |
CN109947729B (en) | Real-time data analysis method and device |
CN109194711A (en) | A kind of synchronous method of organizational structure, client, server-side and medium |
CN105631749A (en) | User portrait calculation method based on statistical data |
CN105827422A (en) | Method and device for determining network element alarm correlation relation |
CN103747147A (en) | Method and equipment for updating address book |
CN106202293B (en) | A kind of update method and device of emergency event corpus |
US10250550B2 (en) | Social message monitoring method and apparatus |
CN110515895B (en) | Method and system for carrying out associated storage on data files in big data storage system |
CN110532428A (en) | Hot word configuration method, device, equipment and storage medium |
CN103064908A (en) | Method for rapidly removing repeated list through a memory |
CN106649385B (en) | Data reordering method and device based on HBase database |
CN113420026B (en) | Database table structure changing method, device, equipment and storage medium |
CN110580255A (en) | Method and system for storing and retrieving data |
CN107590233B (en) | File management method and device |
CN105512232B (en) | Data storage method and device |
CN105512230B (en) | Data storage method and device |
CN108984519B (en) | Dual-mode-based automatic event corpus construction method and device and storage medium |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
GR01 | Patent grant |