CN106202293B - A kind of update method and device of emergency event corpus - Google Patents

Info

Publication number
CN106202293B
Authority
CN
China
Prior art keywords
term vector
record
list
title
participle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610509717.8A
Other languages
Chinese (zh)
Other versions
CN106202293A (en
Inventor
叶澄灿
陈英傑
胡军
王天畅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201610509717.8A
Publication of CN106202293A
Application granted
Publication of CN106202293B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification

Abstract

An embodiment of the invention discloses an update method and device for an emergency event corpus. The method includes: obtaining the title of a video; generating, according to the title, a first term vector corresponding to the title; updating a cluster centre used for updating the emergency event corpus according to the first term vector and a preset update rule; filtering the term vectors of the updated cluster centre; and updating the emergency event corpus according to the filtered term vectors. Applying the embodiment improves the efficiency of updating the emergency event corpus and makes search results for emergency events more reasonable.

Description

A kind of update method and device of emergency event corpus
Technical field
The present invention relates to the field of resource management technology, and in particular to an update method and device for an emergency event corpus.
Background technique
As the number of videos grows rapidly and users rely more heavily on video search engines, video search has become an important way for users to obtain information. At present, search requests for newly added videos about emergency events (such as news and entertainment gossip) have become an important type of search request, and for such requests users expect to find the most recent videos.
In the prior art, a search engine usually computes a score with fixed weights over five dimensions (relevance, click data, video quality, freshness, and other aspects) and outputs the higher-scoring video files to the user. If the word the user wants to retrieve is related to an emergency event (such as news or entertainment gossip), the weight of freshness has to be increased for that word. The search engine then outputs the results found with the increased freshness weight. These results include content the user wants, such as news and entertainment gossip, but also content the user does not want, such as popular TV series, so the search results are not reasonable enough and the user experience suffers.
A search engine can judge whether a search term is directed at an emergency event by matching the relevance between the search term and an emergency event corpus. However, all current video search tools update the emergency event corpus manually. Manual updating consumes a great deal of time and manpower, so the corpus is updated inefficiently.
In addition, when retrieving emergency events, existing search methods also display frequently updated videos or popular TV series, which lowers the quality of the emergency event search results and makes them unreasonable.
Summary of the invention
The purpose of the embodiments of the present invention is to provide an update method and device for an emergency event corpus, so as to improve the efficiency of updating the emergency event corpus and make search results for emergency events more reasonable.
To achieve the above purpose, an embodiment of the invention discloses an update method for an emergency event corpus, comprising:
Obtaining the title of a video;
Generating, according to the title, a first term vector corresponding to the title;
Updating, according to the first term vector and a preset update rule, a cluster centre used for updating the emergency event corpus;
Filtering the term vectors of the updated cluster centre;
Updating the emergency event corpus according to the filtered term vectors.
Preferably, before generating the first term vector corresponding to the title according to the title, the method further includes:
Judging whether the time elapsed between the appearance of the video corresponding to the title and the current time is less than a preset first duration, and judging whether the length of the video is less than a preset second duration;
Generating the first term vector corresponding to the title according to the title then comprises:
Generating the first term vector corresponding to the title according to the title in the case where the time elapsed between the appearance of the video and the current time is less than the preset first duration and the length of the video is less than the preset second duration.
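The pre-check above can be sketched in a few lines of Python. This is only an illustration: the function name `is_candidate` and the concrete values chosen for the two preset durations are assumptions, since the text does not state them.

```python
from datetime import datetime, timedelta

# Hypothetical preset values; the description gives no concrete numbers.
MAX_AGE = timedelta(days=1)          # stands in for the "preset first duration"
MAX_LENGTH = timedelta(minutes=10)   # stands in for the "preset second duration"


def is_candidate(appeared_at: datetime, length: timedelta, now: datetime,
                 max_age: timedelta = MAX_AGE,
                 max_length: timedelta = MAX_LENGTH) -> bool:
    """Return True only if the video is both recent and short."""
    return (now - appeared_at) < max_age and length < max_length


now = datetime(2016, 7, 1, 12, 0)
recent_short = is_candidate(datetime(2016, 7, 1, 9, 0), timedelta(minutes=3), now)
old_short = is_candidate(datetime(2016, 6, 1, 9, 0), timedelta(minutes=3), now)
recent_long = is_candidate(datetime(2016, 7, 1, 9, 0), timedelta(minutes=45), now)
```

Only titles of videos that pass both conditions would go on to term vector generation.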
Preferably, generating the first term vector corresponding to the title according to the title comprises:
Performing word segmentation on the title to obtain at least one participle corresponding to the title;
Filtering the obtained word segmentation result according to a preset filtering rule;
Generating the first term vector corresponding to the title according to the filtered word segmentation result.
Preferably, updating the cluster centre used for updating the emergency event corpus according to the first term vector and the preset update rule comprises:
Judging whether a second term vector similar to the first term vector exists in a first list of the cluster centre, the first list being used for storing term vectors;
If not, adding the first term vector to the first list, and adding a first record corresponding to the first term vector to a second list of the cluster centre, the second list being used for storing the number of class members and the frequency of each participle in a term vector;
If so, updating the second record corresponding to the second term vector in the second list;
For the first record or the second record, judging whether the quotient of the frequency of each participle in the record and the number of class members in the record is greater than a preset first threshold; if so, determining that participle to be a participle to be processed; and generating a target term vector from all participles to be processed in the first record or the second record;
Judging whether, among the term vectors in the first list other than the first term vector or the second term vector, there is a term vector identical to the target term vector or a term vector containing the target term vector;
If such a term vector exists, deleting the first term vector or the second term vector from the first list and/or deleting the term vector containing the target term vector; deleting from the second list the record corresponding to the term vector identical to the target term vector or to the term vector containing the target term vector; and establishing a correspondence between the term vector identical to the target term vector and the first record or the second record, or adding the target term vector to the first list and establishing a correspondence between the target term vector and the first record or the second record;
If no such term vector exists, deleting the first term vector or the second term vector from the first list, adding the target term vector to the first list, and establishing a correspondence between the target term vector and the first record or the second record.
Preferably, filtering the term vectors of the updated cluster centre comprises:
For each term vector in the cluster centre, judging whether the number of class members in the record corresponding to the term vector is greater than a preset second threshold;
If so, calculating the inverse word frequency of each participle in the term vector;
Calculating the average inverse word frequency of the term vector from the inverse word frequencies of its participles;
Judging whether the average is less than a preset third threshold;
If so, deleting the term vector and the record corresponding to the term vector in the second list.
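The filtering steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the cluster centre is modelled as two parallel Python lists, the inverse word frequencies come from a lookup table `inverse_freq` that is assumed to cover every word, and all numeric values are invented for the example.

```python
def filter_clusters(first_list, second_list, inverse_freq, min_members, min_avg):
    """Drop large clusters whose words have a low average inverse word frequency.

    first_list[i] is a non-empty tuple of words and second_list[i] is the
    matching record (class_members, word_frequencies). The parallel-list
    layout is an assumption made for this sketch.
    """
    kept_vectors, kept_records = [], []
    for vector, record in zip(first_list, second_list):
        members = record[0]
        if members > min_members:  # preset second threshold
            avg = sum(inverse_freq[w] for w in vector) / len(vector)
            if avg < min_avg:      # preset third threshold
                continue           # delete both the term vector and its record
        kept_vectors.append(vector)
        kept_records.append(record)
    return kept_vectors, kept_records


# Hypothetical data: thresholds and inverse word frequencies are made up.
first_list = [("episode", "update"),
              ("South Korea", "Park Geun-hye", "philosophy"),
              ("traffic accident",)]
second_list = [(40, {"episode": 38, "update": 35}),
               (13, {"South Korea": 9, "Park Geun-hye": 10, "philosophy": 8}),
               (3, {"traffic accident": 3})]
inverse_freq = {"episode": 0.5, "update": 0.7, "South Korea": 3.0,
                "Park Geun-hye": 5.0, "philosophy": 4.0, "traffic accident": 0.1}

kept_vectors, kept_records = filter_clusters(
    first_list, second_list, inverse_freq, min_members=10, min_avg=2.0)
```

In this example the first cluster is large and made of very common words, so it is removed; the third cluster is too small to be checked at all and is kept.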
Preferably, updating the emergency event corpus according to the filtered term vectors comprises:
Adding the participles of the term vectors remaining after deletion to the emergency event corpus.
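As a small illustration of this last step, the corpus can be pictured as a set of words. Modelling the corpus as a Python `set`, and the name `update_corpus`, are assumptions of this sketch; the text does not say how the corpus is stored.

```python
def update_corpus(corpus: set, remaining_vectors) -> set:
    """Add every word of every remaining term vector to the corpus."""
    for vector in remaining_vectors:
        corpus.update(vector)
    return corpus


corpus = {"earthquake"}  # hypothetical existing corpus entry
update_corpus(corpus, [("South Korea", "Park Geun-hye", "philosophy")])
```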
To achieve the above purpose, an embodiment of the invention also discloses an update device for an emergency event corpus, comprising a video title obtaining module, a first term vector generation module, a cluster centre update module, a term vector filtering module and an emergency event corpus update module, wherein:
The video title obtaining module is used for obtaining the title of a video;
The first term vector generation module is used for generating, according to the title, a first term vector corresponding to the title;
The cluster centre update module is used for updating, according to the first term vector and a preset update rule, a cluster centre used for updating the emergency event corpus;
The term vector filtering module is used for filtering the term vectors of the updated cluster centre;
The emergency event corpus update module is used for updating the emergency event corpus according to the filtered term vectors.
Preferably, the first term vector generation module is specifically used for:
Performing word segmentation on the title to obtain at least one participle corresponding to the title;
Filtering the obtained participles according to a preset filtering rule;
Generating the first term vector corresponding to the title according to the filtered participles.
Preferably, the device further includes a judgment module.
The judgment module is used for judging whether the time elapsed between the appearance of the video corresponding to the title and the current time is less than a preset first duration, and judging whether the length of the video is less than a preset second duration;
The first term vector generation module is specifically used for:
In the case where the judgment module judges that the time elapsed between the appearance of the video corresponding to the title and the current time is less than the preset first duration and that the length of the video is less than the preset second duration, performing word segmentation on the title to obtain at least one participle corresponding to the title; filtering the obtained participles according to a preset filtering rule; and generating the first term vector corresponding to the title according to the filtered participles.
Preferably, the cluster centre update module is specifically used for:
Judging whether a second term vector similar to the first term vector exists in a first list of the cluster centre, the first list being used for storing term vectors;
If not, adding the first term vector to the first list, and adding a first record corresponding to the first term vector to a second list of the cluster centre, the second list being used for storing the number of class members and the frequency of each participle in a term vector;
If so, updating the second record corresponding to the second term vector in the second list;
For the first record or the second record, judging whether the quotient of the frequency of each participle in the record and the number of class members in the record is greater than a preset first threshold; if so, determining that participle to be a participle to be processed; and generating a target term vector from all participles to be processed in the first record or the second record;
Judging whether, among the term vectors in the first list other than the first term vector or the second term vector, there is a term vector identical to the target term vector or a term vector containing the target term vector;
If such a term vector exists, deleting the first term vector or the second term vector from the first list and/or deleting the term vector containing the target term vector; deleting from the second list the record corresponding to the term vector identical to the target term vector or to the term vector containing the target term vector; and establishing a correspondence between the term vector identical to the target term vector and the first record or the second record, or adding the target term vector to the first list and establishing a correspondence between the target term vector and the first record or the second record;
If no such term vector exists, deleting the first term vector or the second term vector from the first list, adding the target term vector to the first list, and establishing a correspondence between the target term vector and the first record or the second record.
Preferably, the term vector filtering module is specifically used for:
For each term vector in the cluster centre, judging whether the number of class members in the record corresponding to the term vector is greater than a preset second threshold;
If so, calculating the inverse word frequency of each participle in the term vector;
Calculating the average inverse word frequency of the term vector from the inverse word frequencies of its participles;
Judging whether the average is less than a preset third threshold;
If so, deleting the term vector and the record corresponding to the term vector in the second list.
Preferably, the emergency event corpus update module is specifically used for:
Adding the participles of the term vectors remaining after deletion to the emergency event corpus.
As can be seen from the above technical solution, the embodiments of the invention disclose an update method and device for an emergency event corpus, including: obtaining the title of a video; generating, according to the title, a first term vector corresponding to the title; updating a cluster centre used for updating the emergency event corpus according to the first term vector and a preset update rule; filtering the term vectors of the updated cluster centre; and updating the emergency event corpus according to the filtered term vectors.
With the method provided by the embodiments of the present invention, the emergency event corpus can be updated automatically. This removes the large amount of time and manpower needed to update the corpus manually and improves the efficiency of updating it. At the same time, search results for emergency events are optimised according to an inverse word frequency list, which makes them more reasonable.
Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of an update method for an emergency event corpus provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of generating the first term vector provided by an embodiment of the present invention;
Fig. 3 is a flow diagram of updating the cluster centre provided by an embodiment of the present invention;
Fig. 4 is a flow diagram of filtering the term vectors of the updated cluster centre provided by an embodiment of the present invention;
Fig. 5 is a flow diagram of another update method for an emergency event corpus provided by an embodiment of the present invention;
Fig. 6 is a structural diagram of an update device for an emergency event corpus provided by an embodiment of the present invention;
Fig. 7 is a structural diagram of another update device for an emergency event corpus provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To solve the problems of the prior art, the embodiments of the present invention provide an update method and device for an emergency event corpus. The update method is described first.
Fig. 1 is a flow diagram of an update method for an emergency event corpus provided by an embodiment of the present invention. The method comprises:
S101: obtain the title of a video.
Specifically, in practical applications, assume that the search engine obtains by searching a video titled "the philosophy life of South Korea president Park Geun-hye".
S102: generate, according to the title, the first term vector corresponding to the title.
Specifically, as shown in Fig. 2, S102 may include:
S102A: perform word segmentation on the title to obtain at least one participle (segmented word) corresponding to the title.
Specifically, in practical applications, the video title is segmented using existing word segmentation code.
S102B: filter the obtained word segmentation result according to a preset filtering rule.
Specifically, the obtained participles are filtered to remove those whose length is less than a set length, where the set length is chosen by the user.
S102C: generate the first term vector corresponding to the title according to the filtered word segmentation result.
Step S102 is described in detail below, taking the video titled "the philosophy life of South Korea president Park Geun-hye" as an example.
First, word segmentation is performed on the title "the philosophy life of South Korea president Park Geun-hye", and the word segmentation result is "South Korea; Park Geun-hye; president; 's; philosophy; life".
Then the result "South Korea; Park Geun-hye; president; 's; philosophy; life" is filtered according to the preset filtering rule.
Specifically, in practical applications, the preset filtering rule is to filter out participles whose length is less than a set value; in this step the set value is 2. The possessive particle rendered here as "'s" is a single character in the original Chinese title, so it is removed.
After filtering, the word segmentation result is "South Korea; Park Geun-hye; president; philosophy; life".
The first term vector is then generated from the filtered word segmentation result.
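The length filter in this example can be sketched as below. Two things are assumptions of the sketch rather than statements of the text: the Chinese tokens are a back-translation of the English example, and the segmentation itself is taken as already done (the description only says that existing segmentation code is used).

```python
MIN_TOKEN_LENGTH = 2  # the set value used in this step


def build_first_term_vector(tokens, min_length=MIN_TOKEN_LENGTH):
    """Keep only tokens whose length in characters is at least min_length."""
    return [t for t in tokens if len(t) >= min_length]


# Assumed segmentation of the Chinese title (back-translated from the example):
# South Korea; Park Geun-hye; president; 's; philosophy; life
tokens = ["韩国", "朴槿惠", "总统", "的", "哲学", "人生"]
first_term_vector = build_first_term_vector(tokens)
```

The single-character particle is the only token shorter than 2, so it is the only one dropped.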
S103: update the cluster centre used for updating the emergency event corpus according to the first term vector and a preset update rule.
Specifically, the cluster centre consists of a first list and a second list. The first list stores term vectors, and the second list stores the records corresponding to the term vectors in the first list. Each record in the second list includes the number of class members corresponding to its term vector and the word frequency list of the class.
Specifically, in practical applications, the word frequency list of a class includes the participles corresponding to the term vector and the word frequency corresponding to each participle.
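One way to picture this structure is the following sketch. The class names `Record` and `ClusterCentre`, and the convention that the record at index i belongs to the term vector at index i, are assumptions made for illustration; the text only says that a correspondence exists between the two lists.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    members: int                                       # number of class members
    word_freq: Counter = field(default_factory=Counter)  # word frequency list of the class


@dataclass
class ClusterCentre:
    first_list: list = field(default_factory=list)    # term vectors (tuples of words)
    second_list: list = field(default_factory=list)   # Record at the same index


centre = ClusterCentre()
centre.first_list.append(("the fast and the furious", "leading role", "traffic accident"))
centre.second_list.append(Record(15, Counter({
    "the fast and the furious": 10, "leading role": 12,
    "traffic accident": 9, "Paul Walker": 3,
})))
```

The sample values are taken from the worked example later in this description.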
Specifically, as shown in Fig. 3, S103 may include:
S103A: judge whether a second term vector similar to the first term vector exists in the first list of the cluster centre, which stores term vectors; if not, execute S103B; if so, execute S103C.
Specifically, a second term vector similar to the first term vector is a term vector stored in the first list whose degree of repetition with the first term vector is not less than a set threshold.
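A minimal sketch of this check follows. It assumes that the degree of repetition is the number of words two term vectors share, which is how the worked example later in this description appears to use it; the function names are hypothetical, and the text itself notes that any existing similarity test may be used instead.

```python
def repetition_degree(a, b):
    """Number of words that two term vectors have in common (assumed definition)."""
    return len(set(a) & set(b))


def find_similar(first_vector, first_list, threshold):
    """Return the index of the first similar term vector, or None if there is none."""
    for i, candidate in enumerate(first_list):
        if repetition_degree(first_vector, candidate) >= threshold:
            return i
    return None


first_vector = ("South Korea", "Park Geun-hye", "president", "philosophy", "life")
first_list = [
    ("the fast and the furious", "leading role", "traffic accident"),
    ("spend thousand bones", "victory meeting", "hold", "Zhao Liying"),
    ("South Korea", "Park Geun-hye", "president", "philosophy"),
    ("South Korea", "Park Geun-hye", "philosophy"),
]
match = find_similar(first_vector, first_list, threshold=4)
no_match = find_similar(first_vector, first_list[:2], threshold=4)
```

With a threshold of 4, only the four-word vector at index 2 qualifies; the three-word vector shares just 3 words.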
S103B: add the first term vector to the first list, and add a first record corresponding to the first term vector to the second list of the cluster centre, which stores the number of class members and the frequency of each participle in a term vector.
Specifically, when the first record is added, an initial value is assigned to the number of class members in the first record, the participles in the term vector are added to the word frequency list, and an initial value is assigned to the word frequency of each participle in the word frequency list.
S103C: update the second record corresponding to the second term vector in the second list.
Specifically, increase the number of class members in the second record and increase the word frequency of each participle in the term vector, where the increment equals the initial value.
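Steps S103B and S103C can be sketched with a `Counter`. The record layout `[class_members, word_frequencies]` and the function names are assumptions; the initial value of 1 matches the worked example later in this description.

```python
from collections import Counter

INITIAL = 1  # initial value, also used as the increment


def new_record(vector):
    """S103B: create a record for a term vector that starts a new class."""
    return [INITIAL, Counter({w: INITIAL for w in vector})]


def update_record(record, vector):
    """S103C: count one more class member and one more occurrence of each word."""
    record[0] += INITIAL
    for w in vector:
        record[1][w] += INITIAL  # a missing word starts from 0 in a Counter
    return record


vector = ("South Korea", "Park Geun-hye", "president", "philosophy", "life")
first_record = new_record(vector)
second_record = update_record(
    [12, Counter({"South Korea": 8, "Park Geun-hye": 9, "president": 6, "philosophy": 7})],
    vector,
)
```

Because a `Counter` returns 0 for unknown keys, a word that is new to the record (here "life") ends up with the initial value after the update.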
S103D: for the first record or the second record, judge whether the quotient of the frequency of each participle in the record and the number of class members in the record is greater than a preset first threshold; if so, execute S103E.
S103E: determine the participle to be a participle to be processed, and generate a target term vector from all participles to be processed in the first record or the second record.
S103F: judge whether, among the term vectors in the first list other than the first term vector or the second term vector, there is a term vector identical to the target term vector or a term vector containing the target term vector; if so, execute S103G; if not, execute S103H.
Specifically, a term vector is identical to the target term vector when all participles in the target term vector are the same as the participles in a term vector in the first list other than the first term vector or the second term vector. A term vector contains the target term vector when a term vector in the first list other than the first term vector or the second term vector includes, in addition to participles identical to those in the target term vector, further participles.
S103G: delete the first term vector or the second term vector from the first list and/or delete the term vector containing the target term vector; delete from the second list the record corresponding to the term vector identical to the target term vector or to the term vector containing the target term vector; and establish a correspondence between the term vector identical to the target term vector and the first record or the second record, or add the target term vector to the first list and establish a correspondence between the target term vector and the first record or the second record.
Specifically, if a term vector identical to the target term vector exists among the term vectors in the first list other than the first term vector or the second term vector, delete the first term vector or the second term vector from the first list, delete from the second list the record corresponding to the term vector identical to the target term vector, and establish a correspondence between the term vector identical to the target term vector and the first record or the second record.
Specifically, if a term vector containing the target term vector exists among the term vectors in the first list other than the first term vector or the second term vector, delete the first term vector or the second term vector from the first list and delete the term vector containing the target term vector; delete from the second list the record corresponding to the term vector containing the target term vector; add the target term vector to the first list; and establish a correspondence between the target term vector and the first record or the second record.
S103H: delete the first term vector or the second term vector from the first list, add the target term vector to the first list, and establish a correspondence between the target term vector and the first record or the second record.
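Steps S103F to S103H can be sketched as one function. Several things here are assumptions of the sketch: the cluster centre is modelled as a single list of (term vector, record) pairs rather than two lists, records are abbreviated to their class-member counts, term vectors are compared as sets of words, and the function name `reconcile` is hypothetical. The position of the resulting pair in the list carries no meaning.

```python
def reconcile(clusters, idx, target):
    """Replace the term vector at idx by the target term vector (S103F to S103H).

    clusters is a list of (vector, record) pairs; idx points at the first or
    second term vector whose record produced the target term vector.
    """
    target_set = set(target)
    record = clusters[idx][1]
    others = [c for i, c in enumerate(clusters) if i != idx]

    identical = [v for v, _ in others if set(v) == target_set]
    if identical:
        # S103G, identical case: drop the old record of the identical vector
        # and link that vector to the current record instead.
        kept = [c for c in others if set(c[0]) != target_set]
        return kept + [(identical[0], record)]

    # S103G, containing case: drop every vector that is a proper superset of
    # the target. In S103H nothing matches, so nothing is dropped here.
    kept = [c for c in others if not set(c[0]) > target_set]
    return kept + [(tuple(target), record)]


clusters = [
    (("the fast and the furious", "leading role", "traffic accident"), 15),
    (("spend thousand bones", "victory meeting", "hold", "Zhao Liying"), 23),
    (("South Korea", "Park Geun-hye", "president", "philosophy"), 13),
    (("South Korea", "Park Geun-hye", "philosophy"), 10),
]
merged = reconcile(clusters, 2, ("South Korea", "Park Geun-hye", "philosophy"))
```

In this example the three-word vector already exists, so its old record (10) is discarded and the vector is linked to the record with 13 class members.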
Step S103 is described in detail below.
It should be noted that the following examples are given only for a better understanding of the embodiments of the present invention and do not limit the invention.
Specifically, in practical applications, judging whether term vectors are similar is prior art and is not described further here.
By way of example, the following illustrates judging whether term vectors are similar according to their degree of repetition.
First, judge whether the degree of repetition between the term vector [South Korea; Park Geun-hye; president; philosophy; life] and each term vector in the first list is less than the set threshold, which is assumed to be 4 in this embodiment.
Assuming that former cluster centre are as follows:
First list:
[the fast and the furious;Leading role;Traffic accident]
[spend thousand bones;Victory meeting;It holds;Zhao Liying]
Second list:
[15, (and the fast and the furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (and spend thousand bones: 20, victory meeting: 21, it holds: 15, Zhao Liying: 8)]
Taking [23, (spend thousand bones: 20, victory meeting: 21, hold: 15, Zhao Liying: 8)] as an example, 23 is the number of class members, and the 20 in "spend thousand bones: 20" is the frequency (word frequency) of the word "spend thousand bones".
The judgment result is that the degree of repetition between every term vector in the first list and [South Korea; Park Geun-hye; president; philosophy; life] is less than 4, i.e. the result of S103A is no.
The first term vector [South Korea; Park Geun-hye; president; philosophy; life] is then added to the first list, and a corresponding first record is generated in the second list. Specifically, generating the first record includes setting the initial number of class members for the term vector [South Korea; Park Geun-hye; president; philosophy; life] to 1 and setting the word frequencies of the participles South Korea, Park Geun-hye, president, philosophy and life to 1. The generated first record corresponding to the first term vector is [1, (South Korea: 1, Park Geun-hye: 1, president: 1, philosophy: 1, life: 1)].
New cluster centre are as follows:
First list:
[the fast and the furious;Leading role;Traffic accident]
[spend thousand bones;Victory meeting;It holds;Zhao Liying]
[South Korea;Park Geun-hye;President;Philosophy;Life]
Second list:
[15, (and the fast and the furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (and spend thousand bones: 20, victory meeting: 21, it holds: 15, Zhao Liying: 8)]
[1, (and South Korea: 1, Park Geun-hye: 1, president: 1, philosophy: 1, life: 1)]
Then, for [1, (South Korea: 1, Park Geun-hye: 1, president: 1, philosophy: 1, life: 1)], the quotient of the frequency of each participle and the number of class members in the record is 1, which is greater than the preset first threshold of 0.6.
"South Korea, Park Geun-hye, president, philosophy, life" are therefore determined to be participles to be processed, and the target term vector [South Korea; Park Geun-hye; president; philosophy; life] is generated from all participles to be processed.
It is then judged that, among the term vectors in the first list other than the first term vector [South Korea; Park Geun-hye; president; philosophy; life], there is no term vector identical to the target term vector and no term vector containing the target term vector, i.e. the result of S103F is no.
The first term vector [South Korea; Park Geun-hye; president; philosophy; life] is then deleted from the first list, the target term vector is added to the first list, and a correspondence is established between the target term vector and [1, (South Korea: 1, Park Geun-hye: 1, president: 1, philosophy: 1, life: 1)]. The updated cluster centre is as follows:
First list:
[the fast and the furious; leading role; traffic accident]
[spend thousand bones; victory meeting; hold; Zhao Liying]
[South Korea; Park Geun-hye; president; philosophy; life]
Second list:
[15, (the fast and the furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (spend thousand bones: 20, victory meeting: 21, hold: 15, Zhao Liying: 8)]
[1, (South Korea: 1, Park Geun-hye: 1, president: 1, philosophy: 1, life: 1)]
Assuming that former cluster centre are as follows:
First list:
[the fast and the furious;Leading role;Traffic accident]
[spend thousand bones;Victory meeting;It holds;Zhao Liying]
[South Korea;Park Geun-hye;President;Philosophy]
[South Korea;Park Geun-hye;Philosophy]
Second list:
[15, (and the fast and the furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (and spend thousand bones: 20, victory meeting: 21, it holds: 15, Zhao Liying: 8)]
[12, (and South Korea: 8, Park Geun-hye: 9, president: 6, philosophy: 7)]
[10, (South Korea: 8, Park Geun-hye: 9, philosophy: 6)]
First, the degree of repetition between the second term vector [South Korea; Park Geun-hye; president; philosophy] in the first list and the first term vector [South Korea; Park Geun-hye; president; philosophy; life] is judged to be 4, which is not less than the set threshold of 4, i.e. the result of S103A is yes.
The second record [12, (South Korea: 8, Park Geun-hye: 9, president: 6, philosophy: 7)] is then updated. The number of class members, 12, is increased by 1 to 13. The participle South Korea is present in the second record, so its word frequency is increased by 1, from 8 to 9. Similarly, the word frequency of Park Geun-hye is updated from 9 to 10, that of president from 6 to 7, and that of philosophy from 7 to 8. The participle life is not present in the second record, so it is added to the record with an initial word frequency of 1. The updated second record is [13, (South Korea: 9, Park Geun-hye: 10, president: 7, philosophy: 8, life: 1)].
New cluster centre are as follows:
First list:
[the fast and the furious;Leading role;Traffic accident]
[spend thousand bones;Victory meeting;It holds;Zhao Liying]
[South Korea;Park Geun-hye;President;Philosophy]
[South Korea;Park Geun-hye;Philosophy]
Second list:
[15, (the fast and the furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (spend thousand bones: 20, victory meeting: 21, it holds: 15, Zhao Liying: 8)]
[13, (South Korea: 9, Park Geun-hye: 10, president: 7, philosophy: 8, life: 1)]
[10, (South Korea: 8, Park Geun-hye: 9, philosophy: 6)]
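The record update that produces the third entry of the second list above can be sketched as follows. The representation of a record as a `(members, frequencies)` tuple and the name `update_record` are assumptions made for illustration only.

```python
def update_record(record, first_vector):
    """Add one class member and count each participle of the first term vector."""
    members, freqs = record
    new_freqs = dict(freqs)  # copy so the original record is left untouched
    for participle in first_vector:
        # Existing participles are incremented; new ones start at 1.
        new_freqs[participle] = new_freqs.get(participle, 0) + 1
    return (members + 1, new_freqs)


second_record = (12, {"South Korea": 8, "Park Geun-hye": 9,
                      "president": 6, "philosophy": 7})
first_vector = ["South Korea", "Park Geun-hye", "president", "philosophy", "life"]
updated = update_record(second_record, first_vector)
print(updated)
```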
Next, for the second record [13, (South Korea: 9, Park Geun-hye: 10, president: 7, philosophy: 8, life: 1)], the quotient of the frequency of each participle in the second record and the number of class members in the second record is calculated and compared with the preset first threshold 0.6. The quotient for the participle "South Korea" is 0.69, which is greater than 0.6; the quotient for "Park Geun-hye" is 0.77, which is greater than 0.6; the quotient for "president" is 0.54, which is less than 0.6; the quotient for "philosophy" is 0.62, which is greater than 0.6; and the quotient for "life" is 0.08, which is less than 0.6.
Then, according to the quotient of each participle, the participles "South Korea", "Park Geun-hye" and "philosophy" are determined to be participles to be processed. The target term vector [South Korea;Park Geun-hye;Philosophy] is generated from all the participles to be processed.
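The quotient test and the generation of the target term vector can be sketched as follows, using the record and the first threshold 0.6 from the example. The function name `target_term_vector` and the tuple layout of the record are hypothetical.

```python
def target_term_vector(record, first_threshold=0.6):
    """Keep the participles whose frequency / class-member quotient exceeds the threshold."""
    members, freqs = record
    return [p for p, f in freqs.items() if f / members > first_threshold]


record = (13, {"South Korea": 9, "Park Geun-hye": 10, "president": 7,
               "philosophy": 8, "life": 1})
quotients = {p: round(f / record[0], 2) for p, f in record[1].items()}
target = target_term_vector(record)
print(quotients)
print(target)
```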
It is then determined that, among the term vectors in the first list other than the second term vector, the term vector [South Korea;Park Geun-hye;Philosophy] is identical to the target term vector [South Korea;Park Geun-hye;Philosophy], so the judgment result of step S103F is yes.
Then the second term vector [South Korea;Park Geun-hye;President;Philosophy] in the first list is deleted; the record [10, (South Korea: 8, Park Geun-hye: 9, philosophy: 6)] in the second list, which corresponds to the term vector identical to the target term vector, is deleted; and a correspondence is established between the term vector [South Korea;Park Geun-hye;Philosophy], which is identical to the target term vector, and the second record [13, (South Korea: 9, Park Geun-hye: 10, president: 7, philosophy: 8, life: 1)]. The updated cluster centre is as follows:
First list:
[the fast and the furious;Leading role;Traffic accident]
[spend thousand bones;Victory meeting;It holds;Zhao Liying]
[South Korea;Park Geun-hye;Philosophy]
Second list:
[15, (the fast and the furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (spend thousand bones: 20, victory meeting: 21, it holds: 15, Zhao Liying: 8)]
[13, (South Korea: 9, Park Geun-hye: 10, president: 7, philosophy: 8, life: 1)]
As a second example, assume that the original cluster centre is as follows:
First list:
[the fast and the furious;Leading role;Traffic accident]
[spend thousand bones;Victory meeting;It holds;Zhao Liying]
[South Korea;Park Geun-hye;President;Philosophy]
[South Korea;Park Geun-hye;Philosophy]
Second list:
[15, (the fast and the furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (spend thousand bones: 20, victory meeting: 21, it holds: 15, Zhao Liying: 8)]
[12, (South Korea: 8, Park Geun-hye: 9, president: 6, philosophy: 5)]
[10, (South Korea: 8, Park Geun-hye: 9, philosophy: 6)]
First, it is determined that the similarity between the second term vector [South Korea;Park Geun-hye;President;Philosophy] in the first list and the first term vector [South Korea;Park Geun-hye;President;Philosophy;Life] is 4, which is not less than the set threshold 4, so the judgment result of S103A is yes.
Next, the second record [12, (South Korea: 8, Park Geun-hye: 9, president: 6, philosophy: 5)] is updated. The number of class members, 12, is incremented by 1 to 13. The participle "South Korea" is present in the second record, so its word frequency is incremented from 8 to 9; similarly, the word frequency of "Park Geun-hye" is updated from 9 to 10, that of "president" from 6 to 7, and that of "philosophy" from 5 to 6. The participle "life" is not present in the second record, so it is added to the second record with an initial word frequency of 1. The updated second record is [13, (South Korea: 9, Park Geun-hye: 10, president: 7, philosophy: 6, life: 1)].
The new cluster centre is as follows:
First list:
[the fast and the furious;Leading role;Traffic accident]
[spend thousand bones;Victory meeting;It holds;Zhao Liying]
[South Korea;Park Geun-hye;President;Philosophy]
[South Korea;Park Geun-hye;Philosophy]
Second list:
[15, (the fast and the furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (spend thousand bones: 20, victory meeting: 21, it holds: 15, Zhao Liying: 8)]
[13, (South Korea: 9, Park Geun-hye: 10, president: 7, philosophy: 6, life: 1)]
[10, (South Korea: 8, Park Geun-hye: 9, philosophy: 6)]
Next, for the second record [13, (South Korea: 9, Park Geun-hye: 10, president: 7, philosophy: 6, life: 1)], the quotient of the frequency of each participle in the second record and the number of class members in the second record is calculated and compared with the preset first threshold 0.6. The quotient for the participle "South Korea" is 0.69, which is greater than 0.6; the quotient for "Park Geun-hye" is 0.77, which is greater than 0.6; the quotient for "president" is 0.54, which is less than 0.6; the quotient for "philosophy" is 0.46, which is less than 0.6; and the quotient for "life" is 0.08, which is less than 0.6.
Then, according to the quotient of each participle, the participles "South Korea" and "Park Geun-hye" are determined to be participles to be processed. The target term vector [South Korea;Park Geun-hye] is generated from all the participles to be processed.
It is then determined that, among the term vectors in the first list other than the second term vector [South Korea;Park Geun-hye;President;Philosophy], the term vector [South Korea;Park Geun-hye;Philosophy] contains the target term vector [South Korea;Park Geun-hye], so the judgment result of step S103F is yes.
Then the second term vector [South Korea;Park Geun-hye;President;Philosophy] and the term vector [South Korea;Park Geun-hye;Philosophy], which contains the target term vector, are deleted from the first list; the record [10, (South Korea: 8, Park Geun-hye: 9, philosophy: 6)] in the second list, which corresponds to the term vector containing the target term vector, is deleted; and the target term vector [South Korea;Park Geun-hye] is added to the first list, and a correspondence is established between the target term vector and the second record [13, (South Korea: 9, Park Geun-hye: 10, president: 7, philosophy: 6, life: 1)]. The updated cluster centre is as follows:
First list:
[the fast and the furious;Leading role;Traffic accident]
[spend thousand bones;Victory meeting;It holds;Zhao Liying]
[South Korea;Park Geun-hye]
Second list:
[15, (the fast and the furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (spend thousand bones: 20, victory meeting: 21, it holds: 15, Zhao Liying: 8)]
[13, (South Korea: 9, Park Geun-hye: 10, president: 7, philosophy: 6, life: 1)]
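The two worked examples above (an identical term vector exists, or a term vector containing the target exists) can be sketched together with the remaining case in one function. This is only a sketch under stated assumptions: the cluster centre is modelled as a dict that maps each term vector of the first list to its record in the second list, an identical vector is checked before a containing one, and the name `reconcile` is hypothetical.

```python
def reconcile(centre, matched_vector, target, record):
    """Apply the S103F outcome to a cluster centre.

    centre maps each term vector (a tuple, the "first list") to its
    record (the "second list"); this dict layout is an assumption.
    """
    result = {v: r for v, r in centre.items() if v != matched_vector}
    target_set = set(target)
    identical = next((v for v in result if set(v) == target_set), None)
    if identical is not None:
        # Keep the identical vector and re-point it at the updated record.
        result[identical] = record
        return result
    containing = next((v for v in result if target_set < set(v)), None)
    if containing is not None:
        # Drop the longer vector together with its old record.
        del result[containing]
    # In the "contains" and "neither" cases the target vector is added.
    result[tuple(target)] = record
    return result


record = (13, {"South Korea": 9, "Park Geun-hye": 10, "president": 7,
               "philosophy": 6, "life": 1})
matched = ("South Korea", "Park Geun-hye", "president", "philosophy")
centre = {
    ("the fast and the furious", "leading role", "traffic accident"):
        (15, {"the fast and the furious": 10, "leading role": 12,
              "traffic accident": 9, "Paul Walker": 3}),
    matched: record,
    ("South Korea", "Park Geun-hye", "philosophy"):
        (10, {"South Korea": 8, "Park Geun-hye": 9, "philosophy": 6}),
}
contains_case = reconcile(centre, matched, ["South Korea", "Park Geun-hye"], record)
identical_case = reconcile(centre, matched,
                           ["South Korea", "Park Geun-hye", "philosophy"], record)
print(list(contains_case))
print(list(identical_case))
```

The dict keeps the first and second lists aligned automatically, which avoids having to delete matching positions from two separate lists.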
S104: Filter the term vectors of the updated cluster centre.
Specifically, as shown in Fig. 4, S104 may include:
S104A: For each term vector in the cluster centre, judge whether the number of class members in the record corresponding to the term vector is greater than a preset second threshold; if so, execute S104B.
S104B: Calculate the inverse word frequency of each participle in the term vector.
Specifically, the inverse word frequency of each participle in the term vector is obtained by querying an existing inverse word frequency list.
In practical applications, each record in the inverse word frequency list is generated according to the frequency with which the participle appears in all video titles on the entire video platform; the inverse word frequency value of a word is inversely proportional to the frequency with which the word appears in the entire corpus.
S104C: According to the inverse word frequency of each participle, calculate the average value of the inverse word frequencies corresponding to the term vector.
S104D: Judge whether the average value is less than a preset third threshold; if so, execute S104E.
S104E: Delete the term vector and the record in the second list corresponding to the term vector.
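The inverse word frequency list and the average used in S104C can be sketched as follows. The source only states that the value is inversely related to how often a word appears in the corpus; the logarithmic formula below is the commonly used inverse document frequency and is an assumption here, as are the names `build_idf` and `mean_idf` and the toy titles.

```python
import math


def build_idf(segmented_titles):
    """Build an inverse word frequency table from already-segmented video titles."""
    total = len(segmented_titles)
    title_count = {}
    for title in segmented_titles:
        for participle in set(title):  # count each title at most once per participle
            title_count[participle] = title_count.get(participle, 0) + 1
    # Assumed formula: log(total titles / titles containing the participle).
    return {p: math.log(total / n) for p, n in title_count.items()}


def mean_idf(term_vector, idf):
    """Average inverse word frequency of the participles in one term vector (S104C)."""
    return sum(idf[p] for p in term_vector) / len(term_vector)


titles = [["drama", "episode"], ["drama", "trailer"], ["drama"], ["accident"]]
idf = build_idf(titles)
print(idf["drama"], idf["accident"])
print(mean_idf(["drama", "accident"], idf))
```

A participle that appears in many titles, such as "drama" in this toy data, receives a lower value than a rare one, which is the property the filtering step relies on.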
Step S104 is described in detail below.
Specifically, in practical applications, the term vectors of the updated cluster centre may be filtered in a single batch after 20000 videos have been processed.
In practical applications, assume that the updated cluster centre obtained in step S103 is as follows:
First list:
[the fast and the furious;Leading role;Traffic accident]
[spend thousand bones;Victory meeting;It holds;Zhao Liying]
[South Korea;Park Geun-hye]
Second list:
[15, (the fast and the furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (spend thousand bones: 20, victory meeting: 21, it holds: 15, Zhao Liying: 8)]
[13, (South Korea: 9, Park Geun-hye: 10, president: 7, philosophy: 6, life: 1)]
Judge whether the number of class members in each record in the second list of the cluster centre is greater than the preset second threshold, where the preset second threshold is 14. In the second list, the numbers of class members of [15, (the fast and the furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)] and [23, (spend thousand bones: 20, victory meeting: 21, it holds: 15, Zhao Liying: 8)] are greater than 14. After this step, the cluster centre is as follows:
First list:
[the fast and the furious;Leading role;Traffic accident]
[spend thousand bones;Victory meeting;It holds;Zhao Liying]
Second list:
[15, (the fast and the furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
[23, (spend thousand bones: 20, victory meeting: 21, it holds: 15, Zhao Liying: 8)]
By querying the existing inverse word frequency list, the inverse word frequency values of the participles in the term vectors [the fast and the furious;Leading role;Traffic accident] and [spend thousand bones;Victory meeting;It holds;Zhao Liying] are obtained.
For each term vector, judge whether the average of the inverse word frequency values obtained in the above step is less than the preset third threshold, where the third threshold is 8.5.
Assume that the average inverse word frequency value for [the fast and the furious;Leading role;Traffic accident] is not less than 8.5, while that for [spend thousand bones;Victory meeting;It holds;Zhao Liying] is less than 8.5. The term vector [spend thousand bones;Victory meeting;It holds;Zhao Liying] in the first list and the record corresponding to it in the second list are then deleted.
The filtered cluster centre is as follows:
First list:
[the fast and the furious;Leading role;Traffic accident]
Second list:
[15, (the fast and the furious: 10, leading role: 12, traffic accident: 9, Paul Walker: 3)]
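The filtering walk-through above can be sketched as follows, with the second threshold 14 and the third threshold 8.5 from the example. The individual inverse word frequency values are invented purely for illustration (the source gives none), the dict layout of the cluster centre and the name `filter_centre` are assumptions, and, as in the worked example, a term vector whose number of class members does not exceed the second threshold is simply not passed on.

```python
def filter_centre(centre, idf, second_threshold=14, third_threshold=8.5):
    """Return the term vectors (with records) that pass both S104 checks."""
    kept = {}
    for vector, (members, freqs) in centre.items():
        if members <= second_threshold:
            continue  # S104A not satisfied: the vector is not passed on
        average = sum(idf[p] for p in vector) / len(vector)  # S104B, S104C
        if average < third_threshold:
            continue  # S104D/S104E: drop the vector and its record
        kept[vector] = (members, freqs)
    return kept


centre = {
    ("the fast and the furious", "leading role", "traffic accident"):
        (15, {"the fast and the furious": 10, "leading role": 12,
              "traffic accident": 9, "Paul Walker": 3}),
    ("spend thousand bones", "victory meeting", "it holds", "Zhao Liying"):
        (23, {"spend thousand bones": 20, "victory meeting": 21,
              "it holds": 15, "Zhao Liying": 8}),
    ("South Korea", "Park Geun-hye"):
        (13, {"South Korea": 9, "Park Geun-hye": 10, "president": 7,
              "philosophy": 6, "life": 1}),
}
# Hypothetical inverse word frequency values, chosen only to reproduce the example.
idf = {
    "the fast and the furious": 9.0, "leading role": 8.6, "traffic accident": 9.1,
    "spend thousand bones": 6.0, "victory meeting": 8.0, "it holds": 7.0,
    "Zhao Liying": 6.5,
}
filtered = filter_centre(centre, idf)
print(list(filtered))
```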
In the prior art, a search engine usually scores videos by weighing five dimensions with fixed weights, namely relevance, click data, video quality, freshness and other aspects, and outputs the video files with higher scores to the user. If the participle the user wants to retrieve relates to an emergency event (such as news or entertainment gossip), the weight of freshness needs to be increased for that participle. The search engine then outputs to the user the results found after the freshness weight is increased, which include content the user wants, such as news and entertainment gossip, as well as content the user does not want, such as popular TV series.
In the embodiments provided by the present invention, the participles corresponding to videos that appeared frequently before (such as popular TV series) have relatively low inverse word frequency values in the inverse word frequency list, and the embodiments can filter out the term vectors whose average inverse word frequency value is less than the third threshold; that is, the participles corresponding to these videos are not added to the emergency event corpus. When the search engine evaluates a search term, the participles corresponding to these videos are not judged to be search terms for an emergency event, so these videos do not appear in search results for emergency events, and the search results are more reasonable.
S105: Update the emergency event corpus according to the filtered term vectors.
Specifically, the participles corresponding to the filtered term vectors are added to the emergency event corpus.
In practical applications, the participles "the fast and the furious", "leading role" and "traffic accident", which correspond to the term vector [the fast and the furious;Leading role;Traffic accident] obtained in step S104, are added to the emergency event corpus.
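A minimal sketch of S105, assuming the emergency event corpus can be modelled as a set of participles; the name `update_corpus` and the pre-existing corpus entry are hypothetical.

```python
def update_corpus(corpus, filtered_vectors):
    """Add every participle of every filtered term vector to the corpus set."""
    for vector in filtered_vectors:
        corpus.update(vector)
    return corpus


corpus = {"earthquake"}  # hypothetical existing entry
filtered_vectors = [("the fast and the furious", "leading role", "traffic accident")]
update_corpus(corpus, filtered_vectors)
print(sorted(corpus))
```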
The method provided by the embodiments of the present invention can update the emergency event corpus automatically, which saves the large amount of time and manpower needed to update the corpus manually and improves the efficiency of updating it. At the same time, search results for emergency events are optimized according to the inverse word frequency list, which makes them more reasonable.
Fig. 5 is a schematic flowchart of another method for updating an emergency event corpus provided by an embodiment of the present invention. On the basis of the embodiment shown in Fig. 1, the embodiment shown in Fig. 5 adds S106 before S102: judge whether the duration from the time of occurrence of the video corresponding to the title to the current time is less than a preset first duration, and judge whether the video length of the video is less than a preset second duration; if both conditions are met, execute S102.
A detailed description is given below, taking the video titled "the philosophy life of South Korea president Park Geun-hye" as an example.
First, judge whether the duration from the time of occurrence of the above video to the current time is less than the preset first duration, and judge whether the video length of the above video is less than the preset second duration.
Assume that the preset first duration is 3 days and the preset second duration is 20 minutes. If the duration from the time of occurrence of the above video to the current time is less than the preset first duration of 3 days, and the video length of the above video is less than the preset second duration of 20 minutes, the judgment result of this step is yes.
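The S106 check can be sketched as follows, with the first duration of 3 days and the second duration of 20 minutes from the example. The function name `passes_prefilter`, its parameters and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def passes_prefilter(published_at, video_length, now,
                     first_duration=timedelta(days=3),
                     second_duration=timedelta(minutes=20)):
    """Return True when the video is both recent enough and short enough."""
    is_recent = (now - published_at) < first_duration
    is_short = video_length < second_duration
    return is_recent and is_short


now = datetime(2016, 6, 30, 12, 0)
recent_short = passes_prefilter(datetime(2016, 6, 29, 12, 0), timedelta(minutes=5), now)
old_short = passes_prefilter(datetime(2016, 6, 20, 12, 0), timedelta(minutes=5), now)
recent_long = passes_prefilter(datetime(2016, 6, 29, 12, 0), timedelta(minutes=45), now)
print(recent_short, old_short, recent_long)
```

Passing `now` explicitly rather than reading the clock inside the function keeps the check deterministic and easy to test.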
With the embodiment shown in Fig. 5, the range of video titles to be processed is narrowed before the first term vector corresponding to a title is generated, which further reduces the time needed to update the emergency event corpus and improves the update efficiency.
Corresponding to the above method embodiments, an embodiment of the present invention also discloses a device for updating an emergency event corpus.
Fig. 6 is a schematic structural diagram of a device for updating an emergency event corpus provided by an embodiment of the present invention. The device may include a video title obtaining module 601, a first term vector generation module 602, a cluster centre update module 603, a term vector filtering module 604 and an emergency event corpus update module 605, wherein:
The video title obtaining module 601 is configured to obtain the title of a video.
The first term vector generation module 602 is configured to generate, according to the title, a first term vector corresponding to the title.
In practical applications, the first term vector generation module 602 may specifically be configured to:
Perform word segmentation on the title to obtain at least one participle corresponding to the title; filter the obtained participles according to a preset filtering rule; and generate the first term vector corresponding to the title according to the filtered participles.
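The three operations of module 602 can be sketched as follows. The word segmenter is passed in as a function because this passage does not name one, and the filtering rule shown (dropping stop words and repeated participles) is only an assumed example of a preset filtering rule. The names `build_term_vector` and `toy_segment` and the stop-word set are hypothetical.

```python
def build_term_vector(title, segment, stop_words):
    """Segment a title, filter the participles, and return the first term vector."""
    vector = []
    for participle in segment(title):
        if participle in stop_words or participle in vector:
            continue  # assumed filtering rule: drop stop words and duplicates
        vector.append(participle)
    return vector


def toy_segment(title):
    """Stand-in segmenter that returns a fixed segmentation for the example title."""
    return ["South Korea", "Park Geun-hye", "president", "of", "philosophy", "life"]


title = "the philosophy life of South Korea president Park Geun-hye"
first_vector = build_term_vector(title, toy_segment, stop_words={"of", "the"})
print(first_vector)
```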
The cluster centre update module 603 is configured to update, according to the first term vector and a preset update rule, the cluster centre used for updating the emergency event corpus.
In practical applications, the cluster centre update module 603 may specifically be configured to:
Judge whether a second term vector similar to the first term vector exists in the first list, which is the list in the cluster centre used for storing term vectors;
If not, add the first term vector to the first list, and add a first record corresponding to the first term vector to the second list, which is the list in the cluster centre used for storing the number of class members and the frequencies of the participles in the term vectors;
If so, update the second record in the second list corresponding to the second term vector;
For the first record or the second record, judge whether the quotient of the frequency of each participle in the first record or the second record and the number of class members in that record is greater than a preset first threshold; if so, determine the participle to be a participle to be processed; and generate a target term vector from all the participles to be processed in the first record or the second record;
Judge whether, among the term vectors in the first list other than the first term vector or the second term vector, there is a term vector identical to the target term vector or a term vector containing the target term vector;
If such a term vector exists, delete the first term vector or the second term vector in the first list and, where applicable, the term vector containing the target term vector; delete the record in the second list corresponding to the term vector identical to the target term vector or to the term vector containing the target term vector; and either establish a correspondence between the term vector identical to the target term vector and the first record or the second record, or add the target term vector to the first list and establish a correspondence between the target term vector and the first record or the second record;
If no such term vector exists, delete the first term vector or the second term vector in the first list; add the target term vector to the first list, and establish a correspondence between the target term vector and the first record or the second record.
The term vector filtering module 604 is configured to filter the term vectors of the updated cluster centre.
In practical applications, the term vector filtering module 604 may specifically be configured to:
For each term vector in the cluster centre, judge whether the number of class members in the record corresponding to the term vector is greater than a preset second threshold;
If so, calculate the inverse word frequency of each participle in the term vector;
According to the inverse word frequency of each participle, calculate the average value of the inverse word frequencies corresponding to the term vector;
Judge whether the average value is less than a preset third threshold;
If so, delete the term vector and the record in the second list corresponding to the term vector.
In practical applications, each record in the inverse word frequency list is generated according to the frequency with which the participle appears in all video titles on the entire video platform; the inverse word frequency value of a word is inversely proportional to the frequency with which the word appears in the entire corpus.
The emergency event corpus update module 605 is configured to update the emergency event corpus according to the filtered term vectors.
In practical applications, the emergency event corpus update module 605 may specifically be configured to add the participles corresponding to the term vectors remaining after deletion to the emergency event corpus.
With the embodiment shown in Fig. 6, the emergency event corpus can be updated automatically, which saves the large amount of time and manpower needed to update the corpus manually and improves the efficiency of updating it. At the same time, search results for emergency events are optimized according to the inverse word frequency list, which makes them more reasonable.
Fig. 7 is a schematic structural diagram of another device for updating an emergency event corpus provided by an embodiment of the present invention. On the basis of the embodiment shown in Fig. 6, the embodiment shown in Fig. 7 adds a video title judgment module 606, which is configured to judge whether the duration from the time of occurrence of the video corresponding to the title to the current time is less than a preset first duration, and to judge whether the video length of the video is less than a preset second duration; if both conditions are met, the first term vector generation module 602 is triggered.
With the device shown in Fig. 7, the range of video titles to be processed is narrowed before the first term vector corresponding to a title is generated, which further reduces the time needed to update the emergency event corpus and improves the update efficiency.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may refer to each other, and each embodiment focuses on its differences from the others. In particular, the device embodiments are described relatively briefly because they are substantially similar to the method embodiments; for related details, refer to the description of the method embodiments.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be completed by a program instructing relevant hardware. The program can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc.
The above are only preferred embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within the protection scope of the present invention.

Claims (10)

1. A method for updating an emergency event corpus, characterized by comprising:
Obtaining the title of a video;
Generating, according to the title, a first term vector corresponding to the title;
Updating, according to the first term vector and a preset update rule, a cluster centre used for updating the emergency event corpus;
Filtering the term vectors of the updated cluster centre;
Updating the emergency event corpus according to the filtered term vectors;
Wherein the updating, according to the first term vector and the preset update rule, of the cluster centre used for updating the emergency event corpus comprises:
Judging whether a second term vector similar to the first term vector exists in a first list, the first list being the list in the cluster centre used for storing term vectors;
If not, adding the first term vector to the first list, and adding a first record corresponding to the first term vector to a second list, the second list being the list in the cluster centre used for storing the number of class members and the frequencies of the participles in the term vectors;
If so, updating the second record in the second list corresponding to the second term vector;
For the first record or the second record, judging whether the quotient of the frequency of each participle in the first record or the second record and the number of class members in that record is greater than a preset first threshold; if so, determining the participle to be a participle to be processed; and generating a target term vector from all the participles to be processed in the first record or the second record;
Judging whether, among the term vectors in the first list other than the first term vector or the second term vector, there is a term vector identical to the target term vector or a term vector containing the target term vector;
If, among the term vectors in the first list other than the first term vector or the second term vector, there is a term vector identical to the target term vector: deleting the first term vector or the second term vector in the first list; deleting the record in the second list corresponding to the term vector identical to the target term vector; and establishing a correspondence between the term vector identical to the target term vector and the first record or the second record;
If, among the term vectors in the first list other than the first term vector or the second term vector, there is a term vector containing the target term vector: deleting the first term vector or the second term vector in the first list, and deleting the term vector containing the target term vector; deleting the record in the second list corresponding to the term vector containing the target term vector; adding the target term vector to the first list, and establishing a correspondence between the target term vector and the first record or the second record;
If there is neither a term vector identical to the target term vector nor a term vector containing it: deleting the first term vector or the second term vector in the first list; adding the target term vector to the first list, and establishing a correspondence between the target term vector and the first record or the second record.
2. The method according to claim 1, characterized in that, before the generating, according to the title, of the first term vector corresponding to the title, the method further comprises:
Judging whether the duration from the time of occurrence of the video corresponding to the title to the current time is less than a preset first duration, and judging whether the video length of the video is less than a preset second duration;
The generating, according to the title, of the first term vector corresponding to the title comprises:
Generating, according to the title, the first term vector corresponding to the title in the case that the duration from the time of occurrence of the video corresponding to the title to the current time is less than the preset first duration and the video length of the video is less than the preset second duration.
3. The method according to claim 1 or 2, characterized in that the generating, according to the title, of the first term vector corresponding to the title comprises:
Performing word segmentation on the title to obtain at least one participle corresponding to the title;
Filtering the obtained word segmentation result according to a preset filtering rule;
Generating the first term vector corresponding to the title according to the filtered word segmentation result.
4. The method according to claim 1, characterized in that the filtering of the term vectors of the updated cluster centre comprises:
For each term vector in the cluster centre, judging whether the number of class members in the record corresponding to the term vector is greater than a preset second threshold;
If so, calculating the inverse word frequency of each participle in the term vector;
Calculating, according to the inverse word frequency of each participle, the average value of the inverse word frequencies corresponding to the term vector;
Judging whether the average value is less than a preset third threshold;
If so, deleting the term vector and the record in the second list corresponding to the term vector.
5. The method according to claim 4, characterized in that the updating of the emergency event corpus according to the filtered term vectors comprises:
Adding the participles corresponding to the term vectors remaining after deletion to the emergency event corpus.
6. A device for updating an emergency event corpus, characterized in that the device comprises a video title obtaining module, a first term vector generation module, a cluster centre update module, a term vector filtering module and an emergency event corpus update module, wherein:
The video title obtaining module is configured to obtain the title of a video;
The first term vector generation module is configured to generate, according to the title, a first term vector corresponding to the title;
The cluster centre update module is configured to update, according to the first term vector and a preset update rule, a cluster centre used for updating the emergency event corpus;
The term vector filtering module is configured to filter the term vectors of the updated cluster centre;
The emergency event corpus update module is configured to update the emergency event corpus according to the filtered term vectors;
The cluster centre update module is specifically configured to:
Judge in the first list in the cluster centre for storing term vector with the presence or absence of similar to first term vector The second term vector;
If it does not, by first term vector addition in the first list, and for depositing in the cluster centre Store up addition in the second list of the frequency of class members's number and the participle in term vector corresponding with first term vector the One record;
If it does, updating the second record in the corresponding second list of second term vector;
For first record or second record, each of first record or second record point are judged Whether the quotient of the frequency of word and first record or class members's number in second record is greater than preset first threshold value;Such as Fruit is that the participle is determined as participle to be processed;It is to be handled according to the institute in first record or second record Participle generates target term vector;
Judge to whether there is in the term vector in the first list in addition to first term vector or second term vector Term vector identical with the target term vector or the term vector comprising the target term vector;
If existed and institute in the term vector in the first list in addition to first term vector or second term vector State the identical term vector of target term vector, by the first list first term vector or second term vector delete It removes, by the corresponding record deletion of term vector identical with the target term vector in the second list;And establish with it is described The corresponding relationship of the identical term vector of target term vector and first record or second record;
If in the term vector in the first list in addition to first term vector or second term vector exist comprising The term vector of the target term vector;By in the first list first term vector or second term vector delete, The term vector comprising the target term vector is deleted;By in the second list include the target term vector word to Measure corresponding record deletion;By the target term vector addition in the first list, and establish the target term vector with The corresponding relationship of first record or second record;
If there is no or do not include, by the first list first term vector or second term vector delete; By the target term vector addition in the first list, and establish the target term vector and it is described first record or it is described The corresponding relationship of second record.
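The cluster centre update steps above can be illustrated with a minimal sketch. It is not the patented implementation. Several details are assumptions made only for illustration: a term vector is modelled as a `frozenset` of participles, the first and second lists are merged into one dictionary (keys play the role of the first list, values the role of the second list), similarity is measured with a Jaccard ratio, and a record is updated by adding one class member and one count per participle. The names `Record`, `jaccard`, `update_cluster_centre`, `sim_threshold` and `first_threshold` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    members: int = 0                                # number of class members
    freq: Counter = field(default_factory=Counter)  # frequency of each participle


def jaccard(a, b):
    # Assumed similarity measure; the claim only requires "similar".
    return len(a & b) / len(a | b)


def update_cluster_centre(centre, first_vec, sim_threshold=0.5, first_threshold=0.5):
    """Update `centre` (term vector -> Record) with one new term vector."""
    first_vec = frozenset(first_vec)
    if not first_vec:
        return None  # assumption: an empty term vector is ignored

    # Look for a similar second term vector among the stored term vectors.
    key = next((v for v in centre if jaccard(v, first_vec) >= sim_threshold), None)
    if key is None:
        key = first_vec
        centre[key] = Record()  # plays the role of the first record
    record = centre[key]

    # Assumed record update: one more member, one more count per participle.
    record.members += 1
    record.freq.update(first_vec)

    # Participles whose frequency / members exceeds the first threshold
    # form the target term vector.
    target = frozenset(w for w, n in record.freq.items()
                       if n / record.members > first_threshold)
    if not target:
        return None  # assumption: leave the entry unchanged

    # The first or second term vector leaves the first list.
    del centre[key]
    # Any other term vector identical to or containing the target loses its
    # entry; in all three branches the target ends up linked to `record`.
    for other in list(centre):
        if target <= other:
            del centre[other]
    centre[target] = record
    return target


centre = {}
update_cluster_centre(centre, {"earthquake", "sichuan", "rescue"})
update_cluster_centre(centre, {"earthquake", "sichuan", "live"})
print(sorted(next(iter(centre))))
```

In this sketch the three branches of the claim collapse into one loop because they all end with the target term vector linked to the first or second record; a faithful implementation with two separate lists would keep the branches distinct.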
7. The device according to claim 6, characterized in that the first term vector generation module is specifically configured to:
Perform word segmentation on the title to obtain at least one participle corresponding to the title;
Filter the obtained participles according to a preset filtering rule;
Generate the first term vector corresponding to the title from the filtered participles.
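The three steps of claim 7 can be sketched as follows. This is only an illustration under stated assumptions: the tokenizer splits on non-word characters as a stand-in for a real word segmenter (a Chinese title would need one), and the "preset filtering rule" is assumed to be a stop-word list plus a minimum length. `STOP_WORDS`, `segment`, `filter_participles` and `first_term_vector` are hypothetical names.

```python
import re

# Hypothetical filtering rule: drop stop words and very short tokens.
STOP_WORDS = {"the", "a", "of", "in", "and"}


def segment(title):
    # Stand-in tokenizer: lower-case the title and split on non-word characters.
    return [w for w in re.split(r"\W+", title.lower()) if w]


def filter_participles(words, stop_words=STOP_WORDS, min_len=2):
    # Keep only participles that pass the assumed filtering rule.
    return [w for w in words if w not in stop_words and len(w) >= min_len]


def first_term_vector(title):
    # Deduplicate while keeping the order of first appearance.
    return tuple(dict.fromkeys(filter_participles(segment(title))))


print(first_term_vector("Live: the rescue of a rescue team in Sichuan"))
# ('live', 'rescue', 'team', 'sichuan')
```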
8. The device according to claim 6, characterized in that the device further comprises a judgment module,
The judgment module is configured to judge whether the time elapsed between the appearance of the video corresponding to the title and the current time is less than a preset first duration, and to judge whether the length of the video is less than a preset second duration;
The first term vector generation module is specifically configured to:
When the judgment module judges that the time elapsed between the appearance of the video corresponding to the title and the current time is less than the preset first duration, and that the length of the video is less than the preset second duration, perform word segmentation on the title to obtain at least one participle corresponding to the title; filter the obtained participles according to a preset filtering rule; and generate the first term vector corresponding to the title from the filtered participles.
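The two checks made by the judgment module can be sketched as a single predicate. The default values of 24 hours and 10 minutes are placeholders chosen for illustration; the claim only says the two durations are preset. The name `should_process` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def should_process(appeared_at, video_length, now,
                   first_duration=timedelta(hours=24),
                   second_duration=timedelta(minutes=10)):
    """Return True only for videos that are both recent and short."""
    is_recent = (now - appeared_at) < first_duration
    is_short = video_length < second_duration
    return is_recent and is_short


now = datetime(2016, 6, 30, 12, 0)
print(should_process(datetime(2016, 6, 30, 9, 0), timedelta(minutes=3), now))   # True
print(should_process(datetime(2016, 6, 27, 9, 0), timedelta(minutes=3), now))   # False
```

Only titles for which the predicate holds would go on to word segmentation, filtering and term vector generation.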
9. The device according to claim 6, characterized in that the term vector filtering module is specifically configured to:
For each term vector in the cluster centre, judge whether the number of class members in the record corresponding to the term vector is greater than a preset second threshold;
If so, calculate the inverse word frequency of each participle in the term vector;
Calculate, from the inverse word frequency of each participle, the average inverse word frequency of the term vector;
Judge whether the average is less than a preset third threshold;
If so, delete the term vector and the record in the second list corresponding to the term vector.
10. The device according to claim 9, characterized in that the corpus update module is specifically configured to:
Add the participles corresponding to the term vectors remaining after the deletion to the emergency event corpus.
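The filtering of claim 9 and the corpus update of claim 10 can be sketched together. The claims in this section do not say how the inverse word frequency is computed, so the sketch assumes the common form log(N / (1 + df)) over a hypothetical background collection of N titles, where df is the number of titles containing the participle. Following the literal wording, term vectors whose class member count does not exceed the second threshold are left untouched. All function and parameter names are hypothetical.

```python
import math


def inverse_word_frequency(word, doc_freq, total_docs):
    # Assumed formula; rarer participles get a larger value.
    return math.log(total_docs / (1 + doc_freq.get(word, 0)))


def filter_term_vectors(centre, doc_freq, total_docs,
                        second_threshold, third_threshold):
    """`centre` maps a term vector (tuple of participles) to its member count."""
    kept = {}
    for vec, members in centre.items():
        if vec and members > second_threshold:
            mean_iwf = sum(inverse_word_frequency(w, doc_freq, total_docs)
                           for w in vec) / len(vec)
            if mean_iwf < third_threshold:
                continue  # drop the term vector together with its record
        kept[vec] = members
    return kept


def update_corpus(corpus, kept):
    # Add every participle of the remaining term vectors to the corpus.
    corpus.update(w for vec in kept for w in vec)
    return corpus


doc_freq = {"video": 900, "news": 800, "earthquake": 3, "sichuan": 5}
centre = {("video", "news"): 12, ("earthquake", "sichuan"): 15, ("news",): 2}
kept = filter_term_vectors(centre, doc_freq, total_docs=1000,
                           second_threshold=5, third_threshold=1.0)
print(sorted(update_corpus(set(), kept)))
# ['earthquake', 'news', 'sichuan']
```

Here the large cluster made of very common participles ("video", "news") is removed, while the small cluster ("news",) survives only because its member count never triggers the check.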
CN201610509717.8A 2016-06-30 2016-06-30 A kind of update method and device of emergency event corpus Active CN106202293B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610509717.8A CN106202293B (en) 2016-06-30 2016-06-30 A kind of update method and device of emergency event corpus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610509717.8A CN106202293B (en) 2016-06-30 2016-06-30 A kind of update method and device of emergency event corpus

Publications (2)

Publication Number Publication Date
CN106202293A (en) 2016-12-07
CN106202293B (en) 2019-05-10

Family

ID=57463191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610509717.8A Active CN106202293B (en) 2016-06-30 2016-06-30 A kind of update method and device of emergency event corpus

Country Status (1)

Country Link
CN (1) CN106202293B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832444B (en) * 2017-11-21 2021-08-13 北京百度网讯科技有限公司 Event discovery method and device based on search log
CN110363206B (en) * 2018-03-26 2023-06-27 阿里巴巴集团控股有限公司 Clustering of data objects, data processing and data identification method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968439A (en) * 2012-10-11 2013-03-13 微梦创科网络科技(中国)有限公司 Method and device for sending microblogs
CN104809252A (en) * 2015-05-20 2015-07-29 成都布林特信息技术有限公司 Internet data extraction system
CN104915447A (en) * 2015-06-30 2015-09-16 北京奇艺世纪科技有限公司 Method and device for tracing hot topics and confirming keywords
CN105022801A (en) * 2015-06-30 2015-11-04 北京奇艺世纪科技有限公司 Hot video mining method and hot video mining device

Also Published As

Publication number Publication date
CN106202293A (en) 2016-12-07

Similar Documents

Publication Publication Date Title
CN107943718B (en) Method and device for cleaning cache file
CN105224658B (en) A kind of Query method in real time and system of big data
CN109002484B (en) Method and system for sequentially consuming data
US20170031948A1 (en) File synchronization method, server, and terminal
CN111061750A (en) Query processing method and device and computer readable storage medium
KR101705778B1 (en) Sliding window based frequent patterns management method for mining weighted maximal frequent patterns over data stream
CN106469097B (en) A kind of method and apparatus for recalling error correction candidate based on artificial intelligence
CN105608194A (en) Method for analyzing main characteristics in social media
CN109947729B (en) Real-time data analysis method and device
CN109194711A (en) A kind of synchronous method of organizational structure, client, server-side and medium
CN105631749A (en) User portrait calculation method based on statistical data
CN105827422A (en) Method and device for determining network element alarm correlation relation
CN103747147A (en) Method and equipment for updating address book
CN106202293B (en) A kind of update method and device of emergency event corpus
US10250550B2 (en) Social message monitoring method and apparatus
CN110515895B (en) Method and system for carrying out associated storage on data files in big data storage system
CN110532428A (en) Hot word configuration method, device, equipment and storage medium
CN103064908A (en) Method for rapidly removing repeated list through a memory
CN106649385B (en) Data reordering method and device based on HBase database
CN113420026B (en) Database table structure changing method, device, equipment and storage medium
CN110580255A (en) method and system for storing and retrieving data
CN107590233B (en) File management method and device
CN105512232B (en) Data storage method and device
CN105512230B (en) Data storage method and device
CN108984519B (en) Dual-mode-based automatic event corpus construction method and device and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant