CN104915447B - Method and device for hot topic tracking and keyword determination - Google Patents
Method and device for hot topic tracking and keyword determination
- Publication number
- CN104915447B CN201510372462.0A
- Authority
- CN
- China
- Prior art keywords
- topic
- talked
- much
- video
- hot spot
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/7867—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
Embodiments of the invention disclose a method and device for hot topic tracking and keyword determination. The hot topic keyword determination method obtains hotspot data from preset websites and applies word segmentation, clustering, and related processing to that data, so that hot topic keywords are determined in a timely manner. The hot topic tracking method, for each hot topic, uses the text similarity between the topic's keywords and the description information of video files to track the video files corresponding to that topic. With the technical solution provided by the embodiments, the server determines hot topic keywords and tracks hot topics automatically, which effectively avoids the lag caused by having operations staff determine hot topics manually. Even when a hot topic consists of many staged events, the server can run the solution repeatedly to update the topic's video files regularly, which saves manual operating costs.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a method and device for hot topic tracking and keyword determination.
Background
A hot topic is the issue that the public cares about most within a certain period and a certain scope. On video websites, UGC (User Generated Content) videos are mostly published by users as soon as a hot event or topic occurs, so they are timely and attract considerable user attention. However, because the UGC videos on a video website are massive in number, disorganized, and highly repetitive, it is hard for users to obtain important information in time. Many video websites therefore discover and track hot topics for such videos and aggregate the videos of the same hot topic so that users can browse them easily.
At present, video websites rely on operations staff to discover and track hot topics. The operations staff analyze description information of video files, such as titles and synopses, to determine the current hot topics, and then determine the videos corresponding to each hot topic.
Hot topics determined by operations staff analyzing the description information of video files often lag behind events. Moreover, some hot topics consist of many staged events and span a long period of time, so operations staff must follow and analyze them continuously, and the manual operating cost is high.
Summary of the invention
To solve the above problems, embodiments of the invention disclose a method and device for hot topic tracking and keyword determination. The technical solution is as follows:
A hot topic keyword determination method, applied to a server, the method comprising:
performing word segmentation on the text information of each item of hotspot data obtained from preset websites, to obtain a set of base words for each item of hotspot data;
for each item of hotspot data, determining, according to the frequency with which the base words whose attribute is named entity in the item's set of base words occur in the item's text information, the named-entity base words to be used for building the text model of that item;
building the text model of each item of hotspot data from the determined named-entity base words;
clustering all of the obtained hotspot data according to the text similarity between the text models of every two items of hotspot data, to obtain at least one cluster;
for each cluster, determining the keywords of the cluster according to the frequency with which the named-entity base words in the cluster occur in the text information of the hotspot data contained in the cluster, and taking the keywords of the cluster as the keywords of the hot topic corresponding to the cluster.
In a specific embodiment of the invention, after obtaining the set of base words for each item of hotspot data and before determining the named-entity base words to be used for building the text model of the item, the method further comprises:
performing stop-word filtering on the base words in the set of base words of each item of hotspot data.
In a specific embodiment of the invention, the method further comprises:
for each cluster, finding at least one keyword with the highest frequency among the determined keywords of the cluster;
selecting one title from the titles of the hotspot data containing the at least one keyword found, as the hot topic corresponding to the cluster.
A hot topic tracking method based on the above hot topic keyword determination method, applied to a server, the method comprising:
for each hot topic, determining the text similarity between the keywords of the hot topic and the description information of each video file;
tracking the video files corresponding to the hot topic according to the determined text similarity.
In a specific embodiment of the invention, tracking the video files corresponding to the hot topic according to the determined text similarity comprises:
determining a candidate video set for the hot topic according to whether the determined text similarity is greater than a preset first threshold;
performing video deduplication on the candidate video set of the hot topic;
determining the video files corresponding to the hot topic according to the deduplication result.
In a specific embodiment of the invention, after determining the video files corresponding to the hot topic according to the deduplication result, the method further comprises:
judging whether the number of determined video files corresponding to the hot topic is greater than a preset second threshold;
if so, performing hierarchical clustering on the determined video files corresponding to the hot topic, in turn according to the interval between the publication times of videos adjacent in publication time, until the number of resulting categories is no greater than the preset second threshold;
determining a representative video for each category according to the quality of the videos in the category;
taking the representative video of each category as an associated video of the hot topic.
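The hierarchical clustering step above can be sketched as follows. This is only one possible reading, not the patent's stated implementation: it assumes agglomerative merging of the two time-adjacent groups with the smallest publication-time gap, and it assumes each video carries a numeric `published` timestamp and a numeric `quality` score. The function names and both field names are hypothetical, and the patent does not define how quality is measured.

```python
def cluster_by_time(videos, second_threshold):
    """Merge time-adjacent groups until at most `second_threshold` remain."""
    ordered = sorted(videos, key=lambda v: v["published"])
    groups = [[v] for v in ordered]  # start with one group per video
    while len(groups) > max(second_threshold, 1):
        # Gap between the last video of a group and the first of the next.
        gaps = [
            groups[i + 1][0]["published"] - groups[i][-1]["published"]
            for i in range(len(groups) - 1)
        ]
        i = gaps.index(min(gaps))  # closest pair of adjacent groups
        groups[i:i + 2] = [groups[i] + groups[i + 1]]
    return groups


def representatives(groups):
    """Pick the highest-quality video of each group (assumed quality score)."""
    return [max(group, key=lambda v: v["quality"]) for group in groups]


videos = [
    {"id": "v1", "published": 0, "quality": 0.4},
    {"id": "v2", "published": 1, "quality": 0.9},
    {"id": "v3", "published": 10, "quality": 0.7},
    {"id": "v4", "published": 11, "quality": 0.2},
    {"id": "v5", "published": 30, "quality": 0.5},
]
groups = cluster_by_time(videos, second_threshold=3)
print([[v["id"] for v in g] for g in groups])
print([v["id"] for v in representatives(groups)])
```

When the number of videos is already no greater than the threshold, the loop does not run and every video stays in its own group, which matches the condition that clustering is triggered only above the second threshold.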
In a specific embodiment of the invention, performing video deduplication on the candidate video set of the hot topic comprises:
sorting the videos in the candidate video set by the publication time of the video files in the candidate video set of the hot topic, from earliest to latest;
judging in turn whether two video files adjacent in publication time are duplicate videos and, if so, retaining in the candidate video set of the hot topic the video file published earlier and deleting the video file published later.
In a specific embodiment of the invention, judging whether two video files adjacent in publication time are duplicate videos comprises:
calculating the text similarity of the description information of the two video files adjacent in publication time, and determining from the calculated text similarity whether the two video files are duplicate videos;
or,
calculating the visual feature similarity of the two video files adjacent in publication time, and determining from the calculated visual feature similarity whether the two video files are duplicate videos;
or,
calculating both the text similarity of the description information of the two video files adjacent in publication time and the visual feature similarity of the two video files, and determining from the calculated text similarity and visual feature similarity whether the two video files are duplicate videos.
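The three alternatives above can be illustrated with a small decision function. This is a sketch under stated assumptions: the patent does not say how the similarities are turned into a decision, so the thresholds `text_threshold` and `visual_threshold`, and the rule that both similarities must exceed their thresholds when both are supplied, are hypothetical choices.

```python
def is_duplicate(text_sim=None, visual_sim=None,
                 text_threshold=0.8, visual_threshold=0.9):
    """Decide whether two videos are duplicates from the available similarities.

    Pass text_sim only, visual_sim only, or both (the three alternatives).
    Thresholds are illustrative values, not taken from the patent.
    """
    if text_sim is None and visual_sim is None:
        raise ValueError("at least one similarity is required")
    checks = []
    if text_sim is not None:
        checks.append(text_sim > text_threshold)
    if visual_sim is not None:
        checks.append(visual_sim > visual_threshold)
    # Assumption: when both are given, both must indicate a duplicate.
    return all(checks)


print(is_duplicate(text_sim=0.9))
print(is_duplicate(text_sim=0.9, visual_sim=0.5))
```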
A hot topic keyword determination device, applied to a server, the device comprising:
a base word set obtaining module, configured to perform word segmentation on the text information of each item of hotspot data obtained from preset websites, to obtain a set of base words for each item of hotspot data;
a named-entity base word determining module, configured to determine, for each item of hotspot data, according to the frequency with which the base words whose attribute is named entity in the item's set of base words occur in the item's text information, the named-entity base words to be used for building the text model of that item;
a text model building module, configured to build the text model of each item of hotspot data from the determined named-entity base words;
a hotspot data clustering module, configured to cluster all of the obtained hotspot data according to the text similarity between the text models of every two items of hotspot data, to obtain at least one cluster;
a hot topic keyword determining module, configured to determine, for each cluster, the keywords of the cluster according to the frequency with which the named-entity base words in the cluster occur in the text information of the hotspot data contained in the cluster, and to take the keywords of the cluster as the keywords of the hot topic corresponding to the cluster.
In a specific embodiment of the invention, the base word set obtaining module is further configured to:
perform stop-word filtering on the base words in the set of base words of each item of hotspot data.
In a specific embodiment of the invention, the device further comprises:
a hot topic title determining module, configured to find, for each cluster, at least one keyword with the highest frequency among the determined keywords of the cluster, and to select one title from the titles of the hotspot data containing the at least one keyword found, as the hot topic corresponding to the cluster.
A hot topic tracking device based on the above hot topic keyword determination device, applied to a server, the device comprising:
a text similarity determining module, configured to determine, for each hot topic, the text similarity between the keywords of the hot topic and the description information of each video file;
a video file tracking module, configured to track the video files corresponding to the hot topic according to the determined text similarity.
In a specific embodiment of the invention, the video file tracking module comprises:
a candidate video set determining submodule, configured to determine a candidate video set for the hot topic according to whether the determined text similarity is greater than a preset first threshold;
a deduplication submodule, configured to perform video deduplication on the candidate video set of the hot topic;
a video file determining submodule, configured to determine the video files corresponding to the hot topic according to the deduplication result.
In a specific embodiment of the invention, the video file tracking module further comprises:
a judging submodule, configured to judge whether the number of determined video files corresponding to the hot topic is greater than a preset second threshold and, if so, to trigger a clustering submodule;
the clustering submodule, configured to perform hierarchical clustering on the determined video files corresponding to the hot topic, in turn according to the interval between the publication times of videos adjacent in publication time, until the number of resulting categories is no greater than the preset second threshold;
a representative video determining submodule, configured to determine a representative video for each category according to the quality of the videos in the category;
an associated video determining submodule, configured to take the representative video of each category as an associated video of the hot topic.
In a specific embodiment of the invention, the deduplication submodule comprises:
a video sorting unit, configured to sort the videos in the candidate video set by the publication time of the video files in the candidate video set of the hot topic, from earliest to latest;
a duplicate video judging unit, configured to judge in turn whether two video files adjacent in publication time are duplicate videos and, if so, to trigger a deduplication unit;
the deduplication unit, configured to retain in the candidate video set of the hot topic the video file published earlier and to delete the video file published later.
In a specific embodiment of the invention, the duplicate video judging unit is specifically configured to:
calculate the text similarity of the description information of the two video files adjacent in publication time, and determine from the calculated text similarity whether the two video files are duplicate videos;
or,
calculate the visual feature similarity of the two video files adjacent in publication time, and determine from the calculated visual feature similarity whether the two video files are duplicate videos;
or,
calculate both the text similarity of the description information of the two video files adjacent in publication time and the visual feature similarity of the two video files, and determine from the calculated text similarity and visual feature similarity whether the two video files are duplicate videos.
With the technical solution provided by the embodiments of the invention, hotspot data is obtained from preset websites and processed by word segmentation, clustering, and so on, so that hot topics and their keywords are determined in a timely manner. For each hot topic, the video files corresponding to the topic are tracked using the text similarity between the topic's keywords and the description information of video files. The determination of hot topic keywords and the tracking of hot topics are thus completed automatically by the server, which effectively avoids the lag caused by having operations staff determine hot topics manually. Even when a hot topic consists of many staged events, the server can run the solution repeatedly to update the topic's video files regularly, which saves manual operating costs.
Brief description of the drawings
To explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an implementation of the hot topic keyword determination method in an embodiment of the invention;
Fig. 2 is a flowchart of an implementation of the hot topic tracking method in an embodiment of the invention;
Fig. 3 is a flowchart of another implementation of the hot topic tracking method in an embodiment of the invention;
Fig. 4 is a schematic structural diagram of a hot topic keyword determination device in an embodiment of the invention;
Fig. 5 is a schematic structural diagram of a hot topic tracking device in an embodiment of the invention.
Detailed description
To help those skilled in the art better understand the technical solutions in the embodiments of the invention, those solutions are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the invention.
Fig. 1 shows a flowchart of an implementation of a hot topic keyword determination method provided by an embodiment of the invention. The method is applied to a server and may include the following steps:
S110: Perform word segmentation on the text information of each item of hotspot data obtained from preset websites, to obtain a set of base words for each item of hotspot data.
In the Internet era, information spreads quickly, and within a given period public attention tends to be concentrated. Websites such as microblog topic lists and search trend rankings gather the hotspot data that people are following and reflect, in concentrated form, people's views on a topic or event during the current period.
In practice, the same topic may be expressed in different wording, but the underlying theme is the same. Hotspot data can be obtained, periodically or irregularly, from preset websites (such as the microblog topic lists and search trend rankings mentioned above) by a web crawler or another information collection method. Each item of hotspot data obtained may include text information such as the item's title, summary, detailed description, and/or linked content.
It can be understood that different websites may express hotspot data differently, so the obtained hotspot data can first be converted into a unified representation according to a preset format, for example: title, description, time, related text information. Because each item of hotspot data contains a large amount of text, word segmentation is performed on the text information of each item to obtain its set of base words.
In a specific embodiment of the invention, stop-word filtering can also be performed on the base words in the set of base words of each item of hotspot data. In practice, a stop-word dictionary can be set up in advance. The dictionary can contain function words such as "的", "地", and "得", and can also contain words chosen by operations staff, such as "high definition" and "low definition".
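A minimal sketch of step S110 with stop-word filtering, under stated assumptions: the patent does not name a segmenter, so `segment` is a stand-in that splits English text on non-word characters (real Chinese text would need a proper word segmenter). The `STOP_WORDS` set, the function names, and the dictionary keys for the unified format are all hypothetical.

```python
import re

# Hypothetical stop-word dictionary: function words plus operator-chosen
# words such as "hd" / "sd" (the patent's own examples are Chinese words).
STOP_WORDS = {"the", "of", "a", "hd", "sd"}


def segment(text):
    """Stand-in segmenter: lowercase, then split on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def base_words(item):
    """Return the stop-word-filtered base words of one hotspot data item."""
    # Join the text fields of the unified format: title, description, text.
    text = " ".join([item["title"], item["description"], item["related_text"]])
    return [w for w in segment(text) if w not in STOP_WORDS]


item = {
    "title": "Flood hits the river city",
    "description": "HD footage of the flood",
    "time": "2015-06-30T08:00:00",
    "related_text": "Rescue teams reach the city",
}
print(base_words(item))
```

The result keeps repeated words, so later steps can count how often each base word occurs in the item's text.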
S120: For each item of hotspot data, determine, according to the frequency with which the named-entity base words in the item's set of base words occur in the item's text information, the named-entity base words to be used for building the text model of that item.
In the embodiments of the invention, named entity recognition can be performed to determine the base words whose attribute is named entity. Such a base word may be a base word that carries semantics, such as a person name, place name, or building name, or a base word with a verb or noun part of speech. The set of base words obtained in step S110 for each item of hotspot data may include the frequency with which each base word occurs in the item's text information, so the named-entity base words used for building the item's text model can be determined from those frequencies. Alternatively, the item's text information can be scanned to obtain the frequency with which each named-entity base word occurs in it, and the named-entity base words used for building the text model can be determined from that frequency information. Note that methods for recognizing base words with the named-entity attribute belong to the prior art and are not described further here.
In practice, the named-entity base words that occur most frequently in the item's text information can be selected from the item's set of base words to characterize the item. For each item of hotspot data, the item's named-entity base words can be sorted by their frequency in the item's text information, from high to low, and either the top N or the top x% can be taken as the named-entity base words used for building the item's text model. N and x can be set and adjusted according to actual conditions; the embodiments of the invention place no limit on them.
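The top-N / top-x% selection in step S120 can be sketched as below. This assumes the named-entity words have already been identified by an external recognition step (the patent treats that as prior art); `select_model_words` and its parameters are hypothetical names, and rounding the x% cut-off up is an assumption.

```python
import math
from collections import Counter


def select_model_words(base_words, entity_words, top_n=None, top_percent=None):
    """Pick the named-entity base words used to build the text model.

    base_words: all base words of one hotspot item, with repeats.
    entity_words: words tagged as named entities (assumed NER output).
    Give exactly one of top_n (count) or top_percent (0-100).
    """
    counts = Counter(w for w in base_words if w in entity_words)
    ranked = counts.most_common()  # highest frequency first
    if top_n is None:
        top_n = math.ceil(len(ranked) * top_percent / 100)
    return ranked[:top_n]


words = ["beijing", "flood", "beijing", "rescue", "flood", "beijing", "today"]
entities = {"beijing", "flood", "rescue"}
print(select_model_words(words, entities, top_n=2))
print(select_model_words(words, entities, top_percent=50))
```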
S130: Build the text model of each item of hotspot data from the determined named-entity base words.
Step S120 determines the named-entity base words used for building the text model of each item of hotspot data, and the text model of each item can be built from them. Specifically, the text model of each item can be expressed as a vector space model (VSM) that records the frequency with which each determined named-entity base word occurs in the text information of the corresponding item. The vector space model belongs to the prior art and is not described further here.
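As an illustration of the vector space model in step S130, the sketch below stores each text model as a sparse mapping from selected word to frequency and shows how to expand it to a dense vector. The representation and the function names are assumptions made for illustration; the patent only states that the model records word frequencies.

```python
from collections import Counter


def build_text_model(text_words, model_words):
    """Sparse VSM vector: selected entity word -> frequency in the item's text."""
    counts = Counter(text_words)
    return {w: counts[w] for w in model_words if counts[w] > 0}


def to_dense(model, vocabulary):
    """Expand a sparse model to a dense vector over a fixed vocabulary order."""
    return [model.get(w, 0) for w in vocabulary]


model = build_text_model(
    ["beijing", "flood", "beijing", "today"],  # base words of one item
    ["beijing", "flood"],                      # words selected in S120
)
print(model)
print(to_dense(model, ["beijing", "flood", "rescue"]))
```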
S140: Cluster all of the obtained hotspot data according to the text similarity between the text models of every two items of hotspot data, to obtain at least one cluster.
With the text model of each item built in step S130, the text similarity between the text models of every two items of hotspot data can be calculated using the prior art, for example the commonly used cosine similarity or the Jaccard distance metric; this is not described further here.
All of the obtained hotspot data is then clustered according to the calculated text similarities, yielding at least one cluster. For example, for a given item of hotspot data, the items whose text-model similarity to it is higher than a preset threshold are placed in the same cluster as it. Computing text similarity on the text models requires less computation and is more efficient than computing it on the full text information of the hotspot data.
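The two measures mentioned above can be sketched for sparse text models (word to frequency mappings, an assumed representation). The Jaccard function returns a similarity; the corresponding Jaccard distance is one minus that value. Returning 0.0 for empty models is a choice made here, not something the patent specifies.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(v * b.get(w, 0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty model is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Shared words divided by all words; ignores frequencies."""
    words_a, words_b = set(a), set(b)
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


model_1 = {"beijing": 1, "flood": 1}
model_2 = {"beijing": 1}
print(cosine_similarity(model_1, model_2))
print(jaccard_similarity(model_1, model_2))
```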
For ease of understanding, consider a simple example. There are four items of hotspot data whose text models are A, B, C, and D. The text similarities between A, B, C, and D are shown in Table 1:
Table 1
Text model | A | B | C | D |
A | 1 | 0.8 | 0.3 | 0.5 |
B | 0.8 | 1 | 0.3 | 0.2 |
C | 0.3 | 0.3 | 1 | 0.5 |
D | 0.5 | 0.2 | 0.5 | 1 |
Suppose that two items of hotspot data are placed in the same cluster if the text similarity between their text models is higher than 0.5. Under this condition, the hotspot data corresponding to A and B form one cluster, the hotspot data corresponding to C forms one cluster, and the hotspot data corresponding to D forms one cluster. D stays on its own because its similarity to A and to C is exactly 0.5, which is not higher than the threshold.
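The Table 1 example can be reproduced with a short sketch. The patent does not fix a clustering algorithm, so this assumes one simple reading: any two items whose similarity is strictly higher than the threshold end up in the same cluster (connected components, implemented with a union-find structure). The function name and the pair-keyed `similarity` dictionary are hypothetical.

```python
def cluster_by_threshold(names, similarity, threshold):
    """Group items linked by a pairwise similarity above the threshold."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity[(a, b)] > threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())


names = ["A", "B", "C", "D"]
similarity = {  # upper triangle of Table 1
    ("A", "B"): 0.8, ("A", "C"): 0.3, ("A", "D"): 0.5,
    ("B", "C"): 0.3, ("B", "D"): 0.2, ("C", "D"): 0.5,
}
print(cluster_by_threshold(names, similarity, 0.5))
# → [['A', 'B'], ['C'], ['D']]
```

With a threshold of 0.5 only the A-B pair (0.8) is linked, which matches the three clusters described in the text.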
S150: For each cluster, determine the keywords of the cluster according to the frequency with which the named-entity base words in the cluster occur in the text information of the hotspot data contained in the cluster, and take the keywords of the cluster as the keywords of the hot topic corresponding to the cluster.
Step S140 clusters all of the obtained hotspot data into at least one cluster, and the hotspot data in each cluster can characterize one hot topic. For each cluster, the base words that occur frequently in the text information of the cluster's hotspot data (that is, those meeting a preset threshold condition) can serve as the keywords of the cluster, and these keywords can then be taken as the keywords of the hot topic corresponding to the cluster.
In one embodiment of the invention, for each cluster, at least one keyword with the highest frequency can be found among the determined keywords of the cluster, and one title can be selected from the titles of the hotspot data containing the at least one keyword found, as the hot topic corresponding to the cluster.
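Step S150 and the title selection can be sketched as follows, under stated assumptions: each item carries an `entity_words` list (its named-entity base words, with repeats) and a `title`; the threshold condition is taken to be a minimum count `min_count`; and the title chosen is simply that of the first item containing the most frequent keyword. All of these names and choices are hypothetical, since the patent leaves them open.

```python
from collections import Counter


def cluster_keywords(cluster_items, min_count):
    """Named-entity words whose total count in the cluster meets min_count."""
    counts = Counter()
    for item in cluster_items:
        counts.update(item["entity_words"])
    return {w: c for w, c in counts.items() if c >= min_count}


def cluster_title(cluster_items, keywords):
    """Title of the first item containing the most frequent keyword."""
    if not keywords:
        return None
    top_word = max(keywords, key=keywords.get)
    for item in cluster_items:
        if top_word in item["entity_words"]:
            return item["title"]
    return None


items = [
    {"title": "Flood in Beijing", "entity_words": ["beijing", "flood", "beijing"]},
    {"title": "Beijing rescue", "entity_words": ["beijing", "rescue"]},
]
keywords = cluster_keywords(items, min_count=2)
print(keywords)
print(cluster_title(items, keywords))
```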
With the technical solution provided by the embodiments of the invention, the server obtains hotspot data from preset websites and applies word segmentation, clustering, and related processing to it, so that hot topics and their keywords are determined in a timely manner. This effectively avoids the lag caused by having operations staff determine hot topics manually.
Based on the above hot topic keyword determination method, an embodiment of the invention also provides a hot topic tracking method. Referring to Fig. 2, the method is applied to a server and may include the following steps:
S210: For each hot topic, determine the text similarity between the keywords of the hot topic and the description information of each video file.
The hot topic keyword determination method above determines the current hot topics and the keywords of each hot topic. The description information of a video file on a video website can be the title or the synopsis of the video file. For a given hot topic, the higher the text similarity between the topic's keywords and the description information of a video file, the more closely the video file is related to the topic. Those skilled in the art can calculate the text similarity between the keywords of each hot topic and the description information of each video file using the prior art; this is not described further here.
S220:According to identified text similarity, the corresponding video file of the much-talked-about topic is followed the trail of.
Step S210 determines that the keyword of each much-talked-about topic is similar to the text of the description information of each video file
Degree, according to identified text similarity, can follow the trail of to obtain the corresponding video file of each much-talked-about topic.
In a specific embodiment of the present invention, step S220 may include the following steps:
S221: Determine the candidate video set of the hot topic according to whether the determined text similarity is greater than a preset first threshold.
It can be understood that the higher the text similarity between the keywords of a hot topic and the description information of a video file, the more closely the video file is related to the hot topic. In practical applications, if the determined text similarity is greater than the preset first threshold, the corresponding video file can be added to the candidate video set of the hot topic.
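Step S221 amounts to a threshold filter. A minimal sketch follows, assuming the similarities have already been computed and are held in a dictionary keyed by a hypothetical video identifier:

```python
def candidate_videos(similarities, first_threshold):
    """Return the ids of videos whose similarity exceeds the first threshold.

    similarities: dict mapping a (hypothetical) video id to its text
    similarity with the hot topic's keywords.
    """
    return [vid for vid, sim in similarities.items() if sim > first_threshold]
```

The comparison is strict, matching the wording "greater than the preset first threshold".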
S222: Perform video deduplication on the candidate video set of the hot topic.
It can be understood that the video files on a video website, especially UGC (User Generated Content) video files, are mainly uploaded by users, and different users may describe videos of the same content differently, so a video website contains many duplicate or similar video files. In practical applications, the videos in the candidate video set of each hot topic can be deduplicated.
In a specific embodiment of the present invention, step S222 may include the following steps:
Step 1: Sort the videos in the candidate video set of the hot topic by publication time, from earliest to latest;
Step 2: Judge in turn whether two video files with adjacent publication times are duplicate videos; if so, retain the video file with the earlier publication time in the candidate video set of the hot topic and delete the video file with the later publication time.
For convenience of description, the two steps above are explained together.
In the embodiments of the present invention, the time at which a user uploads a video file is the publication time of that video file. To protect the rights and interests of the user who uploaded the video file first, for duplicate videos the video file with the earlier publication time may be retained preferentially and the video file with the later publication time deleted.
In practical applications, the videos in the candidate video set of the hot topic can be sorted by publication time from earliest to latest, and deduplication is carried out by judging whether two video files with adjacent publication times are duplicate videos.
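The sort-then-compare procedure can be sketched as follows. The `Video` record and the caller-supplied `is_duplicate` predicate are hypothetical names introduced for illustration. The sketch also assumes that, once a later video has been deleted, the next video is compared against the earlier one that was kept, which is one reasonable reading of "adjacent".

```python
from dataclasses import dataclass


@dataclass
class Video:
    video_id: str
    published_at: float  # e.g. a Unix timestamp
    description: str


def deduplicate(videos, is_duplicate):
    """Keep the earliest-published video of each run of adjacent duplicates."""
    ordered = sorted(videos, key=lambda v: v.published_at)
    kept = []
    for video in ordered:
        # Compare with the most recent retained video; drop the later one.
        if kept and is_duplicate(kept[-1], video):
            continue
        kept.append(video)
    return kept
```

Any of the duplicate tests described in the text (text similarity, visual feature similarity, or both) could be passed in as `is_duplicate`.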
Whether two video files with adjacent publication times are duplicate videos can be judged in any of the following ways:
First: calculate the text similarity of the description information of the two video files with adjacent publication times, and determine from the calculated text similarity whether the two video files are duplicate videos;
As mentioned above, the text similarity of the description information of two video files can be calculated using the prior art. If the text similarity of the description information of two video files with adjacent publication times is higher than a preset threshold, the two video files can be determined to be duplicate videos.
Second: calculate the visual feature similarity of the two video files with adjacent publication times, and determine from the calculated visual feature similarity whether the two video files are duplicate videos;
On a video website, video files are usually presented to users as thumbnails so that users can browse them at a glance. The description information of video files with the same content may differ, but the visual features of their thumbnails may be identical or similar; the visual features of a thumbnail include features such as its color, texture and shape. Therefore, the visual feature similarity of the thumbnails of the two video files with adjacent publication times can be calculated, and whether the two video files are duplicate videos can be determined from it. Specifically, two video files whose visual feature similarity is higher than a preset threshold can be determined to be duplicate videos. The visual feature similarity of the thumbnails of different video files can be calculated using the prior art, for example by comparing color histograms.
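As an illustration of the color histogram comparison mentioned above, the sketch below quantizes RGB pixels into a normalized histogram and compares two histograms by histogram intersection. The pixel representation (a list of `(r, g, b)` tuples), the bin count and the choice of intersection as the measure are all assumptions for this example.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized RGB histogram of a thumbnail given as (r, g, b) tuples."""
    step = 256 / bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index, clamped for safety.
        rb = min(int(r / step), bins_per_channel - 1)
        gb = min(int(g / step), bins_per_channel - 1)
        bb = min(int(b / step), bins_per_channel - 1)
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalized histograms of equal length."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Two thumbnails would then be treated as duplicates when the intersection exceeds the preset threshold.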
In practical applications, key frame pictures can also be extracted from each of the two video files, and the visual feature similarity of the two video files calculated by comparing the visual features of their key frame pictures, so that whether the two video files are duplicate videos can be determined from the result. For example, suppose video file A and video file B are two video files with adjacent publication times. First, M key frames are extracted from video file A and N key frames from video file B, where M and N may be the same or different. The visual features of each key frame are then extracted, each key frame is represented by a high-dimensional feature vector, and the degree of matching between key frame feature vectors is calculated under temporal and spatial constraints to determine whether video file A and video file B are duplicate videos. For example, taking the key frames of video file A as the reference, the key frames of video file A are compared in order with the key frames of video file B. If the visual feature similarity between the i-th key frame of video file A and the j-th key frame of video file B is greater than a preset threshold, an initial matching point pair (i, j) is considered found. Starting from the initial matching point, the visual feature similarity of successive key frame pairs of video file A and video file B is calculated in order until the end is reached or a mismatch occurs, which yields the matching picture pairs starting at (i, j). These steps are repeated to find the global maximum set of matching picture pairs. If the ratio of the number of globally maximal matching picture pairs to the total number of pictures is higher than another preset matching threshold, the two video files can be determined to be duplicate videos.
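The key frame matching described above can be sketched as a search for the longest run of consecutively matching key frame pairs. This is a simplified reading, not the embodiment's exact procedure: the `similarity` function is supplied by the caller, only the frame order is used as the temporal constraint, and the "total number of pictures" is assumed here to be the key frame count of the shorter video, since the text does not define it.

```python
def longest_matching_run(frames_a, frames_b, similarity, frame_threshold):
    """Length of the longest run of consecutive matching key frame pairs."""
    best = 0
    for i in range(len(frames_a)):
        for j in range(len(frames_b)):
            length = 0
            # Extend from the initial pair (i, j) until the end or a mismatch.
            while (i + length < len(frames_a)
                   and j + length < len(frames_b)
                   and similarity(frames_a[i + length],
                                  frames_b[j + length]) > frame_threshold):
                length += 1
            best = max(best, length)
    return best


def is_duplicate_by_key_frames(frames_a, frames_b, similarity,
                               frame_threshold, match_threshold):
    """Duplicate if the matched share of key frames exceeds match_threshold."""
    if not frames_a or not frames_b:
        return False
    run = longest_matching_run(frames_a, frames_b, similarity, frame_threshold)
    # Assumption: the ratio is taken over the shorter video's key frames.
    return run / min(len(frames_a), len(frames_b)) > match_threshold
```

The brute-force search is quadratic in the number of key frames, which is acceptable for a sketch but would likely need optimizing for long videos.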
Of course, those skilled in the art can also use other video frames of the video files to calculate the visual feature similarity according to the prior art, which the embodiments of the present invention do not describe further.
Third: calculate both the text similarity of the description information of the two video files with adjacent publication times and the visual feature similarity of the two video files, and determine from the calculated text similarity and visual feature similarity whether the two video files are duplicate videos.
In practical applications, considering both the text similarity of the description information of two video files and their visual feature similarity makes it possible to determine more accurately whether the two video files are duplicate videos, which improves deduplication accuracy. Specifically, a preset threshold can be set for the text similarity and for the visual feature similarity respectively, and the two video files are determined to be duplicate videos when both similarities are higher than their corresponding preset thresholds. Alternatively, the text similarity and the visual feature similarity can each be assigned a weight, and the two video files are determined to be duplicate videos when the weighted sum of the two is higher than a preset threshold.
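The two combination rules can be written down directly. The function and parameter names below are illustrative, and the threshold and weight values are left to the caller because the text does not fix them.

```python
def is_duplicate_both(text_sim, visual_sim, text_threshold, visual_threshold):
    """Rule 1: both similarities must exceed their own thresholds."""
    return text_sim > text_threshold and visual_sim > visual_threshold


def is_duplicate_weighted(text_sim, visual_sim, text_weight, visual_weight,
                          threshold):
    """Rule 2: the weighted sum of the two similarities must exceed a threshold."""
    return text_weight * text_sim + visual_weight * visual_sim > threshold
```

The weighted rule lets a very high score on one measure compensate for a lower score on the other, whereas the first rule requires both to be high.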
It should be noted that any of the three methods above for judging whether two video files are duplicate videos can be chosen according to the actual situation, and the thresholds can likewise be set according to the actual situation.
S223: Determine the video files corresponding to the hot topic according to the deduplication result.
According to the deduplication result of step S222, the video files corresponding to the hot topic can be determined.
In practical applications, after deduplication the number of video files determined for the hot topic may still be large, which indicates that the hot topic is broad and may extend over a period of time. If all of them were displayed as associated videos of the hot topic, viewing would be inconvenient for users, because they would have to search repeatedly to find the video file they want to watch.
Based on this, in an embodiment of the present invention, the following steps may also be included after step S223:
First step: judge whether the number of video files determined for the hot topic is greater than a preset second threshold; if so, perform the second step.
In practical applications, for each hot topic, the maximum number of associated videos of the hot topic, i.e. the second threshold, can be preset. If the number of video files determined for the hot topic in step S223 is greater than the preset second threshold, the second step is performed. The second threshold can be set and adjusted according to the actual situation.
Second step: according to the publication time intervals between videos with adjacent publication times, perform hierarchical clustering on the video files determined for the hot topic until the number of resulting categories is not greater than the preset second threshold.
It can be understood that the publication times of different video files in the deduplicated candidate video set may be the same or different. If a hot topic extends over a period of time, the intervals between the publication times of its video files may be large. Hierarchical clustering can be performed on the video files determined for the hot topic according to the publication time intervals between videos with adjacent publication times. Each category can be regarded as an important stage in the evolution of the hot topic.
For example, suppose that for some hot topic the number of video files in its candidate video set still exceeds the preset second threshold after deduplication. These video files can be arranged in order of publication time as video file 1, video file 2, video file 3, video file 4 and video file 5, where the publication time interval between video file 1 and video file 2 is 1 hour, the interval between video file 2 and video file 3 is 3 days, the interval between video file 3 and video file 4 is 5 days, and the interval between video file 4 and video file 5 is 2 hours.
Suppose the clustering condition is "not more than 1 day", i.e. video files whose publication time interval is not more than 1 day are grouped into one category. The result is:
{video file 1, video file 2}, {video file 3}, {video file 4, video file 5}, three categories in total.
If the resulting number of categories, 3, is still greater than the preset second threshold, the clustering condition can be changed to "not more than 3 days", i.e. video files whose publication time interval is not more than 3 days are grouped into one category. The result is:
{video file 1, video file 2, video file 3}, {video file 4, video file 5}, two categories in total.
It should be noted that after hierarchical clustering is performed on these video files, the number of categories finally obtained must not be greater than the preset second threshold.
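The worked example above can be reproduced with a short sketch. Publication times are expressed in hours from the first video, and the list of progressively looser gap limits (`gap_schedule`) is a hypothetical parameter, since the text only says that the clustering condition can be changed.

```python
def cluster_by_gap(times, max_gap):
    """Group sorted publication times; start a new category when the gap
    between adjacent times exceeds max_gap."""
    clusters = []
    for t in sorted(times):
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters


def cluster_until(times, second_threshold, gap_schedule):
    """Loosen the gap limit step by step until the number of categories
    is not greater than second_threshold (or the schedule is exhausted)."""
    clusters = [[t] for t in sorted(times)]
    for max_gap in gap_schedule:
        clusters = cluster_by_gap(times, max_gap)
        if len(clusters) <= second_threshold:
            break
    return clusters


# Gaps of 1 hour, 3 days, 5 days and 2 hours, as in the example.
times_in_hours = [0, 1, 73, 193, 195]
print(cluster_by_gap(times_in_hours, 24))  # limit of 1 day
print(cluster_by_gap(times_in_hours, 72))  # limit of 3 days
```

If the schedule is exhausted before the category count drops to the second threshold, the sketch returns the last result; a real implementation would need a rule for continuing to loosen the limit.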
Third step: determine the representative video of each category according to the quality of the videos in that category.
In practical applications, after the video files in the deduplicated candidate video set have been hierarchically clustered, a category may contain multiple video files. It can be understood that the quality of different video files varies, the identity of the uploader may differ, viewers may like them to different degrees, and so on. Taking these factors into account, one representative video can be selected from the video files contained in each category.
Fourth step: determine the representative video of each category as an associated video of the hot topic.
The representative video of each category is determined as an associated video of the hot topic. When there is a display requirement, the associated videos of the hot topic are presented to the user.
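The text does not define how video quality is measured, so the sketch below takes a caller-supplied `quality` scoring function (a hypothetical parameter that could combine resolution, uploader identity, viewer ratings and so on) and picks the highest-scoring video in each category.

```python
def representative_videos(categories, quality):
    """Pick one representative per non-empty category by highest quality score.

    categories: list of lists of video records.
    quality: function mapping a video record to a comparable score.
    """
    return [max(category, key=quality) for category in categories if category]
```

The returned list is what would be shown to users as the associated videos of the hot topic, one per stage of its evolution.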
With the technical solution provided by the embodiments of the present invention, for each hot topic, the video files corresponding to the hot topic are tracked through the text similarity between the keywords of the hot topic and the description information of video files, and the process is completed automatically by the server. Even if a hot topic consists of many events that unfold in stages, the server can execute the technical solution repeatedly to update the video files corresponding to the hot topic at regular intervals, which saves manual operation cost.
Corresponding to the method embodiment shown in Fig. 1, an embodiment of the present invention further provides a hot-topic keyword determination device applied to a server. Referring to Fig. 4, the device may include the following modules:
A basic word set obtaining module 310, configured to perform word segmentation on the text information of each piece of hot-spot data obtained from a set website, to obtain the set of basic words of each piece of hot-spot data;
A named-entity basic word determining module 320, configured to, for each piece of hot-spot data, determine, in the set of basic words of that hot-spot data and according to the frequency with which basic words whose attribute is named entity occur in the text information of that hot-spot data, the basic words whose attribute is named entity that are used to establish the text model of that hot-spot data;
A text model establishing module 330, configured to establish the text model of each piece of hot-spot data according to the determined basic words whose attribute is named entity;
A hot-spot data clustering module 340, configured to cluster all the obtained hot-spot data according to the text similarity between the text models of every two pieces of hot-spot data, to obtain at least one cluster;
A hot-topic keyword determining module 350, configured to, for each cluster, determine the keywords of the cluster according to the frequency with which the basic words whose attribute is named entity in the cluster occur in the text information of the hot-spot data contained in the cluster, and determine the keywords of the cluster as the keywords of the hot topic corresponding to the cluster.
In an embodiment of the present invention, the basic word set obtaining module 310 may be further configured to:
Perform stop word filtering on the basic words in the set of basic words of each piece of hot-spot data.
In an embodiment of the present invention, the device may further include the following module:
A hot-topic title determining module, configured to, for each cluster, find at least one keyword with the highest frequency among the determined keywords of the cluster, and select one title from the titles of the hot-spot data in which the found keyword or keywords occur, as the hot topic corresponding to the cluster.
With the device provided by the embodiments of the present invention, the server obtains hot-spot data from set websites and processes it through word segmentation, clustering and the like, so that hot topics and their keywords are determined in a timely manner, which effectively avoids the lag caused by determining hot topics through manual operation.
Corresponding to the method embodiment shown in Fig. 2, an embodiment of the present invention further provides a hot-topic tracking device applied to a server. Referring to Fig. 5, the device may include the following modules:
A text similarity determining module 410, configured to, for each hot topic, determine the text similarity between the keywords of the hot topic and the description information of each video file;
A video file tracking module 420, configured to track the video files corresponding to the hot topic according to the determined text similarity.
In a specific embodiment of the present invention, the video file tracking module 420 may include the following submodules:
A candidate video set determining submodule, configured to determine the candidate video set of the hot topic according to whether the determined text similarity is greater than a preset first threshold;
A deduplication submodule, configured to perform video deduplication on the candidate video set of the hot topic;
A video file determining submodule, configured to determine the video files corresponding to the hot topic according to the deduplication result.
In a specific embodiment of the present invention, the device may further include the following submodules:
A judging submodule, configured to judge whether the number of video files determined for the hot topic is greater than a preset second threshold, and if so, trigger the clustering submodule;
The clustering submodule, configured to perform hierarchical clustering on the video files determined for the hot topic according to the publication time intervals between videos with adjacent publication times, until the number of resulting categories is not greater than the preset second threshold;
A representative video determining submodule, configured to determine the representative video of each category according to the quality of the videos in that category;
An associated video determining submodule, configured to determine the representative video of each category as an associated video of the hot topic.
In a specific embodiment of the present invention, the deduplication submodule may include the following units:
A video sorting unit, configured to sort the videos in the candidate video set of the hot topic by publication time, from earliest to latest;
A duplicate video judging unit, configured to judge in turn whether two video files with adjacent publication times are duplicate videos, and if so, trigger the deduplication unit;
The deduplication unit, configured to retain the video file with the earlier publication time in the candidate video set of the hot topic and delete the video file with the later publication time.
In a specific embodiment of the present invention, the duplicate video judging unit is specifically configured to:
Calculate the text similarity of the description information of the two video files with adjacent publication times, and determine from the calculated text similarity whether the two video files are duplicate videos;
Or,
Calculate the visual feature similarity of the two video files with adjacent publication times, and determine from the calculated visual feature similarity whether the two video files are duplicate videos;
Or,
Calculate both the text similarity of the description information of the two video files with adjacent publication times and the visual feature similarity of the two video files, and determine from the calculated text similarity and visual feature similarity whether the two video files are duplicate videos.
With the device provided by the embodiments of the present invention, for each hot topic, the video files corresponding to the hot topic are tracked through the text similarity between the keywords of the hot topic and the description information of video files, and the process is completed automatically by the server. Even if a hot topic consists of many events that unfold in stages, the server can execute the technical solution repeatedly to update the video files corresponding to the hot topic at regular intervals, which saves manual operation cost.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts can be found in the description of the method embodiments.
Those of ordinary skill in the art will appreciate that all or some of the steps in the method embodiments above can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.
Claims (16)
1. A hot-topic keyword determination method, characterized in that it is applied to a server, the method comprising:
performing word segmentation on the text information of each piece of hot-spot data obtained from a set website, to obtain the set of basic words of each piece of hot-spot data;
for each piece of hot-spot data, determining, in the set of basic words of that hot-spot data and according to the frequency with which basic words whose attribute is named entity occur in the text information of that hot-spot data, the basic words whose attribute is named entity that are used to establish the text model of that hot-spot data;
wherein the determining, according to the frequency with which basic words whose attribute is named entity occur in the text information of that hot-spot data, of the basic words whose attribute is named entity that are used to establish the text model of that hot-spot data comprises:
sorting the basic words of that hot-spot data that have the named-entity attribute by the frequency with which each occurs in the text information of that hot-spot data, from high to low, and determining the top N, or alternatively the top x%, as the basic words whose attribute is named entity that are used to establish the text model of that hot-spot data;
establishing the text model of each piece of hot-spot data according to the determined basic words whose attribute is named entity;
clustering all the obtained hot-spot data according to the text similarity between the text models of every two pieces of hot-spot data, to obtain at least one cluster;
for each cluster, determining the keywords of the cluster according to the frequency with which the basic words whose attribute is named entity in the cluster occur in the text information of the hot-spot data contained in the cluster, and determining the keywords of the cluster as the keywords of the hot topic corresponding to the cluster;
wherein the determining of the keywords of the cluster according to the frequency with which the basic words whose attribute is named entity in the cluster occur in the text information of the hot-spot data contained in the cluster comprises:
taking the basic words whose frequency of occurrence in the text information of the hot-spot data in the cluster meets a preset threshold condition as the keywords of the cluster.
2. The method according to claim 1, characterized in that, after obtaining the set of basic words of each piece of hot-spot data and before determining the basic words whose attribute is named entity that are used to establish the text model of that hot-spot data, the method further comprises:
performing stop word filtering on the basic words in the set of basic words of each piece of hot-spot data.
3. The method according to claim 1, characterized in that it further comprises:
for each cluster, finding at least one keyword with the highest frequency among the determined keywords of the cluster;
selecting one title from the titles of the hot-spot data in which the found keyword or keywords occur, as the hot topic corresponding to the cluster.
4. A hot-topic tracking method based on the hot-topic keyword determination method according to any one of claims 1 to 3, characterized in that it is applied to a server, the method comprising:
for each hot topic, determining the text similarity between the keywords of the hot topic and the description information of each video file;
tracking the video files corresponding to the hot topic according to the determined text similarity.
5. The method according to claim 4, characterized in that the tracking of the video files corresponding to the hot topic according to the determined text similarity comprises:
determining the candidate video set of the hot topic according to whether the determined text similarity is greater than a preset first threshold;
performing video deduplication on the candidate video set of the hot topic;
determining the video files corresponding to the hot topic according to the deduplication result.
6. The method according to claim 5, characterized in that, after the determining of the video files corresponding to the hot topic according to the deduplication result, the method further comprises:
judging whether the number of video files determined for the hot topic is greater than a preset second threshold;
if so, performing hierarchical clustering on the video files determined for the hot topic according to the publication time intervals between videos with adjacent publication times, until the number of resulting categories is not greater than the preset second threshold;
determining the representative video of each category according to the quality of the videos in that category;
determining the representative video of each category as an associated video of the hot topic.
7. The method according to claim 5 or 6, characterized in that the performing of video deduplication on the candidate video set of the hot topic comprises:
sorting the videos in the candidate video set of the hot topic by publication time, from earliest to latest;
judging in turn whether two video files with adjacent publication times are duplicate videos, and if so, retaining the video file with the earlier publication time in the candidate video set of the hot topic and deleting the video file with the later publication time.
8. The method according to claim 7, characterized in that the judging of whether two video files with adjacent publication times are duplicate videos comprises:
calculating the text similarity of the description information of the two video files with adjacent publication times, and determining from the calculated text similarity whether the two video files are duplicate videos;
or,
calculating the visual feature similarity of the two video files with adjacent publication times, and determining from the calculated visual feature similarity whether the two video files are duplicate videos;
or,
calculating both the text similarity of the description information of the two video files with adjacent publication times and the visual feature similarity of the two video files, and determining from the calculated text similarity and visual feature similarity whether the two video files are duplicate videos.
9. A hot topic keyword determination device, characterized in that it is applied to a server, the device comprising:
a base word set obtaining module, configured to segment the text information of each piece of hotspot data obtained from a set website, to obtain a set of base words for each piece of hotspot data;
a named-entity-attribute base word determining module, configured to, for each piece of hotspot data, determine, within the set of base words of that hotspot data and according to the frequency with which the base words whose attribute is named entity occur in the text information of that hotspot data, the base words whose attribute is named entity that are used to build the text model of that hotspot data;
the named-entity-attribute base word determining module being specifically configured to: sort the base words of that hotspot data that have the named entity attribute by the frequency with which each occurs in the text information of that hotspot data, from high to low, and determine the top N, or the top x%, as the base words whose attribute is named entity that are used to build the text model of that hotspot data;
a text model building module, configured to build the text model of each piece of hotspot data according to the determined base words whose attribute is named entity;
a hotspot data clustering module, configured to cluster all the obtained hotspot data according to the text similarity between the text models of every two pieces of hotspot data, to obtain at least one cluster;
a hot topic keyword determining module, configured to, for each cluster, determine the keywords of the cluster according to the frequency with which the base words whose attribute is named entity occur in the text information of the hotspot data included in the cluster, and determine the keywords of the cluster as the keywords of the hot topic corresponding to the cluster;
the hot topic keyword determining module being specifically configured to: take the base words whose frequency of occurrence in the text information of the hotspot data in the cluster satisfies a preset threshold condition as the keywords of the cluster.
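The modules of claim 9 can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration, not the patented implementation: the claim does not fix the form of the text model, the similarity measure, or the clustering algorithm, so the sketch uses term-frequency vectors, cosine similarity, and a greedy single-pass clustering. The function names, the `entity_words` set (standing in for the output of a named-entity tagger), and the thresholds are hypothetical.

```python
from collections import Counter
from math import ceil, sqrt


def top_entity_words(tokens, entity_words, n=None, percent=None):
    """Rank named-entity base words by frequency; keep the top n or the top percent%."""
    counts = Counter(t for t in tokens if t in entity_words)
    # Sort by descending frequency, breaking ties alphabetically for determinism.
    ranked = [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    if n is not None:
        return ranked[:n]
    if percent is not None:
        return ranked[:ceil(len(ranked) * percent / 100)]
    return ranked


def text_model(tokens, selected):
    """Assumed text model: term frequencies restricted to the selected words."""
    selected = set(selected)
    return Counter(t for t in tokens if t in selected)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(models, threshold=0.5):
    """Assumed greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of indices into `models`
    for i, model in enumerate(models):
        for members in clusters:
            if cosine(model, models[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def cluster_keywords(cluster_tokens, entity_words, min_freq):
    """Keywords: named-entity words whose frequency in the cluster is at least min_freq."""
    counts = Counter(t for tokens in cluster_tokens for t in tokens if t in entity_words)
    return sorted(w for w, c in counts.items() if c >= min_freq)


# Hypothetical, already-segmented hotspot texts and a hypothetical entity lexicon.
docs = [
    ["beijing", "marathon", "beijing", "runner"],
    ["beijing", "marathon", "record"],
    ["apple", "iphone", "apple", "launch"],
]
entities = {"beijing", "marathon", "apple", "iphone"}
models = [text_model(d, top_entity_words(d, entities, n=2)) for d in docs]
groups = cluster(models)
```

Here the "preset threshold condition" is read as a minimum frequency; other conditions (for example a relative share of all entity occurrences) would fit the claim wording equally well.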
10. The device according to claim 9, characterized in that the base word set obtaining module is further configured to:
perform stop-word filtering on the base words in the set of base words of each piece of hotspot data, respectively.
11. The device according to claim 9 or 10, characterized in that it further comprises:
a hot topic title determining module, configured to, for each cluster, find at least one keyword with the highest frequency among the determined keywords of the cluster, and select one title from the titles of the hotspot data in which the at least one found keyword occurs, as the hot topic corresponding to the cluster.
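The title selection of claim 11 could be sketched as follows. This is an illustrative assumption only: the claim does not say which title is chosen when several pieces of hotspot data contain the most frequent keyword, so the sketch simply takes the first one. The function name `pick_topic_title` and the `(title, tokens)` item layout are hypothetical.

```python
from collections import Counter


def pick_topic_title(items, keywords):
    """items: list of (title, tokens) for one cluster; keywords: the cluster's keywords.

    Finds the keyword(s) with the highest frequency in the cluster and returns the
    title of the first item containing one of them (the tie-break is an assumption).
    """
    keywords = set(keywords)
    counts = Counter(t for _, tokens in items for t in tokens if t in keywords)
    if not counts:
        return None
    best = max(counts.values())
    top_keywords = {w for w, c in counts.items() if c == best}
    for title, tokens in items:
        if top_keywords & set(tokens):
            return title
    return None


# Hypothetical cluster content.
items = [
    ("Rain delays flight", ["rain", "flight"]),
    ("Marathon in Beijing", ["beijing", "marathon"]),
    ("Beijing sets record", ["beijing", "record"]),
]
title = pick_topic_title(items, {"beijing", "marathon"})
```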
12. A hot topic tracking device based on the hot topic keyword determination device according to claim 9, characterized in that it is applied to a server, the device comprising:
a text similarity determining module, configured to, for each hot topic, determine the text similarity between the keywords of the hot topic and the description information of each video file;
a video file tracking module, configured to track the video files corresponding to the hot topic according to the determined text similarity.
13. The device according to claim 12, characterized in that the video file tracking module comprises:
a candidate video set determining submodule, configured to determine the candidate video set of the hot topic according to whether the determined text similarity is greater than a preset first threshold;
a deduplication submodule, configured to perform video deduplication within the candidate video set of the hot topic;
a video file determining submodule, configured to determine the video files corresponding to the hot topic according to the deduplication result.
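The keyword-to-description matching of claims 12 and 13 can be sketched as below. The similarity measure is an assumption (the share of topic keywords that appear in a video's description); the claims only require some text similarity compared against a preset first threshold. The names `keyword_similarity` and `candidate_videos` and the `videos` mapping are hypothetical.

```python
def keyword_similarity(keywords, description_tokens):
    """Assumed measure: fraction of the topic keywords found in the description."""
    kw = set(keywords)
    if not kw:
        return 0.0
    return len(kw & set(description_tokens)) / len(kw)


def candidate_videos(keywords, videos, first_threshold):
    """Return ids of videos whose similarity is strictly greater than the threshold.

    videos: mapping of video id -> segmented description tokens.
    """
    return [
        vid
        for vid, tokens in videos.items()
        if keyword_similarity(keywords, tokens) > first_threshold
    ]


# Hypothetical topic keywords and video descriptions.
topic_keywords = ["beijing", "marathon"]
videos = {
    "v1": ["beijing", "marathon", "highlights"],
    "v2": ["beijing", "weather"],
    "v3": ["cooking"],
}
candidates = candidate_videos(topic_keywords, videos, first_threshold=0.4)
```

The resulting candidate set is then passed to deduplication before the topic's video files are finalized.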
14. The device according to claim 13, characterized in that it further comprises:
a judging submodule, configured to judge whether the number of determined video files corresponding to the hot topic is greater than a preset second threshold, and if so, trigger a clustering submodule;
the clustering submodule, configured to perform hierarchical clustering on the determined video files corresponding to the hot topic, successively according to the publication time intervals between videos adjacent in publication time, until the number of resulting categories does not exceed the preset second threshold;
a representative video determining submodule, configured to determine the representative video of each category according to the quality of the videos in that category;
an associated video determining submodule, configured to determine the representative video of each category as an associated video of the hot topic.
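One way to read the clustering step of claim 14 is an agglomerative procedure over the time-ordered videos: repeatedly merge the two time-adjacent groups separated by the smallest publication gap until no more than the second threshold of groups remain. The sketch below follows that reading, which is an assumption; the tuple layout `(video_id, publish_time, quality)` and the numeric quality score are also hypothetical, since the claim does not define how quality is measured.

```python
def cluster_by_time_gap(videos, max_clusters):
    """Merge time-adjacent groups with the smallest gap until at most max_clusters remain.

    videos: list of (video_id, publish_time, quality) tuples.
    """
    groups = [[v] for v in sorted(videos, key=lambda v: v[1])]
    while len(groups) > max(1, max_clusters):
        # Gap between the last video of one group and the first video of the next.
        gaps = [groups[i + 1][0][1] - groups[i][-1][1] for i in range(len(groups) - 1)]
        i = gaps.index(min(gaps))
        groups[i:i + 2] = [groups[i] + groups[i + 1]]
    return groups


def representatives(groups):
    """Pick the highest-quality video of each group as its representative."""
    return [max(group, key=lambda v: v[2])[0] for group in groups]


# Hypothetical videos: publish times in arbitrary units, quality in [0, 1].
videos = [("a", 0, 0.5), ("b", 1, 0.9), ("c", 10, 0.4), ("d", 11, 0.7), ("e", 30, 0.6)]
groups = cluster_by_time_gap(videos, max_clusters=3)
associated = representatives(groups)
```

If the number of videos is already within the threshold, the loop never runs and every video is its own group, which matches the claim's condition that clustering is triggered only above the second threshold.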
15. The device according to claim 13 or 14, characterized in that the deduplication submodule comprises:
a video sorting unit, configured to sort the videos in the candidate video set of the hot topic by publication time, from earliest to latest;
a duplicate video judging unit, configured to judge successively whether two video files adjacent in publication time are duplicate videos, and if so, trigger a deduplication processing unit;
the deduplication processing unit, configured to retain, in the candidate video set of the hot topic, the video file with the earlier publication time and delete the video file with the later publication time.
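The sort-then-compare deduplication of claim 15 can be sketched as a single pass over the time-ordered candidates. Two points are assumptions: after a later duplicate is dropped, the next video is compared against the last video that was kept, and the duplicate test is supplied by the caller as a function. The dictionary keys `id`, `published`, and `desc` are hypothetical.

```python
def dedupe_by_publish_time(videos, is_duplicate):
    """Keep the earlier video of each time-adjacent duplicate pair.

    videos: list of dicts with at least a 'published' key.
    is_duplicate: callable(earlier, later) -> bool.
    """
    kept = []
    for video in sorted(videos, key=lambda v: v["published"]):
        if kept and is_duplicate(kept[-1], video):
            continue  # the later-published duplicate is dropped
        kept.append(video)
    return kept


# Hypothetical candidate set; equal descriptions stand in for a real duplicate test.
videos = [
    {"id": "v2", "published": 5, "desc": "x"},
    {"id": "v1", "published": 1, "desc": "x"},
    {"id": "v3", "published": 9, "desc": "y"},
]
kept = dedupe_by_publish_time(videos, lambda a, b: a["desc"] == b["desc"])
```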
16. The device according to claim 15, characterized in that the duplicate video judging unit is specifically configured to:
calculate the text similarity of the description information of two video files adjacent in publication time, and determine, according to the calculated text similarity, whether the two video files are duplicate videos;
or,
calculate the visual feature similarity of two video files adjacent in publication time, and determine, according to the calculated visual feature similarity, whether the two video files are duplicate videos;
or,
calculate the text similarity of the description information of two video files adjacent in publication time and the visual feature similarity of the two video files, and determine, according to the calculated text similarity and visual feature similarity, whether the two video files are duplicate videos.
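The three alternatives of claim 16 can be sketched as one predicate with a mode switch. The specific measures are assumptions: Jaccard similarity over description tokens, cosine similarity over fixed-length visual feature vectors, and, in the combined case, a requirement that both similarities reach their thresholds. The keys `desc_tokens` and `features` and the threshold values are hypothetical.

```python
from math import sqrt


def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def is_duplicate(a, b, mode="both", text_min=0.8, visual_min=0.9):
    """Decide duplication by text, by visual features, or by both (assumed AND)."""
    if mode == "text":
        return jaccard(a["desc_tokens"], b["desc_tokens"]) >= text_min
    if mode == "visual":
        return cosine(a["features"], b["features"]) >= visual_min
    return (
        jaccard(a["desc_tokens"], b["desc_tokens"]) >= text_min
        and cosine(a["features"], b["features"]) >= visual_min
    )


# Hypothetical videos: same description, different or matching visual features.
first = {"desc_tokens": ["beijing", "marathon", "final"], "features": [1.0, 0.0, 1.0]}
other = {"desc_tokens": ["beijing", "marathon", "final"], "features": [0.0, 1.0, 0.0]}
copy = {"desc_tokens": ["beijing", "marathon", "final"], "features": [2.0, 0.0, 2.0]}
```

A weighted sum of the two similarities would be another plausible way to combine them in the third alternative; the claim leaves the combination open.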
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510372462.0A CN104915447B (en) | 2015-06-30 | 2015-06-30 | A kind of much-talked-about topic tracking and keyword determine method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510372462.0A CN104915447B (en) | 2015-06-30 | 2015-06-30 | A kind of much-talked-about topic tracking and keyword determine method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104915447A CN104915447A (en) | 2015-09-16 |
CN104915447B true CN104915447B (en) | 2018-04-20 |
Family
ID=54084510
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510372462.0A Active CN104915447B (en) | 2015-06-30 | 2015-06-30 | A kind of much-talked-about topic tracking and keyword determine method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104915447B (en) |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105843798A (en) * | 2016-04-05 | 2016-08-10 | 江苏鼎中智能科技有限公司 | Internet information acquisition and fusion method based on divide-and-conquer strategy of long and short messages |
CN107273389A (en) * | 2016-04-08 | 2017-10-20 | 北京国双科技有限公司 | The querying method and device of trial video |
CN106202293B (en) * | 2016-06-30 | 2019-05-10 | 北京奇艺世纪科技有限公司 | A kind of update method and device of emergency event corpus |
CN106503064B (en) * | 2016-09-29 | 2019-07-02 | 中国国防科技信息中心 | A kind of generation method of adaptive microblog topic abstract |
CN107066633A (en) * | 2017-06-15 | 2017-08-18 | 厦门创材健康科技有限公司 | Deep learning method and apparatus based on human-computer interaction |
CN110020421A (en) * | 2018-01-10 | 2019-07-16 | 北京京东尚科信息技术有限公司 | The session information method of abstracting and system of communication software, equipment and storage medium |
CN108446296B (en) * | 2018-01-24 | 2021-10-15 | 北京奇艺世纪科技有限公司 | Information processing method and device |
CN108509517B (en) * | 2018-03-09 | 2021-05-11 | 东南大学 | Streaming topic evolution tracking method for real-time news content |
CN109271509B (en) * | 2018-08-23 | 2021-05-28 | 武汉斗鱼网络科技有限公司 | Live broadcast room topic generation method and device, computer equipment and storage medium |
CN110876070A (en) * | 2018-08-29 | 2020-03-10 | 中国电信股份有限公司 | Content distribution system, processing method, and storage medium |
CN109284286B (en) * | 2018-09-12 | 2021-04-06 | 贵州省赤水市气象局 | Method for extracting effective characteristics from original data set |
CN111309999B (en) * | 2018-12-11 | 2023-05-16 | 阿里巴巴集团控股有限公司 | Method and device for generating interactive scene content |
CN109933709B (en) * | 2019-01-31 | 2023-09-26 | 平安科技(深圳)有限公司 | Public opinion tracking method and device for video text combined data and computer equipment |
CN111666467A (en) * | 2019-03-07 | 2020-09-15 | 上海博泰悦臻网络技术服务有限公司 | Vehicle, vehicle equipment and vehicle equipment news tracking reporting method thereof |
CN110414232B (en) * | 2019-06-26 | 2023-07-25 | 腾讯科技(深圳)有限公司 | Malicious program early warning method and device, computer equipment and storage medium |
CN111027282A (en) * | 2019-11-21 | 2020-04-17 | 精硕科技(北京)股份有限公司 | Text duplicate removal method and device, electronic equipment and computer readable storage medium |
CN111159551B (en) * | 2019-12-30 | 2023-11-03 | 汉海信息技术(上海)有限公司 | User-generated content display method and device and computer equipment |
CN111581493A (en) * | 2020-04-07 | 2020-08-25 | 苏宁云计算有限公司 | Video pushing method and device, computer equipment and storage medium |
CN111881275B (en) * | 2020-07-24 | 2024-02-13 | 新华智云科技有限公司 | Efficient hot spot identification and matching method |
CN115858787B (en) * | 2022-12-12 | 2023-08-01 | 交通运输部公路科学研究所 | Hot spot extraction and mining method based on problem appeal information in road transportation |
CN116561401B (en) * | 2023-05-26 | 2024-03-15 | 北京国新汇金股份有限公司 | Information hotspot refining method and system based on big data analysis |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101751458A (en) * | 2009-12-31 | 2010-06-23 | 暨南大学 | Network public sentiment monitoring system and method |
CN101923544A (en) * | 2009-06-15 | 2010-12-22 | 北京百分通联传媒技术有限公司 | Method for monitoring and displaying Internet hot spots |
CN102937960A (en) * | 2012-09-06 | 2013-02-20 | 北京邮电大学 | Device and method for identifying and evaluating emergency hot topic |
CN102945290A (en) * | 2012-12-03 | 2013-02-27 | 北京奇虎科技有限公司 | Hot microblog topic digging device and method |
CN103577593A (en) * | 2013-11-14 | 2014-02-12 | 中国科学院声学研究所 | Method and system for video aggregation based on microblog hot topics |
Also Published As
Publication number | Publication date |
---|---|
CN104915447A (en) | 2015-09-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104915447B (en) | A kind of much-talked-about topic tracking and keyword determine method and device | |
US10572565B2 (en) | User behavior models based on source domain | |
CN104820629B (en) | A kind of intelligent public sentiment accident emergent treatment system and method | |
US9147154B2 (en) | Classifying resources using a deep network | |
CN103049440B (en) | A kind of recommendation process method of related article and disposal system | |
US8635281B2 (en) | System and method for attentive clustering and analytics | |
CN107578292B (en) | User portrait construction system | |
CN102890698B (en) | Method for automatically describing microblogging topic tag | |
TW201839628A (en) | Method, system and apparatus for discovering and tracking hot topics from network media data streams | |
CN110750656A (en) | Multimedia detection method based on knowledge graph | |
CN106294425A (en) | The automatic image-text method of abstracting of commodity network of relation article and system | |
CN103377258A (en) | Method and device for classification display of microblog information | |
CN106557558A (en) | A kind of data analysing method and device | |
CN106354860A (en) | Method for automatically labelling and pushing information resource based on label sets | |
CN107341199A (en) | A kind of recommendation method based on documentation & info general model | |
CN107679075A (en) | Method for monitoring network and equipment | |
An et al. | A heuristic approach on metadata recommendation for search engine optimization | |
CN110795613A (en) | Commodity searching method, device and system and electronic equipment | |
CN105159898B (en) | A kind of method and apparatus of search | |
CN106649380A (en) | Hot spot recommendation method and system based on tag | |
CN109992665A (en) | A kind of classification method based on the extension of problem target signature | |
CN109612465A (en) | A kind of method, apparatus and its application creating personal scene various dimensions characteristic spectrum | |
CN106933993B (en) | Information processing method and device | |
CN115345252A (en) | Extraction method based on 12345 hot spot | |
CN114706948A (en) | News processing method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||