CN106484724A - Information processor and information processing method - Google Patents
- Publication number
- CN106484724A (application number CN201510547792.9A)
- Authority
- CN
- China
- Prior art keywords
- information
- period
- cluster
- relevant
- mood
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Abstract
The present disclosure provides an information processing apparatus and an information processing method. The information processing apparatus includes: an information acquisition unit that obtains, from an information source, a plurality of pieces of information that relate to an object of interest and each carry a timestamp; a sequence generation unit that generates a time series of the information based on the timestamps; a peak detection unit that analyzes the time series to obtain its peak periods; and an object event detection unit that analyzes the information in the peak periods to obtain events related to the object of interest. The object event detection unit includes: a clustering unit that, for each peak period of the time series, clusters the information in that peak period; and a period event detection unit that, for each peak period, detects the event related to that peak period based on the information in the cluster that contains the largest number of pieces of information in the clustering result of the clustering unit.
Description
Technical Field
The present disclosure relates generally to the field of information processing, and in particular to an information processing apparatus and an information processing method.
Background
Microblog services such as Weibo and Twitter are receiving more and more attention and have become popular platforms for obtaining information online. In Internet data mining, however, it is difficult to find the important content related to an object of interest in an information source such as a microblog service, which holds massive amounts of information from a large number of users. Information related to the object of interest can be obtained through a general-purpose search engine or the like, but such information may be cluttered, scattered, and repetitive, so a user cannot gain a good understanding of the object of interest in a short time by reading it directly.
It is therefore desirable to extract the important content related to an object of interest from the massive information of an information source accurately and efficiently.
Summary of the Invention
A brief summary of the present invention is given below to provide a basic understanding of some of its aspects. This summary is not an exhaustive overview of the invention. It is not intended to identify key or critical parts of the invention, nor to limit its scope. Its sole purpose is to present some concepts in simplified form as a prelude to the more detailed description that follows.
In view of the above drawbacks of the prior art, one object of the present invention is to provide an information processing apparatus and an information processing method capable of obtaining events related to an object of interest, so as to overcome at least the existing problems.
According to one aspect of the present disclosure, an information processing apparatus is provided, including: an information acquisition unit that obtains, from an information source, a plurality of pieces of information that relate to an object of interest and each carry a timestamp; a sequence generation unit that generates a time series of the information based on the timestamps; a peak detection unit that analyzes the time series to obtain its peak periods; and an object event detection unit that analyzes the information in the peak periods to obtain events related to the object of interest. The object event detection unit includes: a clustering unit that, for each peak period of the time series, clusters the information in that peak period; and a period event detection unit that, for each peak period, detects the event related to that peak period based on the information in the cluster that contains the largest number of pieces of information in the clustering result of the clustering unit.
According to another aspect of the present disclosure, an information processing method is provided, including the steps of: obtaining, from an information source, a plurality of pieces of information that relate to an object of interest and each carry a timestamp; generating a time series of the information based on the timestamps; analyzing the time series to obtain its peak periods; and analyzing the information in the peak periods to obtain events related to the object of interest. Analyzing the information in the peak periods includes: for each peak period of the time series, clustering the information in that peak period; and, for each peak period, detecting the event related to that peak period based on the information in the cluster that contains the largest number of pieces of information in the clustering result.
According to a further aspect of the present disclosure, a program is also provided that causes a computer to function as the information processing apparatus described above.
According to yet another aspect of the present disclosure, a corresponding computer-readable storage medium is also provided. The storage medium stores a computer program executable by a computing device, and the program, when executed, causes the computing device to perform the information processing method described above.
The above aspects of the embodiments of the present disclosure provide at least the following benefit: using the time series of timestamped information related to the object of interest obtained from an information source, events related to the object of interest are obtained based on the information in the peak periods of the time series. The important content related to the object of interest can thus be extracted accurately and efficiently from the large amount of information in the information source, while balancing the coverage and the conciseness of the extracted content.
These and other advantages of the present disclosure will become more apparent from the following detailed description of its preferred embodiments, taken in conjunction with the accompanying drawings.
Brief Description of the Drawings
The present disclosure can be better understood by referring to the description given below in conjunction with the accompanying drawings, in which the same or similar reference numerals denote the same or similar components throughout. The drawings, together with the detailed description below, are included in and form a part of this specification, and serve to further illustrate the preferred embodiments of the present disclosure and to explain its principles and advantages. In the drawings:
Fig. 1 is a block diagram schematically showing an example structure of an information processing apparatus according to an embodiment of the present disclosure.
Fig. 2 is a block diagram schematically showing an example structure of the object event detection unit in the information processing apparatus according to an embodiment of the present disclosure.
Fig. 3 is a block diagram schematically showing another example structure of the information processing apparatus according to an embodiment of the present disclosure.
Fig. 4 is a flowchart showing an example flow of an information processing method according to an embodiment of the present disclosure.
Fig. 5 is a flowchart showing an example flow of the object event detection step in the information processing method according to an embodiment of the present disclosure.
Fig. 6 is a flowchart showing another example flow of the information processing method according to an embodiment of the present disclosure.
Fig. 7 is a structural diagram showing one possible hardware configuration that can be used to implement the information processing apparatus and method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. For clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that many implementation-specific decisions must be made in developing any such actual implementation in order to achieve the developer's specific goals, for example, compliance with system-related and business-related constraints, and that these constraints may vary from one implementation to another. It should also be understood that, although such development work may be complex and time-consuming, it is merely a routine task for those skilled in the art having the benefit of this disclosure.
It should also be noted that, to avoid obscuring the present invention with unnecessary detail, the drawings show only the apparatus structures and/or processing steps closely related to the solution of the present invention, and omit other details of little relevance to it.
The inventors found that, in existing data mining, information related to an object of interest (for example, a person of interest) is generally obtained from an information source (for example, a microblog service) through a search engine or the like, but such information may be cluttered, scattered, and repetitive, so a user cannot gain a good understanding of the object of interest in a short time by reading it directly. In addition, when the object of interest is a person, related information can also be obtained from the content that the person publishes in the information source (for example, the person's own microblog posts), but this approach likewise does not yield the important information directly.
Based on this, the present disclosure proposes an information processing apparatus and an information processing method capable of obtaining events related to an object of interest. Using the time series of timestamped information related to the object of interest obtained from an information source, they obtain events related to the object of interest based on the information in the peak periods of the time series. The important content related to the object of interest can thus be extracted accurately and efficiently from the massive information of the information source, while balancing the coverage and the conciseness of the extracted content.
According to one aspect of the present disclosure, an information processing apparatus is provided. Fig. 1 is a block diagram schematically showing an example structure of an information processing apparatus according to an embodiment of the present disclosure.
As shown in Fig. 1, the information processing apparatus 10 includes: an information acquisition unit 101 that obtains, from an information source, a plurality of pieces of information that relate to an object of interest and each carry a timestamp; a sequence generation unit 102 that generates a time series of the information based on the timestamps; a peak detection unit 103 that analyzes the time series to obtain its peak periods; and an object event detection unit 104 that analyzes the information in the peak periods to obtain events related to the object of interest. The object event detection unit 104 includes: a clustering unit 1041 that, for each peak period of the time series, clusters the information in that peak period; and a period event detection unit 1042 that, for each peak period, detects the event related to that peak period based on the information in the cluster that contains the largest number of pieces of information in the clustering result of the clustering unit 1041.
For ease of explanation, a microblog service is used as the example information source below. The present disclosure is of course not limited to this example and is applicable to any information source that contains information with a time attribute.
With a microblog service as the example information source, the timestamped information related to the object of interest that the information acquisition unit 101 obtains may be microblog posts related to the object of interest, obtained by searching or similar means. Examples are posts that contain a name of the object of interest, such as the name, nickname, or user name of a person of interest. The time series generated by the sequence generation unit 102 may be the time series of the posts obtained in this way, ordered by the publication time of each post.
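The time series described above can be sketched in Python as a per-day count of posts. This is only an illustration under stated assumptions: the `(timestamp, text)` pair layout, the function name `build_daily_series`, and the choice to fill days without posts with zero are not taken from the disclosure.

```python
from collections import Counter
from datetime import datetime, timedelta


def build_daily_series(posts):
    """Count posts per calendar day, including days with no posts.

    `posts` is a list of (timestamp, text) pairs; this layout is an
    assumption made for the sketch.
    """
    counts = Counter(ts.date() for ts, _ in posts)
    if not counts:
        return []
    day, last = min(counts), max(counts)
    series = []
    while day <= last:
        series.append((day, counts[day]))  # a Counter yields 0 for absent days
        day += timedelta(days=1)
    return series


# Hypothetical sample posts mentioning the object of interest.
posts = [
    (datetime(2015, 8, 1, 9, 30), "post a"),
    (datetime(2015, 8, 1, 18, 5), "post b"),
    (datetime(2015, 8, 3, 12, 0), "post c"),
]
series = build_daily_series(posts)
for day, n in series:
    print(day.isoformat(), n)
```

Filling empty days with zero keeps the series evenly spaced, which tends to make a later peak test easier to state.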
In a preferred embodiment, the peak detection unit 103 may obtain the peak periods of the time series using a burst detection technique. For example, the peak detection unit 103 may use the burst-level-based detection technique proposed by Jon Kleinberg in 2002. For details, see Jon Kleinberg, "Bursty and Hierarchical Structure in Streams", KDD 2002: 91-101, which is incorporated herein by reference. Of course, various other appropriate techniques in the art may be used to obtain the peak periods of the time series; they are not described in detail here.
A peak period obtained by the peak detection unit 103 through the above detection may be a period, within a given time range, in which the number of pieces of information related to the object of interest satisfies a certain condition (for example, the number exceeds that of other periods by a given ratio, or the number exceeds a given threshold). The large amount of related information in the information source during a peak period reflects a high level of attention to the object of interest, so a peak period can be regarded as a period of peak attention to the object of interest. Taking a microblog service as the example again, a peak period can be regarded as a period in which microblog users' attention to, or interest in, the given object of interest peaks.
In a preferred embodiment, the peak detection unit 103 may obtain the peak periods in units of days. For many information sources, detecting peak periods in units of days gives good results. Of course, the peak detection unit 103 may also obtain the peak periods in other time units such as hours or weeks. Those skilled in the art can select a suitable time unit according to actual needs.
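The simple count-based condition mentioned above (a period whose count exceeds that of the other periods by a given ratio and also exceeds a threshold) might look like the following sketch. It is not Kleinberg's burst detection; the function name and the default values of `ratio` and `min_count` are assumptions chosen only for illustration.

```python
from statistics import mean


def detect_peak_periods(counts, ratio=2.0, min_count=1):
    """Return indices of time units that stand out from the rest.

    A unit is flagged when its count is at least `min_count` and is
    more than `ratio` times the mean count of all other units.
    Both parameters are illustrative assumptions.
    """
    peaks = []
    for i, c in enumerate(counts):
        others = counts[:i] + counts[i + 1:]
        baseline = mean(others) if others else 0.0
        if c >= min_count and c > ratio * baseline:
            peaks.append(i)
    return peaks


# Hypothetical daily post counts for one object of interest.
daily_counts = [3, 4, 2, 40, 5, 3, 35, 4]
peaks = detect_peak_periods(daily_counts)
print(peaks)  # [3, 6]
```

Comparing each unit against the mean of the others keeps one very large peak from hiding itself in its own baseline, although several large peaks still raise the baseline for each other.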
After the peak periods of the time series are obtained, the clustering unit 1041 of the object event detection unit 104 clusters, for each peak period, the information in that peak period. Through this clustering, different clusters are obtained according to the correlation among the pieces of information in each peak period. Accordingly, the period event detection unit 1042 can, for each peak period, detect the event related to that peak period based on the information in the cluster containing the largest number of pieces of information in the clustering result. In this way, the event related to a peak period is obtained from multiple, highly inter-correlated pieces of information in a period of high attention to the object of interest.
An event related to a peak period, obtained by the object event detection unit 104 in the above manner, can be regarded as the cause of that peak period in the information source and is therefore important content related to the object of interest. Meanwhile, in obtaining the events related to the peak periods, the object event detection unit 104 filters out the information in non-peak periods and the information within peak periods that has low correlation with the rest, that is, content that is unimportant or scattered. The information processing apparatus 10 of this embodiment can therefore extract the important content related to the object of interest accurately and efficiently from the large amount of information in the information source, while balancing the coverage and the conciseness of the extracted content.
An example structure of the object event detection unit in the information processing apparatus according to an embodiment of the present disclosure is described below with reference to Fig. 2.
Fig. 2 is a block diagram schematically showing an example structure of the object event detection unit in the information processing apparatus according to an embodiment of the present disclosure.
As shown in Fig. 2 in a preferred embodiment, except single with the object event detection in Fig. 1
In unit 104 outside similar cluster cell 1041 and timeslot event detector unit 1042, object thing
Part detector unit 104 ' can also include:Term vector represents unit 1040, and which is by each crest period
In information be expressed as term vector, to be supplied to the cluster cell 1041.
As an example, the word vector representation unit 1040 may perform word segmentation on every piece of information in a given peak period and represent each piece of information as a word vector, where the word vector space consists of all the words in the set of text information obtained from the information source.
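A minimal sketch of such a word vector representation follows. It assumes the posts have already been segmented into tokens (Chinese text would need a word segmenter, which is outside this sketch) and uses raw term counts as vector entries; the disclosure does not specify the weighting, so that choice and the function names are assumptions.

```python
def build_vocabulary(tokenized_posts):
    """Collect every distinct word in the text set, in a fixed order."""
    return sorted({tok for post in tokenized_posts for tok in post})


def to_word_vector(tokens, vocabulary):
    """Represent one segmented post as a term-count vector.

    Using raw counts is an assumption; other weightings would fit
    the same vector space.
    """
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for tok in tokens:
        if tok in index:
            vector[index[tok]] += 1
    return vector


# Hypothetical, already segmented posts.
tokenized_posts = [["new", "film", "released"], ["film", "award"]]
vocabulary = build_vocabulary(tokenized_posts)
vectors = [to_word_vector(post, vocabulary) for post in tokenized_posts]
print(vocabulary)
print(vectors)
```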
In a preferred embodiment, the clustering unit 1041 may cluster the information in each peak period using a threshold-based automatic clustering method. In an information source, the information related to the object of interest may have different granularity in different periods; that is, multiple pieces of information such as microblog posts may differ in how widely their content is dispersed. For example, the posts related to the object of interest obtained from the information source in one period may involve several events related to the object of interest, so the topics of the posts are more dispersed. The posts in another period may all revolve around a single event related to the object of interest, so the topics of the posts are more concentrated. The content granularity differs greatly between these two cases, and a general clustering method that assumes the same clustering granularity cannot achieve good results.
The threshold-based automatic clustering method provided by this preferred embodiment can automatically set the clustering threshold for a given period according to the actual conditions of that period, so as to adapt to the content granularity of different periods and obtain good clustering results.
In a specific example, suppose a given peak period contains x pieces of information related to the object of interest and one word vector is obtained from each piece of information. Pairing the x word vectors two by two yields $\binom{x}{2} = x(x-1)/2$ pairs in total. The clustering unit 1041 may compute the Euclidean distance between each pair of word vectors to obtain the mean Euclidean distance, and then multiply it by a predetermined weight parameter to obtain a clustering threshold adapted to these x word vectors.
For example, the clustering threshold δ based on the Euclidean distance can be expressed by the following formula (1):
$$\delta = w \cdot \frac{2}{x(x-1)} \sum_{i=1}^{x-1} \sum_{j=i+1}^{x} Ed\big(S(m_i), S(m_j)\big) \qquad (1)$$
where w is the weight parameter, greater than 0 and less than or equal to 1, with a preferred value of 0.9; x is the number of word vectors; $S(m_i)$ and $S(m_j)$ denote the word vectors corresponding to the i-th and j-th microblog posts $m_i$ and $m_j$, respectively; and $Ed(S(m_i), S(m_j))$ denotes the Euclidean distance between the word vectors $S(m_i)$ and $S(m_j)$.
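The adaptive threshold just described (the mean pairwise Euclidean distance multiplied by the weight w) can be computed as in the sketch below. The function name `adaptive_threshold` and the choice to return 0.0 when fewer than two vectors are given are assumptions; `math.dist` requires Python 3.8 or later.

```python
from itertools import combinations
from math import dist


def adaptive_threshold(vectors, w=0.9):
    """Mean pairwise Euclidean distance of `vectors`, scaled by `w`.

    Returns 0.0 when there are fewer than two vectors; that edge-case
    behaviour is an assumption of this sketch.
    """
    pairs = list(combinations(vectors, 2))  # x(x-1)/2 unordered pairs
    if not pairs:
        return 0.0
    return w * sum(dist(a, b) for a, b in pairs) / len(pairs)


# Three toy vectors whose pairwise distances are 5, 8 and 5.
delta = adaptive_threshold([(0, 0), (3, 4), (0, 8)])
print(delta)
```

Because the threshold is derived from the vectors of one peak period only, a period with widely scattered posts gets a larger threshold than a period focused on a single event.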
After the adaptive clustering threshold δ is determined, the clustering unit 1041 may cluster the word vectors based on that threshold as follows.
(1) First, the clustering unit 1041 randomly selects one word vector from the x word vectors of the given peak period as a new cluster.
(2) Then, it randomly selects a word vector that has not yet been clustered and computes, for each existing cluster, the mean Euclidean distance between that word vector and all word vectors in the cluster.
(3) If the mean Euclidean distance between the word vector and all word vectors in an existing cluster is less than the adaptive clustering threshold, the word vector is added to that existing cluster.
(4) If, for every existing cluster, the mean Euclidean distance between the word vector and all word vectors in the cluster is greater than or equal to the adaptive clustering threshold, the word vector becomes a new cluster.
Steps (2)-(4) are repeated until all word vectors have been clustered, at which point the clustering unit 1041 has obtained the final clustering result.
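Steps (1)-(4) above can be sketched as follows. Two details are assumptions rather than part of the described procedure: when several existing clusters fall below the threshold, the sketch picks the closest one, and it uses a seeded random generator so that runs are repeatable.

```python
import random
from math import dist
from statistics import mean


def threshold_cluster(vectors, delta, seed=0):
    """Cluster `vectors` with a single distance threshold `delta`.

    Returns a list of clusters, each a list of indices into `vectors`.
    """
    rng = random.Random(seed)
    order = list(range(len(vectors)))
    rng.shuffle(order)  # random visiting order, i.e. random picks without replacement
    clusters = []
    for idx in order:
        best, best_d = None, None
        for cluster in clusters:
            # Mean distance from the candidate to every member of the cluster.
            d = mean(dist(vectors[idx], vectors[j]) for j in cluster)
            if d < delta and (best_d is None or d < best_d):
                best, best_d = cluster, d
        if best is None:
            clusters.append([idx])  # steps (1) and (4): start a new cluster
        else:
            best.append(idx)        # step (3): join an existing cluster
    return clusters


# Two well separated toy groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
clusters = threshold_cluster(points, delta=3.0)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2], [3, 4]]
```

Since the visiting order is random, the result can in general depend on the seed; in this toy example the groups are far enough apart that every order gives the same two clusters.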
In addition, as shown in Fig. 2 in a preferred embodiment of object event detector unit 104 ',
Timeslot event detector unit 1042 can include:Keyword extracting unit 1042-1, its are directed to each
The crest period, close comprising extraction in the middle of the information in the most cluster of information bar number from the cluster result
Keyword, used as the event relevant with the crest period.
The keyword extraction unit 1042-1 may extract keywords from the information in the one or more clusters that contain the most pieces of information in the clustering result. For ease of explanation, a cluster from which keywords are extracted is referred to below as a main-cause event. In a preferred example, one or two main-cause events are selected from the clusters of each peak period. For example, the cluster $E_1$ containing the most pieces of information ($N_1$ pieces) is automatically selected as a main-cause event. For the cluster $E_2$ containing the second most pieces of information ($N_2$ pieces), whether $E_2$ is also treated as a main-cause event is determined by whether $N_2/N_1$ reaches a given threshold. Preferably, the threshold is set to 0.6: if $N_2/N_1 \geq 0.6$, $E_2$ is treated as a main-cause event together with $E_1$; otherwise, only $E_1$ is regarded as a main-cause event. This way of selecting main-cause events is only illustrative, and the keyword extraction unit 1042-1 may select one or more clusters as main-cause events in any other suitable manner.
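The selection rule above can be sketched as follows, with clusters represented as lists of post indices. The function name is hypothetical, and the default of 0.6 is the preferred threshold stated in the text.

```python
def select_main_cause_clusters(clusters, ratio_threshold=0.6):
    """Pick the largest cluster, plus the second largest if it is big enough.

    The second largest cluster is kept when its size is at least
    `ratio_threshold` times the size of the largest one.
    """
    ranked = sorted(clusters, key=len, reverse=True)
    if not ranked:
        return []
    selected = [ranked[0]]
    if len(ranked) > 1 and len(ranked[1]) / len(ranked[0]) >= ratio_threshold:
        selected.append(ranked[1])
    return selected


# Hypothetical clusters of post indices: sizes 10, 7 and 2.
clusters = [list(range(10)), list(range(10, 17)), [17, 18]]
main_causes = select_main_cause_clusters(clusters)
print([len(c) for c in main_causes])  # [10, 7]
```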
For a main-cause event selected for a given peak period (that is, a cluster from which keywords are to be extracted), the keyword extraction unit 1042-1 may extract keywords using the following example method.
First, the keyword extraction unit 1042-1 performs word segmentation on every piece of information in the selected cluster and part-of-speech tagging on the segmented text. The inventors found that using nominal unigram and bigram word strings as keywords gives better results. The keyword extraction unit 1042-1 therefore extracts from the segmentation result the unigram nouns and the bigram word strings that contain a noun, tallies them, and sorts them by frequency to form a list of candidate keyword strings. It then computes the importance $T_{value}$ of each unigram noun or bigram word string according to the following formula (2):
$$T_{value} = T_{frequency} \times T_{length} \qquad (2)$$
where $T_{length}$ is the length of the word string, that is, the number of words it contains, $T_{frequency}$ is the number of times the word string occurs, and the importance $T_{value}$ is determined by these two factors.
Next, substring reduction is applied to the unigram strings using the bigram strings, under the following rule: if a bigram string contains a unigram string and the importance $T_{value}$ of the bigram string is greater than that of the unigram string, the unigram string is merged into the bigram string; otherwise, the bigram string is removed from the string list. This merging retains the most suitable word strings as keyword candidates.
Among all the word strings remaining after the above merging, the keyword extraction unit 1042-1 may select the K strings with the highest importance $T_{value}$ as the keywords of the main-cause event. Preferably, K is set to 5.
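The keyword extraction procedure above (candidate strings, importance scoring, substring reduction, top-K selection) might be sketched as below. Several points are assumptions made for this sketch: the input is already segmented and tagged as `(word, tag)` pairs with the hypothetical noun tag `"n"`, string length is counted in words (1 or 2), and a bigram is dropped as soon as any unigram it contains scores at least as high.

```python
from collections import Counter


def extract_keywords(tagged_posts, k=5, noun_tag="n"):
    """Return the top-k candidate strings of one cluster by importance."""
    freq = Counter()
    for post in tagged_posts:
        for word, tag in post:
            if tag == noun_tag:
                freq[(word,)] += 1                 # unigram nouns
        for (w1, t1), (w2, t2) in zip(post, post[1:]):
            if noun_tag in (t1, t2):
                freq[(w1, w2)] += 1                # bigrams containing a noun
    # Importance: occurrence count times string length in words.
    value = {s: n * len(s) for s, n in freq.items()}
    removed = set()
    for bigram in [s for s in value if len(s) == 2]:
        parts = [(w,) for w in bigram if (w,) in value]
        if all(value[bigram] > value[p] for p in parts):
            removed.update(parts)                  # unigrams merged into the bigram
        else:
            removed.add(bigram)                    # bigram dropped from the list
    kept = [s for s in value if s not in removed]
    kept.sort(key=lambda s: (-value[s], s))        # highest importance first
    return [" ".join(s) for s in kept[:k]]


# Hypothetical tagged posts from one main-cause cluster.
tagged_posts = [
    [("film", "n"), ("festival", "n"), ("opens", "v")],
    [("film", "n"), ("festival", "n"), ("award", "n")],
    [("film", "n"), ("festival", "n")],
    [("award", "n"), ("ceremony", "n")],
]
keywords = extract_keywords(tagged_posts, k=2)
print(keywords)  # ['film festival', 'award']
```

In this example "film festival" occurs three times and scores 6, which beats both "film" and "festival" (3 each), so the two unigrams are absorbed; the weaker bigrams are dropped instead.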
In one example, for each peak period, the keyword extraction unit 1042-1 may extract keywords from the information in more than one of the clusters containing the most pieces of information in the clustering result (that is, from more than one main-cause event), and use the extracted keywords together as the event related to that peak period.
In addition, as shown in Fig. 2 in a preferred embodiment, object event detector unit 104 '
Can also include:Timeslot event synthesis unit 1043, its are examined to timeslot event detector unit 1042
Survey the event relevant with each crest period of the time series synthesized, as with the pass
The relevant event of note object.
With the period event synthesis unit 1043, the object event detection unit 104' of this preferred embodiment can obtain not only the event related to each peak period of the object of interest, but also, from the information source as a whole, a sequence of events related to the object of interest over time. Because this overall event sequence is based on the peak periods related to the object of interest, it reflects how attention to the object of interest in the information source changes over time, and it provides the event corresponding to each peak period as the reason for the high attention to the object of interest in that period.
In a specific example, the event related to each peak period detected by the period event detection unit 1042 may be the keywords extracted from that peak period, and the period event synthesis unit 1043 lists the keywords of all peak periods together as the events related to the object of interest.
The example structure of the object event detection unit 104' has been described above with reference to Fig. 2. Note that, although the word vector representation unit 1040, the keyword extraction unit 1042-1, and the period event synthesis unit 1043 are shown together in one drawing, this illustration is only an example. These units may be implemented together in one preferred embodiment, or independently of one another in different preferred embodiments.
Another example structure of the information processing apparatus according to an embodiment of the present disclosure is described below with reference to Fig. 3.
Fig. 3 is a block diagram schematically showing another example structure of the information processing apparatus according to an embodiment of the present disclosure.
As shown in Fig. 3, in a preferred embodiment, in addition to the information acquisition unit 101, the sequence generation unit 102, the peak detection unit 103, and the object event detection unit 104 (or the object event detection unit 104' of Fig. 2) included in the information processing apparatus 10 of Fig. 1, the information processing apparatus 10' may further include an object emotion analysis unit 105, which performs emotion analysis on the information in the peak periods to obtain the emotions related to the object of interest.
The object emotion analysis unit 105 may obtain the emotions related to the object of interest using various appropriate methods. For example, it may use a previously obtained emotion dictionary or a pre-trained emotion analysis model to perform emotion analysis on all or part of the information in the peak periods, so as to obtain the emotions related to the object of interest.
In a preferred embodiment, the object emotion analysis unit 105 may include a period emotion analysis unit 1051, which, for each peak period, performs emotion analysis on the information in the cluster containing the largest number of pieces of information in the clustering result of the clustering unit 1041, so as to obtain the emotion related to that peak period. The emotion obtained by the period emotion analysis unit 1051 can correspond to the emotion, reflected in the information of the information source, toward the event related to that peak period. Here, the event related to the peak period may be the event detected by the period event detection unit 1042 in the object event detection unit 104 or 104' described earlier with reference to Fig. 1 and Fig. 2.
The object emotion analysis unit 105 may use various appropriate methods to perform emotion analysis on the information in the cluster containing the largest number of pieces of information in the clustering result. For example, the period emotion analysis unit 1051 may perform this analysis using a previously obtained emotion dictionary or a pre-trained emotion analysis model.
Taking an emotion dictionary as an example, an exemplary emotion dictionary may contain 36 emotions such as "happiness", "sadness", "sympathy", and "admiration", together with the common emotion expression words corresponding to each emotion. For the emotion "happiness", for example, the corresponding common expression words include "pleasantly surprised", "joy", and "jubilant".
Using the emotion dictionary, for the cluster containing the largest number of pieces of information in the clustering result, the period emotion analysis unit 1051 may look up occurrences of the dictionary's emotion words in all the information of that cluster. For example, for a given cluster, if a piece of information in the cluster contains an emotion word belonging to some emotion category, the count for that category is incremented by one, so that frequency statistics are kept separately for the different emotion categories. The emotion categories can then be ranked by frequency to obtain the emotion analysis result for the event of that cluster.
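The dictionary lookup described above can be sketched as follows. The tiny lexicon is purely illustrative (the 36-category dictionary itself is not reproduced in this document), posts are assumed to be segmented into tokens already, and the function name is hypothetical.

```python
from collections import Counter

# Illustrative stand-in for an emotion dictionary; the words are assumptions.
EMOTION_LEXICON = {
    "happiness": {"happy", "jubilant", "delighted"},
    "sadness": {"grief", "heartbroken"},
    "admiration": {"respect", "impressive"},
}


def rank_emotions(cluster_posts, lexicon=EMOTION_LEXICON):
    """Count, per emotion category, the posts containing one of its words.

    A post adds at most one to each category, however many of that
    category's words it contains. Returns (emotion, count) pairs,
    most frequent first.
    """
    counts = Counter()
    for tokens in cluster_posts:
        present = set(tokens)
        for emotion, words in lexicon.items():
            if present & words:
                counts[emotion] += 1
    return counts.most_common()


# Hypothetical segmented posts from the largest cluster of one peak period.
cluster_posts = [
    ["so", "happy", "today"],
    ["jubilant", "and", "happy"],
    ["what", "grief"],
    ["no", "emotion", "here"],
]
ranking = rank_emotions(cluster_posts)
print(ranking)  # [('happiness', 2), ('sadness', 1)]
```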
Taking a microblog information source as an example, the emotion analysis result obtained by the period emotion analysis unit 1051 may be the distribution of emotions that users expressed, within a given peak period, toward the event that involves the object of interest and relates to that peak period. This result can be used to support public relations work related to the object of interest, for example, building a person's reputation or handling crisis public relations for unexpected incidents.
In a preferred embodiment, the subjects' mood analytic unit 105 can also include a period mood synthesis unit 1052, which synthesizes the moods obtained by the period mood analytic unit 1051 for each crest period of the time series, to obtain the mood related to the perpetual object.
Using the period mood synthesis unit 1052, the overall mood distribution toward a given object can be obtained from the information of the information source. Again taking a microblog information source as an example, the mood related to the perpetual object obtained by the period mood synthesis unit 1052 can be the sequence, over time, of the overall mood distributions that users exhibited toward the perpetual object. Because this overall mood distribution is based on each crest period related to the perpetual object, it can reflect how the mood toward the perpetual object in the information source changes over time, which is particularly useful for long-term public relations maintenance related to the perpetual object, such as building a person's public reputation.
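One possible reading of this synthesis is sketched below, assuming that each crest period is identified by an ISO date string and that its mood analysis result is a list of (mood, frequency) pairs. The function name `synthesize_period_moods` and the sample data are hypothetical, and the summed overall tally is an added convenience that the description does not spell out.

```python
def synthesize_period_moods(period_moods):
    """Order per-period mood rankings by time and add an overall tally."""
    # ISO date strings sort chronologically, so a plain sort gives the timeline.
    timeline = sorted(period_moods.items())
    overall = {}
    for _, ranking in timeline:
        for mood, count in ranking:
            overall[mood] = overall.get(mood, 0) + count
    return timeline, overall


# Assumed per-period results keyed by the date of the crest period.
period_moods = {
    "2015-08-20": [("sadness", 5), ("sympathy", 2)],
    "2015-08-03": [("happiness", 8), ("admiration", 3)],
}
timeline, overall = synthesize_period_moods(period_moods)
```

The `timeline` list gives the mood distribution of each crest period in time order, which is the part that shows how the mood toward the object changes over time.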
According to another aspect of the present disclosure, an information processing method is provided. Fig. 4 is a flow chart showing an example flow of the information processing method 400 according to an embodiment of the present disclosure. As shown in Fig. 4, the information processing method 400 includes the following steps: an information acquiring step S401 of obtaining, from an information source, a plurality of pieces of information that relate to a perpetual object and each carry a time tag; a sequence generation step S402 of generating a time series of the information based on the time tags; a crest detecting step S403 of performing detection on the time series to obtain the crest periods of the time series; and an object event detecting step S404 of performing detection on the information in the crest periods to obtain the event related to the perpetual object. The object event detecting step S404 can include: a clustering step S4041 of clustering, for each crest period of the time series, the information in that crest period; and a timeslot event detecting step S4042 of detecting, for each crest period, the event related to that crest period based on the information in the cluster that contains the largest number of pieces of information in the cluster result of the clustering.
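Steps S402 and S403 can be illustrated with the following minimal sketch. It assumes daily granularity and a simple rule, a day counts as a crest period when its message count exceeds the mean by `k` standard deviations, which is an assumption made for illustration and not the crest detection technique of the disclosure. The function names and sample data are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def build_daily_series(messages):
    """S402 sketch: count messages per day from (date_string, text) pairs."""
    return Counter(day for day, _ in messages)


def detect_crest_periods(series, k=1.0):
    """S403 sketch (assumed rule): days whose count exceeds mean + k * std."""
    counts = list(series.values())
    threshold = mean(counts) + k * pstdev(counts)
    return sorted(day for day, n in series.items() if n > threshold)


# Assumed time-tagged messages about one object: a burst on 2015-08-03.
messages = (
    [("2015-08-01", "a")] * 2
    + [("2015-08-02", "b")] * 3
    + [("2015-08-03", "c")] * 20
    + [("2015-08-04", "d")] * 2
)
series = build_daily_series(messages)
crests = detect_crest_periods(series)
```

With this data only the day of the burst is returned as a crest period; the messages of that day would then be passed on to clustering (S4041) and event detection (S4042).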
The information processing method 400 and its steps S401-S404 can include the various processes performed in the information processor 10 and its corresponding units 101-104 and 104' described above with reference to Fig. 1 and Fig. 2, and can achieve effects similar to those of the corresponding units described with reference to Fig. 1 and Fig. 2; the details of these processes and effects are not repeated here.
In addition, Fig. 5 shows an example flow of the object event detecting step in the information processing method according to an embodiment of the present disclosure.
As shown in Fig. 5, in a preferred embodiment, in addition to a clustering step S4041 and a timeslot event detecting step S4042 similar to those of the object event detecting step S404 in Fig. 4, the object event detecting step S404' can also include, before those steps, a term vector representation step S4040 of expressing the information in each crest period as term vectors for use in clustering the information in that crest period.
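A term vector representation of this kind can be sketched as a sparse term-frequency vector. This is a minimal sketch assuming whitespace tokenization, which would have to be replaced by word segmentation for Chinese microblog text; the function name `to_term_vector` and the sample texts are hypothetical.

```python
from collections import Counter


def to_term_vector(text):
    """Represent one piece of information as a sparse term-frequency vector."""
    # Whitespace tokenization is an assumption made for this illustration.
    return Counter(text.lower().split())


texts = ["new album released", "album tour announced new dates"]
vectors = [to_term_vector(t) for t in texts]
```

Each vector maps a term to its count in the message, and terms that do not occur have an implicit count of zero, which is convenient for the similarity computations used in clustering.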
In a preferred embodiment, the clustering in the clustering step S4041 is performed using a threshold-based automatic clustering method.
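The description does not detail the threshold-based method at this point. As one plausible illustration, the sketch below uses single-pass clustering with cosine similarity: a message joins the most similar existing cluster if the similarity reaches the threshold, and otherwise starts a new cluster, so the number of clusters does not have to be fixed in advance. The function names, the threshold value of 0.5 and the sample texts are assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def threshold_cluster(vectors, threshold=0.5):
    """Single-pass clustering: join the best cluster at or above the threshold,
    otherwise open a new cluster."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [indices]}
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            # The summed vector points in the centroid direction, which is all
            # that cosine similarity needs.
            best["centroid"].update(vec)
            best["members"].append(i)
    return clusters


texts = [
    "new album released today",
    "album released today finally",
    "traffic jam downtown",
]
vectors = [Counter(t.split()) for t in texts]
clusters = threshold_cluster(vectors)
largest = max(clusters, key=lambda c: len(c["members"]))
```

The cluster selected as `largest` is the one that contains the most pieces of information, which is the cluster the subsequent event detection and mood analysis work on.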
In a preferred embodiment, the timeslot event detecting step S4042 can include a keyword extraction step S4042-1 of extracting, for each crest period, keywords from the information in the cluster that contains the largest number of pieces of information in the cluster result, as the event related to that crest period.
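Keyword extraction from the largest cluster can be sketched with plain term frequency. The choice of term frequency, the small `STOPWORDS` set, the function name `extract_keywords` and the sample messages are all assumptions for illustration; the description at this point does not fix a particular keyword extraction technique.

```python
from collections import Counter

# Illustrative stopword list only; a real list would be much larger.
STOPWORDS = {"the", "a", "of", "is", "at", "to", "in"}


def extract_keywords(cluster_messages, top_n=3):
    """Return the top_n most frequent non-stopword terms in a cluster."""
    counts = Counter(
        word
        for message in cluster_messages
        for word in message.lower().split()
        if word not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(top_n)]


# Assumed content of the largest cluster of one crest period.
largest_cluster = [
    "the concert tickets sold out",
    "concert tickets gone in minutes",
    "got tickets to the concert",
]
keywords = extract_keywords(largest_cluster, top_n=2)
```

The returned keywords serve as a compact description of the event related to the crest period.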
In a preferred embodiment, the object event detecting step S404' can also include a timeslot event synthesis step S4043 of synthesizing the detected events related to each crest period of the time series, as the event related to the perpetual object.
The example flow of the object event detecting step S404' has been described above with reference to Fig. 5. It should be noted that although the term vector representation step S4040, the keyword extraction step S4042-1 and the timeslot event synthesis step S4043 are shown together in one drawing, this illustration is only an example; these steps can be implemented together in one preferred embodiment, or implemented independently of one another in separate preferred embodiments.
In addition, the object event detecting step S404' and the steps it includes can include the various processes performed in the object event detector unit 104' and its component units described above with reference to Fig. 2, and can achieve effects similar to those of the corresponding units described with reference to Fig. 2; the details of these processes and effects are not repeated here.
Fig. 6 is a flow chart showing another example flow of the information processing method according to an embodiment of the present disclosure.
As shown in Fig. 6, in a preferred embodiment, in addition to the information acquiring step S401, the sequence generation step S402, the crest detecting step S403 and the object event detecting step S404 (or the object event detecting step S404' of Fig. 5) included in the information processing method 400 of Fig. 4, the information processing method 400' can also include a subjects' mood analytical step S405 of performing mood analysis on the information in the crest periods to obtain the mood related to the perpetual object.
The information processing method 400' and its steps can include the various processes performed in the information processor 10' and its corresponding units described above with reference to Fig. 3, and can achieve effects similar to those of the corresponding units described with reference to Fig. 3; the details of these processes and effects are not repeated here.
In a preferred embodiment, the subjects' mood analytical step S405 can include a period mood analytical step S4051 of performing, for each crest period, mood analysis on the information in the cluster that contains the largest number of pieces of information in the cluster result of the clustering, to obtain the mood related to that crest period.
In a preferred embodiment, the period mood analytical step S4051 performs the mood analysis using a mood dictionary obtained in advance or a mood analysis model trained in advance.
In a preferred embodiment, the subjects' mood analytical step S405 can also include a period mood synthesis step S4052 of synthesizing the obtained moods related to each crest period of the time series, to obtain the mood related to the perpetual object.
The information processor according to the embodiments of the present disclosure described above (for example the information processors 10 and 10' shown in Fig. 1 and Fig. 3) and each of its component units can be configured by software, firmware, hardware or any combination thereof. When implemented by software or firmware, a program constituting the software or firmware can be installed, from a storage medium or a network, on a machine having a dedicated hardware structure; with the various programs installed, the machine can perform the various functions of the component units described above.
Fig. 7 is a block diagram showing one possible hardware configuration that can be used to implement the information processor and method according to the embodiments of the present disclosure.
In Fig. 7, a central processing unit (CPU) 701 performs various processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage part 708 into a random access memory (RAM) 703. The RAM 703 also stores, as needed, the data required when the CPU 701 performs the various processes. The CPU 701, the ROM 702 and the RAM 703 are connected to one another via a bus 704. An input/output interface 705 is also connected to the bus 704.
The following components are also connected to the input/output interface 705: an input part 706 (including a keyboard, a mouse and the like), an output part 707 (including a display, such as a cathode-ray tube (CRT) or liquid crystal display (LCD), and a loudspeaker and the like), the storage part 708 (including a hard disk and the like), and a communication part 709 (including a network interface card such as a LAN card, a modem and the like). The communication part 709 performs communication processing via a network such as the Internet. A driver 710 can also be connected to the input/output interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, can be mounted on the driver 710 as needed, so that a computer program read from it can be installed in the storage part 708 as needed.
In addition, the present disclosure also proposes a program product storing machine-readable instruction code. When the instruction code is read and executed by a machine, the information processing method according to the embodiments of the present disclosure described above can be performed. Accordingly, the various storage media for carrying such a program product, such as magnetic disks, optical discs, magneto-optical disks and semiconductor memories, are also included in the present disclosure.
In the above description of specific embodiments of the present disclosure, features described and/or illustrated for one embodiment can be used in the same or a similar way in one or more other embodiments, combined with features of other embodiments, or substituted for features of other embodiments.
In addition, the methods of the embodiments of the present disclosure are not limited to being performed in the chronological order described in the specification or shown in the drawings; they can also be performed in another chronological order, in parallel or independently. Therefore, the order of execution of the methods described in this specification does not limit the technical scope of the present disclosure.
It should further be understood that each operation of the above method according to the present disclosure can also be implemented in the form of a computer-executable program stored in various machine-readable storage media.
Moreover, the object of the present disclosure can also be achieved as follows: a storage medium storing the above executable program code is supplied, directly or indirectly, to a system or device, and a computer or central processing unit (CPU) in the system or device reads and executes the program code. In this case, as long as the system or device has the function of executing a program, the embodiments of the present disclosure are not limited to a program, and the program can take any form, for example an object program, a program executed by an interpreter, or a script program supplied to an operating system.
The machine-readable storage media mentioned above include, but are not limited to: various memories and storage units, semiconductor devices, disk units such as optical, magnetic and magneto-optical disks, and other media suitable for storing information.
In addition, the embodiments of the present disclosure can also be implemented by a client information processing terminal connecting to a corresponding website on the Internet, downloading the computer program code according to the present disclosure, installing it in the information processing terminal, and then executing the program.
In summary, according to the embodiments of the present disclosure, the present disclosure provides the following schemes, without being limited thereto:
Scheme 1. An information processor, including:
an information acquisition unit, which obtains from an information source a plurality of pieces of information that relate to a perpetual object and each carry a time tag;
a sequence generating unit, which generates a time series of the information based on the time tags;
a wave-peak detection unit, which performs detection on the time series to obtain the crest periods of the time series; and
an object event detector unit, which performs detection on the information in the crest periods to obtain the event related to the perpetual object, the object event detector unit including:
a cluster cell, which, for each crest period of the time series, clusters the information in that crest period; and
a timeslot event detector unit, which, for each crest period, detects the event related to that crest period based on the information in the cluster that contains the largest number of pieces of information in the cluster result of the cluster cell.
Scheme 2. The information processor as described in scheme 1, wherein the cluster cell performs the clustering using a threshold-based automatic clustering method.
Scheme 3. The information processor as described in scheme 1, wherein the timeslot event detector unit includes:
a keyword extracting unit, which, for each crest period, extracts keywords from the information in the cluster that contains the largest number of pieces of information in the cluster result, as the event related to that crest period.
Scheme 4. The information processor as described in scheme 1, wherein the object event detector unit also includes:
a timeslot event synthesis unit, which synthesizes the events detected by the timeslot event detector unit for each crest period of the time series, as the event related to the perpetual object.
Scheme 5. The information processor as described in scheme 1, wherein the object event detector unit also includes:
a term vector representation unit, which expresses the information in each crest period as term vectors and provides them to the cluster cell.
Scheme 6. The information processor as described in scheme 1, also including:
a subjects' mood analytic unit, which performs mood analysis on the information in the crest periods to obtain the mood related to the perpetual object.
Scheme 7. The information processor as described in scheme 6, wherein the subjects' mood analytic unit includes:
a period mood analytic unit, which, for each crest period, performs mood analysis on the information in the cluster that contains the largest number of pieces of information in the cluster result of the cluster cell, to obtain the mood related to that crest period.
Scheme 8. The information processor as described in scheme 7, wherein the period mood analytic unit performs the mood analysis using a mood dictionary obtained in advance or a mood analysis model trained in advance.
Scheme 9. The information processor as described in scheme 7, wherein the subjects' mood analytic unit also includes:
a period mood synthesis unit, which synthesizes the moods obtained by the period mood analytic unit for each crest period of the time series, to obtain the mood related to the perpetual object.
Scheme 10. The information processor as described in scheme 1, wherein the wave-peak detection unit obtains the crest periods using a time-series crest detection technique.
Scheme 11. The information processor as described in scheme 1, wherein the wave-peak detection unit obtains the crest periods in units of days.
Scheme 12. An information processing method, including:
obtaining, from an information source, a plurality of pieces of information that relate to a perpetual object and each carry a time tag;
generating a time series of the information based on the time tags;
performing detection on the time series to obtain the crest periods of the time series; and
performing detection on the information in the crest periods to obtain the event related to the perpetual object, wherein performing detection on the information in the crest periods includes:
for each crest period of the time series, clustering the information in that crest period; and
for each crest period, detecting the event related to that crest period based on the information in the cluster that contains the largest number of pieces of information in the cluster result of the clustering.
Scheme 13. The information processing method as described in scheme 12, wherein the clustering is performed using a threshold-based automatic clustering method.
Scheme 14. The information processing method as described in scheme 12, wherein detecting the event related to each crest period includes:
for each crest period, extracting keywords from the information in the cluster that contains the largest number of pieces of information in the cluster result, as the event related to that crest period.
Scheme 15. The information processing method as described in scheme 12, wherein performing detection on the information in the crest periods also includes:
synthesizing the detected events related to each crest period of the time series, as the event related to the perpetual object.
Scheme 16. The information processing method as described in scheme 12, wherein performing detection on the information in the crest periods also includes:
expressing the information in each crest period as term vectors, for use in clustering the information in that crest period.
Scheme 17. The information processing method as described in scheme 12, also including:
performing mood analysis on the information in the crest periods to obtain the mood related to the perpetual object.
Scheme 18. The information processing method as described in scheme 17, wherein performing mood analysis on the information in the crest periods includes:
for each crest period, performing mood analysis on the information in the cluster that contains the largest number of pieces of information in the cluster result of the clustering, to obtain the mood related to that crest period.
Scheme 19. The information processing method as described in scheme 18, wherein the mood analysis is performed using a mood dictionary obtained in advance or a mood analysis model trained in advance.
Scheme 20. A computer-readable recording medium storing a computer program executable by a computing device, the computer program, when executed, causing the computing device to perform an information processing method, the information processing method including:
obtaining, from an information source, a plurality of pieces of information that relate to a perpetual object and each carry a time tag;
generating a time series of the information based on the time tags;
performing detection on the time series to obtain the crest periods of the time series; and
performing detection on the information in the crest periods to obtain the event related to the perpetual object, wherein performing detection on the information in the crest periods includes:
for each crest period of the time series, clustering the information in that crest period; and
for each crest period, detecting the event related to that crest period based on the information in the cluster that contains the largest number of pieces of information in the cluster result of the clustering.
Finally, it should also be noted that, in the present disclosure, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "including", "comprising" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article or device that includes the element.
Although the present disclosure has been disclosed above through the description of its specific embodiments, it should be understood that those skilled in the art can devise various modifications, improvements or equivalents of the present disclosure within the spirit and scope of the appended claims. Such modifications, improvements or equivalents should also be considered to fall within the scope claimed by the present disclosure.
Claims (10)
1. An information processor, including:
an information acquisition unit, which obtains from an information source a plurality of pieces of information that relate to a perpetual object and each carry a time tag;
a sequence generating unit, which generates a time series of the information based on the time tags;
a wave-peak detection unit, which performs detection on the time series to obtain the crest periods of the time series; and
an object event detector unit, which performs detection on the information in the crest periods to obtain the event related to the perpetual object, the object event detector unit including:
a cluster cell, which, for each crest period of the time series, clusters the information in that crest period; and
a timeslot event detector unit, which, for each crest period, detects the event related to that crest period based on the information in the cluster that contains the largest number of pieces of information in the cluster result of the cluster cell.
2. The information processor as claimed in claim 1, wherein the cluster cell performs the clustering using a threshold-based automatic clustering method.
3. The information processor as claimed in claim 1, wherein the timeslot event detector unit includes:
a keyword extracting unit, which, for each crest period, extracts keywords from the information in the cluster that contains the largest number of pieces of information in the cluster result, as the event related to that crest period.
4. The information processor as claimed in claim 1, wherein the object event detector unit also includes:
a timeslot event synthesis unit, which synthesizes the events detected by the timeslot event detector unit for each crest period of the time series, as the event related to the perpetual object.
5. The information processor as claimed in claim 1, wherein the object event detector unit also includes:
a term vector representation unit, which expresses the information in each crest period as term vectors and provides them to the cluster cell.
6. The information processor as claimed in claim 1, also including:
a subjects' mood analytic unit, which performs mood analysis on the information in the crest periods to obtain the mood related to the perpetual object.
7. The information processor as claimed in claim 6, wherein the subjects' mood analytic unit includes:
a period mood analytic unit, which, for each crest period, performs mood analysis on the information in the cluster that contains the largest number of pieces of information in the cluster result of the cluster cell, to obtain the mood related to that crest period.
8. The information processor as claimed in claim 7, wherein the period mood analytic unit performs the mood analysis using a mood dictionary obtained in advance or a mood analysis model trained in advance.
9. The information processor as claimed in claim 7, wherein the subjects' mood analytic unit also includes:
a period mood synthesis unit, which synthesizes the moods obtained by the period mood analytic unit for each crest period of the time series, to obtain the mood related to the perpetual object.
10. An information processing method, including:
obtaining, from an information source, a plurality of pieces of information that relate to a perpetual object and each carry a time tag;
generating a time series of the information based on the time tags;
performing detection on the time series to obtain the crest periods of the time series; and
performing detection on the information in the crest periods to obtain the event related to the perpetual object, wherein performing detection on the information in the crest periods includes:
for each crest period of the time series, clustering the information in that crest period; and
for each crest period, detecting the event related to that crest period based on the information in the cluster that contains the largest number of pieces of information in the cluster result of the clustering.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510547792.9A CN106484724A (en) | 2015-08-31 | 2015-08-31 | Information processor and information processing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106484724A (en) | 2017-03-08 |
Family
ID=58236191
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510547792.9A (Pending; published as CN106484724A) | Information processor and information processing method | 2015-08-31 | 2015-08-31 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106484724A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107133224A (en) * | 2017-04-25 | 2017-09-05 | 中国人民大学 | A kind of language generation method based on descriptor |
CN107402742A (en) * | 2017-08-04 | 2017-11-28 | 北京京东尚科信息技术有限公司 | Information-pushing method and device |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1904893A (en) * | 2005-07-26 | 2007-01-31 | 兄弟工业株式会社 | Information management system, information processing device, and program |
CN102646114A (en) * | 2012-02-17 | 2012-08-22 | 清华大学 | News topic timeline abstract generating method based on breakthrough point |
CN103514167A (en) * | 2012-06-15 | 2014-01-15 | 富士通株式会社 | Data processing method and device |
CN103559176A (en) * | 2012-10-29 | 2014-02-05 | 中国人民解放军国防科学技术大学 | Microblog emotional evolution analysis method and system |
CN103870474A (en) * | 2012-12-11 | 2014-06-18 | 北京百度网讯科技有限公司 | News topic organizing method and device |
CN103955505A (en) * | 2014-04-24 | 2014-07-30 | 中国科学院信息工程研究所 | Micro-blog-based real-time event monitoring method and system |
CN104199974A (en) * | 2013-09-22 | 2014-12-10 | 中科嘉速(北京)并行软件有限公司 | Microblog-oriented dynamic topic detection and evolution tracking method |
CN104536956A (en) * | 2014-07-23 | 2015-04-22 | 中国科学院计算技术研究所 | A Microblog platform based event visualization method and system |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107133224A (en) * | 2017-04-25 | 2017-09-05 | 中国人民大学 | A kind of language generation method based on descriptor |
CN107133224B (en) * | 2017-04-25 | 2020-11-03 | 中国人民大学 | Language generation method based on subject word |
CN107402742A (en) * | 2017-08-04 | 2017-11-28 | 北京京东尚科信息技术有限公司 | Information-pushing method and device |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170308 |