CN106484724A - Information processor and information processing method - Google Patents

Information processor and information processing method

Info

Publication number
CN106484724A
Authority
CN
China
Prior art keywords
information
period
cluster
relevant
mood
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510547792.9A
Other languages
Chinese (zh)
Inventor
宋双永
孟遥
缪庆亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to CN201510547792.9A
Publication of CN106484724A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The present disclosure provides an information processing apparatus and an information processing method. The information processing apparatus includes: an information acquisition unit that obtains, from an information source, a plurality of pieces of information that relate to an object of interest and each carry a time tag; a sequence generation unit that generates a time series of the information based on the time tags; a crest detection unit that performs detection on the time series to obtain the crest periods of the time series; and an object event detection unit that performs detection on the information in the crest periods to obtain events related to the object of interest. The object event detection unit includes: a clustering unit that, for each crest period of the time series, clusters the information in that crest period; and a period event detection unit that, for each crest period, detects an event related to that crest period based on the information in the cluster that contains the largest number of pieces of information in the clustering result of the clustering unit.

Description

Information processor and information processing method
Technical field
The present disclosure relates generally to the field of information processing, and in particular to an information processing apparatus and an information processing method.
Background
At present, micro-blogs such as Weibo and Twitter are attracting more and more attention and have become popular platforms for obtaining information online. However, in the field of Internet data mining, finding the important content related to an object of interest in an information source such as a micro-blog, which holds a massive amount of information from a large number of users, remains difficult. Information related to the object of interest can be obtained with a general-purpose search engine or the like, but that information may be cluttered, scattered, and repetitive, so a user cannot gain a good understanding of the object of interest in a short time by reading it directly.
It is therefore desirable to extract the important content related to an object of interest from the massive information of an information source accurately and efficiently.
Summary of the invention
A brief summary of the present invention is given below to provide a basic understanding of some of its aspects. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical parts of the invention, nor to limit its scope. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description given later.
In view of the above drawbacks of the prior art, one object of the present invention is to provide an information processing apparatus and an information processing method capable of obtaining events related to an object of interest, so as to overcome at least the existing problems.
According to one aspect of the present disclosure, an information processing apparatus is provided, including: an information acquisition unit that obtains, from an information source, a plurality of pieces of information that relate to an object of interest and each carry a time tag; a sequence generation unit that generates a time series of the information based on the time tags; a crest detection unit that performs detection on the time series to obtain the crest periods of the time series; and an object event detection unit that performs detection on the information in the crest periods to obtain events related to the object of interest. The object event detection unit includes: a clustering unit that, for each crest period of the time series, clusters the information in that crest period; and a period event detection unit that, for each crest period, detects an event related to that crest period based on the information in the cluster that contains the largest number of pieces of information in the clustering result of the clustering unit.
According to another aspect of the present disclosure, an information processing method is provided, including the steps of: obtaining, from an information source, a plurality of pieces of information that relate to an object of interest and each carry a time tag; generating a time series of the information based on the time tags; performing detection on the time series to obtain the crest periods of the time series; and performing detection on the information in the crest periods to obtain events related to the object of interest. Performing detection on the information in the crest periods includes: for each crest period of the time series, clustering the information in that crest period; and, for each crest period, detecting an event related to that crest period based on the information in the cluster that contains the largest number of pieces of information in the clustering result.
According to a further aspect of the present disclosure, there is also provided a program that causes a computer to function as the information processing apparatus described above.
According to yet another aspect of the present disclosure, there is also provided a corresponding computer-readable storage medium storing a computer program executable by a computing device, the computer program, when executed, causing the computing device to perform the information processing method described above.
The above aspects of the embodiments of the present disclosure provide at least the following benefit: using a time series of time-tagged information that is obtained from an information source and relates to an object of interest, events related to the object of interest are obtained based on the information in the crest periods of the time series, so that important content related to the object of interest can be extracted accurately and efficiently from the large volume of information in the information source, while balancing the coverage and the conciseness of the extracted content.
These and other advantages of the present disclosure will become more apparent from the following detailed description of its preferred embodiments taken in conjunction with the accompanying drawings.
Brief description of the drawings
The present disclosure may be better understood by reference to the description given below in conjunction with the accompanying drawings, in which the same or similar reference numerals are used throughout to denote the same or similar parts. The drawings, together with the detailed description below, are included in and form part of this specification, and serve to further illustrate preferred embodiments of the present disclosure and to explain its principles and advantages. In the drawings:
Fig. 1 is a block diagram schematically showing an example structure of an information processing apparatus according to an embodiment of the present disclosure.
Fig. 2 is a block diagram schematically showing an example structure of the object event detection unit in the information processing apparatus according to an embodiment of the present disclosure.
Fig. 3 is a block diagram schematically showing another example structure of the information processing apparatus according to an embodiment of the present disclosure.
Fig. 4 is a flowchart showing an example flow of an information processing method according to an embodiment of the present disclosure.
Fig. 5 is a flowchart showing an example flow of the object event detection step in the information processing method according to an embodiment of the present disclosure.
Fig. 6 is a flowchart showing another example flow of the information processing method according to an embodiment of the present disclosure.
Fig. 7 is a structural diagram showing one possible hardware configuration that can be used to implement the information processing apparatus and method according to an embodiment of the present disclosure.
Detailed description
Exemplary embodiments of the present invention are described below in conjunction with the accompanying drawings. For the sake of clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that in developing any such actual implementation, many implementation-specific decisions must be made in order to achieve the developer's specific goals, for example compliance with system-related and business-related constraints, and that these constraints may vary from one implementation to another. It should also be understood that, although such development work may be complex and time-consuming, it is merely a routine task for those skilled in the art having the benefit of this disclosure.
It should also be noted here that, to avoid obscuring the present invention with unnecessary detail, the drawings show only the apparatus structures and/or processing steps closely related to the solution according to the present invention, and omit other details of little relevance to it.
The inventors have found that, in existing data mining, information related to an object of interest (for example, a person of interest) is typically obtained from an information source (for example, a micro-blog) through a search engine or the like, but that information may be cluttered, scattered, and repetitive, so a user cannot gain a good understanding of the object of interest in a short time by reading it directly. In addition, when the object of interest is a person, related information can also be obtained from the content that the person publishes in the information source (for example, micro-blog posts published by the person), but this approach likewise fails to yield the important information directly.
On this basis, the present disclosure proposes an information processing apparatus and an information processing method capable of obtaining events related to an object of interest. A time series of time-tagged information related to the object of interest is obtained from an information source, and events related to the object of interest are obtained based on the information in the crest periods of the time series. Important content related to the object of interest can thus be extracted accurately and efficiently from the massive information of the information source, while balancing the coverage and the conciseness of the extracted content.
According to one aspect of the present disclosure, an information processing apparatus is provided. Fig. 1 is a block diagram schematically showing an example structure of an information processing apparatus according to an embodiment of the present disclosure.
As shown in Fig. 1, the information processing apparatus 10 includes: an information acquisition unit 101 that obtains, from an information source, a plurality of pieces of information that relate to an object of interest and each carry a time tag; a sequence generation unit 102 that generates a time series of the information based on the time tags; a crest detection unit 103 that performs detection on the time series to obtain the crest periods of the time series; and an object event detection unit 104 that performs detection on the information in the crest periods to obtain events related to the object of interest. The object event detection unit 104 includes: a clustering unit 1041 that, for each crest period of the time series, clusters the information in that crest period; and a period event detection unit 1042 that, for each crest period, detects an event related to that crest period based on the information in the cluster that contains the largest number of pieces of information in the clustering result of the clustering unit 1041.
For ease of explanation, the following description uses a micro-blog as the example information source. The present disclosure is of course not limited to this example, and is applicable to any information source that includes information with a time attribute.
With a micro-blog as the example information source, the time-tagged information related to the object of interest that the information acquisition unit 101 obtains may be micro-blog posts related to the object of interest obtained by searching or similar means, for example posts containing a name of the object of interest, such as the real name, nickname, or user name of an object of interest who is a person. The time series generated by the sequence generation unit 102 may be a time series of the posts obtained in this way, based on the publication time of each post.
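As an illustration only, the time series built from publication times can be sketched as a per-day post count. The function name `build_daily_series` and the `(timestamp, text)` input format are assumptions made for this sketch and do not come from the disclosure; days without any post are simply absent from the result.

```python
from collections import Counter
from datetime import datetime


def build_daily_series(posts):
    """Count posts per calendar day.

    `posts` is an iterable of (timestamp, text) pairs (an assumed format).
    Returns a list of (date, count) pairs sorted by date.
    """
    counts = Counter(ts.date() for ts, _ in posts)
    return sorted(counts.items())


# Hypothetical sample posts mentioning the object of interest.
posts = [
    (datetime(2015, 8, 1, 9, 30), "post a"),
    (datetime(2015, 8, 1, 18, 5), "post b"),
    (datetime(2015, 8, 3, 12, 0), "post c"),
]
series = build_daily_series(posts)
for day, count in series:
    print(day, count)
```

Grouping by `ts.date()` fixes the time unit at one day; a different unit would only change the grouping key.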
In a preferred embodiment, the crest detection unit 103 may obtain the crest periods of the time series using a burst detection technique. For example, the crest detection unit 103 may use the burst detection technique based on burst levels ("bursty level") proposed by Jon Kleinberg in 2002. For details of the technique, see Jon Kleinberg, "Bursty and Hierarchical Structure in Streams", KDD 2002: 91-101, which is incorporated herein by reference. Various other suitable methods known in the art may of course be used to obtain the crest periods of the time series, and are not described in detail here.
A crest period obtained by the crest detection unit 103 through the above detection may be a period in which the number of pieces of information related to the object of interest within a given time range satisfies a certain condition (for example, the number exceeds that of other periods by a given ratio, or exceeds a given threshold). The large amount of related information in the information source during a crest period reflects a high level of attention to the object of interest, so a crest period can be regarded as a period of peak attention to the object of interest. Again taking a micro-blog as the example information source, a crest period can be regarded as a period in which micro-blog users' attention to, or interest in, the given object of interest peaks.
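The simple count-based condition mentioned above can be sketched as follows. This is not Kleinberg's burst detection algorithm; it is a deliberately simplified stand-in that flags a time unit as a crest when its count exceeds the mean count by a given ratio. The function name `detect_crest_periods` and the default `ratio=2.0` are assumptions of this sketch.

```python
def detect_crest_periods(counts, ratio=2.0):
    """Return the indices of time units whose count exceeds ratio * mean.

    Simplified threshold rule for illustration; `ratio` is an assumed
    parameter, not a value given in the disclosure.
    """
    if not counts:
        return []
    mean = sum(counts) / len(counts)
    return [i for i, c in enumerate(counts) if c > ratio * mean]


# Hypothetical daily post counts; the mean is 12, so the cut-off is 24.
daily_counts = [3, 4, 2, 40, 5, 3, 35, 4]
crests = detect_crest_periods(daily_counts)
print(crests)
```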
In a preferred embodiment, the crest detection unit 103 may obtain the crest periods in units of days. For many information sources, obtaining the crest periods in units of days gives better results. The crest detection unit 103 may of course also obtain the crest periods in other time units such as hours or weeks. Those skilled in the art can select a suitable time unit according to actual needs.
After the crest periods of the time series are obtained, the clustering unit 1041 of the object event detection unit 104 clusters, for each crest period, the information in that crest period. Through this clustering, different clusters are obtained according to the correlation among the pieces of information in each crest period. Accordingly, the period event detection unit 1042 can, for each crest period, detect an event related to that crest period based on the information in the cluster containing the largest number of pieces of information in the clustering result. In this way, the event related to a crest period is obtained from the pieces of information that fall in a crest period of high attention to the object of interest and are highly correlated with one another.
The event related to a crest period obtained by the object event detection unit 104 in the above manner can be regarded as the cause of that crest period in the information source, and is therefore important content related to the object of interest. Meanwhile, in obtaining the events related to the crest periods, the object event detection unit 104 filters out the information in non-crest periods and the information within crest periods that has low correlation with other information, that is, it filters out content that is of low importance and scattered. Therefore, the information processing apparatus 10 of this embodiment can extract important content related to the object of interest accurately and efficiently from the large volume of information in the information source, while balancing the coverage and the conciseness of the extracted content.
An example structure of the object event detection unit in the information processing apparatus according to an embodiment of the present disclosure is described below with reference to Fig. 2.
Fig. 2 is a block diagram schematically showing an example structure of the object event detection unit in the information processing apparatus according to an embodiment of the present disclosure.
As shown in Fig. 2, in a preferred embodiment, in addition to a clustering unit 1041 and a period event detection unit 1042 similar to those in the object event detection unit 104 of Fig. 1, the object event detection unit 104' may further include a term vector representation unit 1040, which represents the information in each crest period as term vectors to be supplied to the clustering unit 1041.
As an example, the term vector representation unit 1040 may segment each piece of information in a given crest period into words and represent each piece of information as a term vector, where the term vector space consists of all the words in the set of text information obtained from the information source.
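A minimal sketch of such a term vector representation is given below. It assumes the posts have already been segmented into words, and it uses raw term counts as vector components; the disclosure does not specify the weighting, so the counts, like the function name `build_term_vectors`, are assumptions of this sketch.

```python
def build_term_vectors(tokenized_docs):
    """Map each tokenized document to a term-count vector.

    The vocabulary is every distinct word in the collection, sorted so
    that the dimension order is reproducible.
    """
    vocab = sorted({w for doc in tokenized_docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for doc in tokenized_docs:
        vec = [0] * len(vocab)
        for w in doc:
            vec[index[w]] += 1
        vectors.append(vec)
    return vocab, vectors


# Hypothetical, already segmented posts from one crest period.
docs = [["new", "film", "release"], ["film", "award"], ["award", "award", "speech"]]
vocab, vectors = build_term_vectors(docs)
print(vocab)
print(vectors)
```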
In a preferred embodiment, the clustering unit 1041 may cluster the information in each crest period using a threshold-based automatic clustering method. In an information source, the information related to the object of interest may have different granularity in different periods, that is, multiple pieces of information such as micro-blog posts may differ in how dispersed their content is. For example, the posts related to the object of interest obtained from the information source in one period may concern several events related to the object of interest, so the topics of the posts are more dispersed; whereas the posts related to the object of interest in another period may all revolve around a single event related to the object of interest, so the topics of the posts are more concentrated. The content granularity of the posts in these two cases differs greatly, and general clustering methods designed for a uniform clustering granularity cannot handle them well.
The threshold-based automatic clustering method provided by this preferred embodiment can automatically set the clustering threshold for a given period according to the actual conditions of that period, so as to adapt to the granularity of the information content in different periods and obtain good clustering results.
In a specific example, suppose that a given crest period contains x pieces of information related to the object of interest and that one term vector is obtained from each piece of information. Pairing the x term vectors two by two gives $x(x-1)/2$ pairs in total. The clustering unit 1041 may compute the Euclidean distance between the term vectors of each pair to obtain the average Euclidean distance, and then multiply it by a predetermined weight parameter to obtain a clustering threshold adapted to these x term vectors.
For example, the clustering threshold δ based on the Euclidean distance can be expressed by the following formula (1).
$$\delta = w \cdot \frac{2}{x(x-1)} \sum_{i=1}^{x-1} \sum_{j=i+1}^{x} Ed\big(S(m_i), S(m_j)\big) \tag{1}$$
Here, w is the weight parameter, which is greater than 0 and less than or equal to 1 and is preferably 0.9; x is the number of term vectors; $S(m_i)$ and $S(m_j)$ denote the term vectors corresponding to the i-th and j-th micro-blog posts $m_i$ and $m_j$, respectively; and $Ed(S(m_i), S(m_j))$ denotes the Euclidean distance between the term vectors $S(m_i)$ and $S(m_j)$.
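The threshold computation described here (the weight parameter times the average pairwise Euclidean distance) can be sketched as follows. The function names are assumptions of this sketch, as is the choice to return 0.0 when fewer than two vectors are given, a case the disclosure does not address.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def adaptive_threshold(vectors, w=0.9):
    """Weight w times the mean Euclidean distance over all vector pairs."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0  # assumed behaviour for fewer than two vectors
    mean_dist = sum(euclidean(a, b) for a, b in pairs) / len(pairs)
    return w * mean_dist


# Pairwise distances are 5, 10 and 5, so the mean is 20/3.
vectors = [[0, 0], [3, 4], [6, 8]]
delta = adaptive_threshold(vectors)
print(delta)
```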
After the adaptive clustering threshold δ is determined, the clustering unit 1041 may cluster the term vectors based on this threshold using the following method.
(1) First, the clustering unit 1041 randomly selects one term vector from the x term vectors of the given crest period and takes it as a new cluster;
(2) Then, it randomly selects a term vector that has not yet been clustered, and computes, for each existing cluster, the mean of the Euclidean distances between this term vector and all the term vectors in that cluster;
(3) If the mean of the Euclidean distances between this term vector and all the term vectors in an existing cluster is less than the adaptive clustering threshold, the term vector is added to that existing cluster;
(4) If, for every existing cluster, the mean of the Euclidean distances between this term vector and all the term vectors in the cluster is greater than or equal to the adaptive clustering threshold, the term vector is taken as a new cluster.
Steps (2)-(4) are repeated until all term vectors have been clustered, at which point the clustering unit 1041 has obtained the final clustering result.
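Steps (1)-(4) above can be sketched as follows. Two points are assumptions of this sketch rather than statements of the disclosure: when several existing clusters fall below the threshold, the vector joins the one with the smallest mean distance, and the random selection is implemented by visiting the vectors in a seeded shuffled order so that runs are repeatable.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def threshold_cluster(vectors, delta, seed=0):
    """Cluster vectors with a distance threshold; returns lists of indices."""
    rng = random.Random(seed)
    order = list(range(len(vectors)))
    rng.shuffle(order)  # stands in for repeatedly picking a random unclustered vector
    clusters = []
    for idx in order:
        best, best_mean = None, None
        for cluster in clusters:
            mean = sum(euclidean(vectors[idx], vectors[j]) for j in cluster) / len(cluster)
            # Assumed tie-break: join the closest cluster below the threshold.
            if mean < delta and (best_mean is None or mean < best_mean):
                best, best_mean = cluster, mean
        if best is None:
            clusters.append([idx])  # first vector, or no cluster is close enough
        else:
            best.append(idx)
    return clusters


vectors = [[0, 0], [0, 1], [10, 10], [10, 11], [1, 0]]
clusters = threshold_cluster(vectors, delta=3.0)
print(sorted(sorted(c) for c in clusters))
```

Because the result can depend on the visiting order when clusters are not well separated, a fixed seed is useful for testing.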
In addition, as shown in Fig. 2 in a preferred embodiment of object event detector unit 104 ', Timeslot event detector unit 1042 can include:Keyword extracting unit 1042-1, its are directed to each The crest period, close comprising extraction in the middle of the information in the most cluster of information bar number from the cluster result Keyword, used as the event relevant with the crest period.
Keyword extracting unit 1042-1 can be comprising most one of information bar number from cluster result Or in the middle of the information in multiple clusters, extract keyword.For purposes of illustration only, below can be by cluster result The cluster for being extracted keyword is referred to as main cause event.In a preferred exemplary, can be from each ripple One to two main cause events are selected in the cluster of peak period.For example, comprising information bar number most one Individual cluster E1(the information bar number which includes is N1) it is chosen as automatically main cause event.For including The deputy cluster E of information bar number sequence2(the information bar number which includes is N2), according to N2/N1Whether Determine whether E more than given threshold value2It is classified as main cause event.It is preferred that the threshold value is set to 0.6, If N2/N1>=0.6, then E2Will be with E1While being classified as main cause event, otherwise, only will E1Regard as main cause event.It is appreciated that the mode of above-mentioned selection main cause event is only used In illustration, keyword extracting unit 1042-1 can using any other suitably mode select one Individual or multiple clusters are used as main cause event.
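The selection rule for main cause events can be sketched as follows, with clusters represented as non-empty lists of posts. The function name `select_main_clusters` is an assumption of this sketch; the default ratio threshold of 0.6 is the preferred value stated above.

```python
def select_main_clusters(clusters, ratio_threshold=0.6):
    """Pick the largest cluster, plus the runner-up if it is large enough.

    `clusters` is a list of non-empty lists; the runner-up is kept when
    its size divided by the size of the largest cluster is at least
    `ratio_threshold`.
    """
    if not clusters:
        return []
    ranked = sorted(clusters, key=len, reverse=True)
    selected = [ranked[0]]
    if len(ranked) > 1 and len(ranked[1]) / len(ranked[0]) >= ratio_threshold:
        selected.append(ranked[1])
    return selected


# Hypothetical clusters of sizes 2, 5 and 4; 4/5 = 0.8 passes the threshold.
clusters = [["m1", "m2"], ["m3", "m4", "m5", "m6", "m7"], ["m8", "m9", "m10", "m11"]]
main = select_main_clusters(clusters)
print([len(c) for c in main])
```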
For a main cause event selected for a given crest period (that is, a cluster from which keywords are to be extracted), the keyword extraction unit 1042-1 may extract keywords using the following example method.
First, the keyword extraction unit 1042-1 may segment each piece of information in the selected cluster into words and apply part-of-speech tagging to the segmented text. The inventors have found that using unary (one-word) and binary (two-word) nominal word strings as keywords gives better results. The keyword extraction unit 1042-1 therefore extracts the unary nouns and the binary word strings containing a noun from the segmentation result, collects statistics on them, sorts them by word frequency to form a list of candidate keyword strings, and computes the importance $T_{value}$ of each unary noun or binary word string according to formula (2) below.
$$T_{value} = T_{frequency} \times T_{length} \tag{2}$$
Here, $T_{length}$ is the length of the word string, that is, the number of words it contains; $T_{frequency}$ is the number of times the word string occurs; and the importance $T_{value}$ is determined by these two factors.
Next, substring reduction is applied to the unary word strings using the binary word strings. The reduction rule is as follows: if a binary word string contains a unary word string and the importance $T_{value}$ of the binary word string is greater than the importance $T_{value}$ of the unary word string, the unary word string is merged into the binary word string; otherwise, the binary word string is removed from the word string list. This reduction retains the most suitable word strings as keyword candidates.
From all the word strings remaining after the reduction, the keyword extraction unit 1042-1 may select the K word strings with the highest importance $T_{value}$ as the keywords of the main cause event. Preferably, K may be set to 5.
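The keyword extraction procedure can be sketched as follows, under several assumptions that are not stated in the disclosure: the input is already segmented and tagged as `(word, tag)` pairs with a hypothetical noun tag `"n"`; the string length is counted in words (1 or 2); a binary word string is kept only if its importance exceeds that of every unary noun it contains, in which case those unary nouns are dropped as merged; and ties are broken alphabetically.

```python
from collections import Counter


def extract_keywords(tagged_docs, k=5, noun_tag="n"):
    """Return up to k keyword strings ranked by frequency * length."""
    freq = Counter()
    for doc in tagged_docs:
        for word, tag in doc:
            if tag == noun_tag:
                freq[(word,)] += 1  # unary noun
        for (w1, t1), (w2, t2) in zip(doc, doc[1:]):
            if noun_tag in (t1, t2):
                freq[(w1, w2)] += 1  # binary string containing a noun
    # Importance: occurrence count times length in words.
    score = {s: n * len(s) for s, n in freq.items()}
    removed = set()
    for s in score:
        if len(s) != 2:
            continue
        parts = [(w,) for w in s if (w,) in score]
        if all(score[s] > score[u] for u in parts):
            removed.update(parts)  # unary nouns merged into the binary string
        else:
            removed.add(s)  # binary string dropped
    kept = sorted((s for s in score if s not in removed), key=lambda s: (-score[s], s))
    return [" ".join(s) for s in kept[:k]]


# Hypothetical tagged posts from one main cause event.
docs = [
    [("film", "n"), ("festival", "n"), ("opens", "v")],
    [("film", "n"), ("festival", "n"), ("award", "n")],
    [("award", "n"), ("speech", "n")],
]
keywords = extract_keywords(docs)
print(keywords)
```

Here "film festival" scores 2 × 2 = 4 and absorbs "film" and "festival" (2 each), while the other binary strings do not outscore their nouns and are dropped.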
In one example, for each crest period, the keyword extraction unit 1042-1 may extract keywords from the information in more than one of the clusters containing the largest numbers of pieces of information in the clustering result (that is, more than one main cause event), and take the extracted keywords together as the event related to that crest period.
In addition, as shown in Fig. 2, in a preferred embodiment, the object event detection unit 104' may further include a period event synthesis unit 1043, which synthesizes the events related to the respective crest periods of the time series detected by the period event detection unit 1042, as the events related to the object of interest.
With the period event synthesis unit 1043, the object event detection unit 104' of this preferred embodiment can obtain not only the event related to each crest period for the object of interest, but also, from the information source as a whole, a sequence over time of events related to the object of interest. Because the overall event sequence obtained in this way is based on the crest periods related to the object of interest, it reflects how the attention paid to the object of interest in the information source changes over time, and provides the event corresponding to each crest period as the cause of the high attention to the object of interest in that period.
In a specific example, the events related to the crest periods detected by the period event detection unit 1042 may be the keywords extracted for each crest period, and the period event synthesis unit 1043 lists the keywords of the crest periods together as the events related to the object of interest.
The example structure of the object event detection unit 104' has been described above with reference to Fig. 2. It should be noted that, although the term vector representation unit 1040, the keyword extraction unit 1042-1, and the period event synthesis unit 1043 are all shown in one drawing, this illustration is merely an example; these units may be implemented together in one preferred embodiment, or may be implemented independently of one another in different preferred embodiments.
Another example structure of the information processing apparatus according to an embodiment of the present disclosure is described below with reference to Fig. 3.
Fig. 3 is a block diagram schematically showing another example structure of the information processing apparatus according to an embodiment of the present disclosure.
As shown in Fig. 3, in a preferred embodiment, in addition to the information acquisition unit 101, the sequence generation unit 102, the crest detection unit 103, and the object event detection unit 104 (or the object event detection unit 104' of Fig. 3) included in the information processing apparatus 10 of Fig. 1, the information processing apparatus 10' may further include an object mood analysis unit 105, which performs mood analysis on the information in the crest periods to obtain the mood related to the object of interest.
The object mood analysis unit 105 may obtain the mood related to the object of interest using various suitable methods. For example, it may perform mood analysis on all or part of the information in the crest periods using a mood dictionary obtained in advance or a mood analysis model trained in advance, to obtain the mood related to the object of interest.
In a preferred embodiment, the object mood analysis unit 105 may include a period mood analysis unit 1051, which, for each crest period, performs mood analysis on the information in the cluster containing the largest number of pieces of information in the clustering result of the clustering unit 1041, to obtain the mood related to that crest period. The mood related to a crest period obtained by the period mood analysis unit 1051 may correspond to the mood, reflected in the information in the information source, toward the event related to that crest period. Here, the event related to the crest period may be the event related to that crest period detected by the period event detection unit 1042 in the object event detection unit 104 or 104' described earlier with reference to Fig. 1 and Fig. 3.
The object mood analysis unit 105 may perform mood analysis on the information in the cluster containing the largest number of pieces of information in the clustering result using various suitable methods. For example, the period mood analysis unit 1051 may perform such mood analysis using a mood dictionary obtained in advance or a mood analysis model trained in advance.
Taking a mood dictionary as an example, an exemplary mood dictionary may contain 36 moods such as "happiness", "sadness", "sympathy", and "admiration", together with the common mood expression words corresponding to each mood. For the mood "happiness", for example, the corresponding common mood expression words include "pleasantly surprised", "happy", and "jubilant".
Using the mood dictionary, for the cluster containing the largest number of pieces of information in the clustering result, the period mood analysis unit 1051 may look up the occurrences of the mood words of the dictionary in all the information of that cluster. For example, for a given cluster, if a piece of information in the cluster contains a mood word belonging to a certain mood category, the count of that mood category is incremented by one, so that occurrence counts are accumulated separately for the different mood categories. The mood categories can then be ranked by count to obtain the mood analysis result for the event of that cluster.
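The dictionary-based counting described above can be sketched as follows. The two-category dictionary `MOOD_DICTIONARY` is a hypothetical English stand-in for the 36-category dictionary mentioned in the text, and the posts are assumed to be already segmented into words. Each post adds at most one to a category, however many words of that category it contains.

```python
from collections import Counter

# Hypothetical miniature mood dictionary: category -> expression words.
MOOD_DICTIONARY = {
    "happiness": {"happy", "jubilant"},
    "sadness": {"sad", "heartbroken"},
}


def mood_distribution(tokenized_posts, dictionary=MOOD_DICTIONARY):
    """Count, per mood category, the posts containing one of its words.

    Returns (category, count) pairs sorted by descending count.
    """
    counts = Counter()
    for tokens in tokenized_posts:
        token_set = set(tokens)
        for mood, words in dictionary.items():
            if token_set & words:
                counts[mood] += 1
    return counts.most_common()


posts = [
    ["so", "happy", "today"],
    ["happy", "and", "jubilant"],
    ["feeling", "heartbroken"],
]
ranking = mood_distribution(posts)
print(ranking)
```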
Taking a microblog information source as an example, the mood analysis result obtained by period mood analytic unit 1051 can be the distribution of the moods that users expressed, within a given crest period, toward the event that involves the perpetual object and is relevant to that crest period. This result can help with public relations maintenance relevant to the perpetual object, for example in applications such as building a person's public reputation or crisis public relations for unexpected incidents.
In a preferred embodiment, subjects' mood analytic unit 105 can also include a period mood synthesis unit 1052, which synthesizes the moods, obtained by period mood analytic unit 1051, that are relevant to each crest period of the time series, to obtain the mood relevant to the perpetual object.
Using period mood synthesis unit 1052, the overall mood distribution toward a given object can be obtained from the information of the information source. Again taking a microblog information source as an example, the mood relevant to the perpetual object obtained by period mood synthesis unit 1052 can be a sequence, over time, of the overall mood distributions that users expressed toward the perpetual object. Because this overall mood distribution is based on each crest period relevant to the perpetual object, it can reflect how the mood toward the perpetual object in the information source changes over time, and it is particularly useful for long-term public relations maintenance relevant to the perpetual object, such as building a person's public reputation.
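The text does not spell out how the per-period moods are synthesized. One possible reading, sketched below under that assumption, is to order the crest periods by time and normalize each period's mood counts into a distribution. The function name synthesize_moods and the use of ISO date strings as period labels are hypothetical.

```python
def synthesize_moods(period_moods):
    """Turn per-period mood counts into a time-ordered sequence of
    normalized mood distributions.

    period_moods maps a crest-period label (here an ISO date string, which
    sorts chronologically) to a dict of mood category -> count.
    """
    series = []
    for period in sorted(period_moods):
        counts = period_moods[period]
        total = sum(counts.values())
        # An empty period yields an empty distribution instead of dividing by zero.
        dist = {mood: n / total for mood, n in counts.items()} if total else {}
        series.append((period, dist))
    return series


overall = synthesize_moods({
    "2015-08-03": {"happiness": 3, "sadness": 1},
    "2015-08-01": {"sadness": 2},
})
```

Reading the resulting list from first to last entry gives the change in mood toward the object across its crest periods.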
According to another aspect of the present disclosure, an information processing method is provided. Fig. 4 is a flow chart showing an example flow of the information processing method 400 according to an embodiment of the present disclosure. As shown in Fig. 4, information processing method 400 comprises the following steps: information acquiring step S401, in which a plurality of items of information that are relevant to a perpetual object and each carry a time tag are obtained from an information source; sequence generation step S402, in which the time series of the information is generated based on the time tags; crest detecting step S403, in which detection is performed on the time series to obtain the crest periods of the time series; and object event detecting step S404, in which the information in the crest periods is examined to obtain the event relevant to the perpetual object. Object event detecting step S404 can include: clustering step S4041, in which, for each crest period of the time series, the information in that crest period is clustered; and timeslot event detecting step S4042, in which, for each crest period, the event relevant to that crest period is detected based on the information in the cluster that contains the most items of information in the clustering result.
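Steps S402 and S403 can be illustrated with a small sketch. It counts items per calendar day and flags days whose count is unusually high. The rule used here (count above mean plus k population standard deviations) is only an assumed stand-in, since this section does not specify the crest detection technique; the function names and the parameter k are likewise hypothetical.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev


def daily_series(timestamps):
    """Build a per-day time series: calendar day -> number of items.
    Timestamps are ISO-8601 strings such as '2015-08-03T10:00:00'."""
    return Counter(datetime.fromisoformat(ts).date() for ts in timestamps)


def peak_days(series, k=1.0):
    """Return, in chronological order, the days whose count exceeds
    mean + k * population standard deviation (an assumed rule)."""
    counts = list(series.values())
    if not counts:
        return []
    threshold = mean(counts) + k * pstdev(counts)
    return sorted(day for day, n in series.items() if n > threshold)


timestamps = (
    ["2015-08-01T09:00:00", "2015-08-02T09:00:00", "2015-08-04T09:00:00"]
    + ["2015-08-03T10:00:00"] * 6
)
series = daily_series(timestamps)
peaks = peak_days(series)
```

Days with no items at all do not appear in the series built this way; a fuller version would fill those days with zero before computing the threshold.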
Information processing method 400 and its steps S401 to S404 can include the various processes carried out in the information processor 10 and its corresponding units 101 to 104 and 104' described above with reference to Fig. 1 and Fig. 2, and can achieve effects similar to those of the corresponding units described with reference to Fig. 1 and Fig. 2. The details of these processes and effects are not repeated here.
Additionally, Fig. 5 shows an example flow of the object event detecting step in the information processing method according to an embodiment of the present disclosure.
As shown in Fig. 5, in a preferred embodiment, object event detecting step S404' can also include, before clustering step S4041 and timeslot event detecting step S4042 (which are similar to those in object event detecting step S404 of Fig. 4): term vector representation step S4040, in which the information in each crest period is expressed as term vectors for use in clustering the information in that crest period.
In a preferred embodiment, in clustering step S4041, the clustering is carried out using an automatic clustering method based on a threshold.
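As an illustration of term vectors combined with threshold-based clustering, the sketch below uses bag-of-words vectors and a single-pass scheme: each item joins the most similar existing cluster if its cosine similarity reaches a threshold, and otherwise starts a new cluster, so the number of clusters is not fixed in advance. Single-pass clustering, whitespace tokenization, and the threshold value are assumptions chosen for the example; this section does not name the specific algorithm used.

```python
import math
from collections import Counter


def to_vector(text):
    """Bag-of-words term vector: whitespace token -> frequency."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def single_pass_cluster(texts, threshold=0.5):
    """Assign each text to the most similar cluster whose summed vector is
    at least `threshold` similar; otherwise open a new cluster."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [str]}
    for text in texts:
        vec = to_vector(text)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(text)
            best["centroid"].update(vec)  # summed vector; cosine ignores scale
        else:
            clusters.append({"centroid": Counter(vec), "members": [text]})
    return clusters


def largest_cluster(clusters):
    """The cluster containing the most items of information."""
    return max(clusters, key=lambda c: len(c["members"]))


texts = [
    "new phone launch today",
    "phone launch event today",
    "heavy rain in the city",
    "new phone launch",
]
clusters = single_pass_cluster(texts, threshold=0.5)
biggest = largest_cluster(clusters)
```

For Chinese microblog text, the whitespace split would need to be replaced by a word segmenter.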
In a preferred embodiment, timeslot event detecting step S4042 can include: keyword extraction step S4042-1, in which, for each crest period, keywords are extracted from the information in the cluster that contains the most items of information in the clustering result, and are taken as the event relevant to that crest period.
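A simple way to picture keyword extraction from the largest cluster is to rank terms by how many items in the cluster contain them. The sketch below does this with a small illustrative stop-word list; the scoring by document frequency, the function name extract_keywords, and the parameter top_n are assumptions, as the section does not state which keyword extraction method is used.

```python
from collections import Counter

# Illustrative stop-word list only; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "in", "of", "to", "is", "and"}


def extract_keywords(cluster_messages, top_n=3):
    """Return the top_n terms ranked by the number of messages in the
    cluster that contain them (ties broken alphabetically)."""
    doc_freq = Counter()
    for message in cluster_messages:
        # Use a set so each message counts a term at most once.
        terms = set(message.lower().split()) - STOP_WORDS
        doc_freq.update(terms)
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


keywords = extract_keywords(
    ["new phone launch today", "phone launch event today", "new phone launch"],
    top_n=2,
)
```

The resulting keyword list then serves as a compact description of the event for that crest period.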
In a preferred embodiment, object event detecting step S404' can also include: timeslot event synthesis step S4043, in which the detected events relevant to each crest period of the time series are synthesized as the event relevant to the perpetual object.
The example flow of object event detecting step S404' has been described above with reference to Fig. 5. It should be noted that although term vector representation step S4040, keyword extraction step S4042-1 and timeslot event synthesis step S4043 are all shown in one drawing, this illustration is only an example; these steps can either be realized together in one preferred embodiment, or be realized independently of one another in different preferred embodiments.
Additionally, object event detecting step S404' and the steps it includes can include the various processes carried out in the object event detector unit 104' and its component units described above with reference to Fig. 2, and can achieve effects similar to those of the corresponding units described with reference to Fig. 2. The details of these processes and effects are not repeated here.
Fig. 6 is a flow chart showing another example flow of the information processing method according to an embodiment of the present disclosure.
As shown in Fig. 6, in a preferred embodiment, in addition to information acquiring step S401, sequence generation step S402, crest detecting step S403 and object event detecting step S404 (or object event detecting step S404' of Fig. 5) included in information processing method 400 of Fig. 4, information processing method 400' can also include: subjects' mood analysis step S405, in which mood analysis is performed on the information in the crest periods to obtain the mood relevant to the perpetual object.
Information processing method 400' and its steps can include the various processes carried out in the information processor 10' and its corresponding units described above with reference to Fig. 3, and can achieve effects similar to those of the corresponding units described with reference to Fig. 3. The details of these processes and effects are not repeated here.
In a preferred embodiment, subjects' mood analysis step S405 can include: period mood analysis step S4051, in which, for each crest period, mood analysis is performed on the information in the cluster that contains the most items of information in the clustering result, to obtain the mood relevant to that crest period.
In a preferred embodiment, period mood analysis step S4051 carries out the mood analysis using a mood dictionary obtained in advance or a mood analysis model trained in advance.
In a preferred embodiment, subjects' mood analysis step S405 can also include: period mood synthesis step S4052, in which the obtained moods relevant to each crest period of the time series are synthesized to obtain the mood relevant to the perpetual object.
The information processor according to the embodiments of the present disclosure described above (such as information processors 10 and 10' shown in Fig. 1 and Fig. 3) and each of its component units can be configured by software, firmware, hardware, or any combination thereof. When implemented by software or firmware, the program constituting the software or firmware can be installed from a storage medium or a network onto a machine with a dedicated hardware structure; with the various programs installed, the machine is able to perform the various functions of the component units described above.
Fig. 7 is a structural diagram showing one possible hardware configuration that can be used to realize the information processor and method according to the embodiments of the present disclosure.
In Fig. 7, a central processing unit (CPU) 701 executes various processes according to a program stored in read-only memory (ROM) 702 or a program loaded from storage part 708 into random access memory (RAM) 703. RAM 703 also stores, as needed, the data required when CPU 701 executes the various processes. CPU 701, ROM 702 and RAM 703 are connected to one another via bus 704. Input/output interface 705 is also connected to bus 704.
The following components are also connected to input/output interface 705: importation 706 (including a keyboard, a mouse, etc.), output part 707 (including a display, such as a cathode-ray tube (CRT) or liquid crystal display (LCD), and a loudspeaker, etc.), storage part 708 (including a hard disk, etc.), and communications part 709 (including a network interface card such as a LAN card, a modem, etc.). Communications part 709 executes communication processing via a network such as the internet. As needed, driver 710 can also be connected to input/output interface 705. Detachable medium 711, such as a magnetic disk, an optical disc, a magneto-optic disk or a semiconductor memory, can be mounted on driver 710 as needed, so that the computer program read from it can be installed into storage part 708 as needed.
Additionally, the disclosure also proposes a program product storing machine-readable instruction code. When the instruction code is read and executed by a machine, the information processing method according to the embodiments of the present disclosure described above can be performed. Correspondingly, the various storage media for carrying this program product, such as magnetic disks, optical discs, magneto-optic disks and semiconductor memories, are also included in the disclosure.
In the above description of specific embodiments of the disclosure, a feature described and/or illustrated for one embodiment can be used in the same or a similar way in one or more other embodiments, combined with features in other embodiments, or substituted for features in other embodiments.
Additionally, the methods of the embodiments of the disclosure are not limited to being executed in the time order described in the specification or shown in the accompanying drawings; they can also be executed in other time orders, in parallel, or independently. Therefore, the execution order of the methods described in this specification does not limit the technical scope of the disclosure.
It should be further understood that each operating process of the above methods according to the disclosure can also be realized in the form of a computer-executable program stored in various machine-readable storage media.
Moreover, the purpose of the disclosure can also be achieved as follows: a storage medium storing the above executable program code is supplied directly or indirectly to a system or equipment, and a computer or central processing unit (CPU) in the system or equipment reads and executes the program code.
In this case, as long as the system or equipment has the function of executing a program, the embodiments of the disclosure are not limited to a particular program, and the program can take any form, for example an object program, a program executed by an interpreter, or a script program supplied to an operating system.
The machine-readable storage media mentioned above include, but are not limited to: various memories and memory units, semiconductor devices, disk units such as optical, magnetic and magneto-optic disks, and other media suitable for storing information.
In addition, a client information processing terminal can also realize the embodiments of the disclosure by connecting to a corresponding website on the internet, downloading the computer program code according to the disclosure, installing it on the terminal, and then executing the program.
To sum up, according to the embodiments of the present disclosure, the disclosure provides the following schemes, but is not limited to them:
Scheme 1. An information processor, including:
an information acquisition unit, which obtains from an information source a plurality of items of information that are relevant to a perpetual object and each carry a time tag;
a sequence generating unit, which generates the time series of the information based on the time tags;
a wave-peak detection unit, which performs detection on the time series to obtain the crest periods of the time series; and
an object event detector unit, which examines the information in the crest periods to obtain the event relevant to the perpetual object, the object event detector unit including:
a cluster cell, which, for each crest period of the time series, clusters the information in that crest period; and
a timeslot event detector unit, which, for each crest period, detects the event relevant to that crest period based on the information in the cluster that contains the most items of information in the clustering result of the cluster cell.
Scheme 2. The information processor as described in scheme 1, wherein the cluster cell carries out the clustering using an automatic clustering method based on a threshold.
Scheme 3. The information processor as described in scheme 1, wherein the timeslot event detector unit includes:
a keyword extracting unit, which, for each crest period, extracts keywords from the information in the cluster that contains the most items of information in the clustering result, as the event relevant to that crest period.
Scheme 4. The information processor as described in scheme 1, wherein the object event detector unit also includes:
a timeslot event synthesis unit, which synthesizes the events, detected by the timeslot event detector unit, that are relevant to each crest period of the time series, as the event relevant to the perpetual object.
Scheme 5. The information processor as described in scheme 1, wherein the object event detector unit also includes:
a term vector representation unit, which expresses the information in each crest period as term vectors to be provided to the cluster cell.
Scheme 6. The information processor as described in scheme 1, also including:
a subjects' mood analytic unit, which performs mood analysis on the information in the crest periods to obtain the mood relevant to the perpetual object.
Scheme 7. The information processor as described in scheme 6, wherein the subjects' mood analytic unit includes:
a period mood analytic unit, which, for each crest period, performs mood analysis on the information in the cluster that contains the most items of information in the clustering result of the cluster cell, to obtain the mood relevant to that crest period.
Scheme 8. The information processor as described in scheme 7, wherein the period mood analytic unit carries out the mood analysis using a mood dictionary obtained in advance or a mood analysis model trained in advance.
Scheme 9. The information processor as described in scheme 7, wherein the subjects' mood analytic unit also includes:
a period mood synthesis unit, which synthesizes the moods, obtained by the period mood analytic unit, that are relevant to each crest period of the time series, to obtain the mood relevant to the perpetual object.
Scheme 10. The information processor as described in scheme 1, wherein the wave-peak detection unit obtains the crest periods using a time-series crest detection technique.
Scheme 11. The information processor as described in scheme 1, wherein the wave-peak detection unit obtains the crest periods in units of days.
Scheme 12. An information processing method, including:
obtaining from an information source a plurality of items of information that are relevant to a perpetual object and each carry a time tag;
generating the time series of the information based on the time tags;
performing detection on the time series to obtain the crest periods of the time series; and
examining the information in the crest periods to obtain the event relevant to the perpetual object, wherein examining the information in the crest periods includes:
for each crest period of the time series, clustering the information in that crest period; and
for each crest period, detecting the event relevant to that crest period based on the information in the cluster that contains the most items of information in the clustering result.
Scheme 13. The information processing method as described in scheme 12, wherein the clustering is carried out using an automatic clustering method based on a threshold.
Scheme 14. The information processing method as described in scheme 12, wherein detecting the event relevant to each crest period includes:
for each crest period, extracting keywords from the information in the cluster that contains the most items of information in the clustering result, as the event relevant to that crest period.
Scheme 15. The information processing method as described in scheme 12, wherein examining the information in the crest periods also includes:
synthesizing the detected events relevant to each crest period of the time series, as the event relevant to the perpetual object.
Scheme 16. The information processing method as described in scheme 12, wherein examining the information in the crest periods also includes:
expressing the information in each crest period as term vectors, for use in clustering the information in that crest period.
Scheme 17. The information processing method as described in scheme 12, also including:
performing mood analysis on the information in the crest periods to obtain the mood relevant to the perpetual object.
Scheme 18. The information processing method as described in scheme 17, wherein performing mood analysis on the information in the crest periods includes:
for each crest period, performing mood analysis on the information in the cluster that contains the most items of information in the clustering result, to obtain the mood relevant to that crest period.
Scheme 19. The information processing method as described in scheme 18, wherein the mood analysis is carried out using a mood dictionary obtained in advance or a mood analysis model trained in advance.
Scheme 20. A computer-readable recording medium storing a computer program executable by a computing device, wherein the computer program, when executed, causes the computing device to perform an information processing method, the information processing method including:
obtaining from an information source a plurality of items of information that are relevant to a perpetual object and each carry a time tag;
generating the time series of the information based on the time tags;
performing detection on the time series to obtain the crest periods of the time series; and
examining the information in the crest periods to obtain the event relevant to the perpetual object, wherein examining the information in the crest periods includes:
for each crest period of the time series, clustering the information in that crest period; and
for each crest period, detecting the event relevant to that crest period based on the information in the cluster that contains the most items of information in the clustering result.
Finally, it should also be noted that, in the disclosure, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relation or order between these entities or operations. Moreover, the terms "including", "comprising" or any other variant thereof are intended to be non-exclusive, so that a process, method, article or equipment that includes a series of elements includes not only those elements, but also other elements that are not expressly listed, or elements that are inherent to the process, method, article or equipment. In the absence of further restrictions, an element limited by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or equipment that includes the element.
Although the disclosure has been described above through specific embodiments, it should be understood that those skilled in the art can devise various modifications, improvements or equivalents of the disclosure within the spirit and scope of the appended claims. These modifications, improvements or equivalents should also be considered to fall within the claimed scope of the disclosure.

Claims (10)

1. An information processor, including:
an information acquisition unit, which obtains from an information source a plurality of items of information that are relevant to a perpetual object and each carry a time tag;
a sequence generating unit, which generates the time series of the information based on the time tags;
a wave-peak detection unit, which performs detection on the time series to obtain the crest periods of the time series; and
an object event detector unit, which examines the information in the crest periods to obtain the event relevant to the perpetual object, the object event detector unit including:
a cluster cell, which, for each crest period of the time series, clusters the information in that crest period; and
a timeslot event detector unit, which, for each crest period, detects the event relevant to that crest period based on the information in the cluster that contains the most items of information in the clustering result of the cluster cell.
2. The information processor as claimed in claim 1, wherein the cluster cell carries out the clustering using an automatic clustering method based on a threshold.
3. The information processor as claimed in claim 1, wherein the timeslot event detector unit includes:
a keyword extracting unit, which, for each crest period, extracts keywords from the information in the cluster that contains the most items of information in the clustering result, as the event relevant to that crest period.
4. The information processor as claimed in claim 1, wherein the object event detector unit also includes:
a timeslot event synthesis unit, which synthesizes the events, detected by the timeslot event detector unit, that are relevant to each crest period of the time series, as the event relevant to the perpetual object.
5. The information processor as claimed in claim 1, wherein the object event detector unit also includes:
a term vector representation unit, which expresses the information in each crest period as term vectors to be provided to the cluster cell.
6. The information processor as claimed in claim 1, also including:
a subjects' mood analytic unit, which performs mood analysis on the information in the crest periods to obtain the mood relevant to the perpetual object.
7. The information processor as claimed in claim 6, wherein the subjects' mood analytic unit includes:
a period mood analytic unit, which, for each crest period, performs mood analysis on the information in the cluster that contains the most items of information in the clustering result of the cluster cell, to obtain the mood relevant to that crest period.
8. The information processor as claimed in claim 7, wherein the period mood analytic unit carries out the mood analysis using a mood dictionary obtained in advance or a mood analysis model trained in advance.
9. The information processor as claimed in claim 7, wherein the subjects' mood analytic unit also includes:
a period mood synthesis unit, which synthesizes the moods, obtained by the period mood analytic unit, that are relevant to each crest period of the time series, to obtain the mood relevant to the perpetual object.
10. An information processing method, including:
obtaining from an information source a plurality of items of information that are relevant to a perpetual object and each carry a time tag;
generating the time series of the information based on the time tags;
performing detection on the time series to obtain the crest periods of the time series; and
examining the information in the crest periods to obtain the event relevant to the perpetual object, wherein examining the information in the crest periods includes:
for each crest period of the time series, clustering the information in that crest period; and
for each crest period, detecting the event relevant to that crest period based on the information in the cluster that contains the most items of information in the clustering result.
CN201510547792.9A 2015-08-31 2015-08-31 Information processor and information processing method Pending CN106484724A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510547792.9A CN106484724A (en) 2015-08-31 2015-08-31 Information processor and information processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510547792.9A CN106484724A (en) 2015-08-31 2015-08-31 Information processor and information processing method

Publications (1)

Publication Number Publication Date
CN106484724A true CN106484724A (en) 2017-03-08

Family

ID=58236191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510547792.9A Pending CN106484724A (en) 2015-08-31 2015-08-31 Information processor and information processing method

Country Status (1)

Country Link
CN (1) CN106484724A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1904893A (en) * 2005-07-26 2007-01-31 兄弟工业株式会社 Information management system, information processing device, and program
CN102646114A (en) * 2012-02-17 2012-08-22 清华大学 News topic timeline abstract generating method based on breakthrough point
CN103514167A (en) * 2012-06-15 2014-01-15 富士通株式会社 Data processing method and device
CN103559176A (en) * 2012-10-29 2014-02-05 中国人民解放军国防科学技术大学 Microblog emotional evolution analysis method and system
CN103870474A (en) * 2012-12-11 2014-06-18 北京百度网讯科技有限公司 News topic organizing method and device
CN103955505A (en) * 2014-04-24 2014-07-30 中国科学院信息工程研究所 Micro-blog-based real-time event monitoring method and system
CN104199974A (en) * 2013-09-22 2014-12-10 中科嘉速(北京)并行软件有限公司 Microblog-oriented dynamic topic detection and evolution tracking method
CN104536956A (en) * 2014-07-23 2015-04-22 中国科学院计算技术研究所 A Microblog platform based event visualization method and system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107133224A (en) * 2017-04-25 2017-09-05 中国人民大学 A kind of language generation method based on descriptor
CN107133224B (en) * 2017-04-25 2020-11-03 中国人民大学 Language generation method based on subject word
CN107402742A (en) * 2017-08-04 2017-11-28 北京京东尚科信息技术有限公司 Information-pushing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170308
