CN106844466A

CN106844466A - Event train of thought generation method and device

Info

Publication number: CN106844466A
Application number: CN201611193377.9A
Authority: CN
Inventors: 莫洋; 沈剑平; 黄强; 郑景耀; 骆金昌
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2016-12-21
Filing date: 2016-12-21
Publication date: 2017-06-13

Abstract

The invention discloses an event train of thought generation method and device. The method includes: for a pending event, obtaining the resources in each time window; for each time window, determining the prominence score of each resource in the time window, selecting from the resources in the time window those whose prominence score meets a predetermined requirement, and taking the selected resources as the representative resources of the time window; and combining the representative resources of the time windows in chronological order to obtain the event train of thought. The scheme of the present invention can improve the user's information acquisition efficiency.

Description

Event train of thought generation method and device

【Technical field】

The present invention relates to network technology, and more particularly to an event train of thought generation method and device.

【Background technology】

Currently, when a user runs a search with a search engine or similar tool, for example a search on a particular event, the resources related to that event, such as news resources, can only be processed in a predetermined way (for example, ranked) and then presented to the user.

If the user wants to understand the main course of the event's development, the user has to find and read the corresponding resources one by one, which is very difficult in practice and reduces the user's information acquisition efficiency.

【Summary of the invention】

In view of this, the invention provides an event train of thought generation method and device that can improve the user's information acquisition efficiency.

The specific technical scheme is as follows:

An event train of thought generation method, including:

for a pending event, obtaining the resources in each time window;

for each time window, determining the prominence score of each resource in the time window, selecting from the resources in the time window those whose prominence score meets a predetermined requirement, and taking the selected resources as the representative resources of the time window;

combining the representative resources of the time windows in chronological order to obtain the event train of thought.

An event train of thought generating device, including: a processing unit;

the processing unit is configured to: for a pending event, obtain the resources in each time window; for each time window, determine the prominence score of each resource in the time window, select from the resources in the time window those whose prominence score meets a predetermined requirement, and take the selected resources as the representative resources of the time window; and combine the representative resources of the time windows in chronological order to obtain the event train of thought.

As can be seen from the above, with the scheme of the present invention the resources in each time window can be obtained for a pending event, the representative resources that best reflect the event's progress can be selected from each time window, and the representative resources selected from the time windows can be combined into the event train of thought. When a user searches with, for example, a search engine, the event train of thought can be shown to the user directly, which overcomes the problem in the prior art and improves the user's information acquisition efficiency.

【Brief description of the drawings】

Fig. 1 is a flowchart of an embodiment of the event train of thought generation method of the present invention.

Fig. 2 is a schematic diagram of the resources obtained in a time window according to the present invention.

Fig. 3 is a schematic diagram of generating an event train of thought according to the present invention.

Fig. 4 is a schematic diagram of the event train of thought corresponding to the "star A divorces" event according to the present invention.

Fig. 5 is a schematic structural diagram of an embodiment of the event train of thought generating device of the present invention.

【Detailed description of the embodiments】

To address the problems in the prior art, the present invention proposes an event train of thought generation scheme that can effectively filter, out of a large number of resources, the representative resources that best reflect the event's progress, and automatically generate an event train of thought to present to the user.

To make the technical scheme of the present invention clearer, the scheme is described in further detail below with reference to the drawings and embodiments.

Embodiment one

Fig. 1 is a flowchart of an embodiment of the event train of thought generation method of the present invention. As shown in Fig. 1, it includes the following specific implementation:

In 11, for a pending event, the resources in each time window are obtained;

In 12, for each time window, the prominence score of each resource in the time window is determined, the resources whose prominence score meets a predetermined requirement are selected from the resources in the time window, and the selected resources are taken as the representative resources of the time window;

In 13, the representative resources of the time windows are combined in chronological order to obtain the event train of thought.

The resources may be news resources and the like.

To implement the above scheme, training samples need to be obtained in advance and an assessment model trained from them. Then, for a pending event, the processing proceeds window by window: for the resources obtained in each time window, the prominence score of each resource is determined with the assessment model, the resources whose prominence score meets the predetermined requirement are selected as the representative resources of that time window, and the representative resources of the time windows are combined in chronological order to obtain the event train of thought.
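The three steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patent's implementation: the dict layout of a resource (`id`, `time`), the function name `generate_event_thread`, and the pluggable `score_window` callable are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Sequence

Resource = dict  # hypothetical layout: {"id": str, "time": number, ...}


def generate_event_thread(
    windows: Sequence[List[Resource]],
    score_window: Callable[[List[Resource]], Dict[str, int]],
    top_n: int = 1,
) -> List[Resource]:
    """Pick the top_n highest-scoring resources per window, in time order."""
    thread: List[Resource] = []
    for window in windows:  # windows are assumed to be in chronological order
        if not window:
            continue  # nothing was published in this window
        scores = score_window(window)  # step 12: prominence score per resource
        ranked = sorted(window, key=lambda r: scores[r["id"]], reverse=True)
        chosen = ranked[:top_n]  # step 12: representative resources
        thread.extend(sorted(chosen, key=lambda r: r["time"]))  # step 13
    return thread
```

Passing the scorer in as a callable keeps the sketch independent of how the assessment model is trained; any function that maps a window to per-resource scores can be plugged in.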

Each of the above parts is described in detail below.

One) Training samples

To obtain the assessment model used later, training samples must be obtained first.

In the scheme of the present invention, a pairwise method can be used: from a number of resources that have a time order, the few resources that best reflect the event's development are selected, which yields the better-or-worse relationship between the selected resources and the unselected ones, and training samples are generated from that relationship.

For example, the resources in any time window of any event can be displayed, and the high-quality resources selected from the displayed resources obtained; each high-quality resource is then paired with each displayed non-high-quality resource to form a resource pair, and a training sample is generated for each resource pair.

Take the "star A divorces" event as an example. The event keeps developing over time, and the resources in each time window can be obtained. A time window is one of the consecutive time periods obtained by dividing the time axis of the whole event's development into multiple periods (for example, periods of equal duration).
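Equal-duration splitting of the time axis could look like the sketch below. The resource layout (a dict with a numeric `time` field), the function name `split_into_windows`, and the choice to start the first window at the earliest resource are assumptions for illustration; the source only says the time axis is divided into consecutive periods.

```python
from typing import List


def split_into_windows(resources: List[dict], window_seconds: float) -> List[List[dict]]:
    """Bucket resources into consecutive, equal-duration time windows."""
    if not resources:
        return []
    start = min(r["time"] for r in resources)
    end = max(r["time"] for r in resources)
    n_windows = int((end - start) // window_seconds) + 1
    windows: List[List[dict]] = [[] for _ in range(n_windows)]
    for r in resources:
        # Index of the window whose period contains this resource's time.
        idx = int((r["time"] - start) // window_seconds)
        windows[idx].append(r)
    return windows
```

Windows in which nothing was published come back as empty lists, so the caller can decide whether to skip them.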

Fig. 2 is a schematic diagram of the resources obtained in a time window according to the present invention. As shown in Fig. 2, these resources can be shown to a sample collector, who selects the 2 resources that he or she thinks best reflect the ins and outs of the "star A divorces" event; the selected resources are taken as high-quality resources.

Afterwards, each high-quality resource can be paired with each displayed non-high-quality resource to form a resource pair.

For example, if the selected high-quality resources are resource 1 and resource 2 shown in Fig. 2, the following resource pairs are obtained: (resource 1, resource 3), (resource 1, resource 4), (resource 1, resource 5), (resource 2, resource 3), (resource 2, resource 4), (resource 2, resource 5), and so on.

Next, a training sample can be generated for each resource pair. Each training sample may include: the features extracted from each of the two resources in the resource pair, and the judgment result of which of the two resources is better.

That is, for each resource pair, features are extracted from each resource in the pair and combined with the judgment result of which of the two resources is better to generate one training sample.

The judgment result can be represented by 1 or 0. For example, if the first resource in the pair is better than the second, the judgment result can be 1; conversely, if the second resource is better than the first, the judgment result can be 0.

Taking the two resource pairs (resource 1, resource 3) and (resource 2, resource 4) as an example, the corresponding training samples are (features of resource 1, features of resource 3, 1) and (features of resource 2, features of resource 4, 1).
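The pairing described above can be sketched as follows. The names `build_training_samples` and `extract_features` are hypothetical. The optional `add_mirrored` flag is an assumption that does not appear in the source: it also emits each pair in reversed order with label 0, which a binary classifier would typically need so that both classes are present.

```python
from itertools import product
from typing import Callable, List, Sequence, Set, Tuple

Sample = Tuple[Sequence[float], Sequence[float], int]


def build_training_samples(
    shown: List[dict],
    selected_ids: Set,
    extract_features: Callable[[dict], Sequence[float]],
    add_mirrored: bool = False,
) -> List[Sample]:
    """Pair every selected (high-quality) resource with every unselected one."""
    good = [r for r in shown if r["id"] in selected_ids]
    rest = [r for r in shown if r["id"] not in selected_ids]
    samples: List[Sample] = []
    for g, b in product(good, rest):
        # Label 1: the first resource of the pair is the better one.
        samples.append((extract_features(g), extract_features(b), 1))
        if add_mirrored:
            # Assumption, not from the source: reversed pair with label 0.
            samples.append((extract_features(b), extract_features(g), 0))
    return samples
```

With 2 selected resources out of 5 shown, one annotation pass yields 6 samples, which illustrates how a small amount of selection work produces many training samples.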

With this approach, the sample collector is shown only the resources in one time window at a time and selects the best few among them. The collector can therefore take full account of the temporal context of the event train of thought when selecting, considering not only the relevance of a resource but also its importance to the train of thought. This approach also lets the sample collector produce a large number of training samples with little work, which improves sample collection efficiency.

Two) Feature extraction

The features extracted from each resource include, but are not limited to, one or any combination of the following; preferably, all of the following features are extracted:

plain text feature, resource popularity feature, search popularity feature, similar resource number feature.

1) Plain text feature

How to obtain the plain text feature of a resource is prior art. For example, it can be extracted with a bag-of-words (Bag of Words) method using term frequency-inverse document frequency (TF-IDF, Term Frequency-Inverse Document Frequency) weighting.
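As an illustration of bag-of-words TF-IDF weighting, the sketch below uses only the standard library. It assumes the text is already tokenized and uses the plain variant tf = count / document length, idf = log(N / document frequency); the source does not say which TF-IDF variant is used, and the function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one sparse {term: weight} vector per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that occurs in every document gets weight 0 under this variant, so only the terms that distinguish one resource from the others contribute to its feature vector.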

2) Resource popularity feature

This feature mainly reflects the number of times the resource has been clicked and read; how to obtain it is likewise prior art.

3) Search popularity feature

At the key nodes of an event train of thought, people tend to search for the event. By analyzing, for example, Baidu search logs, one can find the time points at which the search volume for a keyword peaks, and the resources corresponding to such a time point often carry greater significance in the event's evolution.

Two different resources may both correspond to the keyword "star A divorces", but because they were published at different times, the search popularity of that keyword at publication time also differs. Search popularity can therefore serve as a key feature of a resource.
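One way to look up the search popularity of a keyword at a resource's publication time is sketched below. The input format (a time-sorted list of `(bucket_start_time, query_count)` tuples aggregated from a search log) and the function name `search_popularity_at` are assumptions; the source treats the extraction of this feature as prior art and gives no format.

```python
from bisect import bisect_right
from typing import List, Tuple


def search_popularity_at(publish_time: float, search_counts: List[Tuple[float, int]]) -> int:
    """Return the query count of the log bucket that contains publish_time.

    search_counts must be sorted by bucket start time.
    """
    starts = [start for start, _ in search_counts]
    idx = bisect_right(starts, publish_time) - 1
    # A resource published before the first bucket has no recorded searches.
    return search_counts[idx][1] if idx >= 0 else 0
```

Two resources about the same keyword then receive different feature values simply because they fall into different buckets of the log.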

4) Similar resource number feature

On the internet, important resources are often reprinted in different forms, and the reprints are usually similar in content. Therefore, by mining massive internet data, the number of resources similar to each resource can be extracted and used as a feature of that resource, reflecting its importance from another angle.
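Counting similar resources could be sketched as below. The source does not name a similarity measure, so the Jaccard similarity over token sets, the 0.6 threshold, and the function names are all assumptions made for illustration; a production system mining massive data would likely use a scalable near-duplicate technique instead of pairwise comparison.

```python
from typing import Iterable, List


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity of two token collections (0.0 if both are empty)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def similar_resource_count(
    target: List[str], others: List[List[str]], threshold: float = 0.6
) -> int:
    """Count how many other resources are at least `threshold` similar."""
    return sum(1 for other in others if jaccard(target, other) >= threshold)
```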

Given the above, how to obtain the search popularity feature and the similar resource number feature of a resource is prior art.

Three) Model training

Once enough training samples have been obtained, the required assessment model can be trained from them; how to train is prior art.

There may be one assessment model or, to improve the accuracy of the assessment results, more than one; the specific number can be decided according to actual needs.

Each assessment model can be trained separately from the obtained training samples.

Each assessment model is a pairwise binary classification model, that is, an assessment model can judge the better-or-worse relationship between two resources.

The assessment models may include, but are not limited to, one or any combination of the following: a support vector machine (SVM, Support Vector Machine) model, a logistic regression (Logistic Regression) model, a random forest (Random Forest) model, and so on.
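To make the idea of a pairwise binary classifier concrete, the sketch below trains a tiny logistic regression with plain stochastic gradient descent. It is an illustration only: feeding the model the difference of the two feature vectors, the absence of a bias term, the hyperparameters, and the names `train_pairwise_logreg` and `judge` are all assumptions, and in practice an established machine-learning library would normally be used.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], Sequence[float], int]  # (first, second, label)


def train_pairwise_logreg(samples: List[Sample], epochs: int = 200, lr: float = 0.5) -> List[float]:
    """Fit weights w so that sigmoid(w . (first - second)) approximates the label."""
    dim = len(samples[0][0])
    weights = [0.0] * dim
    for _ in range(epochs):
        for first, second, label in samples:
            diff = [a - b for a, b in zip(first, second)]
            z = sum(w * d for w, d in zip(weights, diff))
            pred = 1.0 / (1.0 + math.exp(-z))
            # Gradient step on the log-loss for this single sample.
            for i in range(dim):
                weights[i] += lr * (label - pred) * diff[i]
    return weights


def judge(weights: Sequence[float], first: Sequence[float], second: Sequence[float]) -> int:
    """Return 1 if the first resource is judged better than the second, else 0."""
    z = sum(w * (a - b) for w, a, b in zip(weights, first, second))
    return 1 if z > 0 else 0
```

Because the model has no bias term and works on feature differences, swapping the two resources of a pair flips the sign of the decision value, which keeps the judgments of a pair and its reverse consistent except when the value is exactly zero.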

Four) Event train of thought generation

For a pending event, the resources in each time window can be obtained.

For each time window, the prominence score of each resource in the time window can be determined with the assessment model.

Taking any time window as an example, the following processing can be carried out for each resource in the time window:

a) take the resource as the resource to be assessed, and pair it with each of the other resources in the time window to form resource pairs;

b) determine, with the assessment model, the judgment result of which of the two resources in each resource pair is better;

c) count the resource pairs whose judgment result satisfies the following condition: the resource to be assessed is better than the other resource in the pair;

d) take the count as the prominence score of the resource to be assessed.

In the process described in b), for each resource pair, the features of each resource in the pair can be extracted in the way described in Two), and the judgment result of which of the two resources is better is then determined from the extracted features and the assessment model; that is, the extracted features are fed to the assessment model as input, and the judgment result output by the model is obtained.
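Steps a) to d) amount to counting, for each resource, how many pairwise comparisons it wins. A minimal sketch follows; the function name `prominence_scores`, the `id` field, and the `judge` callable (returning 1 when its first argument is judged better) are assumptions standing in for feature extraction plus the assessment model.

```python
from typing import Callable, Dict, List


def prominence_scores(
    window: List[dict], judge: Callable[[dict, dict], int]
) -> Dict[str, int]:
    """Score each resource by the number of pairs in which it is judged better."""
    scores: Dict[str, int] = {}
    for candidate in window:  # a) the resource to be assessed
        wins = 0
        for other in window:
            if other is candidate:
                continue  # a resource is not compared with itself
            if judge(candidate, other) == 1:  # b) judgment for this pair
                wins += 1  # c) count pairs the candidate wins
        scores[candidate["id"]] = wins  # d) the count is the prominence score
    return scores
```

For a window of n resources this makes n(n-1) model calls, which is acceptable when each time window holds only a modest number of resources.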

In addition, when there is more than one assessment model, each model yields one judgment result for each resource pair; the judgment results can be aggregated, and the final judgment result determined from the aggregate.

For example, suppose there are 3 assessment models and, for some resource pair x, the judgment results they output are 1, 1 and 0. Since 2 models output 1 and 1 model outputs 0, the majority rule applies and 1 is taken as the judgment result for resource pair x.
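The majority rule in this example can be written as a one-line vote. The function name `majority_vote` is hypothetical, and resolving a tie to 0 is an assumption: the source only gives an example with an odd number of models and does not say how ties are handled.

```python
from typing import Sequence


def majority_vote(results: Sequence[int]) -> int:
    """Combine 0/1 judgment results from several models by majority rule."""
    ones = sum(results)
    # Strict majority of 1s wins; a tie falls back to 0 (assumption).
    return 1 if ones * 2 > len(results) else 0
```

Using an odd number of assessment models avoids ties altogether.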

Suppose a time window contains 4 resources, resource 1 to resource 4. After processing in the manner described above, the pairwise binary classification matrix shown in Table 1 is obtained:

Table 1: Pairwise binary classification matrix

In Table 1, the comparison of each resource with itself can be represented by 0, so that it does not affect the subsequent count.

The values in rows 2 to 5 of Table 1 can be summed row by row to obtain the prominence scores of resource 1 to resource 4: the prominence score of resource 1 is 1, that of resource 2 is 3, that of resource 3 is 2, and that of resource 4 is 1.

For each time window, after the prominence score of each resource in the time window has been obtained, the resources whose prominence score meets the predetermined requirement can be selected from the resources in the time window and taken as the representative resources of the time window.

The resources whose prominence score meets the predetermined requirement can be selected in either of the following ways:

Mode one

Select the N resources with the highest prominence scores as the representative resources of the time window, where N is a positive integer whose value can be decided according to actual needs, for example 1. Taking the time window corresponding to Table 1 as an example, resource 2 has the highest prominence score, so resource 2 can be taken as the representative resource of that time window;

Mode two

Select the resources whose prominence score is greater than a predetermined threshold as the representative resources of the time window; the value of the threshold can likewise be decided according to actual needs.
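Both selection modes are short to express in code. The sketch below uses the prominence scores stated in the text for the example window (1, 3, 2 and 1 for resource 1 to resource 4); the function names are hypothetical, and the threshold value of 1 in the usage is chosen only for illustration.

```python
from typing import Dict, List


def select_top_n(scores: Dict[str, int], n: int = 1) -> List[str]:
    """Mode one: the n resources with the highest prominence scores."""
    ranked = sorted(scores, key=lambda rid: scores[rid], reverse=True)
    return ranked[:n]


def select_above_threshold(scores: Dict[str, int], threshold: int) -> List[str]:
    """Mode two: every resource whose score is greater than the threshold."""
    return [rid for rid, score in scores.items() if score > threshold]


scores = {"resource 1": 1, "resource 2": 3, "resource 3": 2, "resource 4": 1}
top = select_top_n(scores, n=1)               # ["resource 2"]
above = select_above_threshold(scores, 1)     # ["resource 2", "resource 3"]
```

Mode one always yields exactly N resources per window (fewer only if the window is smaller), while mode two may yield none when no resource clears the threshold, so the choice affects how evenly the resulting train of thought covers the timeline.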

After the representative resources of each time window have been obtained, combining them in chronological order yields the event train of thought.

Based on the above, Fig. 3 is a schematic diagram of generating an event train of thought according to the present invention. As shown in Fig. 3, the resources on the left represent all the resources obtained in each time window, and the resources on the right represent the representative resources determined for each time window.

The above describes the method embodiment. The scheme of the present invention is further described below by way of a device embodiment.

Embodiment two

Fig. 5 is a schematic structural diagram of an embodiment of the event train of thought generating device of the present invention. As shown in Fig. 5, it includes: a processing unit 51.

The processing unit 51 is configured to: for a pending event, obtain the resources in each time window; for each time window, determine the prominence score of each resource in the time window, select from the resources in the time window those whose prominence score meets a predetermined requirement, and take the selected resources as the representative resources of the time window; and combine the representative resources of the time windows in chronological order to obtain the event train of thought.

As shown in Fig. 5, the device may further include: a model training unit 52.

The model training unit 52 is configured to obtain training samples, train an assessment model from the training samples, and send the assessment model to the processing unit 51; correspondingly, the processing unit 51 determines the prominence score of each resource in each time window according to the assessment model.

The model training unit 52 may specifically include: a sample collection subunit 521 and a model training subunit 522.

The sample collection subunit 521 is configured to display the resources in any time window of any event, obtain the high-quality resources selected from the displayed resources, pair each high-quality resource with each displayed non-high-quality resource to form a resource pair, generate a training sample for each resource pair, and send the training samples to the model training subunit 522.

The model training subunit 522 is configured to train the assessment model from the training samples and send the assessment model to the processing unit 51.

Each generated training sample may include: the features extracted from each of the two resources in the resource pair, and the judgment result of which of the two resources is better.

The features extracted from each resource may include, but are not limited to, one or any combination of the following: plain text feature, resource popularity feature, search popularity feature, similar resource number feature.

In addition, there may be one assessment model or, to improve the accuracy of the assessment results, more than one; the model training subunit 522 can train each assessment model separately from the obtained training samples.

The assessment models may include, but are not limited to, one or any combination of the following: a support vector machine model, a logistic regression model, a random forest model.

As shown in Fig. 5, the processing unit 51 may specifically include: an obtaining subunit 511, a selection subunit 512 and a combination subunit 513.

The obtaining subunit 511 is configured to, for a pending event, obtain the resources in each time window and send them to the selection subunit 512.

The selection subunit 512 is configured to carry out the following processing for each time window:

for each resource in the time window, take the resource as the resource to be assessed and pair it with each of the other resources in the time window to form resource pairs; obtain, according to the assessment model, the judgment result of which of the two resources in each resource pair is better; count the resource pairs whose judgment result satisfies the following condition: the resource to be assessed is better than the other resource in the pair; and take the count as the prominence score of the resource to be assessed;

select from the resources in the time window those whose prominence score meets the predetermined requirement, take the selected resources as the representative resources of the time window, and send them to the combination subunit 513.

The combination subunit 513 is configured to combine the representative resources of the time windows in chronological order to obtain the event train of thought.

For each resource pair, the selection subunit 512 can first extract the features of each resource in the pair and then determine, from the extracted features and the assessment model, the judgment result of which of the two resources is better; that is, the extracted features are fed to the assessment model as input, and the judgment result output by the model is obtained.

When there is more than one assessment model, the selection subunit 512 can, for each resource pair, obtain one judgment result from each assessment model, aggregate the judgment results, and determine the final judgment result from the aggregate.

For each time window, after obtaining the prominence score of each resource in the time window, the selection subunit 512 can select from the resources in the time window those whose prominence score meets the predetermined requirement and take the selected resources as the representative resources of the time window.

For example, for each time window, the selection subunit 512 can select the N resources with the highest prominence scores from the resources in the time window, N being a positive integer, and take the selected resources as the representative resources of the time window.

Alternatively, for each time window, the selection subunit 512 can select the resources whose prominence score is greater than a predetermined threshold from the resources in the time window and take the selected resources as the representative resources of the time window.

After the representative resources of each time window have been obtained, the combination subunit 513 can combine them in chronological order to obtain the event train of thought.
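The division of the processing unit into three subunits maps naturally onto three small classes. The sketch below is one possible software arrangement, not the device itself: the class names, the injected `fetch_windows` and `judge` callables, and the top-N selection mode are assumptions made for illustration.

```python
from typing import Callable, List

Resource = dict  # hypothetical layout: {"id": str, "time": number, ...}


class ObtainingSubunit:
    """Counterpart of subunit 511: obtains the resources of each time window."""

    def __init__(self, fetch_windows: Callable[[str], List[List[Resource]]]):
        self._fetch = fetch_windows

    def run(self, event: str) -> List[List[Resource]]:
        return self._fetch(event)


class SelectionSubunit:
    """Counterpart of subunit 512: scores resources and picks representatives."""

    def __init__(self, judge: Callable[[Resource, Resource], int], top_n: int = 1):
        self._judge = judge
        self._top_n = top_n

    def run(self, windows: List[List[Resource]]) -> List[List[Resource]]:
        chosen = []
        for window in windows:
            # Prominence score: number of pairwise comparisons a resource wins.
            scores = {
                r["id"]: sum(1 for o in window if o is not r and self._judge(r, o) == 1)
                for r in window
            }
            ranked = sorted(window, key=lambda r: scores[r["id"]], reverse=True)
            chosen.append(ranked[: self._top_n])
        return chosen


class CombinationSubunit:
    """Counterpart of subunit 513: combines representatives chronologically."""

    def run(self, representatives: List[List[Resource]]) -> List[Resource]:
        flat = [r for window in representatives for r in window]
        return sorted(flat, key=lambda r: r["time"])


class ProcessingUnit:
    """Counterpart of unit 51: chains the three subunits."""

    def __init__(self, obtaining, selection, combination):
        self.obtaining = obtaining
        self.selection = selection
        self.combination = combination

    def generate(self, event: str) -> List[Resource]:
        windows = self.obtaining.run(event)
        return self.combination.run(self.selection.run(windows))
```

Injecting `judge` into the selection subunit mirrors the way the model training unit hands the assessment model to the processing unit.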

For the specific workflow of the device embodiment shown in Fig. 5, refer to the corresponding description in the foregoing method embodiment; it is not repeated here.

In short, with the scheme of the present invention the resources in each time window can be obtained for a pending event, the representative resources that best reflect the event's progress can be selected from each time window, and the representative resources selected from the time windows can be combined into the event train of thought. When a user searches with, for example, a search engine, the event train of thought can be shown to the user directly, which overcomes the problem in the prior art and improves the user's information acquisition efficiency.

In the several embodiments provided by the present invention, it should be understood that the disclosed device and method can be implemented in other ways. For example, the device embodiment described above is only illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment's scheme.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.

An integrated unit implemented in the form of a software functional unit can be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, improvement and the like made within the spirit and principle of the present invention shall fall within its scope of protection.

Claims

1. An event train of thought generation method, characterized by including:

for a pending event, obtaining the resources in each time window;

for each time window, determining the prominence score of each resource in the time window, selecting from the resources in the time window those whose prominence score meets a predetermined requirement, and taking the selected resources as the representative resources of the time window;

2. The method according to claim 1, characterized in that

the method further includes: obtaining training samples, and training an assessment model from the training samples;

the determining the prominence score of each resource in the time window includes:

determining the prominence score of each resource in the time window according to the assessment model.

3. The method according to claim 2, characterized in that

the determining the prominence score of each resource in the time window according to the assessment model includes:

carrying out the following processing for each resource in the time window:

taking the resource as the resource to be assessed, and pairing the resource to be assessed with each of the other resources in the time window to form resource pairs;

obtaining, according to the assessment model, the judgment result of which of the two resources in each resource pair is better;

counting the resource pairs whose judgment result satisfies the following condition: the resource to be assessed is better than the other resource in the pair;

taking the count as the prominence score of the resource to be assessed.

4. The method according to claim 3, characterized in that

each training sample includes:

the features extracted from each of the two resources in a resource pair, and the judgment result of which of the two resources is better;

the obtaining, according to the assessment model, the judgment result of which of the two resources in each resource pair is better includes:

extracting the features of the two resources in each resource pair;

obtaining, according to the extracted features and the assessment model, the judgment result of which of the two resources in each resource pair is better.

5. The method according to claim 4, characterized in that

the obtaining training samples includes:

displaying the resources in any time window of any event;

obtaining the high-quality resources selected from the displayed resources;

pairing each high-quality resource with each displayed non-high-quality resource to form a resource pair;

generating a training sample for each resource pair.

6. The method according to claim 3, characterized in that

the number of assessment models is one or more than one;

the training an assessment model from the training samples includes:

training each assessment model from the training samples;

when the number of assessment models is more than one, for each resource pair, obtaining one judgment result from each assessment model, aggregating the judgment results, and determining the final judgment result from the aggregate.

7. The method according to claim 6, characterized in that

the assessment models include one or any combination of the following:

a support vector machine model, a logistic regression model, a random forest model.

8. The method according to claim 4, characterized in that

the features extracted from each resource include one or any combination of the following:

9. The method according to claim 1, characterized in that

Selecting, from the resources in the time window, the resources whose prominence score meets the predetermined requirement, and using the selected resources as the representative resources in the time window, includes:

Selecting the N resources with the highest prominence scores from the resources in the time window, N being a positive integer, and using the selected resources as the representative resources in the time window;

Or, selecting the resources whose prominence score is greater than a predetermined threshold from the resources in the time window, and using the selected resources as the representative resources in the time window.
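
The two selection rules of claim 9 can be illustrated with a short sketch. The mapping from resource identifiers to scores and the function names are assumptions for the example only.

```python
def select_top_n(scores, n):
    """Return the n resource ids with the highest prominence scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

def select_above_threshold(scores, threshold):
    """Return the resource ids whose prominence score exceeds the threshold."""
    return [rid for rid, score in scores.items() if score > threshold]

# Hypothetical prominence scores for the resources of one time window.
scores = {"a": 3, "b": 1, "c": 2}
top_two = select_top_n(scores, 2)
above_one = select_above_threshold(scores, 1)
```

The top-N rule yields a fixed number of representative resources per time window, whereas the threshold rule lets that number vary with how many resources score highly.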

10. An event train of thought generating device, characterized by including: a processing unit;

The processing unit is configured to: for an event to be processed, obtain the resources in each time window respectively; for each time window, determine the prominence score of each resource in the time window, select, from the resources in the time window, the resources whose prominence score meets a predetermined requirement, and use the selected resources as the representative resources in the time window; and combine the representative resources of the time windows in chronological order to obtain the event train of thought.
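
The overall flow performed by the processing unit can be sketched as below. This is a minimal sketch under stated assumptions: the helper names, the `day`, `clicks` and `title` fields, the use of a click count as a stand-in score, and the choice of one representative per window are all hypothetical and are not taken from the claims.

```python
from collections import defaultdict

def build_event_thread(resources, window_of, score_window, select):
    """Group resources by time window, pick representatives, and join them in time order."""
    windows = defaultdict(list)
    for r in resources:
        windows[window_of(r)].append(r)
    thread = []
    for w in sorted(windows):                  # chronological order of the windows
        scores = score_window(windows[w])      # one score per resource, same order
        thread.extend(select(windows[w], scores))
    return thread

def top_one(window, scores):
    """Assumed selection rule: keep the single best-scored resource of the window."""
    best = max(range(len(window)), key=scores.__getitem__)
    return [window[best]]

# Hypothetical resources; "clicks" stands in for a real prominence score.
resources = [
    {"title": "t3", "day": 2, "clicks": 5},
    {"title": "t1", "day": 1, "clicks": 9},
    {"title": "t2", "day": 1, "clicks": 4},
    {"title": "t4", "day": 2, "clicks": 7},
]
thread = build_event_thread(
    resources,
    window_of=lambda r: r["day"],
    score_window=lambda rs: [r["clicks"] for r in rs],
    select=top_one,
)
titles = [r["title"] for r in thread]
```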

11. The device according to claim 10, characterized in that

The device further includes: a model training unit;

The model training unit is configured to obtain training samples, obtain assessment models by training on the training samples, and send the assessment models to the processing unit;

The processing unit determines the prominence score of each resource in the time window according to the assessment models.

12. The device according to claim 11, characterized in that

The processing unit includes: an acquisition subelement, a selection subelement and a combination subelement;

The acquisition subelement is configured to, for an event to be processed, obtain the resources in each time window respectively and send them to the selection subelement;

The selection subelement is configured to perform the following processing for each time window:

For each resource in the time window, taking the resource as the resource to be assessed, and forming a resource pair from the resource to be assessed and each of the other resources in the time window; obtaining, according to the assessment models, a determination result indicating which of the two resources in each resource pair is better; counting the resource pairs whose determination result satisfies the following condition: the resource to be assessed is better than the other resource in the same resource pair; and using the count as the prominence score of the resource to be assessed;

Selecting, from the resources in the time window, the resources whose prominence score meets the predetermined requirement, using the selected resources as the representative resources in the time window, and sending them to the combination subelement;

The combination subelement is configured to combine the representative resources of the time windows in chronological order to obtain the event train of thought.
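
The pairwise scoring carried out by the selection subelement amounts to counting, for each resource, how many other resources in the same time window it beats. A minimal sketch follows; the `is_better` callable stands in for the trained assessment models, and comparing string lengths is only a placeholder criterion assumed for the example.

```python
def prominence_scores(window, is_better):
    """Score each resource by the number of pairwise comparisons it wins in its window."""
    scores = []
    for i, candidate in enumerate(window):
        wins = sum(
            1
            for j, other in enumerate(window)
            if j != i and is_better(candidate, other)
        )
        scores.append(wins)
    return scores

# Placeholder criterion: the longer text is judged "better".
window = ["aaa", "a", "aa"]
scores = prominence_scores(window, lambda x, y: len(x) > len(y))
```

A window with k resources requires k(k-1) pairwise determinations under this scheme, so the cost grows quadratically with the number of resources in the window.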

13. The device according to claim 12, characterized in that

Each training sample includes:

The selection subelement extracts features from the two resources in each resource pair respectively and, according to the extracted features and the assessment models, obtains a determination result indicating which of the two resources in each resource pair is better.

14. The device according to claim 13, characterized in that

The model training unit includes: a sample collection subelement and a model training subelement;

The sample collection subelement is configured to display the resources in any time window corresponding to any event, obtain the high-quality resources selected from the displayed resources, form a resource pair from each high-quality resource and each displayed non-high-quality resource, generate a training sample corresponding to each resource pair, and send the training samples to the model training subelement;

The model training subelement is configured to obtain the assessment models by training on the training samples and send the assessment models to the processing unit.

15. The device according to claim 14, characterized in that

The number of assessment models is one or more than one;

The model training subelement obtains each assessment model by training on the training samples respectively;

The selection subelement is further configured to,

16. The device according to claim 15, characterized in that

The assessment models include one of the following or any combination thereof:

A support vector machine model, a logistic regression model, a random forest model.

17. The device according to claim 13, characterized in that

18. The device according to claim 12, characterized in that

For each time window, the selection subelement selects the N resources with the highest prominence scores from the resources in the time window, N being a positive integer, and uses the selected resources as the representative resources in the time window;

Or, for each time window, the selection subelement selects the resources whose prominence score is greater than a predetermined threshold from the resources in the time window, and uses the selected resources as the representative resources in the time window.