WO2022095375A1

WO2022095375A1 - Event context generation method and apparatus, and terminal device and storage medium

Info

Publication number: WO2022095375A1
Application number: PCT/CN2021/091095
Authority: WO
Inventors: Yin Zimo (殷子墨)
Original assignee: Ping An Technology (Shenzhen) Co., Ltd. (平安科技（深圳）有限公司)
Priority date: 2020-11-06
Filing date: 2021-04-29
Publication date: 2022-05-12
Also published as: CN112328747B; CN112328747A

Abstract

An event context generation method and apparatus, a terminal device, and a storage medium, applicable to the technical field of artificial intelligence. The method comprises: acquiring first time information and event information from each of a plurality of event documents to obtain a plurality of first time-event pairs corresponding to the plurality of event documents (S101); standardizing the time expressions of the pieces of first time information in the plurality of first time-event pairs to obtain a plurality of pieces of standardized second time information, and replacing the first time information in the first time-event pairs with the corresponding second time information to obtain a plurality of second time-event pairs (S102); determining, from the plurality of second time-event pairs, target event information corresponding to the pieces of second time information (S103); and sorting the target event information according to its corresponding second time information to generate an event context (S104). With this method, when an event document covers event information at multiple time nodes, corresponding event information can be generated for each time node, so that a clear event context can be generated along the time nodes.

Description

Event context generation method, device, terminal device and storage medium

This application claims priority to Chinese patent application No. 202011229516.5, entitled "Event Context Generation Method, Apparatus, Terminal Equipment and Storage Medium" and filed with the China Patent Office on November 6, 2020, the entire contents of which are incorporated herein by reference.

Technical Field

The present application belongs to the technical field of intelligent decision-making, and in particular, relates to a method, apparatus, terminal device and storage medium for generating an event context.

Background Art

An event context is a way of presenting long-running news events. Such events usually keep evolving or exerting social influence over a long period, with chain reactions or related events continuing to appear. For such events, the complete event is usually described by displaying time nodes together with the key event content, which helps users quickly grasp the full picture of the event. However, the inventor realized that in current methods for automatically generating an event context, the terminal device sorts the events contained in news articles by the release time of the news. When a news article covers event information at multiple time nodes, all of that event information is treated as having occurred at a single time node (the release time of the news), so a clear event context cannot be generated.

Technical Problem

One of the purposes of the embodiments of the present application is to provide an event context generation method and apparatus, a terminal device, and a storage medium, aiming to solve the problem that, when a news article covers event information at multiple time nodes, all of that event information is treated as having occurred at a single time node, so that a clear event context cannot be generated.

Technical Solutions

In order to solve the above-mentioned technical problems, the technical solutions adopted in the embodiments of the present application are:

A first aspect of the embodiments of the present application provides a method for generating an event context, the method comprising:

Acquiring first time information and event information from each of multiple event documents to obtain multiple first time event pairs corresponding to the multiple event documents;

Unifying the time expressions of the multiple pieces of first time information in the multiple first time event pairs to obtain multiple pieces of unified second time information, and replacing the first time information in the multiple first time event pairs with the corresponding second time information to obtain multiple second time event pairs;

Determining, from the multiple second time event pairs, target event information corresponding to the multiple pieces of second time information;

Sorting the target event information according to the second time information corresponding to the target event information to generate an event context.

A second aspect of the embodiments of the present application provides an event context generation device, the device comprising:

An obtaining module, configured to acquire first time information and event information from each of multiple event documents, and obtain multiple first time event pairs corresponding to the multiple event documents;

A processing module, configured to unify the time expressions of the multiple pieces of first time information in the multiple first time event pairs to obtain multiple pieces of unified second time information, and to replace the first time information in the multiple first time event pairs with the corresponding second time information to obtain multiple second time event pairs;

A determining module, configured to determine, from the multiple second time event pairs, target event information corresponding to the multiple pieces of second time information;

A generating module, configured to sort the target event information according to the second time information corresponding to the target event information to generate an event context.

A third aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the event context generation method according to the first aspect.

A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the event context generation method according to the first aspect.

A fifth aspect of the embodiments of the present application further provides a computer program product which, when run on a terminal device, causes the terminal device to implement the event context generation method according to the first aspect.

Beneficial Effects

Compared with the prior art, the embodiments of the present application have the following beneficial effects:

In the embodiments of the present application, all the first time information and event information are acquired from each event document, and one or more first time event pairs are generated for each event document. This solves the problem that, when an event document covers event information at multiple time nodes, all the event information in that document is treated as having occurred at a single time node. Then, the first time information in each first time event pair is normalized to obtain second time information with a unified time dimension. Next, for multiple pieces of event information sharing the same time node, the target event information corresponding to that second time information is determined, so that one piece of target event information is obtained for each piece of second time information. Finally, the target event information is sorted by its second time information to generate an event context in which each time node corresponds to one piece of target event information, which reduces repeated event information in the event context.

Description of Drawings

To describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings used in the description of the embodiments or the exemplary technologies are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of an implementation of an event context generation method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of an implementation manner of S101 of an event context generation method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of another implementation manner of S101 of an event context generation method provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of an implementation manner of S304 of an event context generation method provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of an implementation manner of S402 of an event context generation method provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of an implementation manner of S103 of an event context generation method provided by an embodiment of the present application;

FIG. 7 is a structural block diagram of an event context generating apparatus provided by an embodiment of the present application;

FIG. 8 is a structural block diagram of a terminal device provided by an embodiment of the present application.

Embodiments of the Present Invention

In order to make the purpose, technical solutions and advantages of the present application more clearly understood, the present application will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application.

The event context generation method provided by the embodiments of the present application can be applied to terminal devices such as tablet computers, notebook computers, ultra-mobile personal computers (UMPCs), and netbooks. The embodiments of the present application impose no restriction on the specific type of terminal device.

Please refer to FIG. 1. FIG. 1 shows an implementation flowchart of a method for generating an event context provided by an embodiment of the present application. The method includes the following steps:

S101. Obtain first time information and event information in multiple event documents respectively, and obtain multiple first time event pairs corresponding to the multiple event documents.

In an application, the event documents include, but are not limited to, documents containing text information, such as news articles and books. The terminal device may acquire the event documents by crawling, in real time, news articles or microblog posts that contain the keywords of the event from web pages; alternatively, the terminal device may obtain multiple event documents pre-stored by the user from a specified path. The event information may be summary information of the core event in the event document, and may be determined from the event document by using event keywords or a core event selection model.

In an application, the core event (event information) contained in an event document may be a sentence or a paragraph. The event document can be split into clauses, the number of preset event keywords contained in each clause can be counted, and a first sentence weight can be calculated for each clause from that number. A second sentence weight is then assigned to each clause according to its position in the event document. For example, for a document with a title and body content, the clause in the title may be assigned a higher second sentence weight than the remaining clauses in the body. Finally, a target sentence is selected from the clauses as the core event according to the first and second sentence weights of each clause. The preset event keywords may be phrases extracted from the event document: the frequency of each phrase in the event document is counted, and phrases whose frequency reaches a threshold are determined as event keywords. Such phrases fully express the content of the event document while remaining concise and general.
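The keyword and sentence-weight selection above can be sketched as follows. This is a minimal sketch, assuming the document has already been split into phrases and clauses; the function names and default weights are hypothetical, and multiplying the first and second sentence weights is an assumption, since the text does not say how the two weights are combined.

```python
from collections import Counter
from typing import Iterable, List, Set


def extract_keywords(phrases: Iterable[str], threshold: int) -> Set[str]:
    """Phrases whose frequency in the document reaches `threshold` become event keywords."""
    counts = Counter(phrases)
    return {phrase for phrase, count in counts.items() if count >= threshold}


def select_core_sentence(title: str, body_sentences: List[str], keywords: Set[str],
                         title_weight: float = 2.0, body_weight: float = 1.0) -> str:
    """Pick the clause with the highest combined weight as the core event.

    First sentence weight: number of keyword occurrences in the clause.
    Second sentence weight: position-based (title clauses weigh more).
    ASSUMPTION: the two weights are combined by multiplication.
    """
    candidates = [(title, title_weight)] + [(s, body_weight) for s in body_sentences]
    best, best_score = title, float("-inf")
    for sentence, position_weight in candidates:
        first_weight = sum(sentence.count(k) for k in keywords)
        score = first_weight * position_weight
        if score > best_score:  # ties keep the earlier clause (the title first)
            best, best_score = sentence, score
    return best
```

For example, with the keywords "flood" and "rescue", a title containing each keyword once outscores a body clause containing three keyword hits only while the title bonus is 2.0; with `title_weight=1.0` the body clause wins.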

In an application, the first time information is the time point at which the event information occurred; the first time information together with the event information that occurred at that time node can be regarded as a first time event pair. The time information in an event document may be a time that specifically describes an event in the document, or the time at which the terminal device acquired the event document. For news, the release time of the news can also serve as the time information of the event document. In this embodiment, a time in the event document that specifically describes the event information is preferentially used as the first time information corresponding to that event information. If no such time is found in the event document, the release time of the event document may be used as its first time information; if no release time is available either, the time point at which the terminal device acquired the event document is used as the first time information of the event information.
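The priority order above amounts to a simple fallback chain. A minimal sketch, assuming each candidate time is already parsed into a `datetime` (or `None` when absent); the function name and the returned source label are hypothetical, added only to make the chosen branch visible.

```python
from datetime import datetime
from typing import Optional, Tuple


def first_time_for_event(explicit_time: Optional[datetime],
                         release_time: Optional[datetime],
                         acquired_at: datetime) -> Tuple[datetime, str]:
    """Return the first time information for one piece of event information,
    plus a label naming which source supplied it."""
    if explicit_time is not None:   # 1. time stated in the document for this event
        return explicit_time, "explicit"
    if release_time is not None:    # 2. release (publication) time of the document
        return release_time, "release"
    return acquired_at, "acquired"  # 3. time the terminal device acquired the document
```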

In an application, when an event document contains multiple pieces of first time information, multiple pieces of event information can correspondingly be obtained from the event document, and each piece of first time information can be paired with its event information to obtain multiple first time event pairs for that event document. In this way, for multiple event documents, the first time event pairs in each event document can be obtained.

S102. Unify the time expressions of the multiple pieces of first time information in the multiple first time event pairs to obtain multiple pieces of unified second time information, and replace the first time information in the first time event pairs with the corresponding second time information to obtain multiple second time event pairs.

In application, normalizing the first time information can be understood as unifying the dimension (expression format) of each piece of first time information. Because time can be expressed in many ways, it is difficult to determine the chronological order of the time points expressed by the pieces of first time information unless their dimensions are unified. Therefore, to make the time information easy to compare, it may be normalized. Exemplarily, time information written in different forms that all denote July 10 can be normalized to the single form July 10. The time information obtained after unifying the time expression of the first time information is the second time information. By replacing the first time information in each first time event pair with the corresponding second time information, multiple second time event pairs are obtained. It should be noted that the time information may be specified to any granularity, such as hours, minutes, or seconds, and the first time information may be normalized according to specific circumstances, which is not limited herein.
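A minimal sketch of such normalization, assuming only three input forms (ISO `YYYY-MM-DD`, Chinese month-day such as `7月10日` or `七月十日`, and English month-day such as `July 10`) and a caller-supplied default year for expressions without one. The patterns and function names are illustrative assumptions, not the patent's rules; mapping everything to a `datetime.date` makes the second time information directly comparable for sorting.

```python
import re
from datetime import date
from typing import Optional

CN_DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
             "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

ISO_DATE = re.compile(r"(\d{4})-(\d{1,2})-(\d{1,2})")
CN_DATE = re.compile(r"([0-9零一二三四五六七八九十]+)月([0-9零一二三四五六七八九十]+)[日号]")
EN_DATE = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2})\b", re.IGNORECASE)


def cn_to_int(text: str) -> int:
    """Convert Arabic digits or Chinese numerals below 100 (e.g. 七, 十, 十二, 三十一)."""
    if text.isdigit():
        return int(text)
    if "十" in text:
        tens, _, ones = text.partition("十")
        return (CN_DIGITS[tens] if tens else 1) * 10 + (CN_DIGITS[ones] if ones else 0)
    return CN_DIGITS[text]


def normalize_time(text: str, default_year: int) -> Optional[date]:
    """Map a time expression to a date; return None if no known form is found."""
    m = ISO_DATE.search(text)
    if m:
        return date(int(m[1]), int(m[2]), int(m[3]))
    m = CN_DATE.search(text)
    if m:
        return date(default_year, cn_to_int(m[1]), cn_to_int(m[2]))
    m = EN_DATE.search(text)
    if m:
        return date(default_year, MONTHS[m[1].lower()], int(m[2]))
    return None
```

With `default_year=2020`, the inputs `7月10日`, `七月十日` and `On July 10 the river rose` all normalize to `date(2020, 7, 10)`.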

S103. From the plurality of second time event pairs, determine target event information corresponding to the plurality of second time information.

In an application, among the multiple second time event pairs obtained from multiple event documents, there may be multiple pieces of event information at the same time point (second time information). For example, if multiple news articles report event information B that occurred at time point A in the morning, the terminal device may acquire a second time event pair (A, B) from each of these articles. Because these second time event pairs all report the same event information at the same time point, one of them needs to be selected as the representative target event information. For example, if the event documents are news articles, the selection can take into account the authority of the source site, the time at which the site published the news (which may differ from the time information corresponding to the event information), the number of reprints of the news, and so on.

It should be noted that determining the target event information corresponding to the multiple pieces of second time information can be understood as follows: from multiple identical pieces of second time information (that is, multiple pieces of the same event information corresponding to the same time point), one piece of event information is selected as the target event information of that second time information, and the event information corresponding to the remaining second time information at the same time point can be ignored. In addition, for second time information whose time point is shared with no other pair (that is, only one piece of second time information exists at that time point), the event information in that second time event pair can be regarded directly as the target event information corresponding to the second time information.
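The selection in S103 groups pairs by time point and keeps one representative per group. A minimal sketch, assuming each pair carries hypothetical `authority` and `reprints` fields and a publication timestamp; the lexicographic ranking (higher authority, then more reprints, then earlier publication) is an assumption, since the text only lists these signals as references.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, List


@dataclass
class TimeEventPair:
    time: date           # second time information (normalized)
    event: str           # event information
    authority: float     # hypothetical authority score of the source site
    reprints: int        # hypothetical reprint count of the article
    published: datetime  # when the site published the article


def select_targets(pairs: List[TimeEventPair]) -> Dict[date, TimeEventPair]:
    """Pick one representative pair (target event information) per time point."""
    groups: Dict[date, List[TimeEventPair]] = {}
    for pair in pairs:
        groups.setdefault(pair.time, []).append(pair)
    # A time point with a single pair keeps that pair; otherwise prefer higher
    # authority, then more reprints, then the earlier publication time.
    return {
        t: min(group, key=lambda p: (-p.authority, -p.reprints, p.published))
        for t, group in groups.items()
    }
```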

S104. According to the second time information corresponding to the target event information, sort the target event information to generate an event context.

In application, the event context is a form of displaying news events that develop over a long period. After the target event information at each time point (second time information) is acquired, it is placed on a timeline in chronological order to generate the event context. From the generated event context, the user can observe how the event developed. For example, for news with an obvious course of development, a region where news items are relatively dense can be seen on the event context, and this region can be regarded as the main stage of the event's development.
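Step S104 reduces to sorting the selected target event information by its normalized time. A minimal sketch, assuming one target event per `date`; the plain-text rendering and the per-month count (a rough way to spot the dense region mentioned above) are illustrative additions, not part of the described method.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple


def build_event_context(targets: Dict[date, str]) -> List[Tuple[date, str]]:
    """Place target event information on a timeline in chronological order."""
    return sorted(targets.items())


def render(context: List[Tuple[date, str]]) -> str:
    """One line per time node: ISO date, two spaces, event information."""
    return "\n".join(f"{t.isoformat()}  {event}" for t, event in context)


def density_by_month(context: List[Tuple[date, str]]) -> Counter:
    """Count timeline entries per (year, month); dense months hint at the main stage."""
    return Counter((t.year, t.month) for t, _ in context)
```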

In this embodiment, all the first time information and event information are acquired from each event document, and one or more first time event pairs are generated for each event document. This solves the problem that, when one event document covers event information at multiple time nodes, all the event information in that document is treated as having occurred at a single time node. After that, the first time information in each first time event pair is normalized to obtain second time information with a unified time dimension; then, according to the second time information, the target event information is determined from the event information at the same time node, so that target event information is obtained for each piece of second time information. Finally, the target event information is sorted according to the second time information to generate an event context in which each time node corresponds to one piece of target event information, reducing repeated event information in the event context.

Referring to FIG. 2, in a specific embodiment, the first time information has multiple time expression manners. Step S101, acquiring the first time information and event information in the multiple event documents respectively and obtaining the multiple first time event pairs corresponding to the multiple event documents, specifically includes the following sub-steps S201 to S202, described in detail as follows:

S201. Query, according to the multiple time expression manners, multiple pieces of first time information in the multiple event documents that conform to any time expression manner.

In application, the time expression manners include, but are not limited to, expressing time nodes with Chinese characters and expressing time nodes with Roman numerals. If the event information in an event document is in English, Japanese, or another language, the terminal device may translate it into a specified language (Chinese) and then query the first time information in each event document according to the time expression manners.

Exemplarily, multiple time expression manners may be established in advance for querying the first time information in the event documents. For example, for an event document that records clock times, a time expression of the form "hh-mm-ss" can be established, where hh represents the hour and must be a value from 0 to 23, mm represents the minute and must be a value from 0 to 59, and ss represents the second and must be a value from 0 to 59. The text in the event document is then scanned and compared against this format rule, and the first time information that conforms to the time expression manner is screened out. The above is only one example of a time expression manner, which can be set according to the actual situation and is not limited herein.
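Preset time expression manners map naturally onto regular expressions. A minimal sketch, assuming three hypothetical patterns: the clock pattern from the example above (hour 0 to 23, minute and second 0 to 59), a Chinese month-day pattern, and an ISO date. Digit lookarounds are used instead of `\b` because Python treats Chinese characters as word characters, so `\b` would not fire between `于` and `08`.

```python
import re
from typing import List, Tuple

# Hypothetical preset time expression manners, one compiled regex each.
TIME_EXPRESSIONS = {
    "clock_hh-mm-ss": re.compile(r"(?<![\d-])(?:[01]?\d|2[0-3])-[0-5]?\d-[0-5]?\d(?![\d-])"),
    "cn_month_day": re.compile(r"\d{1,2}月\d{1,2}[日号]"),
    "iso_date": re.compile(r"(?<!\d)\d{4}-\d{2}-\d{2}(?!\d)"),
}


def find_time_mentions(text: str) -> List[Tuple[int, str, str]]:
    """Return (start offset, matched text, expression name), ordered by offset."""
    mentions = []
    for name, pattern in TIME_EXPRESSIONS.items():
        for m in pattern.finditer(text):
            mentions.append((m.start(), m.group(), name))
    return sorted(mentions)
```

Out-of-range values such as `25-00-00` are rejected by the character classes rather than by a separate validation step.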

S202. Input the multiple pieces of first time information and the corresponding event documents into a sequence labeling model, determine the event information matched with each piece of first time information, and obtain multiple first time event pairs.

In application, the sequence labeling model may be a long short-term memory (LSTM) network, a conditional random field (CRF) model, or a sequence labeling model combining an LSTM with a CRF. Based on the event features of the currently input event information and the time features of the first time information, the sequence labeling model outputs a score (probability) indicating how likely the event information is to be paired with the first time information. The sequence labeling model can be obtained by training on existing training data (multiple pieces of event information and multiple pieces of time information in event documents) and the classification results of the training data (the time information corresponding to each piece of event information). An event document is then input into the trained model, which determines the first time information in the event document and extracts the document position of the first time information as its time feature. According to the time feature, the model outputs the probability that each piece of event information in the event document is paired with the first time information. The event information corresponding to the first time information is determined according to the probability values, and a first time event pair is generated. In this way, one or more first time event pairs are obtained for each event document.
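When the sequence labeling model combines an LSTM with a CRF, the final labels are usually decoded with the Viterbi algorithm. A minimal sketch of that decoding step only, assuming the LSTM has already produced per-token emission scores for two hypothetical tags, `"E"` (token belongs to the event information paired with the current time) and `"O"` (it does not), and that CRF transition scores are given; training is not shown, and the tag names and scores are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: Sequence[str]) -> List[str]:
    """Highest-scoring tag sequence under emission + transition scores."""
    if not emissions:
        return []
    score = {t: emissions[0][t] for t in tags}
    backptrs = []
    for i in range(1, len(emissions)):
        new_score, ptr = {}, {}
        for cur in tags:
            # best previous tag for ending at `cur` on token i
            best_prev = max(tags, key=lambda p: score[p] + transitions[(p, cur)])
            new_score[cur] = score[best_prev] + transitions[(best_prev, cur)] + emissions[i][cur]
            ptr[cur] = best_prev
        score = new_score
        backptrs.append(ptr)
    # trace back from the best final tag
    path = [max(tags, key=lambda t: score[t])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return list(reversed(path))


def event_tokens(tokens: Sequence[str], labels: Sequence[str]) -> List[str]:
    """Tokens tagged "E" form the event information for the current time."""
    return [tok for tok, tag in zip(tokens, labels) if tag == "E"]
```

The transition score is what lets the CRF layer keep an event span contiguous: with a bonus on E to E, a token whose emission slightly prefers `"O"` can still be labeled `"E"` when it sits between event tokens.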

In this embodiment, by setting multiple time expression manners, multiple pieces of time information can be accurately queried in each event document, and the paired event information is determined for each piece of first time information by the sequence labeling model. This avoids the problem that, when an event document covers event information occurring at multiple first time nodes, the terminal device generates only one first time event pair for that event document.

Referring to FIG. 3, in a specific embodiment, step S101 of acquiring first time information and event information in multiple event documents respectively and obtaining multiple first time event pairs corresponding to the multiple event documents specifically includes the following sub-steps S301 to S304, detailed as follows:

S301. Acquire each piece of first time information in each event document, and determine the one or more first document positions at which each piece of first time information is located in the corresponding event document.

In application, after the first time information is determined from an event document according to the time expression manners, its first document position can be determined from where it appears in the document. Specifically, the first document position of the first time information may be obtained by performing word segmentation on the event document to obtain multiple text segments, determining the ordinal position of the first time information among those segments, and using that ordinal position as the first document position. It should be noted that each determined piece of first time information can be treated directly as a single text segment, so that only the content of the event document that does not belong to the first time information needs to be segmented.

S302. Perform word segmentation processing on each event document to obtain multiple word segmentations in each event document.

S303. Determine the multiple second document positions at which the multiple word segments are respectively located in each event document.

In application, word segmentation of each event document can be performed with a string-matching-based method, an understanding-based method, or a statistics-based method. For example, with the string-matching-based method, strings from the news article to be segmented are matched against entries in a preset machine dictionary; if a string is found in the dictionary, the match succeeds (that is, a word segment is recognized).

Exemplarily, a sentence in the news article is first matched against the dictionary entries as a whole. If no match is found, the last character (or the first character) of the string is deleted to form a new string, which is matched against the entries again, until a match succeeds. The remainder of the sentence is then treated as a new string and matched in the same way, and these operations are repeated to obtain the word segments of the sentence. Segmenting the remaining sentences of the news article in the same way yields all the word segments of the article. After segmentation is complete, the second document position of each word segment can be determined from its order in the article, and the first document position of each piece of first time information is determined from its position in the article.
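Deleting the last character until a dictionary match succeeds is forward maximum matching (deleting the first character instead gives backward maximum matching). A minimal sketch of the forward variant, assuming a plain Python set as the machine dictionary. Starting from the longest dictionary word length instead of the whole sentence gives the same result with fewer lookups, and emitting a single unmatched character as its own segment is an assumption the text leaves open.

```python
from typing import List, Optional, Set


def forward_max_match(sentence: str, dictionary: Set[str],
                      max_len: Optional[int] = None) -> List[str]:
    """Segment `sentence` by forward maximum matching against `dictionary`."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(sentence):
        # Try the longest candidate first, dropping the last character on failure.
        for j in range(min(len(sentence), i + max_len), i, -1):
            candidate = sentence[i:j]
            # ASSUMPTION: an unmatched single character becomes its own segment.
            if candidate in dictionary or j == i + 1:
                words.append(candidate)
                i = j
                break
    return words
```

The second document position of each segment is then simply its index in the returned list.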

S304. According to the one or more first document positions and the multiple second document positions, determine, from the multiple word segments, target word segments matched with the first time information, and generate the event information corresponding to each piece of first time information in each event document.

In application, if a news article contains multiple pieces of first time information, there are correspondingly multiple first document positions. From the first and second document positions, the separation distance between each first document position and each second document position in the article can be calculated, and the target word segments paired with each piece of first time information can be determined according to these separation distances.

Illustratively, in news, event information is usually reported together with time information. A news article usually contains time information, location information, event information, and other content, and to report an event accurately, news generally describes event information in the pattern "at [time] (first time information), at [place] (location information), [event] occurred (event information)". It can therefore be assumed that each piece of first time information is located close to its corresponding event information in the article. Accordingly, the separation distance can be calculated from the first document position of each piece of first time information and the second document position of each word segment. When the separation distance is smaller than a preset threshold, the word segment at that second document position is determined as a target word segment. Combining the target word segments in positional order yields the event information corresponding to the first time information. In this way, each piece of first time information and its corresponding event information can be generated for each event document.
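The threshold rule above can be sketched directly on token indices. A minimal sketch, assuming positions are indices into one token list in which each time mention is a single token; the function name, the joining separator, and the example tokens are hypothetical.

```python
from typing import List, Sequence, Tuple


def pair_by_distance(tokens: Sequence[str], time_positions: Sequence[int],
                     threshold: int, sep: str = " ") -> List[Tuple[str, str]]:
    """For each time mention, join the non-time tokens closer than `threshold`."""
    time_set = set(time_positions)
    pairs = []
    for tp in time_positions:
        # target word segments: |second position - first position| < threshold
        targets = [tok for pos, tok in enumerate(tokens)
                   if pos not in time_set and abs(pos - tp) < threshold]
        pairs.append((tokens[tp], sep.join(targets)))  # kept in positional order
    return pairs
```

For the tokens `["July 10", "floods", "hit", "Nanjing", "July 12", "rescue", "began"]` with time positions 0 and 4 and a threshold of 3, "July 10" is paired with "floods hit", but "July 12" is paired with "hit Nanjing rescue began": a symmetric hard threshold also picks up segments belonging to the preceding event, which is one motivation for the learned classification probability used in S401 to S403.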

Referring to FIG. 4, in a specific embodiment, step S304 of determining, according to the one or more first document positions and the multiple second document positions, target word segments matched with the first time information from the multiple word segments and generating the event information corresponding to the first time information in each event document specifically includes the following sub-steps S401 to S403, detailed as follows:

S401. Calculate the separation distance between the second document position of each word segment and the first document position in each event document respectively.

S402. Calculate, according to the separation distance, a classification probability that each word segment is paired with the first time information.

In application, for the calculation of the separation distance, reference may be made to the description of S304, which is not repeated here.

In application, after the separation distance is obtained, each word segment and its separation distance can be input into the neural network structure of the sequence labeling model, which performs feature processing on them to obtain a processed feature vector. Based on this feature vector, the sequence labeling model then outputs the classification probability that the current word segment is paired with the corresponding first time information.

S403. According to the classification probability corresponding to each word segment, determine, from the multiple word segments, target word segments matched with the first time information, and generate the event information corresponding to the first time information in each event document.

In application, each event document contains multiple word segments, so a classification probability with respect to the first time information is obtained for each of them. Since event information can be regarded as a sentence or paragraph composed of multiple word segments, the classification probabilities can be sorted in descending order, the word segments corresponding to the top N probabilities can be taken as the target word segments, and the event information can be generated by arranging the target word segments in the order of their second document positions. It should be noted that the classification probability of the event information paired with the first time information is higher than that of event information composed of other word segments paired with the same first time information. Therefore, even when an event document contains multiple pieces of first time information, the event information paired with each piece of first time information can be generated accurately.
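The top-N rule above selects by probability but outputs in document order. A minimal sketch, assuming the classification probabilities are already available as a list aligned with the tokens; the function name and the value of N are hypothetical.

```python
from typing import List, Sequence


def build_event_from_probs(tokens: Sequence[str], probs: Sequence[float],
                           n: int, sep: str = " ") -> str:
    """Keep the N most probable word segments, then restore their document order."""
    top = sorted(range(len(tokens)), key=lambda i: probs[i], reverse=True)[:n]
    # Re-sort the chosen indices so segments follow their second document positions.
    return sep.join(tokens[i] for i in sorted(top))
```

For example, tokens `["floods", "hit", "the", "city", "yesterday"]` with probabilities `[0.9, 0.8, 0.1, 0.7, 0.3]` and `n=3` yield `"floods hit city"`.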

Referring to FIG. 5, in a specific embodiment, the plurality of word segments include a first word segment and a second word segment. S402 (calculating, according to the separation distance, the classification probability that each word segment is paired with the first time information) specifically includes the following sub-steps S501-S504, described in detail as follows:

S501. Extract the first feature of the first word segment in each event document, obtain the second word segment immediately preceding the first word segment, and determine the pairing category between the second word segment and the event information described in the corresponding event document.

In application, the first feature may be obtained by the feature extraction network of the sequence labeling model performing feature extraction on a word segment. An event document contains multiple word segments. The word segment currently being processed to determine its classification probability can be taken as the first word segment, and the word segment immediately preceding it can be taken as the second word segment. The pairing category between the second word segment and the event information is the category the terminal device determined when it previously processed the second word segment. There are two pairing categories: either the second word segment can be used to generate the event information (that is, it is paired with the event information), or it cannot (that is, it is not paired with the event information).

S502. Calculate, according to the first feature, a first probability that the first word segment belongs to the event information.

In application, whether the first word segment can be used as a word segment of the event information is generally determined by the first word segment itself. A word segment can be regarded as consisting of a single word or multiple words. A word vector library can be constructed in advance, with a serial number assigned to each word in it. The serial numbers, in the word vector library, of the words contained in the first word segment are taken as the first feature. At the same time, the terminal device can determine the core abstract (core event) of the event document using an existing abstract-determination method. Such methods include, but are not limited to, supervised extractive methods and abstractive summarization methods. The first feature and the core abstract are then input into the sequence labeling model, which outputs the first probability that the current first word segment belongs to the event information (core abstract). The core abstract includes, but is not limited to, each core abstract within the event document and the overall core abstract of the entire event document.
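Below is a minimal sketch of the word vector library lookup. It assumes the library is a simple word-to-serial-number dictionary with a reserved index for unknown words. The `<unk>` entry and the function names are hypothetical.

```python
from typing import Dict, Iterable, List


def build_vocab(documents: Iterable[List[str]]) -> Dict[str, int]:
    """Assign a serial number to every distinct word; 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for words in documents:
        for word in words:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def first_feature(segment_words: List[str], vocab: Dict[str, int]) -> List[int]:
    """Serial numbers of the words in a word segment, used as its first feature."""
    return [vocab.get(word, vocab["<unk>"]) for word in segment_words]
```

A trained model would normally map these serial numbers to dense word vectors. The integer IDs shown here are only the lookup step the paragraph describes.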

S503. Calculate, according to the pairing category, a second probability that the first word segment belongs to the event information.

In application, whether the current first word segment can be used as a word segment of the event information also depends on the word segments around it. To determine this more accurately, the judgment can also take into account the second word segment immediately preceding the current first word segment. Whether the second word segment can be used as a word segment of the event information is given by the pairing category determined in S501. Once the pairing category, the first feature of the first word segment, and the core abstract are known, the conditional random field (CRF) network in the sequence labeling model can calculate the second probability in one of two cases. If the adjacent second word segment belongs to the event information, it calculates the second probability that the current first word segment also belongs to it. If the adjacent second word segment does not belong to the event information, it calculates the second probability that the current first word segment belongs to it under that condition.
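The conditioning on the previous word segment's pairing category can be illustrated locally as below. This is a simplification, not the full CRF computation, which normalizes over whole label sequences. It assumes hypothetical emission scores for the current word segment and a 2x2 transition score table indexed by (previous category, current category), where 1 means "belongs to the event information".

```python
import math
from typing import List, Sequence


def second_probability(prev_category: int, emission: Sequence[float],
                       transition: List[List[float]]) -> float:
    """Local probability that the current word segment belongs to the event information.

    prev_category: pairing category of the preceding (second) word segment, 0 or 1.
    emission[y]: score for the current segment having category y, from its own features.
    transition[prev][y]: score for moving from category prev to category y.
    """
    scores = [transition[prev_category][y] + emission[y] for y in (0, 1)]
    # Numerically stable softmax over the two categories of the current segment.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    return exps[1] / sum(exps)
```

With a transition table that favours staying in category 1, a segment that follows an event-information segment receives a higher probability than the same segment following a non-event segment.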

S504. Calculate the classification probability that the first word segment is paired with the first time information according to the separation distance between the first word segment and the first time information, the first probability, and the second probability.

In application, the sequence labeling model calculates the classification probability that each first word segment is correctly paired with the first time information as follows. After obtaining the separation distance, the first probability, and the second probability, the model normalizes their values. The normalized values are input as features to the classifier in the sequence labeling model. The classifier outputs the classification probability: the probability that the first word segment, given that it can serve as a target word segment of the event information, is correctly paired with the first time information.
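The normalize-then-classify step can be sketched as below. The source does not specify the normalization or the classifier. This sketch assumes min-max normalization across the word segments of one document and a logistic classifier with hand-set, hypothetical weights. The distance weight is negative, so nearer segments score higher.

```python
import math
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def classification_probabilities(distances: Sequence[float], p1: Sequence[float],
                                 p2: Sequence[float],
                                 weights=(-2.0, 1.5, 1.5), bias: float = 0.0) -> List[float]:
    """Logistic classifier over normalized (distance, first prob, second prob) features.

    The weights and bias are illustrative placeholders, not trained values.
    """
    feats = zip(min_max(distances), min_max(p1), min_max(p2))
    out = []
    for d, a, b in feats:
        z = weights[0] * d + weights[1] * a + weights[2] * b + bias
        out.append(1.0 / (1.0 + math.exp(-z)))
    return out
```

In practice, the weights would be learned jointly with the rest of the sequence labeling model. Fixed weights are used here only so the example runs on its own.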

Specifically, the classification probability that the first word segment is paired with the first time information can be calculated using the following formula:
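A plausible form of this score is shown below. It assumes the standard linear-chain CRF formulation extended with the distance term Q described below. The transition symbol A and the normalization over all label sequences \tilde{y} are assumptions, not notation taken from the source:

```latex
s(X, y) = \sum_{i=1}^{n} \left( P_{i, y_i} + Q_{i, y_i} \right)
        + \sum_{i=2}^{n} A_{y_{i-1}, y_i},
\qquad
p(y \mid X) = \frac{\exp\big(s(X, y)\big)}{\sum_{\tilde{y}} \exp\big(s(X, \tilde{y})\big)}
```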

Here, X is the first word segment in the event document, and y is its pairing category (whether the first word segment belongs to the event information). The index i is the position of the ith word segment in the event information, and n is the number of word segments in the event information. The transition term from the (i-1)th to the ith word segment represents the second probability that the ith word segment (the first word segment) belongs to the event information. P_{i,y_i} represents the first probability, predicted from the features of the ith word segment, that it belongs to the event information. Q_{i,y_i} represents the third probability, predicted from the distance between the ith word segment and the first time information, that the ith word segment is correctly paired with the first time information. Combining these terms determines the probability that the first word segment, given that it belongs to the event information, is paired with the first time information. The distance between the first word segment and the first time information can be calculated as Q(X) = dist(min(T_m, X)). Here, X is the first word segment in the event document, T_m is the mth of the multiple pieces of first time information, and min(T_m, X) is the separation distance between the first word segment and the mth piece of first time information.
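To make the roles of the three terms concrete, the sketch below scores a label sequence as the sum of first-probability scores P, distance scores Q, and transition scores A. It then normalizes with the forward algorithm of a linear-chain CRF. The mapping from separation distance to Q (a linear penalty on label 1) and all function names are hypothetical. In a trained model, these scores would come from the network.

```python
import math
from typing import List, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def distance_scores(distances: Sequence[float], weight: float = 0.5) -> List[List[float]]:
    """Hypothetical Q: label 1 (paired) is penalised linearly with separation distance."""
    return [[0.0, -weight * d] for d in distances]


def path_score(P, Q, A, labels: Sequence[int]) -> float:
    """s(X, y): emission (P) plus distance (Q) scores plus transition (A) scores."""
    score = sum(P[i][y] + Q[i][y] for i, y in enumerate(labels))
    score += sum(A[labels[i - 1]][labels[i]] for i in range(1, len(labels)))
    return score


def log_partition(P, Q, A) -> float:
    """Forward algorithm: log of the summed exp-scores over all label sequences."""
    k = len(P[0])
    alpha = [P[0][y] + Q[0][y] for y in range(k)]
    for i in range(1, len(P)):
        alpha = [
            logsumexp([alpha[prev] + A[prev][y] for prev in range(k)]) + P[i][y] + Q[i][y]
            for y in range(k)
        ]
    return logsumexp(alpha)


def sequence_probability(P, Q, A, labels: Sequence[int]) -> float:
    """p(y | X) = exp(s(X, y)) / sum over all label sequences of exp(s)."""
    return math.exp(path_score(P, Q, A, labels) - log_partition(P, Q, A))
```

The forward algorithm avoids enumerating all 2^n label sequences. The cost is linear in the number of word segments.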

In application, the separation distance between the first word segment and the first time information is added as a feature of the sequence labeling model. This allows the model, after judging that the first word segment can be used as a target word segment of the event information, to further judge its correlation with the first time information. Therefore, even if a news document contains multiple pieces of first time information, the classification probability of each word segment can be calculated accurately based on the separation distance feature. This improves the accuracy of matching first time information with event information.

It can be understood that when an event document contains only one piece of first time information, an overall core abstract of the entire event document can be generated. For each word segment in the event document, the first probability that it belongs to the core abstract and the second probability given the adjacent second word segment are calculated. The separation distance between each word segment and the first time information is also determined. From the separation distance, the first probability, and the second probability, the classification probability that each word segment belongs to the core abstract and is also paired with the first time information is determined.

When an event document contains multiple pieces of first time information, a core abstract can be generated for each paragraph containing first time information. For each word segment, the first probability that it belongs to each paragraph's core abstract and the second probability given the adjacent second word segment are determined. The separation distance between each word segment and the corresponding first time information is also calculated. From these, the classification probability that each word segment, when it belongs to a paragraph's core abstract, is paired with the first time information contained in that paragraph is determined. Finally, the target word segments corresponding to each piece of first time information are determined according to the classification probabilities and a preset probability threshold. The event information is generated according to the second document positions of the target word segments.
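For the multi-time case, the threshold-based selection can be sketched as below. It assumes the classification probabilities have already been computed separately for each piece of first time information. The threshold value, the join separator, and the function name are illustrative assumptions.

```python
from typing import Dict, List


def event_info_by_time(segments: List[str], positions: List[int],
                       probs_by_time: Dict[str, List[float]],
                       threshold: float = 0.5, sep: str = " ") -> Dict[str, str]:
    """Map each piece of first time information to event information built from
    the word segments whose classification probability meets the threshold."""
    result = {}
    for time_info, probs in probs_by_time.items():
        picked = [i for i, p in enumerate(probs) if p >= threshold]
        picked.sort(key=lambda i: positions[i])  # keep second-document-position order
        result[time_info] = sep.join(segments[i] for i in picked)
    return result
```

A fixed threshold lets the length of each event description vary with the evidence. A fixed top-N always returns N segments.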

Referring to FIG. 6, in a specific embodiment, each piece of second time information corresponds to at least one piece of event information. S103 (determining, from the plurality of second time event pairs, the target event information corresponding to the plurality of pieces of second time information) further includes the following sub-steps S601-S603, described in detail as follows:

S601. Among the plurality of pieces of second time information, if any piece of second time information corresponds to multiple pieces of event information, obtain the information source of each of those pieces of event information.

In application, each piece of second time information corresponds to at least one piece of event information. One piece of second time information may correspond to one piece of event information. Alternatively, multiple pieces of second time information at the same time node may correspond to multiple pieces of event information. Event information corresponding to second time information at the same time node can be regarded as reports of the same event by the source sites (news source sites) of different event documents. On this basis, the information source of each piece of event information at the same time node can be obtained.

S602. According to the priorities of the information sources, obtain the target event information with the highest priority from the multiple pieces of event information.

In application, the priorities of the information sources can be preset in the terminal device. A higher-priority information source indicates that the content recorded in its event documents is more authentic and authoritative. When the terminal device obtains an event document from the network, it can also obtain the document's information source. Therefore, based on the information source of each event document, the target event information with the highest priority can be selected from the multiple pieces of event information. Information sources include, but are not limited to, official sites that publish event documents (news) and unofficial sites that publish event documents.

S603. Among the plurality of pieces of second time information, if no piece of second time information corresponds to multiple pieces of event information, determine the event information corresponding to each piece of second time information as the target event information.

In application, a piece of second time information may be unique, that is, its time node differs from the time nodes of all other second time information. In that case, the event information corresponding to that second time information can be determined as the target event information.
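Steps S601 to S603 can be sketched as a group-then-select pass over the second time event pairs. The source names and their priority values below are hypothetical placeholders for the priorities preset in the terminal device.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical preset priorities: a higher value means a more authoritative source.
SOURCE_PRIORITY = {"official": 2, "unofficial": 1}


def select_target_events(pairs: List[Tuple[str, str, str]],
                         priority: Dict[str, int] = SOURCE_PRIORITY) -> Dict[str, str]:
    """pairs: (second_time, event_info, source). Returns {second_time: target event}."""
    by_time = defaultdict(list)
    for time, event, source in pairs:
        by_time[time].append((event, source))
    targets = {}
    for time, candidates in by_time.items():
        if len(candidates) == 1:
            # S603: only one event at this time node, so it is the target.
            targets[time] = candidates[0][0]
        else:
            # S601/S602: several reports of the same time node; keep the
            # one whose source has the highest priority (unknown sources rank 0).
            best = max(candidates, key=lambda c: priority.get(c[1], 0))
            targets[time] = best[0]
    return targets
```

Ties between equally ranked sources resolve to the first candidate encountered, because `max` returns the first maximal element.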

In one embodiment, after the target event information is sorted according to its corresponding second time information to generate the event context, the method further includes:

Upload the event context to the blockchain.

Specifically, in all embodiments of the present application, the event context is obtained through processing by the terminal device. Uploading the event context to the blockchain ensures its security and its fairness and transparency to users. User equipment can download the event context from the blockchain to verify whether it has been tampered with. The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods. Each block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block. A blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
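The tamper check mentioned above relies on blocks being linked by cryptographic hashes. The sketch below illustrates only that linking, using SHA-256 from Python's standard library. It has no consensus mechanism, networking, or distributed storage, so it is not a blockchain client. All names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(payload: Any, prev_hash: str) -> str:
    """SHA-256 over a canonical JSON encoding of the payload and the previous hash."""
    data = json.dumps({"payload": payload, "prev": prev_hash},
                      sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict[str, Any]], payload: Any) -> List[Dict[str, Any]]:
    """Append a block whose hash covers its payload and its predecessor's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS_PREV
    chain.append({"payload": payload, "prev": prev, "hash": block_hash(payload, prev)})
    return chain


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash; any edited payload or broken link fails verification."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev"] != prev or block["hash"] != block_hash(block["payload"], prev):
            return False
        prev = block["hash"]
    return True
```

Because each hash covers the previous one, changing an early event context entry invalidates every later block. This is the property a downloader relies on when checking for tampering.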

Please refer to FIG. 7, which is a structural block diagram of an event context generating apparatus provided by an embodiment of the present application. In this embodiment, each unit included in the terminal device is used to execute the steps in the embodiments corresponding to FIG. 1 to FIG. 6. For details, refer to FIG. 1 to FIG. 6 and the related descriptions of those embodiments. For convenience of explanation, only the parts related to this embodiment are shown. Referring to FIG. 7, the event context generating apparatus 700 includes an acquiring module 710, a processing module 720, a determining module 730, and a generating module 740, wherein:

The acquiring module 710 is configured to acquire first time information and event information from multiple event documents respectively, obtaining multiple first time event pairs corresponding to the multiple event documents.

The processing module 720 is configured to unify the time expressions of the multiple pieces of first time information in the multiple first time event pairs to obtain multiple pieces of unified second time information. It then replaces the first time information in the multiple first time event pairs with the corresponding unified second time information to obtain multiple second time event pairs.

The determining module 730 is configured to determine, from the multiple second time event pairs, the target event information corresponding to the multiple pieces of second time information.

The generating module 740 is configured to sort the target event information according to the second time information corresponding to the target event information to generate an event context.
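Taken together, modules 720 to 740 can be sketched as a small pipeline over time event pairs that the acquiring module 710 is assumed to have already produced. The supported time formats are hypothetical examples of "multiple time expression manners". The determining step simply keeps the first event per time node, as a stand-in for S601 to S603.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical time expression manners handled by the processing module.
TIME_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y年%m月%d日")


def normalize_time(text: str) -> Optional[str]:
    """Processing module 720 (sketch): unify a time expression to ISO 8601."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            pass
    return None  # unrecognised expression


def process(first_pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Replace first time info with unified second time info; drop unparseable ones."""
    second_pairs = []
    for time_text, event in first_pairs:
        unified = normalize_time(time_text)
        if unified is not None:
            second_pairs.append((unified, event))
    return second_pairs


def determine(second_pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Determining module 730 (simplified): keep the first event seen per time node."""
    targets: Dict[str, str] = {}
    for time, event in second_pairs:
        targets.setdefault(time, event)
    return targets


def generate(targets: Dict[str, str]) -> List[Tuple[str, str]]:
    """Generating module 740: sort target events by time to form the event context."""
    # ISO 8601 date strings sort chronologically as plain strings.
    return [(time, targets[time]) for time in sorted(targets)]
```

Normalizing to ISO 8601 is what makes the final sort trivial. The same date written as "2020/11/06" and "2020-11-06" collapses to one time node.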

In one embodiment, the first time information includes multiple time expression manners, and the acquiring module 710 is further configured to:

Query, according to the multiple time expression manners, the multiple pieces of first time information in the multiple event documents that conform to any of the time expression manners. Input the multiple pieces of first time information and the corresponding event documents into the sequence labeling model, determine the event information matched with each piece of first time information, and obtain the multiple first time event pairs.
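Querying first time information by multiple time expression manners could, for example, use one regular expression per manner, as sketched below. The patterns are hypothetical and cover only a few numeric date styles. The returned start offsets play the role of the first document positions.

```python
import re
from typing import List, Tuple

# Hypothetical patterns, one per time expression manner.
TIME_PATTERNS = [
    re.compile(r"\d{4}-\d{1,2}-\d{1,2}"),
    re.compile(r"\d{4}/\d{1,2}/\d{1,2}"),
    re.compile(r"\d{4}年\d{1,2}月\d{1,2}日"),
]


def find_first_time_info(document: str) -> List[Tuple[str, int]]:
    """Return (matched text, start offset) for every expression matching any manner."""
    hits = []
    for pattern in TIME_PATTERNS:
        hits.extend((m.group(), m.start()) for m in pattern.finditer(document))
    # Sort by position so the results follow document order.
    return sorted(hits, key=lambda h: h[1])
```

Real systems usually also handle relative expressions ("yesterday", "last Friday"), which need a reference date and cannot be captured by patterns like these alone.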

In one embodiment, the acquiring module 710 is further configured to:

Obtain each piece of first time information in each event document, and determine the one or more first document positions of each piece of first time information in the corresponding event document. Perform word segmentation on each event document to obtain multiple word segments in each event document. Determine, in each event document, the multiple second document positions of the multiple word segments in the corresponding event document. Then, according to the one or more first document positions and the multiple second document positions, determine the target word segments matched with the first time information from the multiple word segments, and generate the event information corresponding to each piece of first time information in each event document.

In one embodiment, the acquiring module 710 is further configured to:

Calculate, in each event document, the separation distance between the second document position of each word segment and the first document position. Calculate, according to the separation distance, the classification probability that each word segment is paired with the first time information. Then, according to the classification probability corresponding to each word segment, determine the target word segments matched with the first time information from the multiple word segments, and generate the event information corresponding to each piece of first time information in each event document.

In one embodiment, the multiple word segments include a first word segment and a second word segment, and the acquiring module 710 is further configured to:

Extract the first feature of the first word segment in each event document, obtain the second word segment immediately preceding the first word segment, and determine the pairing category between the second word segment and the event information described in the corresponding event document. Calculate, according to the first feature, a first probability that the first word segment belongs to the event information. Calculate, according to the pairing category, a second probability that the first word segment belongs to the event information. Then calculate the classification probability that the first word segment is paired with the first time information according to the separation distance between the first word segment and the first time information, the first probability, and the second probability.

In one embodiment, each piece of second time information corresponds to at least one piece of event information, and the determining module 730 is further configured to:

Among the multiple pieces of second time information, if any piece of second time information corresponds to multiple pieces of event information, obtain the information source of each of those pieces of event information. According to the priorities of the information sources, obtain the target event information with the highest priority from the multiple pieces of event information. If no piece of second time information corresponds to multiple pieces of event information, determine the event information corresponding to each piece of second time information as the target event information.

In one embodiment, the event context generating apparatus 700 further includes:

The uploading module is configured to upload the event context to the blockchain.

It should be understood that, in the structural block diagram of the event context generating apparatus shown in FIG. 7, each unit/module is used to execute the steps in the embodiments corresponding to FIG. 1 to FIG. 6. These steps have been explained in detail in the above embodiments. For details, refer to FIG. 1 to FIG. 6 and the related descriptions of those embodiments, which are not repeated here.

FIG. 8 is a structural block diagram of a terminal device provided by another embodiment of the present application. As shown in FIG. 8, the terminal device 800 of this embodiment includes a processor 801, a memory 802, and a computer program 803 stored in the memory 802 and executable on the processor 801, such as a program for the event context generation method. When executing the computer program 803, the processor 801 implements the steps in each of the above embodiments of the event context generation method, for example, S101 to S104 shown in FIG. 1. Alternatively, when executing the computer program 803, the processor 801 implements the functions of the modules in the embodiment corresponding to FIG. 7, for example, the functions of modules 710 to 740 shown in FIG. 7. Specifically:

A terminal device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements:

In one embodiment, the first time information includes multiple time expression manners, and when executing the computer program, the processor further implements:

Query, according to the multiple time expression manners, the multiple pieces of first time information in the multiple event documents that conform to any of the time expression manners;

Input the multiple pieces of first time information and the corresponding event documents into the sequence labeling model, determine the event information matched with each piece of first time information, and obtain the multiple first time event pairs.

In one embodiment, when the processor executes the computer program, it further implements:

Obtain each piece of first time information in each event document, and determine the one or more first document positions of each piece of first time information in the corresponding event document;

Perform word segmentation on each event document to obtain multiple word segments in each event document;

Determine, in each event document, the multiple second document positions of the multiple word segments in the corresponding event document;

Determine, according to the one or more first document positions and the multiple second document positions, the target word segments matched with the first time information from the multiple word segments, and generate the event information corresponding to each piece of first time information in each event document.

Calculate, in each event document, the separation distance between the second document position of each word segment and the first document position;

Calculate, according to the separation distance, the classification probability that each word segment is paired with the first time information;

Determine, according to the classification probability corresponding to each word segment, the target word segments matched with the first time information from the multiple word segments, and generate the event information corresponding to the first time information in each event document.

In one embodiment, the multiple word segments include a first word segment and a second word segment; when executing the computer program, the processor further implements:

Extract the first feature of the first word segment in each event document, obtain the second word segment immediately preceding the first word segment, and determine the pairing category between the second word segment and the event information described in the corresponding event document;

Calculate, according to the first feature, a first probability that the first word segment belongs to the event information;

Calculate, according to the pairing category, a second probability that the first word segment belongs to the event information;

Calculate the classification probability that the first word segment is paired with the first time information according to the separation distance between the first word segment and the first time information, the first probability, and the second probability.

In one embodiment, each piece of second time information corresponds to at least one piece of event information; when executing the computer program, the processor further implements:

Among the multiple pieces of second time information, if any piece of second time information corresponds to multiple pieces of event information, obtain the information source of each of those pieces of event information;

According to the priorities of the information sources, obtain the target event information with the highest priority from the multiple pieces of event information;

Among the multiple pieces of second time information, if no piece of second time information corresponds to multiple pieces of event information, determine the event information corresponding to each piece of second time information as the target event information.

Upload the event context to the blockchain.

A computer-readable storage medium stores a computer program which, when executed by a processor, implements:

In one embodiment, the first time information includes multiple time expression manners, and when executed by the processor, the computer program further implements:

In one embodiment, when executed by the processor, the computer program further implements:

In one embodiment, the multiple word segments include a first word segment and a second word segment; when executed by the processor, the computer program further implements:

In one embodiment, each piece of second time information corresponds to at least one piece of event information; when executed by the processor, the computer program further implements:

Upload the event context to the blockchain.

Exemplarily, the computer program 803 may be divided into one or more units, which are stored in the memory 802 and executed by the processor 801 to implement the present application. The one or more units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution process of the computer program 803 in the terminal device 800. For example, the computer program 803 can be divided into an acquiring module, a processing module, a determining module, and a generating module, whose specific functions are as described above.

The terminal device may include, but is not limited to, the processor 801 and the memory 802. Those skilled in the art will understand that FIG. 8 is only an example of the terminal device 800 and does not limit it. The terminal device may include more or fewer components than shown, combine some components, or use different components. For example, it may also include input/output devices, network access devices, a bus, and the like.

The processor 801 may be a central processing unit. It may also be another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 802 may be an internal storage unit of the terminal device 800, such as its hard disk or internal memory. The memory 802 may also be an external storage device of the terminal device 800, such as a plug-in hard disk, a smart memory card, or a flash memory card equipped on the terminal device 800. Further, the memory 802 may include both an internal storage unit and an external storage device of the terminal device 800.

The computer-readable storage medium may be an internal storage unit of the terminal device described in the foregoing embodiments, such as a hard disk or a memory of the terminal device. The computer-readable storage medium may be non-volatile or volatile. The computer-readable storage medium may also be an external storage device of the terminal device, for example, a pluggable hard disk, a smart memory card, a secure digital card, a flash memory card, etc. equipped on the terminal device.

The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand the following. The technical solutions described in the above embodiments can still be modified, or some of their technical features can be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and should all be included within the scope of protection of the present application.

Claims

A method for generating an event context, wherein the method comprises:

respectively acquiring first time information and event information from multiple event documents to obtain multiple first time event pairs corresponding to the multiple event documents;

unifying the time expressions of the multiple pieces of first time information in the multiple first time event pairs to obtain multiple pieces of unified second time information, and correspondingly replacing the first time information of the multiple first time event pairs with the unified second time information to obtain multiple second time event pairs;

determining, from the multiple second time event pairs, target event information corresponding to the multiple pieces of second time information;

sorting the target event information according to the second time information corresponding to the target event information, to generate an event context.
The method for generating an event context according to claim 1, wherein the first time information includes multiple time expression manners, and the respectively acquiring first time information and event information from multiple event documents to obtain multiple first time event pairs corresponding to the multiple event documents comprises:

querying, according to the multiple time expression manners, the multiple pieces of first time information in the multiple event documents that conform to any of the time expression manners;

inputting the multiple pieces of first time information and the corresponding event documents into the sequence labeling model, determining the event information matched with each piece of first time information, and obtaining the multiple first time event pairs.
The event context generation method according to claim 1 or 2, wherein the acquiring first time information and event information from each of a plurality of event documents comprises:

acquiring each piece of first time information in each event document, and determining one or more first document positions at which each piece of first time information occurs in the corresponding event document;

performing word segmentation on each event document to obtain a plurality of word segments in each event document;

determining, for each event document, a plurality of second document positions at which the plurality of word segments respectively occur in that event document; and

determining, according to the one or more first document positions and the plurality of second document positions, a target word segment matched with the first time information from the plurality of word segments, and generating, from the target word segment, event information corresponding to each piece of first time information in each event document.
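The position-based matching can be sketched under three assumptions:

- Document positions are character offsets.
- A regex tokenizer stands in for a real word segmenter. Chinese text would need a dedicated segmenter.
- A hypothetical set of event trigger words serves as the candidate segments.

The sketch simply selects the nearest trigger, which is the simplest reading of this claim. The distance-based classification probability of claim 4 refines this choice.

```python
import re
from typing import Iterable, List, Optional, Tuple


def first_document_positions(doc: str, time_expr: str) -> List[int]:
    """Character offsets at which the time expression occurs (one or more)."""
    return [m.start() for m in re.finditer(re.escape(time_expr), doc)]


def segment_with_positions(doc: str) -> List[Tuple[str, int]]:
    """Word segments with their start offsets (the second document positions).

    A regex tokenizer stands in for a real word segmenter here.
    """
    return [(m.group(0), m.start()) for m in re.finditer(r"\w+", doc)]


def target_segment(doc: str, time_expr: str, triggers: Iterable[str]) -> Optional[str]:
    """Return the candidate trigger segment closest to any occurrence of time_expr."""
    trigger_set = set(triggers)  # hypothetical event-trigger vocabulary
    time_pos = first_document_positions(doc, time_expr)
    candidates = [(w, p) for w, p in segment_with_positions(doc) if w in trigger_set]
    if not time_pos or not candidates:
        return None
    # Distance is measured start-to-start, against the nearest occurrence of the time.
    word, _ = min(candidates, key=lambda c: min(abs(c[1] - t) for t in time_pos))
    return word
```

In "Filed on 2020-11-06, the application was published on 2022-05-12.", the segment "Filed" is paired with 2020-11-06 and "published" with 2022-05-12.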
The event context generation method according to claim 3, wherein the determining, according to the one or more first document positions and the plurality of second document positions, a target word segment matched with the first time information from the plurality of word segments, and generating event information corresponding to each piece of first time information in each event document, comprises:

calculating, in each event document, the separation distance between the second document position of each word segment and the first document position;

calculating, according to the separation distance, a classification probability that each word segment is paired with the first time information; and

determining, according to the classification probability of each word segment, a target word segment matched with the first time information from the plurality of word segments, and generating, from the target word segment, event information corresponding to the first time information in each event document.
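The claim does not state how the separation distance is turned into a classification probability. One simple assumption is a softmax over negative distances with a hypothetical temperature `tau`, after which the most probable segment is taken as the target. The sketch below implements that assumption.

```python
import math
from typing import Dict, Tuple


def pairing_probabilities(distances: Dict[str, int], tau: float = 5.0) -> Dict[str, float]:
    """Softmax over -distance / tau: nearer segments get a higher pairing probability.

    Shifting by the minimum distance keeps math.exp from underflowing to 0 for every entry.
    """
    nearest = min(distances.values())
    weights = {w: math.exp(-(d - nearest) / tau) for w, d in distances.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def pick_target(distances: Dict[str, int], tau: float = 5.0) -> Tuple[str, Dict[str, float]]:
    """Return the most probable segment and the full probability table."""
    probs = pairing_probabilities(distances, tau)
    return max(probs, key=probs.get), probs
```

A smaller `tau` makes the choice sharper, and a larger one spreads probability over more distant segments. Unlike a bare nearest-neighbour rule, the probabilities can also be thresholded or combined with other evidence.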
The event context generation method according to claim 4, wherein the plurality of word segments comprise a first word segment and a second word segment;

the calculating, according to the separation distance, a classification probability that each word segment is paired with the first time information comprises:

extracting a first feature of the first word segment in each event document, acquiring the second word segment that immediately precedes the first word segment, and determining a pairing category between the second word segment and the event information in the corresponding event document;

calculating, according to the first feature, a first probability that the first word segment belongs to the event information;

calculating, according to the pairing category, a second probability that the first word segment belongs to the event information; and

calculating, according to the separation distance between the first word segment and the first time information, the first probability, and the second probability, the classification probability that the first word segment is paired with the first time information.
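The claim leaves the combination rule open. The sketch below assumes a logistic combination:

- The first probability comes from a logistic score over the first segment's features.
- The second probability comes from a hypothetical table indexed by the preceding segment's pairing category.
- The two probabilities are added in log-odds space, and a linear penalty is applied for distance.

All weights, category names and table values are illustrative assumptions.

```python
import math
from typing import Dict


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def _logit(p: float, eps: float = 1e-6) -> float:
    """Inverse of the logistic function, with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


# Hypothetical table: probability that a segment is event information, given
# the pairing category of the segment immediately before it.
PREV_CATEGORY_PROB: Dict[str, float] = {"EVENT": 0.7, "TIME": 0.6, "OTHER": 0.2}


def first_probability(features: Dict[str, float], weights: Dict[str, float],
                      bias: float = 0.0) -> float:
    """First probability: logistic score of the first segment's own features."""
    return _sigmoid(bias + sum(weights.get(name, 0.0) * value
                               for name, value in features.items()))


def second_probability(prev_category: str) -> float:
    """Second probability: looked up from the preceding segment's pairing category."""
    return PREV_CATEGORY_PROB.get(prev_category, PREV_CATEGORY_PROB["OTHER"])


def classification_probability(distance: int, p1: float, p2: float,
                               w1: float = 1.0, w2: float = 1.0, w3: float = 0.1,
                               bias: float = 0.0) -> float:
    """Combine both probabilities in log-odds space and penalise distance linearly."""
    return _sigmoid(bias + w1 * _logit(p1) + w2 * _logit(p2) - w3 * distance)
```

With both probabilities at 0.5 and zero distance, the score is exactly 0.5. Because `w3` is positive, increasing the distance lowers the score monotonically.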
The event context generation method according to any one of claims 1-2 or 4-5, wherein each piece of second time information corresponds to at least one piece of event information;

the determining, from the plurality of second time-event pairs, target event information corresponding to the plurality of pieces of second time information comprises:

if any piece of second time information among the plurality of pieces of second time information corresponds to a plurality of pieces of event information, acquiring the information source of each of those pieces of event information;

acquiring, according to the priorities of the information sources, the event information with the highest priority from the plurality of pieces of event information as the target event information; and

if no piece of second time information among the plurality of pieces of second time information corresponds to a plurality of pieces of event information, determining the event information corresponding to each piece of second time information as the target event information.
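Source-priority selection can be sketched as two steps: group the pairs by unified time, then take the candidate whose source has the best rank. The source names and ranks in `SOURCE_PRIORITY` are hypothetical. Sources missing from the table are ranked below all known ones.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical source ranking: a smaller number means a higher priority.
SOURCE_PRIORITY: Dict[str, int] = {"official_announcement": 0, "news_agency": 1, "forum": 2}


def select_targets(pairs: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """Map each unified time to its target event.

    pairs holds (second_time, event, source) triples. A time with a single
    event keeps it; a time with several events keeps the one whose source
    has the highest priority (ties go to the earliest triple).
    """
    by_time: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for time_value, event, source in pairs:
        by_time[time_value].append((event, source))
    unknown_rank = len(SOURCE_PRIORITY)  # unknown sources rank below all known ones
    targets: Dict[str, str] = {}
    for time_value, candidates in by_time.items():
        if len(candidates) == 1:
            targets[time_value] = candidates[0][0]
        else:
            best_event, _ = min(candidates,
                                key=lambda c: SOURCE_PRIORITY.get(c[1], unknown_rank))
            targets[time_value] = best_event
    return targets
```

`sorted(select_targets(pairs).items())` then gives the ordered timeline for the sorting step, provided the times use a sortable unified format such as ISO 8601 dates.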
The event context generation method according to any one of claims 1-2 or 4-5, further comprising, after the sorting the target event information according to the second time information corresponding to the target event information to generate the event context:

uploading the event context to a blockchain.
An event context generation apparatus, comprising:

an acquisition module, configured to acquire first time information and event information from each of a plurality of event documents, to obtain a plurality of first time-event pairs corresponding to the plurality of event documents;

a processing module, configured to unify the time expression format of the plurality of pieces of first time information in the plurality of first time-event pairs to obtain a plurality of pieces of unified second time information, and to replace the first time information in each of the first time-event pairs with the corresponding second time information, to obtain a plurality of second time-event pairs;

a determining module, configured to determine, from the plurality of second time-event pairs, target event information corresponding to the plurality of pieces of second time information; and

a generating module, configured to sort the target event information according to the second time information corresponding to the target event information, to generate an event context.
A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements:

acquiring first time information and event information from each of a plurality of event documents, to obtain a plurality of first time-event pairs corresponding to the plurality of event documents;

unifying the time expression format of the plurality of pieces of first time information in the plurality of first time-event pairs to obtain a plurality of pieces of unified second time information, and replacing the first time information in each of the first time-event pairs with the corresponding second time information, to obtain a plurality of second time-event pairs;

determining, from the plurality of second time-event pairs, target event information corresponding to the plurality of pieces of second time information; and

sorting the target event information according to the second time information corresponding to the target event information, to generate an event context.
The terminal device according to claim 9, wherein the first time information is expressed in a plurality of time expression formats, and the processor, when executing the computer program, further implements:

querying the plurality of event documents, according to the plurality of time expression formats, for the pieces of first time information that conform to any one of the time expression formats; and

inputting each piece of first time information, together with its corresponding event document, into a sequence labeling model, determining the event information matched with each piece of first time information, and obtaining the plurality of first time-event pairs.
The terminal device according to claim 9 or 10, wherein the processor, when executing the computer program, further implements:

acquiring each piece of first time information in each event document, and determining one or more first document positions at which each piece of first time information occurs in the corresponding event document;

performing word segmentation on each event document to obtain a plurality of word segments in each event document;

determining, for each event document, a plurality of second document positions at which the plurality of word segments respectively occur in that event document; and

determining, according to the one or more first document positions and the plurality of second document positions, a target word segment matched with the first time information from the plurality of word segments, and generating, from the target word segment, event information corresponding to each piece of first time information in each event document.
The terminal device according to claim 11, wherein the processor, when executing the computer program, further implements:

calculating, in each event document, the separation distance between the second document position of each word segment and the first document position;

calculating, according to the separation distance, a classification probability that each word segment is paired with the first time information; and

determining, according to the classification probability of each word segment, a target word segment matched with the first time information from the plurality of word segments, and generating, from the target word segment, event information corresponding to the first time information in each event document.
The terminal device according to claim 12, wherein the plurality of word segments comprise a first word segment and a second word segment, and the processor, when executing the computer program, further implements:

extracting a first feature of the first word segment in each event document, acquiring the second word segment that immediately precedes the first word segment, and determining a pairing category between the second word segment and the event information in the corresponding event document;

calculating, according to the first feature, a first probability that the first word segment belongs to the event information;

calculating, according to the pairing category, a second probability that the first word segment belongs to the event information; and

calculating, according to the separation distance between the first word segment and the first time information, the first probability, and the second probability, the classification probability that the first word segment is paired with the first time information.
The terminal device according to any one of claims 9-10 or 11-12, wherein each piece of second time information corresponds to at least one piece of event information, the plurality of word segments comprise a first word segment and a second word segment, and the processor, when executing the computer program, further implements:

if any piece of second time information among the plurality of pieces of second time information corresponds to a plurality of pieces of event information, acquiring the information source of each of those pieces of event information;

acquiring, according to the priorities of the information sources, the event information with the highest priority from the plurality of pieces of event information as the target event information; and

if no piece of second time information among the plurality of pieces of second time information corresponds to a plurality of pieces of event information, determining the event information corresponding to each piece of second time information as the target event information.
A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements:

acquiring first time information and event information from each of a plurality of event documents, to obtain a plurality of first time-event pairs corresponding to the plurality of event documents;

unifying the time expression format of the plurality of pieces of first time information in the plurality of first time-event pairs to obtain a plurality of pieces of unified second time information, and replacing the first time information in each of the first time-event pairs with the corresponding second time information, to obtain a plurality of second time-event pairs;

determining, from the plurality of second time-event pairs, target event information corresponding to the plurality of pieces of second time information; and

sorting the target event information according to the second time information corresponding to the target event information, to generate an event context.
The computer-readable storage medium according to claim 15, wherein the first time information is expressed in a plurality of time expression formats, and the computer program, when executed by the processor, further implements:

querying the plurality of event documents, according to the plurality of time expression formats, for the pieces of first time information that conform to any one of the time expression formats; and

inputting each piece of first time information, together with its corresponding event document, into a sequence labeling model, determining the event information matched with each piece of first time information, and obtaining the plurality of first time-event pairs.
The computer-readable storage medium according to claim 15 or 16, wherein the computer program, when executed by the processor, further implements:

acquiring each piece of first time information in each event document, and determining one or more first document positions at which each piece of first time information occurs in the corresponding event document;

performing word segmentation on each event document to obtain a plurality of word segments in each event document;

determining, for each event document, a plurality of second document positions at which the plurality of word segments respectively occur in that event document; and

determining, according to the one or more first document positions and the plurality of second document positions, a target word segment matched with the first time information from the plurality of word segments, and generating, from the target word segment, event information corresponding to each piece of first time information in each event document.
The computer-readable storage medium according to claim 17, wherein the computer program, when executed by the processor, further implements:

calculating, in each event document, the separation distance between the second document position of each word segment and the first document position;

calculating, according to the separation distance, a classification probability that each word segment is paired with the first time information; and

determining, according to the classification probability of each word segment, a target word segment matched with the first time information from the plurality of word segments, and generating, from the target word segment, event information corresponding to the first time information in each event document.
The computer-readable storage medium according to claim 18, wherein the plurality of word segments comprise a first word segment and a second word segment, and the computer program, when executed by the processor, further implements:

extracting a first feature of the first word segment in each event document, acquiring the second word segment that immediately precedes the first word segment, and determining a pairing category between the second word segment and the event information in the corresponding event document;

calculating, according to the first feature, a first probability that the first word segment belongs to the event information;

calculating, according to the pairing category, a second probability that the first word segment belongs to the event information; and

calculating, according to the separation distance between the first word segment and the first time information, the first probability, and the second probability, the classification probability that the first word segment is paired with the first time information.
The computer-readable storage medium according to any one of claims 15-16 or 18-19, wherein each piece of second time information corresponds to at least one piece of event information, and the computer program, when executed by the processor, further implements:

if any piece of second time information among the plurality of pieces of second time information corresponds to a plurality of pieces of event information, acquiring the information source of each of those pieces of event information;

acquiring, according to the priorities of the information sources, the event information with the highest priority from the plurality of pieces of event information as the target event information; and

if no piece of second time information among the plurality of pieces of second time information corresponds to a plurality of pieces of event information, determining the event information corresponding to each piece of second time information as the target event information.