WO2022095375A1 - Event context generation method and apparatus, and terminal device and storage medium - Google Patents

Info

Publication number
WO2022095375A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
information
time
time information
document
Application number
PCT/CN2021/091095
Other languages
French (fr)
Chinese (zh)
Inventor
殷子墨
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2022095375A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Definitions

  • The present application belongs to the technical field of intelligent decision-making and, in particular, relates to a method, apparatus, terminal device, and storage medium for generating an event context.
  • Event context is a form of presentation of long-term news events. Such events usually continue to change or cause social influence over a long period of time, and chain reactions or related events continue to appear. For such events, the complete event is often described through the display of time nodes and key event content, which is helpful for users to quickly grasp the full picture of the event.
  • Typically, the terminal device organizes the events included in the news according to the time of news release. However, when a news article covers event information under multiple time nodes, all of that event information is treated as events that occurred under one time node (the news release time), so a clear event context cannot be generated.
  • One of the purposes of the embodiments of the present application is to provide an event context generation method, apparatus, terminal device, and storage medium that address the following problem: when a piece of news covers event information under multiple time nodes, the event information under those time nodes is regarded as events that occurred under a single time node, so that a clear event context cannot be generated.
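  • A first aspect of the embodiments of the present application provides a method for generating an event context, the method comprising: obtaining first time information and event information in multiple event documents to obtain multiple first time event pairs; unifying the time expressions of the first time information to obtain second time information and the corresponding second time event pairs; determining, from the second time event pairs, the target event information corresponding to the second time information; and sorting the target event information according to its corresponding second time information to generate an event context.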
  • a second aspect of the embodiments of the present application provides an event context generation device, the device comprising:
  • an obtaining module configured to obtain first time information and event information in multiple event documents respectively, and obtain multiple first time event pairs corresponding to the multiple event documents;
  • a processing module configured to unify the time expressions of the multiple pieces of first time information in the multiple first time event pairs to obtain multiple pieces of unified second time information, and to replace the first time information of the multiple first time event pairs with the corresponding second time information to obtain multiple second time event pairs;
  • a determining module configured to determine target event information corresponding to the multiple pieces of second time information from the multiple second time event pairs; and
  • a generating module configured to sort the target event information according to the second time information corresponding to the target event information to generate an event context.
  • A third aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the above method, in which the target event information is sorted to generate an event context.
  • A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above method, in which the target event information is sorted to generate an event context.
  • A fifth aspect of the embodiments of the present application further provides a computer program product that, when run on a terminal device, enables the terminal device to implement the above method, in which the target event information is sorted to generate an event context.
  • The embodiments of the present application include the following advantages:
  • The target event information corresponding to each piece of second time information can be determined, according to the second time information, from the event information of multiple identical time nodes.
  • The target event information can be sorted according to the second time information to generate an event context in which each time node corresponds to one piece of target event information, which reduces the occurrence of repeated event information in the event context.
  • FIG. 1 is a flowchart of an implementation of a method for generating an event context provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an implementation manner of S101 of an event context generation method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of another implementation manner of S101 of an event context generation method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an implementation manner of S304 of an event context generation method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an implementation manner of S402 of an event context generation method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an implementation manner of S103 of an event context generation method provided by an embodiment of the present application.
  • FIG. 7 is a structural block diagram of an event context generating apparatus provided by an embodiment of the present application.
  • FIG. 8 is a structural block diagram of a terminal device provided by an embodiment of the present application.
  • The event context generation method provided by the embodiments of the present application can be applied to terminal devices such as tablet computers, notebook computers, ultra-mobile personal computers (UMPCs), and netbooks. The embodiments of the present application place no restriction on the specific type of terminal device.
  • FIG. 1 shows an implementation flowchart of a method for generating an event context provided by an embodiment of the present application. The method includes the following steps:
  • the above-mentioned event documents include but are not limited to documents containing text information, such as news and books.
  • The event documents may be acquired by having the terminal device crawl, in real time, news or microblogs containing the keywords of the event from web pages. Alternatively, the terminal device may obtain multiple event documents pre-stored by the user from a specified path.
  • the above-mentioned event information may be the summary information of the core event in the event document, and the event information may be determined from the event document by using event keywords or a method based on a core event selection model.
  • the above-mentioned core event (event information) contained in an event document may be a sentence or a paragraph.
  • The event document can be split into clauses, the number of preset event keywords contained in each clause can be counted, and the first sentence weight corresponding to each clause can be calculated from that keyword count.
  • A second sentence weight is also assigned to each clause. For example, for the title and body content, the second sentence weight of the title clause may be set higher than the second sentence weight of the remaining clauses in the body content.
  • Based on these weights, the target sentence is selected from the multiple clauses as the core event.
  • The preset event keywords may be phrases extracted from the event document: the word frequency of each phrase in the event document is counted, and phrases whose word frequency reaches a threshold are determined as event keywords. Such phrases can express the content of the event document itself and are concise and general.
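  • The keyword-based selection of a core event can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `select_core_event`, the default weights, and the choice to multiply the keyword count by the position weight are all assumptions.

```python
import re


def select_core_event(title, body, keywords, title_weight=2.0, body_weight=1.0):
    """Return the clause with the highest combined weight (hypothetical helper)."""
    # Split the body into clauses on common sentence-ending punctuation.
    clauses = [c.strip() for c in re.split(r"[.!?。！？]", body) if c.strip()]
    # Each candidate carries its second (position-based) sentence weight.
    candidates = [(title, title_weight)] + [(c, body_weight) for c in clauses]

    def score(item):
        clause, position_weight = item
        # First sentence weight: number of keyword occurrences in the clause.
        keyword_weight = sum(clause.lower().count(k.lower()) for k in keywords)
        # Assumed combination rule: product of the two weights.
        return keyword_weight * position_weight

    return max(candidates, key=score)[0]
```

  • With equal keyword counts, the title wins because of its higher position weight; a clause in the body wins only when it contains more keywords.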
  • The above-mentioned first time information is the time point at which the event information occurred; the first time information together with the event information that occurred at that time node forms a first time event pair.
  • The time information in the event document may be time information in the document that specifically describes the event, or the time at which the terminal device obtained the event document may be used as the time information.
  • The release time of the news can also be used as the time information in the event document.
  • The time information in the event document that specifically describes the event information may be prioritized as the first time information corresponding to the event information. If no time information specifically describing the event information is found in the event document, the release time of the event document may be used as the first time information. Otherwise, the time point at which the terminal device acquired the event document is used as the first time information of the event information.
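  • The priority order described above can be sketched as a simple fallback chain. The function name `choose_first_time` and its parameters are hypothetical, and `None` is assumed to mean that the corresponding time was not found.

```python
from datetime import datetime
from typing import Optional


def choose_first_time(
    described: Optional[datetime],
    released: Optional[datetime],
    acquired: datetime,
) -> datetime:
    """Pick the first time information by priority (hypothetical helper)."""
    # 1. Prefer time information that specifically describes the event.
    if described is not None:
        return described
    # 2. Fall back to the release time of the event document.
    if released is not None:
        return released
    # 3. Otherwise use the time at which the terminal device acquired the document.
    return acquired
```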
  • Multiple pieces of event information can be obtained from an event document, and each piece of event information can be paired with its first time information to obtain multiple first time event pairs for that event document. In this way, for multiple event documents, the multiple time event pairs in each event document can be obtained.
  • Normalizing the first time information can be understood as unifying the dimension of each piece of first time information. Time can be expressed in various ways, and if the dimension of the first time information is not unified, it is difficult to determine the order of the time points that the pieces of first time information express. Therefore, to make the order of the time information easy to compare, the time information may be normalized. For example, two expressions that write the same date in different forms can both be normalized to "July 10". The time information obtained after unifying the time expressions of the first time information is the second time information. By replacing the first time information in each first time event pair with the corresponding second time information, multiple second time event pairs are obtained. Note that the time information may be specific to any granularity such as hour, minute, or second, and the first time information may be normalized according to the specific circumstances, which is not limited here.
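  • A minimal sketch of this normalization step follows. The list of accepted formats in `KNOWN_FORMATS`, the ISO date as the unified form, and the function names are assumptions for illustration; a real system would need a much richer set of time expressions.

```python
from datetime import datetime

# Hypothetical set of time expressions to unify; extend as needed.
KNOWN_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y年%m月%d日")


def normalize_time(first_time: str) -> str:
    """Convert one first time expression into a unified ISO date string."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(first_time.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # Try the next known expression.
    raise ValueError(f"unrecognized time expression: {first_time!r}")


def to_second_pairs(first_pairs):
    """Replace the first time information of each pair with second time information."""
    return [(normalize_time(t), event) for t, event in first_pairs]
```

  • Because every pair then carries the same date format, second time information from different documents can be compared directly.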
  • Among the multiple second time event pairs obtained from multiple event documents, there may be multiple pieces of event information at the same time point (second time information). For example, for multiple pieces of news that report on event information B at time point A in the morning, the terminal device may acquire a second time event pair AB from each piece of news. These second time event pairs all report the same event information at the same time point, so one of them must be selected to supply the representative target event information.
  • For example, when the event document is news, the authority of the news source site, the time at which the site published the news (which may be inconsistent with the time information corresponding to the event information), the number of times the news was reprinted, and so on can be used as references.
  • Determining the target event information corresponding to the multiple pieces of second time information can be understood as selecting, from multiple pieces of event information that share the same second time information (that is, multiple pieces of the same event information corresponding to the same time point), one piece of event information as the target event information of that second time information.
  • The event information corresponding to the remaining second time information at the same time point can be ignored.
  • If only one second time event pair exists at a time point, the event information in that second time event pair is the target event information corresponding to the second time information.
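  • The selection of one representative per time node can be sketched as below. The dictionary keys `authority` and `reprints`, and the rule of ranking by authority first and reprint count second, are assumptions; the text only names these quantities as possible references.

```python
from collections import defaultdict


def select_target_events(second_pairs):
    """Keep one event per second time information (hypothetical helper).

    second_pairs: iterable of dicts with keys 'time', 'event',
    'authority', and 'reprints'.
    """
    by_time = defaultdict(list)
    for pair in second_pairs:
        by_time[pair["time"]].append(pair)
    # For each time node, keep the report with the best (authority, reprints).
    return {
        time: max(group, key=lambda p: (p["authority"], p["reprints"]))["event"]
        for time, group in by_time.items()
    }
```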
  • the above-mentioned event context is a display form for long-term developing news events.
  • the target event information is put into the time line to generate an event context.
  • the user can observe the development of the event according to the event context. For example, for news with obvious event development, an area with relatively dense news can be seen on the event context, and this area can be considered as the main stage of event development.
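  • Placing the target event information on a timeline can be sketched as a sort on the second time information. The function name `build_event_context` is hypothetical, and the sketch assumes the second time information is an ISO date string, for which lexicographic order equals chronological order.

```python
def build_event_context(targets):
    """Order target event information by its second time information.

    targets: dict mapping an ISO date string to its target event information.
    Returns a list of (time, event) tuples in chronological order.
    """
    return sorted(targets.items())
```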
  • The solution addresses the problem that all the event information in one event document is treated as event information that occurred at a single time node.
  • After the second time information with a unified time dimension is obtained, the target event information corresponding to each piece of second time information can be determined, according to the second time information, from the event information of multiple identical time nodes.
  • The target event information is then sorted to generate an event context in which each time node corresponds to one piece of target event information, which reduces the occurrence of repeated event information in the event context.
  • The first time information includes multiple time expressions. In this case, S101 (obtaining first time information and event information in multiple event documents respectively, and obtaining the multiple first time event pairs corresponding to the multiple event documents) specifically includes the following sub-steps S201-S202, which are described in detail as follows:
  • The above-mentioned time expressions include, but are not limited to, time nodes expressed in Chinese characters and time nodes expressed in Roman numerals.
  • The terminal device can translate the event document into the specified language (Chinese) and then query the first time information in each event document according to the time expressions.
  • multiple time expressions may be established in advance to query the first time information in the event document.
  • For example, a time expression of the form "dd-mm-yy" can be established, where dd represents the hour; mm represents the minute, with a value between 0 and 59; and yy represents the second, with a value between 0 and 59.
  • The text in the event document is read and compared in sequence, and the first time information that conforms to the time expression is screened out.
  • The above format is only an example of a time expression; it can be set according to the actual situation and is not limited here.
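  • One way to screen text for the example "dd-mm-yy" expression is a regular expression, sketched below. The minute and second ranges follow the text; the hour range of 0 to 23 is an assumption, since the text gives no rule for it, and the names `TIME_PATTERN` and `find_time_expressions` are hypothetical.

```python
import re

# Hour (assumed 0-23), minute (0-59), second (0-59), separated by hyphens.
# The lookarounds stop the pattern from matching inside longer digit runs.
TIME_PATTERN = re.compile(r"(?<!\d)([01]?\d|2[0-3])-([0-5]?\d)-([0-5]?\d)(?!\d)")


def find_time_expressions(text):
    """Return every substring of text that conforms to the time expression."""
    return [m.group(0) for m in TIME_PATTERN.finditer(text)]
```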
  • The above sequence labeling model can be a long short-term memory network (LSTM), a conditional random field (CRF) model, or a sequence labeling model formed by combining the two.
  • The sequence labeling model outputs an accuracy score (probability) for the pairing of the event information with the first time information, based on the event feature of the currently input event information and the time feature of the first time information.
  • The sequence labeling model can be trained on existing training data (multiple pieces of event information and multiple pieces of time information in event documents) and the classification results of the training data (the time information corresponding to each piece of event information) to obtain a trained model.
  • The trained model determines the first time information in the event document and extracts the document position of the first time information as its time feature. According to the time feature, it outputs the probability that each piece of event information in the event document is accurately paired with the first time information. The event information and the corresponding first time information are determined according to this probability, and a first time event pair is generated. In this way, one or more first time event pairs are obtained for each event document.
  • By setting a variety of time expressions, the terminal device can accurately query multiple pieces of time information in each event document and determine the paired event information for each piece of first time information according to the sequence labeling model. This addresses the problem that, when an event document covers event information occurring at multiple first time nodes, the terminal device generates only one first time event pair for that event document.
  • S101 (obtaining first time information and event information in a plurality of event documents respectively, and obtaining a plurality of first time event pairs corresponding to the plurality of event documents) specifically includes the following sub-steps S301-S304, which are described in detail as follows:
  • The first document position may be determined according to the position of the first time information in the document.
  • Specifically, the first document position of the first time information in the event document may be determined by performing word segmentation on the event document to obtain multiple word segments, determining the position of the first time information in the sequence of word segments, and using that position as the first document position. Note that once the first time information has been determined, it can be treated directly as one word segment, so only the content of the event document that does not belong to the first time information needs to be segmented.
  • Word segmentation of each event document can be performed using a method based on string matching, a method based on understanding, or a method based on statistics.
  • Taking the word segmentation method based on string matching as an example, the news to be segmented can be split and matched against the entries in a preset machine dictionary. If a string is found in the dictionary, the string is successfully matched (that is, a word segment is recognized).
  • First, a sentence in the news is matched against the entries. If the match fails, the first word (or the last word) of the sentence is deleted to form a new string to be matched against the entries, and this continues until a match succeeds.
  • The remaining string of the sentence is then used as a new string to match against the entries, and the above operations are repeated to obtain the multiple word segments of the sentence.
  • Word segmentation is performed in the same way on the remaining sentences in the news to obtain the multiple word segments of the news.
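  • The string-matching procedure above corresponds to what is commonly called forward maximum matching, sketched below for the variant that deletes the last character on a failed match. The function name `forward_max_match` and the rule of accepting a single unmatched character as its own segment are assumptions.

```python
def forward_max_match(sentence, dictionary):
    """Segment a sentence by greedy longest-prefix dictionary matching."""
    tokens = []
    start = 0
    while start < len(sentence):
        end = len(sentence)
        # Drop the last character until the candidate is a dictionary entry;
        # a single character is always accepted so the loop makes progress.
        while end - start > 1 and sentence[start:end] not in dictionary:
            end -= 1
        tokens.append(sentence[start:end])
        # Continue with the remaining string of the sentence.
        start = end
    return tokens
```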
  • The second document position of each word segment can be determined according to the order of the word segments in the news.
  • The first document position of the first time information is determined according to the position of the first time information in the news.
  • The separation distance between each first document position and each second document position in the news can then be calculated, and the target word segments paired with each piece of first time information can be determined according to the separation distance.
  • event information is usually reported in conjunction with time information.
  • News usually includes time information, location information, event information, and other content. To report events accurately, news generally describes event information in the format: at time xx (first time information), at place xx (location information), event xx occurred (event information). That is, each piece of first time information and its corresponding event information can be considered to be located close to each other in the news. Therefore, the separation distance can be calculated from the first document position of each piece of first time information and the second document position of each word segment. When the separation distance is smaller than a preset threshold, the word segment at that second document position is determined to be a target word segment.
  • the event information corresponding to the first time information can be generated by combining multiple target word segments according to the position sequence. In this way, each first time information and corresponding event information in each event document can be generated.
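  • The distance-based pairing can be sketched as follows, assuming document positions are indices into the list of word segments and the separation distance is the absolute difference of two indices. The function name `event_info_for_time` and the space-joined output are illustrative choices.

```python
def event_info_for_time(tokens, time_index, max_distance):
    """Build event information from word segments near the first time information.

    tokens: word segments in document order (the time expression is one segment).
    time_index: first document position of the first time information.
    max_distance: preset threshold on the separation distance.
    """
    selected = [
        tok
        for pos, tok in enumerate(tokens)
        # Keep segments whose separation distance is below the threshold.
        if pos != time_index and abs(pos - time_index) < max_distance
    ]
    # enumerate() preserves document order, so the segments join in position order.
    return " ".join(selected)
```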
  • S304, generating the event information corresponding to the first time information in each event document, specifically includes the following sub-steps S401-S403, which are described in detail as follows:
  • The word segment and the interval distance can be input into the neural network structure in the sequence labeling model, which performs feature processing on them to obtain a processed feature vector.
  • Based on the processed feature vector, the sequence labeling model can output the classification probability of the current word segment being paired with the corresponding first time information.
  • According to the classification probability of each word segment, the target word segments matched with the first time information are determined from the plurality of word segments, and the event information corresponding to the first time information in each event document is generated.
  • each event document has a plurality of the above-mentioned segmented words, therefore, the classification probability of the multiple word segmentations and the first time information can be obtained.
  • event information can be viewed as a sentence or paragraph composed of multiple word segments. Therefore, a plurality of classification probabilities can be sorted in descending order, and the segmented words corresponding to the top N classification probabilities can be used as target segmented words, and event information can be generated according to the order of the second document position of each target segmented word.
  • the classification probability of the event information paired with the first time information is higher than the classification probability of the event information composed of other word segmentations paired with the same first time information. Therefore, even when there are multiple pieces of first time information in the event document, event information paired with each first time information can be accurately generated.
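  • The top-N selection described above can be sketched as follows. The function name `top_n_event_info` is hypothetical, and the probabilities are assumed to have been produced already by the sequence labeling model.

```python
def top_n_event_info(tokens, probabilities, n):
    """Pick the N most probable word segments and restore document order.

    tokens: word segments in document order.
    probabilities: classification probability of each segment being paired
    with the first time information (same length as tokens).
    """
    # Indices of the N highest classification probabilities.
    ranked = sorted(range(len(tokens)), key=lambda i: probabilities[i], reverse=True)[:n]
    # Re-sort by second document position to form readable event information.
    return [tokens[i] for i in sorted(ranked)]
```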
  • The plurality of word segments include a first word segment and a second word segment. S402, calculating according to the interval distance the classification probability of each word segment being paired with the first time information, specifically includes the following sub-steps S501-S504, which are described in detail as follows:
  • the above-mentioned first feature may be a feature obtained by a feature extraction network in a sequence tagging model, which performs feature extraction on word segmentation.
  • the above-mentioned event document has multiple word segments, the word segment currently being processed to determine the classification probability may be used as the first word segment, and the previous word segment adjacent to the first word segment may be used as the second word segment.
  • the pairing type between the second word segment and the event information is the pairing type of the second word segment and the event information that is determined when the terminal device processes the second word segment before.
  • the pairing category includes that the second participle can be used to generate event information, that is, the second participle is paired with the event information, or the second participle cannot be used to generate event information, that is, the second participle is not paired with the event information.
  • whether the first participle can be used as a participle in the event information is generally determined by the first participle itself.
  • A word segment can consist of a single word or multiple words. A word vector library can be constructed in advance, with a corresponding serial number assigned to each word in the library. The serial numbers in the word vector library of the words contained in the first participle are then used as the first feature.
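  • The serial-number lookup can be sketched as below. The function names, the rule of numbering words in order of first appearance, and the use of -1 for words missing from the library are assumptions.

```python
def build_vocabulary(words):
    """Assign a serial number to each distinct word, in order of first appearance."""
    vocab = {}
    for word in words:
        # setdefault leaves an existing serial number untouched.
        vocab.setdefault(word, len(vocab))
    return vocab


def first_feature(segment_words, vocab, unknown=-1):
    """Map the words of a word segment to their serial numbers in the library."""
    return [vocab.get(word, unknown) for word in segment_words]
```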
  • The terminal device can determine the core abstract of the event document using an existing method for determining the core abstract (core event) of a document.
  • Methods for generating abstracts include, but are not limited to, supervised extractive methods and abstractive methods.
  • The core abstract includes, but is not limited to, the core abstract of each paragraph in the event document and the overall core abstract of the entire event document.
  • whether the current first participle can be used as a participle in the event information is also determined by the participles around the first participle. Therefore, in order to more accurately determine whether the current first participle can be used as a participle in the event information, the judgment can also be made based on the previous second participle adjacent to the current first participle.
  • Whether the second participle can be used as a participle in the event information can be determined from the pairing category in S501 above. Once the pairing category, the first feature of the first participle, and the core abstract have been determined, the judgment can be made by the conditional random field model in the sequence labeling model.
  • S504 Calculate the classification probability of the pairing of the first word segment and the first time information according to the distance between the first word segment and the first time information, the first probability, and the second probability.
  • The sequence labeling model calculates the classification probability of each first participle being accurately paired with the first time information as follows: after obtaining the separation distance, the first probability, and the second probability, the sequence labeling model normalizes their values and uses the normalized values as input features for the classifier in the sequence model. The classifier outputs the probability that the first participle, when it can be used as part of the event information, is accurately paired with the first time information; this value is the classification probability.
  • Here, X is the first participle in the event document; y is the pairing category of the first participle (whether the first participle belongs to the event information); i is the position of the i-th participle of X in the event information; n is the event information; P_{i,y_i} represents the first probability, predicted from the feature of the i-th participle, that the i-th participle belongs to the event information; and Q_{i,y_i} represents the third probability, predicted from the distance between the i-th participle and the first time information, that the i-th participle is accurately paired with the first time information. On this basis, the accurate probability that the first participle is paired with the first time information is determined.
  • By adding the interval distance feature between the first participle and the first time information to the sequence labeling model, the correlation between the first participle and the first time information can be judged in addition to judging whether the first participle can be used as a target participle in the event information. Therefore, even if there are multiple pieces of first time information in a news document, the classification probability of pairing with each word segment can be accurately calculated based on the interval distance feature; that is, the matching accuracy between the first time information and the event information can be improved.
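  • The step of normalizing the three inputs and passing them to a classifier can be sketched with a logistic function standing in for the classifier. This is only an illustration: the text does not specify the classifier, and the weights, bias, and the conversion of distance into a closeness value in [0, 1] are all assumptions.

```python
import math


def classification_probability(
    distance, first_prob, second_prob, max_distance,
    weights=(2.0, 2.0, 2.0), bias=-3.0,
):
    """Combine distance and the two probabilities into one pairing probability."""
    # Normalize the separation distance to [0, 1]; 1.0 means adjacent.
    closeness = 1.0 - min(distance, max_distance) / max_distance
    features = (closeness, first_prob, second_prob)
    # Stand-in classifier: logistic function over a weighted sum (assumed).
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

  • With positive weights, a word segment closer to the first time information receives a higher classification probability than an otherwise identical segment farther away.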
  • an overall core summary of the entire event document can be generated.
  • the first probability that each word segment belongs to the core abstract and the second probability of the adjacent second word segment are calculated, and the distance between each word segment and the first time information is determined.
  • the separation distance, the first probability and the second probability, the classification probability that each word segment is also paired with the first time information when it belongs to the core abstract is determined.
  • a core summary can be generated for each paragraph containing the first time information; then, for the multiple word segments in the event document, the first probability that each word segment belongs to the core abstract of each paragraph and the second probability of the adjacent second participle are determined, and the distance between each participle and the corresponding first time information is calculated.
  • the classification probability of each segmented word being paired with the first time information contained in the segment when it belongs to the core abstract of each segment is determined.
  • the classification probability and a preset probability threshold the target word segmentation corresponding to each first time information is determined, and the event information generated according to the second document position of the target word segmentation is determined.
  • each second time information corresponds to at least one event information; S103, determining, from the plurality of second time event pairs, the target event information corresponding to the plurality of second time information, further includes the following sub-steps S601-S603, which are described in detail as follows:
  • each of the above second time information corresponds to at least one event information. That is, one piece of second time information may correspond to one piece of event information, or multiple pieces of second time information at the same time node may correspond to multiple pieces of event information. The event information corresponding to second time information at the same time node may be regarded as reports on the same event by the source sites (news source sites) of different event documents. Based on this, the information source of each piece of event information can be obtained for the multiple pieces of event information at the same time node.
  • S602. Acquire the target event information with the highest priority from the plurality of event information according to the priority of the information source.
  • the priority of the above information sources can be preset in the terminal device.
  • when the priority of an information source is high, the event document corresponding to that information source can be considered more authentic and authoritative in its recorded content.
  • when the terminal device obtains an event document from the network, it can also obtain the information source of that event document. Therefore, according to the information source of each event document, the target event information with the highest priority can be obtained from the multiple pieces of event information.
  • the sources of information include but are not limited to event documents (news) published by official sites and event documents published by unofficial sites.
  • the event information corresponding to the second time information is the target event information.
  • the method further includes:
  • the corresponding event context is obtained based on the terminal device.
  • the event context is obtained by processing the terminal tool.
  • Uploading the event context to the blockchain ensures its security and fairness and transparency to users.
  • the user equipment can download the event context from the blockchain in order to verify whether the event context has been tampered with.
  • the blockchain referred to in this example is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • FIG. 7 is a structural block diagram of an event context generating apparatus provided by an embodiment of the present application.
  • each unit included in the terminal device is used to execute each step in the embodiment corresponding to FIG. 1 to FIG. 6 .
  • the event context generating apparatus 700 includes: an acquiring module 710, a processing module 720, a determining module 730 and a generating module 740, wherein:
  • the obtaining module 710 is configured to obtain first time information and event information in multiple event documents respectively, and obtain multiple first time event pairs corresponding to the multiple event documents.
  • the processing module 720 is configured to unify the time expressions of the multiple first time information in the multiple first time event pairs to obtain multiple unified second time information, and to replace the first time information of the plurality of first time event pairs with the corresponding unified second time information, to obtain a plurality of second time event pairs.
  • the determining module 730 is configured to determine target event information corresponding to the plurality of second time information from the plurality of second time event pairs.
  • the generating module 740 is configured to sort the target event information according to the second time information corresponding to the target event information to generate an event context.
  • the first time information includes multiple time expressions
  • the acquiring module 710 is further configured to:
  • according to the multiple time expression manners, query the multiple first time information in the multiple event documents that conforms to any of the time expression manners; input the multiple first time information and the corresponding event documents respectively into the sequence labeling model, determine the event information respectively matched with the plurality of first time information, and obtain the plurality of first time event pairs.
  • the obtaining module 710 is further configured to:
  • the plurality of word segmentations include a first word segmentation and a second word segmentation; the obtaining module 710 is further configured to:
  • the pairing category between the information; calculate the first probability that the first participle belongs to the event information according to the first feature; calculate the second probability that the first participle belongs to the event information according to the pairing category; and calculate, according to the separation distance between the first participle and the first time information, the first probability and the second probability, the classification probability that the first participle is paired with the first time information.
  • each second time information corresponds to at least one event information; the determining module 730 is further configured to:
  • respectively obtain the information sources of the plurality of event information; obtain the target event information with the highest priority from the plurality of event information; and, if no second time information among the plurality of second time information corresponds to a plurality of event information, determine that the event information respectively corresponding to each second time information is the target event information.
  • the event context generating apparatus 700 further includes:
  • the uploading module 710 is configured to upload the event context to the blockchain.
  • each unit/module is used to execute each step in the embodiment corresponding to FIG. 1 to FIG.
  • the steps in the examples have been explained in detail in the above-mentioned embodiments.
  • FIG. 8 is a structural block diagram of a terminal device provided by another embodiment of the present application.
  • the terminal device 800 of this embodiment includes: a processor 801 , a memory 802 , and a computer program 803 stored in the memory 802 and executable on the processor 801 , such as a program of an event context generation method.
  • when the processor 801 executes the computer program 803, the steps in each of the above embodiments of the event context generation method are implemented, for example, S101 to S104 shown in FIG. 1.
  • alternatively, when the processor 801 executes the computer program 803, the functions of each module in the embodiment corresponding to FIG. 7 are implemented, for example, the functions of the modules 710 to 740 shown in FIG. 7. Specifically as follows:
  • a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements:
  • the target event information is sorted to generate an event context.
  • the first time information includes multiple time expressions, and when the processor executes the computer program, the processor further implements:
  • when the processor executes the computer program, it further implements:
  • a target word segmentation matched with the first time information is determined from the plurality of word segmentations, and event information corresponding to each first time information in each event document is generated.
  • when the processor executes the computer program, it further implements:
  • a target participle matched with the first time information is determined from the plurality of participles, and event information corresponding to the first time information in each event document is generated.
  • the plurality of word segmentations include a first word segmentation and a second word segmentation; when the processor executes the computer program, it further implements:
  • according to the separation distance between the first participle and the first time information, the first probability and the second probability, the classification probability that the first participle is paired with the first time information is calculated.
  • each second time information corresponds to at least one event information; when the processor executes the computer program, the processor further implements:
  • in the plurality of second time information, if there is any second time information corresponding to a plurality of event information, respectively acquiring the information sources of the plurality of event information;
  • according to the priority of the information source, obtaining the target event information with the highest priority from the plurality of event information
  • the event information corresponding to each second time information is determined as the target event information.
  • when the processor executes the computer program, it further implements:
  • a computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements:
  • the target event information is sorted to generate an event context.
  • the first time information includes multiple time expressions, and when the computer program is executed by the processor, the computer program further implements:
  • the computer program, when executed by the processor, further implements:
  • a target word segmentation matched with the first time information is determined from the plurality of word segmentations, and event information corresponding to each first time information in each event document is generated.
  • the computer program, when executed by the processor, further implements:
  • a target participle matched with the first time information is determined from the plurality of participles, and event information corresponding to the first time information in each event document is generated.
  • the plurality of word segmentations include a first word segmentation and a second word segmentation; when the computer program is executed by the processor, it further implements:
  • according to the separation distance between the first participle and the first time information, the first probability and the second probability, the classification probability that the first participle is paired with the first time information is calculated.
  • each second time information corresponds to at least one event information; when the computer program is executed by the processor, it further implements:
  • in the plurality of second time information, if there is any second time information corresponding to a plurality of event information, respectively acquiring the information sources of the plurality of event information;
  • according to the priority of the information source, obtaining the target event information with the highest priority from the plurality of event information
  • the event information corresponding to each second time information is determined as the target event information.
  • the computer program, when executed by the processor, further implements:
  • the computer program 803 may be divided into one or more units, and the one or more units are stored in the memory 802 and executed by the processor 801 to complete the present application.
  • One or more units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program 803 in the terminal device 800 .
  • the computer program 803 can be divided into an acquisition module, a processing module, a determination module, and a generation module, and the specific functions of each module are as above.
  • the terminal device may include, but is not limited to, the processor 801 and the memory 802 .
  • FIG. 8 is only an example of the terminal device 800 and does not constitute a limitation on the terminal device 800, which may include more or fewer components than shown, combine some components, or use different components.
  • the terminal device may also include an input and output device, a network access device, a bus, and the like.
  • the processor 801 may be a central processing unit, or another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 802 may be an internal storage unit of the terminal device 800 , such as a hard disk or a memory of the terminal device 800 .
  • the memory 802 may also be an external storage device of the terminal device 800 , such as a plug-in hard disk, a smart memory card, a flash memory card, etc., which are equipped on the terminal device 800 . Further, the memory 802 may also include both an internal storage unit of the terminal device 800 and an external storage device.
  • the computer-readable storage medium may be an internal storage unit of the terminal device described in the foregoing embodiments, such as a hard disk or a memory of the terminal device.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium may also be an external storage device of the terminal device, for example, a pluggable hard disk, a smart memory card, a secure digital card, a flash memory card, etc. equipped on the terminal device.
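
The pairing step described near the top of this list (normalizing the separation distance, the first probability and the second probability, feeding them to a classifier, and then comparing the classification probability with a preset threshold) might be sketched as follows. This is only an illustration under stated assumptions: the min-max normalization, the logistic classifier, the weights and the function names are all hypothetical stand-ins, since the text does not specify the normalization or the classifier used inside the sequence labeling model.

```python
import math

def min_max(values):
    """Min-max normalize a list of numbers to [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def classification_probabilities(distances, first_probs, second_probs,
                                 weights=(-2.0, 1.5, 1.5), bias=0.0):
    """Score each word segment for pairing with one piece of time information.

    The weights are hypothetical; a negative weight on distance means that
    word segments far from the time expression score lower.
    """
    d = min_max(distances)
    p1 = min_max(first_probs)
    p2 = min_max(second_probs)
    probs = []
    for di, ai, bi in zip(d, p1, p2):
        z = weights[0] * di + weights[1] * ai + weights[2] * bi + bias
        probs.append(1.0 / (1.0 + math.exp(-z)))  # logistic classifier (assumption)
    return probs

def select_target_words(words, probs, threshold=0.5):
    """Keep the word segments whose classification probability reaches the threshold."""
    return [w for w, p in zip(words, probs) if p >= threshold]

words = ["flood", "closed", "roads", "yesterday"]
probs = classification_probabilities(
    distances=[1, 2, 3, 10],
    first_probs=[0.9, 0.8, 0.7, 0.1],
    second_probs=[0.8, 0.9, 0.6, 0.2],
)
targets = select_target_words(words, probs)
```

In this toy input the last word is both far from the time expression and unlikely to belong to the event, so it falls below the threshold while the other three are kept.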

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

An event context generation method and apparatus, and a terminal device and a storage medium, which are applied to the technical field of artificial intelligence. The method comprises: respectively acquiring first time information and event information in a plurality of event documents, so as to obtain a plurality of first time-event pairs corresponding to the plurality of event documents (S101); standardizing the time expression means of a plurality of pieces of first time information in the plurality of first time-event pairs, so as to obtain a plurality of pieces of standardized second time information, and correspondingly replacing the first time information in the plurality of first time-event pairs with the plurality of pieces of standardized second time information, so as to obtain a plurality of second time-event pairs (S102); determining, from the plurality of second time-event pairs, target event information corresponding to the plurality of pieces of second time information (S103); and according to the second time information corresponding to the target event information, sorting the target event information so as to generate event context (S104). By means of the event context generated by means of the method, corresponding event information can be generated for each time node when an event document encompasses event information under a plurality of time nodes, such that clear event context can be generated according to time nodes.

Description

Event context generation method, apparatus, terminal device and storage medium
This application claims priority to Chinese patent application No. 202011229516.5, entitled "Event Context Generation Method, Apparatus, Terminal Device and Storage Medium", filed with the China Patent Office on November 6, 2020, the entire contents of which are incorporated herein by reference.
Technical field
The present application belongs to the technical field of intelligent decision-making, and in particular relates to an event context generation method, apparatus, terminal device and storage medium.
Background
An event context is a form of presentation for news events that develop over a long time. Such events usually keep changing or causing social impact over a relatively long period, with chain reactions or related events continually emerging. For such events, the complete event is often described by presenting time nodes together with the key event content, which helps users quickly grasp the full picture of the event. However, the inventor realized that, in current methods for automatically generating an event context, the terminal device organizes the events contained in news according to the time at which the news was published. When a news article covers event information under multiple time nodes, the event information under those time nodes is treated as events that occurred under a single time node (the news release time), so that a clear event context cannot be generated.
Technical problem
One of the purposes of the embodiments of the present application is to provide an event context generation method, apparatus, terminal device and storage medium, aiming to solve the technical problem that, when a news article covers event information under multiple time nodes, the event information under the multiple time nodes is treated as events that occurred under one time node, so that a clear event context cannot be generated.
Technical solutions
In order to solve the above technical problem, the technical solutions adopted in the embodiments of the present application are as follows.
A first aspect of the embodiments of the present application provides an event context generation method, the method comprising:
respectively acquiring first time information and event information in multiple event documents, to obtain multiple first time event pairs corresponding to the multiple event documents;
unifying the time expressions of the multiple first time information in the multiple first time event pairs to obtain multiple unified second time information, and replacing the first time information of the multiple first time event pairs with the corresponding unified second time information, to obtain multiple second time event pairs;
determining, from the multiple second time event pairs, target event information corresponding to the multiple second time information;
sorting the target event information according to the second time information corresponding to the target event information, to generate an event context.
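
The four steps above can be illustrated with a minimal Python sketch. It assumes that the first time event pairs have already been extracted (the extraction step itself is not shown), that only three date formats occur, and that source priority is a simple numeric mapping; the names `normalize_time`, `build_event_context` and `FORMATS` are hypothetical and do not come from the application.

```python
from datetime import datetime

# Hypothetical set of supported time expressions (assumption).
FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y年%m月%d日")

def normalize_time(raw):
    """Unify one time expression into an ISO date string (second time information)."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized time expression: {raw!r}")

def build_event_context(first_pairs, source_priority):
    """first_pairs: list of (raw_time, event_text, source) tuples."""
    # Unify the time expressions to obtain second time event pairs.
    second_pairs = [(normalize_time(t), e, s) for t, e, s in first_pairs]
    # Keep one target event per time node, preferring the higher-priority source.
    target = {}
    for t, e, s in second_pairs:
        rank = source_priority.get(s, 0)
        if t not in target or rank > target[t][1]:
            target[t] = (e, rank)
    # ISO date strings sort chronologically, so sorting the keys orders the events.
    return [(t, target[t][0]) for t in sorted(target)]

pairs = [
    ("2020/11/06", "B reported by blog", "blog"),
    ("2020年11月6日", "B reported by official site", "official"),
    ("2020-01-02", "A", "blog"),
]
context = build_event_context(pairs, {"official": 2, "blog": 1})
```

Here two documents report the same time node in different formats; after normalization they collide on the same key, and the report from the higher-priority source is kept.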
A second aspect of the embodiments of the present application provides an event context generation apparatus, the apparatus comprising:
an obtaining module, configured to respectively acquire first time information and event information in multiple event documents, to obtain multiple first time event pairs corresponding to the multiple event documents;
a processing module, configured to unify the time expressions of the multiple first time information in the multiple first time event pairs to obtain multiple unified second time information, and to replace the first time information of the multiple first time event pairs with the corresponding unified second time information, to obtain multiple second time event pairs;
a determining module, configured to determine, from the multiple second time event pairs, target event information corresponding to the multiple second time information;
a generating module, configured to sort the target event information according to the second time information corresponding to the target event information, to generate an event context.
A third aspect of the embodiments of the present application provides a terminal device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements:
respectively acquiring first time information and event information in multiple event documents, to obtain multiple first time event pairs corresponding to the multiple event documents;
unifying the time expressions of the multiple first time information in the multiple first time event pairs to obtain multiple unified second time information, and replacing the first time information of the multiple first time event pairs with the corresponding unified second time information, to obtain multiple second time event pairs;
determining, from the multiple second time event pairs, target event information corresponding to the multiple second time information;
sorting the target event information according to the second time information corresponding to the target event information, to generate an event context.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements:
respectively acquiring first time information and event information in multiple event documents, to obtain multiple first time event pairs corresponding to the multiple event documents;
unifying the time expressions of the multiple first time information in the multiple first time event pairs to obtain multiple unified second time information, and replacing the first time information of the multiple first time event pairs with the corresponding unified second time information, to obtain multiple second time event pairs;
determining, from the multiple second time event pairs, target event information corresponding to the multiple second time information;
sorting the target event information according to the second time information corresponding to the target event information, to generate an event context.
A fifth aspect of the embodiments of the present application further provides a computer program product which, when run on a terminal device, causes the terminal device to implement:
respectively acquiring first time information and event information in multiple event documents, to obtain multiple first time event pairs corresponding to the multiple event documents;
unifying the time expressions of the multiple first time information in the multiple first time event pairs to obtain multiple unified second time information, and replacing the first time information of the multiple first time event pairs with the corresponding unified second time information, to obtain multiple second time event pairs;
determining, from the multiple second time event pairs, target event information corresponding to the multiple second time information;
sorting the target event information according to the second time information corresponding to the target event information, to generate an event context.
Beneficial effects
Compared with the prior art, the embodiments of the present application include the following advantages:
In the embodiments of the present application, all the first time information and event information are acquired from each event document to generate one or more first time event pairs for each event document. This solves the problem that, when one event document covers event information under multiple time nodes, all the event information in that document is regarded as having occurred at a single time node. The first time information in each first time event pair is then normalized to obtain second time information with a unified time representation. Next, according to the second time information, the target event information corresponding to that second time information can be determined from the multiple pieces of event information at the same time node, thereby obtaining the target event information corresponding to each second time information. Finally, the target event information can be sorted according to the second time information to generate the event context, so that each time node in the generated event context corresponds to one piece of target event information, which reduces the occurrence of repeated event information in the event context.
Description of drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or exemplary technologies are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an implementation flowchart of an event context generation method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of one implementation of S101 of an event context generation method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of another implementation of S101 of an event context generation method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of one implementation of S304 of an event context generation method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of one implementation of S402 of an event context generation method provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of one implementation of S103 of an event context generation method provided by an embodiment of the present application;
FIG. 7 is a structural block diagram of an event context generation apparatus provided by an embodiment of the present application;
FIG. 8 is a structural block diagram of a terminal device provided by an embodiment of the present application.
本发明的实施方式Embodiments of the present invention
In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.
The event context generation method provided by the embodiments of the present application can be applied to terminal devices such as tablet computers, notebook computers, ultra-mobile personal computers (UMPCs), and netbooks. The embodiments of the present application place no restriction on the specific type of terminal device.
Referring to FIG. 1, FIG. 1 shows an implementation flowchart of an event context generation method provided by an embodiment of the present application. The method includes the following steps:
S101. Obtain the first time information and the event information in each of multiple event documents, to obtain multiple first time event pairs corresponding to the multiple event documents.
In application, the event documents include, but are not limited to, documents containing text information, such as news and books. The terminal device may acquire event documents by crawling, in real time, news or microblog posts that contain the keywords of an event from web pages, or by reading multiple event documents pre-stored by the user under a specified path. The event information may be summary information of the core event in an event document, and may be determined from the event document by using event keywords or by a method based on a core event selection model.
In application, the core event (event information) contained in an event document may be a single sentence or a paragraph. The event document can be split into clauses; the number of preset event keywords contained in each clause is counted, and a first sentence weight is calculated for each clause from that count. Then, a second sentence weight is assigned to each clause according to its position in the event document. For example, for a document with a title and body content, the title clause may be given a higher second sentence weight than the clauses in the body. Finally, a target clause is selected from the clauses as the core event according to the first sentence weight and the second sentence weight of each clause. The preset event keywords may be phrases extracted from the event document itself: the frequency of each phrase in the event document is counted, and phrases whose frequency reaches a threshold are determined to be event keywords. It can be understood that such phrases fully express the content of the event document while being concise and general.
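As a minimal illustrative sketch, the clause selection described above could be written as follows. The function name `select_core_event`, the default weight values, and the additive combination of the two weights are assumptions made for illustration only; the application does not specify how the first and second sentence weights are combined.

```python
import re


def select_core_event(title, body, keywords, title_weight=2.0, body_weight=1.0):
    """Return the clause with the highest combined sentence weight.

    The additive combination of the two weights is an assumption.
    """
    # Split the body into clauses on common Chinese and English terminators.
    clauses = [c.strip() for c in re.split(r"[。！？!?.]", body) if c.strip()]
    # Second sentence weight: depends only on position (title vs. body).
    candidates = [(title, title_weight)] + [(c, body_weight) for c in clauses]

    best_clause, best_score = None, float("-inf")
    for clause, position_weight in candidates:
        # First sentence weight: number of keyword occurrences in the clause.
        keyword_weight = sum(clause.count(k) for k in keywords)
        score = keyword_weight + position_weight
        if score > best_score:  # ties keep the earlier clause
            best_clause, best_score = clause, score
    return best_clause
```

With a title that mentions one keyword and a body clause that mentions three, the body clause wins under these weights; when no clause contains a keyword, the title is returned because of its higher position weight.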
In application, the first time information is the time point at which the event information occurred, and a piece of first time information together with the event information that occurred at that time node forms a first time event pair. The time information of an event document may be time information in the document that specifically describes the event, or the time at which the terminal device acquired the event document. For news, the publication time of the news may also be used as the time information. In this embodiment, time information in the event document that specifically describes the event information is used preferentially as the first time information corresponding to the event information. If no such time information is found in the event document, the publication time of the event document may be used as the first time information. If the publication time is not available either, the time point at which the terminal device acquired the event document is used as the first time information of the event information.
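The priority order described above can be sketched as a simple fallback chain. The function name `choose_first_time` and the use of `None` to mean "not found" are assumptions for illustration.

```python
from datetime import datetime
from typing import Optional


def choose_first_time(in_text_time: Optional[datetime],
                      publish_time: Optional[datetime],
                      fetch_time: datetime) -> datetime:
    """Pick the first time information by priority.

    1. A time stated in the document that describes the event.
    2. The publication time of the document.
    3. The time at which the terminal device acquired the document.
    """
    if in_text_time is not None:
        return in_text_time
    if publish_time is not None:
        return publish_time
    return fetch_time
```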
In application, when an event document contains multiple pieces of first time information, multiple pieces of event information can correspondingly be obtained from the document, and each piece of first time information is paired with its event information to obtain the multiple time event pairs of that document. In this way, for multiple event documents, the time event pairs of each event document can be obtained.
S102. Unify the time expressions of the multiple pieces of first time information in the multiple first time event pairs to obtain multiple pieces of unified second time information, and replace the first time information in each first time event pair with the corresponding second time information to obtain multiple second time event pairs.
In application, normalizing the first time information can be understood as unifying the format of each piece of first time information. Because time can be expressed in many ways, it is difficult to determine the order of the time points expressed by the pieces of first time information unless their format is unified. Therefore, to make the order of the time information easy to compare, the time information can be normalized. For example, time expressions such as "七月十日" and "七月十号" (both meaning July 10, written with different day suffixes) can be normalized to "7月10号". The time information obtained after unifying the time expressions of the first time information is the second time information. Replacing the first time information in each first time event pair with the corresponding second time information yields the multiple second time event pairs. It should be noted that the time information may be specific to any granularity, such as hours, minutes, or seconds, and the first time information may be normalized as the situation requires, which is not limited here.
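A minimal sketch of such a normalization for month-day expressions is given below. It only handles the pattern "<month>月<day>日/号" with Arabic or Chinese numerals below 100; the function names `chinese_to_int` and `normalize_date` and the choice of "7月10号" as the canonical form (taken from the example above) are assumptions for illustration, and a real system would need to cover many more expressions.

```python
import re

_DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
           "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
_NUM = r"[0-9零一二三四五六七八九十]+"
_DATE = re.compile(rf"({_NUM})月({_NUM})[日号]")


def chinese_to_int(text: str) -> int:
    """Convert an Arabic numeral or a Chinese numeral below 100 to an int."""
    if text.isascii() and text.isdigit():
        return int(text)
    if "十" in text:
        # "二十五" -> tens="二", ones="五"; "十二" -> tens="", ones="二"
        tens, _, ones = text.partition("十")
        return (_DIGITS[tens] if tens else 1) * 10 + (_DIGITS[ones] if ones else 0)
    return _DIGITS[text]


def normalize_date(expr: str) -> str:
    """Rewrite a month-day expression into the unified form '<m>月<d>号'."""
    match = _DATE.fullmatch(expr)
    if match is None:
        raise ValueError(f"unsupported time expression: {expr!r}")
    month, day = (chinese_to_int(g) for g in match.groups())
    return f"{month}月{day}号"
```

Once every first time information has been rewritten in this way, expressions that denote the same day compare equal, and the resulting month and day values can be used for ordering.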
S103. From the multiple second time event pairs, determine the target event information corresponding to each piece of second time information.
In application, among the second time event pairs obtained from multiple event documents, there may be multiple pieces of event information at the same time point (second time information). For example, for multiple news articles that report on event B at time point A in the morning, the terminal device can obtain a second time event pair (A, B) from each article. However, these second time event pairs all report the same event information at the same time point. Therefore, one of the second time event pairs at the same time point needs to be selected as the representative target event information. For example, when the event documents are news, the selection can be based on the authority of the site from which the news originates, the time at which the site published the news (which may differ from the time information corresponding to the event information), the number of times the news was reprinted, and so on.
It should be noted that determining the target event information corresponding to the multiple pieces of second time information can be understood as follows: when several pieces of second time information are identical (that is, the same time point corresponds to several pieces of the same event information), one piece of event information is determined as the target event information of that second time information, and the event information of the other pairs at the same time point can be ignored. For any second time information that does not share its time point with another pair (that is, the second time information is unique), the event information in its second time event pair is regarded as the target event information corresponding to that second time information.
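The selection of one representative per time point can be sketched as grouping the pairs by their second time information and keeping the best-ranked pair in each group. The dictionary keys `authority` and `reprints`, and ranking by authority first and reprint count second, are assumptions for illustration; the application only lists these as possible reference criteria.

```python
from collections import defaultdict


def pick_targets(pairs):
    """Map each second time information to one target event information.

    `pairs` is a list of dicts with the hypothetical keys
    "time", "event", "authority", and "reprints".
    """
    by_time = defaultdict(list)
    for pair in pairs:
        by_time[pair["time"]].append(pair)

    targets = {}
    for time_point, group in by_time.items():
        # A group of size one simply yields its only event.
        best = max(group, key=lambda p: (p["authority"], p["reprints"]))
        targets[time_point] = best["event"]
    return targets
```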
S104. Sort the target event information according to the second time information corresponding to the target event information, to generate an event context.
In application, the event context is a form of presentation for news events that develop over a long period. After the target event information at each time point (second time information) is obtained, the target event information is placed on a timeline in chronological order to generate the event context. From the generated event context, the user can observe how the event developed. For example, for an event that developed markedly, a region where news items are relatively dense can be seen on the event context, and that region can be regarded as the main stage of the event's development.
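A minimal sketch of this ordering step is shown below, assuming the second time information has already been converted into comparable `datetime.date` values. The function name `build_timeline` is hypothetical.

```python
from datetime import date


def build_timeline(targets):
    """Return (time, event) tuples in chronological order.

    `targets` maps a comparable time value to its target event information.
    """
    return sorted(targets.items(), key=lambda item: item[0])
```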
In this embodiment, all first time information and event information are obtained from each event document, and one or more first time event pairs are generated for each event document. This solves the problem that, when an event document covers event information at multiple time nodes, all event information in the document is treated as having occurred at a single time node. The first time information in each first time event pair is then normalized to obtain second time information in a unified format, so that the target event information corresponding to each piece of second time information can be determined from the multiple pieces of event information at the same time node. Finally, the target event information is sorted according to the second time information to generate the event context. As a result, each time node in the generated event context corresponds to one piece of target event information, which reduces the occurrence of duplicate event information in the event context.
Referring to FIG. 2, in a specific embodiment, the first time information includes multiple time expressions, and S101, obtaining the first time information and the event information in each of multiple event documents to obtain the multiple first time event pairs corresponding to the multiple event documents, specifically includes the following sub-steps S201-S202, detailed as follows:
S201. According to the multiple time expressions, query the multiple event documents for multiple pieces of first time information that conform to any of the time expressions.
In application, the time expressions include, but are not limited to, time nodes expressed in Chinese characters and time nodes expressed in Roman numerals, which is not limited here. If the event information in an event document is in another language, such as English or Japanese, the terminal device can translate it into a specified language (Chinese) and then query the first time information in each event document according to the time expressions.
For example, multiple time expressions may be established in advance to query the first time information in an event document. For an event document whose date is already determined, a time expression "dd-mm-yy" can be established, where dd represents the hour and takes a value between 0 and 23, mm represents the minute and takes a value between 0 and 59, and yy represents the second and takes a value between 0 and 59. The text in the event document is read in sequence and compared against this format rule, and the first time information that conforms to the time expression is filtered out. The above format is only one example of a time expression; it can be set according to the actual situation, which is not limited here.
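One way to apply such a format rule is a regular expression. The sketch below encodes the hour-minute-second ranges given above for the "dd-mm-yy" pattern; the names `TIME_PATTERN` and `find_times`, and the decision to accept one-digit or two-digit fields, are assumptions for illustration.

```python
import re

# hour 0-23, minute 0-59, second 0-59, separated by hyphens.
# The lookarounds prevent matching inside a longer run of digits.
TIME_PATTERN = re.compile(
    r"(?<!\d)([01]?\d|2[0-3])-([0-5]?\d)-([0-5]?\d)(?!\d)"
)


def find_times(text):
    """Return every substring of `text` that conforms to the time expression."""
    return [m.group(0) for m in TIME_PATTERN.finditer(text)]
```

In practice, one such pattern would be registered for each supported time expression, and a document would be scanned with all of them.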
S202. Input the multiple pieces of first time information and the corresponding event documents into a sequence labeling model, determine the event information paired with each piece of first time information, and obtain the multiple first time event pairs.
In application, the sequence labeling model may be a long short-term memory (LSTM) network, a conditional random field (CRF) model, or a sequence labeling model formed by combining the two. Based on the event feature of the currently input event information and the time feature of the first time information, the sequence labeling model outputs a score (probability) that the event information and the first time information are correctly paired. The model can be trained on existing training data (multiple pieces of event information and time information in event documents) and the labels of that data (the time information to which each piece of event information corresponds). An event document is then input into the trained model, which determines the first time information in the document and extracts its position in the document as the time feature of the first time information. Based on the time feature, the model outputs, for each piece of event information in the event document, the probability that it is correctly paired with the first time information. The event information and its corresponding first time information are determined from these probability values to generate a first time event pair. In this way, one or more first time event pairs are obtained for each event document.
In this embodiment, setting multiple time expressions makes it possible to accurately query the multiple pieces of time information in each event document, and the sequence labeling model determines the event information paired with each piece of first time information. This solves the problem that, when an event document covers event information occurring at multiple first time nodes, the terminal device generates only one first time event pair for that document.
Referring to FIG. 3, in a specific embodiment, S101, obtaining the first time information and the event information in each of multiple event documents to obtain the multiple first time event pairs corresponding to the multiple event documents, specifically includes the following sub-steps S301-S304, detailed as follows:
S301. Obtain each piece of first time information in each event document, and determine the one or more first document positions of each piece of first time information in the corresponding event document.
In application, after the first time information is determined from the event document according to the time expressions, the first document position can be determined from the position of the first time information in the document. Specifically, the event document is segmented into multiple text tokens, the rank of the first time information among the text tokens is determined, and that rank is used as the first document position. It should be noted that each piece of first time information that has been determined can be treated directly as a single text token, so that only the content of the event document that does not belong to the first time information needs to be segmented.
S302. Perform word segmentation on each event document to obtain the multiple word segments in each event document.
S303. Determine, for each event document, the second document positions of the multiple word segments in the corresponding event document.
In application, the word segmentation of each event document may use a segmentation method based on string matching, a segmentation method based on understanding, or a segmentation method based on statistics. For example, with the method based on string matching, character strings from the news to be segmented are matched against the entries in a preset machine dictionary. If a character string is found in the dictionary, the match is successful (that is, a word segment is recognized).
For example, a sentence in the news is first matched against the dictionary entries. If the match fails, the first character (or the last character) of the sentence is removed to form a new character string, which is matched against the entries again, until a match succeeds. The remaining character string of the sentence is then treated as a new character string to be matched against the entries, and the above operation is repeated to obtain the multiple word segments of the sentence. The remaining sentences in the news are segmented in the same way to obtain the multiple word segments of the news. After the news has been segmented, the second document position of each word segment can be determined from the order of the word segments in the news. The first document position of the first time information is then determined from the position of the first time information in the news.
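The string-matching procedure above corresponds to what is commonly called forward maximum matching when the last character is removed on each failed match. The following is a minimal sketch of that variant; the function name `forward_max_match` and the fallback of emitting a single character when nothing in the dictionary matches are assumptions for illustration.

```python
def forward_max_match(sentence, dictionary):
    """Segment `sentence` greedily, preferring the longest dictionary entry.

    Returns the word segments in document order; the index of each segment
    in the returned list can serve as its second document position.
    """
    longest = max((len(entry) for entry in dictionary), default=1)
    tokens = []
    start = 0
    while start < len(sentence):
        end = min(len(sentence), start + longest)
        # Drop the last character until the candidate is a dictionary entry
        # (or only one character is left, which is emitted as-is).
        while end - start > 1 and sentence[start:end] not in dictionary:
            end -= 1
        tokens.append(sentence[start:end])
        start = end  # continue with the remaining string
    return tokens
```

Calling `list(enumerate(forward_max_match(text, dictionary)))` gives each word segment together with its position.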
S304. According to the one or more first document positions and the multiple second document positions, determine, from the multiple word segments, the target word segments paired with the first time information, and generate the event information corresponding to each piece of first time information in each event document.
In application, if a news article contains multiple pieces of first time information, there are correspondingly multiple first document positions. From the first document positions and the second document positions, the distance in the news between each first document position and each second document position can be calculated, and the target word segments paired with each piece of first time information are determined according to that distance.
For example, news usually reports event information together with time information. A news article typically contains time information, location information, event information, and other content, and to report an event accurately, the event is generally described in the form: at time xx (first time information), at place xx (location information), event xx occurred (event information). It can therefore be assumed that each piece of first time information and its paired event information are located close to each other in the news. Accordingly, the distance is calculated from the first document position of each piece of first time information and the second document position of each word segment. When the distance is determined to be smaller than a preset threshold, the word segment at that second document position is determined to be a target word segment. Combining the target word segments in position order generates the event information corresponding to the first time information. In this way, each piece of first time information in each event document and its corresponding event information can be generated.
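A minimal sketch of this distance-based pairing follows. It measures distance as the difference between token indices and keeps every word segment whose distance to the time token is below the threshold; the function name `segments_near_time` and the index-difference distance are assumptions for illustration.

```python
def segments_near_time(tokens, time_position, max_distance):
    """Return the target word segments for one first time information.

    `tokens` is the segmented document, `time_position` is the first document
    position (index of the time token), and a word segment is kept when its
    distance to the time token is smaller than `max_distance`.
    """
    return [
        token
        for position, token in enumerate(tokens)
        if position != time_position
        and abs(position - time_position) < max_distance
    ]
```

Because the list comprehension walks the tokens in order, the selected segments are already in position order and can be joined directly to form the event information.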
Referring to FIG. 4, in a specific embodiment, S304, determining from the multiple word segments the target word segments paired with the first time information according to the one or more first document positions and the multiple second document positions, and generating the event information corresponding to each piece of first time information in each event document, specifically includes the following sub-steps S401-S403, detailed as follows:
S401. For each event document, calculate the distance between the second document position of each word segment and the first document position.
S402. According to the distance, calculate the classification probability that each word segment is paired with the first time information.
In application, for the calculation of the distance, reference may be made to the description of S304, which is not repeated here.
In application, after the distance is obtained, the word segment and the distance can be input into the neural network structure of the sequence labeling model, which performs feature processing on them to obtain a processed feature vector. Based on the processed feature vector, the sequence labeling model can then output the classification probability of the current word segment with respect to the corresponding first time information.
S403. According to the classification probability corresponding to each word segment, determine, from the multiple word segments, the target word segments paired with the first time information, and generate the event information corresponding to each piece of first time information in each event document.
In application, each event document contains multiple word segments, so multiple classification probabilities between word segments and the first time information are obtained. The event information can be regarded as a sentence or paragraph composed of multiple word segments. Therefore, the classification probabilities can be sorted in descending order, the word segments corresponding to the top N classification probabilities are taken as the target word segments, and the event information is generated according to the order of the second document positions of the target word segments. It should be noted that the classification probability of this event information being paired with the first time information is higher than that of event information composed of the other word segments being paired with the same first time information. Therefore, even when an event document contains multiple pieces of first time information, the event information paired with each piece of first time information can be generated accurately.
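The top-N selection above can be sketched as follows, assuming the classification probabilities have already been produced by the sequence labeling model. The function name `top_n_segments` is hypothetical.

```python
def top_n_segments(tokens, probabilities, n):
    """Keep the n most probable word segments, restored to document order.

    `probabilities[i]` is the classification probability that `tokens[i]`
    is paired with the first time information.
    """
    # Rank positions by probability, highest first, and keep the top n.
    ranked = sorted(range(len(tokens)),
                    key=lambda i: probabilities[i],
                    reverse=True)[:n]
    # Re-sort the kept positions so the event information reads in order.
    return [tokens[i] for i in sorted(ranked)]
```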
Referring to FIG. 5, in a specific embodiment, the multiple word segments include a first word segment and a second word segment, and S402, calculating according to the distance the classification probability that each word segment is paired with the first time information, specifically includes the following sub-steps S501-S504, detailed as follows:
S501. In each event document, extract a first feature of the first word segment, obtain the second word segment that immediately precedes the first word segment, and determine the pairing category between the second word segment and the event information in the corresponding event document.
In application, the first feature may be a feature obtained by the feature extraction network of the sequence labeling model performing feature extraction on a word segment. The event document contains multiple word segments; the word segment whose classification probability is currently being determined is taken as the first word segment, and the word segment immediately preceding it is taken as the second word segment. The pairing category between the second word segment and the event information is the category that the terminal device determined when it previously processed the second word segment. The pairing category is either that the second word segment can be used to generate the event information (the second word segment is paired with the event information) or that the second word segment cannot be used to generate the event information (the second word segment is not paired with the event information).
S502. Calculate, according to the first feature, a first probability that the first word segment belongs to the event information.
In application, whether the first word segment can serve as a word segment of the event information is generally determined by the first word segment itself. Specifically, a word segment consists of a single character or multiple characters; a word vector library can be built in advance, with a corresponding index assigned to each character in the library. The indices in the word vector library of the characters contained in the first word segment are identified and used as the first feature. Meanwhile, the terminal device can determine the core summary of the event document using an existing method for determining the core summary (core event) of an event document. For example, methods for generating a summary include, but are not limited to, supervised extractive methods and extractive summarization methods. The identified first feature and the core summary are then input into the sequence labeling model, which outputs the first probability that the current first word segment belongs to the event information (core summary). The core summary includes, but is not limited to, the core summary of each paragraph in the event document and the overall core summary of the whole event document.
S503. Calculate, according to the pairing category, a second probability that the first word segment belongs to the event information.
In application, whether the current first word segment can serve as a word segment of the event information also depends on the surrounding word segments. Therefore, to judge this more accurately, the judgment can also take into account the second word segment immediately preceding the current first word segment. Whether the second word segment can serve as a word segment of the event information is given by the pairing category of S501. Once the pairing category, the first feature of the first word segment, and the core summary have been determined, the second probability can be determined by the conditional random field network in the sequence labeling model. That is, the second probability that the current first word segment belongs to the event information is calculated either on the condition that the adjacent second word segment has been determined to belong to the event information, or on the condition that it has been determined not to belong to the event information.
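As a rough illustration of conditioning on the previous word segment's pairing category, the sketch below turns a table of transition scores into a conditional probability. The label names, the score values in `TRANSITIONS`, and the local softmax are assumptions made for illustration; a real conditional random field normalizes over whole label sequences rather than one step at a time.

```python
import math
from typing import Dict, Tuple

LABELS = ("event", "other")

# Hypothetical transition scores A[(previous label, current label)].
TRANSITIONS: Dict[Tuple[str, str], float] = {
    ("event", "event"): 1.5,
    ("event", "other"): -0.5,
    ("other", "event"): -1.0,
    ("other", "other"): 0.8,
}


def second_probability(prev_label: str,
                       transitions: Dict[Tuple[str, str], float] = TRANSITIONS) -> float:
    """Probability that the current segment is 'event', given the previous label.

    Computed as a softmax over the transition scores leaving prev_label.
    """
    scores = {cur: transitions[(prev_label, cur)] for cur in LABELS}
    normalizer = sum(math.exp(s) for s in scores.values())
    return math.exp(scores["event"]) / normalizer
```

With these example scores, a segment that follows an "event" segment is more likely to be labeled "event" than one that follows an "other" segment.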
S504. Calculate, according to the interval distance between the first word segment and the first time information, the first probability, and the second probability, the classification probability that the first word segment is paired with the first time information.
In application, the sequence labeling model calculates the classification probability that each first word segment is correctly paired with the first time information as follows. After obtaining the interval distance, the first probability, and the second probability, the model normalizes their values and feeds the normalized values as input features to the classifier in the sequence labeling model. The classifier outputs a value indicating that the first word segment, in addition to being usable as a target word segment of the event information, is correctly paired with the first time information; this value is the classification probability.
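The normalize-then-classify step can be sketched as below. The patent does not specify the normalization or the classifier, so the linear rescaling of distance, the logistic classifier, and the `weights` and `bias` values are all hypothetical stand-ins.

```python
import math
from typing import Sequence


def normalize_distance(distance: int, max_distance: int) -> float:
    """Map a distance to [0, 1], where 1 means adjacent to the time expression."""
    if max_distance <= 0:
        return 1.0
    clipped = min(max(distance, 0), max_distance)
    return 1.0 - clipped / max_distance


def classification_probability(distance: int,
                               p_first: float,
                               p_second: float,
                               max_distance: int,
                               weights: Sequence[float] = (2.0, 2.0, 2.0),
                               bias: float = -3.0) -> float:
    """Combine the three normalized inputs with a logistic classifier.

    The weights and bias are illustrative; in practice they would be learned.
    """
    features = (normalize_distance(distance, max_distance), p_first, p_second)
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

Holding the two probabilities fixed, a word segment closer to the time expression receives a higher classification probability under this sketch, which matches the role the text gives to the interval distance.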
Specifically, the classification probability for pairing the first word segment with the first time information can be calculated by the following formula:
Figure PCTCN2021091095-appb-000001
Here, X is the first word segment in the event document, and y is the pairing category of the first word segment (whether it belongs to the event information); i indicates that word segment X is at the i-th word segment position in the event information, and n is the total number of word segments in the event information; y_i is the label of the i-th first word segment. A_{y_{i-1}, y_i} is the second probability that the i-th word segment (the first word segment) belongs to the event information, given that the pairing category of the (i-1)-th word segment (the second word segment) is known. P_{i, y_i} is the first probability that the i-th word segment belongs to the event information, predicted from the features of the i-th word segment. Q_{i, y_i} is the third probability that the i-th word segment is correctly paired with the first time information, predicted from the interval distance between the i-th word segment and the first time information. S(X, y) is the probability, computed by the formula above, that the first word segment both belongs to the event information and is correctly paired with the first time information. The distance between the first word segment and the first time information can be calculated as Q(X) = dist(min(T_m, X)), where X is the first word segment in the event document, T_m is the m-th first time information among the multiple pieces of time information, and min(T_m, X) is the interval distance between the first word segment and the m-th first time information.
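The formula itself survives here only as a figure reference. Going solely by the term definitions above, one plausible additive form is shown below; this is an assumption modeled on the usual conditional random field sequence score, not a reproduction of the patent's figure, and the start label y_0 it needs for i = 1 is likewise assumed.

```latex
S(X, y) = \sum_{i=1}^{n} \left( A_{y_{i-1},\, y_i} + P_{i,\, y_i} + Q_{i,\, y_i} \right)
```

Under this reading, the distance term Q enters the sequence score on the same footing as the transition term A and the per-segment term P.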
In application, adding the interval distance between the first word segment and the first time information as a feature of the sequence labeling model makes it possible not only to judge whether the first word segment can serve as a target word segment of the event information, but also to judge how strongly it is associated with the first time information. Thus, even when a news document contains multiple pieces of first time information, the classification probability of pairing each of them with a word segment can be calculated accurately from the interval distance feature, which improves the accuracy of matching first time information to event information.
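A minimal sketch of the interval distance when a document contains several time expressions is given below. It assumes document positions are integer token indices and measures distance as their absolute difference; the patent does not fix either choice, and the function names are hypothetical.

```python
from typing import List


def interval_distance(segment_pos: int, time_positions: List[int]) -> int:
    """Distance from a word segment to the nearest time expression."""
    if not time_positions:
        raise ValueError("the document contains no time expression")
    return min(abs(segment_pos - t) for t in time_positions)


def nearest_time_index(segment_pos: int, time_positions: List[int]) -> int:
    """Index m of the time expression T_m closest to the word segment."""
    if not time_positions:
        raise ValueError("the document contains no time expression")
    return min(range(len(time_positions)),
               key=lambda m: abs(segment_pos - time_positions[m]))
```

For example, with time expressions at positions 2 and 10, a word segment at position 7 is 3 tokens from the second expression and is therefore the better candidate for pairing with it.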
It can be understood that when the event document contains only one piece of first time information, an overall core summary of the entire event document can be generated. For the multiple word segments in the event document, the first probability that each word segment belongs to the core summary and the second probability derived from the adjacent second word segment are calculated, and the interval distance between each word segment and the first time information is determined. From the interval distance, the first probability, and the second probability, the classification probability that each word segment both belongs to the core summary and is paired with the first time information is determined. When the event document contains multiple pieces of first time information, a per-paragraph core summary can be generated for each paragraph containing first time information. For the multiple word segments in the event document, the first probability that each word segment belongs to each paragraph's core summary and the second probability derived from the adjacent second word segment are determined, and the interval distance between each word segment and the corresponding first time information is calculated. From the interval distance, the first probability, and the second probability, the classification probability that each word segment both belongs to a paragraph's core summary and is paired with the first time information contained in that paragraph is determined. According to the classification probability and a preset probability threshold, the target word segments corresponding to each piece of first time information are determined, and the event information is generated according to the second document positions of the target word segments.
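The final thresholding step can be sketched as follows. The function name, the 0.5 default threshold, and joining target word segments with a space are assumptions; the only behavior taken from the text is that segments above a preset threshold are kept and assembled in document order.

```python
from typing import Dict, Sequence


def select_event_info(segments: Sequence[str],
                      probs_by_time: Dict[str, Sequence[float]],
                      threshold: float = 0.5,
                      joiner: str = " ") -> Dict[str, str]:
    """For each time expression, keep segments whose probability meets the threshold.

    Segments are assembled in document order to form the event information.
    """
    events: Dict[str, str] = {}
    for time_expr, probs in probs_by_time.items():
        if len(probs) != len(segments):
            raise ValueError("one probability is required per word segment")
        targets = [seg for seg, p in zip(segments, probs) if p >= threshold]
        events[time_expr] = joiner.join(targets)
    return events


# Hypothetical document with two time expressions and made-up probabilities.
segments = ["On", "Monday", "the", "storm", "landed", "Tuesday", "schools", "closed"]
probs_by_time = {
    "Monday": [0.1, 0.0, 0.2, 0.9, 0.8, 0.0, 0.1, 0.2],
    "Tuesday": [0.0, 0.0, 0.0, 0.1, 0.2, 0.0, 0.9, 0.7],
}
events = select_event_info(segments, probs_by_time)
```

For Chinese text, where word segments are not separated by spaces, `joiner` would be set to an empty string.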
Referring to FIG. 6, in a specific embodiment, each second time information corresponds to at least one event information; S103, determining from the multiple second time event pairs the target event information corresponding to the multiple second time information, further includes the following sub-steps S601 to S603, detailed as follows:
S601. If, among the multiple second time information, any second time information corresponds to multiple event information, acquire the information source of each of the multiple event information.
In application, that each second time information corresponds to at least one event information means that one second time information may correspond to one event information, and that multiple second time information at the same time node may correspond to multiple event information. It can be understood that the event information corresponding to second time information at the same time node can be regarded as reports of the same event by the source sites (news source sites) of different event documents. On this basis, the information source of each event information can be obtained from the multiple event information at the same time node.
S602. Acquire, according to the priorities of the information sources, the target event information with the highest priority from the multiple event information.
In application, the priorities of the information sources can be preset in the terminal device. An information source with a high priority is one whose event documents are considered to record more authentic and authoritative content. When the terminal device obtains an event document from the network, it can also obtain the information source of that document. The target event information with the highest priority can therefore be selected from the multiple event information according to the information source of each event document. Information sources include, but are not limited to, event documents (news) published by official sites and event documents published by unofficial sites.
S603. If, among the multiple second time information, no second time information corresponds to multiple event information, determine the event information corresponding to each second time information as the target event information.
In application, when a piece of second time information is unique, that is, when the time nodes of all other second time information differ from its time node, the event information corresponding to that second time information is determined to be the target event information.
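Sub-steps S601 to S603 can be sketched as a group-then-select routine. The source labels and the numeric values in `SOURCE_PRIORITY` are hypothetical; the text only says that priorities are preset on the terminal device and that official and unofficial sites are possible sources.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical preset priorities; a larger number means a more authoritative source.
SOURCE_PRIORITY: Dict[str, int] = {"official": 2, "unofficial": 1}


def select_target_events(pairs: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """Pick one event per unified time.

    Each pair is (second time information, event information, information source).
    When several events share a time node, the one from the highest-priority
    source wins (S601, S602); a time node with a single event keeps it (S603).
    """
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for time_key, event, source in pairs:
        grouped[time_key].append((event, source))

    targets: Dict[str, str] = {}
    for time_key, candidates in grouped.items():
        event, _source = max(candidates,
                             key=lambda c: SOURCE_PRIORITY.get(c[1], 0))
        targets[time_key] = event
    return targets


pairs = [
    ("2020-11-05", "report A", "unofficial"),
    ("2020-11-05", "report B", "official"),
    ("2020-11-06", "report C", "unofficial"),
]
targets = select_target_events(pairs)
```

Sources missing from the priority table default to priority 0 in this sketch, so they lose to any listed source.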
In an embodiment, after sorting the target event information according to the corresponding second time information to generate the event context, the method further includes:
Uploading the event context to a blockchain.
Specifically, in all embodiments of this application, the corresponding event context is obtained through processing by the terminal device. Uploading the event context to a blockchain helps ensure its security and its fairness and transparency to users. User equipment can download the event context from the blockchain to verify whether it has been tampered with. The blockchain referred to in this example is a new application model of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
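The tamper check mentioned above can be illustrated with a plain digest comparison. This sketch does not model any blockchain platform or upload API, none of which the text specifies; it only assumes that a SHA-256 digest of a canonical JSON encoding of the event context was recorded at upload time. The function names are hypothetical.

```python
import hashlib
import json
from typing import List


def context_digest(event_context: List[List[str]]) -> str:
    """SHA-256 digest of a canonical JSON encoding of the event context."""
    canonical = json.dumps(event_context, ensure_ascii=False,
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_untampered(downloaded: List[List[str]], recorded_digest: str) -> bool:
    """Compare the digest of a downloaded copy with the recorded digest."""
    return context_digest(downloaded) == recorded_digest


# Hypothetical event context: [time, event] entries in chronological order.
original = [["2020-11-05", "report B"], ["2020-11-06", "report C"]]
recorded = context_digest(original)
tampered = [["2020-11-05", "report B"], ["2020-11-06", "report X"]]
```

Any change to an entry or to the order of entries yields a different digest, so the comparison fails for a modified copy.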
Please refer to FIG. 7, which is a structural block diagram of an event context generating apparatus provided by an embodiment of this application. The units included in the terminal device in this embodiment are used to execute the steps in the embodiments corresponding to FIG. 1 to FIG. 6. For details, refer to FIG. 1 to FIG. 6 and the related descriptions in the corresponding embodiments. For ease of description, only the parts related to this embodiment are shown. Referring to FIG. 7, the event context generating apparatus 700 includes an acquiring module 710, a processing module 720, a determining module 730, and a generating module 740, wherein:
The acquiring module 710 is configured to acquire first time information and event information from each of multiple event documents, to obtain multiple first time event pairs corresponding to the multiple event documents.
The processing module 720 is configured to unify the time expressions of the multiple first time information in the multiple first time event pairs to obtain multiple unified second time information, and to replace the first time information of each first time event pair with the corresponding unified second time information, to obtain multiple second time event pairs.
The determining module 730 is configured to determine, from the multiple second time event pairs, target event information corresponding to the multiple second time information.
The generating module 740 is configured to sort the target event information according to the second time information corresponding to the target event information, to generate an event context.
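The work of the processing module and the generating module can be sketched as a small routine that unifies time expressions and then sorts. The three formats in `TIME_FORMATS` are hypothetical examples of differing time expressions, and the selection performed by the determining module is omitted, so each input pair is assumed to be a target event already.

```python
from datetime import date, datetime
from typing import List, Tuple

# Hypothetical time expressions the documents might use.
TIME_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y年%m月%d日")


def unify_time(expr: str) -> date:
    """Convert a first time information string into a unified date."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(expr, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized time expression: {expr!r}")


def build_event_context(pairs: List[Tuple[str, str]]) -> List[Tuple[date, str]]:
    """Replace each time expression with its unified form, then sort by time."""
    unified = [(unify_time(time_expr), event) for time_expr, event in pairs]
    return sorted(unified, key=lambda pair: pair[0])


context = build_event_context([
    ("2020/11/6", "report C"),
    ("2020年11月4日", "report A"),
    ("2020-11-05", "report B"),
])
```

Sorting on the unified date alone keeps the original relative order of events that share a date, because Python's sort is stable.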
In an embodiment, the first time information includes multiple time expressions, and the acquiring module 710 is further configured to:
query, according to the multiple time expressions, the multiple event documents for multiple first time information that matches any of the time expressions; and input the multiple first time information and the corresponding event documents into a sequence labeling model, and determine the event information paired with each first time information, to obtain the multiple first time event pairs.
In an embodiment, the acquiring module 710 is further configured to:
acquire each first time information in each event document, and determine one or more first document positions of each first time information in the corresponding event document; perform word segmentation on each event document to obtain multiple word segments in each event document; determine, in each event document, the second document positions of the multiple word segments in the corresponding event document; and determine, according to the one or more first document positions and the multiple second document positions, a target word segment paired with the first time information from the multiple word segments, and generate the event information corresponding to each first time information in each event document.
In an embodiment, the acquiring module 710 is further configured to:
calculate, in each event document, the interval distance between the second document position of each word segment and the first document position; calculate, according to the interval distance, the classification probability that each word segment is paired with the first time information; and determine, according to the classification probability of each word segment, a target word segment paired with the first time information from the multiple word segments, and generate the event information corresponding to each first time information in each event document.
In an embodiment, the multiple word segments include a first word segment and a second word segment, and the acquiring module 710 is further configured to:
extract, in each event document, a first feature of the first word segment, acquire the second word segment immediately preceding the first word segment, and determine the pairing category between the second word segment and the event information in the corresponding event document; calculate, according to the first feature, a first probability that the first word segment belongs to the event information; calculate, according to the pairing category, a second probability that the first word segment belongs to the event information; and calculate, according to the interval distance between the first word segment and the first time information, the first probability, and the second probability, the classification probability that the first word segment is paired with the first time information.
In an embodiment, each second time information corresponds to at least one event information, and the determining module 730 is further configured to:
if, among the multiple second time information, any second time information corresponds to multiple event information, acquire the information source of each of the multiple event information; acquire, according to the priorities of the information sources, the target event information with the highest priority from the multiple event information; and if, among the multiple second time information, no second time information corresponds to multiple event information, determine the event information corresponding to each second time information as the target event information.
In an embodiment, the event context generating apparatus 700 further includes:
an uploading module 710, configured to upload the event context to a blockchain.
It should be understood that, in the structural block diagram of the event context generating apparatus shown in FIG. 7, the units and modules are used to execute the steps in the embodiments corresponding to FIG. 1 to FIG. 6. Those steps have been explained in detail in the embodiments above; for details, refer to FIG. 1 to FIG. 6 and the related descriptions in the corresponding embodiments, which are not repeated here.
FIG. 8 is a structural block diagram of a terminal device provided by another embodiment of this application. As shown in FIG. 8, the terminal device 800 of this embodiment includes a processor 801, a memory 802, and a computer program 803 stored in the memory 802 and executable on the processor 801, for example a program implementing the event context generation method. When executing the computer program 803, the processor 801 implements the steps of the event context generation method embodiments above, for example S101 to S104 shown in FIG. 1. Alternatively, when executing the computer program 803, the processor 801 implements the functions of the modules in the embodiment corresponding to FIG. 7, for example the functions of modules 710 to 740 shown in FIG. 7. Details are as follows:
A terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements:
acquiring first time information and event information from each of multiple event documents, to obtain multiple first time event pairs corresponding to the multiple event documents;
unifying the time expressions of the multiple first time information in the multiple first time event pairs to obtain multiple unified second time information, and replacing the first time information of each first time event pair with the corresponding unified second time information, to obtain multiple second time event pairs;
determining, from the multiple second time event pairs, target event information corresponding to the multiple second time information;
sorting the target event information according to the second time information corresponding to the target event information, to generate an event context.
In one embodiment, the first time information includes multiple time expressions, and the processor, when executing the computer program, further implements:
querying, according to the multiple time expressions, the multiple event documents for multiple first time information that matches any of the time expressions;
inputting the multiple first time information and the corresponding event documents into a sequence labeling model, and determining the event information paired with each first time information, to obtain the multiple first time event pairs.
In one embodiment, the processor, when executing the computer program, further implements:
acquiring each first time information in each event document, and determining one or more first document positions of each first time information in the corresponding event document;
performing word segmentation on each event document to obtain multiple word segments in each event document;
determining, in each event document, the second document positions of the multiple word segments in the corresponding event document;
determining, according to the one or more first document positions and the multiple second document positions, a target word segment paired with the first time information from the multiple word segments, and generating the event information corresponding to each first time information in each event document.
In one embodiment, the processor, when executing the computer program, further implements:
calculating, in each event document, the interval distance between the second document position of each word segment and the first document position;
calculating, according to the interval distance, the classification probability that each word segment is paired with the first time information;
determining, according to the classification probability of each word segment, a target word segment paired with the first time information from the multiple word segments, and generating the event information corresponding to each first time information in each event document.
In one embodiment, the multiple word segments include a first word segment and a second word segment, and the processor, when executing the computer program, further implements:
extracting, in each event document, a first feature of the first word segment, acquiring the second word segment immediately preceding the first word segment, and determining the pairing category between the second word segment and the event information in the corresponding event document;
calculating, according to the first feature, a first probability that the first word segment belongs to the event information;
calculating, according to the pairing category, a second probability that the first word segment belongs to the event information;
calculating, according to the interval distance between the first word segment and the first time information, the first probability, and the second probability, the classification probability that the first word segment is paired with the first time information.
In one embodiment, each second time information corresponds to at least one event information, and the processor, when executing the computer program, further implements:
if, among the multiple second time information, any second time information corresponds to multiple event information, acquiring the information source of each of the multiple event information;
acquiring, according to the priorities of the information sources, the target event information with the highest priority from the multiple event information;
if, among the multiple second time information, no second time information corresponds to multiple event information, determining the event information corresponding to each second time information as the target event information.
In one embodiment, the processor, when executing the computer program, further implements:
uploading the event context to a blockchain.
一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行时实现:A computer-readable storage medium, the computer-readable storage medium stores a computer program, and the computer program is implemented when executed by a processor:
分别获取多个事件文档中的第一时间信息以及事件信息,得到与所述多个事件文档中对应的多个第一时间事件对;respectively acquiring first time information and event information in multiple event documents, and obtaining multiple first time event pairs corresponding to the multiple event documents;
统一所述多个第一时间事件对中多个第一时间信息的时间表达方式,得到统一后的多个第二时间信息,并将所述统一后的多个第二时间信息分别对应替换所述多个第一时间事件对的第一时间信息,得到多个第二时间事件对;Unify the time expressions of the multiple first time information in the multiple first time event pairs, obtain multiple unified second time information, and replace the unified multiple second time information correspondingly with each other. Describe the first time information of the plurality of first time event pairs to obtain a plurality of second time event pairs;
从所述多个第二时间事件对中,确定与所述多个第二时间信息对应的目标事件信息;From the plurality of second time event pairs, determine target event information corresponding to the plurality of second time information;
根据所述目标事件信息对应的第二时间信息,对所述目标事件信息进行排序生成事件脉络。According to the second time information corresponding to the target event information, the target event information is sorted to generate an event context.
在一个实施例中,所述第一时间信息包括多种时间表达方式,所述计算机程序被处理器执行时还实现:In one embodiment, the first time information includes multiple time expressions, and when the computer program is executed by the processor, the computer program further implements:
根据所述多种时间表述方式,查询所述多个事件文档中符合任一时间表达方式的多个第一时间信息;According to the multiple time expression manners, query the multiple first time information in the multiple event documents that conform to any time expression manner;
将所述多个第一时间信息与对应的所述多个事件文档分别输入至序列标注模型中,确定与所述多个第一时间信息分别相配对的事件信息,得到所述多个第一时间事件对。Inputting the plurality of first time information and the corresponding plurality of event documents into the sequence annotation model respectively, determining the event information respectively matched with the plurality of first time information, and obtaining the plurality of first time information time event pair.
In one embodiment, the computer program, when executed by the processor, further implements:
acquiring each piece of first time information in each event document, and determining one or more first document positions of each piece of first time information in the corresponding event document;
performing word segmentation on each event document to obtain a plurality of word segments in each event document;
determining, for each event document, a plurality of second document positions of the plurality of word segments in the corresponding event document;
determining, from the plurality of word segments and according to the one or more first document positions and the plurality of second document positions, a target word segment paired with the first time information, and generating the event information corresponding to each piece of first time information in each event document.
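As a non-limiting illustration, the two kinds of document positions can be computed as character offsets. Whitespace splitting is used here only as a stand-in for a real word segmenter, which the application does not name; the function names are likewise assumptions made for this sketch.

```python
import re


def first_document_positions(document, time_text):
    """Offsets of every occurrence of one piece of first time information."""
    return [m.start() for m in re.finditer(re.escape(time_text), document)]


def second_document_positions(document):
    """(segment, offset) for each word segment.

    Whitespace tokenization is an assumed placeholder for a word segmenter.
    """
    return [(m.group(0), m.start()) for m in re.finditer(r"\S+", document)]


document = "2020-11-06 merger announced, 2020-11-06 shares rose"
time_positions = first_document_positions(document, "2020-11-06")
segment_positions = second_document_positions(document)
```

A piece of time information that occurs twice yields two first document positions, which is why the embodiment speaks of "one or more" positions.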
In one embodiment, the computer program, when executed by the processor, further implements:
calculating, in each event document, the distance between the second document position of each word segment and the first document position;
calculating, according to the distance, a classification probability that each word segment is paired with the first time information;
determining, from the plurality of word segments and according to the classification probability of each word segment, the target word segment paired with the first time information, and generating the event information corresponding to each piece of first time information in each event document.
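As a non-limiting illustration, one way to turn distances into classification probabilities is to weight each word segment by the inverse of its distance and normalize. This weighting is an assumption made for the sketch; the application states only that the probability is calculated according to the distance and does not give the formula.

```python
def pairing_probabilities(segment_positions, time_position):
    """Assumed scheme: weight 1 / (1 + distance), normalized to sum to 1."""
    distances = [abs(pos - time_position) for pos in segment_positions]
    weights = [1.0 / (1.0 + d) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


def pick_target_segment(segments, segment_positions, time_position):
    """Return the word segment with the highest classification probability."""
    probs = pairing_probabilities(segment_positions, time_position)
    best = max(range(len(segments)), key=probs.__getitem__)
    return segments[best]


segments = ["merger", "announced", "lawsuit"]
positions = [0, 2, 9]          # second document positions (hypothetical)
target = pick_target_segment(segments, positions, time_position=3)
```

Under this assumed weighting, the segment closest to the time information always receives the highest probability.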
In one embodiment, the plurality of word segments include a first word segment and a second word segment, and the computer program, when executed by the processor, further implements:
extracting, in each event document, a first feature of the first word segment, acquiring the second word segment that immediately precedes the first word segment, and determining a pairing category between the second word segment and the event information in the corresponding event document;
calculating, according to the first feature, a first probability that the first word segment belongs to the event information;
calculating, according to the pairing category, a second probability that the first word segment belongs to the event information;
calculating, according to the distance between the first word segment and the first time information, the first probability, and the second probability, the classification probability that the first word segment is paired with the first time information.
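As a non-limiting illustration, the three inputs can be combined as a product. The product form, the inverse-distance weight, the category names, and the table values below are all assumptions made for this sketch; the application does not state how the distance, the first probability, and the second probability are combined.

```python
# Hypothetical probabilities that a segment belongs to the event information,
# given the pairing category of the immediately preceding segment.
SECOND_PROBABILITY = {"event": 0.7, "other": 0.3}


def second_probability(previous_category):
    """Look up the second probability from the preceding segment's category."""
    return SECOND_PROBABILITY[previous_category]


def classification_probability(distance, first_prob, second_prob):
    """Assumed combination: inverse-distance weight times both probabilities."""
    distance_weight = 1.0 / (1.0 + distance)
    return distance_weight * first_prob * second_prob


# A segment one position away from the time information, whose own features
# give a first probability of 0.8 and whose predecessor was labeled "event".
score = classification_probability(1, 0.8, second_probability("event"))
```

Conditioning on the category of the preceding segment plays a role similar to a transition term in sequence labeling, which lets adjacent segments of one event description reinforce each other.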
In one embodiment, each piece of second time information corresponds to at least one piece of event information, and the computer program, when executed by the processor, further implements:
if any piece of second time information among the plurality of pieces of second time information corresponds to multiple pieces of event information, acquiring the information source of each of the multiple pieces of event information;
acquiring, according to the priorities of the information sources, the target event information with the highest priority from the multiple pieces of event information;
if no piece of second time information among the plurality of pieces of second time information corresponds to multiple pieces of event information, determining the event information corresponding to each piece of second time information as the target event information.
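As a non-limiting illustration, target event selection can be sketched by grouping event information by second time information and keeping the candidate whose source has the highest priority. The source names and priority values are hypothetical; the application does not define them.

```python
from collections import defaultdict

# Hypothetical source priorities (higher wins).
SOURCE_PRIORITY = {"official": 3, "news": 2, "social": 1}


def select_target_events(second_pairs):
    """second_pairs: iterable of (second_time, event_info, source).

    Returns {second_time: target_event_info}.
    """
    grouped = defaultdict(list)
    for second_time, event_info, source in second_pairs:
        grouped[second_time].append((event_info, source))

    targets = {}
    for second_time, candidates in grouped.items():
        if len(candidates) == 1:
            # Only one piece of event information: it is the target.
            targets[second_time] = candidates[0][0]
        else:
            # Several pieces: keep the one from the highest-priority source.
            best = max(candidates, key=lambda c: SOURCE_PRIORITY.get(c[1], 0))
            targets[second_time] = best[0]
    return targets


pairs = [
    ("2020-11-06", "rumored filing", "social"),
    ("2020-11-06", "application filed", "official"),
    ("2021-04-29", "PCT application filed", "news"),
]
targets = select_target_events(pairs)
```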
In one embodiment, the computer program, when executed by the processor, further implements:
uploading the event context to a blockchain.
For example, the computer program 803 may be divided into one or more units, which are stored in the memory 802 and executed by the processor 801 to carry out the present application. The one or more units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing the execution of the computer program 803 in the terminal device 800. For example, the computer program 803 may be divided into an acquisition module, a processing module, a determination module, and a generation module, whose specific functions are as described above.
The terminal device may include, but is not limited to, the processor 801 and the memory 802. Those skilled in the art will understand that FIG. 8 is merely an example of the terminal device 800 and does not limit it; the terminal device may include more or fewer components than shown, combine certain components, or use different components. For example, the terminal device may also include input/output devices, network access devices, a bus, and the like.
The processor 801 may be a central processing unit, or another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 802 may be an internal storage unit of the terminal device 800, such as a hard disk or memory of the terminal device 800. The memory 802 may also be an external storage device of the terminal device 800, such as a plug-in hard disk, a smart memory card, or a flash memory card provided on the terminal device 800. Further, the memory 802 may include both an internal storage unit and an external storage device of the terminal device 800.
The computer-readable storage medium may be an internal storage unit of the terminal device described in the foregoing embodiments, such as a hard disk or memory of the terminal device. The computer-readable storage medium may be non-volatile or volatile. The computer-readable storage medium may also be an external storage device of the terminal device, such as a plug-in hard disk, a smart memory card, a secure digital card, or a flash memory card provided on the terminal device.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the scope of protection of the present application.

Claims (20)

  1. An event context generation method, wherein the method comprises:
    acquiring first time information and event information from each of a plurality of event documents, to obtain a plurality of first time-event pairs corresponding to the plurality of event documents;
    unifying the time expressions of the plurality of pieces of first time information in the plurality of first time-event pairs to obtain a plurality of pieces of unified second time information, and replacing the first time information in each of the plurality of first time-event pairs with the corresponding second time information, to obtain a plurality of second time-event pairs;
    determining, from the plurality of second time-event pairs, target event information corresponding to the plurality of pieces of second time information;
    sorting the target event information according to the second time information corresponding to the target event information, to generate an event context.
  2. The event context generation method according to claim 1, wherein the first time information includes multiple time expressions, and the acquiring first time information and event information from each of a plurality of event documents, to obtain a plurality of first time-event pairs corresponding to the plurality of event documents, comprises:
    querying, according to the multiple time expressions, the plurality of event documents for a plurality of pieces of first time information that conform to any one of the time expressions;
    inputting the plurality of pieces of first time information and the corresponding event documents into a sequence labeling model, determining the event information paired with each piece of first time information, and obtaining the plurality of first time-event pairs.
  3. The event context generation method according to claim 1 or 2, wherein the acquiring first time information and event information from each of a plurality of event documents comprises:
    acquiring each piece of first time information in each event document, and determining one or more first document positions of each piece of first time information in the corresponding event document;
    performing word segmentation on each event document to obtain a plurality of word segments in each event document;
    determining, for each event document, a plurality of second document positions of the plurality of word segments in the corresponding event document;
    determining, from the plurality of word segments and according to the one or more first document positions and the plurality of second document positions, a target word segment paired with the first time information, and generating the event information corresponding to each piece of first time information in each event document.
  4. The event context generation method according to claim 3, wherein the determining, from the plurality of word segments and according to the one or more first document positions and the plurality of second document positions, a target word segment paired with the first time information, and generating the event information corresponding to each piece of first time information in each event document, comprises:
    calculating, in each event document, the distance between the second document position of each word segment and the first document position;
    calculating, according to the distance, a classification probability that each word segment is paired with the first time information;
    determining, from the plurality of word segments and according to the classification probability of each word segment, the target word segment paired with the first time information, and generating the event information corresponding to each piece of first time information in each event document.
  5. The event context generation method according to claim 4, wherein the plurality of word segments include a first word segment and a second word segment;
    the calculating, according to the distance, a classification probability that each word segment is paired with the first time information comprises:
    extracting, in each event document, a first feature of the first word segment, acquiring the second word segment that immediately precedes the first word segment, and determining a pairing category between the second word segment and the event information in the corresponding event document;
    calculating, according to the first feature, a first probability that the first word segment belongs to the event information;
    calculating, according to the pairing category, a second probability that the first word segment belongs to the event information;
    calculating, according to the distance between the first word segment and the first time information, the first probability, and the second probability, the classification probability that the first word segment is paired with the first time information.
  6. The event context generation method according to any one of claims 1-2 or 4-5, wherein each piece of second time information corresponds to at least one piece of event information;
    the determining, from the plurality of second time-event pairs, target event information corresponding to the plurality of pieces of second time information comprises:
    if any piece of second time information among the plurality of pieces of second time information corresponds to multiple pieces of event information, acquiring the information source of each of the multiple pieces of event information;
    acquiring, according to the priorities of the information sources, the target event information with the highest priority from the multiple pieces of event information;
    if no piece of second time information among the plurality of pieces of second time information corresponds to multiple pieces of event information, determining the event information corresponding to each piece of second time information as the target event information.
  7. The event context generation method according to any one of claims 1-2 or 4-5, wherein, after the sorting the target event information according to the second time information corresponding to the target event information to generate an event context, the method further comprises:
    uploading the event context to a blockchain.
  8. An event context generation apparatus, wherein the apparatus comprises:
    an acquisition module, configured to acquire first time information and event information from each of a plurality of event documents, to obtain a plurality of first time-event pairs corresponding to the plurality of event documents;
    a processing module, configured to unify the time expressions of the plurality of pieces of first time information in the plurality of first time-event pairs to obtain a plurality of pieces of unified second time information, and to replace the first time information in each of the plurality of first time-event pairs with the corresponding second time information, to obtain a plurality of second time-event pairs;
    a determination module, configured to determine, from the plurality of second time-event pairs, target event information corresponding to the plurality of pieces of second time information;
    a generation module, configured to sort the target event information according to the second time information corresponding to the target event information, to generate an event context.
  9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements:
    acquiring first time information and event information from each of a plurality of event documents, to obtain a plurality of first time-event pairs corresponding to the plurality of event documents;
    unifying the time expressions of the plurality of pieces of first time information in the plurality of first time-event pairs to obtain a plurality of pieces of unified second time information, and replacing the first time information in each of the plurality of first time-event pairs with the corresponding second time information, to obtain a plurality of second time-event pairs;
    determining, from the plurality of second time-event pairs, target event information corresponding to the plurality of pieces of second time information;
    sorting the target event information according to the second time information corresponding to the target event information, to generate an event context.
  10. The terminal device according to claim 9, wherein the first time information includes multiple time expressions, and the processor, when executing the computer program, further implements:
    querying, according to the multiple time expressions, the plurality of event documents for a plurality of pieces of first time information that conform to any one of the time expressions;
    inputting the plurality of pieces of first time information and the corresponding event documents into a sequence labeling model, determining the event information paired with each piece of first time information, and obtaining the plurality of first time-event pairs.
  11. The terminal device according to claim 9 or 10, wherein the processor, when executing the computer program, further implements:
    acquiring each piece of first time information in each event document, and determining one or more first document positions of each piece of first time information in the corresponding event document;
    performing word segmentation on each event document to obtain a plurality of word segments in each event document;
    determining, for each event document, a plurality of second document positions of the plurality of word segments in the corresponding event document;
    determining, from the plurality of word segments and according to the one or more first document positions and the plurality of second document positions, a target word segment paired with the first time information, and generating the event information corresponding to each piece of first time information in each event document.
  12. The terminal device according to claim 11, wherein the processor, when executing the computer program, further implements:
    calculating, in each event document, the distance between the second document position of each word segment and the first document position;
    calculating, according to the distance, a classification probability that each word segment is paired with the first time information;
    determining, from the plurality of word segments and according to the classification probability of each word segment, the target word segment paired with the first time information, and generating the event information corresponding to each piece of first time information in each event document.
  13. The terminal device according to claim 12, wherein the plurality of word segments include a first word segment and a second word segment, and the processor, when executing the computer program, further implements:
    extracting, in each event document, a first feature of the first word segment, acquiring the second word segment that immediately precedes the first word segment, and determining a pairing category between the second word segment and the event information in the corresponding event document;
    calculating, according to the first feature, a first probability that the first word segment belongs to the event information;
    calculating, according to the pairing category, a second probability that the first word segment belongs to the event information;
    calculating, according to the distance between the first word segment and the first time information, the first probability, and the second probability, the classification probability that the first word segment is paired with the first time information.
  14. The terminal device according to any one of claims 9-10 or 11-12, wherein each piece of second time information corresponds to at least one piece of event information; the plurality of word segments include a first word segment and a second word segment; and the processor, when executing the computer program, further implements:
    if any piece of second time information among the plurality of pieces of second time information corresponds to multiple pieces of event information, acquiring the information source of each of the multiple pieces of event information;
    acquiring, according to the priorities of the information sources, the target event information with the highest priority from the multiple pieces of event information;
    if no piece of second time information among the plurality of pieces of second time information corresponds to multiple pieces of event information, determining the event information corresponding to each piece of second time information as the target event information.
  15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements:
    acquiring first time information and event information from each of a plurality of event documents, to obtain a plurality of first time-event pairs corresponding to the plurality of event documents;
    unifying the time expressions of the plurality of pieces of first time information in the plurality of first time-event pairs to obtain a plurality of pieces of unified second time information, and replacing the first time information in each of the plurality of first time-event pairs with the corresponding second time information, to obtain a plurality of second time-event pairs;
    determining, from the plurality of second time-event pairs, target event information corresponding to the plurality of pieces of second time information;
    sorting the target event information according to the second time information corresponding to the target event information, to generate an event context.
  16. The computer-readable storage medium according to claim 15, wherein the first time information includes multiple time expressions, and the computer program, when executed by the processor, further implements:
    querying, according to the multiple time expressions, the plurality of event documents for a plurality of pieces of first time information that conform to any one of the time expressions;
    inputting the plurality of pieces of first time information and the corresponding event documents into a sequence labeling model, determining the event information paired with each piece of first time information, and obtaining the plurality of first time-event pairs.
  17. The computer-readable storage medium according to claim 15 or 16, wherein the computer program, when executed by the processor, further implements:
    acquiring each piece of first time information in each event document, and determining one or more first document positions of each piece of first time information in the corresponding event document;
    performing word segmentation on each event document to obtain a plurality of word segments in each event document;
    determining, for each event document, a plurality of second document positions of the plurality of word segments in the corresponding event document;
    determining, from the plurality of word segments and according to the one or more first document positions and the plurality of second document positions, a target word segment paired with the first time information, and generating the event information corresponding to each piece of first time information in each event document.
  18. The computer-readable storage medium according to claim 17, wherein the computer program, when executed by the processor, further implements:
    calculating, in each event document, the distance between the second document position of each word segment and the first document position;
    calculating, according to the distance, a classification probability that each word segment is paired with the first time information;
    determining, from the plurality of word segments and according to the classification probability of each word segment, the target word segment paired with the first time information, and generating the event information corresponding to each piece of first time information in each event document.
  19. The computer-readable storage medium according to claim 18, wherein the plurality of word segments include a first word segment and a second word segment, and the computer program, when executed by the processor, further implements:
    extracting, in each event document, a first feature of the first word segment, acquiring the second word segment that immediately precedes the first word segment, and determining a pairing category between the second word segment and the event information in the corresponding event document;
    calculating, according to the first feature, a first probability that the first word segment belongs to the event information;
    calculating, according to the pairing category, a second probability that the first word segment belongs to the event information;
    calculating, according to the distance between the first word segment and the first time information, the first probability, and the second probability, the classification probability that the first word segment is paired with the first time information.
  20. The computer-readable storage medium according to any one of claims 15-16 or 18-19, wherein each piece of second time information corresponds to at least one piece of event information, and the computer program, when executed by the processor, further implements:
    if any piece of second time information among the plurality of pieces of second time information corresponds to multiple pieces of event information, acquiring the information source of each of the multiple pieces of event information;
    acquiring, according to the priorities of the information sources, the target event information with the highest priority from the multiple pieces of event information;
    if no piece of second time information among the plurality of pieces of second time information corresponds to multiple pieces of event information, determining the event information corresponding to each piece of second time information as the target event information.
PCT/CN2021/091095 2020-11-06 2021-04-29 Event context generation method and apparatus, and terminal device and storage medium WO2022095375A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011229516.5 2020-11-06
CN202011229516.5A CN112328747B (en) 2020-11-06 2020-11-06 Event context generation method, device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022095375A1 (en) 2022-05-12

Family

ID=74317076

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/091095 WO2022095375A1 (en) 2020-11-06 2021-04-29 Event context generation method and apparatus, and terminal device and storage medium

Country Status (2)

Country Link
CN (1) CN112328747B (en)
WO (1) WO2022095375A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115033668A (en) * 2022-08-12 2022-09-09 清华大学 Story venation construction method and device, electronic equipment and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328747B (en) * 2020-11-06 2024-05-24 平安科技(深圳)有限公司 Event context generation method, device, terminal equipment and storage medium
CN113553407B (en) * 2021-06-18 2022-09-27 北京百度网讯科技有限公司 Event tracing method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105243092A (en) * 2015-09-11 2016-01-13 天津海量信息技术有限公司 Internet based event occurrence time collecting method
US20160364488A1 (en) * 2015-06-12 2016-12-15 Baidu Online Network Technology (Beijing) Co., Ltd Microblog-based event context acquiring method and system
CN106844466A (en) * 2016-12-21 2017-06-13 百度在线网络技术(北京)有限公司 Event train of thought generation method and device
CN110309256A (en) * 2018-03-09 2019-10-08 北京国双科技有限公司 The acquisition methods and device of event data in a kind of text
CN112328747A (en) * 2020-11-06 2021-02-05 平安科技(深圳)有限公司 Event context generation method and device, terminal equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201025035A (en) * 2008-12-18 2010-07-01 Univ Nat Taiwan Analysis algorithm of time series word summary and story plot evolution
CN108170838B (en) * 2018-01-12 2022-07-08 平安科技(深圳)有限公司 Topic evolution visualization display method, application server and computer readable storage medium
CN110555108B (en) * 2018-05-31 2022-03-15 北京百度网讯科技有限公司 Event context generation method, device, equipment and storage medium
CN109582796A (en) * 2018-12-05 2019-04-05 深圳前海微众银行股份有限公司 Generation method, device, equipment and the storage medium of enterprise's public sentiment event network

Also Published As

Publication number Publication date
CN112328747B (en) 2024-05-24
CN112328747A (en) 2021-02-05

Similar Documents

Publication Publication Date Title
WO2022095375A1 (en) Event context generation method and apparatus, and terminal device and storage medium
WO2022105122A1 (en) Answer generation method and apparatus based on artificial intelligence, and computer device and medium
US11093854B2 (en) Emoji recommendation method and device thereof
US20220405592A1 (en) Multi-feature log anomaly detection method and system based on log full semantics
CN108376151B (en) Question classification method and device, computer equipment and storage medium
US10891699B2 (en) System and method in support of digital document analysis
US11055327B2 (en) Unstructured data parsing for structured information
CN112069321B (en) Method, electronic device and storage medium for text hierarchical classification
CN108959418A (en) Character relation extraction method and device, computer device and computer readable storage medium
CN111324771B (en) Video tag determination method and device, electronic equipment and storage medium
WO2020259280A1 (en) Log management method and apparatus, network device and readable storage medium
CN110162771A (en) The recognition methods of event trigger word, device, electronic equipment
US20220207483A1 (en) Automatic document classification
CN111177375A (en) Electronic document classification method and device
CN115086182B (en) Mail recognition model optimization method and device, electronic equipment and storage medium
CN114372475A (en) Network public opinion emotion analysis method and system based on RoBERTA model
CN114722832A (en) Abstract extraction method, device, equipment and storage medium
CN114757178A (en) Core product word extraction method, device, equipment and medium
CN112668325B (en) Machine translation enhancement method, system, terminal and storage medium
CN117669513A (en) Data management system and method based on artificial intelligence
WO2021004118A1 (en) Correlation value determination method and apparatus
CN115544213A (en) Method, device and storage medium for acquiring information in text
CN110941713A (en) Self-optimization financial information plate classification method based on topic model
CN115455416A (en) Malicious code detection method and device, electronic equipment and storage medium
Chowdhury et al. Detection of compatibility, proximity and expectancy of Bengali sentences using long short term memory

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21888079

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21888079

Country of ref document: EP

Kind code of ref document: A1