WO2023040516A1

WO2023040516A1 - Event integration method and apparatus, and electronic device, computer-readable storage medium and computer program product

Info

Publication number: WO2023040516A1
Application number: PCT/CN2022/111164
Authority: WO
Inventors: 房育勋; 朱斌; 刘晨
Original assignee: 腾讯科技（深圳）有限公司
Priority date: 2021-09-18
Filing date: 2022-08-09
Publication date: 2023-03-23
Also published as: CN115840796A

Abstract

An event integration method and apparatus, and an electronic device, a computer-readable storage medium and a computer program product, which are applied to various event integration scenarios such as cloud technology, artificial intelligence, smart transportation and vehicle mounting. The event integration method comprises: acquiring an event to be integrated and at least two topics to be integrated, wherein each topic to be integrated comprises at least one topic event; on the basis of at least one of a semantic similarity, a character string graph similarity and a question-answer similarity, determining a target similarity between the event to be integrated and each topic to be integrated, wherein the semantic similarity refers to the similarity of semantic features, the character string graph similarity refers to the similarity of graph features which correspond to a key character string, and the question-answer similarity refers to the similarity of question-answer features; determining, from the at least two topics to be integrated and on the basis of the target similarity, a target topic to which the event to be integrated belongs; and integrating, into the target topic, the event to be integrated, so as to obtain event context, wherein the event context comprises the event to be integrated and at least one topic event.

Description

An event integration method, device, electronic device, computer-readable storage medium, and computer program product

Cross References to Related Applications

This application is based on a Chinese patent application with application number 202111111428.X and a filing date of September 18, 2021, and claims the priority of this Chinese patent application. The entire content of this Chinese patent application is hereby incorporated by reference into this application.

Technical Field

The present application relates to information processing technology in the field of computer application, and in particular to an event integration method, device, electronic equipment, computer-readable storage medium and computer program product.

Background

For a topic with a long duration (longer than a duration threshold), which is often composed of multiple events that have already occurred, each time a latest progress event is obtained, it needs to be integrated into the corresponding topic to form an event context that contains the latest progress event, so that users can intuitively follow how the events developed through the event context.

To integrate the latest progress event into a topic, clustering is usually used: the latest progress event and the topics are incrementally clustered, and the topic to which the latest progress event belongs is determined according to the cluster centers and a threshold. However, incremental clustering has a low accuracy rate, which lowers the accuracy of the determined topic and, in turn, the accuracy of event integration when the latest progress event is integrated into that topic.
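The incremental clustering baseline described above can be sketched as follows. This is a minimal illustration only: it assumes each event is already represented as a numeric vector, uses cosine similarity against a running-mean cluster center, and the names `cosine`, `assign_incrementally` and the default `threshold` value are hypothetical and do not come from the application.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_incrementally(event_vec, clusters, threshold=0.8):
    """Assign an event vector to the nearest cluster (topic) or open a new one.

    `clusters` is a list of dicts {"center": [...], "size": int}; it is
    updated in place. Returns the index of the cluster the event joined.
    The threshold value is an assumption chosen for illustration.
    """
    best_idx, best_sim = -1, -1.0
    for idx, cluster in enumerate(clusters):  # one comparison per existing topic
        sim = cosine(event_vec, cluster["center"])
        if sim > best_sim:
            best_idx, best_sim = idx, sim

    if best_sim >= threshold:
        cluster = clusters[best_idx]
        n = cluster["size"]
        # Update the cluster center as the running mean of its members.
        cluster["center"] = [(c * n + x) / (n + 1)
                             for c, x in zip(cluster["center"], event_vec)]
        cluster["size"] = n + 1
        return best_idx

    # No center is close enough: the event starts a new cluster.
    clusters.append({"center": list(event_vec), "size": 1})
    return len(clusters) - 1
```

In this sketch the decision depends only on the distance to a cluster center and a single threshold, which is the property the background identifies as the source of low accuracy.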

Summary

Embodiments of the present application provide an event integration method, device, electronic device, computer-readable storage medium, and computer program product, which can improve the accuracy of event integration.

The technical solutions of the embodiments of the present application are implemented as follows:

The embodiment of this application provides an event integration method, including:

Obtaining events to be integrated, and acquiring at least two topics to be integrated, wherein each topic to be integrated includes at least one topic event;

Determining, based on one or more of semantic similarity, string graph similarity and question-answer similarity, a target similarity between the event to be integrated and each topic to be integrated, wherein the semantic similarity refers to similarity in terms of semantic features, the string graph similarity refers to similarity in terms of the graph features corresponding to key character strings, and the question-answer similarity refers to similarity in terms of question-answer features;

Based on the target similarity, determining a target topic to which the event to be integrated belongs from at least two topics to be integrated;

Integrating the event to be integrated into the target topic to obtain an event context, wherein the event context includes the event to be integrated and at least one topic event.
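The four steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions: the target similarity is reduced to a single score per topic, the target topic is taken to be the highest-scoring topic that reaches a threshold, and `token_overlap` is a simple word-overlap stand-in rather than any of the three similarities named in the application. All function names, the `threshold` parameter and the data layout are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple


def token_overlap(event: str, topic_events: List[str]) -> float:
    """Stand-in similarity: Jaccard overlap between event words and topic words."""
    event_words = set(event.lower().split())
    topic_words = {w for e in topic_events for w in e.lower().split()}
    union = event_words | topic_words
    return len(event_words & topic_words) / len(union) if union else 0.0


def integrate_event(
    event: str,
    topics: Dict[str, List[str]],
    target_similarity: Callable[[str, List[str]], float],
    threshold: float = 0.5,
) -> Optional[Tuple[str, List[str]]]:
    """Return (target topic id, event context), or None if no topic qualifies."""
    if len(topics) < 2:
        raise ValueError("at least two topics to be integrated are expected")

    # Step 2: target similarity between the event and each topic.
    scores = {tid: target_similarity(event, evts) for tid, evts in topics.items()}

    # Step 3: pick the target topic (assumed rule: best score above threshold).
    target_id = max(scores, key=scores.get)
    if scores[target_id] < threshold:
        return None

    # Step 4: the event context holds the topic events plus the new event.
    return target_id, topics[target_id] + [event]
```

For example, with topics `{"flood": ["river flood hits city"], "match": ["team wins final match"]}` and the event `"flood rescue in city continues"`, the sketch selects the `"flood"` topic when called with `threshold=0.1`.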

An embodiment of the present application provides an event integration device, including:

An information acquisition module configured to acquire events to be integrated, and acquire at least two topics to be integrated, wherein each of the topics to be integrated includes at least one topic event;

A similarity acquisition module configured to determine, based on one or more of semantic similarity, string graph similarity and question-answer similarity, a target similarity between the event to be integrated and each topic to be integrated, wherein the semantic similarity refers to similarity in terms of semantic features, the string graph similarity refers to similarity in terms of the graph features corresponding to key character strings, and the question-answer similarity refers to similarity in terms of question-answer features;

A topic determination module configured to determine the target topic to which the event to be integrated belongs from at least two topics to be integrated based on the target similarity;

The event integration module is configured to integrate the event to be integrated into the target topic to obtain an event context, wherein the event context includes the event to be integrated and at least one topic event.

An embodiment of the present application provides an electronic device for event integration, including:

A memory configured to store executable instructions;

A processor configured to implement the event integration method provided in the embodiment of the present application when executing the executable instructions stored in the memory.

The embodiment of the present application provides a computer-readable storage medium, which stores executable instructions for causing a processor to execute the event integration method provided in the embodiment of the present application.

An embodiment of the present application provides a computer program product, including a computer program or an instruction, and when the computer program or instruction is executed by a processor, the event integration method provided in the embodiment of the present application is implemented.

The embodiments of the present application have at least the following beneficial effects: the target topic to which the event to be integrated belongs is determined, from among at least two topics to be integrated, by directly comparing the target similarity between the event to be integrated and each topic to be integrated. Because the target similarity includes one or more of semantic similarity, string graph similarity and question-answer similarity, whether each topic to be integrated is the target topic can be determined accurately from the obtained target similarity. Consequently, when the event to be integrated is integrated into the target topic, the accuracy of event integration is improved.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of an optional architecture of the event integration system provided by the embodiment of the present application;

FIG. 2 is a schematic diagram of an exemplary composition and structure of the server in FIG. 1 provided by the embodiment of the present application;

FIG. 3 is a first optional schematic flowchart of the event integration method provided by the embodiment of the present application;

FIG. 4a is a second optional schematic flowchart of the event integration method provided by the embodiment of the present application;

FIG. 4b is a third optional schematic flowchart of the event integration method provided by the embodiment of the present application;

FIG. 4c is a fourth optional schematic flowchart of the event integration method provided by the embodiment of the present application;

Fig. 4d is a schematic flow diagram of obtaining semantic statistical similarity provided by the embodiment of the present application;

Fig. 4e is a schematic flow diagram of the semantic statistical similarity model provided by the embodiment of the present application;

Fig. 4f is a schematic flow diagram of obtaining the similarity of character string graphs provided by the embodiment of the present application;

Fig. 4g is a schematic flow diagram of obtaining the question-and-answer similarity provided by the embodiment of the present application;

Fig. 4h is a schematic flow diagram of obtaining at least two topics to be integrated provided by the embodiment of the present application;

Fig. 4i is a first schematic flow diagram of obtaining a key character string of a topic event provided by the embodiment of the present application;

Fig. 4j is the second schematic flow diagram for obtaining the key character string of the topic event provided by the embodiment of the present application;

FIG. 5 is a fifth optional schematic flowchart of the event integration method provided by the embodiment of the present application;

FIG. 6 is a schematic diagram showing an exemplary event context provided by the embodiment of the present application;

Fig. 7 is a schematic presentation of another exemplary event context provided by the embodiment of the present application;

FIG. 8 is a sixth optional schematic flowchart of the event integration method provided by the embodiment of the present application;

Fig. 9 is a schematic diagram of an exemplary news topic recall provided by an embodiment of the present application;

Fig. 10 is an exemplary schematic diagram of determining whether a news topic is related to the latest progress event provided by the embodiment of the present application;

Fig. 11a is a schematic diagram of an exemplary model for obtaining vector semantic similarity provided by an embodiment of the present application;

Fig. 11b is a schematic diagram of another exemplary model for obtaining vector semantic similarity provided by the embodiment of the present application;

Fig. 12 is a schematic diagram of an exemplary keyword map provided by an embodiment of the present application;

Fig. 13 is a schematic diagram of another exemplary keyword map provided by the embodiment of the present application;

Fig. 14 is a schematic diagram of an exemplary acquisition of semantic similarity of questions and answers provided by an embodiment of the present application;

Fig. 15 is a schematic diagram of an exemplary feature importance provided by an embodiment of the present application.

Detailed Description

In order to make the purpose, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings. All other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of this application.

In the following description, references to "some embodiments" describe a subset of all possible embodiments. It is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and that they can be combined with each other where no conflict arises.

In the following description, the terms "first", "second", "third", "fourth" and "fifth" are only used to distinguish similar objects and do not represent a specific ordering of the objects. Where permitted, the specific order or sequence denoted by these terms can be interchanged, so that the embodiments of the present application described here can be implemented in an order other than the one illustrated or described here.

Unless otherwise defined, all technical and scientific terms used in the embodiments of the present application have the same meaning as commonly understood by those skilled in the technical field of the present application. The terms used in the embodiments of the present application are only for the purpose of describing the embodiments of the present application, and are not intended to limit the present application.

Before further describing the embodiments of the present application in detail, the nouns and terms involved in the embodiments of the present application are described, and the nouns and terms involved in the embodiments of the present application are applicable to the following explanations.

1) Artificial Intelligence (AI) is the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. That is, artificial intelligence is a comprehensive technique of computer science that seeks to capture the essence of intelligence and to produce new intelligent machines that can respond in a way similar to human intelligence. Artificial intelligence also studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technology. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes several major directions such as computer vision, speech processing, natural language processing, and machine learning (ML)/deep learning.

2) Natural Language Processing (NLP) is a direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language, and is therefore a science that integrates linguistics, computer science and mathematics. Because research in this field involves natural language, that is, the language people use every day, natural language processing is closely connected with linguistics. Natural language processing technologies usually include machine reading comprehension (MRC), text processing, semantic understanding, machine translation, robot question answering, and knowledge graphs.

3) Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis and algorithm complexity theory. It studies how computers simulate or realize human learning behaviors in order to obtain new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications pervade all fields of artificial intelligence. Machine learning and deep learning usually include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

4) Machine reading comprehension is a natural language processing task: given an article and a question about the article, a machine reads the article and then answers the question. The article and the question together form the sentence sequence to be answered in the embodiment of this application.

5) Graph Convolutional Network (GCN) is used to compute representations of graphs; the data it processes is graph-structured data. A graph is a data format that can represent key string networks, social networks, communication networks, protein molecular networks, and the like; the nodes in the graph represent individuals in the network, and the edges represent the connections between individuals. In the embodiment of the present application, the vector representation of the first key string graph and the vector representation of the second key string graph can be obtained through a graph convolutional network.
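As an illustration of how a graph convolutional network can turn a key string graph into a vector, the sketch below applies one commonly used propagation rule, H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), followed by mean pooling over the nodes. The application does not state which GCN variant, pooling method or weights are used, so the propagation rule, the mean pooling and all function names here are assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def normalize_adjacency(adj):
    """Add self-loops, then apply symmetric degree normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # at least 1 thanks to the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(adj, features, weights):
    """One graph convolution: aggregate neighbor features, project, apply ReLU."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


def graph_vector(node_vectors):
    """Mean-pool node vectors into a single vector for the whole graph."""
    n = len(node_vectors)
    return [sum(col) / n for col in zip(*node_vectors)]
```

Given the graph vectors of two key string graphs, a similarity such as cosine similarity could then be computed between them; that final step is likewise an assumption and is not shown here.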

6) Named Entity Recognition (NER), also known as entity recognition, entity segmentation or entity extraction, is used to locate named entities in text and classify them into predefined categories, such as persons, organizations, locations, time expressions, quantities, monetary values and percentages. Usually, the task of named entity recognition is to identify named entities of three major categories (entity, time and number) and seven subcategories (person name, organization name, place name, time, date, monetary value and percentage). In this embodiment of the present application, entities of a preset entity type, such as person names and place names, are acquired through named entity recognition.

Generally speaking, in order to integrate the latest progress event into a topic, a clustering method is usually used to incrementally cluster the latest progress event and the topics, so as to determine the topic to which the latest progress event belongs according to the cluster centers and a threshold. However, when the latest progress event is integrated into a topic through incremental clustering, the low accuracy of clustering reduces the accuracy of event integration. In addition, the computational overhead grows as the number of topics increases, which reduces the efficiency of event integration.

Based on this, the embodiments of the present application provide an event integration method, device, electronic device, computer-readable storage medium, and computer program product, which can improve the accuracy and efficiency of event integration and reduce the computing resources it consumes. The following describes exemplary applications of the electronic device for event integration (hereinafter referred to as the event integration device) provided by the embodiment of the present application. The event integration device can be implemented as various types of terminals, such as a smart phone, a smart watch, a notebook computer, a tablet computer, a desktop computer, a smart home appliance, a set-top box, a smart in-vehicle device, a portable music player, a personal digital assistant, a dedicated messaging device, a smart voice interaction device, a portable game device, or a smart speaker, and can also be implemented as a server. Next, an exemplary application in which the device is implemented as a server is described.

Referring to FIG. 1, FIG. 1 is a schematic diagram of an optional architecture of the event integration system provided by the embodiment of the present application. As shown in FIG. 1, in the event integration system 100, the terminal 200 (the terminal 200-1 and the terminal 200-2 are shown by way of example) is connected to the server 400 (referred to as the event integration device) through the network 300, and the network 300 may be a wide area network, a local area network, or a combination of both. In addition, the event integration system 100 also includes a database 500 for providing data support to the server 400 (for example, providing at least two topics to be integrated to the server 400). FIG. 1 shows the database 500 as independent of the server 400; the database 500 may also be integrated in the server 400, which is not limited in this embodiment of the present application.

The terminal 200 is configured to obtain the event context from the server 400 through the network 300 and display the event context on a graphical interface.

The server 400 is configured to: acquire an event to be integrated and at least two topics to be integrated, wherein each topic to be integrated includes at least one topic event; determine, based on one or more of semantic similarity, string graph similarity and question-answer similarity, the target similarity between the event to be integrated and each topic to be integrated, where the semantic similarity refers to similarity in terms of semantic features, the string graph similarity refers to similarity in terms of the graph features corresponding to key character strings, and the question-answer similarity refers to similarity in terms of question-answer features; determine, based on the target similarity, the target topic to which the event to be integrated belongs from the at least two topics to be integrated; and integrate the event to be integrated into the target topic to obtain an event context, wherein the event context includes the event to be integrated and at least one topic event. The server 400 is also configured to send the event context to the terminal 200 through the network 300.

In some embodiments, the server 400 can be an independent physical server, or a server cluster or a distributed system composed of multiple physical servers, and can also provide cloud services, cloud databases, cloud computing, cloud functions, cloud storage, Cloud servers for basic cloud computing services such as network services, cloud communications, middleware services, domain name services, security services, content delivery network (CDN, Content Delivery Network), and big data and artificial intelligence platforms. The terminal 200 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart voice interaction device, a smart home appliance, a vehicle terminal, etc., but is not limited thereto. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this embodiment of the present application.

Referring to FIG. 2, FIG. 2 is a schematic diagram of an exemplary composition structure of the server in FIG. 1 provided by the embodiment of the present application; the server 400 shown in FIG. 2 includes: at least one processor 410, a memory 450 and at least one network interface 420 ; in some embodiments of the present application, the server 400 further includes a user interface 430 . Various components in the server 400 are coupled together through a bus system 440 . It can be understood that the bus system 440 is used to realize connection and communication among these components. In addition to the data bus, the bus system 440 also includes a power bus, a control bus and a status signal bus. However, for clarity of illustration, the various buses are labeled as bus system 440 in FIG. 2 .

The processor 410 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, wherein the general-purpose processor may be a microprocessor or any conventional processor.

User interface 430 includes one or more output devices 431 that enable presentation of media content, including one or more speakers and/or one or more visual displays. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.

Memory 450 may be removable, non-removable or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like. Memory 450 optionally includes one or more storage devices located physically remote from processor 410 .

Memory 450 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The non-volatile memory can be a read-only memory (ROM, Read Only Memory), and the volatile memory can be a random access memory (RAM, Random Access Memory). The memory 450 described in the embodiment of the present application is intended to include any suitable type of memory.

In some embodiments of the present application, the memory 450 can store data to support various operations, and examples of these data include programs, modules and data structures or subsets or supersets thereof, which are exemplarily described below.

Operating system 451, including system programs for processing various basic system services and performing hardware-related tasks, such as framework layer, core library layer, driver layer, etc., for implementing various basic services and processing hardware-based tasks;

A network communication module 452 for reaching other electronic devices via one or more (wired or wireless) network interfaces 420. Exemplary network interfaces 420 include: Bluetooth, Wireless Fidelity (Wi-Fi), and Universal Serial Bus (USB, Universal Serial Bus), etc.;

Presentation module 453 for enabling presentation of information via one or more output devices 431 (e.g., display screen, speakers, etc.) associated with user interface 430 (e.g., a user interface for operating peripherals and displaying content and information );

The input processing module 454 is configured to detect one or more user inputs or interactions from one or more of the input devices 432 and translate the detected inputs or interactions.

In some embodiments of the present application, the event integration device provided by the embodiment of the present application can be implemented in software. FIG. 2 shows the event integration device 455 stored in the memory 450, which can be software in the form of programs, plug-ins and the like, and includes the following software modules: an information acquisition module 4551, a similarity acquisition module 4552, a topic determination module 4553, an event integration module 4554, a model training module 4555, and an event display module 4556. These modules are logical, so they can be arbitrarily combined or further split according to the functions implemented. The function of each module is explained below.

In other embodiments of the present application, the event integration device provided in the embodiment of the present application may be implemented in hardware. As an example, the event integration device may be a processor in the form of a hardware decoding processor that is programmed to execute the event integration method provided by the embodiment of the present application. For example, the processor in the form of a hardware decoding processor can adopt one or more application-specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field-programmable gate arrays (FPGA, Field-Programmable Gate Array) or other electronic components.

In the following, the event integration method provided in the embodiment of the application will be described in conjunction with the exemplary application and implementation of the event integration device provided in the embodiment of the application. And, the event integration method provided by the embodiment of the present application is applied to various event integration scenarios such as cloud technology, artificial intelligence, smart transportation, and vehicle.

Referring to FIG. 3, FIG. 3 is a first optional schematic flowchart of the event integration method provided by the embodiment of the present application, which is described below in conjunction with the steps shown in FIG. 3.

S301. Acquire events to be integrated, and acquire at least two topics to be integrated.

In this embodiment of the application, the event integration device acquires the event to be integrated. The event to be integrated may be obtained by the event integration device detecting events, or by the event integration device receiving events sent by other devices, and so on, which is not limited in this embodiment of the present application. In addition, the event integration device acquires at least two topics to be integrated.

It should be noted that the event to be integrated is an event awaiting integration into a topic, where an event describes something that has happened, such as a news event or a highlight event. The event to be integrated can be a latest progress event or a historical event, where a historical event is an event whose corresponding event time has already passed, which is not limited in the embodiment of the present application. The event to be integrated includes at least text information, and may also include at least one of audio, video, images, and tables. In addition, the at least two topics to be integrated may be all topics in the database, or may be topics selected from the database that may be associated with the event to be integrated, and so on, which is not limited in the embodiment of the present application. A topic to be integrated is an event topic, that is, a collection of related events that includes at least one topic event, and a topic event is itself an event.
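The relationship between events and topics described above can be sketched as a pair of data structures. This is only an illustrative model: the class names, field names and validation rules are hypothetical, and the application does not prescribe any particular representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """An event: something that happened, described at least by text."""
    text: str                                   # required text information
    event_time: Optional[str] = None            # hypothetical field, e.g. an ISO date
    attachments: List[str] = field(default_factory=list)  # audio/video/image/table refs

    def __post_init__(self):
        if not self.text.strip():
            raise ValueError("an event must include text information")


@dataclass
class Topic:
    """A topic to be integrated: a collection of related topic events."""
    topic_id: str
    events: List[Event]

    def __post_init__(self):
        if not self.events:
            raise ValueError("a topic must contain at least one topic event")
```

A topic event is represented with the same `Event` class as the event to be integrated, mirroring the statement that a topic event is itself an event.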

S302. Determine the target similarity between the event to be integrated and each topic to be integrated based on one or more of semantic similarity, character string graph similarity, and question-answer similarity.

In the embodiment of the present application, the event integration device determines whether each topic to be integrated is a topic to which the event to be integrated belongs by comparing the target similarity between the event to be integrated and each topic to be integrated.

It should be noted that the target similarity indicates how likely it is that the event to be integrated belongs to each topic to be integrated. The target similarity can be determined from one or more aspects; it therefore includes one or more of semantic similarity, string graph similarity and question-answer similarity, and which of these it includes is determined based on selection logic. Here, semantic similarity refers to similarity in terms of semantic features, string graph similarity refers to similarity in terms of the graph features corresponding to key character strings, and question-answer similarity refers to similarity in terms of question-answer features. The selection logic is the basis on which the event integration device selects from among semantic similarity, string graph similarity and question-answer similarity. Based on the selection logic, the event integration device either selects one of semantic similarity, string graph similarity and question-answer similarity to obtain the target similarity, or selects at least two of them to obtain the target similarity.
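One way to read the paragraph above is that the target similarity is a collection holding whichever similarities were selected. The sketch below assumes that reading. The keys `"semantic"`, `"string_graph"` and `"question_answer"`, the `Scorer` signature and the function name are hypothetical, and the scorers themselves are left to the caller because the application computes them with separate models.

```python
from typing import Callable, Dict, Iterable, List

# A scorer takes the event text and the topic's event texts and returns a score.
Scorer = Callable[[str, List[str]], float]


def target_similarity(
    event: str,
    topic_events: List[str],
    scorers: Dict[str, Scorer],
    selected: Iterable[str],
) -> Dict[str, float]:
    """Compute only the selected similarities for one (event, topic) pair."""
    selected = list(selected)
    if not selected:
        raise ValueError("select at least one similarity")
    missing = [name for name in selected if name not in scorers]
    if missing:
        raise KeyError(f"no scorer registered for: {missing}")
    # The target similarity includes one entry per selected similarity.
    return {name: scorers[name](event, topic_events) for name in selected}
```

Selecting a single name yields a target similarity with one component; selecting two or three names yields a target similarity with several components that a later step can combine or compare.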

It should also be noted that the selection logic includes one or more of selection order, acquisition speed, accuracy rate, topic scale, selection quantity, topic type, model training scale, and model applicable scale. Among them, the selection order is determined based on the priority of the similarities, and the priority can be determined based on one or both of accuracy and time consumption; the acquisition speed is the speed at which a similarity is obtained, and can be determined based on one or both of the time consumed by feature extraction for that similarity and the feature extraction mode (parallel or serial); the accuracy rate is the accuracy of a similarity, and can be determined based on one or both of the characteristics of the features used in obtaining the similarity and the accuracy of the corresponding network model; the topic type is the content form of the topic to be integrated, for example, when the content is in image form, string graph similarity and question-answer similarity can be selected as the target similarity, and when the content is in text form, one or more of semantic similarity, string graph similarity, and question-answer similarity, including semantic similarity, can be selected as the target similarity; the topic scale is the scale of the at least one topic to be integrated, and can be determined based on one or more of the number of topics to be integrated and the content of the topics to be integrated; the model training scale is the scale of the training data corresponding to the network model used to obtain each similarity; the model applicable scale is the maximum amount of data that can be carried by the network model used to obtain each similarity; and the scope of application of the model is the data form handled by the network model used to obtain each similarity.

In the embodiment of the present application, the event integration device compares the features of the graph structure corresponding to the event to be integrated with the features of the graph structure corresponding to each topic to be integrated to obtain the string graph similarity. The event integration device can also construct the question and the article of a machine reading comprehension task based on the event to be integrated and each topic to be integrated, determine the answer information through the information interaction between the question and the article, and determine the corresponding question-answer similarity based on the answer information. Semantic similarity, string graph similarity and question-answer similarity are thus similarities obtained from different dimensions.

S303. Based on the target similarity, determine the target topic to which the event to be integrated belongs from at least two topics to be integrated.

In this embodiment of the application, the event integration device may determine, based on the target similarity, the topic that best matches the event to be integrated from the at least two topics to be integrated, and determine that best-matching topic as the topic to which the event to be integrated belongs, that is, the target topic; wherein the target topic refers to the topic to be integrated corresponding to the maximum target similarity.

It should be noted that, by obtaining the target similarity between the event to be integrated and each topic to be integrated, the event integration device obtains at least two target similarities corresponding to the at least two topics to be integrated; it then determines the target topic from the at least two topics to be integrated based on the at least two target similarities, which is not limited in this embodiment of the present application.

It should also be noted that the target similarity between the target topic and the event to be integrated may be required to be greater than or equal to a similarity threshold; thus, when the maximum target similarity is lower than the similarity threshold, the event integration device determines that the topic corresponding to the maximum target similarity is not the topic to which the event to be integrated belongs; and when the maximum target similarity is greater than or equal to the similarity threshold, the event integration device determines the topic corresponding to the maximum target similarity as the topic to which the event to be integrated belongs.
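The selection in S303, including the optional threshold check, can be sketched as follows. This is a minimal illustration only; the function name `pick_target_topic`, the dictionary layout and the return of `None` when no topic qualifies are assumptions made for the example and are not specified by the application.

```python
from typing import Dict, Optional


def pick_target_topic(target_similarity: Dict[str, float],
                      similarity_threshold: float) -> Optional[str]:
    """Return the topic with the maximum target similarity.

    Returns None (an assumed convention) when even the best topic falls
    below the similarity threshold, i.e. the event belongs to no topic.
    """
    if not target_similarity:
        return None
    # Topic to be integrated that corresponds to the maximum target similarity.
    best_topic = max(target_similarity, key=target_similarity.get)
    if target_similarity[best_topic] < similarity_threshold:
        return None
    return best_topic
```

For instance, with similarities `{"topic_a": 0.4, "topic_b": 0.9}` and a threshold of 0.5, the sketch returns `"topic_b"`; with a threshold of 0.95 it returns `None`.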

S304. Integrate the event to be integrated into the target topic to obtain the context of the event.

In this embodiment of the present application, the event integration device integrates the event to be integrated as a topic event in the target topic into at least one topic event included in the target topic to obtain an event context including the event to be integrated and at least one topic event. Among them, the event context refers to the occurrence process of the events described for the target topic.

It can be understood that the target topic to which the event to be integrated belongs is determined among the at least two topics to be integrated by evaluating the target similarity between the event to be integrated and each topic to be integrated; that is, the target topic is determined directly by comparing the event to be integrated with each topic to be integrated in terms of target similarity. Since the target similarity includes one or more of semantic similarity, string graph similarity and question-answer similarity, the obtained target similarity can accurately determine whether each topic to be integrated is the target topic to which the event to be integrated belongs; in turn, when the event to be integrated is integrated into the target topic, the accuracy of event integration can be improved. In addition, when the event integration device uses more than one of semantic similarity, string graph similarity and question-answer similarity to determine the target topic from the at least two topics to be integrated, the target topic is determined from multi-dimensional heterogeneous features, which can improve the accuracy and effectiveness of the obtained target topic and, in turn, the accuracy of event integration.

Referring to Fig. 4a, Fig. 4a is an optional flow diagram II of the event integration method provided by the embodiment of the present application; as shown in Fig. 4a, in the embodiment of the present application, when the selection logic includes the selection order, S302 can be implemented through S3021 to S3024; that is to say, the event integration device determining the target similarity between the event to be integrated and each topic to be integrated based on one or more of semantic similarity, string graph similarity and question-answer similarity includes S3021 to S3024, and each step will be described separately below.

S3021. Based on the selection order, select a first set number of similarities from semantic similarity, character string graph similarity, and question-answer similarity in descending order of priority.

It should be noted that the first set number of similarities includes one or more of semantic similarity, character string graph similarity, and question-answer similarity.

Exemplarily, the event integration device can first select the question-answer similarity and the semantic similarity, which have the highest accuracy; if the result can be determined, the selection ends, and if the result cannot be determined, the string graph similarity is then selected. The event integration device can also first select the question-answer similarity, which takes the least time; if the result can be determined, the selection ends, and if the result cannot be determined, a similarity is then selected from the semantic similarity and the string graph similarity. Among them, a determined result means that the selected target similarity is greater than the first similarity threshold or smaller than the second similarity threshold, and an undetermined result means that the selected target similarity is less than or equal to the first similarity threshold and greater than or equal to the second similarity threshold; here, the first similarity threshold is greater than the second similarity threshold.

S3022. Acquire the comparison result between the first set number of similarities and the similarity threshold.

It should be noted that the similarity threshold may include a first set number of sub-similarity thresholds, and the first set number of sub-similarity thresholds corresponds to the first set number of similarities one-to-one.

S3023. When the comparison result is a determined similarity result for the event to be integrated and the topic to be integrated, determine the first set number of similarities as the target similarity between the event to be integrated and the topic to be integrated.

It should be noted that a determined similarity result for the event to be integrated and the topic to be integrated means that the event to be integrated is determined to be either similar or not similar to the topic to be integrated, which is the determined result described above.

S3024. When the comparison result is an undetermined similarity result for the event to be integrated and the topic to be integrated, continue to select from the remaining similarities based on the selection order until the selection end condition is satisfied, and determine the selected similarities as the target similarity.

It should be noted that the undetermined similarity result means that it cannot be determined whether the event to be integrated is similar or not similar to the topic to be integrated, which is the undetermined result described above; the remaining similarities are the similarities among semantic similarity, string graph similarity and question-answer similarity other than the first set number of similarities; the selection end condition is that whether the event to be integrated is similar to the topic to be integrated has been determined, or that semantic similarity, string graph similarity and question-answer similarity have all been selected. Wherein, the selected similarities refer to all similarities selected over all rounds of selection.
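The staged selection of S3021 to S3024 can be sketched as a small cascade. This is an illustrative sketch only: the names `compare` and `cascade_select` are hypothetical, and two points that the text leaves open are filled in by assumption, namely that a round counts as determined only when all similarities selected in that round agree, and that later rounds select one remaining similarity at a time.

```python
from typing import Callable, Dict, List, Tuple

SIMILAR, NOT_SIMILAR, UNDETERMINED = "similar", "not_similar", "undetermined"


def compare(score: float, first_threshold: float, second_threshold: float) -> str:
    """Two-threshold comparison; first_threshold is greater than second_threshold."""
    if score > first_threshold:
        return SIMILAR
    if score < second_threshold:
        return NOT_SIMILAR
    return UNDETERMINED


def cascade_select(scorers: List[Tuple[str, Callable[[], float]]],
                   thresholds: Dict[str, Tuple[float, float]],
                   first_set_number: int) -> Tuple[Dict[str, float], str]:
    """scorers is listed in descending priority; each callable computes one similarity."""
    selected: Dict[str, float] = {}
    position = 0
    batch_size = max(1, first_set_number)
    verdict = UNDETERMINED
    while position < len(scorers):
        batch = scorers[position:position + batch_size]
        position += len(batch)
        verdicts = set()
        for name, scorer in batch:
            selected[name] = scorer()  # computed lazily, only when selected
            first_threshold, second_threshold = thresholds[name]
            verdicts.add(compare(selected[name], first_threshold, second_threshold))
        # Assumption: the round is determined only if every similarity agrees.
        if verdicts == {SIMILAR} or verdicts == {NOT_SIMILAR}:
            verdict = verdicts.pop()
            break
        batch_size = 1  # assumption: remaining similarities are taken one by one
    return selected, verdict
```

Because each similarity is wrapped in a callable, lower-priority similarities are never computed once an earlier round has produced a determined result, which is the point of ordering the selection by priority.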

Referring to Fig. 4b, Fig. 4b is an optional flow diagram III of the event integration method provided by the embodiment of the present application; as shown in Fig. 4b, in the embodiment of the present application, when the selection logic includes acquisition speed and topic scale, S302 can also be implemented through S3025 and S3026; that is, the event integration device determining the target similarity between the event to be integrated and each topic to be integrated based on one or more of semantic similarity, string graph similarity and question-answer similarity includes S3025 and S3026, and each step will be described separately below.

S3025. When the topic scale is larger than the set scale, select a second set number of similarities from semantic similarity, string graph similarity and question-answer similarity in descending order of acquisition speed, to obtain the target similarity between the event to be integrated and the topic to be integrated.

It should be noted that if the event integration device determines that the topic scale is larger than the set scale, this indicates that the at least two topics to be integrated are large in scale, and a smaller number (called the second set number) of similarities that are faster to obtain (namely, the second set number of similarities selected in descending order of acquisition speed) needs to be used to determine the result.

S3026. When the topic scale is less than or equal to the set scale, select a third set number of similarities from semantic similarity, string graph similarity and question-answer similarity in descending order of acquisition speed, to obtain the target similarity between the event to be integrated and the topic to be integrated.

It should be noted that if the event integration device determines that the topic scale is smaller than or equal to the set scale, this indicates that the at least two topics to be integrated are relatively small in scale, and a larger number (called the third set number) of similarities (namely, the third set number of similarities selected in descending order of acquisition speed) can be used to determine the result; in addition, the second set number is smaller than the third set number.
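A minimal sketch of S3025 and S3026 follows. The function name `select_by_scale` and the representation of acquisition speed as a numeric score (higher means faster) are assumptions made for illustration.

```python
from typing import Dict, List


def select_by_scale(acquisition_speed: Dict[str, float],
                    topic_scale: int,
                    set_scale: int,
                    second_set_number: int,
                    third_set_number: int) -> List[str]:
    """Pick fewer, faster similarities for large topic sets and more for small ones."""
    if second_set_number >= third_set_number:
        raise ValueError("the second set number must be smaller than the third set number")
    # Similarity names in descending order of acquisition speed.
    ranked = sorted(acquisition_speed, key=acquisition_speed.get, reverse=True)
    count = second_set_number if topic_scale > set_scale else third_set_number
    return ranked[:count]
```

With speeds `{"semantic": 2.0, "string_graph": 1.0, "question_answer": 3.0}`, a set scale of 100, a second set number of 1 and a third set number of 3, a topic scale of 1000 yields only `["question_answer"]`, while a topic scale of 50 yields all three similarities in speed order.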

Referring to Fig. 4c, Fig. 4c is an optional flow diagram IV of the event integration method provided by the embodiment of the present application; as shown in Fig. 4c, in the embodiment of the present application, when the target similarity includes semantic similarity, string graph similarity and question-answer similarity, S303 can be implemented through S3031 to S3033; that is, the event integration device determining, based on the target similarity, the target topic to which the event to be integrated belongs from the at least two topics to be integrated includes S3031 to S3033, and each step will be described separately below.

S3031. Determine weight ratios of various similarities in the target similarity based on the accuracy rate.

It should be noted that the event integration device determines, for each of the selected similarities, a weight that is positively correlated with its accuracy rate, and thus obtains the weight ratio between the various similarities in the target similarity; wherein the weight ratio represents the ratio between the weights corresponding to the various similarities, for example, 0.3:0.4:0.3.

S3032. Based on the weight ratio, the various similarities in the target similarity are fused to obtain the discrimination similarity.

It should be noted that, based on the weight ratio, the event integration device fuses each similarity in the target similarity with its corresponding weight. After all similarities in the target similarity have been fused, the final similarity used to judge whether the event to be integrated is similar to the topic to be integrated is obtained; this final similarity is the discrimination similarity.

S3033. From the at least two topics to be integrated, select the topic to be integrated corresponding to the highest discrimination similarity, and obtain the target topic to which the event to be integrated belongs.

It should be noted that the event integration device can directly determine the topic to be integrated corresponding to the highest discrimination similarity as the target topic to which the event to be integrated belongs; it can also compare the highest discrimination similarity with a threshold and then decide, based on the comparison, whether to determine the topic to be integrated corresponding to the highest discrimination similarity as the target topic to which the event to be integrated belongs; and so on, which is not limited in this embodiment of the present application.
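S3031 to S3033 can be sketched as a weighted fusion. The text only requires the weights to be positively correlated with accuracy and does not fix the fusion operator, so the sketch below assumes weights proportional to accuracy and a weighted sum; the function names and data layout are likewise hypothetical.

```python
from typing import Dict, Tuple


def weight_ratio(accuracy: Dict[str, float]) -> Dict[str, float]:
    """Assumed rule: each weight is the similarity's accuracy divided by the total."""
    total = sum(accuracy.values())
    return {name: value / total for name, value in accuracy.items()}


def discrimination_similarity(similarities: Dict[str, float],
                              weights: Dict[str, float]) -> float:
    """Assumed fusion: weighted sum of the similarities in the target similarity."""
    return sum(weights[name] * score for name, score in similarities.items())


def pick_by_fusion(per_topic: Dict[str, Dict[str, float]],
                   accuracy: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Return the topic with the highest discrimination similarity and all fused scores."""
    weights = weight_ratio(accuracy)
    fused = {topic: discrimination_similarity(scores, weights)
             for topic, scores in per_topic.items()}
    return max(fused, key=fused.get), fused
```

Under this assumed rule, accuracies of 0.6, 0.8 and 0.6 for semantic, string graph and question-answer similarity give the 0.3:0.4:0.3 ratio mentioned above.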

In the embodiment of the present application, the semantic similarity includes one or both of semantic self-attention similarity and semantic statistical similarity; wherein the semantic self-attention similarity is determined based on the self-attention between the event to be integrated and the topic events, and the semantic statistical similarity is determined based on target semantics, where the target semantics refers to the semantics corresponding to at least one of the title, the key string and the event content. That is, the semantic statistical similarity is obtained by comparing the vector semantic features of information such as the title, key string and text of the event to be integrated with the corresponding vector semantic features of information such as the titles, key strings and texts of the topic events of each topic to be integrated.

Here, the semantic self-attention similarity is obtained through the following steps: the event integration device obtains the to-be-integrated semantic feature corresponding to the event to be integrated, and the topic event semantic feature corresponding to each topic event in the topic to be integrated; based on a distinguishing mark between the event to be integrated and the topic events, the to-be-integrated semantic feature is enhanced to obtain a first enhanced semantic feature, and each topic event semantic feature is enhanced based on the distinguishing mark to obtain a second enhanced semantic feature; the first enhanced semantic feature and the at least one second enhanced semantic feature corresponding to the topic to be integrated are combined into a semantic feature sequence, and the semantic self-attention similarity is determined based on the self-attention information between two sequence units in the semantic feature sequence.

It should be noted that the self-attention information between two sequence units refers to the self-attention between the event to be integrated and any topic event.

It should also be noted that when the semantic similarity includes both semantic self-attention similarity and semantic statistical similarity, S3021 to S3024 include: the event integration device first selects semantic self-attention similarity and question-answer similarity (the first set number of similarities); if, after the semantic self-attention similarity and the question-answer similarity are compared with their corresponding sub-similarity thresholds, it is determined that the event to be integrated is similar or not similar to the topic event, the process ends; and if, after that comparison, it cannot be determined whether the event to be integrated is similar or not similar to the topic event, the event integration device continues to select semantic statistical similarity and string graph similarity for discrimination. Here, in terms of accuracy rate and acquisition speed, the descending order of priority can be determined as question-answer similarity, semantic self-attention similarity, string graph similarity and semantic statistical similarity; moreover, question-answer similarity, semantic self-attention similarity, string graph similarity and semantic statistical similarity transition in turn from precision to breadth.

Referring to Fig. 4d, Fig. 4d is a schematic flow diagram of obtaining semantic statistical similarity provided by the embodiment of the present application; as shown in Fig. 4d, in the embodiment of the present application, the semantic statistical similarity can be obtained through S30211 to S30214, and each step is described separately below.

S30211. In each topic to be integrated, obtain the first sub-semantic similarity between the title of each topic event and the title of the event to be integrated, and determine the average first sub-semantic similarity and the maximum first sub-semantic similarity based on the first sub-semantic similarities.

In the embodiment of the present application, the event integration device determines the semantic statistical similarity from one or more of a first degree of similarity, a second degree of similarity, a third degree of similarity, and a fourth degree of similarity; wherein the first degree of similarity refers to the degree of similarity between the title of the event to be integrated and the title of each topic event in each topic to be integrated, the second degree of similarity refers to the degree of similarity between the key string of the event to be integrated and the key string of each topic event in each topic to be integrated, the third degree of similarity refers to the degree of similarity between the key string of the event to be integrated and the key string of each topic to be integrated, and the fourth degree of similarity refers to the degree of similarity between the event to be integrated and each topic to be integrated.

Here, for each topic event in each topic to be integrated, the event integration device obtains the degree of similarity between the title of the topic event and the title of the event to be integrated, that is, the first sub-semantic similarity (also called the first degree of similarity), so that for each topic to be integrated, at least one first sub-semantic similarity corresponding to the at least one topic event is obtained; the event integration device averages the at least one first sub-semantic similarity to obtain the average first sub-semantic similarity, and selects the largest of the at least one first sub-semantic similarity to obtain the maximum first sub-semantic similarity.

S30212. In each topic to be integrated, obtain the second sub-semantic similarity between the topic event key string corresponding to each topic event and the to-be-integrated event key string corresponding to the event to be integrated, and determine the average second sub-semantic similarity and the maximum second sub-semantic similarity based on the second sub-semantic similarities.

In this embodiment of the application, for each topic event in each topic to be integrated, the event integration device obtains the second degree of similarity between the topic event key string of the topic event and the to-be-integrated event key string of the event to be integrated, that is, the second sub-semantic similarity, so that for each topic to be integrated, at least one second sub-semantic similarity corresponding to the at least one topic event is obtained; the event integration device averages the at least one second sub-semantic similarity to obtain the average second sub-semantic similarity, and selects the largest of the at least one second sub-semantic similarity to obtain the maximum second sub-semantic similarity.

It should be noted that the topic event key string is the key string of a topic event, and the to-be-integrated event key string is the key string of the event to be integrated.

S30213. Obtain the third sub-semantic similarity between the topic key string corresponding to each topic to be integrated and the event key string to be integrated.

In the embodiment of the present application, the event integration device obtains the third degree of similarity between the key character string of the topic and the key character string of the event to be integrated, and thus obtains the third sub-semantic similarity.

S30214. Determine the average first sub-semantic similarity, maximum first sub-semantic similarity, average second sub-semantic similarity, maximum second sub-semantic similarity, and third sub-semantic similarity as semantic statistical similarity.

It should be noted that the event integration device can determine at least one of the average first sub-semantic similarity, the maximum first sub-semantic similarity, the average second sub-semantic similarity, the maximum second sub-semantic similarity and the third sub-semantic similarity as the semantic statistical similarity between the event to be integrated and each topic to be integrated.
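The statistics of S30211 to S30214 can be sketched as follows. In the application the pairwise similarities come from the trained semantic statistical similarity model; here a simple token-overlap (Jaccard) score stands in for that model purely for illustration, and the dictionary keys and function names are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List


def token_jaccard(first: str, second: str) -> float:
    """Stand-in text-pair similarity (NOT the trained model of the application)."""
    a, b = set(first.lower().split()), set(second.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def semantic_statistical_similarity(event: Dict[str, str],
                                    topic_events: List[Dict[str, str]],
                                    topic_key_string: str,
                                    pair_similarity: Callable[[str, str], float]
                                    ) -> Dict[str, float]:
    """Compute the five statistics for one topic to be integrated."""
    # First sub-semantic similarities: title against the title of each topic event.
    first = [pair_similarity(event["title"], te["title"]) for te in topic_events]
    # Second sub-semantic similarities: key string against each topic event key string.
    second = [pair_similarity(event["key_string"], te["key_string"]) for te in topic_events]
    return {
        "average_first": mean(first),
        "maximum_first": max(first),
        "average_second": mean(second),
        "maximum_second": max(second),
        # Third sub-semantic similarity: event key string against the topic key string.
        "third": pair_similarity(event["key_string"], topic_key_string),
    }
```

Passing the similarity function as a parameter keeps the statistics independent of how each text pair is scored, so the stand-in can be replaced by a model-based scorer without changing the aggregation.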

In the embodiment of this application, the first sub-semantic similarity in S30211, the second sub-semantic similarity in S30212, and the third sub-semantic similarity in S30213 can all be obtained through the semantic statistical similarity model, which is used to obtain the similarity of a text pair in terms of semantic features. Referring to Fig. 4e, Fig. 4e is a schematic flow chart of training the semantic statistical similarity model provided by the embodiment of the present application; as shown in Fig. 4e, in the embodiment of the present application, the semantic statistical similarity model is obtained through the training in S305 to S307, and each step will be described separately below.

S305. Obtain a training sample, where the training sample includes a first character string sample, a second character string sample, and label similarity.

It should be noted that the training samples refer to the data samples used to train the semantic statistical similarity model, the first character string sample and the second character string sample are text pairs whose similarity in semantic features is to be determined, and the label similarity is The actual degree of similarity between the first string sample and the second string sample in terms of semantic features.

S306. Use the first semantic branch in the semantic statistical similarity model to be trained to obtain the first estimated semantics corresponding to the first character string sample, use the second semantic branch in the semantic statistical similarity model to be trained to obtain the second estimated semantics corresponding to the second character string sample, and, based on the comparison result between the first estimated semantics and the second estimated semantics, determine the estimated similarity between the first character string sample and the second character string sample.

In the embodiment of the present application, the event integration device initializes the parameters of the model structure, and thus obtains the semantic statistical similarity model to be trained, wherein the semantic statistical similarity model to be trained includes the first semantic branch and the second semantic branch; then, the event integration device uses the first semantic branch to obtain the semantics corresponding to the first string sample, that is, the first estimated semantics, and uses the second semantic branch to obtain the semantics corresponding to the second string sample, that is, the second estimated semantics. Finally, the similarity model in the semantic statistical similarity model to be trained is used to determine the similarity between the first string sample and the second string sample, and the estimated similarity is obtained; here, the similarity model compares the first estimated semantics with the second estimated semantics, and determines the estimated similarity between the first character string sample and the second character string sample based on the comparison result.

It should be noted that the semantic statistical similarity model to be trained is a model, still to be trained, for obtaining the similarity of a text pair in terms of semantic features; the semantic statistical similarity model to be trained adopts a twin-tower structure (the first semantic branch and the second semantic branch), each semantic branch in the twin-tower structure is used to obtain semantic features, and the parameters of the first semantic branch and the second semantic branch are shared. In addition, the semantic statistical similarity model to be trained may also be a pre-trained model.

It can be understood that, by using a twin-tower structure to obtain semantic features, the semantic statistical similarity model to be trained can improve the efficiency of obtaining the estimated similarity.

S307. Based on the difference between the estimated similarity and the label similarity, perform backpropagation in the semantic statistical similarity model to be trained to obtain the semantic statistical similarity model.

In the embodiment of the present application, the event integration device adjusts the parameters in the semantic statistical similarity model to be trained based on the difference between the estimated similarity and the label similarity, so as to train the semantic statistical similarity model to be trained; here, the event integration device adjusts the parameters by performing backpropagation in the semantic statistical similarity model to be trained. Wherein, the training process of the semantic statistical similarity model to be trained is an iterative training process, and the semantic statistical similarity model to be trained, once training is complete, is the semantic statistical similarity model.
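The training loop of S305 to S307 can be illustrated with a deliberately tiny twin-tower model. Everything concrete here is an assumption made for the sketch: the shared branch is a single linear projection, the inputs are already numeric feature vectors rather than character strings, the comparison is a sigmoid of the dot product, and the difference is a squared error. The actual network structure is not specified at this level in the application.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]
Sample = Tuple[Vector, Vector, float]  # (first sample, second sample, label similarity)


def encode(weights: Matrix, features: Vector) -> Vector:
    """Shared semantic branch: both towers use the same weights."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def estimate(weights: Matrix, first: Vector, second: Vector) -> float:
    """Estimated similarity: sigmoid of the dot product of the two estimated semantics."""
    a, b = encode(weights, first), encode(weights, second)
    z = sum(p * q for p, q in zip(a, b))
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(weights: Matrix, samples: List[Sample]) -> float:
    return sum((estimate(weights, x, y) - label) ** 2
               for x, y, label in samples) / len(samples)


def train(samples: List[Sample], dim: int = 2, epochs: int = 300,
          lr: float = 0.5, seed: int = 0) -> Matrix:
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    # Parameter initialization of the model to be trained.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)] for _ in range(dim)]
    for _ in range(epochs):
        for x, y, label in samples:
            a, b = encode(weights, x), encode(weights, y)
            s = estimate(weights, x, y)
            # Gradient of the squared error with respect to z, through the sigmoid.
            grad_z = 2.0 * (s - label) * s * (1.0 - s)
            for k in range(dim):
                for j in range(n_features):
                    # z = a.b with a = Wx and b = Wy, so dz/dW[k][j] = b[k]*x[j] + a[k]*y[j].
                    weights[k][j] -= lr * grad_z * (b[k] * x[j] + a[k] * y[j])
    return weights
```

Because the two branches share one weight matrix, a single gradient update adjusts both towers at once, which is what parameter sharing in the twin-tower structure amounts to in this sketch.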

Referring to Fig. 4f, Fig. 4f is a schematic flow chart of obtaining the string graph similarity provided by the embodiment of the present application; as shown in Fig. 4f, in the embodiment of the present application, the string graph similarity can be obtained through S30221 to S30223, and each step is described separately below.

S30221. In each topic to be integrated, determine each sub-topic-event key string corresponding to the at least one topic event as a graph node, build an edge between two graph nodes corresponding to the same topic event, and determine the graph nodes and the edges as the first key string graph.

It should be noted that, among the at least one topic event in each topic to be integrated, the topic event key string corresponding to each topic event includes one or more sub-topic-event key strings. Here, the event integration device takes each sub-topic-event key string as a graph node and traverses every pair of graph nodes among all the obtained graph nodes. If the two sub-topic-event key strings corresponding to the two graph nodes belong to the same topic event, that is, the two graph nodes correspond to the same topic event, an edge is built between the two graph nodes; if the two sub-topic-event key strings corresponding to the two graph nodes do not belong to the same topic event, there is no edge between the two graph nodes. When the traversal ends, the resulting graph structure is the first key string graph.

S30222. Based on the to-be-integrated event key string corresponding to the event to be integrated, construct a second key string graph.

It should be noted that the event integration device builds the graph structure corresponding to the event to be integrated in the same way as the first key string graph: the event integration device takes each sub-key-string in the to-be-integrated event key string as a graph node and builds an edge between any two graph nodes, thereby obtaining the second key string graph.
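The construction of the two key string graphs in S30221 and S30222 can be sketched as follows. The function names and the representation of a graph as a set of nodes plus a set of undirected edges (stored as sorted pairs) are assumptions made for illustration.

```python
from itertools import combinations
from typing import List, Set, Tuple

Graph = Tuple[Set[str], Set[Tuple[str, str]]]


def build_first_key_string_graph(topic_event_key_strings: List[List[str]]) -> Graph:
    """Each inner list holds the sub key strings of one topic event.

    Two nodes are connected only if their key strings occur in the same topic event.
    """
    nodes: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for sub_key_strings in topic_event_key_strings:
        unique = sorted(set(sub_key_strings))
        nodes.update(unique)
        # Sorted pairs give one canonical entry per undirected edge.
        edges.update(combinations(unique, 2))
    return nodes, edges


def build_second_key_string_graph(event_sub_key_strings: List[str]) -> Graph:
    """All sub key strings of the event to be integrated are pairwise connected."""
    unique = sorted(set(event_sub_key_strings))
    return set(unique), set(combinations(unique, 2))
```

Since the event to be integrated is a single event, its graph is fully connected, which is equivalent to applying the first construction to a topic that contains only that event.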

S30223. Compare the vector representation of the first key character string graph with the vector representation of the second key character string graph to obtain a graph comparison result, and determine the string graph similarity based on the graph comparison result.

In this embodiment of the application, the event integration device obtains the vector representation of the first key string graph and the vector representation of the second key string graph; it then compares the two vector representations and, based on the resulting graph comparison result, determines the string graph similarity between the event to be integrated and each topic to be integrated.

It should be noted that the process of obtaining the vector representation of the first key character string graph includes: obtaining (for example, through a BERT model) the vector representations of the graph nodes and the vector representations of the edges in the first key character string graph, and obtaining (for example, through a graph convolution model) the vector representation of the first key string graph based on the vector representations of the graph nodes and the vector representations of the edges. The process of obtaining the vector representation of the second key character string graph is similar to the process of obtaining the vector representation of the first key character string graph, and will not be repeated in this embodiment of the present application.

Referring to Fig. 4g, Fig. 4g is a schematic flow diagram of obtaining the question-answer similarity provided by the embodiment of the present application; as shown in Fig. 4g, in the embodiment of the present application, the question-answer similarity can be obtained through S30231 to S30233, and each step is described below.

S30231. Combine the title of each topic to be integrated, the topic key character string and the event to be integrated into a sentence sequence to be answered.

It should be noted that, in order to determine through question-and-answer interaction whether the event to be integrated belongs to a topic to be integrated, the event integration device constructs a question-and-answer statement for each topic to be integrated and the event to be integrated, and thus obtains the sentence sequence to be answered. Specifically, the event integration device combines the title of each topic to be integrated, the topic key character string and the event to be integrated according to a preset sentence pattern, and the combination result is the constructed question-and-answer statement; for example: Is "event to be integrated" a development of "title of the topic to be integrated", whose key string is "topic key string"?; another example: Is the following sentence a development of "title of the topic to be integrated", whose key string is "topic key string": "event to be integrated"?
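The combination by a preset sentence pattern can be sketched as simple template filling. The wording of the template below is modelled on the first example above and is only illustrative; the function names, the comma-joined key strings and the `(title, key strings)` input shape are assumptions of this sketch.

```python
def build_question(event_text, topic_title, topic_key_strings):
    """Fill one preset sentence pattern for a (topic, event) pair."""
    # Joining the subtopic key strings with ", " is an assumption.
    keys = ", ".join(topic_key_strings)
    return (
        f'Is "{event_text}" a development of "{topic_title}", '
        f'whose key strings are "{keys}"?'
    )


def build_sentence_sequence(event_text, topics):
    """Build one question per topic; `topics` is a list of (title, key_strings)."""
    return [build_question(event_text, title, keys) for title, keys in topics]


questions = build_sentence_sequence(
    "Event E", [("Topic A", ["k1", "k2"]), ("Topic B", ["k3"])]
)
```

Each element of `questions` corresponds to one topic to be integrated, so the answers obtained in the next step can be mapped back to topics by position.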

S30232. Obtain answer information of the sentence sequence to be answered.

In the embodiment of this application, following the question-and-article setup of machine reading comprehension, the event integration device determines the question and the article in the sentence sequence to be answered, performs low-level processing on the determined article and question respectively, and converts the text into numerical codes; then, it determines the semantic connection between the article and the question based on the numerical codes, obtains the features of the question by incorporating the semantic analysis result of the article, and obtains the features of the article by incorporating the semantic analysis result of the question; finally, the event integration device obtains the output answer information based on the features of the question, the features of the article, and the type of the answer.

It should be noted that the answer information indicates whether the event to be integrated belongs to each topic to be integrated; it can be "yes" (the event to be integrated belongs to the topic to be integrated) or "no" (the event to be integrated does not belong to the topic to be integrated), or it can be the possibility that the event to be integrated belongs to the topic to be integrated, and so on, which is not limited in this embodiment of the present application.

S30233. Based on the answer information, determine the question-answer similarity.

It should be noted that, based on the answer information, the event integration device determines the possibility that the event to be integrated belongs to each topic to be integrated, and takes that possibility as the question-answer similarity between the event to be integrated and each topic to be integrated.
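One way to turn the answer information into a similarity value is sketched below. The mapping of "yes" to 1.0 and "no" to 0.0, and the requirement that a numeric possibility lie in [0, 1], are assumptions of this sketch; the embodiment only states that the determined possibility is used as the question-answer similarity.

```python
def answer_to_similarity(answer):
    """Map answer information to a question-answer similarity in [0, 1]."""
    if isinstance(answer, str):
        normalized = answer.strip().lower()
        if normalized == "yes":
            return 1.0  # assumed value for a categorical "yes"
        if normalized == "no":
            return 0.0  # assumed value for a categorical "no"
        raise ValueError(f"unsupported answer: {answer!r}")
    # Otherwise the answer is treated as a possibility and used directly.
    score = float(answer)
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"possibility out of range: {score}")
    return score
```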

Referring to FIG. 4h, FIG. 4h is a schematic flow diagram of obtaining at least two topics to be integrated provided by the embodiment of the present application; as shown in FIG. 4h, in the embodiment of the present application, the obtaining of at least two topics to be integrated by the event integration device in S301 includes S3011 to S3013, and each step is described below.

S3011. Obtain a matching result between the topic key character string corresponding to each topic in the topic database and the event to be integrated, wherein the topic database includes a plurality of topics.

In the embodiment of the present application, the event integration device can obtain a preset topic library, so that after obtaining the event to be integrated, the event integration device determines the topic to which the event to be integrated belongs from the topic library, and then integrates the event to be integrated into that topic. Here, the event integration device first matches each topic in the topic library with the event to be integrated, where the matching is performed between the topic key string corresponding to each topic and the event to be integrated.

It should be noted that the topic key character string is the key character string of the topic. The topic library includes a plurality of topics, and each topic is the theme of an event; each topic in the topic library includes at least one topic event, and the topic events included in different topics can be the same or different; in addition, the at least one topic event refers to events associated with the topic that occur in different time periods, so the at least one topic event has a time sequence.

S3012. Based on the matching result, when it is determined that at least one subtopic key string in the topic key string matches the event to be integrated, determine the topic corresponding to the matching result as the topic to be integrated that matches the event to be integrated.

S3013. Obtain at least two topics to be integrated that match the event to be integrated from the topic database.

It should be noted that, if the matching result between the topic key string corresponding to a topic and the event to be integrated indicates that at least one subtopic key string in the topic key string matches the event to be integrated, the event integration device determines that the topic corresponding to the matching result is a topic to be integrated that matches the event to be integrated; if the matching result indicates that the topic key string does not match the event to be integrated, the event integration device determines that the topic corresponding to the matching result is not a topic to be integrated that matches the event to be integrated. Here, once the event integration device has obtained the matching results of the multiple topics with respect to the event to be integrated, it can obtain at least one topic to be integrated that matches the event to be integrated from the topic database; if no topic to be integrated matches the event to be integrated, a new topic including the event to be integrated is constructed and added to the topic library; in addition, when the at least one topic to be integrated that is obtained is a single topic to be integrated, the event integration device can determine whether that topic is the target topic by comparing the target similarity with the similarity threshold, or can directly determine that topic as the target topic.

It can be understood that, in the process of integrating the event to be integrated into the target topic, at least two topics to be integrated that may be related to the event to be integrated are first recalled based on the topic key character strings and the event to be integrated, and the target topic to which the event to be integrated belongs is then accurately determined from the at least two topics to be integrated based on the similarity between the event to be integrated and each topic to be integrated; therefore, the embodiment of the present application adopts a recall-then-similarity-classification method, which can quickly integrate the event to be integrated into the target topic, reduce the computation time of the integration process and the number of topics to be integrated, and thus improve the efficiency of event integration.
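The recall stage (S3011 to S3013) can be sketched as follows. Treating "matches" as substring containment of a subtopic key string in the event text is an assumption of this sketch, as are the function name, the dictionary layout of the topic library, and the sample event text; the topic identifiers and keywords are taken from the FIG. 9 example later in this description.

```python
def recall_topics(event_text, topic_library):
    """Return the ids of topics with at least one subtopic key string matching the event.

    `topic_library` maps a topic id to the list of its subtopic key strings.
    """
    recalled = []
    for topic_id, key_strings in topic_library.items():
        # Assumption: a key string "matches" when it occurs in the event text.
        if any(key and key in event_text for key in key_strings):
            recalled.append(topic_id)
    return recalled


library = {
    "9-21": ["nurse", "vice dean"],
    "9-22": ["H place", "jumping the car"],
    "9-23": ["Li San", "the first object"],
}
# Hypothetical event text, used only to exercise the sketch.
recalled = recall_topics("New progress on the first object", library)
```

An empty result corresponds to the case described above in which a new topic containing the event to be integrated is constructed; the recalled topics are then passed to the similarity-based classification.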

Continuing to refer to Figure 4h, in the embodiment of the present application, S3014 to S3016 are also included before S3011; that is, before the event integration device obtains the matching result between the topic key string corresponding to each topic and the event to be integrated, the event integration method further includes S3014 to S3016, and each step is described below.

S3014. From at least one topic event corresponding to each topic, acquire a topic event key string corresponding to each topic event.

It should be noted that the topic key string corresponding to a topic is obtained from the key strings of its at least one topic event; here, the event integration device first obtains the topic event key string corresponding to each topic event, and because each topic includes at least one topic event, the at least one topic event corresponds to at least one topic event key string.

S3015. Count the number of topic events corresponding to each subtopic event key string in the topic event key string.

It should be noted that, for each topic event key string among the at least one topic event key string, the event integration device counts the number of topic events corresponding to each subtopic event key string in that topic event key string, thereby obtaining the numbers of topic events corresponding to the multiple subtopic event key strings under the topic.

S3016. Combine the fourth set number of sub-topic event key strings with the maximum number of topic events into a topic key string.

It should be noted that, from the subtopic event key strings under the topic, the event integration device selects the fourth set number (for example, 2) of subtopic event key strings that correspond to the largest numbers of topic events, and determines the selected subtopic event key strings as the topic key string.

Referring to Fig. 4i, Fig. 4i is a schematic flow diagram of obtaining a topic event key character string provided by the embodiment of the present application; as shown in Fig. 4i, in the embodiment of the present application, the obtaining of the topic event key character string corresponding to each topic event by the event integration device in S3014 can be realized through S30141 to S30143, and each step is described below.

S30141. Perform entity recognition on each topic event to obtain an entity key string corresponding to a preset entity type.

It should be noted that the event integration device obtains key strings of topic events from multiple dimensions. One of the dimensions is the entities of the topic event, and the event integration device can obtain preset entity types in advance, such as the person name type and the place name type; here, the event integration device performs entity recognition on each topic event, selects the entities of the preset entity types from the recognized entities, and uses the selected entities as the entity key string.

S30142. Perform string weight analysis on each topic event to obtain an action key string.

It should be noted that the event integration device can also obtain key strings of topic events from the dimension of string weight; here, the event integration device analyzes the weights of the strings in a topic event to obtain the strings whose weights are greater than the weight threshold, and determines the strings representing actions among them as the action key string.

S30143. Based on one or both of the entity key string and the action key string, determine the topic event key string.

It should be noted that when the event integration device determines the topic event key string based on the entity key string, it may determine all the entity key strings as the topic event key string, or extract strings from the entity key string to obtain the topic event key string; when the event integration device determines the topic event key string based on the action key string, it may determine all the action key strings as the topic event key string, or extract strings from the action key string to obtain the topic event key string; the event integration device may also determine the strings obtained from any combination of the entity key string and the action key string as the topic event key string.

It can be understood that, since a topic event usually includes at least one of persons, places and actions, the event integration device determines the key strings of the topic event based on the character strings associated with the persons, places and actions in the topic event, which can improve the accuracy of the topic event key strings.

Referring to Fig. 4j, Fig. 4j is the second schematic flow diagram of obtaining the topic event key character string provided by the embodiment of the present application; as shown in Fig. 4j, in the embodiment of the present application, S30143 can be realized through S301431 to S301433; that is, the determining of the topic event key string by the event integration device based on one or both of the entity key string and the action key string includes S301431 to S301433, and each step is described below.

S301431. Obtain the number of entity key character strings corresponding to the entity key character string.

It should be noted that the event integration device can first determine the topic event key string based on the entity key string; here, when the number of character strings in the key string of each topic event is limited, the event integration device can, based on the number of character strings included in the entity key string, determine character strings selected from the action key string as part of the topic event key string, or can determine, based on that number, whether to use the action key string as part of the topic event key string.

S301432. When the number of entity key strings is less than the fifth set number, combine entity key strings and action key strings into topic event key strings.

It should be noted that, when the number of character strings in the key string of each topic event is limited to the fifth set number and the number of entity key strings is less than the fifth set number, the event integration device determines that the entity key strings alone are not sufficient to serve as the topic event key string, and the action key string also needs to be determined as part of the topic event key string; that is, in this case, the topic event key string includes both the entity key string and the action key string.

S301433. When the number of entity key strings is greater than or equal to the fifth set number, determine the entity key string as the topic event key string.

It should be noted that, when the number of key strings of each topic event is limited to the fifth set number and the number of entity key strings is greater than or equal to the fifth set number, the event integration device determines that the character strings in the entity key string are sufficient to serve as the topic event key string; in this case, the topic event key string includes only the entity key string.
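The branching in S301431 to S301433 can be sketched as a single comparison against the fifth set number. The function name and the default value 3 (borrowed from the exemplary application later in this description) are assumptions; the entity and action key strings are taken as already extracted.

```python
def topic_event_key_strings(entity_keys, action_keys, fifth_set_number=3):
    """Combine entity and action key strings according to the fifth set number."""
    if len(entity_keys) >= fifth_set_number:
        # Enough entity key strings: they alone form the topic event key string.
        return list(entity_keys)
    # Too few entity key strings: add the action key strings as well.
    return list(entity_keys) + list(action_keys)
```

For example, with `fifth_set_number=3`, two entity key strings plus one action key string give a three-string topic event key string, while three entity key strings are used on their own.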

Referring to FIG. 5, FIG. 5 is the fifth optional schematic flowchart of the event integration method provided by the embodiment of the present application; as shown in FIG. 5, in the embodiment of the present application, S308 to S311 are also included after S304; that is, after the event integration device integrates the event to be integrated into the target topic and obtains the event context including the event to be integrated and at least one topic event, the event integration method further includes S308 to S311, and each step is described below.

S308. Presenting a search control.

It should be noted that the search control is used for searching information, thus, the search control can be used for searching topic events.

S309. In response to the first search operation acting on the search control, present a simplified event context corresponding to the event context and a presentation control corresponding to the simplified event context.

In this embodiment of the application, when the user triggers the search control to search for information, if the searched information is associated with the integrated target topic, the event integration device receives the first search operation acting on the search control; at this point, the event integration device presents search results in response to the first search operation. Here, the presented search results may include a simplified event context corresponding to the event context and a presentation control corresponding to the simplified event context.

It should be noted that the simplified event context belongs to the event context, and the presentation content is part of the events in the event context; the presentation control is used to present the entire event context, for example, the "View More" button, the expansion icon, etc.

S310. Present an event context in response to a presentation operation acting on the presentation control, wherein each event in the presented event context includes an event title and an event time, and the event is any one of an event to be integrated and at least one topic event.

It should be noted that when the user triggers the presentation control to view the entire event context, the event integration device receives the presentation operation acting on the presentation control; at this time, the event integration device presents the entire event context in response to the presentation operation. The event integration device presents the event context by presenting the event title and event time of each event in the event context, wherein the event is any one of the event to be integrated and the at least one topic event.
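The relationship between the event context, the simplified event context and the title-plus-time presentation can be sketched with a small data structure. Everything named here (`Event`, `build_event_context`, `simplified_context`, `render`) is hypothetical, and two choices are assumptions of this sketch: events are ordered by event time, and the simplified event context shows the most recent events.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    title: str
    time: date
    content: str = ""  # detailed information shown on a view operation


def build_event_context(topic_events, event_to_integrate):
    """Merge the new event into the topic events, ordered by event time."""
    return sorted([*topic_events, event_to_integrate], key=lambda e: e.time)


def simplified_context(context, max_events=2):
    """Return part of the event context (assumed here: the latest events)."""
    return context[max(len(context) - max_events, 0):]


def render(context):
    """Present each event by its event time and event title."""
    return [f"{e.time.isoformat()}  {e.title}" for e in context]


context = build_event_context(
    [Event("First report", date(2021, 9, 1)), Event("Follow-up", date(2021, 9, 5))],
    Event("Latest progress", date(2021, 9, 18)),
)
```

Here `render(simplified_context(context))` would back the search result entry, and `render(context)` the view shown after the presentation control is triggered.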

In this embodiment of the application, the presented search results may include a search recommendation result, where the search recommendation result refers to recommended information for the integrated target topic, for example, Are you searching for "title of the integrated target topic"?; here, when the user performs a trigger operation on the search recommendation result, the event integration device can present the simplified event context corresponding to the event context and the presentation control corresponding to the simplified event context, and then present the event context in response to the presentation operation acting on the presentation control; alternatively, the event context may be presented directly; this is not limited in this embodiment of the present application.

For example, refer to FIG. 6, which is a schematic diagram of an exemplary event context presentation provided by the embodiment of the present application; as shown in FIG. 6, the simplified event context 6-11 is presented together with a presentation control 6-12; when the presentation control 6-12 is clicked (a presentation operation), the entire event context 6-21 is presented, as shown in area 6-2; here, each event in the presented event context is presented through its event title (for example, event title 6-211) and event time (for example, event time 6-212), and the detailed information of the corresponding event is displayed when the event title 6-211 is clicked.

For example, refer to FIG. 7, which is a schematic diagram of another exemplary event context presentation provided by the embodiment of the present application; as shown in FIG. 7, page 7-1 is a page for presenting search results, and the search recommendation result 7-11 is presented alongside the other results. When the search recommendation result 7-11 is clicked, the event context 6-21 shown in area 6-2 in FIG. 6 is presented.

S311. Present event detailed information in response to a view operation acting on the event title or event time.

It should be noted that the event title or event time is a control that can be triggered, or each event corresponds to a control for viewing details. When the user triggers the event title, the event integration device receives the view operation acting on the event title; when the user triggers the event time, the event integration device receives the view operation acting on the event time; when the user triggers the control for viewing details, the event integration device receives the view operation acting on that control; at this time, the event integration device presents the event detailed information in response to the view operation, wherein the event detailed information refers to the detailed description of the event in the event context, that is, the event content.

Referring to Fig. 8, Fig. 8 is the sixth optional schematic flow diagram of the event integration method provided by the embodiment of the present application; as shown in Fig. 8, in the embodiment of the present application, S312 to S314 are also included after S304; that is, after the event integration device integrates the event to be integrated into the target topic and obtains the event context including the event to be integrated and at least one topic event, the event integration method further includes S312 to S314, and each step is described below.

S312. Present the last information to be presented of the target event.

It should be noted that the target event is any one of the event to be integrated and the at least one topic event included in the event context; the last information to be presented refers to the information at the end of the presentation of the target event, for example, the last page of the target event or the bottom area of the target event.

S313. In the recommendation area corresponding to the last information to be presented, present remaining events in the event context associated with the target event.

It should be noted that the page presenting the last information to be presented also contains a recommendation area, which is used to present recommendation information; here, the recommendation information presented by the event integration device in the recommendation area is the remaining events, wherein the remaining events are any events in the event context other than the target event, or may be the latest event in the event context other than the target event. The remaining events may be displayed in the form of search content in a search box, or in the form of links, and so on, which is not limited in this embodiment of the present application.

S314. In response to the second search operation on the remaining events, present detailed information of the remaining events.

It should be noted that when the user triggers a viewing operation on the remaining events, the event integration device receives the second search operation acting on the remaining events; at this time, the event integration device presents the detailed information of the remaining events, thereby completing the response to the second search operation.

In the embodiment of the present application, when the event integration device is implemented as a server, S308 to S314 can be implemented by the server; alternatively, the server can send the event context to a terminal, and S308 to S314 can be implemented by the terminal; this is not limited in the embodiment of the present application.

It can be understood that the event context can provide additional information beyond the search words, actively uncover related reading needs while meeting the search need, and improve the completeness of the information presented on the search results page; it can reduce the number of searches in which the target information is not obtained, thereby reducing the resource consumption of the search process, and it can also improve the accuracy of delivering the searched information and increase the frequency of user searches.

Next, an exemplary application of the embodiment of the present application in an actual application scenario will be described. This exemplary application describes the event context obtained by mounting the latest progress event under the topic, and presents the event context in response to the user's search operation.

It should be noted that a news topic with a long duration (called a topic) is often composed of multiple events that have occurred (called at least one topic event). After the latest progress event of a news topic (called the event to be integrated) is obtained, the latest progress event is mounted under the news topic (called the target topic) to form an event context containing the latest progress event; by presenting the event context, the development of the event can be presented intuitively. When the event integration method provided by the embodiment of the present application is used to mount the latest progress event under the news topic, this can be realized through the two stages of recall and classification, including the following steps.

First, according to the event content of the latest progress event, possibly related news topics (called at least two topics to be integrated) are recalled from the news topic database (called topic library).

It should be noted that each news topic in the news topic database corresponds to topic keywords (called a topic key character string); when the server determines that the latest progress event matches any keyword (called a subtopic key character string) among the topic keywords of a news topic, it determines that the news topic is one of the possibly related news topics.

For example, see Figure 9, which is a schematic diagram of an exemplary news topic recall provided by the embodiment of this application; Figure 9 shows the title of the latest progress event 9-1. In the news topic database 9-2, the news topic 9-21 includes 3 events, and the corresponding topic keywords 9-211 are "nurse" and "vice dean"; the news topic 9-22 includes 4 events, and the corresponding topic keywords 9-221 are "H place" and "jumping the car"; the news topic 9-23 includes 4 events, and the corresponding topic keywords 9-231 are "Li San" and "the first object". When the topic keywords of each news topic in the news topic database 9-2 are matched against the latest progress event 9-1, "the first object" in the topic keywords 9-231 corresponding to the news topic 9-23 matches "the first object" in the title of the latest progress event 9-1; thus, the news topic 9-23 is one of the recalled possibly related news topics.

It should also be noted that, among the keywords of all topic events under a news topic, the two keywords with the largest number of topic events (called the first number threshold) are the topic keywords. The keywords of each topic event can be obtained through entity recognition and word weight analysis (called string weight analysis). Here, the server can use an entity recognition model (Char-Word Union CNN, CWCNN) to realize entity recognition, and use the entities of the person name type and the place name type among the recognized entities as the first keyword (called the entity key string); the server can use a classification model (for example, the "XGboost" model) to realize word weight analysis, and use the verbs whose weights are higher than the weight threshold as the second keyword (called the action key character string). If the number of words of the first keyword is greater than 3 (called the fifth set number), the second keyword is no longer considered, and only the first keyword is used as the keyword of the topic event; if the number of words of the first keyword is less than 3, the first keyword and the second keyword are jointly used as the keyword of the topic event.

Then, the similarity between each news topic among the possibly related news topics and the latest development event is obtained, so as to determine whether each news topic is related to the latest development event based on the similarity.

Referring to FIG. 10, FIG. 10 is an exemplary schematic diagram of determining whether a news topic is related to the latest progress event provided by an embodiment of the present application; as shown in FIG. 10, the server obtains the similarity between each possibly related news topic and the latest progress event from three aspects, namely vector semantic similarity 10-1 (called semantic similarity), keyword graph similarity 10-2 (called string graph similarity) and question-answer semantic similarity 10-3 (called question-answer similarity); finally, a fusion model 10-4 (for example, the "XGboost" model or the "GBDT" model) is used to combine the vector semantic similarity 10-1, the keyword graph similarity 10-2 and the question-answer semantic similarity 10-3 into a decision score 10-5, so as to determine, according to the decision score, whether each news topic is related to the latest progress event, and finally obtain the related news topic 10-6 (called the target topic), that is, the news topic to which the latest progress event belongs.

The calculation process of the similarity of each dimension is described below.

Vector semantic similarity 10-1 includes vector semantic statistical similarity (called semantic statistical similarity) and vector semantic self-attention similarity (called semantic self-attention similarity); the acquisition of the vector semantic statistical similarity is explained first.

The server calculates the similarity between the title of each topic event in the news topic and the title of the latest progress event to obtain the title vector semantic similarity (called the first sub-semantic similarity), and from it obtains the average title semantic similarity (called the average first sub-semantic similarity) and the maximum title semantic similarity (called the largest first sub-semantic similarity); it calculates the similarity between the keywords of each topic event in the news topic (called the topic event key string) and the keywords of the latest progress event (called the key string of the event to be integrated) to obtain the event keyword vector semantic similarity (called the second sub-semantic similarity), and from it obtains the average event keyword semantic similarity (called the average second sub-semantic similarity) and the maximum event keyword semantic similarity (called the largest second sub-semantic similarity); and it calculates the similarity between the topic keywords of the news topic and the keywords of the latest progress event to obtain the topic keyword vector semantic similarity (called the third sub-semantic similarity). Here, the title vector semantic similarity, average title semantic similarity, maximum title semantic similarity, event keyword vector semantic similarity, average event keyword semantic similarity, maximum event keyword semantic similarity and topic keyword vector semantic similarity are collectively referred to as the vector semantic statistical similarity.

It should be noted that calculating the similarity between the title of each topic event in the news topic and the title of the latest progress event, calculating the similarity between the keywords of each topic event in the news topic and the keywords of the latest progress event, and calculating the similarity between the topic keywords of the news topic and the keywords of the latest progress event can all be realized through a network model (the semantic statistical similarity model).

Referring to Figure 11a, Figure 11a is a schematic diagram of an exemplary model for obtaining vector semantic similarity provided by the embodiment of the present application; as shown in Figure 11a, the network model 11-1 is used to obtain the degree of similarity in vector semantics, and the network model 11-1 has a two-tower structure. Here, the processing of the network model 11-1 is described using the process of obtaining the topic keyword vector semantic similarity as an example: the topic keyword 11-2 corresponding to the news topic is input into the first network branch 11-11 (called the first semantic branch) in the network model 11-1 to obtain the semantic vector 11-3 corresponding to the topic keyword 11-2, and the keyword 11-4 corresponding to the latest progress event is input into the second network branch 11-12 (called the second semantic branch) in the network model 11-1 to obtain the semantic vector 11-5 corresponding to the keyword 11-4; then the degree of similarity between the semantic vector 11-3 and the semantic vector 11-5 is obtained through cosine similarity, which gives the topic keyword vector semantic similarity 11-6. In addition, the first network branch and the second network branch can be the same network branch, for example, both are "Bert" models; the semantic vector 11-3 and the semantic vector 11-5 are the starting character (called "CLS") vectors in the first dimension of the output of each network branch, and the corresponding dimension can be, for example, 768.
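The statistical features described above can be sketched once the semantic vectors are taken as given (for instance, the "CLS" vectors output by the two network branches). The function names, the dictionary keys and the zero-vector handling are assumptions of this sketch; cosine similarity is the comparison named for the two-tower model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_statistical_similarity(event_title_vec, topic_title_vecs,
                                    event_key_vec, topic_event_key_vecs,
                                    topic_key_vec):
    """Collect average/maximum title and keyword similarities for one topic.

    Each topic contains at least one topic event, so the lists are non-empty.
    """
    title_sims = [cosine(event_title_vec, v) for v in topic_title_vecs]
    key_sims = [cosine(event_key_vec, v) for v in topic_event_key_vecs]
    return {
        "title_sims": title_sims,
        "avg_title": sum(title_sims) / len(title_sims),
        "max_title": max(title_sims),
        "event_key_sims": key_sims,
        "avg_event_key": sum(key_sims) / len(key_sims),
        "max_event_key": max(key_sims),
        "topic_key": cosine(event_key_vec, topic_key_vec),
    }


features = semantic_statistical_similarity(
    [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
    [0.0, 2.0], [[0.0, 1.0]],
    [0.0, 3.0],
)
```

The toy two-dimensional vectors stand in for the 768-dimensional vectors mentioned above; the resulting values would be among the inputs to the fusion model.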

When obtaining the vector semantic self-attention similarity, refer to Figure 11b, which is a schematic diagram of another exemplary model for obtaining vector semantic similarity provided by the embodiment of the present application. As shown in Figure 11b, the encoding module 11-71 (for example, a "Bert" model) in the network model 11-7 is used to obtain the semantic vector of each event in the event sequence composed of the latest progress event and at least one topic event 11-72 in a news topic, which yields the vector feature sequence 11-73 corresponding, in order, to the event sequence. Here, the server sets the segment mark of the latest progress event to 0 (called a distinguishing mark) and the segment mark of a topic event to 1 (also called a distinguishing mark). The server obtains the semantic vector corresponding to 0 and the semantic vector corresponding to 1, merges the semantic vector corresponding to 0 with the semantic vector of the latest progress event (the semantic feature to be integrated), and merges the semantic vector corresponding to 1 with the semantic vector of each topic event (the topic event semantic feature). Finally, all the merged results are input into the Transformer model 11-74 to obtain the vector semantic self-attention similarity.

It should be noted that the Transformer model 11-74 is a network model used in natural language processing, and may be formed by stacking at least one (for example, three) Transformer models. The Transformer model 11-74 calculates the self-attention between every two events (called two sequence units) in the event sequence to adaptively determine the relationship between the latest progress event and the topic events, and thereby automatically determines which topic events the latest progress event should attend to during matching.

When obtaining the keyword graph similarity 10-2, the server constructs the keyword graph corresponding to each news topic and the keyword graph of the latest progress event, obtains representations of the two keyword graphs through a graph convolutional network, and calculates the cosine distance between the two representations to obtain the keyword graph similarity.

The keyword graph corresponding to each news topic is constructed as follows: each keyword of each topic event under the news topic is used as a graph node; if the two keywords corresponding to two graph nodes belong to the same topic event, an edge is built between the two graph nodes; the result is the keyword graph (called the first key character string graph) corresponding to the news topic.

For example, refer to Table 1, which includes the title column of the topic event and the keyword column of the topic event, and Table 1 is as follows.

Table 1

Title	Keywords
Li San signs management order: the first object will be blocked on August 23	first object | Li San | management | blockade
The first organization responds to Li San	Li San | first organization
The first department responds to Li San signing the management order involving the first object	first object | Li San | first department
The first organization announces it will sue Li San on August 24	Li San | first organization | prosecution
The second department plans to ban calling the first object from August 25	first object | plan | second department
The third department blocks the ban on the first object	first object | third department | ban
The second organization under the second object asks the third organization to suspend the ban on the first object	first object | third organization | second object
Zhang Er revokes the ban on the first object and the third object	Zhang Er | first object | third object

Refer to Figure 12 for the keyword graph built based on Table 1; Figure 12 is a schematic diagram of an exemplary keyword graph provided by the embodiment of the present application. As shown in Figure 12, the graph nodes in the keyword graph 12-1 are determined from the keywords of the topic events in Table 1 and include: first object, Li San, management, blockade, first organization, first department, prosecution, plan, second department, third department, ban, third organization, second object, Zhang Er and third object; the edges between the graph nodes are as shown in Figure 12.

Similarly, see FIG. 13, which is a schematic diagram of another exemplary keyword graph provided by the embodiment of the present application. As shown in FIG. 13, the graph nodes in the keyword graph 13-1 (called the second key character string graph) are the keywords of the latest progress event: first department, Zhang Er and first object; and an edge exists between any two of these three nodes.
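The graph construction rule (keywords as nodes, an edge between two keywords of the same event) can be sketched as below, using the keywords of Table 1 and of the latest progress event. The function name and the edge representation are illustrative assumptions; the graph convolutional network that turns each graph into a vector representation is not shown.

```python
from itertools import combinations

def build_keyword_graph(events_keywords):
    """Build an undirected keyword graph from one keyword list per event.

    Returns (nodes, edges); each edge is a sorted 2-tuple so that an
    undirected edge is stored only once.
    """
    nodes, edges = set(), set()
    for keywords in events_keywords:
        unique = sorted(set(keywords))
        nodes.update(unique)
        # Two keywords that belong to the same event are connected.
        edges.update(combinations(unique, 2))
    return nodes, edges

# Keywords of the eight topic events in Table 1.
table_1_keywords = [
    ["first object", "Li San", "management", "blockade"],
    ["Li San", "first organization"],
    ["first object", "Li San", "first department"],
    ["Li San", "first organization", "prosecution"],
    ["first object", "plan", "second department"],
    ["first object", "third department", "ban"],
    ["first object", "third organization", "second object"],
    ["Zhang Er", "first object", "third object"],
]
topic_nodes, topic_edges = build_keyword_graph(table_1_keywords)

# Keywords of the latest progress event (a single event, so a complete graph).
event_nodes, event_edges = build_keyword_graph(
    [["first department", "Zhang Er", "first object"]]
)
```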

When obtaining the question-answer semantic similarity 10-3, the server builds a sentence sequence (called the sentence sequence to be answered) based on the title and keywords of the news topic and on the latest progress event, obtains the output for the sentence sequence through a network model (for example, an "MRC-Bert" model), and determines the question-answer semantic similarity based on the first-dimension feature of the output (called the answer information).

Referring to FIG. 14, FIG. 14 is a schematic diagram of an exemplary acquisition of question-answer semantic similarity provided by an embodiment of the present application. As shown in FIG. 14, a sentence sequence 14-1 is input into a network model 14-2 (for example, an "MRC-Bert" model) to obtain the answer information 14-3, and the question-answer semantic similarity 10-3 is then determined based on the answer information 14-3. Here, "CLS" in the sentence sequence 14-1 indicates the start of the sentence sequence and "SEP" indicates the separation between sentences; the sentence sequence 14-1 also includes a question constructed based on the news topic, and the latest progress event. For example, for the latest progress event 9-1 titled "The first department responds to Zhang Er's revocation of the ban on the first object" and the news topic 9-23 whose keywords are "first object" and "Li San" and whose title is "Li San blocks the first object", the constructed sentence sequence may be "[CLS] Is the next sentence a progress of the news topic 'Li San blocks the first object', whose keywords are first object and Li San? [SEP] The first department responds to Zhang Er's revocation of the ban on the first object [SEP]", or may be "[CLS] Does the event 'The first department responds to Zhang Er's revocation of the ban on the first object' belong to the news topic 'Li San blocks the first object', whose keywords are first object and Li San? [SEP]".
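Assembling the sentence sequence to be answered can be sketched as plain string formatting. The question template below is an assumption patterned on the first example above, and the function name is hypothetical; in practice a tokenizer for the chosen model may insert the "[CLS]" and "[SEP]" markers itself.

```python
def build_sentence_sequence(topic_title, topic_keywords, event_title):
    """Combine a topic question and an event title into one marked sequence."""
    keywords = " and ".join(topic_keywords)
    question = (
        f"Is the next sentence a progress of the news topic "
        f"'{topic_title}', whose keywords are {keywords}?"
    )
    return f"[CLS] {question} [SEP] {event_title} [SEP]"

sequence = build_sentence_sequence(
    "Li San blocks the first object",
    ["first object", "Li San"],
    "The first department responds to Zhang Er's revocation of the ban on the first object",
)
```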

The exemplary application of the event integration method provided by the embodiment of the present application will be described below.

It should be noted that, when training the network model 11-1, the network model 11-7, the network model for obtaining the string graph similarity, and the network model 14-2, 50,000 topic-event sample pairs may be selected and constructed from more than 2,000 topics. For example, from all news topics launched and operated after a certain point in time, events that were reviewed and put online (corresponding to similar) and events that were not put online (corresponding to dissimilar) can be selected; the online events are used as positive samples and the events that were not put online as negative samples, and topic-event sample pairs are constructed in event order. For example, if topic A contains five online events a, b, c, d and e and two events f and g that were not put online (where events d and e and events f and g all occur after events a, b and c), then six topic-event sample pairs can be constructed, as shown below.

Positive samples: abc->d, abcd->e;

Negative samples: abc->f, abc->g, abcd->f, abcd->g.
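The six sample pairs above can be reproduced with the following sketch. The parameter `seed_size` (the number of online events that form the first context) is a hypothetical name introduced here to reflect that events d, e, f and g all occur after events a, b and c; pairing every context with every event that was not put online is likewise an assumption inferred from the listed pairs.

```python
def build_sample_pairs(online_events, offline_events, seed_size=3):
    """Build (context, candidate) pairs in event order.

    Each growing prefix of online events is a context; the next online
    event is its positive candidate, and every event that was not put
    online is a negative candidate.
    """
    positives, negatives = [], []
    for i in range(seed_size, len(online_events)):
        context = "".join(online_events[:i])
        positives.append((context, online_events[i]))
        for rejected in offline_events:
            negatives.append((context, rejected))
    return positives, negatives

positives, negatives = build_sample_pairs(list("abcde"), list("fg"))
```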

Refer to Figure 15, which is a schematic diagram of exemplary feature importance provided by the embodiment of the present application. As shown in Figure 15, the ordinate indicates the importance index, and the features in descending order of importance index are: question-answer semantic similarity 10-3, vector semantic self-attention similarity 15-1, maximum title semantic similarity 15-2, keyword graph similarity 10-2, maximum event keyword semantic similarity 15-3, average title semantic similarity 15-4 and topic keyword vector semantic similarity 15-5. The maximum title semantic similarity 15-2, the maximum event keyword semantic similarity 15-3, the average event keyword semantic similarity, the average title semantic similarity 15-4 and the topic keyword vector semantic similarity 15-5 together constitute the vector semantic statistical similarity within the vector semantic similarity 10-1 in FIG. 10.

In addition, when ablation tests are performed on the similarities of the 8 dimensions (question-answer semantic similarity 10-3, vector semantic self-attention similarity 15-1, maximum title semantic similarity 15-2, keyword graph similarity 10-2, maximum event keyword semantic similarity 15-3, average title semantic similarity 15-4, topic keyword vector semantic similarity 15-5, and average event keyword semantic similarity), the experimental results are as shown in Table 2.

Table 2

It can be seen from Table 2 that when the similarities of all 8 dimensions are used for event integration, the area under the curve (AUC, Area Under Curve) of the corresponding receiver operating characteristic (ROC, Receiver Operating Characteristic) curve is 0.9420. When the average title semantic similarity 15-4 is removed, the AUC is reduced by 0.0025; when the maximum title semantic similarity 15-2 is removed, by 0.0101; when the average event keyword semantic similarity is removed, by 0.0043; when the maximum event keyword semantic similarity 15-3 is removed, by 0.0000; when the topic keyword vector semantic similarity 15-5 is removed, by -0.0013; when the vector semantic self-attention similarity 15-1 is removed, by 0.0098; when the keyword graph similarity 10-2 is removed, by 0.0081; and when the question-answer semantic similarity 10-3 is removed, by 0.0152. This shows that the similarities of the eight dimensions contribute to the determination result during event integration, which is consistent with the importance index results.

It should be noted that AUC refers to the area enclosed by the ROC curve and the coordinate axis, which is a performance indicator.

In the embodiment of the present application, the similarities used to determine whether the latest progress event matches the topic event may also be selected based on the accuracy and time consumption of each similarity. Referring to Table 3, Table 3 describes the time consumption corresponding to the network model 11-1, the network model 11-7, the network model for obtaining the string graph similarity, and the network model 14-2.

Table 3

It can be seen from Table 3 that the network models in descending order of time consumption are: the network model 11-1, the network model for obtaining the string graph similarity, the network model 11-7, and the network model 14-2. The question-answer semantic similarity 10-3 is obtained through the network model 14-2; the vector semantic self-attention similarity 15-1 is obtained through the network model 11-7; the maximum title semantic similarity 15-2, the maximum event keyword semantic similarity 15-3, the average title semantic similarity 15-4, the topic keyword vector semantic similarity 15-5 and the average event keyword semantic similarity are obtained through the network model 11-1; and the keyword graph similarity 10-2 is obtained through the network model for obtaining the string graph similarity. Therefore, the server uses the network model 11-7 and the network model 14-2, which are the fastest and the most accurate, as the initial calculation scheme. When the similarity obtained by the network model 11-7 and the network model 14-2 is high enough (for example, greater than 0.7), a match is determined directly; when it is low enough (for example, less than 0.1), a mismatch is determined directly; only when the similarity is in the middle range (for example, 0.1 to 0.7) does the server continue to use the network model 11-1 and the network model for obtaining the string graph similarity to calculate similarities, and input the combined features of all the obtained similarities into the fusion model for the final judgment.

Exemplarily, the determination process is shown in formula (1).

Here, S₁ is the similarity output by the network model 11-7, and S₂ is the similarity output by the network model 14-2.
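Formula (1) is not reproduced in this text, so the following is only a sketch of the threshold logic described above, not the formula itself. It assumes that both S₁ and S₂ must exceed the upper threshold for a direct match and that both must fall below the lower threshold for a direct mismatch; how the two scores are actually combined is not stated here. The thresholds 0.1 and 0.7 are the example values given in the description, and the function and label names are hypothetical.

```python
def cascade_decision(s1, s2, low=0.1, high=0.7):
    """Decide from the two fast models alone whenever the scores are clear-cut.

    s1: similarity from network model 11-7, s2: similarity from network
    model 14-2. Returns "match", "no_match", or "undetermined"; the last
    value means the slower models and the fusion model are still needed.
    """
    if s1 > high and s2 > high:   # assumption: both scores must agree
        return "match"
    if s1 < low and s2 < low:     # assumption: both scores must agree
        return "no_match"
    return "undetermined"
```

Only the "undetermined" case incurs the cost of the network model 11-1 and the network model for obtaining the string graph similarity, which is where the time saving comes from.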

It can be understood that the embodiment of the present application integrates the information of multiple events of a topic by adopting multi-dimensional heterogeneous features, and that adopting models for three kinds of heterogeneous features (features based on vector semantics, features based on keyword graphs and features based on question-answer semantics) can improve the accuracy and rationality of the similarity calculation. In addition, by adopting the event integration method provided in the embodiment of the present application, automatic batch event integration can be realized without manual participation, and the efficiency of event integration can be improved.

The following continues to describe the exemplary structure of the event integration device 455 provided by the embodiment of the present application implemented as software modules. In some embodiments, as shown in FIG. 2, the software modules of the event integration device 455 stored in the memory 450 may include:

The information acquisition module 4551 is configured to acquire events to be integrated, and acquire at least two topics to be integrated, wherein each of the topics to be integrated includes at least one topic event;

The similarity acquisition module 4552 is configured to determine the target similarity between the event to be integrated and each topic to be integrated based on one or more of semantic similarity, string graph similarity and question-answer similarity, wherein the semantic similarity refers to the similarity in semantic features, the string graph similarity refers to the similarity in graph features corresponding to key strings, and the question-answer similarity refers to the similarity in question-answer features;

Topic determination module 4553, configured to determine the target topic to which the event to be integrated belongs from at least two topics to be integrated based on the target similarity;

The event integration module 4554 is configured to integrate the event to be integrated into the target topic to obtain an event context, wherein the event context includes the event to be integrated and at least one topic event.

In the embodiment of the present application, one or more of the semantic similarity, the string graph similarity, and the question-answer similarity are selected through selection logic. The selection logic includes one or more of selection order, acquisition speed, accuracy rate, topic scale, selection quantity, topic type, model training scale, model application range and model application scale, wherein the selection order is determined based on the priority of the similarities, the acquisition speed is the speed of obtaining a similarity, the accuracy rate is the accuracy of a similarity, the topic type is the content form of the topic to be integrated, the topic scale is the scale of the at least two topics to be integrated, and the model training scale is the scale of the training data corresponding to the network model used to obtain each type of similarity.

In this embodiment of the present application, when the selection logic includes the selection order, the similarity acquisition module 4552 is further configured to: based on the selection order, sequentially select a first set number of similarities from the semantic similarity, the string graph similarity and the question-answer similarity sorted in descending order of priority; obtain a comparison result between the first set number of similarities and a similarity threshold; when the comparison result indicates that the event to be integrated is similar to the topic to be integrated, determine the first set number of similarities as the target similarity between the event to be integrated and the topic to be integrated; and when the comparison result indicates that the similarity between the event to be integrated and the topic to be integrated is still pending, continue to select from the remaining similarities based on the selection order until a selection end condition is met, and determine the multiple selected similarities as the target similarity, wherein the remaining similarities are those among the semantic similarity, the string graph similarity and the question-answer similarity other than the first set number of similarities, and the selection end condition is either that the similarity result between the event to be integrated and the topic to be integrated has been determined, or that the semantic similarity, the string graph similarity and the question-answer similarity have all been selected.

In this embodiment of the present application, when the selection logic includes the acquisition speed and the topic scale, the similarity acquisition module 4552 is further configured to: when the topic scale is larger than a set scale, sequentially select a second set number of similarities from the semantic similarity, the string graph similarity and the question-answer similarity sorted in descending order of acquisition speed, to obtain the target similarity between the event to be integrated and the topic to be integrated; and when the topic scale is less than or equal to the set scale, sequentially select a third set number of similarities from the semantic similarity, the string graph similarity and the question-answer similarity sorted in descending order of acquisition speed, to obtain the target similarity between the event to be integrated and the topic to be integrated, wherein the second set number is smaller than the third set number.
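The scale-dependent selection can be sketched as follows. The function name, the dictionary keys and the speed values used in the usage lines are illustrative assumptions; only the rule (fewer similarities for larger topic scales, chosen fastest first) comes from the description.

```python
def select_by_speed(acquisition_speed, topic_scale, set_scale,
                    second_set_number, third_set_number):
    """Pick similarity types, fastest first; fewer when the topic scale is large.

    acquisition_speed maps a similarity name to its speed (higher = faster).
    """
    if second_set_number >= third_set_number:
        raise ValueError("second set number must be smaller than third set number")
    ranked = sorted(acquisition_speed, key=acquisition_speed.get, reverse=True)
    count = second_set_number if topic_scale > set_scale else third_set_number
    return ranked[:count]

# Hypothetical speeds, for illustration only.
speeds = {"semantic": 30.0, "string_graph": 20.0, "question_answer": 10.0}
large = select_by_speed(speeds, topic_scale=500, set_scale=100,
                        second_set_number=1, third_set_number=3)
small = select_by_speed(speeds, topic_scale=50, set_scale=100,
                        second_set_number=1, third_set_number=3)
```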

In this embodiment of the present application, when the target similarity includes multiple types among the semantic similarity, the string graph similarity, and the question-answer similarity, the topic determination module 4553 is further configured to: determine, based on the accuracy rate, the weight ratio of the various similarities in the target similarity; fuse the various similarities in the target similarity based on the weight ratio to obtain a discrimination similarity; and select, from the at least two topics to be integrated, the topic to be integrated that corresponds to the highest discrimination similarity, to obtain the target topic to which the event to be integrated belongs.
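A minimal sketch of the weighted fusion and topic selection follows. It assumes that each weight is simply the accuracy rate of the corresponding similarity normalized so that the weights sum to 1; the description does not fix how the weight ratio is derived from the accuracy rate, and all names and numbers below are hypothetical.

```python
def discrimination_similarity(similarities, accuracy):
    """Fuse several similarities with weights proportional to their accuracy."""
    total = sum(accuracy[name] for name in similarities)
    return sum(value * accuracy[name] / total
               for name, value in similarities.items())

def pick_target_topic(topic_similarities, accuracy):
    """Return the topic whose fused (discrimination) similarity is highest."""
    return max(
        topic_similarities,
        key=lambda topic: discrimination_similarity(topic_similarities[topic], accuracy),
    )

# Hypothetical accuracy rates and per-topic similarities.
accuracy = {"semantic": 0.8, "question_answer": 0.9}
topic_similarities = {
    "topic A": {"semantic": 0.9, "question_answer": 0.2},
    "topic B": {"semantic": 0.4, "question_answer": 0.8},
}
target = pick_target_topic(topic_similarities, accuracy)
```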

In the embodiment of the present application, the semantic similarity includes at least one of semantic self-attention similarity and semantic statistical similarity, wherein the semantic self-attention similarity is determined based on the self-attention between the event to be integrated and the topic event, the semantic statistical similarity is determined based on target semantics, and the target semantics refers to semantics corresponding to at least one of titles, key character strings, and event contents.

In this embodiment of the application, the semantic similarity includes the semantic self-attention similarity, and the similarity acquisition module 4552 is further configured to: acquire the semantic feature to be integrated corresponding to the event to be integrated, and the topic event semantic feature corresponding to each topic event in the topic to be integrated; enhance the semantic feature to be integrated based on the distinguishing mark of the event to be integrated to obtain a first enhanced semantic feature, and enhance the topic event semantic feature based on the distinguishing mark of the topic event to obtain a second enhanced semantic feature; combine the first enhanced semantic feature and the at least one second enhanced semantic feature corresponding to the topic to be integrated into a semantic feature sequence; and determine the semantic self-attention similarity based on the self-attention information between every two sequence units in the semantic feature sequence.

In this embodiment of the application, the similarity acquisition module 4552 is further configured to: in each of the topics to be integrated, acquire the first sub-semantic similarity between the title of each topic event and the title of the event to be integrated, and determine the average first sub-semantic similarity and the maximum first sub-semantic similarity based on the first sub-semantic similarities; in each of the topics to be integrated, acquire the second sub-semantic similarity between the topic event key string corresponding to each topic event and the event key string to be integrated corresponding to the event to be integrated, and determine the average second sub-semantic similarity and the maximum second sub-semantic similarity based on the second sub-semantic similarities; acquire the third sub-semantic similarity between the topic key string corresponding to each topic to be integrated and the event key string to be integrated; and determine the average first sub-semantic similarity, the maximum first sub-semantic similarity, the average second sub-semantic similarity, the maximum second sub-semantic similarity and the third sub-semantic similarity as the semantic statistical similarity.
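The five statistics that make up the semantic statistical similarity can be sketched as below. The pairwise `similarity` function is passed in as a parameter; the token-overlap `jaccard` function is only a hypothetical stand-in for the semantic statistical similarity model, and the dictionary layout of events and topics is likewise an assumption.

```python
def jaccard(a, b):
    """Hypothetical stand-in similarity: token overlap of two strings."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0

def semantic_statistical_similarity(event, topic, similarity=jaccard):
    """Average/maximum title and key-string similarities plus topic-level one."""
    title_sims = [similarity(e["title"], event["title"]) for e in topic["events"]]
    key_sims = [similarity(e["keywords"], event["keywords"]) for e in topic["events"]]
    return {
        "avg_first": sum(title_sims) / len(title_sims),   # average title similarity
        "max_first": max(title_sims),                     # maximum title similarity
        "avg_second": sum(key_sims) / len(key_sims),      # average event key-string similarity
        "max_second": max(key_sims),                      # maximum event key-string similarity
        "third": similarity(topic["keywords"], event["keywords"]),
    }

topic = {
    "keywords": "k1",
    "events": [
        {"title": "a b", "keywords": "k1 k2"},
        {"title": "c d", "keywords": "k3"},
    ],
}
event = {"title": "a b", "keywords": "k1"}
features = semantic_statistical_similarity(event, topic)
```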

In the embodiment of the present application, the first sub-semantic similarity, the second sub-semantic similarity and the third sub-semantic similarity are obtained through a semantic statistical similarity model. The event integration device 455 further includes a model training module 4555 configured to: obtain training samples, wherein the training samples include a first character string sample, a second character string sample and a label similarity; use the first semantic branch in the semantic statistical similarity model to be trained to obtain the first predicted semantics corresponding to the first character string sample, and use the second semantic branch in the semantic statistical similarity model to be trained to obtain the second predicted semantics corresponding to the second character string sample; determine an estimated similarity between the first character string sample and the second character string sample based on a comparison result between the first predicted semantics and the second predicted semantics; and perform back propagation in the semantic statistical similarity model to be trained based on the difference between the estimated similarity and the label similarity, to obtain the semantic statistical similarity model.

In the embodiment of the present application, the similarity acquisition module 4552 is further configured to: in each of the topics to be integrated, determine each subtopic event key string corresponding to the at least one topic event as a graph node, establish an edge between two graph nodes corresponding to the same topic event, and determine the graph nodes and the edges as the first key character string graph; construct a second key character string graph based on the event key string to be integrated corresponding to the event to be integrated; and compare the vector representation of the first key character string graph with the vector representation of the second key character string graph to obtain a graph comparison result, and determine the string graph similarity based on the graph comparison result.

In this embodiment of the application, the similarity acquisition module 4552 is further configured to: combine the title of each topic to be integrated, the topic key character string and the event to be integrated into a sentence sequence to be answered; obtain answer information of the sentence sequence to be answered; and determine the question-answer similarity based on the answer information.

In the embodiment of the present application, the information acquisition module 4551 is further configured to: obtain, in a topic library, the matching result between the topic key character string corresponding to each topic and the event to be integrated, wherein the topic library includes multiple topics; when it is determined, based on the matching result, that at least one subtopic key string in the topic key character string matches the event to be integrated, determine the topic corresponding to the matching result as a topic to be integrated that matches the event to be integrated; and acquire, from the topic library, at least two topics to be integrated that match the event to be integrated.
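This recall step can be sketched as a simple filter over the topic library. Treating "matches" as substring containment in the event text is an assumption made for illustration, and the topic names and keywords below are hypothetical.

```python
def recall_topics(topic_library, event_text):
    """Return topics with at least one topic key string found in the event text.

    topic_library maps a topic name to its list of topic key strings.
    """
    return [
        topic
        for topic, key_strings in topic_library.items()
        if any(key in event_text for key in key_strings)
    ]

topic_library = {
    "T1": ["first object", "Li San"],
    "T2": ["weather"],
    "T3": ["Zhang Er"],
}
event_text = ("The first department responds to Zhang Er's revocation "
              "of the ban on the first object")
candidates = recall_topics(topic_library, event_text)
```

Only the recalled candidates then need the more expensive similarity models.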

In the embodiment of the present application, the information acquisition module 4551 is further configured to: obtain, from the at least one topic event corresponding to each topic in the topic library, the topic event key character string corresponding to each topic event; count the number of topic events corresponding to each subtopic event key string in the topic event key character strings; and combine the fourth set number of subtopic event key strings with the largest numbers of topic events into the topic key character string corresponding to the topic.
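The counting rule can be sketched with the keywords of Table 1. The function name is hypothetical, and breaking ties alphabetically is an assumption added only to make the result deterministic.

```python
from collections import Counter

def select_topic_key_strings(events_key_strings, fourth_set_number):
    """Keep the key strings that appear in the largest number of topic events."""
    counts = Counter()
    for key_strings in events_key_strings:
        counts.update(set(key_strings))  # each topic event counts once per key string
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [key for key, _ in ranked[:fourth_set_number]]

# Keywords of the eight topic events in Table 1.
events_key_strings = [
    ["first object", "Li San", "management", "blockade"],
    ["Li San", "first organization"],
    ["first object", "Li San", "first department"],
    ["Li San", "first organization", "prosecution"],
    ["first object", "plan", "second department"],
    ["first object", "third department", "ban"],
    ["first object", "third organization", "second object"],
    ["Zhang Er", "first object", "third object"],
]
topic_key_strings = select_topic_key_strings(events_key_strings, 3)
```

Here "first object" occurs in six topic events, "Li San" in four and "first organization" in two, so these three are selected when the fourth set number is 3.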

In this embodiment of the application, the information acquisition module 4551 is further configured to: perform entity recognition on each of the topic events to obtain an entity key character string corresponding to a preset entity type; perform character string weight analysis on each of the topic events to obtain an action key character string; and determine the topic event key character string based on one or both of the entity key character string and the action key character string.

In this embodiment of the application, the information acquisition module 4551 is further configured to: acquire the number of entity key strings corresponding to the entity key character string; when the number of entity key strings is less than a fifth set number, combine the entity key character string and the action key character string into the topic event key character string; and when the number of entity key strings is greater than or equal to the fifth set number, determine the entity key character string as the topic event key character string.
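The threshold rule can be sketched as follows. Entity recognition and character string weight analysis are not implemented here; their outputs are passed in as plain lists, and the function name and the de-duplication of action key strings are assumptions.

```python
def topic_event_key_strings(entity_keys, action_keys, fifth_set_number):
    """Supplement entity key strings with action key strings only when too few."""
    if len(entity_keys) < fifth_set_number:
        extra = [key for key in action_keys if key not in entity_keys]
        return list(entity_keys) + extra
    return list(entity_keys)

few_entities = topic_event_key_strings(["Li San"], ["blockade"], 2)
enough_entities = topic_event_key_strings(["Li San", "first object"], ["blockade"], 2)
```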

In this embodiment of the application, the event integration device 455 further includes an event presentation module 4556 configured to: present a search control; in response to a first search operation acting on the search control, present a simplified event context corresponding to the event context, and a presentation control corresponding to the simplified event context, wherein the simplified event context belongs to the event context, and the presentation control is used to present the event context; in response to a presentation operation acting on the presentation control, present the event context, wherein each event in the presented event context includes an event title and an event time, and the event is any one of the event to be integrated and the at least one topic event; and in response to a view operation acting on the event title or the event time, present event detailed information.

In this embodiment of the application, the event presentation module 4556 is further configured to: present the last information to be presented of a target event, wherein the target event is any one of the event to be integrated and the at least one topic event included in the event context; present, in the recommendation area corresponding to the last information to be presented, the remaining events in the event context that are associated with the target event, wherein the remaining events are the events in the event context other than the target event; and in response to a second search operation on the remaining events, present detailed information of the remaining events.

An embodiment of the present application provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The processor of the electronic device (event integration device) reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the electronic device executes the above-mentioned event integration method in the embodiment of the present application.

The embodiment of the present application provides a computer-readable storage medium storing executable instructions; when the executable instructions are executed by a processor, the processor is caused to execute the event integration method provided in the embodiment of the present application, for example, the event integration method shown in FIG. 3.

In some embodiments of the present application, the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; it may also be any of various devices including one or any combination of the above memories.

In some embodiments of the present application, executable instructions may take the form of programs, software, software modules, scripts, or code written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

As an example, executable instructions may, but do not necessarily, correspond to files in a file system; they may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple cooperating files (for example, files that store one or more modules, subroutines, or sections of code).

As an example, executable instructions may be deployed to execute on one electronic device (in which case the one electronic device is the event integration device), on multiple electronic devices at one location (in which case the multiple electronic devices are the event integration device), or on multiple electronic devices distributed at multiple locations and interconnected through a communication network (in which case the multiple electronic devices are the event integration device).

It can be understood that the embodiments of this application involve data related to events and the like. When the embodiments of this application are applied to specific products or technologies, user permission or consent must be obtained, and the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

To sum up, through the embodiment of this application, the target topic to which the event to be integrated belongs is determined among at least two topics to be integrated by evaluating the target similarity between the event to be integrated and each topic to be integrated; that is, the target topic is determined by directly comparing these target similarities. Since the target similarity includes one or more of semantic similarity, string graph similarity and question-answer similarity, the obtained target similarity makes it possible to accurately determine whether each topic to be integrated is the target topic to which the event to be integrated belongs, so that the accuracy of event integration is improved when the event to be integrated is integrated into the target topic. In addition, in the process of event integration, the efficiency of event integration can be improved by first recalling topics and then obtaining the similarities.

The above descriptions are merely examples of the present application, and are not intended to limit the protection scope of the present application. Any modifications, equivalent replacements and improvements made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims

An event integration method, executed by an electronic device, the method comprising:

Acquiring an event to be integrated, and acquiring at least two topics to be integrated, wherein each topic to be integrated includes at least one topic event;

Determining, based on one or more of a semantic similarity, a string graph similarity and a question-answer similarity, a target similarity between the event to be integrated and each topic to be integrated, wherein the semantic similarity refers to the similarity of semantic features, the string graph similarity refers to the similarity of graph features corresponding to key character strings, and the question-answer similarity refers to the similarity of question-answer features;

Determining, based on the target similarity, a target topic to which the event to be integrated belongs from the at least two topics to be integrated;

Integrating the event to be integrated into the target topic to obtain an event context, wherein the event context includes the event to be integrated and at least one topic event.
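
As an example, the four steps of the method above can be sketched as follows. This is a minimal illustration and not the implementation of this application: the `Topic` container, the `word_overlap` stand-in for the target similarity, and the rule of taking the highest score are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Topic:
    """Hypothetical container: a topic name plus the topic events it holds."""
    name: str
    events: List[str] = field(default_factory=list)


def word_overlap(event: str, topic: Topic) -> float:
    """Toy target similarity: best Jaccard word overlap with any topic event."""
    words = set(event.lower().split())
    best = 0.0
    for topic_event in topic.events:
        other = set(topic_event.lower().split())
        union = words | other
        if union:
            best = max(best, len(words & other) / len(union))
    return best


def integrate_event(event: str, topics: List[Topic],
                    similarity: Callable[[str, Topic], float] = word_overlap) -> Topic:
    """Append the event to the most similar topic and return that topic."""
    if len(topics) < 2:
        raise ValueError("at least two topics to be integrated are required")
    # Target similarity between the event and each topic to be integrated.
    scores = [similarity(event, topic) for topic in topics]
    # Assumption: the target topic is the one with the highest target similarity.
    target = topics[scores.index(max(scores))]
    # The updated event list plays the role of the event context.
    target.events.append(event)
    return target
```

Any function with the same signature as `word_overlap` can be passed as `similarity`, for example one that combines several of the similarities named in the method.
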
The method according to claim 1, wherein one or more of the semantic similarity, the string graph similarity and the question-answer similarity are selected by selection logic;

The selection logic includes one or more of a selection order, an acquisition speed, an accuracy rate, a topic scale, a selection quantity, a topic type, a model training scale, a model applicable scope and a model applicable scale, wherein the selection order is determined based on the priority of each similarity, the acquisition speed is the speed at which each similarity is obtained, the accuracy rate is the accuracy of each similarity, the topic type is the content form of the topics to be integrated, the topic scale is the size of the at least two topics to be integrated, and the model training scale is the size of the training data corresponding to the network model used to obtain each similarity.
The method according to claim 2, wherein, when the selection logic includes the selection order, the determining, based on one or more of the semantic similarity, the string graph similarity and the question-answer similarity, the target similarity between the event to be integrated and each topic to be integrated includes:

Selecting, based on the selection order, a first set number of similarities in descending order of priority from the semantic similarity, the string graph similarity and the question-answer similarity;

Acquiring a comparison result between the first set number of similarities and a similarity threshold;

When the comparison result indicates that the event to be integrated and the topic to be integrated are similar, determining the first set number of similarities as the target similarity between the event to be integrated and the topic to be integrated;

When the comparison result indicates that the similarity between the event to be integrated and the topic to be integrated is pending, continuing to select from the remaining similarities based on the selection order until a selection end condition is met, and determining the selected multiple similarities as the target similarity, wherein the remaining similarities are those among the semantic similarity, the string graph similarity and the question-answer similarity other than the first set number of similarities, and the selection end condition is that whether the event to be integrated and the topic to be integrated are similar has been determined, or that the semantic similarity, the string graph similarity and the question-answer similarity have all been selected.
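
As an example, the priority-ordered selection above can be sketched as a short cascade. The sketch makes several assumptions that the claim does not fix: the two thresholds `low` and `high`, the use of the mean of the selected scores for the comparison, and the name `select_similarities` are all hypothetical.

```python
from typing import Callable, List, Tuple

# (name, zero-argument function returning a score in [0, 1]).
# The order of the list encodes the descending priority.
Scorer = Tuple[str, Callable[[], float]]


def select_similarities(scorers: List[Scorer], first_n: int,
                        low: float, high: float) -> List[Tuple[str, float]]:
    """Compute similarities in priority order and stop once the verdict is clear."""
    selected: List[Tuple[str, float]] = []
    for name, compute in scorers:
        selected.append((name, compute()))
        if len(selected) < first_n:
            continue  # still collecting the first set number of similarities
        mean = sum(score for _, score in selected) / len(selected)
        # Assumption: mean >= high means "similar", mean <= low means
        # "not similar"; anything in between is "pending", so the next
        # similarity in priority order is selected.
        if mean >= high or mean <= low:
            break
    # Reached either by a clear verdict or after all similarities were selected.
    return selected
```

Because each similarity is computed only when it is selected, a clear result from a high-priority similarity avoids the cost of computing the remaining ones.
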
The method according to claim 2, wherein, when the selection logic includes the acquisition speed and the topic scale, the determining, based on one or more of the semantic similarity, the string graph similarity and the question-answer similarity, the target similarity between the event to be integrated and each topic to be integrated includes:

When the topic scale is larger than a set scale, selecting a second set number of similarities in descending order of acquisition speed from the semantic similarity, the string graph similarity and the question-answer similarity, to obtain the target similarity between the event to be integrated and the topic to be integrated;

When the topic scale is less than or equal to the set scale, selecting a third set number of similarities in descending order of acquisition speed from the semantic similarity, the string graph similarity and the question-answer similarity, to obtain the target similarity between the event to be integrated and the topic to be integrated, wherein the second set number is smaller than the third set number.
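
As an example, the scale-dependent selection above can be sketched as follows. The function name and the speed values used to rank the similarities are hypothetical; the claim does not state which similarity is fastest to obtain.

```python
from typing import Dict, List


def select_by_scale(speeds: Dict[str, float], topic_scale: int, set_scale: int,
                    second_number: int, third_number: int) -> List[str]:
    """Return the names of the fastest similarities, fewer when there are many topics."""
    if second_number >= third_number:
        raise ValueError("the second set number must be smaller than the third")
    # Rank similarity names by acquisition speed, fastest first.
    ranked = sorted(speeds, key=lambda name: speeds[name], reverse=True)
    # A large topic scale gets the smaller (second) set number.
    count = second_number if topic_scale > set_scale else third_number
    return ranked[:count]
```

With many topics to be integrated, only the fastest similarities are computed; with few topics, more similarities can be afforded.
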
The method according to any one of claims 1 to 4, wherein, when the target similarity includes more than one of the semantic similarity, the string graph similarity and the question-answer similarity, the determining, based on the target similarity, the target topic to which the event to be integrated belongs from the at least two topics to be integrated includes:

Determining, based on the accuracy rate, a weight ratio of the similarities in the target similarity;

Fusing, based on the weight ratio, the similarities in the target similarity to obtain a discriminant similarity;

Selecting, from the at least two topics to be integrated, the topic to be integrated that corresponds to the highest discriminant similarity, to obtain the target topic to which the event to be integrated belongs.
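
As an example, the weighted fusion above can be sketched as follows. The sketch assumes that each weight is the accuracy rate of the corresponding similarity, normalized so that the weights sum to one; the claim only states that the weight ratio is determined based on the accuracy rate. The function names are hypothetical.

```python
from typing import Dict


def fuse(similarities: Dict[str, float], accuracy: Dict[str, float]) -> float:
    """Weighted average of the similarities, weighted by their accuracy rates."""
    total = sum(accuracy[name] for name in similarities)
    return sum(score * accuracy[name] / total
               for name, score in similarities.items())


def pick_target_topic(per_topic: Dict[str, Dict[str, float]],
                      accuracy: Dict[str, float]) -> str:
    """Return the topic whose fused (discriminant) similarity is highest."""
    return max(per_topic, key=lambda topic: fuse(per_topic[topic], accuracy))
```
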
The method according to claim 1, wherein the semantic similarity includes at least one of a semantic self-attention similarity and a semantic statistical similarity, wherein the semantic self-attention similarity is determined based on the self-attention between the event to be integrated and the topic events, and the semantic statistical similarity is determined based on target semantics, where the target semantics refers to the semantics corresponding to at least one of titles, key character strings, and event contents.
The method according to any one of claims 1 to 4, 6, wherein the semantic similarity comprises a semantic self-attention similarity, and the semantic self-attention similarity is obtained by the following steps:

Acquiring semantic features to be integrated corresponding to the event to be integrated, and topic event semantic features corresponding to each topic event in the topic to be integrated;

Based on distinguishing marks of the event to be integrated and the topic event, enhancing the semantic features to be integrated to obtain a first enhanced semantic feature, and enhancing the topic event semantic features based on the distinguishing marks to obtain a second enhanced semantic feature;

Composing the first enhanced semantic feature and at least one second enhanced semantic feature corresponding to the topic to be integrated into a semantic feature sequence, and determining the semantic self-attention similarity based on the self-attention information between two sequence units in the semantic feature sequence.
The method according to any one of claims 1 to 4, 6, wherein the semantic similarity includes semantic statistical similarity, and the semantic statistical similarity is obtained by the following steps:

In each topic to be integrated, obtaining a first sub-semantic similarity between the title of each topic event and the title of the event to be integrated, and determining an average first sub-semantic similarity and a maximum first sub-semantic similarity based on the first sub-semantic similarities;

In each topic to be integrated, obtaining a second sub-semantic similarity between the topic event key string corresponding to each topic event and the key string of the event to be integrated, and determining an average second sub-semantic similarity and a maximum second sub-semantic similarity based on the second sub-semantic similarities;

Acquiring a third sub-semantic similarity between the topic key string corresponding to each topic to be integrated and the key string of the event to be integrated;

Determining the average first sub-semantic similarity, the maximum first sub-semantic similarity, the average second sub-semantic similarity, the maximum second sub-semantic similarity and the third sub-semantic similarity as the semantic statistical similarity.
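
As an example, the five statistics above can be sketched as follows. The pairwise measure `jaccard` is only a stand-in, since the claim does not specify how each sub-semantic similarity is computed, and the function and parameter names are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in pairwise similarity: overlap of two sets of strings."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def semantic_statistical_similarity(event_title: str, event_keys: Set[str],
                                    topic_event_titles: Sequence[str],
                                    topic_event_keys: Sequence[Set[str]],
                                    topic_keys: Set[str]) -> List[float]:
    """Return [avg title, max title, avg key string, max key string, topic key string]."""
    title_words = set(event_title.split())
    # First sub-semantic similarities: event title against each topic event title.
    first = [jaccard(title_words, set(title.split())) for title in topic_event_titles]
    # Second sub-semantic similarities: event key strings against each topic event's.
    second = [jaccard(event_keys, keys) for keys in topic_event_keys]
    # Third sub-semantic similarity: event key strings against the topic key strings.
    third = jaccard(event_keys, topic_keys)
    return [fmean(first), max(first), fmean(second), max(second), third]
```

The resulting five-element list can be used as a feature vector for one topic to be integrated.
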
The method according to any one of claims 1 to 4, 6, wherein the target similarity includes the string graph similarity, and the string graph similarity is obtained by the following steps:

In each topic to be integrated, determining each subtopic event key string corresponding to the at least one topic event as a graph node, establishing an edge between two graph nodes corresponding to the same topic event, and determining the graph nodes and the edges as a first key string graph;

Constructing a second key string graph based on the key string of the event to be integrated;

Comparing the vector representation of the first key string graph with the vector representation of the second key string graph to obtain a graph comparison result, and determining the string graph similarity based on the graph comparison result.
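
As an example, the graph construction and comparison above can be sketched as follows. The sketch assumes a very simple vector representation, namely a binary indicator over the nodes and edges of each graph, compared by cosine similarity; the claim does not specify the representation, and a learned graph embedding could take its place. All names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import FrozenSet, Iterable, Tuple

Feature = Tuple[str, ...]


def key_string_graph(events: Iterable[Iterable[str]]) -> FrozenSet[Feature]:
    """Encode a key string graph as the set of its nodes and edges."""
    features = set()
    for key_strings in events:
        unique = sorted(set(key_strings))
        # Every key string is a graph node.
        features.update(("node", key) for key in unique)
        # Key strings that belong to the same event are joined by an edge.
        features.update(("edge", a, b) for a, b in combinations(unique, 2))
    return frozenset(features)


def graph_similarity(first: FrozenSet[Feature], second: FrozenSet[Feature]) -> float:
    """Cosine similarity of the binary indicator vectors of the two graphs."""
    if not first or not second:
        return 0.0
    return len(first & second) / sqrt(len(first) * len(second))
```

Here the first graph is built from the key strings of all topic events in a topic, and the second graph from the key strings of the single event to be integrated.
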
The method according to any one of claims 1 to 4, 6, wherein the target similarity includes the question-answer similarity, and the question-answer similarity is obtained by the following steps:

Combining the title of each topic to be integrated, the topic key string and the event to be integrated into a sentence sequence to be answered;

Acquiring answer information of the sentence sequence to be answered;

Based on the answer information, the question-answer similarity is determined.
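
As an example, the question-answer steps above can be sketched as follows. Almost everything here is an assumption: the ` [SEP] ` separator, the idea that the answer information is a probability that the event belongs to the topic, and the `answer_model` callable, which stands for whatever question-answering model is used and is not defined by this sketch.

```python
from typing import Callable, Sequence


def build_sentence_sequence(topic_title: str, topic_key_strings: Sequence[str],
                            event_text: str) -> str:
    """Join the topic title, topic key strings and event into one sequence."""
    # The " [SEP] " separator is an assumption, not taken from the claim.
    return " [SEP] ".join([topic_title, " ".join(topic_key_strings), event_text])


def question_answer_similarity(answer_model: Callable[[str], float],
                               topic_title: str,
                               topic_key_strings: Sequence[str],
                               event_text: str) -> float:
    """Use the (hypothetical) model's answer score as the similarity."""
    sequence = build_sentence_sequence(topic_title, topic_key_strings, event_text)
    answer = answer_model(sequence)
    # Clamp to [0, 1] so the result is comparable with the other similarities.
    return min(1.0, max(0.0, answer))
```
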
The method according to any one of claims 1 to 4, 6, wherein said obtaining at least two topics to be integrated comprises:

Obtaining, in a topic library, a matching result between the topic key string corresponding to each topic and the event to be integrated, wherein the topic library includes a plurality of topics;

When it is determined, based on the matching result, that at least one subtopic key string in the topic key string matches the event to be integrated, determining the topic corresponding to the matching result as a topic to be integrated that matches the event to be integrated;

Obtaining, from the topic library, at least two topics to be integrated that match the event to be integrated.
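
As an example, the recall step above can be sketched as follows. The sketch assumes that a subtopic key string "matches" when it occurs in the event text, ignoring case; the claim does not define the matching rule, and the names are hypothetical.

```python
from typing import Dict, List, Sequence


def recall_topics(event_text: str,
                  topic_library: Dict[str, Sequence[str]]) -> List[str]:
    """Return the names of topics with at least one key string found in the event."""
    text = event_text.lower()
    return [name for name, key_strings in topic_library.items()
            if any(key.lower() in text for key in key_strings)]
```

Only the recalled topics are then passed on to the more expensive similarity computation.
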
The method according to claim 11, wherein, before the obtaining, in the topic library, the matching result between the topic key string corresponding to each topic and the event to be integrated, the method further includes:

Perform entity recognition on each of the topic events to obtain an entity key string corresponding to a preset entity type;

Perform string weight analysis on each topic event to obtain an action key string;

Determine a topic event key string based on one or both of the entity key string and the action key string;

Count the number of topic events corresponding to each subtopic event key string in the topic event key string;

Selecting a fourth set number of subtopic event key strings in descending order of the number of corresponding topic events, to obtain the topic key string.
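
As an example, the counting and selection steps above can be sketched as follows. The sketch starts from key strings that have already been extracted for each topic event, so entity recognition and string weight analysis are not shown. The alphabetical tie-break and the function name are assumptions.

```python
from collections import Counter
from typing import Iterable, List


def topic_key_strings(event_key_strings: Iterable[Iterable[str]],
                      fourth_number: int) -> List[str]:
    """Pick the key strings that occur in the largest number of topic events."""
    counts: Counter = Counter()
    for key_strings in event_key_strings:
        # Count each key string once per topic event.
        counts.update(set(key_strings))
    # Sort by descending event count; ties are broken alphabetically (assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [key for key, _ in ranked[:fourth_number]]
```
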
The method according to any one of claims 1 to 4, 6, wherein, after the integrating the event to be integrated into the target topic to obtain the event context, the method further includes:

Rendering a search control;

In response to a first search operation acting on the search control, presenting a simplified event context corresponding to the event context and a presentation control corresponding to the simplified event context, wherein the simplified event context belongs to the event context, and the presentation control is used to present the event context;

In response to a presentation operation acting on the presentation control, presenting the event context, wherein each event in the presented event context includes an event title and an event time, and the event is any one of the event to be integrated and the at least one topic event;

Event detail information is presented in response to a view operation acting on the event title or the event time.
An event integration device, the event integration device comprising:

An information acquisition module configured to acquire an event to be integrated, and acquire at least two topics to be integrated, wherein each of the topics to be integrated includes at least one topic event;

A similarity acquisition module configured to determine the target similarity between the event to be integrated and each topic to be integrated based on one or more of semantic similarity, string graph similarity and question-answer similarity, wherein the semantic similarity refers to the similarity of semantic features, the string graph similarity refers to the similarity of graph features corresponding to key character strings, and the question-answer similarity refers to the similarity of question-answer features;

A topic determination module configured to determine the target topic to which the event to be integrated belongs from at least two topics to be integrated based on the target similarity;

The event integration module is configured to integrate the event to be integrated into the target topic to obtain an event context, wherein the event context includes the event to be integrated and at least one topic event.
An electronic device for event integration, the electronic device comprising:

A memory configured to store executable instructions;

A processor configured to implement the event integration method according to any one of claims 1 to 13 when executing the executable instructions stored in the memory.
A computer-readable storage medium storing executable instructions for implementing the event integration method according to any one of claims 1 to 13 when executed by a processor.
A computer program product, including computer programs or instructions, wherein, when the computer programs or instructions are executed by a processor, the event integration method according to any one of claims 1 to 13 is implemented.