CN115840796A - Event integration method, device, equipment and computer readable storage medium - Google Patents

Event integration method, device, equipment and computer readable storage medium

Info

Publication number
CN115840796A
Authority
CN
China
Prior art keywords
topic
event
similarity
integrated
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111111428.XA
Other languages
Chinese (zh)
Inventor
房育勋
朱斌
刘晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111111428.XA
Priority to PCT/CN2022/111164 (WO2023040516A1)
Publication of CN115840796A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an event integration method, apparatus, device and computer-readable storage medium, applicable to scenarios such as cloud technology, artificial intelligence, intelligent transportation and in-vehicle systems. The method comprises: acquiring an event to be integrated and at least one topic to be integrated, wherein each topic to be integrated comprises at least one topic event; selecting, based on selection logic, one or more of semantic similarity, character string graph similarity and question-answer similarity to obtain a target similarity between the event to be integrated and each topic to be integrated, wherein the character string graph similarity refers to similarity of graph features corresponding to key character strings, and the question-answer similarity refers to similarity of question-answer features; determining, based on the target similarity, a target topic to which the event to be integrated belongs from the at least one topic to be integrated; and integrating the event to be integrated into the target topic to obtain an event context comprising the event to be integrated and the at least one topic event. The method and apparatus can improve the accuracy of event integration.

Description

Event integration method, device, equipment and computer readable storage medium
Technical Field
The present application relates to information processing technologies in the field of computer applications, and in particular, to an event integration method, apparatus, device, and computer-readable storage medium.
Background
For topics with long duration (often composed of a plurality of occurred events), when a latest progress event is obtained, the latest progress event needs to be integrated under a corresponding topic to form an event context containing the latest progress event, so that a user can intuitively know the progress of the event through the event context.
To integrate the latest progress event under a topic, a clustering method is generally adopted: the latest progress event is incrementally clustered with the topics, and the topic to which it belongs is determined according to a cluster center and a threshold. However, incremental clustering has low accuracy, so the determined topic is also inaccurate, and the accuracy of event integration is low when the latest progress event is integrated into that topic.
Disclosure of Invention
The embodiment of the application provides an event integration method, an event integration device, equipment and a computer-readable storage medium, and the event integration efficiency can be improved.
The technical scheme of the embodiment of the application is realized as follows:
an embodiment of the present application provides an event integration method, including:
acquiring events to be integrated and acquiring at least one topic to be integrated, wherein each topic to be integrated comprises at least one topic event;
based on selection logic, selecting one or more from semantic similarity, character string graph similarity and question-answer similarity to obtain target similarity between the event to be integrated and each topic to be integrated, wherein the semantic similarity refers to similarity in the aspect of semantic features, the character string graph similarity refers to similarity in the aspect of graph features corresponding to key character strings, and the question-answer similarity refers to similarity in the aspect of question-answer features;
determining a target topic to which the event to be integrated belongs from at least one topic to be integrated based on the target similarity;
integrating the events to be integrated into the target topic to obtain an event context comprising the events to be integrated and at least one topic event.
An embodiment of the present application provides an event integration apparatus, including:
an information acquisition module, configured to acquire an event to be integrated and at least one topic to be integrated, wherein each topic to be integrated comprises at least one topic event;
the similarity obtaining module is used for selecting one or more of semantic similarity, character string graph similarity and question and answer similarity based on selection logic to obtain target similarity between the event to be integrated and each topic to be integrated, wherein the semantic similarity refers to similarity in the aspect of semantic features, the character string graph similarity refers to similarity in the aspect of graph features corresponding to key character strings, and the question and answer similarity refers to similarity in the aspect of question and answer features;
the topic determination module is used for determining a target topic to which the event to be integrated belongs from at least one topic to be integrated based on the target similarity;
and the event integration module is used for integrating the events to be integrated into the target topic to obtain an event context comprising the events to be integrated and at least one topic event.
In an embodiment of the present application, the selection logic includes one or more of a selection order, an acquisition speed, an accuracy rate, a topic scale, a selection number, a topic type, a model training scale, a model application range, and a model application scale, where the selection order is determined based on the priority of each similarity, the acquisition speed is the speed at which a similarity is acquired, the accuracy rate is the degree of accuracy of a similarity, the topic type is the content form of the topic to be integrated, the topic scale is the scale of the at least one topic to be integrated, and the model training scale is the scale of the training data for the network model used to acquire each similarity.
In this embodiment of the application, when the selection logic includes the selection order, the similarity obtaining module is further configured to: select, based on the selection order, a first set number of similarities from the semantic similarity, the string graph similarity and the question-answer similarity in descending order of priority; obtain a comparison result between the first set number of similarities and a similarity threshold; when the comparison result indicates that the event to be integrated and the topic to be integrated are similar, determine the first set number of similarities as the target similarity between the event to be integrated and the topic to be integrated; and when the comparison result indicates that the similarity between the event to be integrated and the topic to be integrated is pending, continue to select the remaining similarities based on the selection order until a similarity result is determined or all three of the semantic similarity, the string graph similarity and the question-answer similarity have been selected, and determine the selected similarities as the target similarity between the event to be integrated and the topic to be integrated, wherein the remaining similarities are those of the semantic similarity, the string graph similarity and the question-answer similarity other than the first set number of similarities.
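The priority-ordered selection described above can be sketched as a simple cascade. This is only an illustration under stated assumptions: the names `compare` and `select_by_priority`, the thresholds `hi` and `lo`, the use of the mean score as the comparison, and stopping early on a "dissimilar" verdict are all hypothetical choices that the text does not specify.

```python
from typing import Callable, Dict, List, Tuple

# A scorer is a (name, function) pair; the list is assumed to be sorted
# by descending priority before it is passed in.
Scorer = Tuple[str, Callable[[], float]]


def compare(scores: List[float], hi: float, lo: float) -> str:
    """Hypothetical comparison: mean score against two thresholds."""
    mean = sum(scores) / len(scores)
    if mean >= hi:
        return "similar"
    if mean <= lo:
        return "dissimilar"
    return "pending"


def select_by_priority(
    scorers: List[Scorer],
    first_count: int = 1,
    hi: float = 0.8,
    lo: float = 0.2,
) -> Tuple[Dict[str, float], str]:
    """Compute similarities in priority order until the result is decided."""
    selected: Dict[str, float] = {}
    verdict = "pending"
    for index, (name, score_fn) in enumerate(scorers, start=1):
        selected[name] = score_fn()  # lower-priority scores are computed lazily
        if index < first_count:
            continue  # keep collecting until the first set number is reached
        verdict = compare(list(selected.values()), hi, lo)
        if verdict != "pending":
            break  # decided: no need to compute the remaining similarities
    return selected, verdict
```

Because each scorer is a callable, a lower-priority similarity is never computed once the result has been decided.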
In this embodiment of the application, when the selection logic includes the acquisition speed and the topic scale, the similarity acquisition module is further configured to, when the topic scale is greater than a set scale, sequentially select a second set number of similarities from a descending order of the acquisition speeds of the semantic similarity, the string graph similarity, and the question and answer similarity, so as to obtain the target similarity between the event to be integrated and the topic to be integrated; when the topic scale is smaller than or equal to the set scale, sequentially selecting a third set number of similarities from descending order of the semantic similarity, the character string graph similarity and the acquisition speed of the question-answer similarity to obtain the target similarity between the event to be integrated and the topic to be integrated, wherein the second set number is smaller than the third set number.
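As a minimal sketch of the speed-and-scale rule above: when there are many candidate topics, fewer (and faster) similarities are selected. The function name, the `speeds` mapping and the default counts are illustrative assumptions, not values given in the text.

```python
from typing import Dict, List


def select_by_speed(
    speeds: Dict[str, float],
    topic_scale: int,
    set_scale: int,
    second_count: int = 1,
    third_count: int = 3,
) -> List[str]:
    """Pick similarity types in descending order of acquisition speed."""
    if second_count >= third_count:
        raise ValueError("second set number must be smaller than third set number")
    # Fastest similarity first.
    ranked = sorted(speeds, key=speeds.get, reverse=True)
    # Large topic pools get the smaller budget, small pools the larger one.
    count = second_count if topic_scale > set_scale else third_count
    return ranked[:count]
```

For example, with hypothetical speeds `{"semantic": 100.0, "string_graph": 20.0, "question_answer": 5.0}`, a topic scale above the set scale selects only the semantic similarity, while a smaller topic scale selects all three.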
In this embodiment of the application, when the target similarity includes multiple ones of the semantic similarity, the string graph similarity, and the question-answer similarity, the topic determination module is further configured to determine, based on an accuracy rate, a weight ratio of each similarity in the target similarity; fusing multiple similarity degrees in the target similarity degrees based on the weight ratio to obtain a discrimination similarity degree; and selecting the topic to be integrated corresponding to the highest discrimination similarity from at least one topic to be integrated to obtain the target topic to which the event to be integrated belongs.
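The weighted fusion above might look like the following sketch. It assumes, purely for illustration, that each weight is the similarity's accuracy rate normalized so the weights sum to 1; the text only says the weights are determined based on accuracy. The names `fuse` and `pick_target_topic` are hypothetical.

```python
from typing import Dict, Tuple


def fuse(similarities: Dict[str, float], accuracy: Dict[str, float]) -> float:
    """Weighted sum of similarities, weights proportional to accuracy (assumed)."""
    total = sum(accuracy[name] for name in similarities)
    return sum(score * accuracy[name] / total for name, score in similarities.items())


def pick_target_topic(
    per_topic: Dict[str, Dict[str, float]],
    accuracy: Dict[str, float],
) -> Tuple[str, Dict[str, float]]:
    """Return the topic with the highest fused (discrimination) similarity."""
    fused = {topic: fuse(sims, accuracy) for topic, sims in per_topic.items()}
    best = max(fused, key=fused.get)
    return best, fused
```

A practical system would likely also apply a minimum threshold to the highest fused score before accepting the topic; that step is omitted here because the passage does not describe it.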
In this embodiment of the application, the semantic similarity includes a semantic self-attention similarity, and the similarity obtaining module is further configured to obtain a to-be-integrated semantic feature corresponding to the to-be-integrated event and a topic event semantic feature corresponding to each topic event in the to-be-integrated topics; enhancing the semantic features to be integrated based on the distinguishing identifications of the events to be integrated and the topic events to obtain first enhanced semantic features, and enhancing the semantic features of the topic events based on the distinguishing identifications to obtain second enhanced semantic features; and forming the first enhanced semantic features and at least one second enhanced semantic feature corresponding to the topic to be integrated into a semantic feature sequence, and determining the semantic self-attention similarity based on self-attention information between two sequence units in the semantic feature sequence.
In this embodiment of the application, the similarity obtaining module is further configured to obtain, in each topic to be integrated, a first sub-semantic similarity between the title of each topic event and the title of the event to be integrated, and determine an average first sub-semantic similarity and a maximum first sub-semantic similarity based on the first sub-semantic similarity; in each topic to be integrated, obtaining a second sub-semantic similarity between a topic event key character string corresponding to each topic event and a to-be-integrated event key character string corresponding to the to-be-integrated event, and determining an average second sub-semantic similarity and a maximum second sub-semantic similarity based on the second sub-semantic similarity; acquiring a third sub-semantic similarity between the topic key character string corresponding to each topic to be integrated and the event key character string to be integrated; determining the average first sub-semantic similarity, the maximum first sub-semantic similarity, the average second sub-semantic similarity, the maximum second sub-semantic similarity and the third sub-semantic similarity as the semantic statistical similarity between the event to be integrated and each topic to be integrated.
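The five statistics above can be assembled as in the sketch below. A token-overlap (Jaccard) score stands in for the trained semantic statistical similarity model, which is an assumption made only to keep the example self-contained; the dictionary layout of events and topics is likewise hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity: token overlap, NOT the trained model from the text."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def semantic_statistical_similarity(
    event: Dict[str, str],
    topic: Dict,
    sim: Callable[[str, str], float] = jaccard,
) -> List[float]:
    """Return [avg title, max title, avg keyword, max keyword, topic keyword]."""
    # First sub-semantic similarity: topic event titles vs. event title.
    title_sims = [sim(e["title"], event["title"]) for e in topic["events"]]
    # Second sub-semantic similarity: topic event key strings vs. event key strings.
    key_sims = [sim(e["keywords"], event["keywords"]) for e in topic["events"]]
    # Third sub-semantic similarity: topic key strings vs. event key strings.
    topic_sim = sim(topic["keywords"], event["keywords"])
    return [mean(title_sims), max(title_sims), mean(key_sims), max(key_sims), topic_sim]
```

The resulting five-element vector could then be passed to a downstream classifier or combined with the other similarities.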
In the embodiment of the application, the first sub-semantic similarity, the second sub-semantic similarity and the third sub-semantic similarity are obtained through a semantic statistical similarity model; the event integration device further comprises a model training module for acquiring a training sample, wherein the training sample comprises a first character string sample, a second character string sample and a labeling similarity; acquiring a first estimated semantic corresponding to the first character string sample by adopting a first semantic branch in a semantic statistical similarity model to be trained, acquiring a second estimated semantic corresponding to the second character string sample by adopting a second semantic branch in the semantic statistical similarity model to be trained, and determining the estimated similarity between the first character string sample and the second character string sample based on a comparison result between the first estimated semantic and the second estimated semantic; and performing back propagation in the semantic statistical similarity model to be trained based on the difference between the estimated similarity and the labeling similarity to obtain the semantic statistical similarity model.
In this embodiment of the application, the similarity obtaining module is further configured to determine, in each topic to be integrated, each sub-topic event key character string corresponding to at least one topic event as a graph node, and establish an edge between two graph nodes corresponding to two sub-topic event key character strings belonging to the same topic event, so as to obtain a first key character string graph; constructing a second key character string graph based on the key character string of the event to be integrated corresponding to the event to be integrated; determining the similarity of the character string graph between the event to be integrated and each topic to be integrated based on a comparison result between the vector representation of the first key character string graph and the vector representation of the second key character string graph.
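The graph construction above can be sketched as follows: each key character string becomes a node, and two strings that occur in the same event are linked by an edge. For the comparison step, this sketch substitutes a crude degree-based vector with cosine similarity; that is a deliberately simplified stand-in chosen for illustration, not the learned graph representation the text refers to.

```python
import math
from itertools import combinations
from typing import Dict, List, Set


def build_key_string_graph(events_key_strings: List[List[str]]) -> Dict[str, Set[str]]:
    """Adjacency sets: key strings from the same event are connected."""
    adjacency: Dict[str, Set[str]] = {}
    for key_strings in events_key_strings:
        unique = sorted(set(key_strings))
        for key in unique:
            adjacency.setdefault(key, set())  # every key string is a node
        for a, b in combinations(unique, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def graph_vector(adjacency: Dict[str, Set[str]]) -> Dict[str, float]:
    """Stand-in representation: one weight per node, degree + 1."""
    return {node: len(neighbours) + 1.0 for node, neighbours in adjacency.items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def string_graph_similarity(topic_events: List[List[str]], event: List[str]) -> float:
    first = graph_vector(build_key_string_graph(topic_events))  # topic graph
    second = graph_vector(build_key_string_graph([event]))      # event graph
    return cosine(first, second)
```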
In the embodiment of the application, the similarity obtaining module is further configured to combine a sentence sequence to be answered based on a title of each topic to be integrated, a topic key character string, and the event to be integrated; acquiring answer information of the sentence sequence to be answered; determining the question-answer similarity between the event to be integrated and each topic to be integrated based on the answer information.
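A minimal sketch of the question-answer step above, assuming a hypothetical reading-comprehension model exposed as a callable. The question template, the `[SEP]` separator and the `{"label", "score"}` answer format are all illustrative assumptions; the text does not specify any of them.

```python
from typing import Callable, Dict, List

# Hypothetical answer format: {"label": "yes" | "no", "score": float in [0, 1]}.
AnswerModel = Callable[[str], Dict]


def build_qa_sequence(topic_title: str, topic_key_strings: List[str], event_title: str) -> str:
    """Combine topic title, topic key strings and the event into one sequence."""
    question = f'Does the event "{event_title}" belong to this topic?'
    passage = f"Topic: {topic_title}. Key strings: {', '.join(topic_key_strings)}."
    return f"{question} [SEP] {passage}"  # separator token is an assumption


def qa_similarity(sequence: str, answer_model: AnswerModel) -> float:
    """Map the model's answer to a similarity in [0, 1]."""
    answer = answer_model(sequence)
    score = float(answer["score"])
    # A confident "no" is treated as low similarity.
    return score if answer["label"] == "yes" else 1.0 - score
```

Any model that accepts the combined sequence and returns an answer with a confidence could be plugged in as `answer_model`.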
In this embodiment of the application, the information obtaining module is further configured to obtain, in a topic library, a matching result between a topic key character string corresponding to each topic and the event to be integrated, where the topic library includes a plurality of topics; determining the topic corresponding to the matching result as the topic to be integrated matched with the event to be integrated when determining that at least one sub-topic key character string in the topic key character strings is matched with the event to be integrated based on the matching result; and acquiring at least one topic to be integrated matched with the event to be integrated from the topic library.
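The recall step above can be sketched as a substring match between each topic's key character strings and the event text. The function name and the case-insensitive substring test are assumptions; the text only requires that at least one sub-topic key string match the event.

```python
from typing import Dict, List


def recall_topics(event_text: str, topic_library: Dict[str, List[str]]) -> List[str]:
    """Return topics with at least one key string occurring in the event text."""
    text = event_text.lower()
    return [
        topic
        for topic, key_strings in topic_library.items()
        if any(key.lower() in text for key in key_strings)
    ]
```

This coarse filter narrows the topic library to a few candidates, so that the costlier similarity computations run only on the recalled topics.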
In this embodiment of the application, the information obtaining module is further configured to obtain, in at least one topic event corresponding to each topic in the topic library, a topic event key character string corresponding to each topic event; counting the number of topic events corresponding to each sub-topic event key character string in the topic event key character string; and combining the sub-topic event key character strings with the maximum topic event number in a fourth set number into the topic key character string corresponding to each topic.
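As a sketch of the counting step above: for each sub key string, count how many topic events contain it, then keep the most frequent ones. The function name and the `top_n` parameter (standing for the fourth set number) are illustrative.

```python
from collections import Counter
from typing import List


def topic_key_strings(events_key_strings: List[List[str]], top_n: int) -> List[str]:
    """Keep the top_n key strings that appear in the most topic events."""
    event_counts: Counter = Counter()
    for key_strings in events_key_strings:
        # set() so that a string repeated inside one event counts once.
        event_counts.update(set(key_strings))
    return [key for key, _ in event_counts.most_common(top_n)]
```

Counting events rather than raw occurrences means a string repeated many times within one event does not dominate the topic's key strings.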
In the embodiment of the application, the information acquisition module is further configured to perform entity identification on each topic event to obtain an entity key character string corresponding to a preset entity type; carrying out character string weight analysis on each topic event to obtain an action key character string; determining the topic event key string based on one or both of the entity key string and the action key string.
In this embodiment of the present application, the information obtaining module is further configured to obtain the number of entity key character strings corresponding to the entity key character strings; when the number of the entity key character strings is less than a fifth set number, combining the entity key character strings and the action key character strings into the topic event key character strings; when the entity key character string is greater than or equal to the fifth set number, determining the entity key character string as the topic event key character string.
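The rule above for assembling an event's key character strings reduces to a single branch, sketched below. The entity and action key strings are taken as already extracted; the function name and the `min_entities` parameter (standing for the fifth set number) are assumptions.

```python
from typing import List


def topic_event_key_strings(
    entity_keys: List[str],
    action_keys: List[str],
    min_entities: int,
) -> List[str]:
    """Use entity key strings alone when there are enough of them."""
    if len(entity_keys) < min_entities:
        # Too few entities: supplement them with action key strings.
        return list(entity_keys) + list(action_keys)
    return list(entity_keys)
```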
In the embodiment of the application, the event integration device further comprises an event presentation module, configured to: present a search control; in response to a first search operation acting on the search control, present a simplified event context corresponding to the event context, together with a presentation control, corresponding to the simplified event context, for presenting the event context; present the event context in response to a presentation operation acting on the presentation control, wherein each event in the presented event context comprises an event title and an event time, and the event is any one of the event to be integrated and the at least one topic event; and present event detail information in response to a viewing operation acting on the event title or the event time.
In an embodiment of the present application, the event presentation module is further configured to present last information to be presented of a target event, where the target event is any one of the to-be-integrated event and at least one of the topic events included in the event context; presenting the remaining events in the event context associated with the target event in a recommendation area corresponding to the last information to be presented, wherein the remaining events are any events except the target event in the event context; in response to a second search operation on the remaining events, detailed information of the remaining events is presented.
An embodiment of the present application provides an event integration device, including:
a memory for storing executable instructions;
and the processor is used for realizing the event integration method provided by the embodiment of the application when executing the executable instructions stored in the memory.
The embodiment of the present application provides a computer-readable storage medium, which stores executable instructions for causing a processor to execute the method for event integration provided by the embodiment of the present application.
The embodiment of the present application provides a computer program product, which includes a computer program or instructions, and the computer program or instructions, when executed by a processor, implement the event integration method provided by the embodiment of the present application.
The embodiment of the application has at least the following beneficial effects: when the target topic to which the event to be integrated belongs is determined from the at least one topic to be integrated, it is determined by directly comparing the target similarity between the event to be integrated and each topic to be integrated. Because the target similarity comprises one or more of semantic similarity, character string graph similarity and question-answer similarity, it can accurately indicate whether the event to be integrated belongs to each topic to be integrated, so the accuracy of event integration is improved when the event to be integrated is integrated into the target topic.
Drawings
FIG. 1 is an alternative architecture diagram of an event integration system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an exemplary component structure of the server in FIG. 1 according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart diagram illustrating an alternative event integration method according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart diagram illustrating an alternative event integration method provided by an embodiment of the present application;
FIG. 5 is a schematic flow chart diagram illustrating an alternative event integration method according to an embodiment of the present disclosure;
FIG. 6 is a representation of an exemplary context of an event provided by an embodiment of the present application;
FIG. 7 is a representation of another exemplary event context provided by an embodiment of the present application;
FIG. 8 is a schematic flow chart diagram illustrating an alternative event integration method according to an embodiment of the present disclosure;
FIG. 9 is an exemplary news topic recall schematic diagram provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of an exemplary determination of whether a news topic is related to a newly progressed event provided by an embodiment of the present application;
FIG. 11a is a diagram of an exemplary model for obtaining semantic similarity of vectors according to an embodiment of the present disclosure;
FIG. 11b is a diagram of another exemplary model for obtaining semantic similarity of vectors according to an embodiment of the present disclosure;
FIG. 12 is a diagram of an exemplary keyword graph provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of another exemplary keyword graph provided by an embodiment of the present application;
FIG. 14 is a diagram illustrating exemplary similarity of query-answer semantics provided by an embodiment of the present application;
FIG. 15 is a schematic diagram of exemplary feature importance provided by an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first/second/third/fourth/fifth" are used only to distinguish similar objects and do not denote a particular ordering of the objects. It is understood that "first/second/third/fourth/fifth" may, where permissible, be interchanged in a particular order or sequence, so that the embodiments of the application described herein can be practiced in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used in the examples of this application have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the embodiments of the present application is for the purpose of describing the embodiments of the present application only and is not intended to be limiting of the present application.
Before the embodiments of the present application are described in further detail, the terms and expressions used in the embodiments are explained as follows.
1) Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. That is, artificial intelligence is a comprehensive technique in computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence also studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. Artificial intelligence is a broad discipline covering a wide range of fields, at both the hardware level and the software level. The basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operating/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and Machine Learning (ML)/deep learning.
2) Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence; it studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science integrating linguistics, computer science and mathematics; because its research involves natural language, that is, the language people use every day, it is closely linked to linguistics. Natural language processing techniques typically include Machine Reading Comprehension (MRC), text processing, semantic understanding, machine translation, question-answering robots, and knowledge graphs.
3) Machine learning is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines; it studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and how it reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence, and it is applied throughout the fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.
4) Machine reading comprehension is a natural language processing task in which, given an article and a question based on the article, a machine answers the question after reading the article; in this embodiment, the article and the question together form the sentence sequence to be answered.
5) A Graph Convolutional Network (GCN) is a neural network that operates on graph-structured data. A graph is a data format that can represent a key character string network, a social network, a communication network, a protein molecule network and the like; nodes in the graph represent individuals in the network, and edges represent the connection relations between the individuals. In this embodiment of the present application, the vector representation of the first key character string graph and the vector representation of the second key character string graph may be obtained through a graph convolutional network.
6) Named Entity Recognition (NER), also known as entity recognition, entity chunking and entity extraction, is used to locate named entities in text and classify them into predefined categories, such as people, organizations, locations, time expressions, quantities, monetary values and percentages. Generally, the task of named entity recognition is to identify named entities of three major classes (entity class, time class, and numeric class) and seven minor classes (person name, organization name, place name, time, date, currency, and percentage) in the text to be processed. In the embodiment of the application, entities of preset entity types, such as person names and place names, are acquired through named entity recognition.
Generally, in order to integrate the latest events under topics, a clustering method is usually adopted, and the latest events are subjected to incremental clustering with the topics, so as to determine the topics to which the latest events belong according to a clustering center and a threshold value. However, in the process of integrating the latest events under topics by means of incremental clustering, the problem that the calculation overhead increases as the number of topics increases exists, and therefore, the efficiency of event integration is low.
Based on this, embodiments of the present application provide an event integration method, apparatus, device, and computer-readable storage medium, which can improve event integration efficiency and reduce consumption of computing resources for event integration. The following describes an exemplary application of the event integration device provided in the embodiment of the present application, and the event integration device provided in the embodiment of the present application may be implemented as various types of terminals, such as a smart phone, a smart watch, a notebook computer, a tablet computer, a desktop computer, a smart home appliance, a set-top box, a smart car device, a portable music player, a personal digital assistant, a dedicated messaging device, a smart voice interaction device, a portable game device, and a smart speaker, or may be implemented as a server. In the following, an exemplary application will be explained when the device is implemented as a server.
Referring to fig. 1, fig. 1 is an alternative architecture diagram of an event integration system provided by the embodiment of the present application; as shown in fig. 1, in order to support an event integration application, in the event integration system 100, the terminals 200 (the terminal 200-1 and the terminal 200-2 are exemplarily shown) are connected to the server 400 (event integration apparatus) through the network 300, and the network 300 may be a wide area network or a local area network, or a combination of both. In addition, the event integration system 100 further comprises a database 500 for providing data support (at least one topic to be integrated) to the server 400; fig. 1 shows a case where the database 500 is independent of the server 400, and the database 500 may also be integrated in the server 400, which is not limited in this embodiment of the present application.
The terminal 200 is configured to obtain an event context from the server 400 through the network 300 and display the event context on a graphical interface.
The server 400 is configured to acquire an event to be integrated and acquire at least one topic to be integrated, where each topic to be integrated includes at least one topic event; based on selection logic, select one or more of semantic similarity, character string graph similarity and question-answer similarity to obtain a target similarity between the event to be integrated and each topic to be integrated, where the semantic similarity refers to similarity in terms of semantic features, the character string graph similarity refers to similarity in terms of the graph features corresponding to key character strings, and the question-answer similarity refers to similarity in terms of question-answer features; determine, based on the target similarity, a target topic to which the event to be integrated belongs from the at least one topic to be integrated; and integrate the event to be integrated into the target topic to obtain an event context comprising the event to be integrated and the at least one topic event. The server 400 is further configured to transmit the event context to the terminal 200 via the network 300.
In some embodiments, the server 400 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like. The terminal 200 may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited.
Referring to fig. 2, fig. 2 is a schematic diagram of an exemplary component structure of the server in fig. 1 according to an embodiment of the present disclosure; the server 400 shown in fig. 2 includes: at least one processor 410, memory 450, and at least one network interface 420; in some embodiments of the present application, the server 400 further comprises a user interface 430. The various components in server 400 are coupled together by a bus system 440. It is understood that the bus system 440 is used to enable communications among the components. The bus system 440 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 440 in fig. 2.
The Processor 410 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 430 includes one or more output devices 431, including one or more speakers and/or one or more visual displays, that enable the presentation of media content. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 450 optionally includes one or more storage devices physically located remote from processor 410.
The memory 450 includes either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 450 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments of the present application, memory 450 is capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 451, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
a network communication module 452 for communicating with other computer devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 including: Bluetooth, Wireless Fidelity (Wi-Fi), and Universal Serial Bus (USB), etc.;
a presentation module 453 for enabling presentation of information (e.g., user interfaces for operating peripherals and displaying content and information) via one or more output devices 431 (e.g., display screens, speakers, etc.) associated with user interface 430;
an input processing module 454 for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
In some embodiments of the present application, the event integration apparatus provided in this embodiment of the present application may be implemented in a software manner, and fig. 2 illustrates the event integration apparatus 455 stored in the memory 450, which may be software in the form of programs and plug-ins, and includes the following software modules: the information acquisition module 4551, the similarity acquisition module 4552, the topic determination module 4553, the event integration module 4554, the model training module 4555, and the event presentation module 4556, which are logical and thus may be arbitrarily combined or further separated according to the functions implemented. The functions of the respective modules will be explained below.
In other embodiments of the present application, the event integration apparatus provided in the embodiments of the present application may be implemented in hardware. By way of example, the event integration apparatus may be a processor in the form of a hardware decoding processor that is programmed to execute the event integration method provided in the embodiments of the present application; for example, the processor in the form of a hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
In the following, the event integration method provided by the embodiments of the present application is described in conjunction with exemplary applications and implementations of the event integration apparatus provided by the embodiments of the present application. The event integration method may be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation and in-vehicle use.
Referring to fig. 3, fig. 3 is an alternative flow chart of the event integration method provided in the embodiment of the present application, which will be described with reference to the steps shown in fig. 3.
S301, acquiring the event to be integrated and acquiring at least one topic to be integrated.
In the embodiment of the present application, the event integration device acquires the event to be integrated. The event to be integrated may be obtained by the event integration device detecting an event, or by the event integration device receiving an event sent by another device, and so on, which is not limited in this embodiment of the present application. In addition, the event integration device acquires the topics to be integrated, thereby obtaining at least one topic to be integrated.
It should be noted that the event to be integrated is an event that has yet to be assigned to a topic, and an event is information describing something that has happened, such as a news event or an opinion event. The event to be integrated may be a newly occurring event or a historical event, where a historical event refers to an event that occurs again after its corresponding event time, which is not limited in this embodiment of the present application. The event to be integrated includes at least text information and may also include at least one of audio and video, images and forms. In addition, the at least one topic to be integrated may be all topics in the database, or may be topics screened from the database that are possibly associated with the event to be integrated, and so on, which is not limited in the embodiment of the present application. A topic to be integrated is an event subject, that is, a set of related events; it includes at least one topic event, and a topic event is itself an event.
S302, based on selection logic, selecting one or more from semantic similarity, character string graph similarity and question-answer similarity to obtain target similarity between the event to be integrated and each topic to be integrated.
In the embodiment of the present application, the event integration apparatus determines whether each topic to be integrated is a topic to which the event to be integrated belongs by comparing the target similarity between the event to be integrated and each topic to be integrated.
It should be noted that the target similarity refers to the possibility that the event to be integrated belongs to each topic to be integrated. The target similarity may be determined from one or more aspects, so that it includes one or more of semantic similarity, character string graph similarity and question-answer similarity, as determined based on the selection logic. The semantic similarity refers to similarity in terms of semantic features, the character string graph similarity refers to similarity in terms of the graph features corresponding to key character strings, and the question-answer similarity refers to similarity in terms of question-answer features; the selection logic is the basis on which the event integration device selects among the semantic similarity, the character string graph similarity and the question-answer similarity. The event integration device selecting one or more of the semantic similarity, the character string graph similarity and the question-answer similarity based on the selection logic to obtain the target similarity includes: the event integration device selects one of the three based on the selection logic to obtain the target similarity; or the event integration device selects at least two of the three based on the selection logic to obtain the target similarity.
It should be further noted that the selection logic includes one or more of a selection order, an acquisition speed, an accuracy, a topic scale, a selection number, a topic type, a model training scale, and a model applicable scale. The selection order is determined based on the priority of each similarity, and the priority may be determined based on one or both of accuracy and time consumption. The acquisition speed is the speed at which a similarity is obtained, and may be determined based on one or both of the time consumed by feature extraction and the feature extraction mode (parallel or serial) used for that similarity. The accuracy is the degree to which a similarity is accurate, and may be determined based on one or both of the features used in obtaining the similarity and the accuracy of the corresponding network model. The topic type is the content form of the topic to be integrated; for example, when the content form is an image form, the character string graph similarity and the question-answer similarity may be selected as the target similarity, and when the content form is a text form, one or more of the semantic similarity, the character string graph similarity and the question-answer similarity, including at least the semantic similarity, may be selected as the target similarity. The topic scale is the scale of the at least one topic to be integrated, and may be determined based on one or more of the number of topics to be integrated and the content amount of the topics to be integrated. The model training scale is the scale of the training data corresponding to the network model used to obtain each similarity. The model applicable scale is the maximum data volume that the network model used to obtain each similarity can bear. The model application range is the data form corresponding to the network model used to obtain each similarity.
In the embodiment of the present application, the event integration device obtains the character string graph similarity by comparing the graph-structure features corresponding to the event to be integrated with the graph-structure features corresponding to each topic to be integrated. The event integration device may construct the question and the article used in machine reading comprehension from the event to be integrated and each topic to be integrated, and determine the question-answer similarity corresponding to the answer information through the interaction between the question and the article. The semantic similarity, the character string graph similarity and the question-answer similarity are similarities obtained from different dimensions.
S303, determining a target topic to which the event to be integrated belongs from at least one topic to be integrated based on the target similarity.
In the embodiment of the present application, when the at least one topic to be integrated is a single topic to be integrated, the event integration device may determine whether that topic is the target topic by comparing the target similarity with a similarity threshold; and when the single topic to be integrated has been screened from the database as possibly associated with the event to be integrated, the event integration device may directly determine that topic as the target topic. When the at least one topic to be integrated is a plurality of topics to be integrated, the event integration device may determine, from among them, the topic that best matches the event to be integrated and take it as the topic to which the event to be integrated belongs, that is, the target topic; alternatively, it may take as the target topic the topic to be integrated that has the maximum target similarity, provided that this maximum is greater than the similarity threshold.
It should be noted that the event integration device obtains at least one corresponding target similarity between the event to be integrated and at least one topic to be integrated by obtaining the target similarity between the event to be integrated and each topic to be integrated; furthermore, the target topic is determined from the at least one topic to be integrated based on the at least one target similarity, or the target topic is determined from the at least one topic to be integrated based on a comparison result between the at least one target similarity and a similarity threshold, which is not limited in the embodiment of the present application.
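As an illustration of S303, the following is a minimal sketch of choosing the target topic from per-topic target similarities, with an optional similarity threshold. The function name `pick_target_topic`, the dictionary layout and the scalar scores are assumptions made for the example; the text does not prescribe a data structure, nor what happens when no topic clears the threshold (the sketch simply returns `None`).

```python
from typing import Dict, Optional


def pick_target_topic(
    target_similarity: Dict[str, float],
    threshold: Optional[float] = None,
) -> Optional[str]:
    """Return the topic id with the largest target similarity.

    If a threshold is given, the best topic is accepted only when its
    similarity is strictly greater than the threshold; otherwise None is
    returned (how that case is handled is not specified in the text).
    """
    if not target_similarity:
        return None
    best_topic = max(target_similarity, key=target_similarity.get)
    if threshold is not None and target_similarity[best_topic] <= threshold:
        return None
    return best_topic


# Hypothetical scores for three topics to be integrated.
scores = {"topic_a": 0.42, "topic_b": 0.87, "topic_c": 0.65}
print(pick_target_topic(scores))                 # best match, no threshold
print(pick_target_topic(scores, threshold=0.9))  # no topic clears the threshold
```

The same function covers the single-topic case: with one entry in the dictionary, the decision reduces to the threshold comparison.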
S304, integrating the event to be integrated into the target topic to obtain an event context comprising the event to be integrated and at least one topic event.
In the embodiment of the application, the event integration device integrates the event to be integrated as one topic event in the target topic into at least one topic event included in the target topic to obtain an event context including the event to be integrated and the at least one topic event. Wherein, event context refers to the occurrence process of the described things for the target topic.
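A minimal sketch of S304 follows. The `Event` and `Topic` classes, the `timestamp` field and the choice to order the event context chronologically are all assumptions for illustration; the text only states that the event context describes the occurrence process of the things covered by the target topic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    title: str
    timestamp: float  # hypothetical field: when the event occurred


@dataclass
class Topic:
    name: str
    events: List[Event] = field(default_factory=list)


def integrate(topic: Topic, event: Event) -> List[Event]:
    """Add the event to the target topic and return the event context.

    Assumption: the context is the topic's events in chronological order.
    """
    topic.events.append(event)
    topic.events.sort(key=lambda e: e.timestamp)
    return list(topic.events)


topic = Topic("launch", [Event("announced", 1.0), Event("released", 3.0)])
context = integrate(topic, Event("delayed", 2.0))
print([e.title for e in context])
```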
It can be understood that, when the target topic to which the event to be integrated belongs is determined from the at least one topic to be integrated, the determination is made by evaluating the target similarity between the event to be integrated and each topic to be integrated; that is, the event to be integrated is compared directly with each topic to be integrated. Because the target similarity includes one or more of semantic similarity, character string graph similarity and question-answer similarity, the obtained target similarity makes it possible to accurately determine whether the event to be integrated belongs to each topic to be integrated, and the accuracy of event integration is therefore improved when the event to be integrated is integrated into the target topic. In addition, when the event integration device uses more than one of the semantic similarity, the character string graph similarity and the question-answer similarity to determine the target topic, the target topic is determined from multi-dimensional heterogeneous features, so the obtained target topic is highly accurate and valid, which further improves the accuracy of event integration.
Referring to fig. 4, fig. 4 is a schematic flow chart of another alternative method for integrating events provided in the embodiment of the present application; as shown in fig. 4, in the embodiment of the present application, when the selection logic includes a selection order, S302 may be implemented by S3021 to S3024; that is, the event integration apparatus selects one or more of the semantic similarity, the string graph similarity, and the question-answer similarity based on the selection logic to obtain the target similarity between the event to be integrated and each topic to be integrated, including S3021 to S3024, which are described below.
S3021, based on the selection order, sequentially selecting a first set number of similarities from the semantic similarity, the character string graph similarity and the question-answer similarity sorted in descending order of priority.
It should be noted that the similarity of the first set number includes one or more of semantic similarity, character string graph similarity, and question-answer similarity.
Exemplarily, the event integration device may first select the question-answer similarity and the semantic similarity, which have the highest accuracy; if the result can be determined, the selection ends, and if the result cannot be determined, the character string graph similarity is then selected. Alternatively, the event integration device may first select the question-answer similarity, which takes the least time; if the result can be determined, the selection ends, and if the result cannot be determined, a similarity is then selected from the semantic similarity and the character string graph similarity.
And S3022, obtaining a comparison result of the similarity of the first set number and the similarity threshold.
It should be noted that the similarity threshold may include a first set number of sub-similarity thresholds, which correspond one-to-one to the first set number of similarities.
And S3023, when the comparison result is a determined similarity result for the event to be integrated and the topic to be integrated, determining the first set number of similarities as the target similarity between the event to be integrated and the topic to be integrated.
It should be noted that a determined similarity result means that the event to be integrated and the topic to be integrated have been determined to be either similar or dissimilar.
And S3024, when the comparison result is a pending similarity result for the event to be integrated and the topic to be integrated, continuing to select from the remaining similarities based on the selection order until a similarity result is determined or all three of the semantic similarity, the character string graph similarity and the question-answer similarity have been selected, and determining the selected similarities as the target similarity between the event to be integrated and the topic to be integrated.
It should be noted that the undetermined similar result means that it cannot be determined that the event to be integrated is similar to or dissimilar to the topic to be integrated; the residual similarity is the similarity except the similarity of the first set number in the semantic similarity, the character string graph similarity and the question and answer similarity. The selected multiple similarities refer to all the similarities selected by all the selection times.
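The staged selection in S3021 to S3024 can be sketched as a cascade. The text does not say how a comparison yields a pending result, so this sketch assumes each similarity has a hypothetical lower and upper bound (at or above the upper bound means similar, at or below the lower bound means dissimilar, anything in between is pending) and that a round is decisive only when all similarities in it agree. The names `verdict`, `cascade` and `Stage` are likewise illustrative.

```python
from typing import Callable, Dict, List, Tuple

# (name, scorer, lower bound, upper bound), listed in descending priority.
Stage = Tuple[str, Callable[[], float], float, float]


def verdict(score: float, low: float, high: float) -> str:
    """Map one similarity to 'similar', 'dissimilar', or 'pending'."""
    if score >= high:
        return "similar"
    if score <= low:
        return "dissimilar"
    return "pending"


def cascade(stages: List[Stage], first_n: int) -> Tuple[str, Dict[str, float]]:
    """Compute similarities in priority order until a result is determined.

    The first round computes `first_n` similarities; each later round adds
    one more. Returns the result and every similarity computed so far, which
    together play the role of the target similarity.
    """
    selected: Dict[str, float] = {}
    start, size = 0, first_n
    while start < len(stages):
        verdicts = set()
        for name, scorer, low, high in stages[start:start + size]:
            selected[name] = scorer()  # lower-priority scorers run only if needed
            verdicts.add(verdict(selected[name], low, high))
        if len(verdicts) == 1 and "pending" not in verdicts:
            return verdicts.pop(), selected
        start, size = start + size, 1
    return "pending", selected


# Hypothetical scorers standing in for the real similarity models.
stages = [
    ("question_answer", lambda: 0.55, 0.3, 0.8),
    ("semantic", lambda: 0.90, 0.3, 0.8),
    ("string_graph", lambda: 0.85, 0.2, 0.7),
]
print(cascade(stages, first_n=2))
```

Because each scorer is called lazily, a decisive early round avoids computing the lower-priority similarities at all.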
In the embodiment of the present application, when the selection logic includes an acquisition speed and a topic size, S302 may also be implemented by S3025 and S3026; that is, the event integration apparatus selects one or more of the semantic similarity, the string graph similarity, and the question-answer similarity based on the selection logic to obtain the target similarity between the event to be integrated and each topic to be integrated, including S3025 and S3026, which are described below separately.
And S3025, when the topic scale is larger than the set scale, sequentially selecting a second set number of similarities from the descending order of the acquisition speeds of the semantic similarity, the character string graph similarity and the question-answer similarity to obtain the target similarity between the event to be integrated and the topic to be integrated.
It should be noted that, if the event integration device determines that the topic scale is larger than the set scale, the scale of the at least one topic to be integrated is large, and the result needs to be determined with a small number of quickly obtained similarities (the second set number of similarities, selected in descending order of acquisition speed).
And S3026, when the topic scale is smaller than or equal to the set scale, sequentially selecting a third set number of similarities from the descending order of the acquisition speeds of the semantic similarity, the character string graph similarity and the question-answer similarity to obtain the target similarity between the event to be integrated and the topic to be integrated.
It should be noted that, if the event integration device determines that the topic scale is smaller than or equal to the set scale, the scale of the at least one topic to be integrated is small, and the result can be determined with a larger number of similarities, still taken in descending order of acquisition speed (the third set number of similarities); accordingly, the second set number is less than the third set number.
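S3025 and S3026 amount to choosing how many of the fastest similarities to use according to the topic scale. The sketch below assumes the acquisition speed of each similarity is available as a number (higher meaning faster) and that the topic scale is a single integer such as the number of topics; both are illustrative choices, as are the function and parameter names and the speed values.

```python
from typing import Dict, List


def select_by_scale(
    acquisition_speed: Dict[str, float],
    topic_scale: int,
    set_scale: int,
    second_number: int,
    third_number: int,
) -> List[str]:
    """Pick the fastest similarities, fewer of them when the topic scale is large."""
    if second_number >= third_number:
        raise ValueError("second_number must be less than third_number")
    fastest_first = sorted(acquisition_speed, key=acquisition_speed.get, reverse=True)
    count = second_number if topic_scale > set_scale else third_number
    return fastest_first[:count]


# Placeholder speeds; real values would come from measurement.
speeds = {"question_answer": 3.0, "semantic": 2.0, "string_graph": 1.0}
print(select_by_scale(speeds, topic_scale=5000, set_scale=1000,
                      second_number=1, third_number=2))
print(select_by_scale(speeds, topic_scale=100, set_scale=1000,
                      second_number=1, third_number=2))
```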
In the embodiment of the present application, when the target similarity includes multiple types of semantic similarity, string graph similarity, and question-answer similarity, S303 may be implemented by S3031 to S3033; that is, the event integration device determines a target topic to which the event to be integrated belongs from at least one topic to be integrated based on the target similarity, including S3031 to S3033, and the following steps are respectively explained.
S3031, determining the weight proportion of various similarity degrees in the target similarity degree based on the accuracy.
It should be noted that the event integration device determines the weight positively correlated to the accuracy for each of the multiple selected similarities, so as to obtain the weight ratio among the similarities in the target similarity; the weight ratio represents the association between the weights corresponding to the similarity degrees.
S3032, fusing multiple similarity degrees in the target similarity degrees based on the weight ratio to obtain the discrimination similarity degree.
It should be noted that the event integration device fuses each similarity in the target similarity with its corresponding weight based on the weight ratio; when all the similarities in the target similarity have been fused, the final similarity used to judge whether the event to be integrated is similar to the topic to be integrated, that is, the discrimination similarity, is obtained.
S3033, selecting the topic to be integrated corresponding to the highest discrimination similarity from at least one topic to be integrated to obtain the target topic to which the event to be integrated belongs.
It should be noted that the event integration device may directly determine the topic to be integrated corresponding to the highest discrimination similarity as the target topic to which the event to be integrated belongs; or it may first compare the highest discrimination similarity with a threshold and then decide, according to the comparison, whether to determine the corresponding topic to be integrated as the target topic; and so on, which is not specifically limited in the embodiments of the present application.
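S3031 to S3033 can be sketched as an accuracy-weighted fusion followed by an arg-max. The text only requires each weight to be positively correlated with accuracy; taking weights directly proportional to accuracy and normalizing them is an assumption of this sketch, as are the function names, the accuracy figures and the optional threshold behavior.

```python
from typing import Dict, Optional


def fuse(similarities: Dict[str, float], accuracy: Dict[str, float]) -> float:
    """Weighted average of the similarities, weights proportional to accuracy."""
    total = sum(accuracy[name] for name in similarities)
    if total <= 0:
        raise ValueError("accuracies must sum to a positive value")
    return sum(value * accuracy[name] / total for name, value in similarities.items())


def target_topic(
    per_topic: Dict[str, Dict[str, float]],
    accuracy: Dict[str, float],
    threshold: Optional[float] = None,
) -> Optional[str]:
    """Return the topic with the highest discrimination (fused) similarity."""
    fused = {topic: fuse(sims, accuracy) for topic, sims in per_topic.items()}
    if not fused:
        return None
    best = max(fused, key=fused.get)
    if threshold is not None and fused[best] <= threshold:
        return None
    return best


# Hypothetical accuracies and per-topic similarities.
accuracy = {"semantic": 0.9, "question_answer": 0.6}
per_topic = {
    "topic_1": {"semantic": 0.8, "question_answer": 0.5},
    "topic_2": {"semantic": 0.4, "question_answer": 0.9},
}
print(target_topic(per_topic, accuracy))
```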
In the embodiment of the application, the semantic similarity comprises one or two of semantic self-attention similarity and semantic statistical similarity; wherein the semantic self-attention similarity is determined based on self-attention between the event to be integrated and the topic event; the semantic statistical similarity is obtained by correspondingly comparing the vector semantic features of the information such as the title, the key character string and the text of the event to be integrated with the vector semantic features of the information such as the title, the key character string and the related information of the topic event to be integrated. Here, the semantic self-attention similarity is obtained by: the event integration equipment acquires semantic features to be integrated corresponding to the events to be integrated and topic event semantic features corresponding to each topic event in the topics to be integrated; enhancing the semantic features to be integrated based on the distinguishing identifications of the events to be integrated and the topic events to obtain first enhanced semantic features, and enhancing the semantic features of the topic events based on the distinguishing identifications to obtain second enhanced semantic features; and forming a semantic feature sequence by the first enhanced semantic features and at least one second enhanced semantic feature corresponding to the topic to be integrated, and determining semantic self-attention similarity based on self-attention information between two sequence units in the semantic feature sequence.
It should be noted that the self-attention information between two sequence units refers to the self-attention between the event to be integrated and any topic event.
It should be further noted that, when the semantic similarity includes both the semantic self-attention similarity and the semantic statistical similarity, one possible implementation of S3021 to S3024 is as follows: the event integration device selects the semantic self-attention similarity and the question-answer similarity (the first set number of similarities) and compares each with its corresponding sub-similarity threshold; if the comparison determines that the event to be integrated is similar or dissimilar to the topic event, the process ends; if the comparison cannot determine whether the event to be integrated is similar or dissimilar to the topic event, the semantic statistical similarity and the character string graph similarity are then selected to make the determination. Here, in terms of accuracy and acquisition speed, the descending order of priority may be determined as question-answer similarity, semantic self-attention similarity, character string graph similarity and semantic statistical similarity; in this order, the similarities transition from precision to breadth.
In the embodiment of the present application, the semantic statistical similarity may be obtained through S30211 to S30214, and the following steps are described separately.
S30211, in each topic to be integrated, obtaining a first sub-semantic similarity between the title of each topic event and the title of the event to be integrated, and determining an average first sub-semantic similarity and a maximum first sub-semantic similarity based on the first sub-semantic similarity.
In the embodiment of the application, the event integration device determines the semantic statistical similarity from one or more of the similarity between the title of the event to be integrated and the title of each topic event in each topic to be integrated, the similarity between the key character string of the event to be integrated and the key character string of each topic to be integrated, and the similarity between the event to be integrated and each topic to be integrated.
Here, the event integration device acquires, for each topic event in each topic to be integrated, the degree of similarity between the title of the topic event and the title of the event to be integrated, that is, a first sub-semantic similarity, so that at least one first sub-semantic similarity can be acquired for each topic to be integrated; the event integration device calculates the average value of the at least one first sub-semantic similarity to obtain the average first sub-semantic similarity; and the event integration device selects the largest of the at least one first sub-semantic similarity to obtain the maximum first sub-semantic similarity.
S30212, in each topic to be integrated, obtaining a second sub-semantic similarity between the topic event key character string corresponding to each topic event and the event key character string to be integrated corresponding to the event to be integrated, and determining an average second sub-semantic similarity and a maximum second sub-semantic similarity based on the second sub-semantic similarity.
In the embodiment of the application, the event integration device acquires, for each topic event in each topic to be integrated, the similarity between the key character string of the topic event and the key character string of the event to be integrated, that is, a second sub-semantic similarity, so that at least one second sub-semantic similarity can be obtained for each topic to be integrated; the event integration device calculates the average value of the at least one second sub-semantic similarity to obtain the average second sub-semantic similarity; and the event integration device selects the largest of the at least one second sub-semantic similarity to obtain the maximum second sub-semantic similarity.
It should be noted that the topic event key character string is a key character string of a topic event; the event key character string to be integrated is a key character string of an event to be integrated.
S30213, obtaining a third sub-semantic similarity between the topic key character string corresponding to each topic to be integrated and the event key character string to be integrated.
In the embodiment of the application, the event integration equipment acquires the similarity between the topic key character string and the event key character string to be integrated, and then obtains the third sub-semantic similarity.
S30214, determining the average first sub-semantic similarity, the maximum first sub-semantic similarity, the average second sub-semantic similarity, the maximum second sub-semantic similarity and the third sub-semantic similarity as semantic statistical similarities between the event to be integrated and each topic to be integrated.
It should be noted that the event integration device may determine at least one of the average first sub-semantic similarity, the maximum first sub-semantic similarity, the average second sub-semantic similarity, the maximum second sub-semantic similarity, and the third sub-semantic similarity as a semantic statistical similarity between the event to be integrated and each topic to be integrated.
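The five statistics of S30211 to S30214 can be sketched as follows. The sketch assumes a hypothetical dictionary layout for events and topics, and it uses a token-overlap placeholder (`token_overlap`) where the embodiment uses the trained semantic statistical similarity model; neither is taken from the application.

```python
from statistics import mean
from typing import Callable, Dict


def token_overlap(a: str, b: str) -> float:
    """Placeholder text-pair similarity (Jaccard over tokens), not the trained model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0


def semantic_statistical_similarity(
    event: dict, topic: dict, sim: Callable[[str, str], float] = token_overlap
) -> Dict[str, float]:
    event_keys = " ".join(event["keys"])
    # First sub-semantic similarities: title of each topic event vs. event title.
    title_sims = [sim(e["title"], event["title"]) for e in topic["events"]]
    # Second sub-semantic similarities: key strings of each topic event vs. event.
    key_sims = [sim(" ".join(e["keys"]), event_keys) for e in topic["events"]]
    return {
        "avg_first": mean(title_sims),
        "max_first": max(title_sims),
        "avg_second": mean(key_sims),
        "max_second": max(key_sims),
        # Third sub-semantic similarity: topic key strings vs. event key strings.
        "third": sim(" ".join(topic["keys"]), event_keys),
    }
```

Any subset of the returned values may then be used as the semantic statistical similarity, as noted above.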
In the embodiment of the application, the first sub-semantic similarity in S30211, the second sub-semantic similarity in S30212, and the third sub-semantic similarity in S30213 may be obtained through a semantic statistical similarity model, where the semantic statistical similarity model is used to obtain a similarity of text pairs in terms of semantic features; the semantic statistical similarity model is obtained through training in S305 to S307, and the following steps are described separately.
S305, obtaining a training sample, wherein the training sample comprises a first character string sample, a second character string sample and labeling similarity.
It should be noted that the training sample refers to a data sample used for training a semantic statistical similarity model, the first character string sample and the second character string sample are text pairs with a similarity degree in the aspect of the semantic features to be determined, and the labeled similarity is an actual similarity degree of the first character string sample and the second character string sample in the aspect of the semantic features.
S306, adopting a first semantic branch in the semantic statistical similarity model to be trained to obtain a first estimated semantic corresponding to the first character string sample, adopting a second semantic branch in the semantic statistical similarity model to be trained to obtain a second estimated semantic corresponding to the second character string sample, and determining the estimated similarity between the first character string sample and the second character string sample based on a comparison result between the first estimated semantic and the second estimated semantic.
In the embodiment of the application, the event integration equipment initializes the parameters of the model structure, so that a semantic statistical similarity model to be trained is obtained, wherein the semantic statistical similarity model to be trained comprises a first semantic branch and a second semantic branch; then, the event integration equipment acquires the semantics corresponding to the first character string sample by using the first semantic branch, so that first estimated semantics are acquired; and the event integration equipment acquires the semantics corresponding to the second character string sample by using the second semantic branch, so that second estimated semantics are acquired. Finally, determining the similarity degree between the first character string sample and the second character string sample by adopting a similarity model in the semantic statistic similarity model to be trained, and obtaining the estimated similarity; here, the similarity model determines the estimated similarity between the first character string sample and the second character string sample by comparing the first estimated semantic and the second estimated semantic and based on a comparison result between the first estimated semantic and the second estimated semantic.
It should be noted that the semantic statistical similarity model to be trained is a model to be trained for obtaining the similarity of the semantic features of the text pair; and the semantic statistical similarity model to be trained adopts a double-tower structure (a first semantic branch and a second semantic branch), and each semantic branch in the double-tower structure is used for acquiring semantic features.
It can be understood that the semantic statistical similarity model to be trained can improve the efficiency of obtaining the estimated similarity by obtaining the semantic features by adopting a double-tower structure.
S307, performing back propagation in the semantic statistical similarity model to be trained based on the difference between the estimated similarity and the labeled similarity to obtain the semantic statistical similarity model.
In the embodiment of the application, the event integration equipment adjusts parameters in the semantic statistical similarity model to be trained based on the difference between the estimated similarity and the labeled similarity so as to train the semantic statistical similarity model to be trained; here, the event integration device implements the adjustment of the parameters by back-propagating in the semantic statistical similarity model to be trained. The training process of the semantic statistic similarity model to be trained is an iterative training process, and the trained semantic statistic similarity model to be trained is a semantic statistic similarity model.
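To make S305 to S307 concrete, the following toy sketch trains a double-tower model with hand-written gradients. It is not the model of the embodiment: the bag-of-words features, the linear towers, the dot-product comparison, the squared-error loss and all hyperparameters are assumptions chosen to keep the example self-contained.

```python
import random
from math import sqrt
from typing import Dict, List, Tuple

Sample = Tuple[str, str, float]  # (first string, second string, labeled similarity)


def featurize(text: str, vocab: Dict[str, int]) -> List[float]:
    """Unit-length bag-of-words vector (assumed input features)."""
    vec = [0.0] * len(vocab)
    for token in text.split():
        vec[vocab[token]] += 1.0
    norm = sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def tower(weights: List[List[float]], x: List[float]) -> List[float]:
    """One semantic branch: a linear map from features to an estimated semantic."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train(samples: List[Sample], dim: int = 4, lr: float = 0.1,
          epochs: int = 200, seed: int = 0):
    tokens = sorted({t for a, b, _ in samples for t in (a + " " + b).split()})
    vocab = {t: i for i, t in enumerate(tokens)}
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in vocab] for _ in range(dim)]  # first branch
    w2 = [[rng.uniform(-0.1, 0.1) for _ in vocab] for _ in range(dim)]  # second branch

    def predict(a: str, b: str) -> float:
        u, v = tower(w1, featurize(a, vocab)), tower(w2, featurize(b, vocab))
        return sum(ui * vi for ui, vi in zip(u, v))  # estimated similarity

    def total_loss() -> float:
        return sum((predict(a, b) - y) ** 2 for a, b, y in samples)

    initial = total_loss()
    for _ in range(epochs):
        for a, b, y in samples:
            x1, x2 = featurize(a, vocab), featurize(b, vocab)
            u, v = tower(w1, x1), tower(w2, x2)
            # Gradient of the squared error with respect to the estimated similarity.
            grad = 2.0 * (sum(ui * vi for ui, vi in zip(u, v)) - y)
            for k in range(dim):
                for d in range(len(vocab)):
                    w1[k][d] -= lr * grad * v[k] * x1[d]  # back-propagate into branch 1
                    w2[k][d] -= lr * grad * u[k] * x2[d]  # back-propagate into branch 2
    return predict, initial, total_loss()
```

Because each branch encodes its input independently, the encoding of a topic-side string can be computed once and reused across many events, which is one reading of the efficiency remark above.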
In the embodiment of the present application, the character string graph similarity can be obtained through S30221 to S30223, and each step is described below.
S30221, in each topic to be integrated, determining each sub-topic event key character string corresponding to at least one topic event as a graph node, and establishing an edge between two graph nodes corresponding to two sub-topic event key character strings belonging to the same topic event to obtain a first key character string graph.
It should be noted that, among the at least one topic event in each topic to be integrated, the topic event key character string corresponding to each topic event includes one or more sub-topic event key character strings. Here, the event integration device takes each sub-topic event key character string as a graph node and traverses every pair of graph nodes among all the obtained graph nodes: if the two sub-topic event key character strings corresponding to the two graph nodes belong to the same topic event, an edge is established between the two graph nodes; if they do not belong to the same topic event, there is no edge between the two graph nodes. When the traversal is finished, the obtained graph structure is the first key character string graph.
S30222, constructing a second key character string graph based on the event key character string to be integrated corresponding to the event to be integrated.
It should be noted that the event integration device constructs the graph structure corresponding to the event to be integrated in the same manner as the first key character string graph: the event integration device takes each sub event key character string to be integrated in the event key character string to be integrated as a graph node, and establishes an edge between any two graph nodes, so that the second key character string graph is obtained.
S30223, determining the character string graph similarity between the event to be integrated and each topic to be integrated based on the comparison result between the vector representation of the first key character string graph and the vector representation of the second key character string graph.
In the embodiment of the application, the event integration device acquires the vector representation of the first key character string graph and the vector representation of the second key character string graph; it then compares the two vector representations and determines the character string graph similarity between the event to be integrated and each topic to be integrated based on the comparison result.
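A minimal sketch of S30221 to S30223 follows. The graph construction mirrors the description above; the vector representation, however, is not specified in this passage, so the sketch assumes a simple one (one plus the node degree for each key character string over a shared vocabulary) and compares the two vectors by cosine similarity. A learned graph embedding could be substituted.

```python
from itertools import combinations
from math import sqrt
from typing import Iterable, List, Set, Tuple

Graph = Tuple[Set[str], Set[Tuple[str, str]]]


def build_key_string_graph(key_strings_per_event: Iterable[Iterable[str]]) -> Graph:
    """Nodes are key strings; an edge joins two strings of the same event."""
    nodes: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for keys in key_strings_per_event:
        unique = sorted(set(keys))  # sorted so each edge has one canonical form
        nodes.update(unique)
        edges.update(combinations(unique, 2))
    return nodes, edges


def graph_vector(graph: Graph, vocab: List[str]) -> List[float]:
    """Assumed representation: 1 + degree for present nodes, 0 for absent ones."""
    nodes, edges = graph
    weight = {n: 1.0 for n in nodes}
    for a, b in edges:
        weight[a] += 1.0
        weight[b] += 1.0
    return [weight.get(w, 0.0) for w in vocab]


def string_graph_similarity(topic_graph: Graph, event_graph: Graph) -> float:
    vocab = sorted(topic_graph[0] | event_graph[0])
    a, b = graph_vector(topic_graph, vocab), graph_vector(event_graph, vocab)
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0
```

The first graph is built from the key strings of all topic events, for example `build_key_string_graph([["flood", "city"], ["city", "rescue"]])`, and the second from the single event to be integrated, for example `build_key_string_graph([["flood", "city"]])`.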
In the embodiment of the present application, the question-answer similarity may be obtained through S30231 to S30233, and each step is described below.
S30231, combining sentence sequences to be answered based on the title of each topic to be integrated, the topic key character string and the event to be integrated.
It should be noted that, in order to determine through question-answer interaction whether the event to be integrated belongs to a topic to be integrated, the event integration device constructs a question-answer sentence from each topic to be integrated and the event to be integrated, so as to obtain the sentence sequence to be answered. The event integration device combines the title of each topic to be integrated, the topic key character string and the event to be integrated according to a preset sentence pattern of the question-answer sentence, and the obtained combination result is the constructed question-answer sentence. For example: Is "event to be integrated" a progress of "title of the topic to be integrated", whose key character string is "topic key character string"? As another example: The next sentence is a progress of "title of the topic to be integrated", whose key character string is "topic key character string": "event to be integrated".
S30232, acquiring answer information of the sentence sequence to be answered.
In the embodiment of the application, following the question and passage roles used in machine reading comprehension, the event integration device determines the corresponding question and passage in the sentence sequence to be answered, and performs bottom-layer processing on the determined passage and question respectively to convert the text into numerical codes; next, it determines the semantic relation between the passage and the question based on the numerical codes, acquires the features of the question in combination with the semantic analysis result of the passage, and acquires the features of the passage in combination with the semantic analysis result of the question; finally, the event integration device obtains the output answer information based on the features of the question, the features of the passage, and the type of the answer.
It should be noted that the answer information is information about whether the event to be integrated belongs to each topic to be integrated, and may be "yes" (the event to be integrated belongs to the topic to be integrated), or "no" (the event to be integrated does not belong to the topic to be integrated), or may be a possibility that the event to be integrated belongs to the topic to be integrated, and the like, which is not limited in this embodiment of the present application.
S30233, determining question-answer similarity between the event to be integrated and each topic to be integrated based on the answer information.
It is noted that the event integration apparatus determines the possibility that the event to be integrated belongs to each topic to be integrated based on the answer information, and determines the determined possibility as the question-answer similarity between the event to be integrated and each topic to be integrated.
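The sentence construction of S30231 and the mapping of S30233 can be sketched as below; the reading-comprehension model of S30232 is deliberately left out. The sentence pattern, the function names and the mapping of "yes" and "no" to 1.0 and 0.0 are assumptions made for illustration.

```python
from typing import Sequence, Union


def build_question(topic_title: str, topic_keys: Sequence[str], event_text: str) -> str:
    """Fill one assumed preset sentence pattern with topic and event information."""
    keys = ", ".join(topic_keys)
    return (
        f'Is "{event_text}" a progress of "{topic_title}", '
        f'whose key character strings are "{keys}"?'
    )


def answer_to_similarity(answer: Union[str, float]) -> float:
    """Map answer information ("yes", "no" or a likelihood) to a similarity in [0, 1]."""
    if isinstance(answer, str):
        normalized = answer.strip().lower()
        if normalized == "yes":
            return 1.0
        if normalized == "no":
            return 0.0
        raise ValueError(f"unsupported answer: {answer!r}")
    return min(1.0, max(0.0, float(answer)))  # clamp a likelihood into [0, 1]
```

For instance, `build_question("City flood", ["flood", "city"], "Rescue teams arrive")` yields one sentence to be answered for that topic.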
In the embodiment of the present application, in S301, the event integration device acquires at least one topic to be integrated, including S3011 to S3013, and the following describes each step separately.
S3011, obtaining a matching result of the topic key character string corresponding to each topic and the event to be integrated in a topic library, wherein the topic library comprises a plurality of topics.
In the embodiment of the application, the event integration equipment can acquire a preset topic library, so that after the event to be integrated is acquired, the event integration equipment determines the topic to which the event to be integrated belongs from the topic library, and then integrates the event to be integrated into the topic. Here, the event integration device matches each topic in the topic library with the event to be integrated, and matches the topic key character string corresponding to each topic with the event to be integrated when matching.
The topic key string is a key string of a topic. The topic library comprises a plurality of topics, and each topic is the topic of one thing; each topic in the topic library comprises at least one topic event, and the topic events of different topics can be the same or different; and, the at least one topic event refers to events associated with the topic that occur at different time periods, such that there is a temporal order between the at least one topic event.
S3012, when determining that at least one sub-topic key character string in the topic key character strings is matched with the event to be integrated based on the matching result, determining the topic corresponding to the matching result as the topic to be integrated matched with the event to be integrated.
S3013, obtaining at least one topic to be integrated matched with the event to be integrated from the topic library.
It should be noted that, if at least one sub-topic key character string in the topic key character string matches the event to be integrated, the topic corresponding to the matching result is determined as a topic to be integrated that matches the event to be integrated; if no sub-topic key character string in the topic key character string of a topic matches the event to be integrated, that topic is determined not to be a topic to be integrated that matches the event to be integrated. Here, after the event integration device has judged the matching results of the plurality of topics against the event to be integrated, at least one topic to be integrated that matches the event to be integrated can be obtained from the topic library; and if it is determined that the topic library contains no topic to be integrated that matches the event to be integrated, a new topic comprising the event to be integrated is constructed and added to the topic library.
It can be understood that in the process of integrating the event to be integrated to the affiliated target topic, the key character strings based on the topic are matched with the event to be integrated, at least one topic to be integrated which may be related to the event to be integrated is recalled, and then the target topic affiliated to the event to be integrated is accurately determined from the at least one topic to be integrated based on the similarity between the event to be integrated and each topic to be integrated; therefore, according to the embodiment of the application, the integration of the event to be integrated to the target topic can be quickly realized by adopting a recall-similarity classification mode, and the correlation between the calculation time consumption of the integration process and the number of topics is small, so that the efficiency of event integration can be improved.
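The recall step of S3011 to S3013 can be sketched as follows, assuming that "matching" means a sub-topic key character string occurs as a substring of the event text and that topics are stored as plain dictionaries; both are illustrative assumptions rather than details from the application.

```python
from typing import Dict, List, Sequence

Topic = Dict[str, list]


def recall_topics(event_text: str, topic_library: List[Topic]) -> List[Topic]:
    """Recall every topic with at least one key string found in the event text."""
    return [
        topic for topic in topic_library
        if any(key in event_text for key in topic["keys"])
    ]


def recall_or_create(
    event_text: str, event_keys: Sequence[str], topic_library: List[Topic]
) -> List[Topic]:
    """Return recalled topics; if none match, add a new topic holding the event."""
    candidates = recall_topics(event_text, topic_library)
    if not candidates:
        topic_library.append(
            {"title": event_text, "keys": list(event_keys), "events": [event_text]}
        )
    return candidates
```

Only the recalled candidates are passed to the more expensive similarity comparison, which is why the cost of integration depends little on the total number of topics.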
In the embodiment of the present application, S3011 is preceded by S3014 to S3016; that is to say, before the event integration device obtains the matching result between the topic key character string corresponding to each topic and the event to be integrated, the event integration method further includes S3014 to S3016, and each step is described below.
S3014, obtaining a topic event key string corresponding to each topic event from at least one topic event corresponding to each topic in the topic library.
It should be noted that the topic key character string corresponding to the topic is obtained by the key character string of at least one topic event; here, the event integration apparatus first acquires a topic event key string corresponding to each topic event, where at least one topic event corresponds to at least one topic event key string because each topic includes at least one topic event.
S3015, counting the number of topic events corresponding to each sub-topic event key character string in the topic event key character string.
It should be noted that, the event integration device counts the number of topic events corresponding to each sub-topic event key character string in each topic event key character string in at least one topic event key character string corresponding to at least one topic event, obtains the number of topic events corresponding to each sub-topic event key character string, and thus obtains a plurality of topic event numbers corresponding to a plurality of sub-topic event key character strings under the topic.
S3016, combining the fourth set number of sub-topic event key character strings with the largest topic event numbers into the topic key character string corresponding to each topic.
From the plurality of topic event numbers corresponding to the plurality of sub-topic event key character strings under the topic, the event integration device selects the fourth set number of sub-topic event key character strings with the largest topic event numbers, and determines them as the topic key character string.
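S3014 to S3016 amount to a frequency count followed by a top-N selection, which can be sketched as below. The parameter `top_n` plays the role of the fourth set number; the alphabetical tie-break is an assumption added only to make the result deterministic.

```python
from collections import Counter
from typing import Iterable, List


def topic_key_strings(event_key_strings: Iterable[Iterable[str]], top_n: int) -> List[str]:
    """Pick the top_n key strings that occur in the most topic events."""
    counts: Counter = Counter()
    for keys in event_key_strings:
        counts.update(set(keys))  # count each key string once per topic event
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [key for key, _ in ranked[:top_n]]
```

With the topic events `[["flood", "city"], ["city", "rescue"], ["city", "flood", "aid"]]` and `top_n=2`, the key strings "city" (three events) and "flood" (two events) are selected.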
In this embodiment of the application, the event integration device in S3014 obtains the key character string of the topic event corresponding to each topic event, which can be implemented through S30141 to S30143, and the following steps are respectively described below.
S30141, performing entity identification on each topic event to obtain an entity key character string corresponding to a preset entity type.
It should be noted that the event integration device acquires the key character strings of topic events from multiple dimensions; one dimension is the entities of a topic event, and the event integration device can acquire preset entity types in advance, such as a person name type and a place name type; here, the event integration device performs entity identification on each topic event, selects the entities of the preset entity types from the identified entities, and takes the selected entities as the entity key character string.
S30142, performing character string weight analysis on each topic event to obtain an action key character string.
It should be noted that the event integration device may further obtain a key character string of the topic event from a dimension of the weight of the character string; here, the event integration apparatus performs weight analysis on the character strings in the topic event to obtain character strings greater than a weight threshold, and determines character strings representing actions among the obtained character strings greater than the weight threshold as action key character strings.
S30143, determining the topic event key character string based on one or both of the entity key character string and the action key character string.
It should be noted that, when determining the topic event key string based on the entity key string, the event integration device may determine all of the entity key strings as the topic event key string, or extract a string from the entity key string to obtain the topic event key string; when determining the topic event key character string based on the action key character string, the event integration device may determine all of the action key character strings as the topic event key character string, or may extract a character string from the action key character string to obtain the topic event key character string; the event integration device may determine a character string obtained in any combination of the entity key character string and the action key character string as the topic event key character string.
It is understood that since generally one event includes at least one of a person, a place, and an action, the event integration apparatus determines key character strings of the topic event based on character strings respectively associated with the person, the place, and the action in the topic event, and can improve the accuracy of the key character strings of the topic event.
In the embodiment of the present application, S30143 may be implemented by S301431 to S301433; that is, the event integration device determining the topic event key character string based on one or both of the entity key character string and the action key character string includes S301431 to S301433, and each step is described below.
S301431, acquiring the number of entity key strings corresponding to the entity key string.
It should be noted that the event integration device may determine the topic event key character string based on the entity key character string; here, when the number of character strings in the key character string of each topic event is limited, the event integration device may determine, based on the number of character strings included in the entity key character string, how many character strings to select from the action key character string for the topic event key character string, and may likewise determine whether to use the action key character string at all.
S301432, when the number of the entity key character strings is smaller than the fifth set number, combining the entity key character string and the action key character string into a topic event key character string.
It should be noted that, when the number of character strings in the key character string of each topic event is limited and is a fifth set number, and when the number of entity key character strings is smaller than the fifth set number, the event integration device determines that the entity key character strings are not enough as topic event key character strings, and needs to determine the action key character strings as topic event key character strings; that is, at this time, the topic event key string includes an entity key string and an action key string.
S301433, when the number of the entity key character strings is greater than or equal to the fifth set number, determining the entity key character string as the topic event key character string.
It should be noted that, when the number of character strings in the key character string of each topic event is limited and is a fifth set number, and when the number of entity key character strings is greater than or equal to the fifth set number, the event integration device determines that the character strings in the entity key character strings are sufficient as the topic event key character strings, at this time, the topic event key character strings include the entity key character strings.
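The branch in S301431 to S301433 can be sketched as a single function. The parameter `limit` stands for the fifth set number. Filling up with action key character strings only until the limit is reached is one possible reading of the description (it says the number of action strings to select may depend on the entity count); combining all action strings would be an equally valid variant.

```python
from typing import List, Sequence


def topic_event_key_strings(
    entity_keys: Sequence[str], action_keys: Sequence[str], limit: int
) -> List[str]:
    """Use entity key strings alone when there are enough, else add action ones."""
    if len(entity_keys) >= limit:
        return list(entity_keys)
    missing = limit - len(entity_keys)  # assumed: top up only to the limit
    return list(entity_keys) + list(action_keys)[:missing]
```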
Referring to fig. 5, fig. 5 is a schematic flow chart of yet another alternative event integration method provided in the embodiment of the present application; as shown in fig. 5, in the embodiment of the present application, S304 is followed by S308 to S311; that is, after the event integration apparatus integrates the event to be integrated into the target topic and obtains the event context including the event to be integrated and at least one topic event, the event integration method further includes S308 to S311, which are described below.
S308, presenting a search control.
It should be noted that the search control is used for performing information search, and thus, the search control can be used for searching for topic events.
S309, in response to a first search operation acting on the search control, presenting a simplified event context corresponding to the event context and a presentation control corresponding to the simplified event context.
In the embodiment of the application, when a user triggers a search control to search information, if the searched information is information associated with an integrated target topic, the event integration equipment receives a first search operation acting on the search control; thus, at this time, the event integration apparatus performs presentation of search results in response to the first search operation. Here, the presented search result may include a simplified event context corresponding to the event context and a presentation control corresponding to the simplified event context.
It should be noted that the simplified event context belongs to the event context and is presented in the form of a part of the events in the event context; the presentation control is used to present the entire event context, and is, for example, a "view more" button or an expand icon.
S310, presenting the event context in response to a presentation operation acting on the presentation control, wherein each event in the presented event context comprises an event title and an event time, and the event is any one of the event to be integrated and the at least one topic event.
It should be noted that when the user triggers the presentation control to view the entire event context, the event integration device also receives the presentation operation acting on the presentation control; at this time, the event integration apparatus presents the entire event context in response to the presenting operation; and, the event integration apparatus implements presentation of the event context by presenting an event title and an event time of each event in the event context, wherein the event is any one of the event to be integrated and the at least one topic event.
In the embodiment of the present application, the presented search results may include a search recommendation result, which refers to recommendation information for the integrated target topic, for example, "Are you searching for 'title of the integrated target topic'?"; here, when the user performs a trigger operation on the search recommendation result, the event integration device may present the simplified event context corresponding to the event context and the presentation control corresponding to the simplified event context, and then present the event context in response to a presentation operation acting on the presentation control; alternatively, the event context may be presented directly; the embodiments of the present application do not limit this.
Exemplarily, referring to fig. 6, fig. 6 is a schematic presentation diagram of an exemplary event context provided by an embodiment of the present application; as shown in FIG. 6, page 6-1 is a presentation page of search results, presented with a simplified event context 6-11 corresponding to the event context, and also presented with a presentation control 6-12; when the presentation control 6-12 is clicked (presentation operation), the entire event context 6-21 as shown in the area 6-2 is presented; here, each event in the presented event context is realized by presenting an event title (e.g., event title 6-211) and an event time (e.g., event time 6-212), and detailed information of the corresponding event is presented by clicking on the event title 6-211.
Exemplarily, referring to fig. 7, fig. 7 is a schematic view showing another exemplary event context provided by an embodiment of the present application; as shown in FIG. 7, page 7-1 is a presentation page of search results, presenting other results along with the search recommendation 7-11, and when the search recommendation 7-11 is clicked, presenting an event context 6-21 shown in area 6-2 in FIG. 6.
S311, presenting the event detail information in response to a viewing operation acting on the event title or the event time.
It should be noted that, an event title or event time is a triggerable control, or a control for viewing details exists corresponding to each event, and when a user triggers the event title, the event time, or the control for viewing details, the event integration device receives a viewing operation acting on the event title or the event time, or a viewing operation acting on the control for viewing details; at this time, the event integration apparatus presents event detail information in response to a viewing operation, wherein the event detail information refers to detailed description information of an event in an event context.
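The data behind S309 to S311 can be sketched as follows. The sketch assumes that the event context is ordered by event time and that the simplified event context consists of the most recent `preview` events; the description only states that the simplified context is a part of the events, so this choice, like the `Event` type and the function names, is hypothetical.

```python
from datetime import date
from typing import List, NamedTuple, Sequence


class Event(NamedTuple):
    title: str
    time: date
    detail: str


def full_context(events: Sequence[Event]) -> List[Event]:
    """The entire event context, ordered by event time."""
    return sorted(events, key=lambda e: e.time)


def simplified_context(events: Sequence[Event], preview: int = 2) -> List[Event]:
    """Assumed simplification: the most recent `preview` events."""
    ordered = full_context(events)
    return ordered[-preview:] if preview > 0 else []


def render(events: Sequence[Event]) -> List[str]:
    """Each presented event shows its event time and event title."""
    return [f"{e.time.isoformat()}  {e.title}" for e in events]
```

A user interface would show `render(simplified_context(events))` next to the presentation control, `render(full_context(events))` after the presentation operation, and an event's `detail` after a viewing operation.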
Referring to fig. 8, fig. 8 is a schematic flow chart of yet another alternative event integration method provided in the embodiment of the present application; as shown in fig. 8, in the embodiment of the present application, S304 is followed by S312 to S314; that is, after the event integration apparatus integrates the event to be integrated into the target topic and obtains the event context including the event to be integrated and at least one topic event, the event integration method further includes S312 to S314, which are described below.
S312, presenting the last information to be presented of the target event.
It should be noted that the target event is any one of the event to be integrated and the at least one topic event included in the event context; the last information to be presented refers to the information at the end of the presentation progress of the target event, for example, the last page of the target event or the end of the target event.
S313, presenting, in the recommendation area corresponding to the last information to be presented, the remaining events in the event context associated with the target event.
It should be noted that a recommendation area is also presented on the page presenting the last information to be presented, and the recommendation area is used for presenting recommendation information; here, the recommendation information presented by the event integration apparatus in the recommendation area is a remaining event, where the remaining event is any event in the event context other than the target event, or may be a latest-progress event in the event context other than the target event. The remaining events may be displayed in the form of search content in a search box, in the form of links, and the like, which is not limited in this embodiment of the application.
S314, presenting detailed information of the remaining events in response to a second search operation for the remaining events.
It should be noted that, when the user triggers the viewing operation for the remaining events, the event integration device also receives a second search operation for the remaining events; at this time, the event integration apparatus presents detailed information of the remaining events in response to the second search operation to complete a response to the second search operation.
In the embodiment of the present application, when the event integration apparatus is implemented as a server, S308 to S314 may be implemented by the server; or the server sends the event context to the terminal, and the event context is realized by the terminal; the embodiments of the present application do not limit this.
It can be understood that the event context provides additional information beyond the search terms, so that related reading needs are actively mined while the search need is met. This improves the completeness of the information presented on the search result page and reduces the number of repeated searches caused by the target information not being obtained in a search scenario, thereby reducing resource consumption in the search process, improving the search conversion rate, and improving the search frequency of the user.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
It should be noted that a news topic with a long duration (corresponding to the above topic) often consists of a plurality of events that have occurred (at least one topic event). When a latest progress event (event to be integrated) of the news topic is obtained, the latest progress event is mounted under the news topic to which it belongs (target topic) to form an event context containing the latest progress event; through the event context, the development of the topic can be presented intuitively. When the event integration method provided by the embodiment of the application is adopted, mounting the latest progress event to a news topic can be realized through two stages, recall and classification, comprising the following steps:
First, news topics that may be relevant (at least one topic to be integrated) are recalled from a news topic database (topic library) according to the event content of the latest progress event.
It should be noted that there is a topic keyword (topic key character string) corresponding to each news topic in the news topic database, and when the server matches any keyword (sub-topic key character string) in the topic keywords in the latest progress event, the server determines that the news topic is one of the possibly related news topics.
For example, referring to fig. 9, fig. 9 is a schematic diagram of an exemplary news topic recall provided by an embodiment of the present application; as shown in fig. 9, "the first department responds to Zhang Er and cancels the ban on the first object" is the title of the latest progress event 9-1. In the news topic database 9-2, the news topic 9-21 includes 3 events, and the corresponding topic keywords 9-211 are "nurse" and "subsidiary chang"; the news topic 9-22 includes 4 events, and the corresponding topic keywords 9-221 are "H ground" and "jumping"; the news topic 9-23 includes 4 events, and the corresponding topic keywords 9-231 are "Li San" and "first object". When the topic keywords of each news topic in the news topic database 9-2 are matched against the latest progress event 9-1, the "first object" in the topic keywords 9-231 of the news topic 9-23 matches the "first object" in the title of the latest progress event 9-1, so the news topic 9-23 is recalled as one of the possibly related news topics.
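The recall stage described above can be sketched as a simple substring match. This is only a minimal illustration: the function name `recall_topics`, the dictionary layout of the topic database, and the use of plain substring matching on the title are assumptions for illustration and are not specified by the application.

```python
from typing import Dict, List


def recall_topics(event_text: str, topic_keywords: Dict[str, List[str]]) -> List[str]:
    """Return the ids of topics with at least one topic keyword found in the event text.

    Hypothetical helper: matching is done by plain substring search.
    """
    return [
        topic_id
        for topic_id, keywords in topic_keywords.items()
        if any(keyword in event_text for keyword in keywords)
    ]


# Toy topic database modeled on the fig. 9 example (ids and layout are illustrative).
topic_db = {
    "9-21": ["nurse", "subsidiary chang"],
    "9-22": ["H ground", "jumping"],
    "9-23": ["Li San", "first object"],
}
title = "the first department responds to Zhang Er and cancels the ban on the first object"
recalled = recall_topics(title, topic_db)
```

Only topic "9-23" is recalled here, because "first object" is the only topic keyword that occurs in the title.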
It should be noted that the topic keywords are the two (first number threshold) keywords that, among the keywords of all topic events under the news topic, correspond to the largest number of topic events. The keywords of each topic event can be obtained through entity recognition and word weight analysis (character string weight analysis). Here, the server may implement entity recognition with an entity recognition model (Char-Word Union CNN, CWCNN) and take the recognized entities of the person-name type and the place-name type as first keywords (entity key character strings); the server may implement word weight analysis with an XGBoost model and take the verbs among the words whose weight is higher than a weight threshold as second keywords (action key character strings). If the number of first keywords is greater than 3 (a fifth set number), the second keywords are not considered, and only the first keywords are taken as the keywords of the topic event; if the number of first keywords is less than 3, the first keywords and the second keywords are together taken as the keywords of the topic event.
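As a rough sketch of the topic keyword selection described above, the keywords of all topic events can be counted per event and the two most widely shared ones kept. The function name `select_topic_keywords` and the alphabetical tie-breaking rule are assumptions made for illustration; the application does not state how ties are resolved.

```python
from collections import Counter
from typing import List, Sequence


def select_topic_keywords(event_keywords: Sequence[Sequence[str]], top_n: int = 2) -> List[str]:
    """Pick the top_n keywords that appear in the largest number of topic events."""
    counts: Counter = Counter()
    for keywords in event_keywords:
        # Count each keyword at most once per topic event.
        counts.update(set(keywords))
    # Assumption: ties are broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [keyword for keyword, _ in ranked[:top_n]]


# Illustrative keywords of four topic events under one news topic.
events = [
    ["Li San", "first object", "block"],
    ["Li San", "first object", "prosecution"],
    ["first object", "ban"],
    ["Li San", "plan"],
]
topic_keywords = select_topic_keywords(events)
```

In this toy input, "Li San" and "first object" each occur in three topic events, so they become the topic keywords.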
Then, similarity acquisition is carried out on each news topic in the possibly related news topics and the latest progress events, so as to judge whether each news topic is related to the latest progress events or not based on the similarity.
Referring to fig. 10, fig. 10 is a schematic diagram for determining whether a news topic is related to a latest progress event according to an exemplary embodiment of the present application; as shown in fig. 10, the server obtains the similarity between each possibly related news topic and the latest progress event from three aspects, namely vector semantic similarity 10-1 (semantic similarity), keyword graph similarity 10-2 (string graph similarity) and question-answer semantic similarity 10-3 (question-answer similarity); finally, a fusion model 10-4 (such as an XGBDT model) is used for integrating the vector semantic similarity 10-1, the similarity 10-2 of the keyword graph and the similarity 10-3 of the question-answer semantics to determine a decision score 10-5, so as to determine whether each news topic is related to the latest progress event according to the decision score, and finally, a related news topic 10-6, namely the news topic to which the latest progress event belongs, is obtained.
The calculation of the similarity in each dimension is explained below. The vector semantic similarity 10-1 comprises a vector semantic statistical similarity (semantic statistical similarity) and a vector semantic self-attention similarity (semantic self-attention similarity); the acquisition of the vector semantic statistical similarity is explained first. The server calculates the similarity between the title of each topic event in the news topic and the title of the latest progress event to obtain the title vector semantic similarity (first sub-semantic similarity), and from it obtains the average title semantic similarity (average first sub-semantic similarity) and the maximum title semantic similarity (maximum first sub-semantic similarity); calculates the similarity between the keywords of each topic event in the news topic (topic event key character strings) and the keywords of the latest progress event (key character strings of the event to be integrated) to obtain the event keyword vector semantic similarity (second sub-semantic similarity), and from it obtains the average event keyword semantic similarity (average second sub-semantic similarity) and the maximum event keyword semantic similarity (maximum second sub-semantic similarity); and calculates the similarity between the topic keywords of the news topic and the keywords of the latest progress event to obtain the topic keyword vector semantic similarity (third sub-semantic similarity). Here, the title vector semantic similarity, the average title semantic similarity, the maximum title semantic similarity, the event keyword vector semantic similarity, the average event keyword semantic similarity, the maximum event keyword semantic similarity, and the topic keyword vector semantic similarity are collectively referred to as the vector semantic statistical similarity.
It should be noted that, calculating the similarity between the title of each topic event in the news topic and the title of the latest progress event, calculating the similarity between the keyword of each topic event in the news topic and the keyword of the latest progress event, and calculating the similarity between the topic keyword of the news topic and the keyword of the latest progress event, may be respectively implemented by a network model (semantic statistical similarity model).
Referring to fig. 11a, fig. 11a is a schematic diagram of an exemplary model for obtaining vector semantic similarity according to an embodiment of the present application; as shown in FIG. 11a, the network model 11-1 is used to obtain the similarity between the two texts of a text pair in terms of vector semantics, and the network model 11-1 has a two-tower structure. Here, the processing of the network model 11-1 is explained using the acquisition of the topic keyword vector semantic similarity: the topic keywords 11-2 corresponding to a news topic are input into a first network branch 11-11 (first semantic branch) of the network model 11-1 to obtain the semantic vector 11-3 corresponding to the topic keywords 11-2, and the keywords 11-4 corresponding to the latest progress event are input into a second network branch 11-12 (second semantic branch) of the network model 11-1 to obtain the semantic vector 11-5 corresponding to the keywords 11-4; the cosine similarity between the semantic vector 11-3 and the semantic vector 11-5 is then computed to obtain the topic keyword vector semantic similarity 11-6. In addition, the first network branch and the second network branch may be the same kind of network branch, for example both "BERT" models; the semantic vector 11-3 and the semantic vector 11-5 are each the "CLS" vector at the first position of the output of the corresponding network branch, with a dimension of, for example, 768; and the network model 11-1 may be trained using 10,000 labeled sample pairs and a cross-entropy loss function.
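Once the two branches have produced semantic vectors, the vector semantic statistical similarity reduces to cosine similarities plus average and maximum statistics. The sketch below assumes the semantic vectors are already available as plain lists of floats (the encoder itself is not modeled); the function names and the dictionary keys are hypothetical.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def statistical_similarity(
    event_title_vec: Vector,
    event_keyword_vec: Vector,
    topic_title_vecs: Sequence[Vector],
    topic_event_keyword_vecs: Sequence[Vector],
    topic_keyword_vec: Vector,
) -> Dict[str, float]:
    """Average/maximum title and event-keyword similarities plus topic-keyword similarity."""
    title_sims = [cosine(event_title_vec, vec) for vec in topic_title_vecs]
    keyword_sims = [cosine(event_keyword_vec, vec) for vec in topic_event_keyword_vecs]
    return {
        "avg_title": sum(title_sims) / len(title_sims),
        "max_title": max(title_sims),
        "avg_event_keyword": sum(keyword_sims) / len(keyword_sims),
        "max_event_keyword": max(keyword_sims),
        "topic_keyword": cosine(topic_keyword_vec, event_keyword_vec),
    }


# Toy 2-dimensional vectors standing in for 768-dimensional encoder outputs.
features = statistical_similarity(
    event_title_vec=[1.0, 0.0],
    event_keyword_vec=[0.0, 1.0],
    topic_title_vecs=[[1.0, 0.0], [0.0, 1.0]],
    topic_event_keyword_vecs=[[0.0, 2.0], [1.0, 0.0]],
    topic_keyword_vec=[0.0, 3.0],
)
```

The resulting five values correspond to the average and maximum first sub-semantic similarity, the average and maximum second sub-semantic similarity, and the third sub-semantic similarity.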
When obtaining the vector semantic self-attention similarity, refer to fig. 11b, which is a schematic diagram of another exemplary model for obtaining the vector semantic similarity provided in the embodiment of the present application; as shown in fig. 11b, the encoding module 11-71 (e.g., a "BERT" model) in the network model 11-7 is configured to obtain the semantic vector of each event in the event sequence formed by the latest progress event and the at least one topic event 11-72 of one topic, yielding the vector feature sequence 11-73 that corresponds in order to the event sequence; here, the server sets the distinguishing identifier (difference flag) of the latest progress event to 0 and that of each topic event to 1. The server obtains the semantic vector corresponding to 0 and the semantic vector corresponding to 1, combines the semantic vector corresponding to 0 with the semantic vector of the latest progress event (semantic feature to be integrated), combines the semantic vector corresponding to 1 with the semantic vector of each topic event (topic event semantic feature), and finally inputs all the combined results into the Transformer model 11-74 to obtain the vector semantic self-attention similarity.
It should be noted that the Transformer model 11-74 is a network model used in natural language processing, and may be formed by stacking at least one (for example, 3) Transformer models. By computing the relationship between any two events (two sequence units) in the event sequence, the Transformer model 11-74 adaptively determines which topic events the latest progress event should attend to for matching.
When obtaining the keyword graph similarity 10-2, the server constructs the keyword graph corresponding to each news topic and the keyword graph of the latest progress event, obtains representations of the two keyword graphs through a graph convolutional network (GCN), and obtains the keyword graph similarity by calculating the cosine distance between the representations of the two keyword graphs.
The keyword graph corresponding to each news topic is constructed as follows: each keyword of each topic event under the news topic is taken as a graph node; if the two keywords corresponding to two graph nodes belong to the same topic event, an edge is established between the two graph nodes; the keyword graph corresponding to the news topic is thus obtained.
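The graph construction rule above can be sketched with an adjacency-set representation. Only the node and edge construction is shown; the GCN encoding and the cosine comparison are not modeled. The function name `build_keyword_graph` and the adjacency-dictionary layout are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, Sequence, Set


def build_keyword_graph(event_keywords: Sequence[Sequence[str]]) -> Dict[str, Set[str]]:
    """Build an undirected keyword graph as an adjacency dictionary.

    Every keyword becomes a node; two keywords are linked by an edge when
    they belong to the same topic event.
    """
    graph: Dict[str, Set[str]] = {}
    for keywords in event_keywords:
        unique = sorted(set(keywords))
        for keyword in unique:
            graph.setdefault(keyword, set())
        for first, second in combinations(unique, 2):
            graph[first].add(second)
            graph[second].add(first)
    return graph


# Two illustrative topic events that share the keyword "first object".
graph = build_keyword_graph([
    ["first department", "ban", "first object"],
    ["first object", "prosecution"],
])
```

Here "first object" is connected to all other keywords, while "ban" and "prosecution" are not connected because they never occur in the same topic event.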
Illustratively, referring to table 1, including a title column of a topic event and a keyword column of the topic event, table 1 is as follows:
TABLE 1
Figure BDA0003270648200000241
Referring to fig. 12, which shows a keyword graph constructed based on table 1, fig. 12 is a schematic diagram of an exemplary keyword graph provided in an embodiment of the present application; as shown in fig. 12, the graph nodes in the keyword graph 12-1 are determined from the keywords corresponding to each topic event in table 1 and include: first subject, Li San, manage, block, first organization, first department, prosecution, plan, second department, third department, ban, third organization, second subject, Zhang Er, and third subject; the edges between the graph nodes are as shown in fig. 12.
Similarly, referring to fig. 13, fig. 13 is a schematic diagram of another exemplary keyword graph provided in an embodiment of the present application; as shown in FIG. 13, the graph nodes in the keyword graph 13-1 are the keywords of the latest progress event: first department, Zhang Er, and first object; an edge exists between any two of these three nodes.
When obtaining the question-answer semantic similarity 10-3, the server constructs a sentence sequence (sentence sequence to be answered) based on the title and keywords of the news topic and on the latest progress event; the sentence sequence is input into an MRC-BERT model, and the question-answer semantic similarity is determined based on the feature at the first position of the output.
Referring to fig. 14, fig. 14 is a schematic diagram of exemplarily obtaining the question-answer semantic similarity according to an embodiment of the present application; as shown in fig. 14, the sentence sequence 14-1 is input to the network model 14-2 (for example, an "MRC-BERT" model) to obtain answer information 14-3, and the question-answer semantic similarity 10-3 is determined based on the answer information 14-3. Here, "CLS" in the sentence sequence 14-1 marks the beginning of the sentence sequence and "SEP" marks the division between sentences; the sentence sequence 14-1 further includes a question constructed based on the news topic, and the latest progress event. For example, for the latest progress event 9-1 titled "the first department responds to Zhang Er and cancels the ban on the first object", and the news topic 9-23 titled "Li San blocks the first object" with the keywords "first object" and "Li San", the constructed sentence sequence may be "[CLS] Is this sentence a progress of the topic 'Li San blocks the first object', whose keywords are the first object and Li San? [SEP] The first department responds to Zhang Er and cancels the ban on the first object [SEP]"; it may also be "[CLS] Does the event in which the first department responds to Zhang Er and cancels the ban on the first object belong to the topic [SEP] with the keywords first object and Li San, 'Li San blocks the first object'? [SEP]".
The following continues to illustrate an exemplary application of the event integration method provided by the embodiments of the present application.
It should be noted that the network model 11-1, the network model 11-7, the network model for obtaining the string graph similarity, and the network model 14-2 may be trained on 50,000 topic-event sample pairs constructed from more than 2,000 selected topics. For example, from all topics operated online since a given time point, reviewed events that went online (corresponding to similar) and reviewed events that did not go online (corresponding to dissimilar) may be selected; the online events are used as positive samples, the offline events are used as negative samples, and the topic-event sample pairs are constructed following the order of events. For example, if topic A contains five online events a, b, c, d, e and two offline events f, g (where events d, e, f and g all occur after events a, b, c), then 6 topic-event sample pairs can be constructed:
positive samples: abc -> d, abcd -> e;
negative samples: abc -> f, abc -> g, abcd -> f, abcd -> g.
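The pairing rule implied by this example can be sketched as follows. The function name `build_sample_pairs` and the assumption that every offline event is paired with every topic prefix that also yields a positive sample are inferred from the six pairs listed above and are not stated as a general rule in the application.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[Tuple[str, ...], str]


def build_sample_pairs(
    seed_events: Iterable[str],
    later_online_events: Iterable[str],
    offline_events: Iterable[str],
) -> Tuple[List[Pair], List[Pair]]:
    """Construct (topic prefix -> event) positive and negative sample pairs.

    Assumption: each later online event is a positive for the prefix that
    precedes it, and each offline event is a negative for that same prefix.
    """
    offline = list(offline_events)
    positives: List[Pair] = []
    negatives: List[Pair] = []
    prefix = list(seed_events)
    for event in later_online_events:
        positives.append((tuple(prefix), event))
        negatives.extend((tuple(prefix), negative) for negative in offline)
        prefix.append(event)  # the online event joins the topic for the next step
    return positives, negatives


positives, negatives = build_sample_pairs("abc", "de", "fg")
```

With the seed events a, b, c, the later online events d, e, and the offline events f, g, this yields the two positive and four negative pairs of the example.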
Referring to FIG. 15, FIG. 15 is a schematic diagram of exemplary feature importance provided by embodiments of the present application; as shown in fig. 15, the ordinate represents the importance index, and the features in descending order of importance index are: the question-answer semantic similarity 10-3, the vector semantic self-attention similarity 15-1, the maximum title semantic similarity 15-2, the keyword graph similarity 10-2, the maximum event keyword semantic similarity 15-3, the average title semantic similarity 15-4, and the topic keyword vector semantic similarity 15-5; the maximum title semantic similarity 15-2, the maximum event keyword semantic similarity 15-3, the average title semantic similarity 15-4, and the topic keyword vector semantic similarity 15-5 together constitute the vector semantic statistical similarity in the vector semantic similarity 10-1 of fig. 10.
In addition, an ablation test was performed on the similarities of 8 dimensions (the question-answer semantic similarity 10-3, the vector semantic self-attention similarity 15-1, the maximum title semantic similarity 15-2, the keyword graph similarity 10-2, the maximum event keyword semantic similarity 15-3, the average title semantic similarity 15-4, the topic keyword vector semantic similarity 15-5, and the average event keyword semantic similarity); the experimental results are shown in table 2:
TABLE 2
Figure BDA0003270648200000251
Figure BDA0003270648200000261
As is apparent from table 2, when event integration is performed using the similarities of all 8 dimensions, the corresponding "AUC" is 0.9420. The "AUC" decreases by 0.0025 when the average title semantic similarity 15-4 is removed, by 0.0101 when the maximum title semantic similarity 15-2 is removed, by 0.0043 when the average event keyword semantic similarity is removed, by 0.0000 when the maximum event keyword semantic similarity 15-3 is removed, by 0.0013 when the topic keyword vector semantic similarity 15-5 is removed, by 0.0098 when the vector semantic self-attention similarity 15-1 is removed, by 0.0081 when the keyword graph similarity 10-2 is removed, and by 0.0152 when the question-answer semantic similarity 10-3 is removed. This shows that the similarities of the 8 dimensions contribute to the determination result in event integration, which is consistent with the result given by the importance index.
The "AUC" is the area enclosed by the ROC curve (Receiver Operating Characteristic curve) and the coordinate axes, and is a performance index.
In the embodiment of the application, which similarities are used to determine whether the latest progress event matches the topic may further be selected based on the accuracy and the time consumption of each similarity. Referring to table 3, table 3 describes the time consumption corresponding to the network model 11-1, the network model 11-7, the network model for obtaining the string graph similarity, and the network model 14-2.
TABLE 3
Figure BDA0003270648200000262
As can be seen from Table 3, in descending order of time consumption the models are: the network model 11-1, the network model for obtaining the string graph similarity, the network model 11-7, and the network model 14-2. The question-answer semantic similarity 10-3 is obtained through the network model 14-2; the vector semantic self-attention similarity 15-1 is obtained through the network model 11-7; the maximum title semantic similarity 15-2, the maximum event keyword semantic similarity 15-3, the average title semantic similarity 15-4, the topic keyword vector semantic similarity 15-5 and the average event keyword semantic similarity are obtained through the network model 11-1; and the keyword graph similarity 10-2 is obtained through the network model for obtaining the string graph similarity. Therefore, the server takes the network model 11-7 and the network model 14-2, which are the fastest and the most accurate, as the initial calculation scheme: when the similarities obtained by the network model 11-7 and the network model 14-2 are high enough, a match is determined directly; when they are low enough, a mismatch is determined directly; only when they fall in the middle range are the network model 11-1 and the network model for obtaining the string graph similarity additionally used to calculate similarities, after which all the obtained similarities are combined into features and input into the fusion model for the final determination.
Illustratively, the determination process is shown as equation (1):
Figure BDA0003270648200000271
where $S_1$ is the similarity output by the network model 11-7, and $S_2$ is the similarity output by the network model 14-2.
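The staged determination described in the text can be sketched as a cascade. This is not a reconstruction of equation (1), whose exact form is not reproduced here: the threshold values `high`, `low` and `fused_threshold`, the rule that both similarities must clear a threshold, and the callable `fuse` standing in for the fusion model are all assumptions for illustration.

```python
from typing import Callable


def cascade_decision(
    s1: float,
    s2: float,
    fuse: Callable[[], float],
    high: float = 0.9,             # hypothetical "high enough" threshold
    low: float = 0.1,              # hypothetical "low enough" threshold
    fused_threshold: float = 0.5,  # hypothetical cut-off on the fused decision score
) -> bool:
    """Decide whether the latest progress event matches a topic.

    s1 and s2 are the similarities from the two fastest models. The slower
    similarities and the fusion model are only evaluated (via `fuse`) when
    s1 and s2 are inconclusive.
    """
    if min(s1, s2) >= high:
        return True   # both fast similarities are high enough: match
    if max(s1, s2) <= low:
        return False  # both fast similarities are low enough: mismatch
    return fuse() >= fused_threshold  # middle range: defer to the fusion model


matched = cascade_decision(0.95, 0.97, fuse=lambda: 0.0)
```

Passing the fusion step as a callable keeps the expensive models lazy, which mirrors the intent of only computing the remaining similarities in the middle range.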
It can be understood that the embodiment of the application integrates information of a plurality of events of topics by adopting multi-dimensional heterogeneous features, and greatly enhances the accuracy and the rationality of similarity calculation by using three heterogeneous feature models of a feature based on vector semantics, a feature based on a keyword graph and a feature based on question-answer semantics. In addition, by adopting the event integration method provided by the embodiment of the application, automatic batch event integration can be realized, manual participation is not needed, and the event integration efficiency can be improved.
Continuing with the exemplary structure of the event integration apparatus 455 provided by the embodiments of the present application as software modules, in some embodiments, as shown in fig. 2, the software modules stored in the event integration apparatus 455 of the memory 450 may include:
the information acquiring module 4551 is configured to acquire an event to be integrated and acquire at least one topic to be integrated, where each topic to be integrated includes at least one topic event;
a similarity obtaining module 4552, configured to select one or more of a semantic similarity, a string graph similarity, and a question-answer similarity based on a selection logic, to obtain a target similarity between the event to be integrated and each topic to be integrated, where the semantic similarity refers to a similarity in a semantic feature aspect, the string graph similarity refers to a similarity in a graph feature aspect corresponding to a key string, and the question-answer similarity refers to a similarity in a question-answer feature aspect;
a topic determination module 4553, configured to determine, based on the target similarity, a target topic to which the event to be integrated belongs from at least one topic to be integrated;
an event integration module 4554, configured to integrate the event to be integrated into the target topic, and obtain an event context including the event to be integrated and at least one of the topic events.
In an embodiment of the present application, the selection logic includes one or more of a selection order, an acquisition speed, an accuracy rate, a topic scale, a selection number, a topic type, a model training scale, a model application range, and a model application scale, where the selection order is determined based on the priority of each similarity, the acquisition speed is the speed at which a similarity is acquired, the accuracy rate is the degree of accuracy of a similarity, the topic type is the content form of the topic to be integrated, the topic scale is the scale of the at least one topic to be integrated, and the model training scale is the scale of the training data corresponding to the network model used to acquire each similarity.
In this embodiment of the application, when the selection logic includes the selection order, the similarity obtaining module 4552 is further configured to: select, based on the selection order, a first set number of similarities from the semantic similarity, the string graph similarity, and the question-answer similarity in descending order of priority; obtain a comparison result between the first set number of similarities and a similarity threshold; when the comparison result indicates that the event to be integrated and the topic to be integrated are similar, determine the first set number of similarities as the target similarity between the event to be integrated and the topic to be integrated; and when the comparison result indicates that whether the event to be integrated and the topic to be integrated are similar is still pending, continue to select from the remaining similarities based on the selection order until a similar result is determined or all three of the semantic similarity, the string graph similarity and the question-answer similarity have been selected, and determine the selected similarities as the target similarity between the event to be integrated and the topic to be integrated, where the remaining similarities are those among the semantic similarity, the string graph similarity and the question-answer similarity other than the first set number of similarities.
In this embodiment of the application, when the selection logic includes the acquisition speed and the topic scale, the similarity obtaining module 4552 is further configured to: when the topic scale is larger than a set scale, select a second set number of similarities from the semantic similarity, the string graph similarity, and the question-answer similarity in descending order of acquisition speed, so as to obtain the target similarity between the event to be integrated and the topic to be integrated; and when the topic scale is smaller than or equal to the set scale, select a third set number of similarities from the semantic similarity, the string graph similarity and the question-answer similarity in descending order of acquisition speed, so as to obtain the target similarity between the event to be integrated and the topic to be integrated, where the second set number is smaller than the third set number.
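A minimal sketch of this speed-and-scale selection is given below. The function name `select_by_speed`, the parameter names, and the numeric speeds in the example are hypothetical; only the rule itself (fewer, faster similarities for larger topic scales) comes from the description.

```python
from typing import Dict, List


def select_by_speed(
    speeds: Dict[str, float],
    topic_scale: int,
    set_scale: int,
    second_set_number: int,
    third_set_number: int,
) -> List[str]:
    """Choose which similarities to compute, fastest first.

    Large topic sets (topic_scale > set_scale) get the smaller second set
    number; otherwise the larger third set number is used.
    """
    if second_set_number >= third_set_number:
        raise ValueError("second_set_number must be smaller than third_set_number")
    count = second_set_number if topic_scale > set_scale else third_set_number
    ranked = sorted(speeds, key=lambda name: speeds[name], reverse=True)
    return ranked[:count]


# Hypothetical acquisition speeds (higher means faster).
speeds = {"semantic": 30.0, "string_graph": 10.0, "question_answer": 50.0}
for_large_scale = select_by_speed(speeds, topic_scale=500, set_scale=100,
                                  second_set_number=1, third_set_number=3)
for_small_scale = select_by_speed(speeds, topic_scale=50, set_scale=100,
                                  second_set_number=1, third_set_number=3)
```

With these illustrative numbers, only the fastest similarity is used for the large topic set, while all three are used for the small one.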
In this embodiment of the application, when the target similarity includes multiple ones of the semantic similarity, the string graph similarity, and the question-answer similarity, the topic determination module 4553 is further configured to determine, based on an accuracy, a weight ratio of each similarity in the target similarity; fusing multiple similarity degrees in the target similarity degrees based on the weight ratio to obtain a discrimination similarity degree; and selecting the topic to be integrated corresponding to the highest discrimination similarity from at least one topic to be integrated to obtain the target topic to which the event to be integrated belongs.
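The accuracy-weighted fusion and topic selection can be sketched as below. The application does not state how the weight ratio is derived from the accuracy; normalizing the accuracies so that the weights sum to 1 is an assumption, as are the function names and the example accuracy values.

```python
from typing import Dict


def fuse_similarities(similarities: Dict[str, float], accuracy: Dict[str, float]) -> float:
    """Fuse several similarities into one discrimination similarity.

    Assumption: each weight is the accuracy of that similarity divided by
    the total accuracy of the similarities actually present.
    """
    total = sum(accuracy[name] for name in similarities)
    return sum(value * accuracy[name] / total for name, value in similarities.items())


def pick_target_topic(per_topic: Dict[str, Dict[str, float]], accuracy: Dict[str, float]) -> str:
    """Return the topic whose fused discrimination similarity is highest."""
    return max(per_topic, key=lambda topic: fuse_similarities(per_topic[topic], accuracy))


# Hypothetical accuracies and target similarities for two candidate topics.
accuracy = {"semantic": 0.8, "string_graph": 0.6, "question_answer": 0.9}
per_topic = {
    "topic_a": {"semantic": 0.9, "question_answer": 0.8},
    "topic_b": {"semantic": 0.4, "question_answer": 0.5},
}
target = pick_target_topic(per_topic, accuracy)
```

In this toy input, "topic_a" has the higher fused score and is selected as the target topic.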
In this embodiment of the application, the semantic similarity includes a semantic self-attention similarity, and the similarity obtaining module 4552 is further configured to obtain a to-be-integrated semantic feature corresponding to the to-be-integrated event and a topic event semantic feature corresponding to each topic event in the to-be-integrated topic; enhancing the semantic features to be integrated based on the distinguishing identifications of the events to be integrated and the topic events to obtain first enhanced semantic features, and enhancing the semantic features of the topic events based on the distinguishing identifications to obtain second enhanced semantic features; and forming the first enhanced semantic features and at least one second enhanced semantic feature corresponding to the topic to be integrated into a semantic feature sequence, and determining the semantic self-attention similarity based on self-attention information between two sequence units in the semantic feature sequence.
In this embodiment of the application, the similarity obtaining module 4552 is further configured to obtain, in each topic to be integrated, a first sub-semantic similarity between the title of each topic event and the title of the event to be integrated, and determine an average first sub-semantic similarity and a maximum first sub-semantic similarity based on the first sub-semantic similarity; in each topic to be integrated, obtain a second sub-semantic similarity between the topic event key character string corresponding to each topic event and the to-be-integrated event key character string corresponding to the event to be integrated, and determine an average second sub-semantic similarity and a maximum second sub-semantic similarity based on the second sub-semantic similarity; obtain a third sub-semantic similarity between the topic key character string corresponding to each topic to be integrated and the to-be-integrated event key character string; and determine the average first sub-semantic similarity, the maximum first sub-semantic similarity, the average second sub-semantic similarity, the maximum second sub-semantic similarity and the third sub-semantic similarity as the semantic statistical similarity between the event to be integrated and each topic to be integrated.
In the embodiment of the application, the first sub-semantic similarity, the second sub-semantic similarity and the third sub-semantic similarity are obtained through a semantic statistical similarity model; the event integrating device 455 further includes a model training module 4555, configured to obtain training samples, where the training samples include a first character string sample, a second character string sample, and a labeled similarity; acquiring a first estimated semantic corresponding to the first character string sample by adopting a first semantic branch in a to-be-trained semantic statistic similarity model, acquiring a second estimated semantic corresponding to the second character string sample by adopting a second semantic branch in the to-be-trained semantic statistic similarity model, and determining the estimated similarity between the first character string sample and the second character string sample based on a comparison result between the first estimated semantic and the second estimated semantic; and performing back propagation in the semantic statistical similarity model to be trained based on the difference between the estimated similarity and the labeling similarity to obtain the semantic statistical similarity model.
In this embodiment of the application, the similarity obtaining module 4552 is further configured to determine, in each topic to be integrated, each sub-topic event key character string corresponding to at least one topic event as a graph node, and establish an edge between two graph nodes corresponding to two sub-topic event key character strings belonging to the same topic event, so as to obtain a first key character string graph; constructing a second key character string graph based on the key character string of the event to be integrated corresponding to the event to be integrated; determining the similarity of the character string graph between the event to be integrated and each topic to be integrated based on a comparison result between the vector representation of the first key character string graph and the vector representation of the second key character string graph.
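A minimal sketch of the two key character string graphs follows. The graph construction mirrors the text (key strings as nodes, an edge between key strings of the same event). The vector representation is an assumption: one dimension per node and per edge with value 1.0, compared by cosine similarity. The application does not state how the graph vectors are produced, and a learned graph encoder could take the place of `graph_vector`.

```python
import math
from itertools import combinations

def build_key_string_graph(events_key_strings):
    """Nodes are key strings; an edge links two key strings that belong to the same event."""
    nodes, edges = set(), set()
    for keys in events_key_strings:
        unique = sorted(set(keys))             # sorted so each edge has one canonical form
        nodes.update(unique)
        edges.update(combinations(unique, 2))
    return nodes, edges

def graph_vector(graph):
    """Hypothetical vector representation: an indicator per node and per edge."""
    nodes, edges = graph
    vector = {("node", n): 1.0 for n in nodes}
    vector.update({("edge", e): 1.0 for e in edges})
    return vector

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def string_graph_similarity(topic_events_keys, event_keys):
    first = build_key_string_graph(topic_events_keys)   # first key character string graph
    second = build_key_string_graph([event_keys])       # second key character string graph
    return cosine(graph_vector(first), graph_vector(second))
```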
In this embodiment of the application, the similarity obtaining module 4552 is further configured to combine a sentence sequence to be answered based on a title of each topic to be integrated, a topic key character string, and the event to be integrated; acquiring answer information of the sentence sequence to be answered; determining the question-answer similarity between the event to be integrated and each topic to be integrated based on the answer information.
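The question-answer flow above might be sketched as below. The wording of the sentence sequence, the shape of the answer information (`{"label": ..., "score": ...}`), and the `answer_model` callable are all hypothetical; the application only states that a sentence sequence is combined, answered, and the answer information is turned into a similarity.

```python
def build_sentence_sequence(topic_title, topic_key_strings, event_title):
    """Combine the topic title, topic key strings, and the event into sentences to be answered."""
    context = f"Topic key strings: {', '.join(topic_key_strings)}."
    question = f"Does the event '{event_title}' belong to the topic '{topic_title}'?"
    return [context, question]

def question_answer_similarity(topic_title, topic_key_strings, event_title, answer_model):
    """Map answer information to a similarity in [0, 1].

    answer_model is any callable taking the sentence sequence and returning
    {"label": "yes" or "no", "score": confidence in that label} (assumed format).
    """
    sequence = build_sentence_sequence(topic_title, topic_key_strings, event_title)
    answer = answer_model(sequence)
    if answer["label"] == "yes":
        return answer["score"]
    return 1.0 - answer["score"]
```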
In this embodiment of the application, the information obtaining module 4551 is further configured to obtain, in a topic library, a matching result between a topic key character string corresponding to each topic and the event to be integrated, where the topic library includes a plurality of topics; when at least one sub-topic key character string in the topic key character strings is matched with the event to be integrated based on the matching result, determining the topic corresponding to the matching result as the topic to be integrated matched with the event to be integrated; and acquiring at least one topic to be integrated matched with the event to be integrated from the topic library.
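The recall step above amounts to a key string match against the topic library. A minimal sketch, assuming the library is a dictionary from a topic identifier to its list of sub-topic key strings and that matching means substring containment in the event text (both are assumptions):

```python
def recall_topics(topic_library, event_text):
    """Return the topics for which at least one sub-topic key string matches the event."""
    return [
        topic_id
        for topic_id, key_strings in topic_library.items()
        if any(key in event_text for key in key_strings)
    ]
```

For example, `recall_topics({"t1": ["flood"], "t2": ["election"]}, "flood hits city")` keeps only `"t1"`, so the costlier similarity computations run on a much smaller candidate set.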
In this embodiment of the application, the information obtaining module 4551 is further configured to obtain, from at least one topic event corresponding to each topic in the topic library, a topic event key character string corresponding to each topic event; counting the number of topic events corresponding to each sub-topic event key character string in the topic event key character string; and combining the sub-topic event key character strings with the maximum topic event number in a fourth set number into the topic key character string corresponding to each topic.
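The counting step above can be sketched with a counter over the topic events. The tie-breaking rule (alphabetical order) is an assumption added only to make the result deterministic.

```python
from collections import Counter

def topic_key_strings(topic_events_key_strings, fourth_set_number):
    """Pick the sub-topic event key strings that occur in the most topic events."""
    counts = Counter()
    for keys in topic_events_key_strings:
        counts.update(set(keys))  # count each key string at most once per topic event
    ranked = sorted(counts, key=lambda k: (-counts[k], k))  # most events first, ties alphabetical
    return ranked[:fourth_set_number]
```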
In this embodiment of the application, the information obtaining module 4551 is further configured to perform entity identification on each topic event to obtain an entity key character string corresponding to a preset entity type; carrying out character string weight analysis on each topic event to obtain an action key character string; determining the topic event key string based on one or both of the entity key string and the action key string.
In this embodiment of the present application, the information obtaining module 4551 is further configured to obtain the number of the entity key character strings; when the number of the entity key character strings is less than a fifth set number, combining the entity key character strings and the action key character strings into the topic event key character string; when the number of the entity key character strings is greater than or equal to the fifth set number, determining the entity key character strings as the topic event key character string.
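The rule for choosing between entity and action key character strings can be sketched as follows. Entity recognition and character string weight analysis are assumed to have already produced the two input lists; the function name and the de-duplication of action strings are illustrative choices.

```python
def topic_event_key_strings(entity_keys, action_keys, fifth_set_number):
    """Use entity key strings alone when there are enough of them, else add action key strings."""
    if len(entity_keys) < fifth_set_number:
        # Too few entities: supplement with action key strings not already present.
        return list(entity_keys) + [k for k in action_keys if k not in entity_keys]
    return list(entity_keys)
```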
In this embodiment of the present application, the event integration apparatus 455 further includes an event presentation module 4556, configured to present a search control; in response to a first search operation acting on the search control, presenting a simplified event context corresponding to the event context, and a presentation control that corresponds to the simplified event context and is used for presenting the event context; presenting the event context in response to a presentation operation acting on the presentation control, wherein each event in the presented event context comprises an event title and an event time, and the event is any one of the to-be-integrated event and at least one of the topic events; presenting event detail information in response to a viewing operation acting on the event title or the event time.
In this embodiment of the application, the event presentation module 4556 is further configured to present information to be presented last of a target event, where the target event is any one of the to-be-integrated event and at least one topic event included in the event context; presenting the remaining events in the event context associated with the target event in a recommendation area corresponding to the last information to be presented, wherein the remaining events are any events except the target event in the event context; and presenting detailed information of the remaining events in response to a second search operation on the remaining events.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device (event integration device) reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the event integration method described in the embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium storing executable instructions, which when executed by a processor, cause the processor to perform an event integration method provided by embodiments of the present application, for example, the event integration method shown in fig. 3.
In some embodiments of the present application, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disks, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments of the application, the executable instructions may be in the form of a program, software module, script, or code, written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, the executable instructions may be deployed to be executed on one computer device (in this case, this one computer device is the event integration device), or on multiple computer devices located at one site (in this case, multiple computer devices located at one site are the event integration device), or on multiple computer devices distributed at multiple sites and interconnected by a communication network (in this case, multiple computer devices distributed at multiple sites and interconnected by a communication network are the event integration device).
In summary, according to the embodiments of the present application, the target topic to which the event to be integrated belongs is determined among the at least one topic to be integrated by obtaining the target similarity between the event to be integrated and each topic to be integrated; that is, the target topic is determined by directly comparing the event to be integrated with each topic to be integrated. Because the target similarity includes one or more of semantic similarity, character string graph similarity, and question-answer similarity, the obtained target similarity allows an accurate determination of whether the event to be integrated belongs to each topic to be integrated, which in turn improves the accuracy of event integration when the event to be integrated is integrated into the target topic. In addition, the efficiency of event integration can be improved by first recalling candidate topics and then obtaining the similarity during the event integration process.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (15)

1. An event integration method, comprising:
acquiring events to be integrated and acquiring at least one topic to be integrated, wherein each topic to be integrated comprises at least one topic event;
based on selection logic, selecting one or more from semantic similarity, character string graph similarity and question-answer similarity to obtain target similarity between the event to be integrated and each topic to be integrated, wherein the semantic similarity refers to similarity in the aspect of semantic features, the character string graph similarity refers to similarity in the aspect of graph features corresponding to key character strings, and the question-answer similarity refers to similarity in the aspect of question-answer features;
determining a target topic to which the event to be integrated belongs from at least one topic to be integrated based on the target similarity;
integrating the events to be integrated into the target topic to obtain an event context comprising the events to be integrated and at least one topic event.
2. The method of claim 1, wherein the selection logic comprises one or more of a selection order, an acquisition speed, an accuracy rate, a topic size, a selection number, a topic type, a model training scale, a model application range and a model application scale, wherein the selection order is determined based on a priority of similarity, the acquisition speed is a speed of acquiring similarity, the accuracy rate is an accuracy degree of similarity, the topic type is a content form of the topic to be integrated, the topic size is a size of at least one topic to be integrated, and the model training scale is a training data scale corresponding to a network model for acquiring each similarity.
3. The method according to claim 2, wherein when the selection logic includes the selection order, the obtaining of the target similarity between the event to be integrated and each of the topics to be integrated based on the selection logic by selecting one or more of semantic similarity, string graph similarity, and question-answer similarity comprises:
sequentially selecting a first set number of similarities from descending order of priorities of the semantic similarity, the string graph similarity and the question-answer similarity based on the selection sequence;
obtaining a comparison result of the similarity of the first set quantity and a similarity threshold;
when the comparison result is a similar result of the event to be integrated and the topic to be integrated, determining the first set number of similarities as the target similarity between the event to be integrated and the topic to be integrated;
when the comparison result is a pending similarity result of the event to be integrated and the topic to be integrated, continuing to select residual similarities based on the selection sequence until the similarity result is determined or three of the semantic similarity, the string graph similarity and the question-answer similarity are selected, and determining a plurality of selected similarities as the target similarity between the event to be integrated and the topic to be integrated, wherein the residual similarities are similarities of the semantic similarity, the string graph similarity and the question-answer similarity except the similarity of the first set number.
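The priority-ordered selection of claim 3 can be illustrated with a short sketch. It assumes a two-threshold band: a mean similarity at or above `high` counts as a similar result, at or below `low` as a dissimilar result, and anything in between as pending. The claim only refers to a similarity threshold, so the band, the use of the mean, and all names are assumptions.

```python
def select_by_order(similarity_fns, first_set_number, low=0.3, high=0.7):
    """Compute similarities lazily in descending priority until the result is decided.

    similarity_fns: list of (name, zero-argument callable), highest priority first.
    Returns a dict of the similarities that were actually computed.
    """
    selected = {}
    pending = list(similarity_fns)
    batch = max(1, first_set_number)       # the first round takes the first set number
    while pending:
        for name, compute in pending[:batch]:
            selected[name] = compute()
        pending = pending[batch:]
        mean = sum(selected.values()) / len(selected)
        if mean >= high or mean <= low:    # decided: similar or dissimilar
            break
        batch = 1                          # still pending: add one remaining similarity at a time
    return selected
```

Because each callable runs only when needed, a costly similarity with a low priority is skipped whenever the higher-priority ones already give a decided result.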
4. The method according to claim 2, wherein when the selection logic includes the acquisition speed and the topic size, the selecting one or more of semantic similarity, string graph similarity, and question-answer similarity based on the selection logic to obtain a target similarity between the event to be integrated and each topic to be integrated includes:
when the topic scale is larger than a set scale, sequentially selecting a second set number of similarities from the descending order of the acquisition speed of the semantic similarity, the character string graph similarity and the question-answer similarity to obtain the target similarity between the event to be integrated and the topic to be integrated;
when the topic scale is smaller than or equal to the set scale, sequentially selecting a third set number of similarities from descending order of the semantic similarity, the character string graph similarity and the acquisition speed of the question-answer similarity to obtain the target similarity between the event to be integrated and the topic to be integrated, wherein the second set number is smaller than the third set number.
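Claim 4 trades the number of similarities against the number of candidate topics. A minimal sketch, assuming the acquisition speed of each similarity is supplied as a number where larger means faster (the argument names are illustrative):

```python
def select_by_speed(speeds, topic_scale, set_scale, second_set_number, third_set_number):
    """Pick the fastest similarities; fewer are used when there are many candidate topics.

    speeds: dict mapping a similarity name to its acquisition speed (larger is faster).
    """
    if second_set_number >= third_set_number:
        raise ValueError("second set number must be smaller than third set number")
    ranked = sorted(speeds, key=speeds.get, reverse=True)  # descending acquisition speed
    count = second_set_number if topic_scale > set_scale else third_set_number
    return ranked[:count]
```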
5. The method according to any one of claims 1 to 4, wherein when the target similarity includes a plurality of the semantic similarity, the string graph similarity, and the question-answer similarity, the determining, based on the target similarity, a target topic to which the event to be integrated belongs from at least one of the topics to be integrated includes:
determining the weight proportion of various similarity degrees in the target similarity degrees based on the accuracy rate;
fusing multiple similarity degrees in the target similarity degrees based on the weight ratio to obtain a discrimination similarity degree;
and selecting the topic to be integrated corresponding to the highest discrimination similarity from at least one topic to be integrated to obtain the target topic to which the event to be integrated belongs.
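The fusion in claim 5 can be sketched as an accuracy-weighted average. Normalizing the accuracy rates so that the weights sum to one is an assumption; the claim only says that the weight ratio is determined based on the accuracy rate.

```python
def fuse(similarities, accuracy):
    """Accuracy-weighted average of the selected similarities (the discrimination similarity)."""
    total = sum(accuracy[name] for name in similarities)
    return sum(value * accuracy[name] / total for name, value in similarities.items())

def pick_target_topic(topic_similarities, accuracy):
    """Return the topic with the highest discrimination similarity.

    topic_similarities: dict mapping a topic identifier to {similarity name: value}.
    accuracy: dict mapping a similarity name to its accuracy rate.
    """
    return max(topic_similarities, key=lambda t: fuse(topic_similarities[t], accuracy))
```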
6. The method according to any one of claims 1 to 4, wherein the semantic similarity comprises a semantic self-attention similarity obtained by:
acquiring semantic features to be integrated corresponding to the events to be integrated and semantic features of topic events corresponding to each topic event in the topics to be integrated;
enhancing the semantic features to be integrated based on the distinguishing identifications of the events to be integrated and the topic events to obtain first enhanced semantic features, and enhancing the semantic features of the topic events based on the distinguishing identifications to obtain second enhanced semantic features;
and forming the first enhanced semantic features and at least one second enhanced semantic feature corresponding to the topic to be integrated into a semantic feature sequence, and determining the semantic self-attention similarity based on self-attention information between two sequence units in the semantic feature sequence.
7. The method according to any one of claims 1 to 4, wherein the semantic similarity comprises a semantic statistical similarity obtained by:
in each topic to be integrated, acquiring a first sub-semantic similarity between the title of each topic event and the title of the event to be integrated, and determining an average first sub-semantic similarity and a maximum first sub-semantic similarity based on the first sub-semantic similarity;
in each topic to be integrated, obtaining a second sub-semantic similarity between a topic event key character string corresponding to each topic event and a to-be-integrated event key character string corresponding to the to-be-integrated event, and determining an average second sub-semantic similarity and a maximum second sub-semantic similarity based on the second sub-semantic similarity;
acquiring a third sub-semantic similarity between the topic key character string corresponding to each topic to be integrated and the event key character string to be integrated;
determining the average first sub-semantic similarity, the maximum first sub-semantic similarity, the average second sub-semantic similarity, the maximum second sub-semantic similarity and the third sub-semantic similarity as the semantic statistical similarity between the event to be integrated and each topic to be integrated.
8. The method according to any one of claims 1 to 4, wherein the target similarity includes the string graph similarity obtained by:
in each topic to be integrated, determining each sub-topic event key character string corresponding to at least one topic event as a graph node, and establishing an edge between two graph nodes corresponding to two sub-topic event key character strings belonging to the same topic event to obtain a first key character string graph;
constructing a second key character string graph based on the key character string of the event to be integrated corresponding to the event to be integrated;
determining the similarity of the character string graph between the event to be integrated and each topic to be integrated based on a comparison result between the vector representation of the first key character string graph and the vector representation of the second key character string graph.
9. The method according to any one of claims 1 to 4, wherein the target similarity includes the question-answer similarity, and the question-answer similarity is obtained by:
combining sentence sequences to be answered based on the title of each topic to be integrated, the topic key character strings and the events to be integrated;
acquiring answer information of the sentence sequence to be answered;
determining the question-answer similarity between the event to be integrated and each topic to be integrated based on the answer information.
10. The method as claimed in any one of claims 1 to 4, wherein the obtaining at least one topic to be integrated comprises:
in a topic library, obtaining a matching result of a topic key character string corresponding to each topic and the event to be integrated, wherein the topic library comprises a plurality of topics;
when at least one sub-topic key character string in the topic key character strings is matched with the event to be integrated based on the matching result, determining the topic corresponding to the matching result as the topic to be integrated matched with the event to be integrated;
and acquiring at least one topic to be integrated matched with the event to be integrated from the topic library.
11. The method as claimed in claim 10, wherein before obtaining, in the topic library, a matching result between the topic keyword string corresponding to each topic and the event to be integrated, the method further comprises:
in at least one topic event corresponding to each topic in the topic library, performing entity identification on each topic event to obtain an entity key character string corresponding to a preset entity type;
carrying out character string weight analysis on each topic event to obtain an action key character string;
determining a topic event key string based on one or both of the entity key string and the action key string;
counting the number of topic events corresponding to each sub-topic event key character string in the topic event key character string;
and sequentially selecting a fourth set number of the sub-topic event key character strings from the descending order of the number of the topic events corresponding to the topic event key character strings to obtain the topic key character strings corresponding to each topic.
12. The method as claimed in any one of claims 1 to 4, wherein after integrating the event to be integrated into the target topic, resulting in an event context comprising the event to be integrated and at least one of the topic events, the method further comprises:
presenting a search control;
in response to a first search operation acting on the search control, presenting a simplified event context corresponding to the event context, and a presentation control that corresponds to the simplified event context and is used for presenting the event context;
presenting the event context in response to a presentation operation acting on the presentation control, wherein each event in the presented event context comprises an event title and an event time, and the event is any one of the to-be-integrated event and at least one topic event;
presenting event detail information in response to a viewing operation acting on the event title or the event time.
13. An event integration apparatus, comprising:
an information acquisition module, configured to acquire an event to be integrated and acquire at least one topic to be integrated, wherein each topic to be integrated comprises at least one topic event;
the similarity obtaining module is used for selecting one or more of semantic similarity, character string graph similarity and question and answer similarity based on selection logic to obtain target similarity between the event to be integrated and each topic to be integrated, wherein the semantic similarity refers to similarity in the aspect of semantic features, the character string graph similarity refers to similarity in the aspect of graph features corresponding to key character strings, and the question and answer similarity refers to similarity in the aspect of question and answer features;
the topic determination module is used for determining a target topic to which the event to be integrated belongs from at least one topic to be integrated based on the target similarity;
and the event integration module is used for integrating the events to be integrated into the target topic to obtain an event context comprising the events to be integrated and at least one topic event.
14. An event integration apparatus, comprising:
a memory for storing executable instructions;
a processor for implementing the event integration method of any one of claims 1 to 12 when executing executable instructions stored in the memory.
15. A computer-readable storage medium storing executable instructions for implementing the event integration method of any one of claims 1 to 12 when executed by a processor.
CN202111111428.XA 2021-09-18 2021-09-18 Event integration method, device, equipment and computer readable storage medium Pending CN115840796A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111111428.XA CN115840796A (en) 2021-09-18 2021-09-18 Event integration method, device, equipment and computer readable storage medium
PCT/CN2022/111164 WO2023040516A1 (en) 2021-09-18 2022-08-09 Event integration method and apparatus, and electronic device, computer-readable storage medium and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111111428.XA CN115840796A (en) 2021-09-18 2021-09-18 Event integration method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN115840796A true CN115840796A (en) 2023-03-24

Family

ID=85574458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111111428.XA Pending CN115840796A (en) 2021-09-18 2021-09-18 Event integration method, device, equipment and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN115840796A (en)
WO (1) WO2023040516A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116361468B (en) * 2023-04-03 2024-05-03 北京中科闻歌科技股份有限公司 Event context generation method, electronic equipment and storage medium
CN117056459B (en) * 2023-08-07 2024-05-10 北京网聘信息技术有限公司 Vector recall method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549647B (en) * 2018-01-17 2022-04-15 中移在线服务有限公司 Method for realizing active prediction of emergency in mobile customer service field without marking corpus based on SinglePass algorithm
CN111382276B (en) * 2018-12-29 2023-06-20 中国科学院信息工程研究所 Event development context graph generation method
CN110795607A (en) * 2019-10-29 2020-02-14 中国人民解放军32181部队 Equipment guarantee data matching method and system based on multi-stage similarity calculation
CN111444337B (en) * 2020-02-27 2022-07-19 桂林电子科技大学 Topic tracking method based on improved KL divergence

Also Published As

Publication number Publication date
WO2023040516A1 (en) 2023-03-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination