WO2023040516A1 - Event integration method and apparatus, and electronic device, computer-readable storage medium and computer program product - Google Patents

Info

Publication number
WO2023040516A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
similarity
topic
integrated
semantic
Prior art date
Application number
PCT/CN2022/111164
Other languages
French (fr)
Chinese (zh)
Inventor
房育勋
朱斌
刘晨
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2023040516A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying

Definitions

  • the present application relates to information processing technology in the field of computer application, and in particular to an event integration method, device, electronic equipment, computer-readable storage medium and computer program product.
  • Clustering is usually used; that is, the latest progress event and the topics are incrementally clustered, so that the topic to which the latest progress event belongs is determined according to the cluster center and a threshold.
  • However, the accuracy of such clustering is low, which reduces the accuracy of the determined topic and, in turn, the accuracy of event integration when the latest progress event is integrated into that topic.
  • Embodiments of the present application provide an event integration method, device, electronic device, computer-readable storage medium, and computer program product, which can improve the accuracy of event integration.
  • the embodiment of this application provides an event integration method, including:
  • semantic similarity refers to the similarity in terms of semantic features
  • similarity in character string graphs refers to the similarity in graph features corresponding to key character strings
  • question-answer similarity refers to the similarity in question-and-answer features
  • determining a target topic to which the event to be integrated belongs from at least two topics to be integrated;
  • Integrating the event to be integrated into the target topic to obtain an event context, wherein the event context includes the event to be integrated and at least one topic event.
  • An embodiment of the present application provides an event integration device, including:
  • An information acquisition module configured to acquire events to be integrated, and acquire at least two topics to be integrated, wherein each of the topics to be integrated includes at least one topic event;
  • A similarity acquisition module configured to determine the target similarity between the event to be integrated and each topic to be integrated based on one or more of semantic similarity, string graph similarity and question-answer similarity, wherein the semantic similarity refers to the similarity in semantic features, the string graph similarity refers to the similarity in graph features corresponding to key character strings, and the question-answer similarity refers to the similarity in question-answer features;
  • a topic determination module configured to determine the target topic to which the event to be integrated belongs from at least two topics to be integrated based on the target similarity
  • the event integration module is configured to integrate the event to be integrated into the target topic to obtain an event context, wherein the event context includes the event to be integrated and at least one topic event.
  • An embodiment of the present application provides an electronic device for event integration, including:
  • the processor is configured to implement the event integration method provided in the embodiment of the present application when executing the executable instructions stored in the memory.
  • the embodiment of the present application provides a computer-readable storage medium, which stores executable instructions for causing a processor to execute the event integration method provided in the embodiment of the present application.
  • An embodiment of the present application provides a computer program product, including a computer program or an instruction, and when the computer program or instruction is executed by a processor, the event integration method provided in the embodiment of the present application is implemented.
  • The embodiments of the present application have at least the following beneficial effects: the target topic to which the event to be integrated belongs is determined, from at least two topics to be integrated, by directly comparing the target similarity between the event to be integrated and each topic to be integrated. Since the target similarity includes one or more of semantic similarity, string graph similarity and question-answer similarity, whether each topic to be integrated is the target topic to which the event to be integrated belongs can be accurately determined from the obtained target similarity; consequently, when the event to be integrated is integrated into the target topic, the accuracy of event integration can be improved.
  • FIG. 1 is a schematic diagram of an optional architecture of the event integration system provided by the embodiment of the present application.
  • FIG. 2 is a schematic diagram of an exemplary composition and structure of the server in FIG. 1 provided by the embodiment of the present application;
  • FIG. 3 is a first optional schematic flowchart of the event integration method provided by the embodiment of the present application.
  • Fig. 4a is a second optional schematic flowchart of the event integration method provided by the embodiment of the present application.
  • Fig. 4b is a third optional schematic flowchart of the event integration method provided by the embodiment of the present application.
  • Fig. 4c is a fourth optional schematic flowchart of the event integration method provided by the embodiment of the present application.
  • Fig. 4d is a schematic flow diagram of obtaining semantic statistical similarity provided by the embodiment of the present application.
  • Fig. 4e is a schematic flow diagram of the semantic statistical similarity model provided by the embodiment of the present application.
  • Fig. 4f is a schematic flow diagram of obtaining the similarity of character string graphs provided by the embodiment of the present application.
  • Fig. 4g is a schematic flow diagram of obtaining the question-and-answer similarity provided by the embodiment of the present application.
  • Fig. 4h is a schematic flow diagram of obtaining at least two topics to be integrated provided by the embodiment of the present application.
  • Fig. 4i is a first schematic flow diagram of obtaining a key character string of a topic event provided by the embodiment of the present application.
  • Fig. 4j is the second schematic flow diagram for obtaining the key character string of the topic event provided by the embodiment of the present application.
  • FIG. 5 is a fifth optional schematic flowchart of the event integration method provided by the embodiment of the present application.
  • FIG. 6 is a schematic diagram showing an exemplary event context provided by the embodiment of the present application.
  • Fig. 7 is a schematic presentation of another exemplary event context provided by the embodiment of the present application.
  • FIG. 8 is a sixth optional schematic flowchart of the event integration method provided by the embodiment of the present application.
  • Fig. 9 is a schematic diagram of an exemplary news topic recall provided by an embodiment of the present application.
  • Fig. 10 is an exemplary schematic diagram of determining whether a news topic is related to the latest progress event provided by the embodiment of the present application;
  • Fig. 11a is a schematic diagram of an exemplary model for obtaining vector semantic similarity provided by an embodiment of the present application.
  • Fig. 11b is a schematic diagram of another exemplary model for obtaining vector semantic similarity provided by the embodiment of the present application.
  • Fig. 12 is a schematic diagram of an exemplary keyword map provided by an embodiment of the present application.
  • Fig. 13 is a schematic diagram of another exemplary keyword map provided by the embodiment of the present application.
  • Fig. 14 is a schematic diagram of an exemplary acquisition of semantic similarity of questions and answers provided by an embodiment of the present application.
  • Fig. 15 is a schematic diagram of an exemplary feature importance provided by an embodiment of the present application.
  • The terms "first/second/third/fourth/fifth" are only used to distinguish similar objects and do not represent a specific ordering of objects. Understandably, "first/second/third/fourth/fifth" can be interchanged in specific order or sequence where allowed, so that the embodiments of the present application described here can be implemented in an order other than that illustrated or described here.
  • Artificial Intelligence (AI): the theory, methods, technologies and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. That is, artificial intelligence is a comprehensive technique of computer science used to capture the essence of intelligence and produce a new kind of intelligent machine that can respond in a way similar to human intelligence.
  • artificial intelligence is also used to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • artificial intelligence technology is a comprehensive subject that involves a wide range of fields, including both hardware-level technology and software-level technology.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes several major directions such as computer vision technology, speech processing technology, natural language processing technology, and machine learning (Machine Learning, ML)/deep learning.
  • Natural language processing is a direction in the field of computer science and artificial intelligence; it studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is therefore a science that integrates linguistics, computer science and mathematics; because its research involves natural language, that is, the language people use every day, it is closely connected with linguistics. Natural language processing technologies usually include machine reading comprehension (Machine Reading Comprehension, MRC), text processing, semantic understanding, machine translation, robot question answering, and knowledge graphs.
  • Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis and algorithm complexity theory; it studies how computers simulate or realize human learning behaviors to obtain new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent.
  • the application of machine learning pervades all fields of artificial intelligence.
  • Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
  • Machine reading comprehension is a natural language processing task: given an article and a question about the article, a machine reads the article and then answers the question. The article and the question together correspond to the sentence sequence to be answered in the embodiment of this application.
  • Graph Convolutional Network (GCN): a network whose processing data is graph-structured data; in the embodiment of this application, a graph is a data format used to represent the key string network, in which the nodes represent individuals in the network and the edges represent the connections between individuals.
  • the vector representation of the first key string graph and the vector representation of the second key string graph can be obtained through a graph convolutional network.
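  • As an illustration of how a graph convolutional network can turn a key string graph into a vector, the following is a minimal sketch of a single graph convolution layer followed by mean pooling. The symmetric normalization with self-loops, the ReLU activation, the mean pooling, and the names `gcn_layer` and `graph_vector` are assumptions made for illustration; this excerpt does not specify the layer formula, and in practice the weight matrix would be learned rather than fixed.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj    : n x n adjacency matrix (non-negative edge weights)
    feats  : n x f node feature matrix H
    weight : f x o weight matrix W (fixed here; normally learned)
    """
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    in_dim, out_dim = len(weight), len(weight[0])
    # Aggregate neighbour features: norm @ feats.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(in_dim)] for i in range(n)]
    # Linear transform followed by ReLU: agg @ weight.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(in_dim)))
             for o in range(out_dim)] for i in range(n)]


def graph_vector(node_feats):
    """Mean-pool node vectors into a single vector for the whole graph."""
    n = len(node_feats)
    return [sum(row[d] for row in node_feats) / n
            for d in range(len(node_feats[0]))]
```

  • The vectors obtained for the first and second key string graphs could then be compared, for example with cosine similarity, although the comparison function is likewise an assumption here.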
  • Named Entity Recognition (NER), also known as entity recognition, entity segmentation and entity extraction, is used to locate named entities in text and classify them into predefined categories, such as people, organizations, locations, time expressions, quantities, value resources and percentages; usually, the task of named entity recognition is to identify named entities of three categories (entity category, time category and number category) and seven subcategories (person name, organization name, place name, time, date, value resource and percentage).
  • entities of a preset entity type such as entities of a person name and a place name type, are acquired through named entity recognition.
  • a clustering method is usually used to incrementally cluster the latest progress event and the topic, so as to determine the topic to which the latest progress event belongs according to the cluster center and threshold.
  • As a result, the accuracy of event integration is affected; in addition, the computational overhead increases as the number of topics grows, which affects the efficiency of event integration.
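  • For reference, the incremental clustering approach described above can be sketched as follows. This is a minimal illustration and not the method of the embodiments: the cosine measure, the running-mean update of the cluster center, the default threshold, and the names `cosine` and `assign_incrementally` are all assumptions. It also makes the cost concern visible, since every new event is compared against every existing cluster center.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_incrementally(event_vec, centers, counts, threshold=0.8):
    """Assign an event vector to the closest cluster or open a new one.

    centers and counts are updated in place; the cluster index is returned.
    """
    best_idx, best_sim = -1, -1.0
    for i, center in enumerate(centers):  # one comparison per topic
        sim = cosine(event_vec, center)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    if best_idx >= 0 and best_sim >= threshold:
        n = counts[best_idx]
        # Move the cluster center towards the new event (running mean).
        centers[best_idx] = [(c * n + x) / (n + 1)
                             for c, x in zip(centers[best_idx], event_vec)]
        counts[best_idx] = n + 1
        return best_idx
    # No center is close enough: the event starts a new topic.
    centers.append(list(event_vec))
    counts.append(1)
    return len(centers) - 1
```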
  • the embodiments of the present application provide an event integration method, device, electronic device, computer-readable storage medium, and computer program product, which can improve the accuracy and efficiency of event integration and reduce the computing resource consumption of event integration.
  • the following describes the exemplary application of the electronic device for event integration (hereinafter referred to as event integration device) provided by the embodiment of the present application.
  • the event integration device provided by the embodiment of the present application can be implemented as various types of terminals, such as a smart phone, a smart watch, a notebook computer, a tablet computer, a desktop computer, a smart home appliance, a set-top box, a smart in-vehicle device, a portable music player, a personal digital assistant, a dedicated messaging device, a smart voice interaction device, a portable game device or a smart speaker, and can also be implemented as a server.
  • an exemplary application when the device is implemented as a server will be described.
  • FIG. 1 is a schematic diagram of an optional architecture of the event integration system provided by the embodiment of the present application; as shown in FIG. 1, in the event integration system 100, the terminal 200 (for example, the terminal 200-1 and the terminal 200-2) is connected to the server 400 (called an event integration device) through the network 300, and the network 300 may be a wide area network or a local area network, or a combination of both.
  • the event integration system 100 also includes a database 500 for providing data support to the server 400 (for example, providing at least two topics to be integrated to the server 400); FIG. 1 shows the database 500 as independent of the server 400, but the database 500 may also be integrated in the server 400, which is not limited in this embodiment of the present application.
  • the terminal 200 is configured to obtain the event context from the server 400 through the network 300 and display the event context on a graphical interface.
  • the server 400 is configured to: acquire an event to be integrated and at least two topics to be integrated, wherein each topic to be integrated includes at least one topic event; determine the target similarity between the event to be integrated and each topic to be integrated based on one or more of semantic similarity, string graph similarity and question-answer similarity, where semantic similarity refers to the similarity in semantic features, string graph similarity refers to the similarity in graph features corresponding to key strings, and question-answer similarity refers to the similarity in question-answer features; determine, based on the target similarity, the target topic to which the event to be integrated belongs from the at least two topics to be integrated; and integrate the event to be integrated into the target topic to obtain an event context, wherein the event context includes the event to be integrated and at least one topic event. The server 400 is also configured to send the event context to the terminal 200 through the network 300.
  • the server 400 can be an independent physical server, a server cluster or a distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery network (CDN, Content Delivery Network), and big data and artificial intelligence platforms.
  • the terminal 200 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart voice interaction device, a smart home appliance, a vehicle terminal, etc., but is not limited thereto.
  • the terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this embodiment of the present application.
  • FIG. 2 is a schematic diagram of an exemplary composition structure of the server in FIG. 1 provided by the embodiment of the present application; the server 400 shown in FIG. 2 includes: at least one processor 410, a memory 450 and at least one network interface 420 ; in some embodiments of the present application, the server 400 further includes a user interface 430 .
  • Various components in the server 400 are coupled together through a bus system 440 .
  • the bus system 440 is used to realize connection and communication among these components.
  • the bus system 440 also includes a power bus, a control bus and a status signal bus. However, for clarity of illustration, the various buses are labeled as bus system 440 in FIG. 2 .
  • Processor 410 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
  • User interface 430 includes one or more output devices 431 that enable presentation of media content, including one or more speakers and/or one or more visual displays.
  • the user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
  • Memory 450 may be removable, non-removable or a combination thereof.
  • Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like.
  • Memory 450 optionally includes one or more storage devices located physically remote from processor 410 .
  • Memory 450 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory.
  • the non-volatile memory can be a read-only memory (ROM, Read Only Memory), and the volatile memory can be a random access memory (RAM, Random Access Memory).
  • the memory 450 described in the embodiment of the present application is intended to include any suitable type of memory.
  • the memory 450 can store data to support various operations, and examples of these data include programs, modules and data structures or subsets or supersets thereof, which are exemplarily described below.
  • Operating system 451, including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer and a driver layer;
  • Exemplary network interfaces 420 include: Bluetooth, Wireless Fidelity (Wi-Fi), and Universal Serial Bus (USB, Universal Serial Bus), etc.;
  • Presentation module 453, for enabling presentation of information via one or more output devices 431 (e.g., display screen, speakers, etc.) associated with user interface 430 (e.g., a user interface for operating peripherals and displaying content and information);
  • the input processing module 454 is configured to detect one or more user inputs or interactions from one or more of the input devices 432 and translate the detected inputs or interactions.
  • the event integration device provided by the embodiment of the present application can be realized by software.
  • FIG. 2 shows the event integration device 455 stored in the memory 450, which can be software in the form of programs, plug-ins and the like, and includes the following software modules: information acquisition module 4551, similarity acquisition module 4552, topic determination module 4553, event integration module 4554, model training module 4555, and event display module 4556. These modules are logical, so they can be arbitrarily combined or further split according to the functions implemented. The function of each module is explained below.
  • the event integration device provided in the embodiment of the present application may be implemented in a hardware manner.
  • the event integration device provided in the embodiment of the present application may be a processor in the form of a hardware decoding processor that is programmed to execute the event integration method provided by the embodiment of the present application.
  • the processor in the form of a hardware decoding processor can adopt one or more application-specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field-programmable gate arrays (FPGA, Field-Programmable Gate Array) or other electronic components.
  • the event integration method provided in the embodiment of the application will be described in conjunction with the exemplary application and implementation of the event integration device provided in the embodiment of the application. The event integration method provided by the embodiment of the present application can be applied to various event integration scenarios such as cloud technology, artificial intelligence, smart transportation, and in-vehicle scenarios.
  • FIG. 3 is an optional schematic flowchart 1 of the event integration method provided by the embodiment of the present application, which will be described in conjunction with the steps shown in FIG. 3 .
  • the event integration device obtains the event to be integrated; here, the event to be integrated may be obtained by the event integration device detecting an event, or by the event integration device receiving an event sent by another device, and so on, which is not limited in this embodiment of the present application.
  • the event integration device obtains the topics to be integrated, and obtains at least two topics to be integrated.
  • the event to be integrated refers to an event awaiting integration, where an event describes information about what happened, such as a news event or a highlight event; the event to be integrated can be the latest progress event or a historical event, where a historical event is an event for which time has elapsed since the corresponding event time, which is not limited in the embodiment of the present application; in addition, the event to be integrated includes at least text information, and may also include at least one of audio, video, images, and tables.
  • The at least two topics to be integrated may be all topics in the database, or may be topics selected from the database that may be associated with the event to be integrated, and so on, which is not limited in the embodiment of the present application; a topic to be integrated is an event topic, that is, a collection of related events that includes at least one topic event, and a topic event is also an event.
  • S302. Determine the target similarity between the event to be integrated and each topic to be integrated based on one or more of semantic similarity, character string graph similarity, and question-answer similarity.
  • the event integration device determines whether each topic to be integrated is a topic to which the event to be integrated belongs by comparing the target similarity between the event to be integrated and each topic to be integrated.
  • the target similarity refers to the possibility that the event to be integrated belongs to each topic to be integrated.
  • the target similarity can be determined from one or more aspects; thus, the target similarity includes one or more of semantic similarity, string graph similarity and question-answer similarity, and which of these it includes is determined based on selection logic. Here, semantic similarity refers to the similarity in terms of semantic features, string graph similarity refers to the similarity in graph features corresponding to key strings, and question-answer similarity refers to the similarity in question-answer features; the selection logic is the basis on which the event integration device selects from semantic similarity, string graph similarity, and question-answer similarity.
  • the event integration device selects one or more of semantic similarity, string graph similarity and question-answer similarity to obtain the target similarity, including: based on the selection logic, the event integration device selects one of semantic similarity, string graph similarity and question-answer similarity to obtain the target similarity; or, based on the selection logic, the event integration device selects at least two of semantic similarity, string graph similarity and question-answer similarity to obtain the target similarity.
  • the selection logic includes one or more of selection order, acquisition speed, accuracy rate, topic scale, selection quantity, topic type, model training scale, and model applicable scale.
  • the selection order is determined based on the priority of each similarity, and the priority can be determined based on one or both of accuracy and time consumption;
  • the acquisition speed is the speed at which a similarity is obtained, and it can be determined based on one or both of the time consumed by feature extraction and the feature extraction method (parallel or serial);
  • the accuracy rate is the accuracy of a similarity, and it can be determined based on one or both of the characteristics of the features used in the similarity acquisition process and the accuracy of the corresponding network model;
  • the topic type is the content form of the topic to be integrated; for example, when the content is in the form of images, string graph similarity and question-answer similarity can be chosen as the target similarity;
  • when the content is in the form of text, one or more of semantic similarity, string graph similarity, and question-answer similarity, including semantic similarity, can be chosen as the target similarity.
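  • The topic-type example above can be sketched as a small selection function. This is only an illustration of one selection rule; the string labels, the `max_count` parameter (standing in for the "selection quantity" criterion) and the function name `select_similarities` are hypothetical, and the other criteria (acquisition speed, accuracy rate, topic scale and so on) are not modeled.

```python
SEMANTIC = "semantic"
STRING_GRAPH = "string_graph"
QUESTION_ANSWER = "question_answer"


def select_similarities(topic_type, max_count=3):
    """Choose which similarities make up the target similarity.

    topic_type : "image" or "text" (hypothetical labels for the content form)
    max_count  : hypothetical cap on how many similarities are selected
    """
    if max_count < 1:
        raise ValueError("at least one similarity must be selected")
    if topic_type == "image":
        # Image topics: string graph and question-answer similarity.
        candidates = [STRING_GRAPH, QUESTION_ANSWER]
    else:
        # Text topics: semantic similarity comes first so that it is
        # always part of the selection.
        candidates = [SEMANTIC, STRING_GRAPH, QUESTION_ANSWER]
    return candidates[:max_count]
```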
  • the event integration device compares the features of the graph structure corresponding to the events to be integrated with the features of the graph structure corresponding to each topic to be integrated to obtain the similarity of the string graph.
  • the event integration device can construct questions and articles in machine reading comprehension based on the event to be integrated and each topic to be integrated, and determine the answer information through the information interaction between the question and the article, and determine the similarity of the corresponding question and answer based on the answer information.
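  • A minimal sketch of how the question and the article might be assembled follows. The mapping is an assumption: the event to be integrated is used as the question and the concatenated topic events are used as the article, which this passage does not state explicitly. The `[SEP]` separator, the `reader` callable (a stand-in for a trained reading comprehension model that returns a score) and both function names are likewise hypothetical.

```python
def build_mrc_input(event_text, topic_event_texts, sep=" [SEP] "):
    """Join the question (event) and the article (topic events) into one
    sentence sequence to be answered."""
    question = event_text.strip()
    article = " ".join(text.strip() for text in topic_event_texts)
    return question + sep + article


def question_answer_similarity(event_text, topic_event_texts, reader):
    """Score how strongly the article answers the question.

    reader is any callable that maps the sentence sequence to a number;
    the result is clamped to the range [0, 1].
    """
    sequence = build_mrc_input(event_text, topic_event_texts)
    score = float(reader(sequence))
    return min(1.0, max(0.0, score))
```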
  • semantic similarity, string graph similarity and question-answer similarity are similarities obtained from different dimensions.
  • the event integration device may determine, based on the target similarity, the topic that best matches the event to be integrated from the at least two topics to be integrated, and take that topic as the topic to which the event to be integrated belongs, that is, the target topic; here, the target topic refers to the topic to be integrated that corresponds to the maximum target similarity.
  • the event integration device obtains at least two corresponding target similarities between the event to be integrated and at least two topics to be integrated by obtaining the target similarity between the event to be integrated and each topic to be integrated; and then , determining a target topic from at least two topics to be integrated based on at least two target similarities, which is not limited in this embodiment of the present application.
  • the target similarity between the target topic and the event to be integrated may be greater than the similarity threshold; thus, when the maximum target similarity is lower than the similarity threshold, the event integration device determines the maximum target similarity The topic of the event is not the topic of the event to be integrated; and when the maximum target similarity is greater than or equal to the similarity threshold, the event integration device will determine the topic corresponding to the maximum target similarity as the topic of the event to be integrated.
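The selection rule above (take the topic with the maximum target similarity, and accept it only if it reaches the similarity threshold) can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name `pick_target_topic`, the topic identifiers and the score values are hypothetical.

```python
from typing import Dict, Optional


def pick_target_topic(target_similarity: Dict[str, float],
                      similarity_threshold: float) -> Optional[str]:
    """Return the topic with the maximum target similarity.

    Returns None when there are no candidate topics or when the maximum
    target similarity is lower than the similarity threshold.
    """
    if not target_similarity:
        return None
    best_topic = max(target_similarity, key=target_similarity.get)
    if target_similarity[best_topic] >= similarity_threshold:
        return best_topic
    return None


# Hypothetical scores for three topics to be integrated.
scores = {"topic_a": 0.42, "topic_b": 0.81, "topic_c": 0.77}
print(pick_target_topic(scores, 0.6))  # topic_b
print(pick_target_topic(scores, 0.9))  # None
```

Returning `None` leaves the caller free to decide what happens when no topic qualifies, for example creating a new topic.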
  • the event integration device integrates the event to be integrated as a topic event in the target topic into at least one topic event included in the target topic to obtain an event context including the event to be integrated and at least one topic event.
  • the event context refers to the occurrence process of the events described for the target topic.
  • because the target topic to which the event to be integrated belongs among the at least two topics to be integrated is determined from the target similarity between the event to be integrated and each topic to be integrated, that is, the target topic is determined directly by comparing the event to be integrated with each topic to be integrated in terms of target similarity, and because the target similarity includes one or more of semantic similarity, string graph similarity and question-answer similarity, the obtained target similarity can accurately indicate whether each topic to be integrated is the target topic to which the event to be integrated belongs; consequently, when the event to be integrated is integrated into the target topic, the accuracy of event integration can be improved.
  • when the event integration device determines the target topic of the event to be integrated from the at least two topics to be integrated by using several of semantic similarity, string graph similarity and question-answer similarity, the target topic is determined from multi-dimensional heterogeneous features, which can improve the accuracy and validity of the obtained target topic and, in turn, the accuracy of event integration.
  • Fig. 4a is an optional flow diagram II of the event integration method provided by the embodiment of the present application; as shown in Fig. 4a, in the embodiment of the present application, when the selection logic includes the selection order, S302 can be implemented through S3021 to S3024; that is to say, the step in which the event integration device selects one or more of semantic similarity, string graph similarity and question-answer similarity to determine the target similarity between the event to be integrated and each topic to be integrated includes S3021 to S3024, and each step is described below.
  • the first set number of similarities includes one or more of semantic similarity, character string graph similarity, and question-answer similarity.
  • the event integration device can first select the question-answer similarity and the semantic similarity, which have the highest accuracy, end the selection if the result can be determined, and go on to select the string graph similarity if it cannot; the event integration device can also first select the question-answer similarity, which is the least time-consuming, end the selection if the result can be determined, and go on to select from the semantic similarity and the string graph similarity if it cannot.
  • a determinable result means that the selected target similarity is greater than the first similarity threshold or smaller than the second similarity threshold;
  • an undeterminable result means that the selected target similarity is less than or equal to the first similarity threshold and greater than or equal to the second similarity threshold; here, the first similarity threshold is greater than the second similarity threshold.
  • the similarity threshold may include a first set number of sub-similarity thresholds, and the first set number of sub-similarity thresholds corresponds to the first set number of similarities one-to-one.
  • the similar result of the event to be integrated and the topic to be integrated means that the event to be integrated is similar or not similar to the topic to be integrated, which is the above determinable result.
  • the undetermined similarity result means that it is impossible to determine whether the event to be integrated is similar or not similar to the topic to be integrated, which is the above-mentioned undetermined result; the remaining similarity is semantic similarity, string graph similarity and question-answer similarity
  • the selection end condition is that the similarity result between the event to be integrated and the topic to be integrated has been determined, or that semantic similarity, string graph similarity and question-answer similarity have all been selected.
  • the selected multiple similarities refer to all similarities selected by all selection times.
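The staged selection in S3021 to S3024 can be sketched as a cascade with two thresholds per similarity. This is a minimal sketch under stated assumptions: the names `judge` and `cascade`, the threshold values and the scorer callables are hypothetical, and the rule that a stage is decisive only when all of its similarities agree is an assumption, since the text does not say how several similarities in one stage are combined.

```python
from typing import Callable, Dict, List, Optional, Tuple


def judge(value: float, first_threshold: float,
          second_threshold: float) -> Optional[bool]:
    """True = similar, False = not similar, None = cannot be determined."""
    if value > first_threshold:
        return True
    if value < second_threshold:
        return False
    return None  # second_threshold <= value <= first_threshold


def cascade(stages: List[List[str]],
            scorers: Dict[str, Callable[[], float]],
            thresholds: Dict[str, Tuple[float, float]]
            ) -> Tuple[Optional[bool], Dict[str, float]]:
    """Select similarities stage by stage until a result can be determined."""
    selected: Dict[str, float] = {}
    for stage in stages:
        if not stage:
            continue
        verdicts = []
        for name in stage:
            selected[name] = scorers[name]()  # computed only when selected
            first, second = thresholds[name]
            verdicts.append(judge(selected[name], first, second))
        # Assumption: a stage is decisive only when all verdicts agree.
        if all(v is True for v in verdicts):
            return True, selected
        if all(v is False for v in verdicts):
            return False, selected
    return None, selected  # every similarity selected, still undetermined


# Hypothetical sub-similarity thresholds: (first, second), first > second.
thresholds = {"qa": (0.8, 0.3), "self_attention": (0.8, 0.3),
              "statistical": (0.7, 0.4), "graph": (0.7, 0.4)}
stages = [["qa", "self_attention"], ["statistical", "graph"]]
scorers = {"qa": lambda: 0.95, "self_attention": lambda: 0.9,
           "statistical": lambda: 0.0, "graph": lambda: 0.0}
print(cascade(stages, scorers, thresholds))
```

Because each scorer is called only when its stage is reached, the slower similarities are never computed when an earlier stage already gives a determinable result.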
  • Fig. 4b is an optional flow diagram III of the event integration method provided by the embodiment of the present application; as shown in Fig. 4b, in the embodiment of the present application, when the selection logic includes the acquisition speed and the topic scale, S302 can also be implemented through S3025 and S3026; that is, the step in which the event integration device determines the target similarity between the event to be integrated and each topic to be integrated based on one or more of semantic similarity, string graph similarity and question-answer similarity includes S3025 and S3026, and each step is described below.
  • when the event integration device determines that the topic scale is larger than the set scale, the at least two topics to be integrated are large in scale, and a small number (called the second set number) of quickly obtainable similarities (that is, the second set number of similarities selected in descending order of acquisition speed) needs to be used to determine the result.
  • when the event integration device determines that the topic scale is smaller than or equal to the set scale, the at least two topics to be integrated are relatively small in scale, and a larger number (called the third set number) of similarities (that is, the third set number of similarities selected in descending order of acquisition speed) can be used to determine the result; in addition, the second set number is smaller than the third set number.
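The scale-dependent choice in S3025 and S3026 can be sketched as below. It is only an illustration: the function name `choose_by_scale`, the speed values and the scale numbers are hypothetical, and the acquisition speed is represented as a single number per similarity for simplicity.

```python
from typing import Dict, List


def choose_by_scale(acquisition_speed: Dict[str, float],
                    topic_scale: int,
                    set_scale: int,
                    second_set_number: int,
                    third_set_number: int) -> List[str]:
    """Pick similarities in descending order of acquisition speed.

    A large topic scale gets the smaller second set number of similarities;
    otherwise the larger third set number is used.
    """
    if second_set_number >= third_set_number:
        raise ValueError("second set number must be smaller than third set number")
    ranked = sorted(acquisition_speed, key=acquisition_speed.get, reverse=True)
    count = second_set_number if topic_scale > set_scale else third_set_number
    return ranked[:count]


# Hypothetical speeds (higher means faster to obtain).
speeds = {"qa": 3.0, "semantic": 2.0, "graph": 1.0}
print(choose_by_scale(speeds, topic_scale=1000, set_scale=100,
                      second_set_number=1, third_set_number=3))
print(choose_by_scale(speeds, topic_scale=10, set_scale=100,
                      second_set_number=1, third_set_number=3))
```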
  • Fig. 4c is an optional flow diagram IV of the event integration method provided by the embodiment of the present application; as shown in Fig. 4c, in the embodiment of the present application, when the target similarity includes several of semantic similarity, string graph similarity and question-answer similarity, S303 can be implemented through S3031 to S3033; that is, the step in which the event integration device determines the target topic to which the event to be integrated belongs from the at least two topics to be integrated based on the target similarity includes S3031 to S3033, and each step is described below.
  • the event integration device determines, for each of the selected similarities, a weight that is positively related to its accuracy rate, and thus obtains the weight matching ratio among the various similarities in the target similarity; here, the weight matching ratio represents the ratio between the weights corresponding to the various similarities, for example, 0.3:0.4:0.3.
  • the event integration device fuses each similarity in the target similarity with its corresponding weight; once all similarities in the target similarity have been fused, the final similarity used to judge whether the event to be integrated is similar to the topic to be integrated is obtained; this final similarity is the discrimination similarity.
  • the event integration device can directly determine the topic to be integrated corresponding to the highest discrimination similarity as the target topic to which the event to be integrated belongs; it can also compare the highest discrimination similarity with a threshold and then decide whether to determine the topic to be integrated corresponding to the highest discrimination similarity as the target topic to which the event to be integrated belongs; and so on, which is not limited in this embodiment of the present application.
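The weighting and fusion in S3031 to S3033 can be illustrated with a weighted sum. The text only says that the weights are positively related to accuracy and that each similarity is "fused" with its weight, so both the normalisation in `weights_from_accuracy` and the weighted sum in `discrimination_similarity` are assumptions, as are the function names and sample values.

```python
from typing import Dict


def weights_from_accuracy(accuracy: Dict[str, float]) -> Dict[str, float]:
    """Assumed scheme: weights proportional to accuracy, summing to 1."""
    total = sum(accuracy.values())
    return {name: value / total for name, value in accuracy.items()}


def discrimination_similarity(similarities: Dict[str, float],
                              weights: Dict[str, float]) -> float:
    """Assumed fusion: weighted sum of the selected similarities."""
    return sum(weights[name] * value for name, value in similarities.items())


# The 0.3:0.4:0.3 ratio is the example ratio from the text; the mapping of
# weights to similarity types and the similarity values are hypothetical.
weights = {"semantic": 0.3, "graph": 0.4, "qa": 0.3}
per_topic = {
    "topic_a": {"semantic": 0.5, "graph": 1.0, "qa": 0.0},
    "topic_b": {"semantic": 0.9, "graph": 0.2, "qa": 0.4},
}
fused = {t: discrimination_similarity(s, weights) for t, s in per_topic.items()}
print(max(fused, key=fused.get))
```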
  • the semantic similarity includes one or both of semantic self-attention similarity and semantic statistical similarity; here, the semantic self-attention similarity is determined based on the self-attention between the event to be integrated and the topic event;
  • the semantic statistical similarity is determined based on the target semantics, and the target semantics refers to the semantics corresponding to at least one of the title, the key string and the event content; that is, the semantic statistical similarity is obtained by comparing the vector semantic features of information such as the title, key strings and text of the event to be integrated with the corresponding vector semantic features of the titles, key strings and texts of the topic events of each topic to be integrated.
  • the semantic self-attention similarity is obtained through the following steps: the event integration device obtains the to-be-integrated semantic feature corresponding to the event to be integrated and the topic event semantic feature corresponding to each topic event in the topic to be integrated; based on a distinguishing mark that separates the event to be integrated from the topic events, it enhances the to-be-integrated semantic feature to obtain a first enhanced semantic feature, and enhances each topic event semantic feature to obtain a second enhanced semantic feature; the first enhanced semantic feature and the at least one second enhanced semantic feature corresponding to the topic to be integrated are combined into a semantic feature sequence; and the semantic self-attention similarity is determined based on the self-attention information between two sequence units in the semantic feature sequence.
  • self-attention information between two sequence units refers to the self-attention between the event to be integrated and any topic event.
  • S3021 to S3024 include: the event integration device selects the semantic self-attention similarity and the question-answer similarity (the first set number of similarities); if, after the semantic self-attention similarity and the question-answer similarity are compared with the corresponding sub-similarity thresholds, it is determined that the event to be integrated is similar or not similar to the topic event, the selection ends; and if, after that comparison, it cannot be determined whether the event to be integrated is similar or not similar to the topic event, the device continues by selecting the semantic statistical similarity and the string graph similarity for discrimination.
  • the descending order of priority can be determined as question-answer similarity, semantic self-attention similarity, string graph similarity and semantic statistical similarity; in that order, the four similarities move from precision toward breadth.
  • Fig. 4d is a schematic flow diagram of obtaining semantic statistical similarity provided by the embodiment of the present application; as shown in Fig. 4d, in the embodiment of the present application, the semantic statistical similarity can be obtained through S30211 to S30214, and each step is described below.
  • for each topic to be integrated, obtain the first sub-semantic similarity between the title of each topic event and the title of the event to be integrated, and determine the average first sub-semantic similarity and the maximum first sub-semantic similarity based on the first sub-semantic similarities.
  • the event integration device determines the semantic statistical similarity from one or more of a first degree of similarity, a second degree of similarity, a third degree of similarity and a fourth degree of similarity; here, the first degree of similarity is the degree of similarity between the title of the event to be integrated and the title of each topic event in each topic to be integrated; the second degree of similarity is the degree of similarity between the key string of the event to be integrated and the key string of each topic event in each topic to be integrated; the third degree of similarity is the degree of similarity between the key string of the event to be integrated and the key string of each topic to be integrated; and the fourth degree of similarity is the degree of similarity between the event to be integrated and each topic to be integrated.
  • the event integration device obtains the degree of similarity between the title of the topic event and the title of the event to be integrated, that is, the first sub-semantic similarity (also called the first degree of similarity), so that for each topic to be integrated, at least one first sub-semantic similarity corresponding to the at least one topic event can be obtained; the event integration device averages the at least one first sub-semantic similarity to obtain the average first sub-semantic similarity, and selects the largest of the at least one first sub-semantic similarity as the maximum first sub-semantic similarity.
  • for each topic to be integrated, obtain the second sub-semantic similarity between the topic event key string corresponding to each topic event and the to-be-integrated event key string corresponding to the event to be integrated, and determine the average second sub-semantic similarity and the maximum second sub-semantic similarity based on the second sub-semantic similarities.
  • the event integration device obtains the second degree of similarity between the topic event key string of the topic event and the to-be-integrated event key string, that is, the second sub-semantic similarity, so that for each topic to be integrated, at least one second sub-semantic similarity corresponding to the at least one topic event can be obtained; the event integration device averages the at least one second sub-semantic similarity to obtain the average second sub-semantic similarity, and selects the largest of the at least one second sub-semantic similarity as the maximum second sub-semantic similarity.
  • the topic event key string is the key string of a topic event;
  • the to-be-integrated event key string is the key string of the event to be integrated.
  • the event integration device obtains the third degree of similarity between the key character string of the topic and the key character string of the event to be integrated, and thus obtains the third sub-semantic similarity.
  • S30214 Determine the average first sub-semantic similarity, maximum first sub-semantic similarity, average second sub-semantic similarity, maximum second sub-semantic similarity, and third sub-semantic similarity as semantic statistical similarity.
  • the event integration device can determine at least one of the average first sub-semantic similarity, the maximum first sub-semantic similarity, the average second sub-semantic similarity, the maximum second sub-semantic similarity and the third sub-semantic similarity as the semantic statistical similarity between the event to be integrated and each topic to be integrated.
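The statistics collected in S30211 to S30214 can be sketched as follows. The per-pair sub-semantic similarities are taken as given inputs here (in the text they come from the semantic statistical similarity model); the function name, the dictionary keys and the sample values are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def semantic_statistical_similarity(title_sims: List[float],
                                    key_string_sims: List[float],
                                    topic_key_string_sim: float
                                    ) -> Dict[str, float]:
    """Aggregate sub-semantic similarities for one topic to be integrated.

    title_sims:           first sub-semantic similarity per topic event
    key_string_sims:      second sub-semantic similarity per topic event
    topic_key_string_sim: third sub-semantic similarity (topic level)
    """
    return {
        "avg_first": mean(title_sims),
        "max_first": max(title_sims),
        "avg_second": mean(key_string_sims),
        "max_second": max(key_string_sims),
        "third": topic_key_string_sim,
    }


# Hypothetical similarities for a topic with two topic events.
print(semantic_statistical_similarity([0.5, 1.0], [0.25, 0.75], 0.6))
```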
  • the first sub-semantic similarity in S30211, the second sub-semantic similarity in S30212, and the third sub-semantic similarity in S30213 can all be obtained through the semantic statistical similarity model.
  • the semantic statistical similarity model is used to obtain the similarity of text pairs in terms of semantic features; referring to Fig. 4e, Fig. 4e is a schematic flow chart of the semantic statistical similarity model provided by the embodiment of the present application; as shown in Fig. 4e, in the embodiment of the present application, the semantic statistical similarity model is obtained through the training in S305 to S307, and each step is described below.
  • the training samples are the data samples used to train the semantic statistical similarity model; the first character string sample and the second character string sample form a text pair whose similarity in semantic features is to be determined; and the label similarity is the actual degree of similarity between the first character string sample and the second character string sample in terms of semantic features.
  • the event integration device initializes the parameters of the model structure, and thus obtains the semantic statistical similarity model to be trained, which includes a first semantic branch and a second semantic branch; then, the event integration device uses the first semantic branch to obtain the semantics corresponding to the first character string sample, that is, the first estimated semantics, and uses the second semantic branch to obtain the semantics corresponding to the second character string sample, that is, the second estimated semantics.
  • the similarity model in the semantic statistical similarity model to be trained is used to determine the similarity between the first character string sample and the second character string sample, and the estimated similarity is obtained; here, the similarity model compares the first estimated semantics with the second estimated semantics and, based on the comparison result, determines the estimated similarity between the first character string sample and the second character string sample.
  • the semantic statistical similarity model to be trained is a model, yet to be trained, for obtaining the similarity of text pairs in terms of semantic features; it adopts a double-tower structure (the first semantic branch and the second semantic branch), each semantic branch in the double-tower structure is used to obtain semantic features, and the parameters of the first semantic branch and the second semantic branch are shared.
  • the semantic statistical similarity model to be trained may also be a pre-trained model.
  • the semantic statistical similarity model to be trained can improve the efficiency of obtaining estimated similarity by using a double-tower structure to obtain semantic features.
  • the event integration device adjusts the parameters in the semantic statistical similarity model to be trained based on the difference between the estimated similarity and the label similarity, so as to train the model; here, the event integration device adjusts the parameters by performing backpropagation in the semantic statistical similarity model to be trained.
  • the training process of the semantic statistical similarity model to be trained is an iterative training process, and the semantic statistical similarity model to be trained after the training is the semantic statistical similarity model.
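The double-tower forward pass and the training signal described above can be sketched in plain Python. This is a toy illustration under heavy assumptions: the character-bucket featurizer, the linear encoder, the cosine comparison and the squared-error loss are all stand-ins chosen for brevity, none of them is stated in the text, and the backpropagation step itself is omitted (it would normally be delegated to a deep learning framework).

```python
import math
from typing import List

DIM = 8  # hypothetical feature size


def featurize(text: str) -> List[float]:
    """Toy featurizer: count characters into DIM buckets (assumption)."""
    vec = [0.0] * DIM
    for ch in text:
        vec[ord(ch) % DIM] += 1.0
    return vec


class SharedTower:
    """One set of parameters used by both semantic branches."""

    def __init__(self, weights: List[List[float]]) -> None:
        self.weights = weights  # DIM x DIM matrix

    def encode(self, text: str) -> List[float]:
        x = featurize(text)
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def estimated_similarity(tower: SharedTower, first: str, second: str) -> float:
    """Both branches call the same tower, so their parameters are shared."""
    return cosine(tower.encode(first), tower.encode(second))


def squared_error(estimated: float, label: float) -> float:
    """Difference between estimated and label similarity (assumed loss)."""
    return (estimated - label) ** 2


identity = [[1.0 if i == j else 0.0 for j in range(DIM)] for i in range(DIM)]
tower = SharedTower(identity)
est = estimated_similarity(tower, "flood warning issued", "flood alert issued")
print(est, squared_error(est, 1.0))
```

Sharing one `SharedTower` instance between the two branches is what makes the model symmetric: swapping the two strings gives the same estimated similarity.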
  • Fig. 4f is a schematic flow chart for obtaining the string graph similarity provided by the embodiment of the present application; as shown in Fig. 4f, in the embodiment of the present application, the string graph similarity can be obtained through S30221 to S30223, and each step is described below.
  • for each topic to be integrated, determine each subtopic event key string corresponding to the at least one topic event as a graph node, build an edge between two graph nodes that correspond to the same topic event, and determine the graph nodes and edges as the first key string graph.
  • the topic event key string corresponding to each topic event includes one or more subtopic event key strings; here, the event integration device takes each subtopic event key string as a graph node and traverses every pair of graph nodes among all the obtained graph nodes; if it determines that the two subtopic event key strings corresponding to the two graph nodes belong to the same topic event, it builds an edge between the two graph nodes.
  • the event integration device builds the graph structure corresponding to the event to be integrated in the same way as the first key string graph: it takes each sub-key string in the key string of the event to be integrated as a graph node and builds an edge between any two graph nodes, and thus obtains the second key string graph.
  • the event integration device obtains the vector representation of the first key string graph and the vector representation of the second key string graph; it then compares the two vector representations and, based on the graph comparison result, determines the string graph similarity between the event to be integrated and each topic to be integrated.
  • the process of obtaining the vector representation of the first key string graph includes: obtaining (for example, through the Bert model) the vector representations of the graph nodes and the vector representations of the edges in the first key string graph, and obtaining (for example, through a graph convolution model) the vector representation of the first key string graph based on the vector representations of the graph nodes and of the edges.
  • the process of obtaining the vector representation of the second key string graph is similar to the process of obtaining the vector representation of the first key string graph, and is not repeated in this embodiment of the present application.
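The graph construction in S30221 and S30222 can be sketched with plain sets. The construction follows the text (one node per sub-key string, one edge between key strings of the same event); the comparison, however, is a deliberate stand-in: the text compares learned vector representations (for example Bert plus a graph convolution model), while this sketch uses Jaccard overlap of nodes and edges purely so that it runs without a model. All names and sample data are hypothetical.

```python
from itertools import combinations
from typing import List, Set, Tuple

Graph = Tuple[Set[str], Set[Tuple[str, str]]]


def build_key_string_graph(events_key_strings: List[List[str]]) -> Graph:
    """Nodes are sub-key strings; edges join key strings of the same event."""
    nodes: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for keys in events_key_strings:
        unique = sorted(set(keys))  # sorted so each edge has one canonical form
        nodes.update(unique)
        edges.update(combinations(unique, 2))
    return nodes, edges


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def graph_similarity(first: Graph, second: Graph) -> float:
    """Stand-in comparison (assumption): mean Jaccard of nodes and edges."""
    return 0.5 * (jaccard(first[0], second[0]) + jaccard(first[1], second[1]))


# First key string graph: one topic with two topic events.
topic_graph = build_key_string_graph([["storm", "coast"], ["storm", "rescue"]])
# Second key string graph: the single event to be integrated.
event_graph = build_key_string_graph([["storm", "coast"]])
print(graph_similarity(topic_graph, event_graph))
```

Because the event to be integrated is a single event, passing its key strings as a one-element list yields edges between every pair of its nodes, matching the description of the second key string graph.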
  • Fig. 4g is a schematic flow diagram of obtaining the question-answer similarity provided by the embodiment of the present application; as shown in Fig. 4g, in the embodiment of the present application, the question-answer similarity can be obtained through S30231 to S30233, and each step is described below.
  • the event integration device constructs a question-and-answer statement corresponding to each topic to be integrated and the event to be integrated, and thus obtains a sequence of sentences to be answered.
  • the event integration device combines the title of each topic to be integrated, the topic key string and the event to be integrated according to the preset sentence pattern of the question-answer statement, and the combination result is the constructed question-answer statement; for example: Is "event to be integrated" a progress of "title of the topic to be integrated", whose key string is "topic key string"?; another example: Is the next sentence a progress of "title of the topic to be integrated", whose key string is "topic key string"? "event to be integrated".
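Filling the preset sentence pattern can be sketched as simple string templating. The exact English wording of the template and the function names are assumptions made for illustration; only the three slots (event to be integrated, topic title, topic key string) come from the text.

```python
from typing import Dict, List


def build_question_statement(event: str, topic_title: str,
                             topic_key_strings: List[str]) -> str:
    """Fill the assumed sentence pattern with one event and one topic."""
    keys = ", ".join(topic_key_strings)
    return (f'Is "{event}" a progress of "{topic_title}", '
            f'whose key string is "{keys}"?')


def build_sentence_sequence(event: str, topics: List[Dict]) -> List[str]:
    """One question-answer statement per topic to be integrated."""
    return [build_question_statement(event, t["title"], t["key_strings"])
            for t in topics]


# Hypothetical topics to be integrated.
topics = [
    {"title": "Coastal storm", "key_strings": ["storm", "coast"]},
    {"title": "City marathon", "key_strings": ["marathon"]},
]
for statement in build_sentence_sequence("Rescue teams reach the coast", topics):
    print(statement)
```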
  • the event integration device determines the corresponding question and article in the sentence sequence to be answered, and performs underlying processing on each of them to convert the text into digital codes; then, it determines the semantic connection between the article and the question based on the digital codes, obtains the features of the question in combination with the semantic analysis result of the article, and obtains the features of the article in combination with the semantic analysis result of the question; finally, the event integration device obtains the output answer information based on the determined features of the question, the determined features of the article and the type of the answer.
  • the answer information indicates whether the event to be integrated belongs to each topic to be integrated; it can be "yes" (the event to be integrated belongs to the topic to be integrated) or "no" (the event to be integrated does not belong to the topic to be integrated), or it can be the possibility that the event to be integrated belongs to the topic to be integrated, and so on, which is not limited in this embodiment of the present application.
  • the event integration device determines the possibility that the event to be integrated belongs to each topic to be integrated, and determines the determined possibility as the question-answer similarity between the event to be integrated and each topic to be integrated .
  • FIG. 4h is a schematic flow diagram of obtaining at least two topics to be integrated provided by the embodiment of the present application; as shown in FIG. 4h, in the embodiment of the present application, the step in S301 in which the event integration device obtains at least two topics to be integrated includes S3011 to S3013, and each step is described below.
  • the event integration device can obtain a preset topic library, so that after obtaining the event to be integrated, it determines the topic to which the event to be integrated belongs from the topic library, and then integrates the event to be integrated into that topic.
  • the event integration device first matches each topic in the topic library with the event to be integrated, and the matching is to match the topic key string corresponding to each topic with the event to be integrated.
  • topic key character string is the key character string of the topic.
  • the topic library includes a plurality of topics, each topic is a theme of an event; and, each topic in the topic library includes at least one topic event, and the topic events included in different topics can be the same or different; and, at least one Topic events refer to events associated with topics that occur in different time periods, so at least one topic event has a time sequence.
  • when the matching result between the topic key string corresponding to a topic and the event to be integrated indicates that at least one subtopic key string in the topic key string matches the event to be integrated, the event integration device determines that the topic corresponding to the matching result is a topic to be integrated that matches the event to be integrated; and when the matching result indicates that the topic key string does not match the event to be integrated, the device determines that the topic corresponding to the matching result is not a topic to be integrated that matches the event to be integrated.
  • once the event integration device has obtained the matching results between the multiple topics and the event to be integrated, it can obtain from the topic library at least one topic to be integrated that matches the event to be integrated; if no topic to be integrated matches the event to be integrated, a new topic including the event to be integrated is constructed and added to the topic library; in addition, when the at least one topic to be integrated obtained is a single topic, the event integration device can determine whether that topic is the target topic by comparing the target similarity with the similarity threshold, or can directly determine it as the target topic.
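The recall step in S3011 to S3013 can be sketched as key string matching against a topic library. The matching criterion used here (a subtopic key string occurring as a substring of the event text) is an assumption, since the text does not define how a key string "matches" an event; the function names and the library contents are hypothetical.

```python
from typing import Dict, List


def recall_topics(topic_library: Dict[str, List[str]],
                  event_text: str) -> List[str]:
    """Return topics with at least one subtopic key string matching the event.

    Assumed matching rule: the key string occurs in the event text.
    """
    return [topic for topic, key_strings in topic_library.items()
            if any(key in event_text for key in key_strings)]


def recall_or_create(topic_library: Dict[str, List[str]],
                     event_text: str,
                     new_topic_name: str,
                     new_topic_key_strings: List[str]) -> List[str]:
    """Recall matching topics; add a new topic to the library if none match."""
    matched = recall_topics(topic_library, event_text)
    if not matched:
        topic_library[new_topic_name] = list(new_topic_key_strings)
    return matched


# Hypothetical topic library: topic name -> topic key string (sub-key strings).
library = {"Coastal storm": ["storm", "coast"], "City marathon": ["marathon"]}
print(recall_or_create(library, "A storm hits the harbour",
                       "Harbour storm", ["storm", "harbour"]))
print(recall_or_create(library, "Election results announced",
                       "Election", ["election", "results"]))
print(sorted(library))
```

Only the recalled topics then go through the more expensive similarity computation, which is where the reduction in computation time comes from.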
  • the embodiment of the present application adopts a recall-similarity classification method, which can quickly integrate the event to be integrated into the target topic and reduces both the computation time of the integration process and the number of topics to be integrated, thereby improving the efficiency of event integration.
  • S3011 is preceded by S3014 to S3016; that is, before the event integration device obtains the matching result between the topic key string corresponding to each topic and the event to be integrated, the event integration method further includes S3014 to S3016, and each step is described below.
  • the topic key string corresponding to a topic is obtained from the key strings of its at least one topic event; here, the event integration device first obtains the topic event key string corresponding to each topic event; because each topic includes at least one topic event, the at least one topic event corresponds to at least one topic event key string.
  • the event integration device counts, for each subtopic event key string in each of the at least one topic event key string, the number of topic events that correspond to it, and thereby obtains the topic event counts corresponding to the multiple subtopic event key strings under the topic.
  • the event integration device selects, from the subtopic event key strings under the topic, a fourth set number (for example, 2) of subtopic event key strings with the largest topic event counts, and determines them as the topic key string.
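The counting and selection in S3014 to S3016 can be sketched with `collections.Counter`. The function name and sample data are hypothetical, and the alphabetical tie-break is an assumption added only to make the output deterministic; the text does not say how ties are resolved.

```python
from collections import Counter
from typing import List


def topic_key_strings(topic_event_key_strings: List[List[str]],
                      fourth_set_number: int = 2) -> List[str]:
    """Pick the sub-key strings that occur in the most topic events."""
    counts: Counter = Counter()
    for keys in topic_event_key_strings:
        counts.update(set(keys))  # each topic event counts once per key string
    # Sort by descending topic event count; ties broken alphabetically
    # (assumption, for determinism).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [key for key, _ in ranked[:fourth_set_number]]


# Hypothetical topic with three topic events.
events = [["storm", "coast"], ["storm", "rescue", "coast"], ["storm"]]
print(topic_key_strings(events))  # ['storm', 'coast']
```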
  • Fig. 4i is a schematic flow diagram of obtaining a topic event key string provided by the embodiment of the present application; as shown in Fig. 4i, in the embodiment of the present application, the step in S3014 in which the event integration device obtains the topic event key string corresponding to each topic event can be implemented through S30141 to S30143, and each step is described below.
  • the event integration device obtains key strings of topic events from multiple dimensions; one of the dimensions is the entity of the topic event, and the event integration device can obtain preset entity types in advance, such as person name type and place name type;
  • the event integration device performs entity identification on each topic event, selects an entity of a preset entity type from the identified entities, and uses the selected entity of a preset entity type as an entity key string.
  • the event integration device can also obtain key strings of topic events from the dimension of string weight; here, the event integration device analyzes the weights of the strings in a topic event to obtain the strings whose weight is greater than the weight threshold, and determines those of the obtained strings that represent an action as the action key string.
  • when the event integration device determines the topic event key string based on the entity key string, it may determine all the entity key strings as the topic event key string, or extract strings from the entity key string to obtain the topic event key string; when the event integration device determines the topic event key string based on the action key string, it may determine all the action key strings as the topic event key string, or extract strings from the action key string to obtain the topic event key string; the event integration device may also determine strings obtained from any combination of the entity key string and the action key string as the topic event key string.
  • the event integration device determines the topic event key string based on the character strings associated with the persons, locations and actions in the topic event, which can improve the accuracy of the topic event key string.
  • Fig. 4j is the second schematic flow diagram for obtaining the key character string of the topic event provided by the embodiment of the present application; as shown in Fig. 4j, in the embodiment of the present application, S30143 can be realized through S301431 to S301433; that is, the event integration device determining the topic event key string based on one or both of the entity key string and the action key string includes S301431 to S301433, and each step is described below.
  • the event integration device can first determine the topic event key string based on the entity key string; here, when the number of character strings in the key string of each topic event is limited, the event integration device can, based on the number of character strings included in the entity key string, determine character strings selected from the action key string as the topic event key string, and can also determine, based on the number of character strings included in the entity key string, whether to use the action key string as the topic event key string.
  • the event integration device determines that the entity key string is not sufficient as the topic event key string, and that the action key string also needs to be determined as the topic event key string; that is, in this case the topic event key string includes both the entity key string and the action key string.
  • when the number of key strings in each topic event is limited to the fifth set number and the number of entity key strings is greater than or equal to the fifth set number, the event integration device determines that the character strings in the entity key string are sufficient as the topic event key string; in this case, the topic event key string includes the entity key string.
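The choice between entity and action key strings can be sketched as below. This is an illustrative sketch under stated assumptions: the function name, the `limit` parameter (standing in for the fifth set number), the truncation to `limit` entries, and filling the remaining slots from the action key strings in order are all introduced here and are not confirmed by the text.

```python
def build_topic_event_key_strings(entity_keys, action_keys, limit=3):
    """Combine entity and action key strings for one topic event.

    limit stands in for the "fifth set number" (hypothetical default of 3).
    """
    if len(entity_keys) >= limit:
        # Enough entity key strings: the action key strings are not used.
        return list(entity_keys[:limit])
    # Not enough entity key strings: top up with action key strings.
    remaining = limit - len(entity_keys)
    return list(entity_keys) + list(action_keys[:remaining])


enough_entities = build_topic_event_key_strings(
    ["Li San", "H place", "Zhang Er"], ["ban"]
)
too_few_entities = build_topic_event_key_strings(
    ["Li San"], ["ban", "prosecution", "plan"]
)
```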
  • FIG. 5 is the fifth optional schematic flowchart of the event integration method provided by the embodiment of the present application; as shown in FIG. 5, in the embodiment of the present application, S308 to S311 are further included after S304; that is, after the event integration device integrates the event to be integrated into the target topic and obtains the event context including the event to be integrated and at least one topic event, the event integration method further includes S308 to S311, and each step will be described below.
  • search control is used for searching information, thus, the search control can be used for searching topic events.
  • when the user triggers the search control to search for information, if the searched information is associated with the integrated target topic, the event integration device receives the first search operation acting on the search control; at this point, the event integration device presents search results in response to the first search operation.
  • the presented search results may include a simplified event context corresponding to the event context and a presentation control corresponding to the simplified event context.
  • the simplified event context belongs to the event context, and the presentation content is part of the events in the event context; the presentation control is used to present the entire event context, for example, the "View More" button, the expansion icon, etc.
  • each event in the presented event context includes an event title and an event time, and the event is any one of an event to be integrated and at least one topic event.
  • when the user triggers the presentation control to view the entire event context, the event integration device receives the presentation operation acting on the presentation control; at this time, the event integration device presents the entire event context in response to the presentation operation; the event integration device presents the event context by presenting the event title and event time of each event in the event context, wherein the event is any one of the event to be integrated and the at least one topic event.
  • the presented search results may include a search recommendation result, where the search recommendation result refers to recommendation information for the integrated target topic, for example, "Are you searching for the title of the integrated target topic?"; here, when the user performs a trigger operation on the search recommendation result, the event integration device may present the simplified event context corresponding to the event context and the presentation control corresponding to the simplified event context, and then present the event context in response to the presentation operation acting on the presentation control; the event context may also be presented directly; this is not limited in this embodiment of the present application.
  • FIG. 6 is a schematic diagram of an exemplary event context presentation provided by the embodiment of the present application; as shown in FIG. 6, a simplified event context 6-11 is presented together with a presentation control 6-12; when the presentation control 6-12 is clicked (a presentation operation), the entire event context 6-21 is presented, as shown in area 6-2; here, each event in the presented event context is presented through its event title (for example, event title 6-211) and event time (for example, event time 6-212), and the detailed information of the corresponding event is displayed when the event title 6-211 is clicked.
  • FIG. 7 is a schematic diagram showing another exemplary event context provided by the embodiment of the present application.
  • page 7-1 is a page for presenting search results, and other results are presented
  • the search recommendation result 7-11 is presented.
  • the event context 6-21 shown in the area 6-2 in FIG. 6 is presented.
  • the event title or event time is a control that can be triggered, or each event corresponds to a control for viewing details.
  • when the user triggers the event title, the event integration device receives the view operation acting on the event title; when the user triggers the event time, the event integration device receives the view operation acting on the event time; when the user triggers the control for viewing details, the event integration device receives the view operation acting on the control for viewing details; at this time, the event integration device presents event detailed information in response to the view operation, wherein the event detailed information refers to the detailed description information of the event in the event context, that is, the event content.
  • Fig. 8 is an optional schematic flow diagram six of the event integration method provided by the embodiment of the present application; as shown in Fig. 8, in the embodiment of the present application, S312 to S314 are also included after S304; that is, the event After the integration device integrates the event to be integrated into the target topic, and obtains the event context including the event to be integrated and at least one topic event, the event integration method further includes S312 to S314, and each step will be described below.
  • the target event is any one of the events to be integrated and at least one topic event included in the event context;
  • the last information to be presented refers to the information at the final presentation progress of the target event, for example, the last page of the target event or the bottom area of the target event.
  • the recommendation information presented by the event integration device in the recommendation area is the remaining events, wherein the remaining events may be any event in the event context except the target event, or may be the latest event in the event context except the target event.
  • the remaining events may be displayed in the form of search content in the search box, or in the form of links, etc., which is not limited in this embodiment of the present application.
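As a small illustration of choosing "the latest event in the event context except the target event", the sketch below assumes that each event is a dictionary with a `title` and a `time` field; this data layout and the function name are hypothetical and only serve the example.

```python
from datetime import date


def latest_remaining_event(event_context, target_title):
    """Return the most recent event other than the target event, or None."""
    remaining = [e for e in event_context if e["title"] != target_title]
    if not remaining:
        return None
    return max(remaining, key=lambda e: e["time"])


# Hypothetical event context with three events.
context = [
    {"title": "Event A", "time": date(2021, 9, 1)},
    {"title": "Event B", "time": date(2021, 9, 5)},
    {"title": "Event C", "time": date(2021, 9, 9)},
]
recommended = latest_remaining_event(context, "Event C")
```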
  • when the user triggers a viewing operation on the remaining events, the event integration device receives the second search operation acting on the remaining events; at this time, the event integration device presents the detailed information of the remaining events in response to the second search operation.
  • S308 to S314 can be implemented by the server; alternatively, the server can send the event context to the terminal, and S308 to S314 are implemented by the terminal; the embodiment of the present application does not limit this.
  • the event context can provide gain information beyond the search words, actively explore related reading needs on the premise of meeting the search need, improve the completeness of the information presented in search result pages, and reduce the number of searches in a search scenario that fail to obtain the target information, thereby reducing the resource consumption of the search process; it can also improve the accuracy of the delivered information and increase the frequency of user searches.
  • This exemplary application describes the event context obtained by mounting the latest progress event under the topic, and presents the event context in response to the user's search operation.
  • a topic which is often composed of multiple events that have occurred (called at least one topic event)
  • after the latest progress event of a news topic (called the event to be integrated) is obtained, the latest progress event is mounted under the news topic (called the target topic) to form an event context containing the latest progress event; by presenting the event context, the development process of the event can be presented intuitively.
  • when the event integration method provided by the embodiment of the present application is used to mount the latest progress event under a news topic, it can be realized through two stages, recall and classification, including the following steps.
  • topic library a news topic database
  • each news topic in the news topic database corresponds to topic keywords (called a topic key character string); when the server determines that a keyword of the latest progress event matches any one of the topic keywords (called a subtopic key character string), it determines that the news topic is one of the possibly related news topics.
  • Figure 9 is a schematic diagram of an exemplary news topic recall provided by the embodiment of this application; Title of progression event 9-1.
  • the news topic 9-21 includes 3 events, and the corresponding topic keywords 9-211 are "nurse" and "vice dean"; the news topic 9-22 includes 4 events, and the corresponding topic keywords 9-221 are "H place" and "jumping the car"; the news topic 9-23 includes 4 events, and the corresponding topic keywords 9-231 are "Li San" and "the first object".
  • the news topic 9-23 is a news topic among the recalled possibly related news topics.
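The recall stage can be sketched as a keyword-overlap filter over the topic library. The topic keywords below come from the Figure 9 example; the keywords of the latest progress event are borrowed from the Figure 13 example and are assumed here to belong to the same event. The function name and the dictionary layout are hypothetical.

```python
def recall_topics(event_keywords, topic_library):
    """Return the ids of topics sharing at least one keyword with the event."""
    event_set = set(event_keywords)
    return [
        topic_id
        for topic_id, topic_keywords in topic_library.items()
        if event_set & set(topic_keywords)
    ]


topic_library = {
    "9-21": ["nurse", "vice dean"],
    "9-22": ["H place", "jumping the car"],
    "9-23": ["Li San", "first object"],
}
# Assumed keywords of the latest progress event.
latest_event_keywords = ["first department", "Zhang Er", "first object"]
recalled = recall_topics(latest_event_keywords, topic_library)
```

An exact string match is used for simplicity; the text does not specify how keyword matching is performed.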
  • the two keywords with the largest number of topic events are the topic keywords.
  • the topic keywords of each topic event can be obtained through entity recognition and word weight analysis (called string weight analysis).
  • the server can use an entity recognition model (Char-Word Union CNN, CWCNN) to perform entity recognition, and use the entities of the person name type and the place name type among the identified entities as the first keyword (called the entity key string); the server can use a classification model (for example, an "XGboost" model) to perform word weight analysis, and use the verbs whose weight is higher than the weight threshold as the second keyword (called the action key string); if the number of words of the first keyword is greater than 3 (called the fifth set number), the second keyword is no longer considered, and only the first keyword is used as the keyword of the topic event; if the number of words of the first keyword is less than 3, the first keyword and the second keyword are jointly used as the keywords of the topic event.
  • FIG. 10 is an exemplary schematic diagram of determining whether a news topic is related to the latest progress event provided by an embodiment of the present application; as shown in FIG. 10, the server obtains the similarity between each possibly related news topic and the latest progress event from three aspects, namely vector semantic similarity 10-1 (called semantic similarity), keyword graph similarity 10-2 (called string graph similarity) and question-answer semantic similarity 10-3 (called question-answer similarity); finally, a fusion model 10-4 (for example, an "XGboost" model or a "GBDT" model) combines the vector semantic similarity 10-1, the keyword graph similarity 10-2 and the question-answer semantic similarity 10-3 to determine a decision score 10-5, so as to determine, according to the decision score, whether each news topic is related to the latest progress event, and finally obtain the related news topic 10-6 (called the target topic), that is, the news topic to which the latest progress event belongs.
  • Vector semantic similarity 10-1 includes vector semantic statistical similarity (called semantic statistical similarity) and vector semantic self-attention similarity (called semantic self-attention similarity); the acquisition of the vector semantic statistical similarity is explained first.
  • the server calculates the similarity between the title of each topic event in the news topic and the title of the latest progress event to obtain the title vector semantic similarity (called the first sub-semantic similarity), and from it obtains the average title semantic similarity (called the average first sub-semantic similarity) and the maximum title semantic similarity (called the maximum first sub-semantic similarity); it calculates the similarity between the keywords of each topic event in the news topic (called the topic event key string) and the keywords of the latest progress event to obtain the event keyword vector semantic similarity (called the second sub-semantic similarity), and from it obtains the average event keyword semantic similarity (called the average second sub-semantic similarity) and the maximum event keyword semantic similarity (called the maximum second sub-semantic similarity); it calculates the similarity between the topic keywords of the news topic and the keywords of the latest progress event to obtain the topic keyword vector semantic similarity (called the third sub-semantic similarity).
  • the similarity between the title of each topic event in the news topic and the title of the latest progress event is calculated, and the similarity between the keywords of each topic event in the news topic and the keyword of the latest progress event is calculated , and calculating the similarity between the topic keywords of the news topic and the keywords of the latest development events can be realized through the network model (semantic statistical similarity model).
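Once the per-event similarities are available, the statistical features reduce to averages and maxima. The sketch below assumes the similarities have already been computed by the network model; the function name, the dictionary keys and the sample values are illustrative only.

```python
def statistical_similarity_features(title_sims, event_keyword_sims, topic_keyword_sim):
    """Aggregate per-event similarities into five statistical features.

    title_sims: first sub-semantic similarities, one per topic event.
    event_keyword_sims: second sub-semantic similarities, one per topic event.
    topic_keyword_sim: the third sub-semantic similarity (a single value).
    """
    def mean(values):
        return sum(values) / len(values)

    return {
        "avg_title": mean(title_sims),
        "max_title": max(title_sims),
        "avg_event_keyword": mean(event_keyword_sims),
        "max_event_keyword": max(event_keyword_sims),
        "topic_keyword": topic_keyword_sim,
    }


# Hypothetical similarities for a topic with two topic events.
features = statistical_similarity_features([0.5, 1.0], [0.25, 0.75], 0.6)
```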
  • Figure 11a is a schematic diagram of an exemplary model for obtaining vector semantic similarity provided by the embodiment of the present application; as shown in Figure 11a, the network model 11-1 is used to obtain the degree of similarity in vector semantics, and the network model 11-1 has a two-tower structure.
  • the processing of the network model 11-1 is described using the process of obtaining the topic keyword vector semantic similarity as an example: the topic keyword 11-2 corresponding to the news topic is input into the first network branch 11-11 (called the first semantic branch) of the network model 11-1 to obtain the semantic vector 11-3 corresponding to the topic keyword 11-2, and the keyword 11-4 corresponding to the latest progress event is input into the second network branch 11-12 (called the second semantic branch) of the network model 11-1 to obtain the semantic vector 11-5 corresponding to the keyword 11-4; the degree of similarity between the semantic vector 11-3 and the semantic vector 11-5 is then computed by cosine similarity to obtain the topic keyword vector semantic similarity 11-6.
  • first network branch and the second network branch can be the same network branch, for example, both are "Bert” models; and the semantic vector 11-3 and the semantic vector 11-5 are the first output of each network branch
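The final step of the two-tower model, the cosine similarity between the two semantic vectors, can be sketched as follows. The vectors here are hypothetical stand-ins for the outputs of the two network branches; no encoder is included.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector has similarity 0.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical semantic vectors standing in for 11-3 and 11-5.
topic_vector = [0.2, 0.7, 0.1]
event_vector = [0.25, 0.6, 0.05]
similarity = cosine_similarity(topic_vector, event_vector)
```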
  • Figure 11b is a schematic diagram of another exemplary model for obtaining vector semantic similarity provided by the embodiment of the present application; as shown in Figure 11b, the encoding module 11-71 (for example, a "Bert" model) in the network model 11-7 is used to obtain the semantic vector of each event in the event sequence composed of the latest progress event and at least one topic event 11-72 in a news topic, thereby obtaining the vector feature sequence 11-73 corresponding to the event sequence; here, the server determines the segmentation mark of the latest progress event as 0 (called a distinguishing mark) and the segmentation mark of a topic event as 1 (called a distinguishing mark).
  • the server obtains the semantic vector corresponding to 0 and the semantic vector corresponding to 1, merges the semantic vector corresponding to 0 with the semantic vector of the latest progress event (called the semantic feature to be integrated), and merges the semantic vector corresponding to 1 with the semantic vector of each topic event (called the topic event semantic feature); finally, all the merged results are input into the conversion (Transformer) model 11-74 to obtain the vector semantic self-attention similarity.
  • the conversion model 11-74 is a network model used in natural language processing, and may be formed by stacking at least one (for example, three) conversion model. The conversion model 11-74 calculates the self-attention between two events (called two sequence units) in the event sequence, so as to adaptively determine the relationship between the latest progress event and the topic events, and thus automatically determine which topic events to attend to when matching the latest progress event.
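To make the role of the distinguishing marks and of self-attention more concrete, the sketch below computes scaled dot-product attention weights of the latest progress event over the topic events. It is a heavily simplified illustration: merging is assumed to be element-wise addition (the text does not say how the vectors are merged), the learned query and key projections of a real Transformer layer are omitted, and all vectors are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_over_topic_events(event_vec, topic_vecs, mark_vecs):
    """Attention weights of the latest progress event over the topic events.

    mark_vecs[0] is the vector for mark 0 (latest progress event) and
    mark_vecs[1] is the vector for mark 1 (topic events).
    """
    # Merge each semantic vector with its mark vector (addition is an assumption).
    query = [a + b for a, b in zip(event_vec, mark_vecs[0])]
    keys = [[a + b for a, b in zip(vec, mark_vecs[1])] for vec in topic_vecs]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


weights = attention_over_topic_events(
    event_vec=[1.0, 0.0],
    topic_vecs=[[1.0, 0.0], [0.0, 1.0]],
    mark_vecs=[[0.0, 0.1], [0.1, 0.0]],
)
```

In this toy input the first topic event points in the same direction as the latest progress event, so it receives the larger attention weight.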
  • when obtaining the keyword graph similarity 10-2, the server constructs the keyword graph corresponding to each news topic and the keyword graph of the latest progress event, obtains the representations of the two keyword graphs through a graph convolutional network, and calculates the cosine distance between the two representations to obtain the keyword graph similarity.
  • each keyword corresponding to each topic event under each news topic is used as a graph node; if the two keywords corresponding to two graph nodes belong to the same topic event, an edge is built between the two graph nodes; a keyword graph (called the first key string graph) corresponding to each news topic is finally obtained.
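The graph construction rule (one node per keyword, one edge between keywords of the same topic event) can be sketched as below. The graph convolutional network is not shown; the function name, the sample events and the representation of undirected edges as sorted tuples are assumptions of this sketch.

```python
from itertools import combinations


def build_keyword_graph(topic_event_keywords):
    """Build an undirected keyword graph for one news topic.

    topic_event_keywords: one list of keywords per topic event.
    Returns (nodes, edges); each edge is a sorted pair of keywords.
    """
    nodes = set()
    edges = set()
    for keywords in topic_event_keywords:
        unique = sorted(set(keywords))
        nodes.update(unique)
        # Connect every pair of keywords that co-occur in this topic event.
        for first, second in combinations(unique, 2):
            edges.add((first, second))
    return nodes, edges


# Hypothetical topic with two topic events.
nodes, edges = build_keyword_graph([
    ["Li San", "first object"],
    ["first object", "ban", "Li San"],
])
```

Applied to the keywords of a single event, the same function yields a fully connected graph, which matches the description of the keyword graph of the latest progress event.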
  • Table 1 which includes the title column of the topic event and the keyword column of the topic event
  • Table 1 is as follows.
  • Figure 12 is a schematic diagram of an exemplary keyword graph provided by the embodiment of the present application; as shown in Figure 12, the graph nodes in the keyword graph 12-1 are determined based on the keywords corresponding to each topic event in Table 1, including: first object, Li San, management, blockade, first organization, first department, prosecution, plan, second department, third department, ban, third department The third organization, the second object, Zhang Er and the third object; the edges between the graph nodes are shown in Figure 12.
  • FIG. 13 is a schematic diagram of another exemplary keyword graph provided by the embodiment of the present application; as shown in FIG. 13, the graph nodes in the keyword graph 13-1 (called the second key character string graph) are the keywords of the latest progress event: the first department, Zhang Er and the first object; and there is an edge between any two of the first department, Zhang Er and the first object.
  • when obtaining the question-answer semantic similarity 10-3, the server builds a sentence sequence (called the sentence sequence to be answered) based on the title and keywords of the news topic and the latest progress event, obtains the output for the sentence sequence through a network model (for example, an "MRC-Bert" model), and determines the question-answer semantic similarity based on the first-dimension feature of the output (called answer information).
  • FIG. 14 is a schematic diagram of an exemplary acquisition of question-answer semantic similarity provided by an embodiment of the present application; as shown in FIG. 14, a sentence sequence 14-1 is input into a network model 14-2 (for example, an "MRC-Bert" model) to obtain the answer information 14-3, and the question-answer semantic similarity 10-3 is then determined based on the answer information 14-3.
  • CLS in the sentence sequence 14-1 indicates the start of the sentence sequence
  • SEP represents the segmentation between sentences
  • the sentence sequence 14-1 also includes questions and latest progress events constructed based on news topics.
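The layout of such a sentence sequence can be sketched as a plain string with CLS and SEP markers. The wording of the question, the trailing SEP marker, the function names and the sample titles are assumptions made for illustration; the text only states that the sequence contains a question built from the news topic and the latest progress event.

```python
def build_topic_question(topic_title, topic_keywords):
    """Build a question from a news topic (the wording is hypothetical)."""
    keywords = ", ".join(topic_keywords)
    return f"Does the event belong to the topic '{topic_title}' ({keywords})?"


def build_sentence_sequence(question, event_title, event_keywords):
    """Join the question and the latest progress event with CLS/SEP markers."""
    event_text = f"{event_title} ({', '.join(event_keywords)})"
    return f"[CLS] {question} [SEP] {event_text} [SEP]"


question = build_topic_question("Topic title", ["Li San", "first object"])
sequence = build_sentence_sequence(
    question, "Event title", ["first department", "Zhang Er", "first object"]
)
```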
  • the network model 11-1, the network model 11-7, the network model for obtaining the string graph similarity, and the network model 14-2 can be trained on 50,000 topic-event sample pairs selected and constructed from more than 2,000 topics. For example, events that passed review and went online (corresponding to similar) and events that did not go online (corresponding to dissimilar) can be selected from all news topics launched and operated after a certain point in time; the online events are used as positive samples, and the events that did not go online are used as negative samples.
  • Figure 15 is a schematic diagram of an exemplary feature importance provided by the embodiment of the present application; as shown in Figure 15, the ordinate indicates the importance index, and the features in descending order of importance index are: question-answer semantic similarity 10-3, vector semantic self-attention similarity 15-1, maximum title semantic similarity 15-2, keyword graph similarity 10-2, maximum event keyword semantic similarity 15-3, average title semantic similarity 15-4 and topic keyword vector semantic similarity 15-5; among them, the maximum title semantic similarity 15-2, the maximum event keyword semantic similarity 15-3, the average event keyword semantic similarity, the average title semantic similarity 15-4 and the topic keyword vector semantic similarity 15-5 together constitute the vector semantic statistical similarity in the vector semantic similarity 10-1 in FIG. 10.
  • the AUC is reduced by 0.0098.
  • when the keyword graph similarity 10-2 is removed, the AUC is reduced by 0.0081, and when the question-answer semantic similarity 10-3 is removed, the AUC is reduced by 0.0152; this shows that the similarities of all eight dimensions contribute to the result of event integration, which is consistent with the importance indexes.
  • AUC refers to the area enclosed by the ROC curve and the coordinate axis, which is a performance indicator.
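For reference, the area under the ROC curve equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative sample, with ties counted as one half. The sketch below computes AUC through this pairwise definition; the labels and scores are made-up values and are not the results reported in this application.

```python
def auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical decision scores for two online (1) and two non-online (0) events.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form is quadratic in the number of samples; it is shown here for clarity rather than efficiency.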
  • a selection may also be made based on the accuracy and time consumption of each similarity when determining whether the latest progress event matches the topic event.
  • Table 3 describes the time consumption corresponding to the network model 11-1, the network model 11-7, the network model for obtaining the similarity of the string graph, and the network model 14-2.
  • the descending sequence based on time consumption is: network model 11-1, network model for obtaining string graph similarity, network model 11-7, and network model 14-2.
  • the vector semantic self-attention similarity 15-1 is obtained through the network model 11-7
  • the maximum semantic similarity of the title is 15-2
  • the maximum Event keyword semantic similarity 15-3 is obtained through the network model 11-7
  • the server uses the network model 11-7 and the network model 14-2 with the fastest speed and highest accuracy as the initial calculation scheme
  • S 1 is the similarity degree output by the network model 11-7
  • S 2 is the similarity degree output by the network model 14-2.
  • the embodiment of the present application integrates the information of multiple events of a topic by adopting multi-dimensional heterogeneous features; adopting three kinds of heterogeneous feature models, based respectively on vector semantics, keyword graphs and question-answer semantics, can improve the accuracy and rationality of the similarity calculation.
  • automatic batch event integration can be realized without manual participation, and the efficiency of event integration can be improved.
  • the software modules of the event integration device 455 stored in the memory 450 may include:
  • the information acquisition module 4551 is configured to acquire events to be integrated, and acquire at least two topics to be integrated, wherein each of the topics to be integrated includes at least one topic event;
  • the similarity acquisition module 4552 is configured to determine the target similarity between the event to be integrated and each topic to be integrated based on one or more of semantic similarity, string graph similarity and question-answer similarity, wherein the semantic similarity refers to the similarity in semantic features, the string graph similarity refers to the similarity in graph features corresponding to key strings, and the question-answer similarity refers to the similarity in question-answer features;
  • the topic determination module 4553 is configured to determine, based on the target similarity, the target topic to which the event to be integrated belongs from the at least two topics to be integrated;
  • the event integration module 4554 is configured to integrate the event to be integrated into the target topic to obtain an event context, wherein the event context includes the event to be integrated and at least one topic event.
  • one or more of the semantic similarity, the string graph similarity and the question-answer similarity are selected through selection logic; the selection logic includes one or more of selection order, acquisition speed, accuracy rate, topic scale, selection quantity, topic type, model training scale, model application range and model application scale, wherein the selection order is determined based on the priority of the similarities, the acquisition speed is the speed of obtaining a similarity, the accuracy rate is the accuracy of a similarity, the topic type is the content form of the topics to be integrated, the topic scale is the scale of the at least two topics to be integrated, and the model training scale is the scale of the training data corresponding to the network model used to obtain each type of similarity.
  • the similarity acquisition module 4552 is further configured to: based on the selection order, select a first set number of similarities in turn, in descending order of priority, from the semantic similarity, the string graph similarity and the question-answer similarity; obtain a comparison result between the first set number of similarities and a similarity threshold; when the comparison result indicates that the event to be integrated is similar to the topic to be integrated, determine the first set number of similarities as the target similarity between the event to be integrated and the topic to be integrated; when the comparison result is a pending similarity result between the event to be integrated and the topic to be integrated, select from the remaining similarities based on the selection order until the selection end condition is met, and determine the selected multiple similarities as the target similarity, wherein the remaining similarities are the similarities, among the semantic similarity, the string graph similarity and the question-answer similarity, other than the first set number of similarities
  • the similarity of the selection end condition is to determine the similarity between the event to be
  • the similarity acquisition module 4552 is further configured to, when the topic scale is larger than the set scale, select a second set number of similarities in turn, in descending order of acquisition speed, from the semantic similarity, the string graph similarity and the question-answer similarity, to obtain the target similarity between the event to be integrated and the topic to be integrated.
  • the topic determination module 4553 is further configured to: determine, based on the accuracy rate, the weight ratio of the various similarities in the target similarity; fuse, based on the weight ratio, the various similarities in the target similarity to obtain a discrimination similarity; and select, from the at least two topics to be integrated, the topic to be integrated corresponding to the highest discrimination similarity, to obtain the target topic to which the event to be integrated belongs.
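One possible reading of the accuracy-based fusion is a weighted average whose weights are the normalized accuracy rates. The text does not specify how the weight ratio is derived from the accuracy rates, so this normalization, the function names and all numbers below are assumptions of the sketch.

```python
def fuse_similarities(similarities, accuracies):
    """Weighted average of similarities, weighted by normalized accuracy."""
    total_accuracy = sum(accuracies[name] for name in similarities)
    return sum(
        similarities[name] * accuracies[name] / total_accuracy
        for name in similarities
    )


def pick_target_topic(per_topic_similarities, accuracies):
    """Return the topic id with the highest fused (discrimination) similarity."""
    return max(
        per_topic_similarities,
        key=lambda topic: fuse_similarities(per_topic_similarities[topic], accuracies),
    )


# Hypothetical accuracy rates and target similarities for two candidate topics.
accuracies = {"semantic": 0.8, "question_answer": 0.9}
per_topic = {
    "A": {"semantic": 0.9, "question_answer": 0.2},
    "B": {"semantic": 0.5, "question_answer": 0.8},
}
target_topic = pick_target_topic(per_topic, accuracies)
```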
  • the semantic similarity includes at least one of semantic self-attention similarity and semantic statistical similarity, wherein the semantic self-attention similarity is based on the relationship between the event to be integrated and the topic event Self-attention is determined, the semantic statistical similarity is determined based on target semantics, and the target semantics refers to semantics corresponding to at least one of titles, key character strings, and event contents.
  • the semantic similarity includes the semantic self-attention similarity, and the similarity acquisition module 4552 is further configured to: acquire the to-be-integrated semantic feature corresponding to the event to be integrated, and the topic event semantic feature corresponding to each topic event in the topic to be integrated; enhance the to-be-integrated semantic feature based on a distinguishing mark between the event to be integrated and the topic event to obtain a first enhanced semantic feature, and enhance the topic event semantic feature based on the distinguishing mark to obtain a second enhanced semantic feature; combine the first enhanced semantic feature and at least one second enhanced semantic feature corresponding to the topic to be integrated into a semantic feature sequence; and determine the semantic self-attention similarity based on the self-attention information between two sequence units in the semantic feature sequence.
  • the similarity acquisition module 4552 is further configured to: in each topic to be integrated, acquire a first sub-semantic similarity between the title of each topic event and the title of the event to be integrated, and determine an average first sub-semantic similarity and a maximum first sub-semantic similarity based on the first sub-semantic similarity; in each topic to be integrated, acquire a second sub-semantic similarity between the topic event key character string corresponding to each topic event and the to-be-integrated event key character string corresponding to the event to be integrated, and determine an average second sub-semantic similarity and a maximum second sub-semantic similarity based on the second sub-semantic similarity; acquire a third sub-semantic similarity between the topic key character string corresponding to each topic to be integrated and the to-be-integrated event key character string; the average first sub-semantic similarity, the maximum first sub-semantic similarity, the average second sub-semantic similarity, the maximum
  • the first sub-semantic similarity, the second sub-semantic similarity and the third sub-semantic similarity are obtained through a semantic statistical similarity model;
  • the event integration apparatus 455 also includes a model training module 4555 configured to: obtain training samples, wherein each training sample includes a first character string sample, a second character string sample and a label similarity; use a first semantic branch in the semantic statistical similarity model to be trained to obtain first predicted semantics corresponding to the first character string sample, use a second semantic branch in the semantic statistical similarity model to be trained to obtain second predicted semantics corresponding to the second character string sample, and determine an estimated similarity between the first character string sample and the second character string sample based on a comparison result between the first predicted semantics and the second predicted semantics; and perform backpropagation in the semantic statistical similarity model to be trained based on the difference between the estimated similarity and the label similarity, to obtain the semantic statistical similarity model.
  • the similarity acquisition module 4552 is further configured to: in each topic to be integrated, determine each subtopic event key character string corresponding to at least one topic event as a graph node, establish an edge between two graph nodes corresponding to the same topic event, and determine the graph nodes and the edges as a first key character string graph; construct a second key character string graph based on the to-be-integrated event key character string corresponding to the event to be integrated; and compare the vector representation of the first key character string graph with the vector representation of the second key character string graph to obtain a graph comparison result, and determine the string graph similarity based on the graph comparison result.
  • the similarity acquisition module 4552 is further configured to: combine a sentence sequence to be answered based on the title of each topic to be integrated, the topic key character string and the event to be integrated; obtain answer information for the sentence sequence to be answered; and determine the question-answer similarity based on the answer information.
  • the information acquisition module 4551 is further configured to: obtain, in a topic database, a matching result between the topic key character string corresponding to each topic and the event to be integrated, wherein the topic database includes multiple topics; when it is determined, based on the matching result, that at least one subtopic key character string in the topic key character string matches the event to be integrated, determine the topic corresponding to the matching result as a topic to be integrated that matches the event to be integrated; and acquire, from the topic database, at least two topics to be integrated that match the event to be integrated.
  • the information acquisition module 4551 is further configured to: obtain the topic event key character string corresponding to each topic event from the at least one topic event corresponding to each topic in the topic database; count the number of topic events corresponding to each subtopic event key character string in the topic event key character strings; and combine a fourth set number of subtopic event key character strings with the largest numbers of topic events into the topic key character string corresponding to each topic.
  • the information acquisition module 4551 is further configured to: perform entity recognition on each topic event to obtain an entity key character string corresponding to a preset entity type; perform string weight analysis on each topic event to obtain an action key character string; and determine the topic event key character string based on one or both of the entity key character string and the action key character string.
  • the information acquisition module 4551 is further configured to: acquire the number of entity key character strings; when the number of entity key character strings is less than a fifth set number, combine the entity key character string and the action key character string into the topic event key character string; and when the number of entity key character strings is greater than or equal to the fifth set number, determine the entity key character string as the topic event key character string.
  • the event integration apparatus 455 further includes an event presentation module 4556 configured to: present a search control; in response to a first search operation acting on the search control, present a simplified event context corresponding to the event context, and a presentation control corresponding to the simplified event context, wherein the simplified event context is part of the event context, and the presentation control is used to present the event context; in response to a presentation operation acting on the presentation control, present the event context, wherein each event in the presented event context includes an event title and an event time, and the event is any one of the event to be integrated and the at least one topic event; and in response to a view operation acting on the event title or the event time, present detailed event information.
  • the event presentation module 4556 is further configured to: present the last information to be presented of a target event, wherein the target event is any event among the event to be integrated and the at least one topic event included in the event context; present, in a recommendation area corresponding to the last information to be presented, the remaining events in the event context that are associated with the target event, wherein the remaining events are the events in the event context other than the target event; and in response to a second search operation on the remaining events, present detailed information of the remaining events.
  • An embodiment of the present application provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the electronic device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the electronic device executes the above-mentioned event integration method in the embodiment of the present application.
  • the embodiment of the present application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to execute the event integration method provided in the embodiments of the present application, for example, the event integration method shown in FIG. 3.
  • the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; it may also be any of various devices that include one of the above memories or any combination of them.
  • executable instructions may take the form of programs, software, software modules, scripts, or code written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • executable instructions may, but do not necessarily, correspond to files in a file system, and may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple cooperating files (for example, files that store one or more modules, subroutines, or sections of code).
  • executable instructions may be deployed to execute on one electronic device (in which case that electronic device is the event integration device), on multiple electronic devices at one location (in which case the multiple electronic devices at that location are the event integration device), or on multiple electronic devices distributed across multiple locations and interconnected through a communication network (in which case the multiple interconnected electronic devices are the event integration device).
  • when the target topic to which the event to be integrated belongs is determined among at least two topics to be integrated, it is determined by judging the target similarity between the event to be integrated and each topic to be integrated; that is, the target topic is determined by directly comparing the event to be integrated with each topic to be integrated in terms of target similarity. Because the target similarity includes one or more of semantic similarity, string graph similarity and question-answer similarity, the obtained target similarity makes it possible to accurately determine whether each topic to be integrated is the target topic to which the event to be integrated belongs, and so the accuracy of event integration can be improved when the event to be integrated is integrated into the target topic.
  • the efficiency of event integration can be improved by first recalling and then obtaining the similarity.
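The recall-then-score idea can be illustrated with a minimal sketch. It assumes each topic event is already reduced to a list of key character strings, and that a topic is recalled when at least one of its topic key character strings occurs in the event text. The function names, the `top_n` parameter (standing in for the "fourth set number") and the sample data are hypothetical and are not taken from the application.

```python
from collections import Counter


def topic_key_strings(topic_events, top_n):
    """Keep the top_n key strings that occur in the most topic events."""
    counts = Counter()
    for event_keys in topic_events:
        counts.update(set(event_keys))  # count each event once per key string
    # Sort by descending event count, then alphabetically to break ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [key for key, _ in ranked[:top_n]]


def recall_topics(event_text, topic_library, top_n=3):
    """Return ids of topics with at least one topic key string in the event text."""
    recalled = []
    for topic_id, topic_events in topic_library.items():
        keys = topic_key_strings(topic_events, top_n)
        if any(key in event_text for key in keys):
            recalled.append(topic_id)
    return recalled


# Hypothetical topic library: topic id -> key strings of each topic event.
topic_library = {
    "flood": [["river", "flood", "rescue"], ["flood", "rescue"], ["flood", "dam"]],
    "launch": [["rocket", "launch"], ["rocket", "orbit"]],
    "storm": [["storm", "flood"], ["storm", "wind"]],
}
recalled = recall_topics("rescue teams reach flood area", topic_library)
print(recalled)
```

Only the recalled topics would then be passed to the more expensive similarity computation, which is where the efficiency gain would come from.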

Abstract

An event integration method and apparatus, and an electronic device, a computer-readable storage medium and a computer program product, which are applied to various event integration scenarios such as cloud technology, artificial intelligence, smart transportation and vehicle mounting. The event integration method comprises: acquiring an event to be integrated and at least two topics to be integrated, wherein each topic to be integrated comprises at least one topic event; on the basis of at least one of a semantic similarity, a character string graph similarity and a question-answer similarity, determining a target similarity between the event to be integrated and each topic to be integrated, wherein the semantic similarity refers to the similarity of semantic features, the character string graph similarity refers to the similarity of graph features which correspond to a key character string, and the question-answer similarity refers to the similarity of question-answer features; determining, from the at least two topics to be integrated and on the basis of the target similarity, a target topic to which the event to be integrated belongs; and integrating, into the target topic, the event to be integrated, so as to obtain event context, wherein the event context comprises the event to be integrated and at least one topic event.

Description

An event integration method, apparatus, electronic device, computer-readable storage medium, and computer program product
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202111111428.X, filed on September 18, 2021, the entire content of which is hereby incorporated by reference into this application.
Technical Field
The present application relates to information processing technology in the field of computer applications, and in particular to an event integration method, apparatus, electronic device, computer-readable storage medium, and computer program product.
Background
For a topic that lasts a long time (longer than a duration threshold) and usually consists of multiple events that have already occurred, when a latest progress event is obtained, it needs to be integrated into the corresponding topic to form an event context that contains the latest progress event, so that users can intuitively follow how the events developed through the event context.
Generally, to integrate a latest progress event into a topic, clustering is used: the latest progress event and the topics are incrementally clustered, and the topic to which the latest progress event belongs is determined from the cluster centers and a threshold. However, when events are integrated through incremental clustering, the low accuracy of clustering lowers the accuracy of the determined topic, which in turn lowers the accuracy of event integration when the latest progress event is integrated into that topic.
Summary
Embodiments of the present application provide an event integration method, apparatus, electronic device, computer-readable storage medium, and computer program product that can improve the accuracy of event integration.
The technical solutions of the embodiments of the present application are implemented as follows:
An embodiment of the present application provides an event integration method, including:
acquiring an event to be integrated, and acquiring at least two topics to be integrated, wherein each topic to be integrated includes at least one topic event;
determining a target similarity between the event to be integrated and each topic to be integrated based on one or more of a semantic similarity, a string graph similarity and a question-answer similarity, wherein the semantic similarity refers to similarity in terms of semantic features, the string graph similarity refers to similarity in terms of graph features corresponding to key character strings, and the question-answer similarity refers to similarity in terms of question-answer features;
determining, based on the target similarity, a target topic to which the event to be integrated belongs from the at least two topics to be integrated;
integrating the event to be integrated into the target topic to obtain an event context, wherein the event context includes the event to be integrated and at least one topic event.
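The four steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: `toy_similarity` is a hypothetical word-overlap scorer standing in for the semantic, string graph and question-answer similarities, the fusion is a plain weighted average, and the weights and sample events are invented for the example.

```python
def fuse(similarities, weights):
    """Weighted average of whichever similarity kinds were computed."""
    total = sum(weights[name] for name in similarities)
    return sum(weights[name] * value for name, value in similarities.items()) / total


def toy_similarity(event, topic_events):
    """Stand-in scorer: best word overlap (Jaccard) with any topic event."""
    words = set(event.split())
    best = max(
        len(words & set(t.split())) / len(words | set(t.split()))
        for t in topic_events
    )
    return {"semantic": best}


def integrate_event(event, topics, similarity_fn, weights):
    """Pick the best-matching topic and append the event to its events."""
    if len(topics) < 2:
        raise ValueError("expected at least two topics to be integrated")
    # One fused score per candidate topic.
    scores = {
        topic_id: fuse(similarity_fn(event, events), weights)
        for topic_id, events in topics.items()
    }
    target = max(scores, key=scores.get)  # topic with the highest fused score
    event_context = topics[target] + [event]  # topic events plus the new event
    return target, event_context


topics = {
    "flood": ["flood hits city", "rescue after flood"],
    "launch": ["rocket launch delayed"],
}
event = "flood rescue continues"
target, event_context = integrate_event(event, topics, toy_similarity, {"semantic": 1.0})
print(target, event_context)
```

Passing the scorer in as a parameter keeps the selection logic independent of how each kind of similarity is actually computed.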
An embodiment of the present application provides an event integration apparatus, including:
an information acquisition module, configured to acquire an event to be integrated and acquire at least two topics to be integrated, wherein each topic to be integrated includes at least one topic event;
a similarity acquisition module, configured to determine a target similarity between the event to be integrated and each topic to be integrated based on one or more of a semantic similarity, a string graph similarity and a question-answer similarity, wherein the semantic similarity refers to similarity in terms of semantic features, the string graph similarity refers to similarity in terms of graph features corresponding to key character strings, and the question-answer similarity refers to similarity in terms of question-answer features;
a topic determination module, configured to determine, based on the target similarity, a target topic to which the event to be integrated belongs from the at least two topics to be integrated;
an event integration module, configured to integrate the event to be integrated into the target topic to obtain an event context, wherein the event context includes the event to be integrated and at least one topic event.
An embodiment of the present application provides an electronic device for event integration, including:
a memory, configured to store executable instructions;
a processor, configured to implement the event integration method provided in the embodiments of the present application when executing the executable instructions stored in the memory.
An embodiment of the present application provides a computer-readable storage medium storing executable instructions that, when executed by a processor, implement the event integration method provided in the embodiments of the present application.
An embodiment of the present application provides a computer program product, including a computer program or instructions that, when executed by a processor, implement the event integration method provided in the embodiments of the present application.
The embodiments of the present application have at least the following beneficial effects. When the target topic to which the event to be integrated belongs is determined among at least two topics to be integrated, it is determined by judging the target similarity between the event to be integrated and each topic to be integrated; that is, the target topic is determined by directly comparing the event to be integrated with each topic to be integrated in terms of target similarity. Because the target similarity includes one or more of semantic similarity, string graph similarity and question-answer similarity, the obtained target similarity makes it possible to accurately determine whether each topic to be integrated is the target topic to which the event to be integrated belongs, and so the accuracy of event integration can be improved when the event to be integrated is integrated into the target topic.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an optional architecture of the event integration system provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an exemplary structure of the server in FIG. 1 provided by an embodiment of the present application;
FIG. 3 is a first optional schematic flowchart of the event integration method provided by an embodiment of the present application;
FIG. 4a is a second optional schematic flowchart of the event integration method provided by an embodiment of the present application;
FIG. 4b is a third optional schematic flowchart of the event integration method provided by an embodiment of the present application;
FIG. 4c is a fourth optional schematic flowchart of the event integration method provided by an embodiment of the present application;
FIG. 4d is a schematic flowchart of obtaining the semantic statistical similarity provided by an embodiment of the present application;
FIG. 4e is a schematic flowchart of the semantic statistical similarity model provided by an embodiment of the present application;
FIG. 4f is a schematic flowchart of obtaining the string graph similarity provided by an embodiment of the present application;
FIG. 4g is a schematic flowchart of obtaining the question-answer similarity provided by an embodiment of the present application;
FIG. 4h is a schematic flowchart of obtaining at least two topics to be integrated provided by an embodiment of the present application;
FIG. 4i is a first schematic flowchart of obtaining a topic event key character string provided by an embodiment of the present application;
FIG. 4j is a second schematic flowchart of obtaining a topic event key character string provided by an embodiment of the present application;
FIG. 5 is a fifth optional schematic flowchart of the event integration method provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of an exemplary presentation of an event context provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of another exemplary presentation of an event context provided by an embodiment of the present application;
FIG. 8 is a sixth optional schematic flowchart of the event integration method provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of an exemplary news topic recall provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of an exemplary determination of whether a news topic is related to a latest progress event provided by an embodiment of the present application;
FIG. 11a is a schematic diagram of an exemplary model for obtaining vector semantic similarity provided by an embodiment of the present application;
FIG. 11b is a schematic diagram of another exemplary model for obtaining vector semantic similarity provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of an exemplary keyword graph provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of another exemplary keyword graph provided by an embodiment of the present application;
FIG. 14 is a schematic diagram of an exemplary process of obtaining question-answer semantic similarity provided by an embodiment of the present application;
FIG. 15 is a schematic diagram of exemplary feature importance provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
In the following description, "some embodiments" describes a subset of all possible embodiments. It should be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and that they may be combined with each other where no conflict arises.
In the following description, the terms "first\second\third\fourth\fifth" are only used to distinguish similar objects and do not represent a specific ordering of the objects. It should be understood that, where permitted, "first\second\third\fourth\fifth" may be interchanged in a specific order or sequence, so that the embodiments of the present application described here can be implemented in an order other than the one illustrated or described here.
Unless otherwise defined, all technical and scientific terms used in the embodiments of the present application have the same meaning as commonly understood by a person skilled in the technical field of the present application. The terms used in the embodiments of the present application are only intended to describe the embodiments of the present application and are not intended to limit the present application.
Before the embodiments of the present application are described in further detail, the nouns and terms involved in the embodiments of the present application are explained; the following explanations apply to them.
1) Artificial Intelligence (AI) is the theory, methods, technologies and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that seeks to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence also studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning (ML)/deep learning.
2) Natural Language Processing (NLP) is a direction within computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is therefore a science that integrates linguistics, computer science and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing technologies usually include machine reading comprehension (MRC), text processing, semantic understanding, machine translation, robot question answering and knowledge graphs.
3) Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how computers simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to keep improving their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications span all fields of artificial intelligence. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from demonstration.
4) Machine reading comprehension is a natural language processing task in which, given an article and a question about the article, a machine answers the question after reading the article. The article and the question together correspond to the sentence sequence to be answered in the embodiments of the present application.
5) A graph convolutional network (GCN) computes representations of graphs; the data it processes is graph-structured. A graph is a data format used to represent key string networks, social networks, communication networks, protein molecule networks, and the like: nodes represent individuals in the network, and edges represent the connections between individuals. In the embodiments of the present application, the vector representation of the first key string graph and the vector representation of the second key string graph may be obtained through a graph convolutional network.
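As an illustration of how a graph convolutional network can turn a key string graph into a vector, the following is a minimal sketch of a single graph-convolution layer followed by mean pooling. The symmetric normalization with self-loops, the ReLU activation, the mean pooling, and all function names are assumptions made for illustration; the application does not specify the layer design.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    Assumes a non-negative adjacency matrix; the added self-loops keep
    every node degree at least 1, so the normalization never divides by zero.
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    hidden = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


def graph_vector(node_feats: Matrix) -> List[float]:
    """Mean-pool node features into one graph-level vector (hypothetical readout)."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]
```

In this sketch, each key string would be a node, `adj` would encode which key strings are connected, and `graph_vector(gcn_layer(...))` would give the graph-level vector that is later compared between an event and a topic.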
6) Named entity recognition (NER), also called entity recognition, entity chunking, or entity extraction, locates named entities in text and classifies them into predefined categories such as persons, organizations, locations, time expressions, quantities, monetary values, and percentages. The task is usually to identify named entities of three major classes (entity, time, and number) and seven subclasses (person name, organization name, place name, time, date, monetary value, and percentage) in the text to be processed. In the embodiments of the present application, named entity recognition is used to obtain entities of preset entity types, for example, entities of the person-name and place-name types.
Generally, to integrate a latest-progress event into a topic, clustering is used: the latest-progress event is incrementally clustered with the topics, and the topic to which the event belongs is determined from the cluster centers and a threshold. However, because clustering has low accuracy, integrating latest-progress events into topics through incremental clustering reduces the accuracy of event integration. In addition, the computational overhead grows with the number of topics, which reduces the efficiency of event integration.
In view of this, embodiments of the present application provide an event integration method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product that can improve the accuracy and efficiency of event integration and reduce its computing resource consumption. Exemplary applications of the electronic device for event integration (hereinafter, the event integration device) are described below. The event integration device may be implemented as any of various types of terminal, such as a smartphone, smart watch, notebook computer, tablet computer, desktop computer, smart home appliance, set-top box, smart in-vehicle device, portable music player, personal digital assistant, dedicated messaging device, smart voice interaction device, portable game device, or smart speaker, or as a server. The following describes an exemplary application in which the device is implemented as a server.
Referring to FIG. 1, FIG. 1 is a schematic diagram of an optional architecture of the event integration system provided by an embodiment of the present application. As shown in FIG. 1, to support an event integration application, in the event integration system 100 the terminal 200 (terminal 200-1 and terminal 200-2 are shown as examples) is connected to the server 400 (referred to as the event integration device) through the network 300, which may be a wide area network, a local area network, or a combination of the two. The event integration system 100 further includes a database 500 that provides data support to the server 400 (for example, by providing at least two topics to be integrated). FIG. 1 shows the case in which the database 500 is independent of the server 400; the database 500 may also be integrated into the server 400, which is not limited in the embodiments of the present application.
The terminal 200 is configured to obtain the event context from the server 400 through the network 300 and to display the event context on a graphical interface.
The server 400 is configured to: acquire an event to be integrated and at least two topics to be integrated, where each topic to be integrated includes at least one topic event; determine a target similarity between the event to be integrated and each topic to be integrated based on one or more of semantic similarity, string graph similarity, and question-answer similarity, where semantic similarity is similarity in terms of semantic features, string graph similarity is similarity in terms of the graph features corresponding to key strings, and question-answer similarity is similarity in terms of question-answer features; determine, based on the target similarity, the target topic to which the event to be integrated belongs from among the at least two topics to be integrated; and integrate the event to be integrated into the target topic to obtain an event context, where the event context includes the event to be integrated and the at least one topic event. The server 400 is further configured to send the event context to the terminal 200 through the network 300.
In some embodiments, the server 400 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms. The terminal 200 may be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, smart voice interaction device, smart home appliance, or in-vehicle terminal. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the embodiments of the present application.
Referring to FIG. 2, FIG. 2 is a schematic diagram of an exemplary structure of the server in FIG. 1 provided by an embodiment of the present application. The server 400 shown in FIG. 2 includes at least one processor 410, a memory 450, and at least one network interface 420; in some embodiments of the present application, the server 400 further includes a user interface 430. The components of the server 400 are coupled together through a bus system 440, which implements the connections and communication among these components. In addition to a data bus, the bus system 440 includes a power bus, a control bus, and a status signal bus. For clarity, however, all of these buses are labeled as the bus system 440 in FIG. 2.
The processor 410 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor or any conventional processor.
The user interface 430 includes one or more output devices 431 that enable the presentation of media content, including one or more speakers and/or one or more visual displays. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, a mouse, a microphone, a touchscreen display, a camera, and other input buttons and controls.
The memory 450 may be removable, non-removable, or a combination of the two. Exemplary hardware devices include solid-state memory, hard disk drives, and optical disc drives. The memory 450 optionally includes one or more storage devices physically remote from the processor 410.
The memory 450 includes volatile memory or non-volatile memory, and may include both. The non-volatile memory may be read-only memory (ROM), and the volatile memory may be random access memory (RAM). The memory 450 described in the embodiments of the present application is intended to include any suitable type of memory.
In some embodiments of the present application, the memory 450 can store data to support various operations. Examples of such data include programs, modules, and data structures, or subsets or supersets thereof, as described below by way of example.
Operating system 451, which includes system programs for handling basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, and a driver layer, and which is used to implement basic services and process hardware-based tasks;
Network communication module 452, for reaching other electronic devices via one or more (wired or wireless) network interfaces 420; exemplary network interfaces 420 include Bluetooth, Wi-Fi, and Universal Serial Bus (USB);
Presentation module 453, for enabling the presentation of information (for example, a user interface for operating peripheral devices and displaying content and information) via one or more output devices 431 (for example, a display screen or speakers) associated with the user interface 430;
Input processing module 454, for detecting one or more user inputs or interactions from one of the one or more input devices 432 and translating the detected inputs or interactions.
In some embodiments of the present application, the event integration apparatus provided by the embodiments of the present application may be implemented in software. FIG. 2 shows the event integration apparatus 455 stored in the memory 450, which may be software in the form of a program, a plug-in, or the like, and which includes the following software modules: an information acquisition module 4551, a similarity acquisition module 4552, a topic determination module 4553, an event integration module 4554, a model training module 4555, and an event display module 4556. These modules are logical, so they may be combined arbitrarily or split further according to the functions implemented. The function of each module is described below.
In other embodiments of the present application, the event integration apparatus provided by the embodiments of the present application may be implemented in hardware. As an example, the event integration apparatus may be a processor in the form of a hardware decoding processor that is programmed to execute the event integration method provided by the embodiments of the present application. For example, the processor in the form of a hardware decoding processor may use one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.
The event integration method provided by the embodiments of the present application is described below with reference to the exemplary applications and implementations of the event integration device. The method applies to event integration scenarios in areas such as cloud technology, artificial intelligence, intelligent transportation, and in-vehicle systems.
Referring to FIG. 3, FIG. 3 is a first optional schematic flowchart of the event integration method provided by an embodiment of the present application; the method is described with reference to the steps shown in FIG. 3.
S301. Acquire an event to be integrated, and acquire at least two topics to be integrated.
In the embodiments of the present application, the event integration device acquires the event that is to be integrated, thereby obtaining the event to be integrated. The event may be obtained by the event integration device detecting events, by the event integration device receiving an event sent by another device, or in another way, which is not limited in the embodiments of the present application. The event integration device also acquires the topics that are to be integrated, thereby obtaining the at least two topics to be integrated.
It should be noted that an event is information describing something that has happened, such as a news event or a feed event, and the event to be integrated is an event awaiting integration. The event to be integrated may be a latest-progress event or a historical event, where a historical event is an event after whose event time other events have already occurred; this is not limited in the embodiments of the present application. The event to be integrated includes at least text information and may also include at least one of audio/video, images, and tables. The topics to be integrated may be all topics in the database, or topics filtered from the database that may be associated with the event to be integrated, and so on, which is not limited in the embodiments of the present application. A topic to be integrated is an event theme, that is, a collection of related events; it includes at least one topic event, and a topic event is itself an event.
S302. Determine a target similarity between the event to be integrated and each topic to be integrated based on one or more of semantic similarity, string graph similarity, and question-answer similarity.
In the embodiments of the present application, the event integration device determines whether each topic to be integrated is the topic to which the event to be integrated belongs by comparing the target similarities between the event to be integrated and each topic to be integrated.
It should be noted that the target similarity indicates how likely the event to be integrated is to belong to a given topic to be integrated. The target similarity can be determined from one or more aspects, so it includes one or more of semantic similarity, string graph similarity, and question-answer similarity, and which of these it includes is determined by selection logic. Here, semantic similarity is similarity in terms of semantic features, string graph similarity is similarity in terms of the graph features corresponding to key strings, question-answer similarity is similarity in terms of question-answer features, and the selection logic is the basis on which the event integration device chooses among the three. Selecting one or more of the three similarities based on the selection logic to obtain the target similarity includes either selecting one of them or selecting at least two of them.
It should also be noted that the selection logic includes one or more of selection order, acquisition speed, accuracy, topic scale, selection quantity, topic type, model training scale, and model applicable scale. The selection order is determined by the priority of each similarity, and the priority may be determined by one or both of accuracy and time cost. The acquisition speed is the speed at which a similarity is obtained, and may be determined by one or both of the feature extraction time and the feature extraction mode (parallel or serial). The accuracy is how accurate a similarity is, and may be determined by one or both of the characteristics of the features used to obtain the similarity and the accuracy of the corresponding network model. The topic type is the content form of the topic to be integrated; for example, when the content is in image form, string graph similarity and question-answer similarity may be selected as the target similarity, and when the content is in text form, one or more of the three similarities, including semantic similarity, may be selected as the target similarity. The topic scale is the scale of the topics to be integrated, and may be determined by one or more of the number of topics to be integrated and the amount of content in them. The model training scale is the scale of the training data for the network model used to obtain each similarity. The model applicable scale is the maximum amount of data that the network model used to obtain each similarity can handle. The model applicable scope is the data form that the network model used to obtain each similarity accepts.
In the embodiments of the present application, the event integration device obtains the string graph similarity by comparing the features of the graph structure corresponding to the event to be integrated with the features of the graph structure corresponding to each topic to be integrated. The event integration device may construct the question and the article of a machine reading comprehension task from the event to be integrated and each topic to be integrated, determine answer information through information interaction between the question and the article, and determine the corresponding question-answer similarity from the answer information. Semantic similarity, string graph similarity, and question-answer similarity are similarities obtained from different dimensions.
S303. Based on the target similarity, determine the target topic to which the event to be integrated belongs from among the at least two topics to be integrated.
In the embodiments of the present application, the event integration device may determine, based on the target similarity, the topic among the at least two topics to be integrated that best matches the event to be integrated, and take that topic as the topic to which the event belongs, namely the target topic. The target topic is the topic to be integrated that corresponds to the maximum target similarity.
It should be noted that the event integration device obtains the target similarity between the event to be integrated and each topic to be integrated, which yields at least two target similarities corresponding to the at least two topics to be integrated. It then determines the target topic from the at least two topics to be integrated based on these target similarities, which is not limited in the embodiments of the present application.
It should also be noted that the target similarity between the target topic and the event to be integrated may be required to reach a similarity threshold. When the maximum target similarity is below the similarity threshold, the event integration device determines that the corresponding topic is not the topic to which the event to be integrated belongs. Only when the maximum target similarity is greater than or equal to the similarity threshold does the event integration device determine the corresponding topic as the topic to which the event to be integrated belongs.
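The selection rule of S303 can be sketched as follows: take the topic with the maximum target similarity, and accept it only if that similarity is greater than or equal to the threshold. The function name, the dictionary representation of the similarities, and the use of `None` to signal that no topic qualifies are assumptions made for illustration.

```python
from typing import Dict, Optional


def pick_target_topic(similarities: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the id of the topic with the largest target similarity.

    Returns None when there are no candidate topics or when even the best
    topic falls below the similarity threshold (hypothetical convention).
    """
    if not similarities:
        return None
    best_topic = max(similarities, key=similarities.get)
    if similarities[best_topic] < threshold:
        return None
    return best_topic
```

For example, `pick_target_topic({"topic_a": 0.4, "topic_b": 0.9}, 0.5)` selects `"topic_b"`, while the same call with a threshold of 0.95 selects no topic.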
S304. Integrate the event to be integrated into the target topic to obtain the event context.
In the embodiments of the present application, the event integration device treats the event to be integrated as a topic event of the target topic and integrates it into the at least one topic event that the target topic already includes, obtaining an event context that includes the event to be integrated and the at least one topic event. The event context describes how the matter covered by the target topic has unfolded.
It can be understood that the target topic is determined by directly comparing the event to be integrated with each topic to be integrated in terms of target similarity, and that the target similarity includes one or more of semantic similarity, string graph similarity, and question-answer similarity. The target similarity therefore makes it possible to determine accurately whether each topic to be integrated is the target topic to which the event belongs, which improves the accuracy of event integration when the event is integrated into the target topic. In addition, when the event integration device uses more than one of semantic similarity, string graph similarity, and question-answer similarity to determine the target topic from the at least two topics to be integrated, the target topic is determined from multi-dimensional heterogeneous features. This improves the accuracy and validity of the target topic obtained and, in turn, the accuracy of event integration.
Referring to FIG. 4a, FIG. 4a is a second optional schematic flowchart of the event integration method provided by an embodiment of the present application. As shown in FIG. 4a, in the embodiments of the present application, when the selection logic includes the selection order, S302 may be implemented through S3021 to S3024. That is, determining the target similarity between the event to be integrated and each topic to be integrated based on one or more of semantic similarity, string graph similarity, and question-answer similarity includes S3021 to S3024, each of which is described below.
S3021. Based on the selection order, select a first set number of similarities in turn from semantic similarity, string graph similarity, and question-answer similarity sorted in descending order of priority.
It should be noted that the first set number of similarities includes one or more of semantic similarity, string graph similarity, and question-answer similarity.
For example, the event integration device may first select the question-answer similarity and the semantic similarity, which have the highest accuracy; if the result can be determined, selection ends, and if not, the string graph similarity is selected as well. Alternatively, the event integration device may first select the question-answer similarity, which takes the least time; if the result can be determined, selection ends, and if not, further similarities are selected from semantic similarity and string graph similarity. The result can be determined when the selected target similarity is greater than a first similarity threshold or less than a second similarity threshold; the result cannot be determined when the selected target similarity is less than or equal to the first similarity threshold and greater than or equal to the second similarity threshold. Here, the first similarity threshold is greater than the second similarity threshold.
S3022. Obtain the result of comparing the first set number of similarities with the similarity threshold.
It should be noted that the similarity threshold may include a first set number of sub-similarity thresholds, which correspond one-to-one to the first set number of similarities.
S3023. When the comparison result is a similarity result for the event to be integrated and the topic to be integrated, determine the first set number of similarities as the target similarity between the event to be integrated and the topic to be integrated.
It should be noted that a similarity result for the event to be integrated and the topic to be integrated means that the two have been found to be either similar or dissimilar; this is the case described above in which the result can be determined.
S3024. When the comparison result is an undetermined similarity result for the event to be integrated and the topic to be integrated, select from the remaining similarities based on the selection order until a selection end condition is met, and determine the multiple selected similarities as the target similarity.
需要说明的是,待定相似结果是指无法确定待整合事件与待整合话题相似或者不相似,即为上述的不能够确定结果;剩余相似度为语义相似度、字符串图相似度和问答相似度中除第一设定数量的相似度之外的相似度;选择结束条件为确定出待整合事件与待整合话题之间相似,或者,选择结束条件为选择了语义相似度、字符串图相似度和问答相似度。其中,选择出的多个相似度是指所有选择次数所选择出的所有的相似度。It should be noted that the undetermined similarity result means that it is impossible to determine whether the event to be integrated is similar or not similar to the topic to be integrated, which is the above-mentioned undetermined result; the remaining similarity is semantic similarity, string graph similarity and question-answer similarity In addition to the similarity of the first set number of similarities; the selection end condition is to determine the similarity between the event to be integrated and the topic to be integrated, or the selection end condition is to select semantic similarity, string graph similarity and question-and-answer similarity. Wherein, the selected multiple similarities refer to all similarities selected by all selection times.
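The staged selection in S3021 to S3024 can be illustrated with a minimal Python sketch. The function names, the grouping of similarities into rounds, and the rule that a round is decisive only when every similarity in it gives the same verdict are assumptions made for illustration; the text above does not fix how several similarities in one round are combined. The sketch also stops on either a "similar" or a "dissimilar" verdict.

```python
from typing import Callable, Dict, List, Tuple

# name -> (first/upper sub-threshold, second/lower sub-threshold); hypothetical layout
Thresholds = Dict[str, Tuple[float, float]]


def verdict(score: float, upper: float, lower: float) -> str:
    """Two-threshold rule: above upper is similar, below lower is dissimilar."""
    if score > upper:
        return "similar"
    if score < lower:
        return "dissimilar"
    return "undetermined"


def select_target_similarities(
    order: List[List[str]],
    scorers: Dict[str, Callable[[], float]],
    thresholds: Thresholds,
) -> Tuple[Dict[str, float], str]:
    """Select similarities round by round until the result can be determined.

    Assumption: a round is decisive only if all of its similarities agree.
    """
    selected: Dict[str, float] = {}
    for group in order:
        verdicts = set()
        for name in group:
            selected[name] = scorers[name]()  # computed only when selected
            upper, lower = thresholds[name]
            verdicts.add(verdict(selected[name], upper, lower))
        if verdicts == {"similar"} or verdicts == {"dissimilar"}:
            return selected, verdicts.pop()
    # All similarities were selected without a decisive round.
    return selected, "undetermined"
```

Because each scorer is called only when its round is reached, slower similarities are never computed when an earlier round already settles the result.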
Referring to FIG. 4b, FIG. 4b is a third optional schematic flowchart of the event integration method provided by an embodiment of the present application. As shown in FIG. 4b, in this embodiment of the present application, when the selection logic includes the acquisition speed and the topic scale, S302 may also be implemented through S3025 and S3026; that is, the event integration device determining the target similarity between the event to be integrated and each topic to be integrated, based on one or more of the semantic similarity, the string graph similarity and the question-answer similarity, includes S3025 and S3026. Each step is described below.
S3025. When the topic scale is greater than a set scale, select a second set number of similarities in sequence from the semantic similarity, the string graph similarity and the question-answer similarity sorted in descending order of acquisition speed, to obtain the target similarity between the event to be integrated and the topic to be integrated.
It should be noted that if the event integration device determines that the topic scale is greater than the set scale, the at least two topics to be integrated are large in scale, so the result is judged using a small number (the second set number) of similarities that are fast to obtain (the second set number of similarities selected from the descending order of acquisition speed).
S3026. When the topic scale is less than or equal to the set scale, select a third set number of similarities in sequence from the semantic similarity, the string graph similarity and the question-answer similarity sorted in descending order of acquisition speed, to obtain the target similarity between the event to be integrated and the topic to be integrated.
It should be noted that if the event integration device determines that the topic scale is less than or equal to the set scale, the at least two topics to be integrated are small in scale, so the result can be judged using a larger number (the third set number) of similarities that are fast to obtain (the third set number of similarities selected from the descending order of acquisition speed). In addition, the second set number is less than the third set number.
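The scale-dependent choice in S3025 and S3026 amounts to taking a shorter or longer prefix of the speed-ordered list. The sketch below assumes the topic scale is an integer count and that the similarity names are already sorted fastest first; both the function name and the example ordering are hypothetical.

```python
from typing import List


def select_by_topic_scale(
    by_speed: List[str],   # similarity names, fastest to obtain first
    topic_scale: int,      # assumed to be an integer measure of topic size
    set_scale: int,
    second_count: int,     # used when the topic scale is large
    third_count: int,      # used when the topic scale is small
) -> List[str]:
    """Pick fewer similarities for large topic scales, more for small ones."""
    if second_count >= third_count:
        raise ValueError("the second set number must be less than the third")
    count = second_count if topic_scale > set_scale else third_count
    return by_speed[:count]


# Hypothetical ordering; only the question-answer similarity is stated
# to be the fastest in the surrounding text.
ORDER = ["question_answer", "string_graph", "semantic"]
large = select_by_topic_scale(ORDER, topic_scale=5000, set_scale=1000,
                              second_count=1, third_count=2)
small = select_by_topic_scale(ORDER, topic_scale=10, set_scale=1000,
                              second_count=1, third_count=2)
```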
Referring to FIG. 4c, FIG. 4c is a fourth optional schematic flowchart of the event integration method provided by an embodiment of the present application. As shown in FIG. 4c, in this embodiment of the present application, when the target similarity includes more than one of the semantic similarity, the string graph similarity and the question-answer similarity, S303 may be implemented through S3031 to S3033; that is, the event integration device determining, based on the target similarity, the target topic to which the event to be integrated belongs from the at least two topics to be integrated includes S3031 to S3033. Each step is described below.
S3031. Determine a weight ratio of the similarities in the target similarity based on accuracy.
It should be noted that, for each of the selected similarities, the event integration device determines a weight that is positively correlated with its accuracy, and thereby obtains the weight ratio among the similarities in the target similarity. The weight ratio represents the proportion among the weights corresponding to the similarities, for example, 0.3:0.4:0.3.
S3032. Based on the weight ratio, fuse the similarities in the target similarity to obtain a discriminant similarity.
It should be noted that, based on the weight ratio, the event integration device fuses each similarity in the target similarity with its corresponding weight. Once all similarities in the target similarity have been fused, the final similarity used to judge whether the event to be integrated is similar to the topic to be integrated is obtained; this final similarity is the discriminant similarity.
S3033. From the at least two topics to be integrated, select the topic to be integrated corresponding to the highest discriminant similarity, to obtain the target topic to which the event to be integrated belongs.
It should be noted that the event integration device may directly determine the topic to be integrated corresponding to the highest discriminant similarity as the target topic to which the event to be integrated belongs; it may also compare the highest discriminant similarity with a threshold before deciding whether to determine that topic as the target topic; and so on. This is not limited in the embodiments of the present application.
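As a rough illustration of S3031 to S3033, the sketch below assumes that each weight is simply the accuracy of its similarity divided by the total accuracy of the selected similarities, and that fusion is a weighted sum. The text only requires the weights to be positively correlated with accuracy, so this is one possible choice; all names and numbers are hypothetical.

```python
from typing import Dict, Optional


def weight_ratio(accuracy: Dict[str, float]) -> Dict[str, float]:
    """Weights proportional to accuracy, normalised to sum to 1 (an assumption)."""
    total = sum(accuracy.values())
    return {name: value / total for name, value in accuracy.items()}


def fuse(similarities: Dict[str, float], accuracy: Dict[str, float]) -> float:
    """Weighted sum over the selected similarities only."""
    weights = weight_ratio({name: accuracy[name] for name in similarities})
    return sum(weights[name] * score for name, score in similarities.items())


def pick_target_topic(
    per_topic: Dict[str, Dict[str, float]],
    accuracy: Dict[str, float],
    min_score: Optional[float] = None,
) -> Optional[str]:
    """Return the topic with the highest discriminant similarity.

    If min_score is given, the best topic is accepted only above that threshold.
    """
    fused = {topic: fuse(sims, accuracy) for topic, sims in per_topic.items()}
    best = max(fused, key=fused.get)
    if min_score is not None and fused[best] <= min_score:
        return None
    return best


accuracy = {"question_answer": 0.9, "semantic": 0.6}
per_topic = {
    "topic_a": {"question_answer": 0.8, "semantic": 0.4},
    "topic_b": {"question_answer": 0.3, "semantic": 0.9},
}
```

With these illustrative numbers the weights are 0.6 and 0.4, so topic_a fuses to 0.64 and topic_b to 0.54.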
In this embodiment of the present application, the semantic similarity includes one or both of a semantic self-attention similarity and a semantic statistical similarity. The semantic self-attention similarity is determined based on the self-attention between the event to be integrated and the topic events. The semantic statistical similarity is determined based on target semantics, where the target semantics are the semantics corresponding to at least one of a title, a key string and event content. In other words, the semantic statistical similarity is obtained by comparing the vector semantic features of information such as the title, key string and body text of the event to be integrated with the corresponding vector semantic features of information such as the title and key string of each topic to be integrated and the body text of its topic events.
Here, the semantic self-attention similarity is obtained through the following steps: the event integration device obtains a to-be-integrated semantic feature corresponding to the event to be integrated and a topic event semantic feature corresponding to each topic event in the topic to be integrated; enhances the to-be-integrated semantic feature based on distinguishing identifiers of the event to be integrated and the topic events to obtain a first enhanced semantic feature, and enhances the topic event semantic feature based on the distinguishing identifiers to obtain a second enhanced semantic feature; and combines the first enhanced semantic feature with the at least one second enhanced semantic feature corresponding to the topic to be integrated into a semantic feature sequence, and determines the semantic self-attention similarity based on self-attention information between two sequence units in the semantic feature sequence.
It should be noted that the self-attention information between two sequence units refers to the self-attention between the event to be integrated and any one of the topic events.
It should also be noted that when the semantic similarity includes both the semantic self-attention similarity and the semantic statistical similarity, S3021 to S3024 include the following: the event integration device selects the semantic self-attention similarity and the question-answer similarity (the first set number of similarities). If, after the semantic self-attention similarity and the question-answer similarity are compared with their respective sub-similarity thresholds, it is determined that the event to be integrated is dissimilar or similar to the topic event, the selection ends. If, after that comparison, it cannot be determined whether the event to be integrated is dissimilar or similar to the topic event, the device continues by selecting the semantic statistical similarity and the string graph similarity for the judgment. Here, based on accuracy and acquisition speed, the descending order of priority can be determined as: question-answer similarity, semantic self-attention similarity, string graph similarity, semantic statistical similarity. Taken in that order, the four similarities form a transition from precision to breadth.
Referring to FIG. 4d, FIG. 4d is a schematic flowchart of obtaining the semantic statistical similarity provided by an embodiment of the present application. As shown in FIG. 4d, in this embodiment of the present application, the semantic statistical similarity may be obtained through S30211 to S30214. Each step is described below.
S30211. In each topic to be integrated, obtain a first sub-semantic similarity between the title of each topic event and the title of the event to be integrated, and determine an average first sub-semantic similarity and a maximum first sub-semantic similarity based on the first sub-semantic similarities.
In this embodiment of the present application, the event integration device determines the semantic statistical similarity from one or more of a first degree of similarity, a second degree of similarity, a third degree of similarity and a fourth degree of similarity. The first degree of similarity is the degree of similarity between the title of the event to be integrated and the title of each topic event in each topic to be integrated; the second degree of similarity is the degree of similarity between the key string of the event to be integrated and the key string of each topic event in each topic to be integrated; the third degree of similarity is the degree of similarity between the key string of the event to be integrated and the key string of each topic to be integrated; and the fourth degree of similarity is the degree of similarity between the event to be integrated and each topic to be integrated.
Here, for each topic event in each topic to be integrated, the event integration device obtains the degree of similarity between the title of that topic event and the title of the event to be integrated, which gives the first sub-semantic similarity (also called the first degree of similarity). For each topic to be integrated, at least one first sub-semantic similarity corresponding to the at least one topic event is thus obtained. The event integration device averages the at least one first sub-semantic similarity to obtain the average first sub-semantic similarity, and selects the largest of the at least one first sub-semantic similarity to obtain the maximum first sub-semantic similarity.
S30212. In each topic to be integrated, obtain a second sub-semantic similarity between the topic event key string corresponding to each topic event and the to-be-integrated event key string corresponding to the event to be integrated, and determine an average second sub-semantic similarity and a maximum second sub-semantic similarity based on the second sub-semantic similarities.
In this embodiment of the present application, for each topic event in each topic to be integrated, the event integration device obtains the second degree of similarity between the topic event key string of that topic event and the to-be-integrated event key string of the event to be integrated, which gives the second sub-semantic similarity. For each topic to be integrated, at least one second sub-semantic similarity corresponding to the at least one topic event is thus obtained. The event integration device averages the at least one second sub-semantic similarity to obtain the average second sub-semantic similarity, and selects the largest of the at least one second sub-semantic similarity to obtain the maximum second sub-semantic similarity.
It should be noted that the topic event key string is the key string of a topic event, and the to-be-integrated event key string is the key string of the event to be integrated.
S30213. Obtain a third sub-semantic similarity between the topic key string corresponding to each topic to be integrated and the to-be-integrated event key string.
In this embodiment of the present application, the event integration device obtains the third degree of similarity between the topic key string and the to-be-integrated event key string, which gives the third sub-semantic similarity.
S30214. Determine the average first sub-semantic similarity, the maximum first sub-semantic similarity, the average second sub-semantic similarity, the maximum second sub-semantic similarity and the third sub-semantic similarity as the semantic statistical similarity.
It should be noted that the event integration device may determine at least one of the average first sub-semantic similarity, the maximum first sub-semantic similarity, the average second sub-semantic similarity, the maximum second sub-semantic similarity and the third sub-semantic similarity as the semantic statistical similarity between the event to be integrated and each topic to be integrated.
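The aggregation in S30211 to S30214 can be sketched as follows. In the described method the pairwise scores come from a trained model; here a character-bigram Jaccard overlap stands in for that model purely so the example runs, and the dictionary layout of events and topics is hypothetical.

```python
from statistics import mean
from typing import Dict, List


def pair_similarity(a: str, b: str) -> float:
    """Stand-in scorer (character-bigram Jaccard), not the trained model."""
    def grams(text: str) -> set:
        return {text[i:i + 2] for i in range(len(text) - 1)} or {text}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)


def semantic_statistical_similarity(event: Dict, topic: Dict) -> Dict[str, float]:
    """Collect the five statistics for one event and one topic."""
    event_keys = " ".join(event["keys"])
    # First sub-semantic similarities: event title vs. each topic event title.
    title_sims: List[float] = [
        pair_similarity(event["title"], te["title"]) for te in topic["events"]
    ]
    # Second sub-semantic similarities: event key strings vs. each topic event's.
    key_sims: List[float] = [
        pair_similarity(event_keys, " ".join(te["keys"])) for te in topic["events"]
    ]
    return {
        "avg_first": mean(title_sims),
        "max_first": max(title_sims),
        "avg_second": mean(key_sims),
        "max_second": max(key_sims),
        # Third sub-semantic similarity: event key strings vs. topic key strings.
        "third": pair_similarity(event_keys, " ".join(topic["keys"])),
    }


event = {"title": "flood in city", "keys": ["flood", "city"]}
topic = {
    "keys": ["flood", "rescue"],
    "events": [
        {"title": "flood in city", "keys": ["flood", "city"]},
        {"title": "football final", "keys": ["football", "final"]},
    ],
}
stats = semantic_statistical_similarity(event, topic)
```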
In this embodiment of the present application, the first sub-semantic similarity in S30211, the second sub-semantic similarity in S30212 and the third sub-semantic similarity in S30213 may each be obtained through a semantic statistical similarity model, which is used to obtain the degree of similarity of a text pair in terms of semantic features. Referring to FIG. 4e, FIG. 4e is a schematic flowchart of training the semantic statistical similarity model provided by an embodiment of the present application. As shown in FIG. 4e, in this embodiment of the present application, the semantic statistical similarity model is obtained through training in S305 to S307. Each step is described below.
S305. Obtain a training sample, where the training sample includes a first string sample, a second string sample and a labeled similarity.
It should be noted that the training sample is a data sample used to train the semantic statistical similarity model; the first string sample and the second string sample form a text pair whose degree of similarity in terms of semantic features is to be determined, and the labeled similarity is the actual degree of similarity between the first string sample and the second string sample in terms of semantic features.
S306. Use a first semantic branch in a to-be-trained semantic statistical similarity model to obtain first estimated semantics corresponding to the first string sample, use a second semantic branch in the to-be-trained semantic statistical similarity model to obtain second estimated semantics corresponding to the second string sample, and determine an estimated similarity between the first string sample and the second string sample based on a comparison result between the first estimated semantics and the second estimated semantics.
In this embodiment of the present application, the event integration device initializes the parameters of the model structure to obtain the to-be-trained semantic statistical similarity model, which includes the first semantic branch and the second semantic branch. The event integration device then uses the first semantic branch to obtain the semantics corresponding to the first string sample, which gives the first estimated semantics, and uses the second semantic branch to obtain the semantics corresponding to the second string sample, which gives the second estimated semantics. Finally, a similarity model in the to-be-trained semantic statistical similarity model is used to determine the degree of similarity between the first string sample and the second string sample, which gives the estimated similarity. Here, the similarity model compares the first estimated semantics with the second estimated semantics and determines the estimated similarity between the first string sample and the second string sample based on the comparison result.
It should be noted that the to-be-trained semantic statistical similarity model is a model, yet to be trained, for obtaining the degree of similarity of a text pair in terms of semantic features. The to-be-trained semantic statistical similarity model adopts a twin-tower structure (the first semantic branch and the second semantic branch); each semantic branch in the twin-tower structure is used to obtain semantic features, and the parameters of the first semantic branch and the second semantic branch are shared. In addition, the to-be-trained semantic statistical similarity model may also be a pre-trained model.
It can be understood that, by using the twin-tower structure to obtain semantic features, the to-be-trained semantic statistical similarity model can improve the efficiency of obtaining the estimated similarity.
S307. Based on the difference between the estimated similarity and the labeled similarity, perform backpropagation in the to-be-trained semantic statistical similarity model to obtain the semantic statistical similarity model.
In this embodiment of the present application, the event integration device adjusts the parameters of the to-be-trained semantic statistical similarity model based on the difference between the estimated similarity and the labeled similarity, so as to train the model; here, the event integration device adjusts the parameters by performing backpropagation in the to-be-trained semantic statistical similarity model. The training of the to-be-trained semantic statistical similarity model is an iterative process, and the model obtained when training is complete is the semantic statistical similarity model.
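The training loop in S305 to S307 can be sketched in plain Python under heavy simplifications: each "branch" is a per-dimension scaling of a hashed bag-of-words vector rather than a neural encoder, the comparison is a sigmoid of the dot product, the loss is squared error, and the gradients are written out by hand. Only the two ideas from the text are kept: both branches share one set of parameters, and the parameters are updated from the gap between the estimated and labeled similarity. All names, the feature scheme and the sample data are hypothetical.

```python
import math
from typing import List, Tuple

DIM = 32
Sample = Tuple[str, str, float]  # (first string, second string, labeled similarity)


def featurize(text: str) -> List[float]:
    """Deterministic hashed bag of words, L2-normalised (illustrative only)."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class TwinTower:
    def __init__(self) -> None:
        self.w = [1.0] * DIM  # one parameter set, used by both branches
        self.b = 0.0

    def branch(self, text: str) -> List[float]:
        return [wi * xi for wi, xi in zip(self.w, featurize(text))]

    def predict(self, first: str, second: str) -> float:
        dot = sum(p * q for p, q in zip(self.branch(first), self.branch(second)))
        return 1.0 / (1.0 + math.exp(-(dot + self.b)))

    def loss(self, samples: List[Sample]) -> float:
        return sum((self.predict(a, b) - y) ** 2 for a, b, y in samples) / len(samples)

    def train(self, samples: List[Sample], epochs: int = 300, lr: float = 0.2) -> None:
        n = len(samples)
        for _ in range(epochs):
            grad_w, grad_b = [0.0] * DIM, 0.0
            for a, b, y in samples:
                xa, xb = featurize(a), featurize(b)
                p = self.predict(a, b)
                delta = 2.0 * (p - y) * p * (1.0 - p)  # d(loss)/d(logit)
                grad_b += delta
                for i in range(DIM):
                    # logit = sum_i w_i^2 * xa_i * xb_i + b
                    grad_w[i] += delta * 2.0 * self.w[i] * xa[i] * xb[i]
            self.b -= lr * grad_b / n
            self.w = [wi - lr * g / n for wi, g in zip(self.w, grad_w)]


samples = [
    ("flood hits city", "city hit by flood", 1.0),
    ("flood hits city", "team wins final", 0.0),
]
model = TwinTower()
loss_before = model.loss(samples)
model.train(samples)
loss_after = model.loss(samples)
```

Sharing `w` between the two branches means a single gradient step updates both towers at once, which is the property the twin-tower description relies on.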
Referring to FIG. 4f, FIG. 4f is a schematic flowchart of obtaining the string graph similarity provided by an embodiment of the present application. As shown in FIG. 4f, in this embodiment of the present application, the string graph similarity may be obtained through S30221 to S30223. Each step is described below.
S30221. In each topic to be integrated, determine each sub topic event key string corresponding to the at least one topic event as a graph node, build an edge between two graph nodes that correspond to the same topic event, and determine the graph nodes and edges as a first key string graph.
It should be noted that, among the at least one topic event in each topic to be integrated, the topic event key string corresponding to each topic event includes one or more sub topic event key strings. Here, the event integration device takes each sub topic event key string as one graph node and traverses every pair of graph nodes among all the graph nodes obtained. If the two sub topic event key strings corresponding to a pair of graph nodes belong to the same topic event, that is, the two graph nodes correspond to the same topic event, an edge is built between the two graph nodes; if they do not belong to the same topic event, there is no edge between the two graph nodes. The graph structure obtained when the traversal ends is the first key string graph.
S30222. Construct a second key string graph based on the to-be-integrated event key string corresponding to the event to be integrated.
It should be noted that the event integration device constructs the graph structure corresponding to the event to be integrated in the same way as the first key string graph: it takes each sub to-be-integrated event key string in the to-be-integrated event key string as one graph node and builds an edge between every two graph nodes, which gives the second key string graph.
S30223. Compare the vector representation of the first key string graph with the vector representation of the second key string graph to obtain a graph comparison result, and determine the string graph similarity based on the graph comparison result.
In this embodiment of the present application, the event integration device obtains the vector representation of the first key string graph and the vector representation of the second key string graph, compares the two vector representations, and determines the string graph similarity between the event to be integrated and each topic to be integrated based on the graph comparison result between them.
It should be noted that the process of obtaining the vector representation of the first key string graph includes: obtaining (for example, through a BERT model) the vector representations of the graph nodes and of the edges in the first key string graph, and, based on the vector representations of the graph nodes and of the edges, obtaining (for example, through a graph convolution model) the vector representation of the first key string graph. The process of obtaining the vector representation of the second key string graph is similar to that of the first key string graph and is not described again here.
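The graph construction in S30221 and S30222 can be sketched directly; the comparison in S30223 cannot, since it relies on learned vector representations. In the sketch below the comparison step is therefore replaced by a plain Jaccard overlap of node sets and edge sets, which is a stand-in and not the BERT plus graph-convolution comparison described above. Function names and sample data are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

Graph = Tuple[Set[str], Set[FrozenSet[str]]]


def build_key_string_graph(events_keys: Iterable[List[str]]) -> Graph:
    """Nodes are key strings; edges join key strings of the same event."""
    nodes: Set[str] = set()
    edges: Set[FrozenSet[str]] = set()
    for keys in events_keys:
        unique = sorted(set(keys))
        nodes.update(unique)
        edges.update(frozenset(pair) for pair in combinations(unique, 2))
    return nodes, edges


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def string_graph_similarity(first: Graph, second: Graph) -> float:
    """Stand-in comparison: mean of node-set and edge-set Jaccard overlap."""
    return 0.5 * jaccard(first[0], second[0]) + 0.5 * jaccard(first[1], second[1])


# First graph: one topic with two topic events.
first_graph = build_key_string_graph([["flood", "city"], ["city", "rescue"]])
# Second graph: the single event to be integrated, so all its nodes are connected.
second_graph = build_key_string_graph([["flood", "city"]])
similarity = string_graph_similarity(first_graph, second_graph)
```

Note that "flood" and "rescue" get no edge in the first graph because they never occur in the same topic event.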
Referring to FIG. 4g, FIG. 4g is a schematic flowchart of obtaining the question-answer similarity provided by an embodiment of the present application. As shown in FIG. 4g, in this embodiment of the present application, the question-answer similarity may be obtained through S30231 to S30233. Each step is described below.
S30231. Combine the title of each topic to be integrated, the topic key string and the event to be integrated into a to-be-answered sentence sequence.
It should be noted that, in order to determine through question-answer interaction whether the event to be integrated belongs to a topic to be integrated, the event integration device constructs a question sentence corresponding to each topic to be integrated and the event to be integrated, which gives the to-be-answered sentence sequence. The event integration device combines the title of each topic to be integrated, the topic key string and the event to be integrated according to a preset sentence pattern for question sentences, and the combination result is the constructed question sentence. One example is: Is "event to be integrated" a development of "title of the topic to be integrated", whose key string is "topic key string"? Another example is: Is the next sentence a development of "title of the topic to be integrated", whose key string is "topic key string"? "event to be integrated".
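The two sentence patterns above can be expressed as simple string templates. The English wording of the templates, the style names and the function name are illustrative assumptions; only the three inputs (topic title, topic key strings, event text) come from the description.

```python
from typing import List

# Hypothetical English renderings of the two preset sentence patterns.
QUESTION_TEMPLATES = {
    "inline": 'Is "{event}" a development of "{title}", whose key strings are "{keys}"?',
    "next_sentence": (
        'Is the next sentence a development of "{title}", '
        'whose key strings are "{keys}"? "{event}"'
    ),
}


def build_question(title: str, keys: List[str], event: str, style: str = "inline") -> str:
    """Combine topic title, topic key strings and event text into one question."""
    return QUESTION_TEMPLATES[style].format(
        event=event, title=title, keys=", ".join(keys)
    )


question = build_question(
    title="City flood response",
    keys=["flood", "rescue"],
    event="Rescue teams reach the flooded district",
)
```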
S30232. Obtain answer information for the to-be-answered sentence sequence.
In this embodiment of the present application, following the question and passage roles used in machine reading comprehension, the event integration device identifies the corresponding question and passage in the to-be-answered sentence sequence, and performs low-level processing on the identified passage and question separately to convert the text into numerical encodings. It then determines the semantic connection between the passage and the question based on the numerical encodings, obtains the features of the identified question in combination with the semantic analysis result of the passage, and likewise obtains the features of the identified passage in combination with the semantic analysis result of the question. Finally, the event integration device obtains the output answer information based on the features of the identified question, the features of the identified passage and the type of the answer.
需要说明的是,答案信息是指待整合事件是否属于每个待整合话题的信息,可以是“是”(待整合事件属于该待整合话题),也可以是“否”(待整合事件不属于该待整合 话题),还可以是待整合事件属于该待整合话题的可能性,等等,本申请实施例对此不作限定。It should be noted that the answer information refers to whether the event to be integrated belongs to each topic to be integrated, which can be "yes" (the event to be integrated belongs to the topic to be integrated), or "no" (the event to be integrated does not belong to the topic to be integrated). The topic to be integrated) may also be the possibility that the event to be integrated belongs to the topic to be integrated, etc., which is not limited in this embodiment of the present application.
S30233. Determine the question-answer similarity based on the answer information.
It should be noted that, based on the answer information, the event integration device determines the likelihood that the event to be integrated belongs to each topic to be integrated, and takes that likelihood as the question-answer similarity between the event to be integrated and that topic.
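One way to turn answer information into a question-answer similarity is sketched below. The mapping of "yes" to 1.0 and "no" to 0.0, the clamping of numeric answers to the range 0 to 1, and the function name `answer_to_similarity` are all assumptions made for illustration; the embodiment only states that the likelihood is used as the similarity.

```python
def answer_to_similarity(answer):
    """Map answer information to a similarity in [0, 1].

    Assumption: textual answers are "yes"/"no"; numeric answers are likelihoods.
    """
    if isinstance(answer, str):
        mapping = {"yes": 1.0, "no": 0.0}
        key = answer.strip().lower()
        if key not in mapping:
            raise ValueError(f"unsupported answer: {answer!r}")
        return mapping[key]
    # Numeric likelihood: clamp to the valid range.
    return max(0.0, min(1.0, float(answer)))
```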
Referring to FIG. 4h, FIG. 4h is a schematic flowchart of obtaining at least two topics to be integrated according to an embodiment of the present application. As shown in FIG. 4h, in this embodiment, the obtaining of at least two topics to be integrated by the event integration device in S301 includes S3011 to S3013, each of which is described below.
S3011. In the topic library, obtain a matching result between the topic key string corresponding to each topic and the event to be integrated, where the topic library includes a plurality of topics.
In this embodiment of the present application, the event integration device can obtain a preset topic library. After obtaining the event to be integrated, the event integration device determines from the topic library the topic to which the event belongs, and then integrates the event into that topic. Here, the event integration device first matches each topic in the topic library against the event to be integrated; specifically, it matches the topic key string corresponding to each topic against the event to be integrated.
It should be noted that the topic key string is the key string of a topic. The topic library includes a plurality of topics, each of which is the subject of a matter. Each topic in the topic library includes at least one topic event, and the topic events included in different topics may be the same or different. The at least one topic event refers to events associated with the topic that occurred in different time periods, so the topic events are ordered in time.
S3012. When it is determined, based on the matching result, that at least one sub-topic key string in the topic key string matches the event to be integrated, determine the topic corresponding to the matching result as a topic to be integrated that matches the event to be integrated.
S3013. Obtain, from the topic library, at least two topics to be integrated that match the event to be integrated.
It should be noted that, if the matching result between a topic's key string and the event to be integrated indicates that at least one sub-topic key string in the topic key string matches the event to be integrated, the topic corresponding to that matching result is determined to be a topic to be integrated that matches the event; if the matching result indicates that the topic key string does not match the event to be integrated, the topic corresponding to that matching result is determined not to be a topic to be integrated that matches the event. Once the event integration device has evaluated the matching results of the plurality of topics against the event to be integrated, it can obtain from the topic library at least one topic to be integrated that matches the event. If it determines that no topic in the topic library matches the event to be integrated, it constructs a new topic that includes the event to be integrated and adds the new topic to the topic library. In addition, when exactly one topic to be integrated is obtained, the event integration device may either compare the target similarity with the similarity threshold to determine whether this topic is the target topic, or directly determine this topic as the target topic.
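The recall step of S3011 to S3013, including the fallback of creating a new topic, might look like the sketch below. It assumes that "matching" means a sub-topic key string occurs as a substring of the event text, and that the topic library is a plain dictionary from topic title to key strings; both are illustrative assumptions, as are the function names.

```python
def recall_topics(event_text, topic_library):
    """Return titles of topics with at least one sub-topic key string in the event.

    topic_library: dict mapping topic title -> list of sub-topic key strings.
    Substring matching is an assumed matching rule.
    """
    return [
        title
        for title, keys in topic_library.items()
        if any(key in event_text for key in keys)
    ]


def recall_or_create(event_text, topic_library):
    """Recall candidate topics; if none match, register a new topic for the event."""
    candidates = recall_topics(event_text, topic_library)
    if not candidates:
        # New topic named after the event; its key strings would be computed later.
        topic_library[event_text] = []
    return candidates
```

Returning the candidate list unchanged keeps the later similarity stage separate from the recall stage.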
It can be understood that, in the process of integrating the event to be integrated into its target topic, the topic key strings are first matched against the event to be integrated to recall at least two topics to be integrated that may be related to the event, and the target topic to which the event belongs is then accurately determined from these topics based on the similarity between the event and each topic to be integrated. Therefore, by adopting this recall-then-similarity-classification approach, the embodiments of the present application can integrate the event into the target topic quickly, reducing both the computation time of the integration process and the number of topics to be integrated, thereby improving the efficiency of event integration.
Still referring to FIG. 4h, in this embodiment of the present application, S3014 to S3016 are further included before S3011. That is, before the event integration device obtains the matching result between the topic key string corresponding to each topic and the event to be integrated, the event integration method further includes S3014 to S3016, each of which is described below.
S3014. Among the at least one topic event corresponding to each topic, obtain the topic event key string corresponding to each topic event.
It should be noted that the topic key string corresponding to a topic is obtained from the key strings of its at least one topic event. Here, the event integration device first obtains the topic event key string corresponding to each topic event; since each topic includes at least one topic event, the at least one topic event corresponds to at least one topic event key string.
S3015. Count the number of topic events corresponding to each sub-topic event key string in the topic event key strings.
It should be noted that, for each topic event key string among the at least one topic event key string, the event integration device counts the number of topic events corresponding to each sub-topic event key string it contains, obtaining a topic event count for each sub-topic event key string, and thereby the topic event counts corresponding to the plurality of sub-topic event key strings under the topic.
S3016. Combine the fourth set number of sub-topic event key strings having the largest topic event counts into the topic key string.
It should be noted that, from the topic event counts corresponding to the plurality of sub-topic event key strings under the topic, the event integration device selects the fourth set number (for example, 2) of sub-topic event key strings with the largest topic event counts, and determines these as the topic key string.
Referring to FIG. 4i, FIG. 4i is a first schematic flowchart of obtaining a topic event key string according to an embodiment of the present application. As shown in FIG. 4i, in this embodiment, the obtaining of the topic event key string corresponding to each topic event by the event integration device in S3014 may be implemented through S30141 to S30143, each of which is described below.
S30141. Perform entity recognition on each topic event to obtain an entity key string corresponding to a preset entity type.
It should be noted that the event integration device obtains the key strings of a topic event from multiple dimensions. One dimension is the entities of the topic event. The event integration device can obtain preset entity types in advance, for example, a person-name type and a place-name type. Here, the event integration device performs entity recognition on each topic event, selects the entities of the preset entity types from the recognized entities, and uses the selected entities as the entity key string.
S30142. Perform string weight analysis on each topic event to obtain an action key string.
It should be noted that the event integration device may also obtain the key strings of a topic event from the dimension of string weight. Here, the event integration device analyzes the weights of the strings in the topic event to obtain the strings whose weights exceed a weight threshold, and determines, among those strings, the strings that represent actions as the action key string.
S30143. Determine the topic event key string based on one or both of the entity key string and the action key string.
It should be noted that, when determining the topic event key string based on the entity key string, the event integration device may use the entire entity key string as the topic event key string, or may extract strings from the entity key string to obtain the topic event key string. When determining the topic event key string based on the action key string, the event integration device may use the entire action key string as the topic event key string, or may extract strings from the action key string to obtain the topic event key string. The event integration device may also determine the strings obtained from any combination of the entity key string and the action key string as the topic event key string.
It can be understood that, since a topic event usually includes at least one of a person, a place, and an action, determining the key strings of the topic event based on the strings associated with the person, the place, and the action in the topic event can improve the accuracy of the topic event key string.
Referring to FIG. 4j, FIG. 4j is a second schematic flowchart of obtaining a topic event key string according to an embodiment of the present application. As shown in FIG. 4j, in this embodiment, S30143 may be implemented through S301431 to S301433. That is, the determining of the topic event key string by the event integration device based on one or both of the entity key string and the action key string includes S301431 to S301433, each of which is described below.
S301431. Obtain the number of entity key strings corresponding to the entity key string.
It should be noted that the event integration device may first determine the topic event key string based on the entity key string. Here, when the number of strings in the key string of each topic event is limited, the event integration device may, based on the number of strings included in the entity key string, select strings from the action key string to include in the topic event key string, or may, based on that number, determine whether the action key string is also used as part of the topic event key string.
S301432. When the number of entity key strings is less than the fifth set number, combine the entity key string and the action key string into the topic event key string.
It should be noted that, when the number of strings in the key string of each topic event is limited to the fifth set number and the number of entity key strings is less than the fifth set number, the event integration device determines that the entity key string alone is not sufficient as the topic event key string, and the action key string also needs to be included. In this case, the topic event key string includes both the entity key string and the action key string.
S301433. When the number of entity key strings is greater than or equal to the fifth set number, determine the entity key string as the topic event key string.
It should be noted that, when the number of strings in the key string of each topic event is limited to the fifth set number and the number of entity key strings is greater than or equal to the fifth set number, the event integration device determines that the strings in the entity key string are sufficient as the topic event key string. In this case, the topic event key string includes the entity key string.
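The selection rule of S301431 to S301433 can be sketched as a single comparison. The function name and the default value 3 for the fifth set number are assumptions for illustration; the entity and action key strings are taken as already extracted.

```python
def topic_event_key_strings(entity_keys, action_keys, fifth_set_number=3):
    """Combine entity and action key strings according to S301432/S301433."""
    if len(entity_keys) < fifth_set_number:
        # Too few entity key strings: add the action key strings as well.
        return list(entity_keys) + list(action_keys)
    # Enough entity key strings: use them alone.
    return list(entity_keys)
```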
Referring to FIG. 5, FIG. 5 is a fifth optional schematic flowchart of the event integration method according to an embodiment of the present application. As shown in FIG. 5, in this embodiment, S308 to S311 are further included after S304. That is, after the event integration device integrates the event to be integrated into the target topic and obtains the event context including the event to be integrated and the at least one topic event, the event integration method further includes S308 to S311, each of which is described below.
S308. Present a search control.
It should be noted that the search control is used for information search, and can therefore be used to search for topic events.
S309. In response to a first search operation acting on the search control, present a simplified event context corresponding to the event context, and a presentation control corresponding to the simplified event context.
In this embodiment of the present application, when the user triggers the search control to search for information, and the searched information is associated with the integrated target topic, the event integration device receives the first search operation acting on the search control. The event integration device then presents search results in response to the first search operation. Here, the presented search results may include the simplified event context corresponding to the event context, and the presentation control corresponding to the simplified event context.
It should be noted that the simplified event context is part of the event context, and its presented content is a subset of the events in the event context. The presentation control is used to present the entire event context, for example, a "View more" button or an expand icon.
S310. In response to a presentation operation acting on the presentation control, present the event context, where each event in the presented event context includes an event title and an event time, and the event is any one of the event to be integrated and the at least one topic event.
It should be noted that, when the user triggers the presentation control to view the entire event context, the event integration device receives the presentation operation acting on the presentation control. The event integration device then presents the entire event context in response to the presentation operation, and does so by presenting the event title and event time of each event in the event context, where the event is any one of the event to be integrated and the at least one topic event.
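The relationship between the full event context and the simplified event context can be sketched with a small data structure. The `Event` class, the chronological sort, and the choice to show the most recent events in the simplified view are assumptions for illustration; the embodiment only states that the simplified context presents a subset of the events.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: what S310 presents for each event."""
    title: str
    time: date


def event_context(events: List[Event]) -> List[Event]:
    """Order the event to be integrated and the topic events by time."""
    return sorted(events, key=lambda e: e.time)


def simplified_context(context: List[Event], limit: int = 2) -> List[Event]:
    """Assumed simplification: keep only the `limit` most recent events."""
    return context[-limit:]
```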
In this embodiment of the present application, the presented search results may include a search recommendation result, which is recommendation information for the integrated target topic, for example, "Are you searching for 'title of the integrated target topic'?". Here, when the user triggers the search recommendation result, the event integration device may present the simplified event context corresponding to the event context and the presentation control corresponding to the simplified event context, and then present the event context in response to a presentation operation acting on the presentation control; alternatively, it may present the event context directly. This is not limited in the embodiments of the present application.
For example, referring to FIG. 6, FIG. 6 is a schematic diagram of an exemplary presentation of an event context according to an embodiment of the present application. As shown in FIG. 6, page 6-1 is a page presenting search results, on which a simplified event context 6-11 corresponding to the event context and a presentation control 6-12 are presented. When the presentation control 6-12 is clicked (the presentation operation), the entire event context 6-21 is presented, as shown in area 6-2. Here, each event in the presented event context is presented through its event title (for example, event title 6-211) and event time (for example, event time 6-212), and clicking event title 6-211 presents the detailed information of the corresponding event.
For example, referring to FIG. 7, FIG. 7 is a schematic diagram of another exemplary presentation of an event context according to an embodiment of the present application. As shown in FIG. 7, page 7-1 is a page presenting search results, on which a search recommendation result 7-11 is presented along with other results. When the search recommendation result 7-11 is clicked, the event context 6-21 shown in area 6-2 of FIG. 6 is presented.
S311. In response to a viewing operation acting on the event title or the event time, present event detail information.
It should be noted that the event title or the event time is a triggerable control, or each event has a corresponding control for viewing details. When the user triggers the event title, the event integration device receives a viewing operation acting on the event title; when the user triggers the event time, the event integration device receives a viewing operation acting on the event time; when the user triggers the control for viewing details, the event integration device receives a viewing operation acting on that control. The event integration device then presents the event detail information in response to the viewing operation, where the event detail information is the detailed description of an event in the event context, that is, the event content.
Referring to FIG. 8, FIG. 8 is a sixth optional schematic flowchart of the event integration method according to an embodiment of the present application. As shown in FIG. 8, in this embodiment, S312 to S314 are further included after S304. That is, after the event integration device integrates the event to be integrated into the target topic and obtains the event context including the event to be integrated and the at least one topic event, the event integration method further includes S312 to S314, each of which is described below.
S312. Present the last information to be presented of a target event.
It should be noted that the target event is any one of the event to be integrated and the at least one topic event included in the event context. The last information to be presented refers to the information at the final presentation position of the target event, for example, the last page of the target event or the bottom area of the target event.
S313. In a recommendation area corresponding to the last information to be presented, present a remaining event in the event context associated with the target event.
It should be noted that a recommendation area, used for presenting recommendation information, is also presented on the page that presents the last information to be presented. Here, the recommendation information presented by the event integration device in the recommendation area is a remaining event, where the remaining event may be any event in the event context other than the target event, or may be the most recent event in the event context other than the target event. The remaining event may be displayed as search content in a search box, as a link, and so on; this is not limited in the embodiments of the present application.
S314. In response to a second search operation on the remaining event, present the detailed information of the remaining event.
It should be noted that, when the user triggers a viewing operation on the remaining event, the event integration device receives the second search operation on the remaining event. The event integration device then presents the detailed information of the remaining event in response to the second search operation, completing the response to the second search operation.
In this embodiment of the present application, when the event integration device is implemented as a server, S308 to S314 may be implemented by the server, or may be implemented by a terminal after the server sends the event context to the terminal; this is not limited in the embodiments of the present application.
It can be understood that the event context provides additional information beyond the search terms. While satisfying the search need, it proactively surfaces related reading needs, improves the completeness of the information presented on the search results page, and reduces the number of searches in which the target information is not obtained, thereby lowering the resource consumption of the search process. It can also improve the accuracy with which searched information is delivered and increase the user's search frequency.
An exemplary application of the embodiments of the present application in a practical application scenario is described below. This exemplary application describes attaching a latest-development event under the topic to which it belongs to obtain an event context, and presenting the event context in response to a user's search operation.
It should be noted that, for a news topic that continues over a long period (referred to as a topic, which usually consists of multiple events that have already occurred, referred to as at least one topic event), when a latest-development event of the news topic (the event to be integrated) is obtained, the latest-development event is attached under the news topic to which it belongs (referred to as the target topic) to form an event context that includes the latest-development event. Presenting the event context shows the development of the events intuitively. When the event integration method provided in the embodiments of the present application is used to attach the latest-development event to its news topic, this can be done in two stages, recall and classification, which include the following steps.
First, according to the event content of the latest-development event, possibly related news topics (referred to as at least two topics to be integrated) are recalled from the news topic database (referred to as the topic library).
It should be noted that each news topic in the news topic database has corresponding topic keywords (referred to as the topic key string). When the latest-development event matches any one of the topic keywords (referred to as a sub-topic key string), the server determines that the news topic is one of the possibly related news topics.
For example, referring to FIG. 9, FIG. 9 is a schematic diagram of an exemplary news topic recall according to an embodiment of the present application. As shown in FIG. 9, "The first department responds to Zhang Er lifting the ban on the first object" is the title of the latest-development event 9-1. In the news topic database 9-2, news topic 9-21 includes 3 events, and its topic keywords 9-211 are "nurse" and "vice dean"; news topic 9-22 includes 4 events, and its topic keywords 9-221 are "place H" and "jumping out of a vehicle"; news topic 9-23 includes 4 events, and its topic keywords 9-231 are "Li San" and "first object". When the topic keywords of each news topic in the news topic database 9-2 are matched against the latest-development event 9-1, "first object" in the topic keywords 9-231 of news topic 9-23 matches "first object" in the title of the latest-development event 9-1, so news topic 9-23 is one of the recalled possibly related news topics.
It should also be noted that, among the keywords of all topic events under a news topic, the two keywords (the number two being referred to as the first number threshold) that occur in the most topic events are the topic keywords. The keywords of each topic event can be obtained through entity recognition and word weight analysis (referred to as string weight analysis). Here, the server may use an entity recognition model (Char-Word Union CNN, CWCNN) to perform entity recognition, and use the recognized entities of the person-name type and the place-name type as the first keywords (referred to as the entity key string). The server may use a classification model (for example, an "XGBoost" model) to perform word weight analysis, and use the verbs whose weights are higher than the weight threshold as the second keywords (referred to as the action key string). If the number of first keywords is greater than 3 (referred to as the fifth set number), the second keywords are not considered and only the first keywords are used as the keywords of the topic event; if the number of first keywords is less than 3, the first keywords and the second keywords are used together as the keywords of the topic event.
Then, the similarity between each of the possibly related news topics and the latest-development event is obtained, so as to determine, based on the similarity, whether each news topic is related to the latest-development event.
Referring to FIG. 10, FIG. 10 is a schematic diagram of an exemplary determination of whether a news topic is related to the latest-development event according to an embodiment of the present application. As shown in FIG. 10, the server obtains the similarity between each possibly related news topic and the latest-development event from three aspects: vector semantic similarity 10-1 (referred to as semantic similarity), keyword graph similarity 10-2 (referred to as string graph similarity), and question-answer semantic similarity 10-3 (referred to as question-answer similarity). Finally, a fusion model 10-4 (for example, an "XGBoost" model or a "GBDT" model) combines the vector semantic similarity 10-1, the keyword graph similarity 10-2, and the question-answer semantic similarity 10-3 to determine a decision score 10-5, which is used to determine whether each news topic is related to the latest-development event, and the related news topic 10-6 (referred to as the target topic), that is, the news topic to which the latest-development event belongs, is finally obtained.
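The fusion of the three similarities into a decision score can be illustrated with the sketch below. Note that this is a deliberately simplified stand-in: it uses a fixed weighted sum rather than the gradient-boosted fusion model named in the embodiment, and the weights, the threshold, and the rule of picking the highest-scoring topic are all hypothetical.

```python
def decision_score(semantic, graph, qa, weights=(0.4, 0.3, 0.3)):
    """Stand-in for the fusion model: a weighted sum of the three similarities.

    The weights are hypothetical; a trained fusion model would learn the mapping.
    """
    w_semantic, w_graph, w_qa = weights
    return w_semantic * semantic + w_graph * graph + w_qa * qa


def pick_target_topic(similarities, threshold=0.5):
    """Return the topic with the highest decision score, or None if below threshold.

    similarities: dict mapping topic -> (semantic, graph, qa) similarities.
    """
    scored = {topic: decision_score(*sims) for topic, sims in similarities.items()}
    best = max(scored, key=scored.get, default=None)
    if best is None or scored[best] < threshold:
        return None
    return best
```

Returning `None` corresponds to the case where no recalled topic is judged related, in which a new topic could be created for the event.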
The calculation of the similarity in each dimension is described below.
The vector semantic similarity 10-1 includes a vector semantic statistical similarity (referred to as semantic statistical similarity) and a vector semantic self-attention similarity (referred to as semantic self-attention similarity). The acquisition of the vector semantic statistical similarity is described first.
The server calculates the similarity between the title of each topic event in the news topic and the title of the latest progress event to obtain the title vector semantic similarity (referred to as the first sub-semantic similarity), and obtains the average title semantic similarity (referred to as the average first sub-semantic similarity) and the maximum title semantic similarity (referred to as the maximum first sub-semantic similarity). The server calculates the similarity between the keywords of each topic event in the news topic (referred to as topic event key strings) and the keywords of the latest progress event (referred to as to-be-integrated event key strings) to obtain the event keyword vector semantic similarity (referred to as the second sub-semantic similarity), and obtains the average event keyword semantic similarity (referred to as the average second sub-semantic similarity) and the maximum event keyword semantic similarity (referred to as the maximum second sub-semantic similarity). The server also calculates the similarity between the topic keywords of the news topic and the keywords of the latest progress event to obtain the topic keyword vector semantic similarity (referred to as the third sub-semantic similarity). Here, the title vector semantic similarity, the average title semantic similarity, the maximum title semantic similarity, the event keyword vector semantic similarity, the average event keyword semantic similarity, the maximum event keyword semantic similarity and the topic keyword vector semantic similarity are collectively referred to as the vector semantic statistical similarity.
It should be noted that calculating the similarity between the title of each topic event in the news topic and the title of the latest progress event, calculating the similarity between the keywords of each topic event in the news topic and the keywords of the latest progress event, and calculating the similarity between the topic keywords of the news topic and the keywords of the latest progress event can each be implemented through a network model (the semantic statistical similarity model).
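As a minimal sketch of how the average and maximum statistics could be derived once the per-event similarities are available, the following Python function aggregates them into named features. The function name, the dictionary keys and the assumption that the per-event similarities are already computed (for example by the network model mentioned above) are illustrative only.

```python
from statistics import mean


def semantic_statistical_features(title_sims, event_keyword_sims, topic_keyword_sim):
    """Aggregate per-event similarities into semantic statistical features.

    title_sims: one title similarity per topic event (first sub-semantic similarities).
    event_keyword_sims: one keyword similarity per topic event (second sub-semantic similarities).
    topic_keyword_sim: similarity between topic keywords and event keywords (third sub-semantic similarity).
    """
    if not title_sims or not event_keyword_sims:
        raise ValueError("a topic must contain at least one topic event")
    return {
        "avg_title": mean(title_sims),
        "max_title": max(title_sims),
        "avg_event_keyword": mean(event_keyword_sims),
        "max_event_keyword": max(event_keyword_sims),
        "topic_keyword": topic_keyword_sim,
    }
```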
Referring to FIG. 11a, FIG. 11a is a schematic diagram of an exemplary model for obtaining the vector semantic similarity, provided by an embodiment of the present application. As shown in FIG. 11a, the network model 11-1 is used to obtain the degree of similarity, in terms of vector semantics, between the two texts of a text pair, and the network model 11-1 has a two-tower structure. Here, the processing of the network model 11-1 is described using the acquisition of the topic keyword vector semantic similarity as an example: the topic keywords 11-2 corresponding to the news topic are input into the first network branch 11-11 (referred to as the first semantic branch) of the network model 11-1 to obtain the semantic vector 11-3 corresponding to the topic keywords 11-2, and the keywords 11-4 corresponding to the latest progress event are input into the second network branch 11-12 (referred to as the second semantic branch) of the network model 11-1 to obtain the semantic vector 11-5 corresponding to the keywords 11-4. The degree of similarity between the semantic vector 11-3 and the semantic vector 11-5 is then obtained through cosine similarity, which yields the topic keyword vector semantic similarity 11-6. In addition, the first network branch and the second network branch may be the same network branch, for example, both may be "Bert" models. The semantic vector 11-3 and the semantic vector 11-5 are each the start-character (referred to as "CLS") vector at the first position of the output of the respective network branch, and the corresponding dimension may be, for example, 768. Furthermore, in the training process of the network model 11-1, training may be performed using 10,000 labeled sample pairs and a cross loss function.
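The final comparison step of the two-tower structure can be sketched in a few lines of Python. The encoders are passed in as plain callables and are hypothetical stand-ins for the two network branches; only the cosine similarity itself is taken from the description.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def two_tower_similarity(encode_first, encode_second, text_a, text_b):
    """Encode each text with its own branch, then compare the two vectors.

    encode_first / encode_second are hypothetical callables standing in for
    the first and second semantic branches (e.g. returning a "CLS" vector).
    """
    return cosine_similarity(encode_first(text_a), encode_second(text_b))
```

Passing the same callable for both branches corresponds to the case where the two network branches are the same model.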
When obtaining the vector semantic self-attention similarity, refer to FIG. 11b, which is a schematic diagram of another exemplary model for obtaining the vector semantic similarity, provided by an embodiment of the present application. As shown in FIG. 11b, the encoding module 11-71 (for example, a "Bert" model) in the network model 11-7 is used to obtain the semantic vector of each event in the event sequence formed by the latest progress event and at least one topic event 11-72 of a news topic, yielding a vector feature sequence 11-73 that corresponds in order to the event sequence. Here, the server sets the segmentation identifier of the latest progress event to 0 and the segmentation identifier of each topic event to 1 (each referred to as a distinguishing identifier). The server obtains the semantic vector corresponding to 0 and the semantic vector corresponding to 1, merges the semantic vector corresponding to 0 with the semantic vector of the latest progress event (the to-be-integrated semantic feature), merges the semantic vector corresponding to 1 with the semantic vector of each topic event (the topic event semantic feature), and finally inputs all the merged results into the Transformer model 11-74 to obtain the vector semantic self-attention similarity.
It should be noted that the Transformer model 11-74 is a network model used in natural language processing, and the Transformer model 11-74 may be formed by stacking at least one (for example, three) Transformer models. By computing the self-attention between every two events (referred to as two sequence units) in the event sequence, the Transformer model 11-74 adaptively determines the relationship between the latest progress event and the topic events, and thereby automatically determines which topic events the latest progress event should attend to during matching.
When obtaining the keyword graph similarity 10-2, the server constructs the keyword graph corresponding to each news topic and the keyword graph of the latest progress event, obtains representations of the two keyword graphs through a graph convolutional network, and calculates the cosine distance between the two representations to obtain the keyword graph similarity.
The keyword graph corresponding to each news topic is constructed as follows: each keyword of each topic event under the news topic is taken as a graph node, and if the two keywords corresponding to two graph nodes belong to the same topic event, an edge is built between the two graph nodes. The keyword graph corresponding to each news topic (referred to as the first key string graph) is thereby obtained.
Exemplarily, see Table 1 below, which includes a title column and a keyword column for the topic events.
Table 1

| Title | Keywords |
| --- | --- |
| Li San signs a management order: the first object will be blocked on August 23 | first object, Li San, management, blockade |
| The first organization responds to Li San | Li San, first organization |
| The first department responds to Li San's signing of the management order concerning the first object | first object, Li San, first department |
| The first organization announces that it will sue Li San on August 24 | Li San, first organization, prosecution |
| The second department plans to prohibit calling the first object from August 25 | first object, plan, second department |
| The third department blocks the ban on the first object | first object, third department, ban |
| The second organization under the second object asks the third organization to suspend the ban on the first object | first object, third organization, second object |
| Zhang Er revokes the ban on the first object and the third object | Zhang Er, first object, third object |
The keyword graph constructed based on Table 1 is shown in FIG. 12, which is a schematic diagram of an exemplary keyword graph provided by an embodiment of the present application. As shown in FIG. 12, the graph nodes in the keyword graph 12-1 are determined based on the keywords of the topic events in Table 1, and include: first object, Li San, management, blockade, first organization, first department, prosecution, plan, second department, third department, ban, third organization, second object, Zhang Er and third object. The edges between the graph nodes are as shown in FIG. 12.
Similarly, referring to FIG. 13, FIG. 13 is a schematic diagram of another exemplary keyword graph provided by an embodiment of the present application. As shown in FIG. 13, the graph nodes in the keyword graph 13-1 (referred to as the second key string graph) are the keywords of the latest progress event: first department, Zhang Er and first object; and there is an edge between any two of first department, Zhang Er and first object.
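The graph construction described above can be sketched as follows. The function name and the representation of a graph as a pair of Python sets are assumptions for illustration; the graph convolutional network that turns each graph into a vector representation is not shown.

```python
from itertools import combinations


def build_keyword_graph(event_keyword_lists):
    """Build a keyword graph from the keyword lists of one or more events.

    Every keyword becomes a node; two nodes are connected when their
    keywords belong to the same event. Edges are stored as sorted pairs
    so that each undirected edge appears only once.
    """
    nodes, edges = set(), set()
    for keywords in event_keyword_lists:
        unique = sorted(set(keywords))
        nodes.update(unique)
        edges.update(combinations(unique, 2))
    return nodes, edges


# Second key string graph: the keywords of the latest progress event.
event_nodes, event_edges = build_keyword_graph(
    [["first department", "Zhang Er", "first object"]]
)
```

For a single event, every pair of its keywords is connected, which matches the statement that any two of the three keywords in FIG. 13 share an edge.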
When obtaining the question-answer semantic similarity 10-3, the server constructs a sentence sequence (referred to as the to-be-answered sentence sequence) based on the title and keywords of the information topic and on the latest progress event, obtains the output for the sentence sequence through a network model (for example, an "MRC-Bert" model), and determines the question-answer semantic similarity based on the first-dimension feature of the output (referred to as answer information).
Referring to FIG. 14, FIG. 14 is an exemplary schematic diagram of obtaining the question-answer semantic similarity, provided by an embodiment of the present application. As shown in FIG. 14, the sentence sequence 14-1 is input into the network model 14-2 (for example, an "MRC-Bert" model) to obtain the answer information 14-3, and the question-answer semantic similarity 10-3 is then determined based on the answer information 14-3. In the sentence sequence 14-1, "CLS" indicates the start of the sentence sequence and "SEP" indicates the separation between sentences; the sentence sequence 14-1 also includes the question constructed from the news topic and the latest progress event. For the latest progress event 9-1 titled "The first department responds to Zhang Er's revocation of the ban on the first object", and the news topic 9-23 whose keywords are "first object" and "Li San" and whose title is "Li San blocks the first object", the constructed sentence sequence is "[CLS] Is the next sentence a progress of the news topic 'Li San blocks the first object', whose keywords are the first object and Li San? [SEP] The first department responds to Zhang Er's revocation of the ban on the first object [SEP]"; it may also be "[CLS] Does the event 'The first department responds to Zhang Er's revocation of the ban on the first object' belong to the news topic 'Li San blocks the first object', whose keywords are the first object and Li San? [SEP]".
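A minimal sketch of assembling the first form of the sentence sequence is shown below. The function name and the exact English wording of the question template are assumptions for illustration; only the "[CLS] question [SEP] event title [SEP]" layout follows the description.

```python
def build_question_sequence(topic_title, topic_keywords, event_title):
    """Assemble a '[CLS] question [SEP] event title [SEP]' sentence sequence.

    The question wording is an illustrative template, not a fixed format.
    """
    keywords = " and ".join(topic_keywords)
    question = (
        f"Is the next sentence a progress of the news topic "
        f"'{topic_title}', whose keywords are {keywords}?"
    )
    return f"[CLS] {question} [SEP] {event_title} [SEP]"


sequence = build_question_sequence(
    "Li San blocks the first object",
    ["the first object", "Li San"],
    "The first department responds to Zhang Er's revocation of the ban on the first object",
)
```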
An exemplary application of the event integration method provided by the embodiments of the present application continues to be described below.
It should be noted that the network model 11-1, the network model 11-7, the network model used to obtain the string graph similarity, and the network model 14-2 may be trained on more than 2,000 topics, from which 50,000 topic-event sample pairs are constructed by selection. For example, from all news topics that have gone online for operation after a certain point in time, the reviewed events that went online (corresponding to similar) and those that did not go online (corresponding to dissimilar) may be selected; the online events are used as positive samples, the events that did not go online are used as negative samples, and the topic-event sample pairs are constructed in event order. For example, if topic A contains five online events a, b, c, d and e and two events f and g that did not go online (where events d and e and events f and g all occur after events a, b and c), then six topic-event sample pairs can be constructed, as follows.
Positive samples: abc->d, abcd->e;
Negative samples: abc->f, abc->g, abcd->f, abcd->g.
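The pair construction in this example can be reproduced with a short Python sketch. The function name, the `initial_history` parameter and the rule of pairing every event that did not go online with every history used for a positive sample are assumptions inferred from the example, not a stated procedure.

```python
def build_sample_pairs(online_events, offline_events, initial_history=3):
    """Construct topic-event sample pairs in event order.

    online_events: events accepted into the topic, in chronological order.
    offline_events: rejected events occurring after the initial history.
    initial_history: number of leading online events that form the first
    topic state (an assumption taken from the 'abc' example).
    """
    positives, negatives = [], []
    for i in range(initial_history, len(online_events)):
        history = tuple(online_events[:i])
        positives.append((history, online_events[i]))
        negatives.extend((history, event) for event in offline_events)
    return positives, negatives


positives, negatives = build_sample_pairs(list("abcde"), list("fg"))
```

With the inputs above, the histories are `abc` and `abcd`, giving the two positive and four negative pairs listed in the example.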
Referring to FIG. 15, FIG. 15 is a schematic diagram of exemplary feature importance provided by an embodiment of the present application. As shown in FIG. 15, the ordinate represents the importance index, and in descending order of the importance index the features are: the question-answer semantic similarity 10-3, the vector semantic self-attention similarity 15-1, the maximum title semantic similarity 15-2, the keyword graph similarity 10-2, the maximum event keyword semantic similarity 15-3, the average title semantic similarity 15-4 and the topic keyword vector semantic similarity 15-5. The maximum title semantic similarity 15-2, the maximum event keyword semantic similarity 15-3, the average event keyword semantic similarity, the average title semantic similarity 15-4 and the topic keyword vector semantic similarity 15-5 together constitute the vector semantic statistical similarity within the vector semantic similarity 10-1 of FIG. 10.
In addition, when an ablation experiment is performed on the similarities of the eight dimensions (the question-answer semantic similarity 10-3, the vector semantic self-attention similarity 15-1, the maximum title semantic similarity 15-2, the keyword graph similarity 10-2, the maximum event keyword semantic similarity 15-3, the average title semantic similarity 15-4, the topic keyword vector semantic similarity 15-5 and the average event keyword semantic similarity), the experimental results are as shown in Table 2.
Table 2
Figure PCTCN2022111164-appb-000001
As can be seen from Table 2, when the similarities of all eight dimensions are used for event integration, the area under the curve (AUC) of the corresponding receiver operating characteristic (ROC) curve is 0.9420. Removing the average title semantic similarity 15-4 reduces the AUC by 0.0025; removing the maximum title semantic similarity 15-2 reduces it by 0.0101; removing the average event keyword semantic similarity reduces it by 0.0043; removing the maximum event keyword semantic similarity 15-3 reduces it by 0.0000; removing the topic keyword vector semantic similarity 15-5 reduces it by -0.0013 (that is, the AUC increases by 0.0013); removing the vector semantic self-attention similarity 15-1 reduces it by 0.0098; removing the keyword graph similarity 10-2 reduces it by 0.0081; and removing the question-answer semantic similarity 10-3 reduces it by 0.0152. This indicates that the similarities of the eight dimensions all contribute to determining the result during event integration, which is consistent with the results given by the importance index.
It should be noted that the AUC refers to the area enclosed by the ROC curve and the coordinate axis, and is a performance indicator.
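For reference, the AUC can be computed without drawing the ROC curve, using the equivalent interpretation that it is the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative sample (ties count as one half). The following Python sketch uses that pairwise form; the function name and the 0/1 label encoding are assumptions for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for positive (matching) samples, 0 for negative samples.
    scores: predicted similarity or decision score for each sample.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic in the number of samples, which is adequate for a small illustration; a rank-based formulation would be preferable for large evaluation sets.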
In the embodiments of the present application, the similarities may also be selected based on their accuracy and time cost, so as to determine whether the latest progress event matches the topic events. Referring to Table 3, Table 3 describes the time consumed by the network model 11-1, the network model 11-7, the network model used to obtain the string graph similarity, and the network model 14-2.
Table 3
Figure PCTCN2022111164-appb-000002
As can be seen from Table 3, in descending order of time consumed, the models are: the network model 11-1, the network model used to obtain the string graph similarity, the network model 11-7 and the network model 14-2. The question-answer semantic similarity 10-3 is obtained through the network model 14-2; the vector semantic self-attention similarity 15-1 is obtained through the network model 11-7; the maximum title semantic similarity 15-2, the maximum event keyword semantic similarity 15-3, the average title semantic similarity 15-4, the topic keyword vector semantic similarity 15-5 and the average event keyword semantic similarity are obtained through the network model 11-1; and the keyword graph similarity 10-2 is obtained through the network model used to obtain the string graph similarity. Therefore, the server uses the network model 11-7 and the network model 14-2, which are the fastest and the most accurate, as the initial calculation scheme. When the similarities obtained by the network model 11-7 and the network model 14-2 are sufficiently high (for example, greater than 0.7), a match is determined directly; when they are sufficiently low (for example, less than 0.1), a mismatch is determined. Only when the similarities fall in the intermediate range (for example, 0.1 to 0.7) are the network model 11-1 and the network model used to obtain the string graph similarity additionally used to calculate similarities, after which all the obtained similarities are combined as features and input into the fusion model for the final decision.
Exemplarily, the determination process is shown in formula (1).
Figure PCTCN2022111164-appb-000003
Here, S1 is the similarity output by the network model 11-7, and S2 is the similarity output by the network model 14-2.
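The staged decision described above can be sketched in Python as follows. Because formula (1) is only available as an image, the way S1 and S2 are combined is an assumption here: both must exceed the upper threshold for a direct match, and both must fall below the lower threshold for a direct mismatch. The `slow_features` and `fusion_model` callables are hypothetical stand-ins for the slower network models and the fusion model.

```python
def cascade_decision(s1, s2, slow_features, fusion_model, high=0.7, low=0.1):
    """Decide whether an event matches a topic using a fast-then-slow cascade.

    s1, s2: similarities from the two fast models (11-7 and 14-2).
    slow_features: zero-argument callable returning the remaining
        similarities; it is only invoked in the intermediate range.
    fusion_model: callable mapping the full feature list to a truthy value.
    The 'both above high' / 'both below low' rule is an assumption.
    """
    if s1 > high and s2 > high:
        return True   # confident match, slower models are skipped
    if s1 < low and s2 < low:
        return False  # confident mismatch, slower models are skipped
    features = [s1, s2] + list(slow_features())
    return bool(fusion_model(features))
```

Passing the slower similarities as a callable keeps the sketch faithful to the motivation of the cascade: the expensive models run only when the fast ones are inconclusive.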
It can be understood that, by using multi-dimensional heterogeneous features, the embodiments of the present application fuse the information of the multiple events of a topic; and by using three kinds of heterogeneous feature models, namely features based on vector semantics, features based on keyword graphs and features based on question-answer semantics, the accuracy and reasonableness of the similarity calculation can be improved. In addition, the event integration method provided by the embodiments of the present application enables automatic, batch event integration without manual participation, which improves the efficiency of event integration.
The following continues to describe an exemplary structure in which the event integration apparatus 455 provided by the embodiments of the present application is implemented as software modules. In some embodiments, as shown in FIG. 2, the software modules of the event integration apparatus 455 stored in the memory 450 may include:
an information acquisition module 4551, configured to acquire an event to be integrated and acquire at least two topics to be integrated, where each topic to be integrated includes at least one topic event;
a similarity acquisition module 4552, configured to determine a target similarity between the event to be integrated and each topic to be integrated based on one or more of a semantic similarity, a string graph similarity and a question-answer similarity, where the semantic similarity refers to similarity in terms of semantic features, the string graph similarity refers to similarity in terms of the graph features corresponding to key strings, and the question-answer similarity refers to similarity in terms of question-answer features;
a topic determination module 4553, configured to determine, based on the target similarity, the target topic to which the event to be integrated belongs from the at least two topics to be integrated;
an event integration module 4554, configured to integrate the event to be integrated into the target topic to obtain an event context, where the event context includes the event to be integrated and at least one topic event.
In the embodiments of the present application, the one or more of the semantic similarity, the string graph similarity and the question-answer similarity are selected through selection logic. The selection logic includes one or more of: a selection order, an acquisition speed, an accuracy, a topic scale, a selection number, a topic type, a model training scale, a model applicable range and a model applicable scale. The selection order is determined based on the priority of the similarities; the acquisition speed is the speed at which a similarity is obtained; the accuracy is the degree of accuracy of a similarity; the topic type is the content form of the topics to be integrated; the topic scale is the scale of the at least two topics to be integrated; and the model training scale is the scale of the training data of the network model used to obtain each kind of similarity.
In the embodiments of the present application, when the selection logic includes the selection order, the similarity acquisition module 4552 is further configured to: select, based on the selection order, a first set number of similarities in sequence from the semantic similarity, the string graph similarity and the question-answer similarity sorted in descending order of priority; obtain a comparison result between the first set number of similarities and a similarity threshold; when the comparison result is that the event to be integrated is similar to the topic to be integrated, determine the first set number of similarities as the target similarity between the event to be integrated and the topic to be integrated; and when the comparison result is that the similarity between the event to be integrated and the topic to be integrated is pending, continue to select from the remaining similarities based on the selection order until a selection end condition is met, and determine the multiple selected similarities as the target similarity. The remaining similarities are the similarities, among the semantic similarity, the string graph similarity and the question-answer similarity, other than the first set number of similarities. The selection end condition is that the event to be integrated is determined to be similar to the topic to be integrated, or that the semantic similarity, the string graph similarity and the question-answer similarity have all been selected.
In the embodiments of the present application, when the selection logic includes the acquisition speed and the topic scale, the similarity acquisition module 4552 is further configured to: when the topic scale is greater than a set scale, select a second set number of similarities in sequence from the semantic similarity, the string graph similarity and the question-answer similarity sorted in descending order of acquisition speed, to obtain the target similarity between the event to be integrated and the topic to be integrated; and when the topic scale is less than or equal to the set scale, select a third set number of similarities in sequence from the semantic similarity, the string graph similarity and the question-answer similarity sorted in descending order of acquisition speed, to obtain the target similarity between the event to be integrated and the topic to be integrated, where the second set number is smaller than the third set number.
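A minimal sketch of this speed-and-scale selection is given below. The function name, the parameter names and the default set numbers (1 and 2) are assumptions for illustration; the description only requires that the second set number be smaller than the third.

```python
def select_by_speed(speeds, topic_scale, set_scale, second_number=1, third_number=2):
    """Choose which similarities to compute, fastest first.

    speeds: mapping similarity name -> acquisition speed (higher is faster).
    topic_scale: scale (e.g. count) of the topics to be integrated.
    set_scale: the scale above which fewer similarities are computed.
    second_number / third_number: illustrative values for the second and
    third set numbers; the second must be smaller than the third.
    """
    if second_number >= third_number:
        raise ValueError("the second set number must be smaller than the third")
    ranked = sorted(speeds, key=speeds.get, reverse=True)
    count = second_number if topic_scale > set_scale else third_number
    return ranked[:count]
```

Computing fewer similarities when there are many candidate topics trades some accuracy for throughput, which is the apparent intent of requiring the second set number to be the smaller one.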
In the embodiments of the present application, when the target similarity includes more than one of the semantic similarity, the string graph similarity and the question-answer similarity, the topic determination module 4553 is further configured to: determine, based on accuracy, the weight ratio of the similarities in the target similarity; fuse the similarities in the target similarity based on the weight ratio to obtain a discriminant similarity; and select, from the at least two topics to be integrated, the topic to be integrated that corresponds to the highest discriminant similarity, to obtain the target topic to which the event to be integrated belongs.
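The weighted fusion and topic selection can be sketched as follows. Making each weight proportional to the accuracy of its similarity and fusing by a weighted sum are assumptions; the description only states that the weight ratio is determined based on accuracy. The function and parameter names are likewise illustrative.

```python
def pick_target_topic(topic_similarities, accuracies):
    """Fuse several similarities per topic and return the best topic.

    topic_similarities: mapping topic -> {similarity name: value}.
    accuracies: mapping similarity name -> accuracy, used to derive weights
        (weights proportional to accuracy is an illustrative assumption).
    Returns (best_topic, {topic: discriminant similarity}).
    """
    total = sum(accuracies.values())
    weights = {name: acc / total for name, acc in accuracies.items()}
    scores = {
        topic: sum(weights[name] * value for name, value in sims.items())
        for topic, sims in topic_similarities.items()
    }
    best_topic = max(scores, key=scores.get)
    return best_topic, scores


best, scores = pick_target_topic(
    {"A": {"semantic": 0.8, "question_answer": 0.4},
     "B": {"semantic": 0.5, "question_answer": 0.9}},
    {"semantic": 0.9, "question_answer": 0.6},
)
```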
In the embodiments of the present application, the semantic similarity includes at least one of a semantic self-attention similarity and a semantic statistical similarity, where the semantic self-attention similarity is determined based on the self-attention between the event to be integrated and the topic events, the semantic statistical similarity is determined based on target semantics, and the target semantics refers to the semantics corresponding to at least one of a title, a key string and event content.
In the embodiments of the present application, the semantic similarity includes the semantic self-attention similarity, and the similarity acquisition module 4552 is further configured to: acquire the to-be-integrated semantic feature corresponding to the event to be integrated and the topic event semantic feature corresponding to each topic event in the topic to be integrated; enhance the to-be-integrated semantic feature based on the distinguishing identifiers of the event to be integrated and the topic events to obtain a first enhanced semantic feature, and enhance the topic event semantic feature based on the distinguishing identifiers to obtain a second enhanced semantic feature; and combine the first enhanced semantic feature and the at least one second enhanced semantic feature corresponding to the topic to be integrated into a semantic feature sequence, and determine the semantic self-attention similarity based on the self-attention information between every two sequence units in the semantic feature sequence.
In the embodiments of the present application, the similarity acquisition module 4552 is further configured to: in each topic to be integrated, acquire a first sub-semantic similarity between the title of each topic event and the title of the event to be integrated, and determine an average first sub-semantic similarity and a maximum first sub-semantic similarity based on the first sub-semantic similarity; in each topic to be integrated, acquire a second sub-semantic similarity between the topic event key string corresponding to each topic event and the to-be-integrated event key string corresponding to the event to be integrated, and determine an average second sub-semantic similarity and a maximum second sub-semantic similarity based on the second sub-semantic similarity; acquire a third sub-semantic similarity between the topic key string corresponding to each topic to be integrated and the to-be-integrated event key string; and determine the average first sub-semantic similarity, the maximum first sub-semantic similarity, the average second sub-semantic similarity, the maximum second sub-semantic similarity and the third sub-semantic similarity as the semantic statistical similarity.
In the embodiments of this application, the first sub-semantic similarity, the second sub-semantic similarity and the third sub-semantic similarity are obtained through a semantic statistical similarity model. The event integration apparatus 455 further includes a model training module 4555, configured to: acquire a training sample, wherein the training sample includes a first string sample, a second string sample and a labeled similarity; use a first semantic branch of a to-be-trained semantic statistical similarity model to acquire first estimated semantics corresponding to the first string sample, use a second semantic branch of the to-be-trained semantic statistical similarity model to acquire second estimated semantics corresponding to the second string sample, and determine an estimated similarity between the first string sample and the second string sample based on a comparison result between the first estimated semantics and the second estimated semantics; and perform back propagation in the to-be-trained semantic statistical similarity model based on the difference between the estimated similarity and the labeled similarity, to obtain the semantic statistical similarity model.
In the embodiments of this application, the similarity acquisition module 4552 is further configured to: in each topic to be integrated, determine each sub-topic-event key string corresponding to the at least one topic event as a graph node, build an edge between two graph nodes that correspond to the same topic event, and determine the graph nodes and the edges as a first key string graph; construct a second key string graph based on the to-be-integrated event key string corresponding to the event to be integrated; and compare a vector representation of the first key string graph with a vector representation of the second key string graph to obtain a graph comparison result, and determine the string-graph similarity based on the graph comparison result.
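As a non-limiting illustration, the graph construction above can be sketched as follows. The comparison step is an assumption: each graph is represented by an indicator vector over its nodes and edges and the two vectors are compared by cosine similarity, which merely stands in for whatever learned graph representation an implementation would use.

```python
import math
from itertools import combinations

def build_key_string_graph(events_key_strings):
    """Each inner list holds the sub key strings of one event.

    Every key string becomes a node; two nodes are linked when their
    key strings come from the same event.
    """
    nodes, edges = set(), set()
    for keys in events_key_strings:
        unique = sorted(set(keys))
        nodes.update(unique)
        edges.update(combinations(unique, 2))  # sorted pairs, so each edge has one form
    return nodes, edges

def graph_similarity(graph_a, graph_b) -> float:
    """Cosine similarity of indicator vectors over nodes and edges (stand-in, assumption)."""
    feats_a = graph_a[0] | graph_a[1]
    feats_b = graph_b[0] | graph_b[1]
    if not feats_a or not feats_b:
        return 0.0
    return len(feats_a & feats_b) / math.sqrt(len(feats_a) * len(feats_b))

# First key string graph: built from the topic events of one topic to be integrated.
first_graph = build_key_string_graph([["flood", "city"], ["flood", "rescue"]])
# Second key string graph: built from the event to be integrated.
second_graph = build_key_string_graph([["flood", "city"]])
similarity = graph_similarity(first_graph, second_graph)
```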
In the embodiments of this application, the similarity acquisition module 4552 is further configured to: combine the title of each topic to be integrated, the topic key string and the event to be integrated into a to-be-answered sentence sequence; acquire answer information of the to-be-answered sentence sequence; and determine the question-answer similarity based on the answer information.
In the embodiments of this application, the information acquisition module 4551 is further configured to: in a topic library, acquire a matching result between the topic key string corresponding to each topic and the event to be integrated, wherein the topic library includes a plurality of topics; when it is determined, based on the matching result, that at least one sub-topic key string in the topic key string matches the event to be integrated, determine the topic corresponding to the matching result as a topic to be integrated that matches the event to be integrated; and acquire, from the topic library, at least two topics to be integrated that match the event to be integrated.
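As a non-limiting illustration, the recall step above can be sketched as follows. Matching is assumed here to be plain substring containment, and the function name `recall_topics` and the library layout are hypothetical.

```python
def recall_topics(event_text: str, topic_library: dict) -> list:
    """Return the ids of topics with at least one sub-topic key string found in the event.

    topic_library maps a topic id to the list of sub-topic key strings
    that make up the topic key string.
    """
    return [
        topic_id
        for topic_id, sub_keys in topic_library.items()
        if any(key in event_text for key in sub_keys)  # assumed matching rule: substring
    ]

library = {
    "t1": ["flood", "rescue"],
    "t2": ["election"],
    "t3": ["flood", "dam"],
}
candidates = recall_topics("flood damages dam", library)
```

Only the recalled candidates then need a similarity computation, which is where the efficiency gain of recalling first comes from.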
In the embodiments of this application, the information acquisition module 4551 is further configured to: among the at least one topic event corresponding to each topic in the topic library, acquire the topic-event key string corresponding to each topic event; count the number of topic events corresponding to each sub-topic-event key string in the topic-event key strings; and combine a fourth set number of sub-topic-event key strings having the largest numbers of topic events into the topic key string corresponding to each topic.
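As a non-limiting illustration, the counting and selection above can be sketched as follows. The parameter `top_n` plays the role of the fourth set number; the function name is hypothetical, and ties between equally frequent key strings are resolved arbitrarily in this sketch.

```python
from collections import Counter

def build_topic_key_string(topic_event_key_strings, top_n: int) -> list:
    """Pick the top_n sub key strings that occur in the largest number of topic events."""
    counts = Counter()
    for keys in topic_event_key_strings:
        counts.update(set(keys))  # count each key string at most once per topic event
    return [key for key, _ in counts.most_common(top_n)]

topic_events = [["flood", "city"], ["flood", "rescue", "city"], ["flood"]]
topic_key_string = build_topic_key_string(topic_events, top_n=2)
```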
In the embodiments of this application, the information acquisition module 4551 is further configured to: perform entity recognition on each topic event to obtain an entity key string corresponding to a preset entity type; perform string weight analysis on each topic event to obtain an action key string; and determine the topic-event key string based on one or both of the entity key string and the action key string.
In the embodiments of this application, the information acquisition module 4551 is further configured to: acquire the number of entity key strings corresponding to the entity key string; when the number of entity key strings is less than a fifth set number, combine the entity key string and the action key string into the topic-event key string; and when the number of entity key strings is greater than or equal to the fifth set number, determine the entity key string as the topic-event key string.
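As a non-limiting illustration, the rule above can be sketched as follows. The parameter `min_entities` plays the role of the fifth set number, and the function name is hypothetical.

```python
def build_topic_event_key_string(entity_keys: list, action_keys: list, min_entities: int) -> list:
    """Fall back to action key strings only when too few entity key strings were found."""
    if len(entity_keys) < min_entities:
        # Too few entities: supplement them with the action key strings.
        return list(entity_keys) + list(action_keys)
    # Enough entities: the entity key strings alone form the topic-event key string.
    return list(entity_keys)

few = build_topic_event_key_string(["city"], ["flooded"], min_entities=2)
enough = build_topic_event_key_string(["city", "river"], ["flooded"], min_entities=2)
```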
In the embodiments of this application, the event integration apparatus 455 further includes an event presentation module 4556, configured to: present a search control; in response to a first search operation on the search control, present a simplified event context corresponding to the event context and a presentation control corresponding to the simplified event context, wherein the simplified event context is part of the event context and the presentation control is used to present the event context; in response to a presentation operation on the presentation control, present the event context, wherein each event in the presented event context includes an event title and an event time, and the event is any one of the event to be integrated and the at least one topic event; and in response to a viewing operation on the event title or the event time, present event detail information.
In the embodiments of this application, the event presentation module 4556 is further configured to: present the last to-be-presented information of a target event, wherein the target event is any one of the event to be integrated and the at least one topic event included in the event context; present, in a recommendation area corresponding to the last to-be-presented information, the remaining events in the event context that are associated with the target event, wherein a remaining event is any event in the event context other than the target event; and in response to a second search operation on a remaining event, present detailed information of that remaining event.
The embodiments of this application provide a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of an electronic device (the event integration device) reads the computer instructions from the computer-readable storage medium and executes them, so that the electronic device performs the event integration method described above in the embodiments of this application.
The embodiments of this application provide a computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to perform the event integration method provided in the embodiments of this application, for example, the event integration method shown in FIG. 3.
In some embodiments of this application, the computer-readable storage medium may be a memory such as an FRAM, a ROM, a PROM, an EPROM, an EEPROM, a flash memory, a magnetic surface memory, an optical disc, or a CD-ROM; it may also be any of various devices that include one of the foregoing memories or any combination thereof.
In some embodiments of this application, the executable instructions may take the form of a program, software, a software module, a script, or code, may be written in any form of programming language (including compiled or interpreted languages, and declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
As an example, the executable instructions may, but need not, correspond to a file in a file system. They may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a Hyper Text Markup Language (HTML) document; in a single file dedicated to the program in question; or in multiple coordinated files (for example, files that store one or more modules, subroutines, or code sections).
As an example, the executable instructions may be deployed to execute on one electronic device (in which case that electronic device is the event integration device), on multiple electronic devices located at one site (in which case those electronic devices together are the event integration device), or on multiple electronic devices distributed across multiple sites and interconnected through a communication network (in which case those distributed electronic devices together are the event integration device).
It can be understood that the embodiments of this application involve data related to events and the like. When the embodiments of this application are applied to a specific product or technology, user permission or consent must be obtained, and the collection, use and processing of the relevant data must comply with the applicable laws, regulations and standards of the relevant countries and regions.
In summary, in the embodiments of this application, the target topic to which the event to be integrated belongs is determined, among at least two topics to be integrated, by evaluating the target similarity between the event to be integrated and each topic to be integrated; that is, the target topic is determined by directly comparing the event to be integrated with each topic to be integrated in terms of the target similarity. Because the target similarity includes one or more of the semantic similarity, the string-graph similarity and the question-answer similarity, the obtained target similarity makes it possible to determine accurately whether each topic to be integrated is the target topic to which the event to be integrated belongs, which in turn improves the accuracy of event integration when the event to be integrated is integrated into the target topic. In addition, performing recall first and then acquiring the similarities improves the efficiency of event integration.
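As a non-limiting illustration, the determination of the target topic summarized above can be sketched as follows for the case where several similarities are fused by weights, as recited in claim 5. The weight values, the similarity names and the weighted-average fusion are assumptions made for this sketch only.

```python
def fuse(similarities: dict, weights: dict) -> float:
    """Weighted average of the available similarities (assumed fusion rule)."""
    total_weight = sum(weights[name] for name in similarities)
    return sum(similarities[name] * weights[name] for name in similarities) / total_weight

def pick_target_topic(topic_similarities: dict, weights: dict) -> str:
    """Return the topic whose fused (discriminant) similarity is highest."""
    return max(topic_similarities, key=lambda topic: fuse(topic_similarities[topic], weights))

# Hypothetical weights, e.g. derived from the accuracy of each similarity.
weights = {"semantic": 0.5, "string_graph": 0.3, "question_answer": 0.2}
topic_similarities = {
    "A": {"semantic": 0.9, "string_graph": 0.2, "question_answer": 0.4},
    "B": {"semantic": 0.6, "string_graph": 0.8, "question_answer": 0.7},
}
target_topic = pick_target_topic(topic_similarities, weights)
```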
The foregoing descriptions are merely embodiments of this application and are not intended to limit the protection scope of this application. Any modification, equivalent replacement or improvement made within the spirit and scope of this application shall fall within the protection scope of this application.

Claims (17)

  1. An event integration method, performed by an electronic device, the method comprising:
    acquiring an event to be integrated, and acquiring at least two topics to be integrated, wherein each topic to be integrated includes at least one topic event;
    determining a target similarity between the event to be integrated and each topic to be integrated based on one or more of a semantic similarity, a string-graph similarity and a question-answer similarity, wherein the semantic similarity is a similarity in terms of semantic features, the string-graph similarity is a similarity in terms of graph features corresponding to key strings, and the question-answer similarity is a similarity in terms of question-answer features;
    determining, based on the target similarity, a target topic to which the event to be integrated belongs from the at least two topics to be integrated; and
    integrating the event to be integrated into the target topic to obtain an event context, wherein the event context includes the event to be integrated and the at least one topic event.
  2. The method according to claim 1, wherein the one or more of the semantic similarity, the string-graph similarity and the question-answer similarity are selected through selection logic;
    the selection logic includes one or more of a selection order, an acquisition speed, an accuracy, a topic scale, a selection number, a topic type, a model training scale, a model application scope and a model application scale, wherein the selection order is determined based on the priorities of the similarities, the acquisition speed is the speed at which a similarity is acquired, the accuracy is the degree of accuracy of a similarity, the topic type is the content form of the topics to be integrated, the topic scale is the scale of the at least two topics to be integrated, and the model training scale is the scale of the training data corresponding to the network model used to acquire each similarity.
  3. The method according to claim 2, wherein, when the selection logic includes the selection order, the determining a target similarity between the event to be integrated and each topic to be integrated based on one or more of a semantic similarity, a string-graph similarity and a question-answer similarity comprises:
    selecting, based on the selection order, a first set number of similarities in sequence from a descending order of the priorities of the semantic similarity, the string-graph similarity and the question-answer similarity;
    acquiring a comparison result between the first set number of similarities and a similarity threshold;
    when the comparison result indicates that the event to be integrated and the topic to be integrated are similar, determining the first set number of similarities as the target similarity between the event to be integrated and the topic to be integrated; and
    when the comparison result indicates that the similarity between the event to be integrated and the topic to be integrated is undetermined, selecting from the remaining similarities based on the selection order until a selection end condition is met, and determining the selected similarities as the target similarity, wherein the remaining similarities are those of the semantic similarity, the string-graph similarity and the question-answer similarity other than the first set number of similarities, and the selection end condition is that the event to be integrated and the topic to be integrated are determined to be similar, or that the semantic similarity, the string-graph similarity and the question-answer similarity have all been selected.
  4. The method according to claim 2, wherein, when the selection logic includes the acquisition speed and the topic scale, the determining a target similarity between the event to be integrated and each topic to be integrated based on one or more of a semantic similarity, a string-graph similarity and a question-answer similarity comprises:
    when the topic scale is greater than a set scale, selecting a second set number of similarities in sequence from a descending order of the acquisition speeds of the semantic similarity, the string-graph similarity and the question-answer similarity, to obtain the target similarity between the event to be integrated and the topic to be integrated; and
    when the topic scale is less than or equal to the set scale, selecting a third set number of similarities in sequence from the descending order of the acquisition speeds of the semantic similarity, the string-graph similarity and the question-answer similarity, to obtain the target similarity between the event to be integrated and the topic to be integrated, wherein the second set number is less than the third set number.
  5. The method according to any one of claims 1 to 4, wherein, when the target similarity includes more than one of the semantic similarity, the string-graph similarity and the question-answer similarity, the determining, based on the target similarity, a target topic to which the event to be integrated belongs from the at least two topics to be integrated comprises:
    determining a weight ratio of the similarities in the target similarity based on accuracy;
    fusing the similarities in the target similarity based on the weight ratio to obtain a discriminant similarity; and
    selecting, from the at least two topics to be integrated, the topic to be integrated that corresponds to the highest discriminant similarity, to obtain the target topic to which the event to be integrated belongs.
  6. The method according to claim 1, wherein the semantic similarity includes at least one of a semantic self-attention similarity and a semantic statistical similarity, wherein the semantic self-attention similarity is determined based on self-attention between the event to be integrated and the topic events, the semantic statistical similarity is determined based on target semantics, and the target semantics are the semantics corresponding to at least one of a title, a key string and event content.
  7. The method according to any one of claims 1 to 4 and 6, wherein the semantic similarity includes a semantic self-attention similarity, and the semantic self-attention similarity is obtained through the following steps:
    acquiring a to-be-integrated semantic feature corresponding to the event to be integrated, and a topic-event semantic feature corresponding to each topic event in the topic to be integrated;
    enhancing the to-be-integrated semantic feature based on distinguishing identifiers of the event to be integrated and the topic event, to obtain a first enhanced semantic feature, and enhancing the topic-event semantic feature based on the distinguishing identifiers, to obtain a second enhanced semantic feature; and
    combining the first enhanced semantic feature and at least one second enhanced semantic feature corresponding to the topic to be integrated into a semantic feature sequence, and determining the semantic self-attention similarity based on self-attention information between two sequence units in the semantic feature sequence.
  8. The method according to any one of claims 1 to 4 and 6, wherein the semantic similarity includes a semantic statistical similarity, and the semantic statistical similarity is obtained through the following steps:
    in each topic to be integrated, acquiring a first sub-semantic similarity between the title of each topic event and the title of the event to be integrated, and determining an average first sub-semantic similarity and a maximum first sub-semantic similarity based on the first sub-semantic similarities;
    in each topic to be integrated, acquiring a second sub-semantic similarity between the topic-event key string corresponding to each topic event and the to-be-integrated event key string corresponding to the event to be integrated, and determining an average second sub-semantic similarity and a maximum second sub-semantic similarity based on the second sub-semantic similarities;
    acquiring a third sub-semantic similarity between the topic key string corresponding to each topic to be integrated and the to-be-integrated event key string; and
    determining the average first sub-semantic similarity, the maximum first sub-semantic similarity, the average second sub-semantic similarity, the maximum second sub-semantic similarity and the third sub-semantic similarity as the semantic statistical similarity.
  9. The method according to any one of claims 1 to 4 and 6, wherein the target similarity includes the string-graph similarity, and the string-graph similarity is obtained through the following steps:
    in each topic to be integrated, determining each sub-topic-event key string corresponding to the at least one topic event as a graph node, building an edge between two graph nodes that correspond to the same topic event, and determining the graph nodes and the edges as a first key string graph;
    constructing a second key string graph based on the to-be-integrated event key string corresponding to the event to be integrated; and
    comparing a vector representation of the first key string graph with a vector representation of the second key string graph to obtain a graph comparison result, and determining the string-graph similarity based on the graph comparison result.
  10. The method according to any one of claims 1 to 4 and 6, wherein the target similarity includes the question-answer similarity, and the question-answer similarity is obtained through the following steps:
    combining the title of each topic to be integrated, the topic key string and the event to be integrated into a to-be-answered sentence sequence;
    acquiring answer information of the to-be-answered sentence sequence; and
    determining the question-answer similarity based on the answer information.
  11. The method according to any one of claims 1 to 4 and 6, wherein the acquiring at least two topics to be integrated comprises:
    in a topic library, acquiring a matching result between the topic key string corresponding to each topic and the event to be integrated, wherein the topic library includes a plurality of topics;
    when it is determined, based on the matching result, that at least one sub-topic key string in the topic key string matches the event to be integrated, determining the topic corresponding to the matching result as a topic to be integrated that matches the event to be integrated; and
    acquiring, from the topic library, at least two topics to be integrated that match the event to be integrated.
  12. The method according to claim 11, wherein, before the acquiring, in a topic library, a matching result between the topic key string corresponding to each topic and the event to be integrated, the method further comprises:
    performing entity recognition on each topic event to obtain an entity key string corresponding to a preset entity type;
    performing string weight analysis on each topic event to obtain an action key string;
    determining a topic-event key string based on one or both of the entity key string and the action key string;
    counting the number of topic events corresponding to each sub-topic-event key string in the topic-event key strings; and
    selecting a fourth set number of sub-topic-event key strings in sequence from a descending order of the numbers of topic events corresponding to the topic-event key strings, to obtain the topic key string.
  13. The method according to any one of claims 1 to 4 and 6, wherein, after the integrating the event to be integrated into the target topic to obtain an event context, the method further comprises:
    presenting a search control;
    in response to a first search operation on the search control, presenting a simplified event context corresponding to the event context and a presentation control corresponding to the simplified event context, wherein the simplified event context is part of the event context and the presentation control is used to present the event context;
    in response to a presentation operation on the presentation control, presenting the event context, wherein each event in the presented event context includes an event title and an event time, and the event is any one of the event to be integrated and the at least one topic event; and
    in response to a viewing operation on the event title or the event time, presenting event detail information.
  14. An event integration apparatus, comprising:
    an information acquisition module, configured to acquire an event to be integrated and acquire at least two topics to be integrated, wherein each topic to be integrated includes at least one topic event;
    a similarity acquisition module, configured to determine a target similarity between the event to be integrated and each topic to be integrated based on one or more of a semantic similarity, a string-graph similarity and a question-answer similarity, wherein the semantic similarity is a similarity in terms of semantic features, the string-graph similarity is a similarity in terms of graph features corresponding to key strings, and the question-answer similarity is a similarity in terms of question-answer features;
    a topic determination module, configured to determine, based on the target similarity, a target topic to which the event to be integrated belongs from the at least two topics to be integrated; and
    an event integration module, configured to integrate the event to be integrated into the target topic to obtain an event context, wherein the event context includes the event to be integrated and the at least one topic event.
  15. An electronic device for event integration, comprising:
    a memory, configured to store executable instructions; and
    a processor, configured to implement the event integration method according to any one of claims 1 to 13 when executing the executable instructions stored in the memory.
  16. A computer-readable storage medium storing executable instructions which, when executed by a processor, implement the event integration method according to any one of claims 1 to 13.
  17. 一种计算机程序产品,包括计算机程序或指令,所述计算机程序或指令被处理器执行时,实现权利要求1至13任一项所述的事件整合方法。A computer program product, including computer programs or instructions, when the computer programs or instructions are executed by a processor, the event integration method described in any one of claims 1 to 13 is realized.
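
By way of a non-limiting illustration, the four modules recited in the apparatus claim might be sketched as follows. This is only a sketch under stated assumptions, not the claimed implementation: the names `Event`, `Topic`, `target_similarity`, `integrate_event` and `token_overlap` are hypothetical, the weighted mean used to fuse the similarities is an assumption, the token-overlap scorer is merely a stand-in for a real semantic similarity model, and ordering the event context by timestamp is likewise assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    text: str
    timestamp: float  # hypothetical field, used only to order the event context


@dataclass
class Topic:
    name: str
    events: List[Event] = field(default_factory=list)


# A scorer rates an event against a topic; a higher value means more similar.
Similarity = Callable[[Event, Topic], float]


def token_overlap(event: Event, topic: Topic) -> float:
    """Stand-in scorer: best Jaccard word overlap with any topic event."""
    words = set(event.text.lower().split())
    best = 0.0
    for other in topic.events:
        other_words = set(other.text.lower().split())
        union = words | other_words
        if union:
            best = max(best, len(words & other_words) / len(union))
    return best


def target_similarity(event: Event, topic: Topic,
                      scorers: Dict[str, Similarity],
                      weights: Dict[str, float]) -> float:
    """Fuse one or more similarities; the weighted mean is an assumption."""
    total = sum(weights[name] for name in scorers)
    fused = sum(weights[name] * fn(event, topic) for name, fn in scorers.items())
    return fused / total


def integrate_event(event: Event, topics: List[Topic],
                    scorers: Dict[str, Similarity],
                    weights: Dict[str, float]) -> Topic:
    """Pick the topic with the highest target similarity and add the event to it."""
    if len(topics) < 2:
        raise ValueError("at least two topics to be integrated are required")
    scores = [target_similarity(event, t, scorers, weights) for t in topics]
    target = topics[scores.index(max(scores))]
    target.events.append(event)
    # The event context is kept in chronological order (assumed ordering).
    target.events.sort(key=lambda e: e.timestamp)
    return target
```

In such a sketch, `scorers` could hold up to three entries (for example under the keys "semantic", "string_graph" and "question_answer"), each backed by its own model, and the returned topic's `events` list would play the role of the event context.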
PCT/CN2022/111164 2021-09-18 2022-08-09 Event integration method and apparatus, and electronic device, computer-readable storage medium and computer program product WO2023040516A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111111428.X 2021-09-18
CN202111111428.XA CN115840796A (en) 2021-09-18 2021-09-18 Event integration method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2023040516A1 (en) 2023-03-23

Family

ID=85574458

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/111164 WO2023040516A1 (en) 2021-09-18 2022-08-09 Event integration method and apparatus, and electronic device, computer-readable storage medium and computer program product

Country Status (2)

Country Link
CN (1) CN115840796A (en)
WO (1) WO2023040516A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116361468A (en) * 2023-04-03 2023-06-30 北京中科闻歌科技股份有限公司 Event context generation method, electronic equipment and storage medium
CN117056459A (en) * 2023-08-07 2023-11-14 北京网聘信息技术有限公司 Vector recall method and device
CN116361468B (en) * 2023-04-03 2024-05-03 北京中科闻歌科技股份有限公司 Event context generation method, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549647A (en) * 2018-01-17 2018-09-18 中移在线服务有限公司 The method without accident in mark language material active predicting movement customer service field is realized based on SinglePass algorithms
CN110795607A (en) * 2019-10-29 2020-02-14 中国人民解放军32181部队 Equipment guarantee data matching method and system based on multi-stage similarity calculation
CN111382276A (en) * 2018-12-29 2020-07-07 中国科学院信息工程研究所 Event development venation map generation method
CN111444337A (en) * 2020-02-27 2020-07-24 桂林电子科技大学 Topic tracking method based on improved KL divergence

Also Published As

Publication number Publication date
CN115840796A (en) 2023-03-24

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22868893

Country of ref document: EP

Kind code of ref document: A1