CN115840813A - Extended event display method, device, equipment and computer readable storage medium - Google Patents


Info

Publication number
CN115840813A
CN115840813A CN202111101755.7A CN202111101755A CN115840813A CN 115840813 A CN115840813 A CN 115840813A CN 202111101755 A CN202111101755 A CN 202111101755A CN 115840813 A CN115840813 A CN 115840813A
Authority
CN
China
Prior art keywords
event
candidate
core
dimension
events
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111101755.7A
Other languages
Chinese (zh)
Inventor
房育勋
朱斌
宋泓臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111101755.7A
Publication of CN115840813A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an extended event display method, apparatus, device, and computer-readable storage medium, applicable to scenarios such as cloud technology, artificial intelligence, intelligent transportation, and in-vehicle systems. The method comprises: obtaining events to be processed by extracting event information of hot events, and determining a core event from the events to be processed; extracting keywords from the core event to obtain keywords corresponding to the core event; searching and matching among the events to be processed by using the keywords to obtain candidate events; screening the candidate events in at least one dimension to obtain context events; and performing extended event display according to the context events and the core event. The method and apparatus can improve the efficiency and accuracy of extended event display.

Description

Extended event display method, device, equipment and computer readable storage medium
Technical Field
The present application relates to the field of terminal technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for displaying an extended event.
Background
At present, for news events of long duration, the important stages of an event's development can be extracted by machine mining to serve as context information for the event (that is, its storyline), and extended events are displayed or recommended through this context information, so that a user can intuitively follow how the event developed.
In the related art, event context information is usually generated by first generating events from articles and then linking similar events into contexts, using either a model based on a similarity measure or large-scale clustering. Large-scale clustering is inefficient and can only process a small number of events within a period of time. When the number of events exceeds the order of ten thousand, the accuracy of the related art drops sharply, and it cannot meet the requirement of routinely generating long-time-span contexts at large scale, which reduces the accuracy of extended event display.
Disclosure of Invention
The embodiment of the application provides an extended event display method, an extended event display device, extended event display equipment and a computer readable storage medium, and efficiency and accuracy of extended event display can be improved.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides an extended event display method, which comprises the following steps:
obtaining events to be processed by extracting event information of hot events, and determining a core event from the events to be processed;
extracting keywords from the core event to obtain keywords corresponding to the core event;
searching and matching among the events to be processed by using the keywords to obtain candidate events;
screening the candidate events in at least one dimension to obtain context events;
and performing extended event display according to the context event and the core event.
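The steps above can be sketched end to end as follows. This is a minimal illustration, not the claimed implementation: the `Event` fields, the substring-based keyword matching, and the single `score`/`threshold` pair standing in for multi-dimension screening are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Event:
    title: str
    heat: float
    timestamp: int  # hypothetical: event occurrence time in seconds


def recall_candidates(core: Event, pending: Sequence[Event],
                      keywords: Sequence[str]) -> List[Event]:
    """Keep pending events whose title contains at least one keyword."""
    return [e for e in pending
            if e is not core and any(k in e.title for k in keywords)]


def build_context(core: Event, pending: Sequence[Event],
                  keywords: Sequence[str],
                  score: Callable[[Event, Event], float],
                  threshold: float) -> List[Event]:
    """Recall by keyword, screen by score, then order chronologically."""
    candidates = recall_candidates(core, pending, keywords)
    context_events = [e for e in candidates if score(core, e) >= threshold]
    return sorted(context_events + [core], key=lambda e: e.timestamp)
```

Any scoring callable can be plugged in for `score`; the sketch only fixes the order of the stages (recall, screen, display in time order).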
An embodiment of the present application provides an extended event display device, including:
the extraction module is used for obtaining the events to be processed by extracting the event information of the hot events and determining the core events from the events to be processed; extracting keywords from the core event to obtain keywords corresponding to the core event;
the matching module is used for searching and matching based on the event to be processed by utilizing the keyword to obtain a candidate event;
the screening module is used for screening the candidate events in at least one dimension to obtain context events;
and the display module is used for performing extended event display according to the context event and the core event.
In the above apparatus, the extracting module is further configured to perform named entity identification on a core event title corresponding to the core event to obtain at least one named entity; and in the at least one named entity, a named entity of a preset noun type is used as the keyword.
In the above apparatus, the extended event display apparatus further comprises a term-weight analysis module; the term-weight analysis module is used for performing term-weight analysis on the core event title to obtain at least one weighted word when the number of keywords is smaller than a preset number threshold; and determining, among the at least one weighted word, at least one weighted word of verb type as a keyword.
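The keyword selection described in the two paragraphs above, with its fallback to weighted vocabulary, might look like the following sketch. It assumes that named entity recognition and weight analysis have already been run by some upstream component; the tuple layouts, the entity type names, and the default `min_count` are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

Entity = Tuple[str, str]                # (text, entity type)
WeightedTerm = Tuple[str, str, float]   # (text, part of speech, weight)


def select_keywords(
    entities: Sequence[Entity],
    weighted_terms: Sequence[WeightedTerm],
    noun_types: FrozenSet[str] = frozenset({"PERSON", "LOCATION", "ORGANIZATION"}),
    min_count: int = 2,
) -> List[str]:
    """Use entities of preset noun types; fall back to weighted verbs if too few."""
    keywords = [text for text, etype in entities if etype in noun_types]
    if len(keywords) < min_count:
        verbs = sorted(
            (t for t in weighted_terms if t[1] == "VERB"),
            key=lambda t: t[2],
            reverse=True,  # highest-weight verbs first
        )
        keywords += [text for text, _, _ in verbs if text not in keywords]
    return keywords
```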
In the above apparatus, the at least one dimension includes at least one of: a semantic dimension, a keyword dimension, an event heat dimension, and an event occurrence time dimension; the screening module is further configured to score the candidate event in the at least one dimension to obtain at least one dimension score corresponding to the candidate event; obtain a composite score based on the at least one dimension score; and screen the candidate events based on the composite score to obtain the context events.
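The text does not say how the dimension scores are combined, so the sketch below assumes a weighted average over whichever dimensions were scored, followed by a threshold. The dimension names, weights, and threshold are illustrative assumptions.

```python
from typing import Dict, List, Mapping


def composite_score(scores: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted average over the dimensions that were actually scored."""
    total_weight = sum(weights[d] for d in scores)
    if total_weight == 0:
        return 0.0
    return sum(scores[d] * weights[d] for d in scores) / total_weight


def screen(candidates: Mapping[str, Mapping[str, float]],
           weights: Mapping[str, float],
           threshold: float) -> List[str]:
    """Return the titles whose composite score reaches the threshold."""
    return [title for title, scores in candidates.items()
            if composite_score(scores, weights) >= threshold]
```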
In the above apparatus, the semantic dimension includes: a semantic similarity dimension; the at least one dimension score includes: a semantic similarity score; the screening module is further configured to perform semantic analysis on the core event title of the core event and the candidate event title of the candidate event by using a preset semantic analysis model, to obtain a core semantic feature corresponding to the core event and a candidate semantic feature corresponding to the candidate event; obtain the semantic similarity between the core event and the candidate event by calculating the feature vector distance between the core semantic feature and the candidate semantic feature; and score the candidate event based on the semantic similarity to obtain the semantic similarity score.
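As an illustration of turning a feature vector distance into a score, the sketch below uses cosine similarity. The choice of cosine, and the mapping to the range [0, 1], are assumptions; the text only requires some feature vector distance. The input vectors stand in for the output of the preset semantic analysis model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 for a zero vector)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_similarity_score(core_vec: Sequence[float],
                              cand_vec: Sequence[float]) -> float:
    """Map cosine similarity from [-1, 1] to a score in [0, 1]."""
    return (cosine_similarity(core_vec, cand_vec) + 1.0) / 2.0
```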
In the above apparatus, the semantic dimension includes: a semantic clustering dimension; the at least one dimension score includes: a semantic clustering score; the screening module is further configured to perform density clustering on the core event and the candidate events based on the semantic similarity between the core event and the candidate events, to obtain at least one event cluster; and score the candidate events according to the event cluster in which the core event is located, to obtain the semantic clustering score of the candidate events.
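A compact density clustering sketch follows, using a plain DBSCAN procedure over feature vectors with Euclidean distance. The distance function, the `eps` and `min_pts` parameters, and the binary scoring rule (1.0 for candidates sharing the core event's cluster, 0.0 otherwise) are assumptions for illustration.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id, or NOISE if it joins no cluster."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbors(i: int) -> List[int]:
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(j_nbrs)
    return [lab if lab is not None else NOISE for lab in labels]


def cluster_scores(labels: Sequence[int], core_index: int) -> List[float]:
    """Score every non-core event by whether it shares the core event's cluster."""
    core_label = labels[core_index]
    return [1.0 if core_label != NOISE and lab == core_label else 0.0
            for i, lab in enumerate(labels) if i != core_index]
```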
In the above apparatus, the at least one dimension score includes: a keyword score; the screening module is further configured to calculate the word frequency of the keywords over a preset corpus and calculate the inverse document frequency corresponding to each keyword based on the word frequency; take the keywords contained in the candidate event title of a candidate event as internal keywords, and score the candidate event based on the inverse document frequency of the internal keywords to obtain the keyword score; the keyword score is inversely proportional to the inverse document frequency.
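The text does not give its exact formula, so the sketch below uses the textbook smoothed inverse document frequency, log(N / (1 + df)), computed over a small in-memory corpus, together with a helper that picks out the internal keywords of a candidate title. Under this textbook definition, rarer words receive larger values; how those values map to the keyword score is left to the scoring step described above. Substring matching is an assumption made for brevity.

```python
import math
from typing import List, Sequence


def inverse_document_frequency(keyword: str, corpus: Sequence[str]) -> float:
    """Textbook smoothed IDF: log(N / (1 + number of documents containing keyword))."""
    doc_freq = sum(1 for doc in corpus if keyword in doc)
    return math.log(len(corpus) / (1 + doc_freq))


def internal_keywords(candidate_title: str, keywords: Sequence[str]) -> List[str]:
    """Keywords of the core event that also appear in the candidate title."""
    return [k for k in keywords if k in candidate_title]
```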
In the above apparatus, the screening module is further configured to adjust the inverse document frequency of the internal keywords by a preset coefficient to obtain weight-adjusted inverse document frequencies; and to perform minimum-value summation and averaging on the weight-adjusted inverse document frequencies by using a preset frequency adjustment factor, to obtain the keyword score.
In the above apparatus, the event information includes: event heat; the screening module is further used for determining the corresponding target heat interval in at least one preset heat interval according to the event heat of the candidate event; and obtaining a heat score corresponding to the candidate event according to a preset heat coefficient corresponding to the target heat interval.
In the above apparatus, the event information includes: the time of occurrence of the event; the at least one dimension score includes: a time dimension score; the screening module is further used for calculating the time difference of the event occurrence time of the candidate event and the context event; and determining a corresponding target time difference interval in at least one preset time difference interval according to the time difference, and taking a preset score corresponding to the target time difference interval as a time score corresponding to the candidate event.
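Both the heat score and the time score above are interval lookups: a value is placed into one of several preset intervals and the interval's preset coefficient or score is returned. A minimal sketch, in which all boundaries, coefficients, and the use of seconds for timestamps are made-up example values:

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical presets: N boundaries define N + 1 intervals.
HEAT_BOUNDARIES = [100.0, 1_000.0, 10_000.0]
HEAT_COEFFICIENTS = [0.2, 0.5, 0.8, 1.0]
DAY_GAP_BOUNDARIES = [1.0, 7.0, 30.0]
DAY_GAP_SCORES = [1.0, 0.8, 0.5, 0.2]


def interval_score(value: float, boundaries: Sequence[float],
                   scores: Sequence[float]) -> float:
    """Return the preset score of the interval that contains value."""
    if len(scores) != len(boundaries) + 1:
        raise ValueError("need exactly one score per interval")
    return scores[bisect_right(boundaries, value)]


def heat_score(heat: float) -> float:
    """Look up the preset coefficient for the event's heat interval."""
    return interval_score(heat, HEAT_BOUNDARIES, HEAT_COEFFICIENTS)


def time_score(candidate_ts: int, reference_ts: int) -> float:
    """Score by the gap, in days, between two occurrence times given in seconds."""
    gap_days = abs(candidate_ts - reference_ts) / 86_400
    return interval_score(gap_days, DAY_GAP_BOUNDARIES, DAY_GAP_SCORES)
```

With `bisect_right`, a value equal to a boundary falls into the interval above it; whether boundaries are inclusive on the lower or upper side is a design choice the text leaves open.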
In the device, the extraction module is further configured to obtain, from background search data corresponding to a preset application, at least one search event with an event heat higher than a preset heat threshold as a hot event; and extracting an event title of each search event in the at least one search event and at least one of the event heat and the event occurrence time of each search event, and taking the event title and at least one of the event heat and the event occurrence time as the event information.
In the above apparatus, the display module is further configured to sort the context events chronologically, generate an event context from the sorted result in combination with the core event, and display the extended events through the event context.
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the extended event display method provided by the embodiment of the application when the executable instructions stored in the memory are executed.
The embodiment of the application provides a computer-readable storage medium, which stores executable instructions and is used for causing a processor to execute the method for displaying the extended event, which is provided by the embodiment of the application.
The embodiment of the present application provides a computer program product, which includes a computer program or instructions, and when the computer program or instructions are executed by a processor, the extended event presentation method provided by the embodiment of the present application is implemented.
The embodiment of the application has the following beneficial effects:
according to the method and apparatus, candidate events related to the core event are recalled from the events to be processed according to the keywords of the core event, and the candidate events are then screened in at least one dimension, which denoises them and yields the context events; the core event and the context events are combined for extended event display. This reduces the amount of computation compared with generating event contexts directly through large-scale clustering and improves the efficiency of context event screening, and thus the efficiency of extended event display. Moreover, keyword matching combined with screening and denoising in at least one dimension greatly improves the accuracy of the screened context events, enables accurate event context construction at large event volumes, and improves the accuracy of extended event display.
Drawings
FIG. 1 is an alternative structural diagram of an extended event presentation system architecture provided by an embodiment of the present application;
FIG. 2 is an alternative structural diagram of an extended event display apparatus provided in the embodiment of the present application;
FIG. 3 is an alternative flow chart of an extended event presentation method according to an embodiment of the present application;
FIG. 4 is a diagram illustrating an alternative effect of event context provided by an embodiment of the present application;
FIG. 5 is an alternative flowchart of an extended event presentation method according to an embodiment of the present disclosure;
FIG. 6 is an alternative flow chart of a predetermined semantic analysis model training method according to an embodiment of the present disclosure;
FIG. 7 is an alternative flowchart of an extended event presentation method according to an embodiment of the present application;
FIG. 8 is an alternative flow chart of an extended event presentation method according to an embodiment of the present disclosure;
FIG. 9 is an alternative flowchart of an extended event presentation method according to an embodiment of the present application;
FIG. 10 is an alternative flowchart of an extended event presentation method according to an embodiment of the present application;
FIG. 11 is a schematic flow chart of an alternative context event screening method according to an embodiment of the present disclosure;
FIG. 12 is an optional schematic diagram of a DBSCAN clustering process provided in the embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first/second/third" merely distinguish similar objects and do not denote a particular order; where permitted, the specific order or sequence denoted by "first/second/third" may be interchanged, so that the embodiments of the application described herein can be practiced in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
1) Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. That is, artificial intelligence is an integrated technique in computer science that is used to capture the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. In addition, artificial intelligence is also used for researching the design principle and implementation method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. In addition, the artificial intelligence technology is a comprehensive subject, and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operating/interactive systems, and mechatronics. The artificial intelligence software technology mainly includes several directions, such as computer vision technology, speech processing technology, natural language processing technology, and Machine Learning (ML)/deep Learning.
2) Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence; it studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is thus a science integrating linguistics, computer science, and mathematics. Because research in this field deals with natural language, that is, the language people use every day, it is closely tied to the study of linguistics. Natural language processing techniques typically include Machine Reading Comprehension (MRC), text processing, semantic understanding, machine translation, question answering, and knowledge graphs.
3) Machine learning is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines; it studies how a computer can simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence; its applications extend across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
4) Named Entity Recognition (NER), also known as Entity Recognition, entity chunking and Entity extraction, is used to locate and classify Named entities in text into predefined categories, such as people, organizations, locations, time expressions, quantities, monetary values, percentages, etc.; generally, the task of named entity recognition is to identify named entities of three major classes (entity class, time class, and numeric class) and seven minor classes (person name, organization name, place name, time, date, currency, and percentage) in the text to be processed. In the embodiment of the application, entities of preset entity types, such as entities of a person name and a place name, are acquired through named entity identification.
5) Density-Based Spatial Clustering of Applications with Noise (DBSCAN): a representative density-based clustering algorithm.
6) Inverse Document Frequency (IDF): the higher the inverse document frequency, the smaller the information gain of the corresponding word and the smaller its contribution to text classification.
7) Bidirectional Encoder Representations from Transformers (BERT): a pre-training technique for Natural Language Processing (NLP).
At present, in the related art, an event context is generally generated as follows: articles are aggregated by time window, article clusters are produced through denoising and clustering, and a representative article is finally selected to generate an event. Then, when a context is generated from events, either a model based on a similarity measure is used to judge whether two events belong to the same context, or all events are directly clustered into several clusters by large-scale clustering, with each cluster treated as a context. The related-art method therefore has to generate events from articles first and then link them into a context; both stages are difficult and inefficient. In addition, generating a context directly through a similarity measure or clustering can only handle a small number of events within a period of time. When the event library exceeds the order of ten thousand events, the accuracy of the related art drops sharply, and it cannot meet the requirement of routinely generating long-time-span contexts at large scale. The related art thus obtains context events with low efficiency and accuracy, which reduces the efficiency and accuracy of extended event display based on those context events.
The embodiment of the application provides an extended event display method, an extended event display device, extended event display equipment and a computer readable storage medium, and can improve the efficiency and accuracy of extended event display. The following describes an exemplary application of the electronic device provided in the embodiment of the present application, and the electronic device provided in the embodiment of the present application may be implemented as various types of user terminals such as a smart phone, a smart watch, a notebook computer, a tablet computer, a desktop computer, a set-top box, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable game device), an intelligent voice interaction device, an intelligent appliance, and a vehicle-mounted terminal, and may also be implemented as a server. In the following, an exemplary application will be explained when the electronic device is implemented as a server.
Referring to fig. 1, fig. 1 is an alternative architecture diagram of an event context system 100 provided in the embodiment of the present application, in which a terminal 400 is connected to a server 200 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of both.
The server 200 is configured to obtain events to be processed by extracting event information of hot events, and determine a core event from the events to be processed; extract keywords from the core event to obtain keywords corresponding to the core event; match recall events from the events to be processed based on the keywords; screen the recall events in at least one dimension to obtain candidate events; and perform extended event display according to the candidate events and the core event. Illustratively, an event context is generated from the candidate events and the core event for display, or related entries are recommended and displayed at a preset page position corresponding to the core event. Here, the server 200 may push the event context or the related entries to the terminal 400 as extended events, or may store the extended events in a storage space corresponding to the server 200 for the terminal 400 to access.
The terminal 400 is configured to access the server 200 through a web client or an application and acquire an extended event; or, receiving the extended event pushed by the server 200, and further displaying the extended event on the graphical interface 410 of the web client or the application.
In some embodiments, the server 200 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server that provides basic cloud computing services such as cloud services, a cloud database, cloud computing, cloud functions, cloud storage, a network service, cloud communication, middleware services, domain name services, security services, a CDN, and a big data and artificial intelligence platform. The terminal 400 may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a server 200 according to an embodiment of the present application, where the server 200 shown in fig. 2 includes: at least one processor 210, memory 250, at least one network interface 220, and a user interface 230. The various components in server 200 are coupled together by a bus system 240. It is understood that the bus system 240 is used to enable connected communication between these components. The bus system 240 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 240 in fig. 2.
The Processor 210 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The user interface 230 includes one or more output devices 231, including one or more speakers and/or one or more visual display screens, that enable the presentation of media content. The user interface 230 also includes one or more input devices 232, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 250 optionally includes one or more storage devices physically located remotely from processor 210.
The memory 250 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 250 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 250 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 251 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;
a network communication module 252 for communicating with other computing devices via one or more (wired or wireless) network interfaces 220, exemplary network interfaces 220 including: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), etc.;
a presentation module 253 to enable presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 231 (e.g., a display screen, speakers, etc.) associated with the user interface 230;
an input processing module 254 for detecting one or more user inputs or interactions from one of the one or more input devices 232 and translating the detected inputs or interactions.
In some embodiments, the apparatus provided in the embodiments of the present application may be implemented in software, and fig. 2 illustrates an extended event presentation apparatus 255 stored in the memory 250, which may be software in the form of programs and plug-ins, and includes the following software modules: extraction module 2551, matching module 2552, screening module 2553 and presentation module 2554, which are logical and therefore can be arbitrarily combined or further split depending on the functionality implemented.
The functions of the respective modules will be explained below.
In other embodiments, the apparatus provided in the embodiments of the present Application may be implemented in hardware, and for example, the apparatus provided in the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to execute the extended event presentation method provided in the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, programmable Logic Devices (PLDs), complex Programmable Logic Devices (CPLDs), field Programmable Gate Arrays (FPGAs), or other electronic components.
In some embodiments, the terminal or the server may implement the extended event presentation method provided by the embodiments of the present application by running a computer program. For example, the computer program may be a native program or a software module in an operating system; it may be a native application (APP), that is, a program that must be installed in an operating system to run, such as a social APP or a message-sharing APP; it may be an applet, that is, a program that only needs to be downloaded into a browser environment to run; or it may be an applet or web client that can be embedded in any APP. In general, the computer program may be any form of application, module, or plug-in.
The extended event presentation method provided by the embodiment of the present application will be described in conjunction with exemplary applications and implementations of the server provided by the embodiment of the present application. The extended event presentation method provided by the embodiment of the application can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation, and in-vehicle systems.
Referring to fig. 3, fig. 3 is an alternative flowchart of an extended event presentation method provided in the embodiment of the present application, and will be described with reference to the steps shown in fig. 3.
S101, obtaining to-be-processed events by extracting event information from hotspot events, and determining a core event from the to-be-processed events.
In the embodiment of the application, the electronic device may obtain the hotspot event from a background server of a preset application, where the hotspot event may include at least one event with an event heat higher than a preset heat threshold. The electronic equipment extracts event information from each event of the hot events and generates a to-be-processed event corresponding to each event according to the event information.
In some embodiments, the electronic device may obtain, from background search data corresponding to a preset application, at least one search event whose event heat is higher than a preset heat threshold as a hotspot event. For example, the electronic device may obtain, from a background server of a news application or an instant message sharing social application, the top M topic events ranked by search volume and/or discussion activity as hotspot events, according to background search data of the application such as list data of search ranking information of topic events, where M is a positive integer greater than or equal to 1. The electronic device extracts, from the at least one search event, the information required to generate an event context as the event information.
Here, the event heat may be determined according to at least one indicator of the attention paid to the search event, such as its search volume, click volume, reading volume, mention volume, and interaction volume (for example, the number of comments, forwards, and likes).
In some embodiments, because a hotspot event generally continues over time, and the same hotspot event may correspond to different event titles in different time periods, the electronic device may obtain hotspot events not only from current background search data, such as the latest hot search list data for the current day or hour, but also from data within a preset time range, such as the hot search list data of the last half year, to ensure that the obtained hotspot events contain event information across different time spans.
In some embodiments, the electronic device may extract, from each of the at least one search event, the event title and the content identifier, together with at least one of the event heat and the event occurrence time, as the event information.
In some embodiments, the event heat may include the historical heat, search volume, reading volume, and the like of the event; the content identifier may include an article ID corresponding to the event, such as the article ID of the corresponding content document in the background data. The electronic device may construct the to-be-processed events in the form of an event library according to the event title, content identifier, event heat, and event occurrence time extracted from each hotspot event, as shown in Table 1:
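Table 1: to-be-processed event library (event title, content identifier, event heat, event occurrence time); the table image is not reproduced in the text.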
In table 1, there may be one article ID or a plurality of article IDs corresponding to each event title, and the embodiment of the present application is not limited.
Because the background search data corresponding to the preset application is used directly as the source of hotspot events, the to-be-processed events can be generated directly from existing list data. Compared with the related art, in which events are generated from articles, this saves event processing steps and improves the efficiency of generating context events.
In some embodiments, the electronic device may receive a selection instruction for a target event among the to-be-processed events and take the target event as the core event. The electronic device may also determine the core event from the to-be-processed events according to their event information, such as the heat of each event. The specific choice is made according to actual conditions and is not limited in the embodiments of the present application.
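As a minimal sketch of S101, the following Python fragment models one entry of the event library and filters search events by heat. The class name `PendingEvent`, the helper names, the sample data, and the rule of picking the hottest event as the core event are illustrative assumptions; the application only states that the core event may be chosen by a selection instruction or from event information such as heat.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class PendingEvent:
    """One entry of the to-be-processed event library (fields follow Table 1)."""
    title: str
    article_ids: List[str]   # one or more content identifiers
    heat: float              # event heat, e.g. historical heat
    occurred_at: datetime    # event occurrence time


def build_pending_events(search_events, heat_threshold):
    """Keep only search events whose heat exceeds the preset heat threshold."""
    return [e for e in search_events if e.heat > heat_threshold]


def pick_core_event(pending_events):
    """Assumed rule: take the hottest to-be-processed event as the core event."""
    return max(pending_events, key=lambda e: e.heat)


# Hypothetical sample data.
raw_events = [
    PendingEvent("City A marathon postponed", ["a1"], 120.0, datetime(2021, 9, 1)),
    PendingEvent("City A marathon new date announced", ["a2", "a3"], 260.0, datetime(2021, 9, 10)),
    PendingEvent("Local bakery opens", ["b1"], 3.0, datetime(2021, 9, 2)),
]
pending = build_pending_events(raw_events, heat_threshold=50.0)
core = pick_core_event(pending)
```

In this sketch the low-heat event is dropped before any further processing, which mirrors the idea of building the library directly from existing list data.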
S102, extracting keywords from the core event to obtain keywords corresponding to the core event.
In the embodiment of the application, the electronic device may use a core word algorithm to extract keywords from the event information of the core event, obtaining at least one word as the keywords corresponding to the core event. The electronic device may determine the entities or higher-weight words in the event information of the core event as keywords, and use the keywords to recall, from the to-be-processed events, candidate events that are highly relevant to the core event.
In some embodiments, the event information includes an event title; the process of extracting the keywords from the core event by the electronic device to obtain the keywords corresponding to the core event can be implemented through S1021-S1022, which will be described with reference to each step.
S1021, conducting named entity identification on the core event titles of the core events to obtain at least one named entity.
In the embodiment of the application, the electronic device obtains a core event title from event information of a core event, and performs named entity identification on the core event title to obtain at least one named entity.
In some embodiments, the electronic device may perform named entity recognition on the core event title through a character-level Convolutional Neural Network (CWCNN) to obtain the named entities; named entity recognition may also be performed through a Long Short-Term Memory network (LSTM) combined with a Conditional Random Field (CRF) model, or through other natural language processing models. The specific model is selected according to actual conditions and is not limited in the embodiments of the present application.
S1022, in the at least one named entity, a named entity of a preset noun type is used as a keyword.
In the embodiment of the application, the electronic device can take a named entity with a preset noun type in at least one named entity as a keyword; illustratively, the name of a person and the name of a place in at least one named entity are used as keywords.
In some embodiments, the electronic device may also, according to the requirements of the actual project, use named entities of other preset noun types among the at least one named entity, such as brand names and names of major events, as keywords. The specific choice is made according to actual conditions and is not limited in the embodiments of the present application.
In some embodiments, in the case that the number of keywords obtained through the named entity extraction process of S1021-S1022 is smaller than a preset number threshold, the electronic device may further extract keywords from the core event title through word weight analysis, which may be implemented through S1023-S1024, as follows.
S1023, performing word weight analysis on the core event title to obtain at least one weighted word.
In the embodiment of the application, the electronic device may perform word weight analysis on the words included in the core event title to obtain at least one weighted word.
In some embodiments, the electronic device may perform word weight analysis on the core event title through an eXtreme Gradient Boosting (xgboost) classification model, and take the words whose weight is higher than a preset weight threshold as the at least one weighted word. Other ensemble learning models may also be used for word weight analysis; the specific choice is made according to actual conditions and is not limited in the embodiments of the present application.
S1024, among the at least one weighted word, determining the weighted words of verb type as keywords.
In the embodiment of the application, the electronic device adds the verb-type words among the at least one weighted word to the keywords.
In some embodiments, the preset number threshold may be three. In the case that the number of keywords obtained through named entity recognition is greater than or equal to three, the electronic device may skip the word weight analysis process of S1023-S1024; in the case that the number of keywords is less than three, the electronic device may add all verb-type weighted words obtained through word weight analysis to the keywords. In some embodiments, the electronic device may also obtain the keywords by combining named entity extraction with word weight analysis, directly through S1021-S1024, regardless of the number of keywords. The specific choice is made according to actual conditions and is not limited in the embodiments of the present application.
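The selection logic of S1021-S1024 can be sketched as follows. The sketch assumes that named entity recognition and word weight analysis have already been run by some model and that their outputs are passed in as plain tuples; the type labels (`"PERSON"`, `"LOCATION"`, `"VERB"`), the function name, and the weight threshold value of 0.5 are hypothetical and not taken from the application.

```python
from typing import List, Tuple

PRESET_ENTITY_TYPES = {"PERSON", "LOCATION"}  # person names and place names
MIN_KEYWORDS = 3         # preset number threshold from the example above
WEIGHT_THRESHOLD = 0.5   # hypothetical preset weight threshold


def extract_keywords(
    entities: List[Tuple[str, str]],
    weighted_words: List[Tuple[str, str, float]],
) -> List[str]:
    """entities: (text, entity_type) pairs from any NER model.
    weighted_words: (word, part_of_speech, weight) triples from any word weight model.
    """
    # S1021-S1022: keep named entities of the preset noun types.
    keywords = [text for text, etype in entities if etype in PRESET_ENTITY_TYPES]
    # S1023-S1024: when too few keywords were found, add high-weight verbs.
    if len(keywords) < MIN_KEYWORDS:
        for word, pos, weight in weighted_words:
            if pos == "VERB" and weight > WEIGHT_THRESHOLD and word not in keywords:
                keywords.append(word)
    return keywords


# Hypothetical model outputs for the title "Zhang San wins Shenzhen marathon".
entities = [("Zhang San", "PERSON"), ("Shenzhen", "LOCATION"), ("Tuesday", "DATE")]
weighted = [("wins", "VERB", 0.9), ("the", "DET", 0.1), ("attends", "VERB", 0.2)]
keywords = extract_keywords(entities, weighted)
```

Only two entities match the preset types here, so the verb with a weight above the threshold is added as a third keyword, while the low-weight verb is left out.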
S103, performing search matching against the to-be-processed events by using the keywords to obtain candidate events.
In the embodiment of the application, the electronic device may use the keywords to perform matching among the to-be-processed events, and take at least one to-be-processed event containing a keyword as a candidate event.
In some embodiments, the electronic device may also perform a web search using the keywords, extract event information from the retrieved articles or web content matching the keywords, and obtain candidate events based on the extracted event information. Illustratively, the title of a retrieved article or web content item is extracted as the event title; its link is extracted as the content identifier; its reading volume and/or interaction volume is extracted as the event heat; its publication time is extracted as the event occurrence time; and a candidate event is generated from the event title, content identifier, event heat, and event occurrence time. In this way, candidate events can be recalled and generated over a wider range of the network by using the keywords.
In some embodiments, the electronic device may also perform semantic expansion based on the keywords to obtain synonyms or near-synonyms corresponding to the keywords, and expand the keywords with these words so as to match a greater number of candidate events more accurately.
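A minimal sketch of the keyword recall in S103 is given below. It assumes plain substring matching on event titles and assumes that the core event itself is excluded from the candidates; both choices, as well as the function name and sample titles, are illustrative.

```python
def recall_candidates(keywords, pending_titles, core_title):
    """Return titles, other than the core title, that contain at least one keyword.

    Substring matching and the exclusion of the core title are assumptions
    made for this sketch.
    """
    return [
        title
        for title in pending_titles
        if title != core_title and any(kw in title for kw in keywords)
    ]


# Hypothetical event library titles.
pending_titles = [
    "Shenzhen marathon postponed",
    "Shenzhen marathon new date announced",
    "Local bakery opens",
    "Zhang San wins Shenzhen marathon",
]
core_title = "Zhang San wins Shenzhen marathon"
candidates = recall_candidates(["Zhang San", "Shenzhen", "wins"], pending_titles, core_title)
```

Recall by keyword is deliberately loose; the multi-dimension screening in S104 is what removes candidates that merely share a word with the core event.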
S104, screening the candidate events in at least one dimension to obtain context events.
In the embodiment of the application, the electronic device may screen, in at least one dimension, the candidate events recalled by matching the keywords of the core event. This performs noise reduction on the candidate events: candidate events with low relevance to the core event are removed through evaluation in the at least one dimension, and the context events are obtained.
In some embodiments, the at least one dimension may include at least one of a semantic dimension, a keyword dimension, and an event structure information dimension. The semantic dimension screens candidate events by semantic similarity; the keyword dimension screens candidate events by the hit keywords they contain; and the event structure information dimension screens candidate events by structured information in the event information, such as event occurrence time and event heat. The electronic device may score the candidate event in the at least one dimension to obtain at least one dimension score corresponding to the candidate event; obtain a composite score based on the at least one dimension score; and screen the candidate events based on the composite score to obtain the context events.
It should be noted that, in the embodiment of the present application, the scoring of the candidate event in the individual dimensions has no fixed order. The scoring may be performed in any order according to the needs of the actual project, or the dimensions may be scored in parallel. The specific choice is made according to actual conditions and is not limited in the embodiments of the present application.
In some embodiments, the electronic device can sum the at least one dimension score to obtain a composite score; at least one dimension score can also be subjected to weight adjustment or processing of other forms of algorithms to obtain a comprehensive score, and the comprehensive score is specifically selected according to actual conditions, which is not limited in the embodiment of the application.
In the embodiment of the application, the electronic device may rank the candidate events according to the composite score corresponding to at least one dimension, and determine the top N ranked events as context events according to the requirement of practical application. Wherein N is a positive integer greater than or equal to 1.
S105, performing extended event display according to the context events and the core event.
In the embodiment of the application, the electronic device may use the context events as extended events corresponding to the core event, and perform extended event display according to the context events and the core event.
In some embodiments, the electronic device may sort the context events by time, generate an event context in combination with the core event according to the sorting result, and perform extended event display through the event context.
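The time ordering can be sketched in a few lines. The sketch assumes each event is a (title, occurrence date) pair, that the timeline runs from earliest to latest, and that the core event is kept separate from the list of context events; the dictionary layout and sample events are hypothetical.

```python
from datetime import date


def build_event_context(core_event, context_events):
    """Sort context events by occurrence date and attach them to the core event."""
    timeline = sorted(context_events, key=lambda event: event[1])
    return {"core": core_event, "timeline": timeline}


# Hypothetical events.
core_event = ("Marathon new date announced", date(2021, 9, 10))
context_events = [
    ("Marathon held", date(2021, 10, 20)),
    ("Marathon postponed", date(2021, 9, 1)),
]
event_context = build_event_context(core_event, context_events)
```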
For example, the extended event presentation generation method according to the embodiment of the present application may be used in a scene of a search application, and when a user uses a terminal to search for an event with continuation information, an event context corresponding to the event searched by the user may be generated and presented by the method according to the embodiment of the present application, as shown in fig. 4. Fig. 4 shows a graph of the effect of event context. In the page of fig. 4, the event context includes a core event 40 and a plurality of context events 41 presented in list form below the core event 40; each context event is ordered according to the occurrence time of the event, and the user can know the continuing process of the event development through the sequence of the context events. Moreover, in a case where one of the context events, such as the context event 42 shown in fig. 4, receives a viewing operation of the user, such as a clicking operation, the electronic device may enter an article page corresponding to the context event 42 according to a content identifier, such as an article ID, corresponding to the context event 42. In the case that the control 43 in fig. 4, i.e. the "expand more" button receives a click operation, the electronic device may expand all context events in the current context event list, so as to provide the user with complete context information of the events, and implement an expanded event presentation.
In some embodiments, the extended event presentation method in the embodiments of the present application may also be used in a search recommendation information flow scenario. Illustratively, when a user finishes reading an article, search terms for context events related to the article may be recommended at the bottom of the article to implement extended event display, which meets the user's need to learn about related events and increases the user's search frequency.
In some embodiments, the extended event display method in the embodiments of the present application may also be used in a related search scenario. When a user searches for an event that has related events, entries for the related events may be displayed directly in the search results to implement extended event display, guide the user to pay attention to the related events, and increase the user's search frequency.
In some embodiments, the extended event display in the embodiments of the present application may also be used in an information flow distribution scenario. Extended event display is performed by directly displaying key context event information together with the corresponding articles, which meets the user's need to understand the causes and consequences of the event, increases the user's interest in clicking the articles, and facilitates article distribution.
It can be understood that the embodiment of the application recalls candidate events related to the core event from the to-be-processed events according to the keywords of the core event, and then screens the candidate events in at least one dimension. This performs noise reduction on the candidate events to obtain the context events, and extended event display is performed by combining the core event and the context events. Compared with generating the event context directly through large-scale clustering, the amount of calculation is reduced and the efficiency of screening context events is improved, which improves the efficiency of extended event display. In addition, keyword matching combined with screening and noise reduction in at least one dimension greatly improves the accuracy of the screened context events, enables accurate event context construction over a large number of events, and improves the accuracy of extended event display.
In some embodiments, semantic dimensions of the at least one dimension include: a semantic similarity dimension; the at least one dimension score includes: the semantic similarity score, and the process of the electronic device scoring at least one dimension of the candidate event to obtain at least one dimension score corresponding to the candidate event may be implemented through S201 to S203 as shown in fig. 5, which will be described with reference to each step.
S201, performing semantic analysis on the core event titles of the core events and the candidate event titles of the candidate events by using a preset semantic analysis model to obtain core semantic features corresponding to the core events and candidate semantic features corresponding to the candidate events.
In the embodiment of the application, the electronic device can acquire the core event title of the core event and the candidate event title of the candidate event, and perform semantic analysis on the core event title and the candidate event title by using the preset semantic analysis model to obtain the core semantic features corresponding to the core event and the candidate semantic features corresponding to the candidate event.
In some embodiments, the preset semantic analysis model may be a BERT model, and the electronic device may perform semantic analysis on the core event title and the candidate event title through the BERT model, respectively, and obtain a cls vector of a first dimension, such as a 768-dimensional feature vector, output by the BERT model for the core event title and the candidate event title, respectively, as the core semantic feature and the candidate semantic feature.
In some embodiments, the electronic device may obtain an initial semantic analysis model and training samples before performing S201; illustratively, training samples consisting of 10,000 manually labeled pairs of related events may be obtained. Each training sample comprises a first event title and a second event title, which are the event titles of a pair of related events. The electronic device constructs, from the initial semantic analysis model, a training model with a double-tower structure that includes a first initial semantic analysis model and a second initial semantic analysis model, as shown in fig. 6. Illustratively, the first initial semantic analysis model and the second initial semantic analysis model may be BERT models.
In some embodiments, as shown in fig. 6, in each round of training, the electronic device may perform semantic analysis processing on a first event title in a training sample through a first initial semantic analysis model to obtain a cls vector corresponding to the first event title as a first semantic feature; and performing semantic analysis processing on a second event title in the training sample through a second initial semantic analysis model to obtain a cls vector corresponding to the second event title as a second semantic feature. The electronic equipment calculates cosine similarity between the first semantic features and the second semantic features, and performs cross entropy loss calculation based on the cosine similarity to serve as training loss of each round of training. The electronic equipment adjusts the training model comprising the first initial semantic analysis model and the second initial semantic analysis model based on the training loss until the training loss reaches a preset training condition, and a preset semantic analysis model which is trained is obtained.
S202, obtaining semantic similarity of the core event and the candidate event by calculating a feature vector distance between the core semantic features and the candidate semantic features.
In the embodiment of the application, the electronic device may calculate a feature vector distance between the core semantic feature and the candidate semantic feature, for example, calculate a cosine distance as a semantic similarity between the core event and the candidate event.
In some embodiments, the electronic device may also calculate the semantic similarity between the core semantic features and the candidate semantic features by using other methods, specifically, the semantic similarity is selected according to actual situations, and the embodiments of the present application are not limited.
S203, scoring the candidate events based on the semantic similarity to obtain a semantic similarity score.
In the embodiment of the application, the electronic device can score the candidate events based on the semantic similarity to obtain the semantic similarity score.
In some embodiments, the electronic device may normalize the calculated semantic similarity between the core semantic feature and the candidate semantic feature to a numerical interval of [0,1] to obtain a semantic similarity score.
In some embodiments, the semantic similarity score is 1 at most, and may also be set according to the needs of actual engineering, specifically selected according to the actual situation, and the embodiment of the present application is not limited.
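S202-S203 can be sketched as below, assuming the semantic features are already available as plain lists of floats (for example, the cls vectors output by the semantic analysis model). The mapping of cosine similarity from [-1, 1] to [0, 1] by (cos + 1) / 2 is an assumption of this sketch; the application only states that the similarity is normalized to [0,1].

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_similarity_score(core_features, candidate_features):
    """Assumed normalization: map cosine similarity from [-1, 1] to [0, 1]."""
    return (cosine_similarity(core_features, candidate_features) + 1.0) / 2.0


# Toy two-dimensional features standing in for 768-dimensional cls vectors.
same_direction = semantic_similarity_score([1.0, 0.0], [1.0, 0.0])
orthogonal = semantic_similarity_score([1.0, 0.0], [0.0, 1.0])
```

With this normalization, identical directions give the maximum score of 1 and orthogonal features give 0.5.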
It can be understood that screening the candidate events in combination with semantic similarity improves the accuracy of the context events screened from the candidate events, and thereby improves the accuracy of extended event display performed according to the context events and the core event.
In some embodiments, semantic dimensions include: semantic clustering dimensions; the at least one dimension score includes: a semantic clustering score; the process of the electronic device scoring the candidate event in at least one dimension to obtain at least one dimension score corresponding to the candidate event may be as shown in fig. 7, including S301-S302, which will be described with reference to the steps.
S301, performing density clustering on the core events and the candidate events based on semantic similarity of the core events and the candidate events to obtain at least one event cluster.
In some embodiments, the electronic device may perform DBSCAN density clustering on the candidate events based on semantic similarity between the core events and the candidate events to obtain at least one event cluster.
In some embodiments, the electronic device may calculate semantic similarity between the core event and the candidate event based on the method in S201-S202, and perform neighborhood size measurement in the DBSCAN cluster based on the semantic similarity, thereby implementing the DBSCAN cluster based on the semantic similarity.
In some embodiments, the preset minimum density sample number for DBSCAN clustering may be 1; the preset density threshold may be 0.2, which is specifically selected according to actual situations, and the embodiment of the present application is not limited.
S302, scoring the candidate events according to the event cluster in which the core event is located to obtain the semantic clustering scores of the candidate events.
In the embodiment of the application, the correlation between the candidate event belonging to the same event cluster as the core event and the core event is higher, and the electronic equipment can score the candidate event by improving the score of the candidate event belonging to the same event cluster as the core event so as to obtain the semantic clustering score of the candidate event.
In some embodiments, after DBSCAN density clustering, the electronic device may assign a semantic clustering score of 1 to each candidate event that belongs to the same event cluster as the core event, and assign no score to each candidate event that does not, so as to obtain the semantic clustering scores of the candidate events.
It can be understood that scoring and screening the candidate events through density clustering in the semantic dimension greatly raises the scores of candidate events that belong to the same cluster as the core event, which improves the accuracy of the context events screened from the candidate events and thereby the accuracy of extended event display performed according to the context events and the core event. Moreover, the embodiment of the present application differs from the related art, which obtains context events directly by clustering, as follows. In the related art, multiple clusters are obtained through semantic clustering, and the events in each cluster are treated as context events of the same context; context events obtained by such direct clustering have low accuracy. In the embodiment of the present application, semantic density clustering serves as noise reduction: among the candidate events obtained by keyword matching, those that do not belong to the same event cluster as the core event receive a lower score and are thereby removed. This greatly improves the accuracy of screening context events from the candidate events, and further improves the accuracy of extended event display performed according to the context events and the core event.
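The semantic clustering score can be sketched without any external library. With a minimum sample number of 1, every point counts as a DBSCAN core point, so the clustering reduces to finding connected components of the graph that links points within the neighborhood radius. Interpreting the preset density threshold of 0.2 as a radius on cosine distance is an assumption of this sketch, as are the function names and the toy vectors.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; treated as maximal (1.0) if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 if norm == 0.0 else 1.0 - dot / norm


def cluster_labels(vectors, eps=0.2):
    """DBSCAN with a minimum sample number of 1: every point is a core point,
    so clusters are the connected components of the eps-neighborhood graph."""
    labels = [-1] * len(vectors)
    next_label = 0
    for start in range(len(vectors)):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # expand the cluster through density-reachable points
            i = stack.pop()
            for j in range(len(vectors)):
                if labels[j] == -1 and cosine_distance(vectors[i], vectors[j]) <= eps:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def semantic_cluster_scores(core_vec, candidate_vecs, eps=0.2):
    """Score 1.0 for candidates in the core event's cluster, 0.0 otherwise."""
    labels = cluster_labels([core_vec] + candidate_vecs, eps)
    return [1.0 if label == labels[0] else 0.0 for label in labels[1:]]


# Toy semantic features: the first and third candidates lie near the core event.
cluster_scores = semantic_cluster_scores(
    [1.0, 0.0], [[0.9, 0.1], [0.0, 1.0], [0.8, 0.5]]
)
```

Note that cluster membership is transitive in this scheme: a candidate can join the core event's cluster through an intermediate candidate even if its own distance to the core event exceeds the radius.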
In some embodiments, where the at least one dimension includes the keyword dimension, the at least one dimension score may include a keyword score; the process of the electronic device scoring the candidate event in at least one dimension to obtain at least one dimension score corresponding to the candidate event may be implemented through S401 to S402 as shown in fig. 8, and will be described with reference to the steps.
S401, calculating word frequency of the keyword through a preset corpus, and calculating inverse document frequency corresponding to the keyword based on the word frequency.
In the embodiment of the application, the electronic device can obtain the word frequency of the keyword by segmenting the event title in the preset corpus and calculate the inverse document frequency corresponding to the keyword based on the word frequency.
In some embodiments, the preset corpus may contain 1,000,000 historical event titles, and the electronic device may calculate the inverse document frequency according to formula (1), as follows:
[Formula (1): equation image not reproduced in the text]
In formula (1), num1 is the number of event titles in the preset corpus, for example 1,000,000; num2 is the number of event titles that contain the word; n is the number of keywords, and i denotes the i-th of the n keywords; IDF_i is the inverse document frequency of the i-th keyword. It can be seen that the more common the keyword is in event titles, the larger the denominator of formula (1), and the smaller the inverse document frequency, that is, the closer it is to 0.
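Because the image of formula (1) is not available, the sketch below uses a common smoothed form, log(num1 / (num2 + 1)), purely as an assumption that is consistent with the description: num2 sits in the denominator, and the value shrinks toward 0 as the keyword becomes more common. The tokenized corpus, the function name, and the smoothing term are hypothetical.

```python
import math


def inverse_document_frequency(keyword, tokenized_titles):
    """Assumed form of the inverse document frequency: log(num1 / (num2 + 1)).

    num1: number of event titles in the preset corpus.
    num2: number of event titles that contain the keyword.
    """
    num1 = len(tokenized_titles)
    num2 = sum(1 for tokens in tokenized_titles if keyword in tokens)
    return math.log(num1 / (num2 + 1))


# Tiny hypothetical corpus of already segmented event titles.
corpus = [
    ["marathon", "postponed"],
    ["marathon", "date", "announced"],
    ["marathon", "held"],
    ["bakery", "opens"],
]
idf_common = inverse_document_frequency("marathon", corpus)
idf_rare = inverse_document_frequency("bakery", corpus)
```

In this toy corpus the frequent word receives a lower value than the rare one, which matches the behavior described for formula (1).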
S402, taking keywords contained in the candidate event titles of the candidate events as internal keywords, and scoring the candidate events based on the inverse document frequency of the internal keywords to obtain keyword scores; the keyword score is inversely proportional to the inverse document frequency.
In the embodiment of the application, the electronic device may use keywords included in the candidate event titles of the candidate events as internal keywords, and based on the inverse document frequency of the internal keywords, score the candidate events according to the IDF values of the keywords hit in the candidate event titles to obtain the keyword scores.
In some embodiments, the electronic device may perform weight adjustment of a preset coefficient on the inverse document frequency of the internal keyword to obtain a weight-adjusted inverse document frequency; and using a preset frequency adjusting factor to perform minimum value summation average on the weighting inverse document frequency to obtain a keyword score. Illustratively, the above process can be implemented by equation (2), as follows:
[Formula (2): equation image not reproduced in the text]
In formula (2), α is a preset coefficient and may be, for example, 5, that is, 5 times the inverse document frequency. β is a preset frequency adjustment factor and may be, for example, 1. score_IDF is the keyword score. It can be seen that the higher the inverse document frequency, the lower the keyword score; that is, the keyword score is inversely proportional to the inverse document frequency.
In some embodiments, the score of the keyword may be 2 at the highest, or the preset coefficient and the preset frequency adjustment factor may be adjusted according to the requirement of the actual engineering, specifically, the selection is performed according to the actual situation, and the embodiment of the present application is not limited.
It can be understood that screening the candidate events by the inverse document frequency of the keywords improves the accuracy of the context events screened from the candidate events, and thereby improves the accuracy of extended event display performed according to the context events and the core event.
In some embodiments, the event information comprises the event heat; where the at least one dimension comprises an event heat dimension, the at least one dimension score comprises a heat score. The process of the electronic device scoring the candidate event in at least one dimension to obtain at least one dimension score corresponding to the candidate event may be implemented through S501 to S502 as shown in fig. 9, and will be described with reference to the steps.
S501, according to the event heat of the candidate event, determining the corresponding target heat interval in at least one preset heat interval.
S502, obtaining a heat score corresponding to the candidate event according to a preset heat coefficient corresponding to the target heat interval.
In this embodiment, the electronic device may perform heat dimension scoring on the candidate event according to an event heat in the event information, such as a historical maximum heat, by combining at least one preset score value corresponding to at least one preset heat interval.
In the embodiment of the application, the electronic device determines the corresponding target heat interval in at least one preset heat interval according to the event heat of the candidate event, and obtains the heat score corresponding to the candidate event according to the preset heat coefficient corresponding to the target heat interval. Illustratively, as shown in equation (3), the following:
Figure BDA0003271175160000221
In formula (3), heat is the event heat, and 200 and 50 are the preset heat interval thresholds, expressed in units of ten thousand after normalization. According to formula (3), the electronic device may assign a score of 1 to candidate events whose event heat is higher than 2 million (200 ten-thousands); calculate the score of candidate events whose event heat is between 500,000 and 2 million using the middle expression of formula (3) (given as image BDA0003271175160000222 in the original publication); and assign a score of 0 to candidate events whose event heat is lower than 500,000, thereby obtaining the heat score score_heat.
It can be understood that screening the candidate events according to the event heat can improve the accuracy of the context events screened from the candidate events, so that the accuracy of extended event display according to the candidate events and the core events is improved.
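The interval lookup of S501 to S502 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the thresholds 50 and 200 (in units of ten thousand) come from the description of formula (3), but the expression for the middle interval is only available as an image in the original publication, so the linear ramp used here is purely an assumption, and the names `heat_score` and `linear_ramp` are hypothetical.

```python
from typing import Callable


def linear_ramp(heat: float, low: float, high: float) -> float:
    """ASSUMED stand-in for the middle branch of formula (3):
    rises linearly from 0 at `low` to 1 at `high`."""
    return (heat - low) / (high - low)


def heat_score(
    heat: float,
    low: float = 50.0,    # lower threshold, in units of ten thousand
    high: float = 200.0,  # upper threshold, in units of ten thousand
    middle: Callable[[float, float, float], float] = linear_ramp,
) -> float:
    """Map an event heat (in units of ten thousand) to a heat score."""
    if heat > high:   # above the upper threshold: full score
        return 1.0
    if heat < low:    # below the lower threshold: no score
        return 0.0
    return middle(heat, low, high)  # middle interval: assumed expression
```

Passing a different `middle` callable swaps in whatever expression the middle interval actually uses, without touching the interval logic.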
In some embodiments, the event information comprises: the event occurrence time; when the at least one dimension comprises an event occurrence time dimension, the at least one dimension score comprises: a time dimension score; the process in which the electronic device scores the candidate event in at least one dimension to obtain at least one dimension score corresponding to the candidate event may be implemented through S601 to S602 as shown in fig. 10, and will be described with reference to these steps.
S601, calculating the time difference of the event occurrence time of the candidate event and the core event.
S602, according to the time difference, determining a corresponding target time difference interval in at least one preset time difference interval, and taking a preset score corresponding to the target time difference interval as a time score corresponding to the candidate event.
In the embodiment of the application, the electronic device can score the candidate event according to the event occurrence time to obtain the time score of the candidate event. The larger the time difference between the occurrence time of the candidate event and that of the core event, the lower the time score. Illustratively, as shown in formula (4):
$$\text{score}_{time} = \begin{cases} 1, & time\_diff \le 24\ \text{h} \\ 0.5, & 24\ \text{h} < time\_diff \le 72\ \text{h} \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$
In formula (4), time_diff is the time difference between the event occurrence times of the candidate event and the core event. According to formula (4), the electronic device may assign a score of 1 when the time difference is within 24 hours, 0.5 when it is within 72 hours, and 0 otherwise, thereby obtaining the time score score_time of the candidate event.
It can be understood that screening the candidate events according to the time difference between the event occurrence times of the candidate events and the core event can improve the accuracy of the context events screened from the candidate events, so that the accuracy of extended event display according to the candidate events and the core events is improved.
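The time scoring of S601 to S602 can be sketched in a few lines. The interval scores (1 within 24 hours, 0.5 within 72 hours, 0 otherwise) follow the description of formula (4); treating the difference as an absolute value and the interval boundaries as inclusive are assumptions, and the function name `time_score` is hypothetical.

```python
from datetime import datetime, timedelta


def time_score(candidate_time: datetime, core_time: datetime) -> float:
    """Score a candidate event by how close it occurred to the core event."""
    # Assumption: the direction of the difference does not matter.
    diff = abs(candidate_time - core_time)
    if diff <= timedelta(hours=24):
        return 1.0
    if diff <= timedelta(hours=72):
        return 0.5
    return 0.0
```

For example, a candidate event that occurred two days before the core event falls in the second interval and receives 0.5.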
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
When applied to displaying event context information in a search scenario, the extended event display method of the embodiments of the application can generate an event context from the context events and the core event and perform extended event display through that event context. The event context provides the user with additional information beyond the search term, actively uncovers the user's need for related reading while still satisfying the search need, and increases the user value of the search result page. As shown in fig. 11, in the search scenario, event context generation and extended event display can be implemented through S701 to S707 as follows:
S701, determining a core event.
In S701, upon receiving an instruction to generate an event context, the electronic device obtains background search ranking data, such as a search leaderboard, from a background server of an instant-messaging social network platform. The electronic device extracts the hotspot events from the search leaderboard, extracts event information from the hotspot events, and constructs the event library shown in fig. 11 from the event information. Here, the events in the event library are the events to be processed.
In S701, the electronic device may, in response to a selection instruction for the core event, use the event to be processed in the event library that corresponds to the selection instruction as the core event.
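The construction of the event library and the selection of the core event in S701 can be sketched as below. This is an illustrative assumption about the data layout only: the `Event` fields, the dictionary keys of the ranking entries, and the idea that the selection instruction carries an event title are all hypothetical and are not specified by the application.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """Event information kept in the library (field names are assumptions)."""
    title: str
    heat: float            # e.g. historical maximum heat
    occurred_at: datetime  # event occurrence time


def build_event_library(ranking: List[Dict], heat_threshold: float) -> List[Event]:
    """Keep ranking entries whose heat exceeds the threshold as events to be processed."""
    return [
        Event(r["title"], float(r["heat"]), r["occurred_at"])
        for r in ranking
        if r["heat"] > heat_threshold
    ]


def select_core_event(library: List[Event], selected_title: str) -> Event:
    """Return the library event named by a (hypothetical) selection instruction."""
    for event in library:
        if event.title == selected_title:
            return event
    raise KeyError(selected_title)
```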
S702, acquiring a core event title.
In S702, the electronic device obtains a core event title from the event information of the core event.
S703, extracting keywords.
In S703, the electronic device determines a core event from the event library, and extracts keywords from the core event.
Here, the process of S703 is consistent with the description of S102 above, and is not repeated here.
S704, keyword recall.
In S704, the electronic device recalls, from the event library, the candidate events that contain the keywords of the core event.
Here, the process of S704 is consistent with the description of S103 above, and is not repeated here.
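The keyword recall of S704 can be sketched as a simple filter over event titles. Matching by substring on the title, recalling an event when any one keyword matches, and excluding the core event itself are assumptions made for illustration; the function name `recall_candidates` is hypothetical.

```python
from typing import Iterable, List


def recall_candidates(
    library_titles: Iterable[str],
    core_title: str,
    keywords: Iterable[str],
) -> List[str]:
    """Recall titles from the event library that contain at least one keyword."""
    keyword_list = list(keywords)
    return [
        title
        for title in library_titles
        if title != core_title  # the core event is not its own candidate
        and any(keyword in title for keyword in keyword_list)
    ]
```

Because recall is deliberately loose, the recalled candidates still contain noise, which is why the scoring and filtering of S705 and S706 follow.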
S705, scoring at least one dimension.
In S705, the electronic device may use the event similarity, the keyword IDF, the DBSCAN cluster, the historical highest popularity, and the event occurrence time as scoring factors of at least one dimension, score the recalled candidate event according to the scoring factors of the at least one dimension, and obtain a comprehensive score corresponding to the candidate event according to the score corresponding to the at least one dimension.
It should be noted that the scoring performed by the electronic device on the dimensions of event similarity, keyword IDF, DBSCAN cluster, historical highest heat, and event occurrence time does not need to follow any particular order.
S706, score filtering.
In S706, the electronic device filters the candidate events according to a preset score filtering strategy and the composite score of the candidate events, so as to obtain context events. For example, the electronic device may rank the candidate events and select the top-ranked candidate events as the context events.
Here, the event similarity corresponds to the semantic similarity dimension, the keyword IDF corresponds to the keyword dimension, the DBSCAN cluster corresponds to the semantic cluster dimension, the historical highest heat corresponds to the event heat dimension, and the processes of S705 to S706 are consistent with the process of S104, and are not repeated here.
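The combination of dimension scores in S705 and the score filtering in S706 can be sketched as follows. The application does not state how the composite score is formed, so the weighted sum with default weight 1 is an assumption, as are the `top_k` and `min_score` parameters standing in for the "preset score filtering strategy"; all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def composite_score(
    dimension_scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine per-dimension scores; a weighted sum is assumed here."""
    weights = weights or {}
    return sum(weights.get(name, 1.0) * score for name, score in dimension_scores.items())


def filter_context_events(
    scored: Dict[str, Dict[str, float]],
    top_k: int,
    min_score: float = 0.0,
) -> List[Tuple[str, float]]:
    """Rank candidates by composite score and keep the top ones as context events."""
    ranked = sorted(
        ((title, composite_score(scores)) for title, scores in scored.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [pair for pair in ranked if pair[1] >= min_score][:top_k]
```

Since each dimension is scored independently, the dimension scores can be computed in any order before being combined.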
S707, displaying the event context.
In S707, the electronic device may generate an event context according to the context events obtained by the score filtering, and display the event context.
In some embodiments, a schematic diagram of the DBSCAN clustering process corresponding to S705 in fig. 11 may be as shown in fig. 12. In fig. 12, the candidate events and the core event are represented by different points, and the distance between points represents the semantic similarity between the events. The area 120 represents the range of the DBSCAN clustering threshold. During DBSCAN clustering, the electronic device may iteratively expand the clustering range outward from the high-density area where events are aggregated, until no new candidate event is added in an iteration. The region 121, centered on the core event in fig. 12, represents the range of context events determined from the distance between the candidate events and the core event in combination with the scoring factors of the at least one dimension. The candidate events within the region 122 are the context events obtained by screening. In an actual application scenario, the semantics of the core event and those of the candidate events may differ greatly. Through DBSCAN clustering, the embodiments of the application allow the semantics of the core event to drift conditionally, while the scoring factors of the other dimensions limit the degree of drift, thereby improving the accuracy of screening context events from the candidate events.
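The density clustering and the resulting semantic clustering score can be sketched with a small pure-Python DBSCAN. This is a simplified illustration under several assumptions: events are already represented as semantic feature vectors, cosine distance is used as the distance measure, and a candidate scores 1 if it lands in the same cluster as the core event and 0 otherwise. The parameter names `eps` and `min_pts` and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Return one cluster label per point; NOISE marks unclustered points."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbors(i: int) -> List[int]:
        return [j for j in range(len(points))
                if cosine_distance(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:      # not dense enough to start a cluster
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:                  # iteratively expand the cluster
            j = queue.pop()
            if labels[j] == NOISE:    # border point reached from a dense point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)
    return [label if label is not None else NOISE for label in labels]


def cluster_scores(core_vec, candidate_vecs, eps: float, min_pts: int) -> List[float]:
    """Score 1.0 for candidates sharing the core event's cluster (assumed rule)."""
    labels = dbscan([core_vec, *candidate_vecs], eps, min_pts)
    core_label = labels[0]
    return [1.0 if core_label != NOISE and label == core_label else 0.0
            for label in labels[1:]]
```

In the test data below, the second candidate is farther than `eps` from the core event but within `eps` of the first candidate, so it still joins the core event's cluster. That chained reachability is the "conditional drift" described above.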
In some embodiments, the applicant conducted comparison experiments on three schemes: a purely manual selection scheme; a scheme using only the DBSCAN semantic clustering screening of the present application; and the complete scheme of the present application based on five dimensions, namely the semantic similarity dimension, the semantic clustering dimension, the keyword dimension, the event heat dimension, and the event occurrence time dimension. Using 1000 screened events from 84 event contexts, the schemes were compared on accuracy, recall rate, and average manual operation duration per context, with the results shown in Table 2:
Figure BDA0003271175160000251
TABLE 2
It can be seen that the scheme of the embodiment of the application can greatly reduce the time required by manual configuration and improve the operation efficiency. Meanwhile, the recall rate can be effectively improved through a keyword recall and scoring mode compared with manual searching from the event library, so that the operation burden is reduced, and the screening process after the recall of the candidate events can be more concentrated. In addition, by using the characteristics of the keyword IDF score, the event similarity score and the like, the accuracy rate of the context event can be greatly improved, the auditing burden of operation is reduced, and the efficiency is improved by 50%.
The following continues with the exemplary structure of the extended event presentation device 255 provided in the embodiments of the present application when implemented as software modules. In some embodiments, as shown in fig. 2, the software modules of the extended event presentation device 255 stored in the memory 250 may include:
an extracting module 2551, configured to obtain events to be processed by extracting event information of a hot event, and determine a core event from the events to be processed; extracting keywords from the core event to obtain keywords corresponding to the core event;
a matching module 2552, configured to perform search and matching based on the event to be processed by using the keyword to obtain a candidate event;
a screening module 2553, configured to perform at least one-dimensional screening on the candidate event to obtain a context event;
a presentation module 2554, configured to generate an event context according to the context events and the core events.
In some embodiments, the extracting module 2551 is further configured to perform named entity identification on a core event title corresponding to the core event, so as to obtain at least one named entity; and in the at least one named entity, a named entity of a preset noun type is used as the keyword.
In some embodiments, the extended event presentation device 255 further comprises a term weight analysis module; the term weight analysis module is configured to, when the number of the keywords is smaller than a preset number threshold, perform term weight analysis on the core event title to obtain at least one weighted word; and determine, among the at least one weighted word, at least one weighted word of a verb type as the keyword.
In some embodiments, the at least one dimension comprises: at least one of a semantic dimension, a keyword dimension, an event heat dimension, and an event occurrence time dimension; the screening module 2553 is further configured to score the candidate event in at least one dimension to obtain at least one dimension score corresponding to the candidate event; obtain a composite score based on the at least one dimension score; and screen the candidate events based on the composite score to obtain the context events.
In some embodiments, the semantic dimensions include: a semantic similarity dimension; the at least one dimension score includes: a semantic similarity score; the screening module 2553 is further configured to perform semantic analysis on the core event title of the core event and the candidate event title of the candidate event by using a preset semantic analysis model to obtain a core semantic feature corresponding to the core event and a candidate semantic feature corresponding to the candidate event; obtain the semantic similarity of the core event and the candidate event by calculating a feature vector distance between the core semantic feature and the candidate semantic feature; and score the candidate events based on the semantic similarity to obtain the semantic similarity score.
In some embodiments, the semantic dimensions include: a semantic clustering dimension; the at least one dimension score includes: a semantic clustering score; the screening module 2553 is further configured to perform density clustering on the core event and the candidate event based on the semantic similarity between the core event and the candidate event to obtain at least one event cluster; and score the candidate events according to the event cluster where the core event is located to obtain the semantic clustering score of the candidate events.
In some embodiments, the at least one dimension score comprises: a keyword score; the screening module 2553 is further configured to calculate a word frequency of the keyword through a preset corpus, and calculate an inverse document frequency corresponding to the keyword based on the word frequency; and take keywords contained in the candidate event titles of the candidate events as internal keywords, and score the candidate events based on the inverse document frequency of the internal keywords to obtain the keyword score; the keyword score is inversely proportional to the inverse document frequency.
In some embodiments, the screening module 2553 is further configured to perform weight adjustment of a preset coefficient on the inverse document frequency of the internal keyword to obtain a weight-adjusted inverse document frequency; and using a preset frequency adjusting factor to perform minimum value summation average on the weight-adjusting inverse document frequency to obtain the keyword score.
In some embodiments, the event information comprises: the event heat; the screening module 2553 is further configured to determine, according to the event heat of the candidate event, the corresponding target heat interval in at least one preset heat interval; and obtaining a heat score corresponding to the candidate event according to a preset heat coefficient corresponding to the target heat interval.
In some embodiments, the event information comprises: the event occurrence time; the at least one dimension score includes: a time dimension score; the screening module 2553 is further configured to calculate a time difference between the event occurrence times of the candidate event and the core event; and determine a corresponding target time difference interval in at least one preset time difference interval according to the time difference, and take a preset score corresponding to the target time difference interval as a time score corresponding to the candidate event.
In some embodiments, the extracting module 2551 is further configured to obtain, from background search data corresponding to a preset application, at least one search event with an event heat higher than a preset heat threshold as a hot event; and extracting an event title of each search event in the at least one search event and at least one of the event heat and the event occurrence time of each search event, and taking the event title and at least one of the event heat and the event occurrence time as the event information.
In some embodiments, the presentation module 2554 is further configured to time-sort the context events, generate an event context according to a sorting result and combine the core events, and perform extended event presentation through the event context.
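The time sorting performed by the presentation module can be sketched as below. Representing each event as a `(title, occurrence time)` pair, sorting in ascending order, and placing the core event chronologically among the context events are all assumptions for illustration; the function name `build_event_context` is hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple

EventPair = Tuple[str, datetime]  # (title, occurrence time); layout is an assumption


def build_event_context(core: EventPair, context_events: List[EventPair]) -> List[Dict]:
    """Merge the core event with the context events and order them by time."""
    timeline = sorted([*context_events, core], key=lambda event: event[1])
    return [
        {"title": title, "time": when.isoformat(), "is_core": (title, when) == core}
        for title, when in timeline
    ]
```

The resulting list can be rendered directly as a timeline, with the `is_core` flag marking the entry to highlight.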
It should be noted that the above description of the embodiment of the apparatus, similar to the above description of the embodiment of the method, has similar beneficial effects as the embodiment of the method. For technical details not disclosed in the embodiments of the apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
Embodiments of the present application provide a computer-readable storage medium having stored therein executable instructions that, when executed by a processor, cause the processor to perform a method provided by embodiments of the present application, for example, the method as shown in fig. 3, 5, 7, 8, 9, 10.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may, but need not, correspond to files in a file system, and may be stored in a portion of a file that holds other programs or data, for example in one or more scripts in a HyperText Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, according to the embodiments of the application, candidate events related to the core event are recalled from the events to be processed according to the keywords of the core event, and the candidate events are then screened in at least one dimension, which denoises the candidate events and yields the context events; the core event and the context events are then combined for extended event display. Compared with generating an event context directly through large-scale clustering, this reduces the amount of computation and improves the efficiency of context event screening, and therefore the efficiency of extended event display. In addition, keyword matching combined with screening and denoising in at least one dimension greatly improves the accuracy of the context events obtained by screening, enables accurate event context construction at a large event scale, and improves the accuracy of extended event display.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (15)

1. An extended event presentation method, comprising:
obtaining events to be processed by extracting event information of hot events, and determining a core event from the events to be processed;
extracting keywords from the core event to obtain keywords corresponding to the core event;
searching and matching are carried out on the basis of the event to be processed by utilizing the keywords to obtain a candidate event;
screening the candidate event in at least one dimension to obtain a context event;
and performing extended event display according to the context event and the core event.
2. The method according to claim 1, wherein the extracting keywords from the core event to obtain keywords corresponding to the core event comprises:
conducting named entity recognition on a core event title corresponding to the core event to obtain at least one named entity;
and in the at least one named entity, a named entity of a preset noun type is used as the keyword.
3. The method of claim 2, further comprising:
when the number of the keywords is smaller than a preset number threshold, performing term weight analysis on the core event title to obtain at least one weighted word;
and determining, among the at least one weighted word, at least one weighted word of a verb type as the keyword.
4. The method according to any of claims 1-3, wherein the at least one dimension comprises: at least one of a semantic dimension, a keyword dimension, an event heat dimension, and an event occurrence time dimension;
the screening of at least one dimension on the candidate event to obtain the context event comprises:
scoring at least one dimension of the candidate event to obtain at least one dimension score corresponding to the candidate event;
obtaining a composite score based on the at least one dimension score; and screening the candidate events based on the composite score to obtain the context events.
5. The method of claim 4, wherein the semantic dimensions comprise: a semantic similarity dimension; the at least one dimension score includes: a semantic similarity score;
the scoring of at least one dimension on the candidate event to obtain at least one dimension score corresponding to the candidate event includes:
performing semantic analysis on the core event title of the core event and the candidate event title of the candidate event by using a preset semantic analysis model to obtain a core semantic feature corresponding to the core event and a candidate semantic feature corresponding to the candidate event;
obtaining semantic similarity of the core event and the candidate event by calculating a feature vector distance between the core semantic feature and the candidate semantic feature;
and scoring the candidate events based on the semantic similarity to obtain the semantic similarity score.
6. The method of claim 4, wherein the semantic dimensions comprise: semantic clustering dimensions; the at least one dimension score includes: a semantic clustering score;
the scoring of at least one dimension on the candidate event to obtain at least one dimension score corresponding to the candidate event includes:
performing density clustering on the core event and the candidate event based on the semantic similarity of the core event and the candidate event to obtain at least one event cluster;
and scoring the candidate events according to the event cluster where the core event is located to obtain the semantic clustering score of the candidate events.
7. The method of claim 4, wherein the at least one dimension score comprises: a keyword score;
the scoring of at least one dimension on the candidate event to obtain at least one dimension score corresponding to the candidate event includes:
calculating the word frequency of the keyword through a preset corpus, and calculating the inverse document frequency corresponding to the keyword based on the word frequency;
taking keywords contained in the candidate event titles of the candidate events as internal keywords, and scoring the candidate events based on the inverse document frequency of the internal keywords to obtain the scores of the keywords; the keyword score is inversely proportional to the inverse document frequency;
wherein the scoring the candidate event based on the inverse document frequency to obtain the keyword score comprises:
carrying out weight adjustment of a preset coefficient on the inverse document frequency of the internal keyword to obtain a weight-adjusted inverse document frequency;
and using a preset frequency adjusting factor to perform minimum value summation average on the weight-adjusting inverse document frequency to obtain the keyword score.
8. The method of claim 4, wherein the event information comprises: the event heat; the at least one dimension score includes: a heat dimension score;
the scoring of at least one dimension on the candidate event to obtain at least one dimension score corresponding to the candidate event includes:
determining the corresponding target heat interval in at least one preset heat interval according to the event heat of the candidate event;
and obtaining a heat score corresponding to the candidate event according to a preset heat coefficient corresponding to the target heat interval.
9. The method of claim 4, wherein the event information comprises: the time of occurrence of the event; the at least one dimension score includes: a time dimension score;
the scoring of at least one dimension on the candidate event to obtain at least one dimension score corresponding to the candidate event includes:
calculating a time difference between event occurrence times of the candidate event and the core event;
and determining a corresponding target time difference interval in at least one preset time difference interval according to the time difference, and taking a preset score corresponding to the target time difference interval as a time score corresponding to the candidate event.
10. The method according to claim 1, wherein obtaining the event to be processed by extracting event information of the hotspot event comprises:
acquiring at least one search event with the event heat higher than a preset heat threshold value from background search data corresponding to a preset application as a hot event;
extracting an event title and a content identification of each search event in the at least one search event, and at least one of an event heat and an event occurrence time of each search event, and using the at least one of the event heat and the event occurrence time, the event title and the content identification as the event information.
11. The method according to any one of claims 5-10, wherein said performing extended event presentation based on said context events and said core events comprises:
sorting the context events by time, generating an event context according to the sorting result in combination with the core event, and performing extended event display through the event context.
12. An extended event presentation apparatus, comprising:
the extraction module is used for obtaining the event to be processed by extracting the event information of the hot event and determining a core event from the event to be processed; extracting keywords from the core event to obtain keywords corresponding to the core event;
the matching module is used for searching and matching based on the event to be processed by utilizing the keyword to obtain a candidate event;
the screening module is used for screening at least one dimension of the candidate event to obtain a venation event;
and the display module is used for performing extended event display according to the context event and the core event.
13. An electronic device, comprising:
a memory for storing executable instructions;
a processor for implementing the method of any one of claims 1 to 11 when executing executable instructions stored in the memory.
14. A computer-readable storage medium having stored thereon executable instructions for, when executed by a processor, implementing the method of any one of claims 1 to 11.
15. A computer program product comprising a computer program or instructions, characterized in that the computer program or instructions, when executed by a processor, implement the method of any of claims 1 to 11.
CN202111101755.7A 2021-09-18 2021-09-18 Extended event display method, device, equipment and computer readable storage medium Pending CN115840813A (en)

Publications (1)

Publication Number Publication Date
CN115840813A true CN115840813A (en) 2023-03-24

Family

ID=85575183

Country Status (1)

Country Link
CN (1) CN115840813A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination