CN113535940A - Event abstract generation method and device and electronic equipment

Info

Publication number: CN113535940A
Application number: CN202010307123.5A
Authority: CN
Inventors: 李泉志; 张琼; 刘英箎
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2020-04-17
Filing date: 2020-04-17
Publication date: 2021-10-22

Abstract

Disclosed are an event summary generation method, an event summary generation apparatus, and an electronic device. The method comprises: acquiring a plurality of text objects describing a target event; extracting a plurality of semantic keywords from the plurality of text objects; calculating a probability distribution value of each semantic keyword in the plurality of text objects; for any specific text object among the plurality of text objects, calculating a score of the specific text object according to the probability distribution values corresponding to the semantic keywords contained in the specific text object; and determining, according to the score of the specific text object, whether the specific text object can be used as a summary describing the target event.

Description

Event abstract generation method and device and electronic equipment

Technical Field

The disclosed embodiments relate to data processing technologies, and more particularly, to an event summary generation method, an event summary generation apparatus, an electronic device, and a computer-readable storage medium.

Background

With the development of internet technology, many text objects for commenting on a certain event exist on a network, for example, on social media, many messages are talking about the same event, wherein each piece of information constitutes one text object.

Since each event typically involves tens to tens of thousands of text objects, a user who browses most of the text objects of an event indiscriminately in the hope of understanding the event wastes a significant amount of time and effort. Therefore, there is a need for a method of selecting, from all the text objects of an event, abstract text that represents the event, so that events can be better presented and browsed.

Disclosure of Invention

An object of the embodiments of the present disclosure is to provide a new technical solution for obtaining a corresponding abstract text for any event.

According to a first aspect of the present disclosure, there is provided an event summary generation method, including:

acquiring a plurality of text objects describing a target event;

extracting a plurality of semantic keywords from the plurality of text objects;

calculating a probability distribution value of each semantic keyword in the plurality of text objects;

for any specific text object in the plurality of text objects, calculating a score of the specific text object according to the probability distribution values corresponding to the semantic keywords contained in the specific text object;

and determining, according to the score of the specific text object, whether the specific text object can be used as a summary describing the target event.

Optionally, the semantic keywords comprise one or more of: keywords describing the content of the event, keywords describing the participants of the event, keywords describing the place where the event occurred, and keywords describing the time when the event occurred.

Optionally, the method further comprises:

obtaining scores for a plurality of specific textual objects;

and carrying out regularization processing on the plurality of scores.

Optionally, the regularizing the scores includes:

$$Z = Y \times \frac{\log(1 + r)}{\log(1 + n)}$$

wherein Y is the score of the specific text object, r is the number of times the specific text object is forwarded, n is a length index value reflecting the text length of the specific text object, and Z is the score of the specific text object after the regularization processing.
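As a non-limiting illustration, the regularization can be sketched in Python as follows, reading the formula as Z = Y × log(1 + r) / log(1 + n). The function name `regularize_score` and the rejection of non-positive n are assumptions made for this sketch. The base of the logarithm does not matter, because it cancels in the ratio.

```python
import math


def regularize_score(y: float, r: int, n: int) -> float:
    """Return Z = Y * log(1 + r) / log(1 + n).

    y: raw score of the text object
    r: number of times the text object was forwarded
    n: length index value of the text object (assumed positive here)
    """
    if n <= 0:
        # log(1 + n) would be zero or undefined; this guard is an assumption
        # of the sketch, not something stated in the source.
        raise ValueError("length index n must be positive")
    return y * math.log(1 + r) / math.log(1 + n)


# Two text objects with the same raw score and length index;
# the second one was forwarded more often.
z_a = regularize_score(y=2.0, r=9, n=99)
z_b = regularize_score(y=2.0, r=99, n=99)
```

Under this reading, a text object that has never been forwarded (r = 0) receives a regularized score of zero, and a longer text object needs a higher raw score or more forwards to reach the same regularized score.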

Optionally, the determining whether the specific text can be used as a summary for describing the target event according to the score of the specific text object includes:

ranking the plurality of specific text objects according to their scores;

a predetermined number of text objects are selected among the ranked text objects as a summary describing the target event.
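A minimal Python sketch of the ranking and selection steps is given below for illustration. The function name `select_summaries`, the pair representation of a scored text object, and the sample values are assumptions of the sketch.

```python
from typing import List, Tuple


def select_summaries(scored: List[Tuple[str, float]], k: int) -> List[str]:
    """Rank text objects by score (highest first) and keep the first k."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, score) pairs.
scored = [("text A", 0.4), ("text B", 1.7), ("text C", 0.9)]
top_two = select_summaries(scored, k=2)
```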

Optionally, the calculating a probability distribution value of each semantic keyword in the plurality of text objects includes:

for any given semantic keyword, searching the plurality of text objects for text objects containing the given semantic keyword;

and calculating the proportion of the found text objects among the plurality of text objects as the probability distribution value of the given semantic keyword in the plurality of text objects.
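For illustration, the calculation can be sketched in Python as follows. The sketch assumes that each text object is already represented by the set of semantic keywords extracted from it, so that "containing" a keyword means set membership. The function name `keyword_distribution` and the sample keywords are hypothetical.

```python
from typing import Dict, List, Set


def keyword_distribution(keyword_sets: List[Set[str]]) -> Dict[str, float]:
    """Map each keyword to the fraction of text objects that contain it."""
    total = len(keyword_sets)
    if total == 0:
        return {}
    counts: Dict[str, int] = {}
    for keywords in keyword_sets:
        # Each text object counts at most once per keyword.
        for kw in keywords:
            counts[kw] = counts.get(kw, 0) + 1
    return {kw: count / total for kw, count in counts.items()}


# Four hypothetical text objects describing the same event.
keyword_sets = [{"fire", "rescuer"}, {"fire"}, {"fire", "building"}, {"rescuer"}]
dist = keyword_distribution(keyword_sets)
```

In this example "fire" appears in three of the four text objects, so its probability distribution value is 0.75.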

Optionally, the calculating the score of the specific text object according to the probability distribution value corresponding to the semantic keyword included in the specific text object includes:

and calculating the score of the specific text object according to the probability distribution value corresponding to the semantic keyword contained in the specific text object and the weight coefficient of the semantic keyword contained in the specific text object.
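The text above does not state how the probability distribution values and weight coefficients are combined. The following Python sketch assumes a weighted sum over the keywords contained in the text object; this combination rule, the function name `score_text_object`, the default weight, and the sample values are all assumptions.

```python
from typing import Dict, Set


def score_text_object(
    keywords: Set[str],
    dist: Dict[str, float],
    weights: Dict[str, float],
    default_weight: float = 1.0,
) -> float:
    """Sum weight * probability distribution value over the keywords (assumed rule)."""
    return sum(
        dist.get(kw, 0.0) * weights.get(kw, default_weight) for kw in keywords
    )


# Hypothetical probability distribution values and weight coefficients.
dist = {"fire": 0.75, "rescuer": 0.5, "building": 0.25}
weights = {"fire": 2.0, "rescuer": 1.0}
score = score_text_object({"fire", "building"}, dist, weights)
```

Here "fire" contributes 0.75 × 2.0 and "building" contributes 0.25 × 1.0 using the default weight.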

Optionally, the method further comprises a step of determining a weight coefficient of any semantic keyword, including:

determining the semantic category to which the given semantic keyword belongs;

and acquiring the weight coefficient corresponding to that semantic category as the weight coefficient of the given semantic keyword.

Optionally, the method further includes a step of obtaining a weighting factor for each of the plurality of semantic categories, including:

obtaining a plurality of text object samples describing event samples, wherein each text object sample has a known score reflecting how well that text object sample serves as a summary of the corresponding event sample;

extracting semantic keywords conforming to the plurality of semantic categories from the text object samples, wherein the plurality of semantic categories include the semantic category to which the given semantic keyword belongs;

and determining the weight coefficients of the semantic categories according to the semantic keywords contained in the text object samples and the scores of the text object samples.

Optionally, the extracting, from the plurality of text objects, a plurality of semantic keywords comprises:

acquiring a plurality of set semantic categories;

and extracting a plurality of semantic keywords which accord with the semantic categories from the text objects.

Optionally, the extracting, from the plurality of text objects, a plurality of semantic keywords that conform to the plurality of semantic categories includes:

and extracting a plurality of semantic words which accord with the semantic categories from the text objects according to a preset identification model to serve as the semantic keywords.

Optionally, the method further comprises the step of obtaining the recognition model, comprising:

obtaining semantic word samples, wherein each semantic word sample has a known semantic category label;

determining model parameters of the selected classification model according to the semantic word sample;

and obtaining the identification model according to the classification model and the determined model parameters.

According to a second aspect of the present disclosure, there is also provided an event summary generating method implemented by a terminal device, the method including:

providing a summary describing a target event according to a request for obtaining the summary of the target event;

wherein the operation of obtaining the summary describing the target event comprises:

acquiring a plurality of text objects describing a target event;

extracting a plurality of semantic keywords from the plurality of text objects;

for any specific text object in the plurality of text objects, calculating a score of the specific text object according to the probability distribution values corresponding to the semantic keywords contained in the specific text object;

and screening, from the plurality of text objects, the summary describing the target event according to the scores of the plurality of text objects.

Optionally, the providing the summary describing the target event according to the request for obtaining the summary of the target event includes:

according to a request for acquiring the abstract of a target event, sending the information of the target event to the server to execute the operation of acquiring the abstract describing the target event;

obtaining an abstract which is returned by the server through executing the operation and describes the target event;

and providing the acquired abstract describing the target event.

Optionally, the method further comprises:

acquiring the user characteristics of the user who sends the request;

and screening, according to the user characteristics, the part of the target events targeted by the request that matches the user characteristics, as the target event for which the abstract is to be obtained.

According to a third aspect of the present disclosure, there is also provided an event summary generation apparatus, including:

the text acquisition module is used for acquiring a plurality of text objects describing the target event;

the extraction module is used for extracting a plurality of semantic keywords from the text objects;

a calculation module for calculating a probability distribution value of each semantic keyword in the plurality of text objects;

the scoring module is used for calculating, for any specific text object in the plurality of text objects, the score of the specific text object according to the probability distribution values corresponding to the semantic keywords contained in the specific text object; and

and the screening module is used for determining whether the specific text can be used as an abstract for describing the target event according to the score of the specific text object.

According to a fourth aspect of the present disclosure, there is also provided an electronic device comprising the apparatus according to the third aspect of the present disclosure; alternatively,

the electronic device comprises a memory and a processor;

the memory stores a computer program which, when executed by the processor, implements the method according to the first or second aspect of the disclosure.

According to a fifth aspect of the present disclosure, there is also provided a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program readable and executable by a computer, and the computer program is configured to execute the method according to the first aspect or the second aspect of the present disclosure when the computer program is read and executed by the computer.

Based on the method for generating the event abstract provided by the embodiment of the disclosure, for any event, the abstract for describing the event can be screened from a plurality of text objects according to the distribution condition of semantic keywords describing the event in the plurality of text objects corresponding to the event. According to the event abstract generating method, the abstract representing the event can be selected from the whole content of the event by extracting the semantic keywords, so that the obtained abstract can comprehensively reflect the important content of the event, and the accuracy and the effectiveness of obtaining the abstract of any event are improved.

Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.

Fig. 1 is a schematic diagram of an application manner of an optional application scenario according to an embodiment of the present disclosure;

FIG. 2 is a functional block diagram of a hardware architecture of an electronic device that can be used to perform a method of event summary generation according to one embodiment;

FIG. 3 is a flow diagram of a method of event summary generation, according to one embodiment;

FIG. 4 is a flow diagram of an event summary generation method according to another embodiment;

FIG. 5 is a functional block diagram of an event summary generation apparatus according to one embodiment.

FIG. 6 is a functional block diagram of an electronic device according to one embodiment.

Detailed Description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

For any event, there are usually many pieces of information commenting or describing the event, the information forms an information cluster of the event, and any piece of information in the information cluster may be text information, audio information, video information, or picture information, and is not limited herein. According to each piece of information in the information cluster, a corresponding text object can be obtained, the text object contains text content for describing the event, and the text content can be directly obtained from the text information or can be obtained by identifying audio information, video information or picture information.

For any event, at least one text object can be selected from a plurality of text objects describing the event to serve as a summary of the event, so that a user can know the main content of the event through the summary. The event can be any event that occurs in society, and the event becomes a news event after being propagated on the network.

Fig. 1 shows an application scenario in which abstract text of an arbitrary event needs to be obtained. The application scenario relates to an electronic content application for providing electronic content, where the electronic content is content stored or distributed in a data form, and may be news, introduction of goods, or articles in a certain professional field, and the electronic content application is client software for providing electronic content, such as an application for providing news or a social media application.

As shown in FIG. 1, the electronic content application provides a plurality of category tags on its home page, for example tags Tab1 to Tab4. When the user clicks Tab2 on the application home page through the terminal device 1200, a request to obtain events belonging to the category of Tab2 is triggered. Tab2 may be any category tag of the application, such as a "recommendation" tag, an "entertainment" tag, or a "history" tag; that is, the electronic content application may be set so that clicking any tag triggers the operation of generating summaries for the events matching that tag, and the generated summaries are presented in the list interface of that tag. Tab2 may also be a specific tag, such as a "key point" tag; that is, the electronic content application may also be set so that only clicking the specific tag triggers the operation of generating summaries for the events matching the tag, and the generated summaries are displayed on the list interface of the specific tag. This is not limited herein.

After receiving the request, the terminal device 1200 sends the request to the server 1100. After receiving the request, the server 1100 acquires the text objects of all events matching the tag and, for each such event, selects at least one text object from the plurality of text objects describing the event as an abstract describing the event. After acquiring the abstract text of each event, the server 1100 configures the terminal device 1200 to display the abstract text on the list interface of Tab2 for the user to read. The user can therefore learn the content of any event by reading its abstract text, without blindly reading many text objects of the event, which improves reading efficiency and reading quality.

Of course, in the application scenario shown in fig. 1, the terminal device 1200 may also complete an operation of selecting an abstract describing any event from a plurality of text objects describing the event, which is not limited herein.

The operation of obtaining the summary of the event may also be applied to other scenarios, for example, in screening of product reviews, in screening of answers in a question and answer community, in screening of reviews for video files, and the like, which is not limited herein.

The above video files may be movies, television episodes, short videos, and the like. In the application scenario, the video file can be used as a target event, and the comment to the video file is used as a text object for describing the video file, so that according to the method of the embodiment, at least one comment is screened out from a plurality of comments describing the video file and is shown to the user as an abstract for describing the video file, and thus, the user can know the main content of the video file according to the abstract of the video file, so as to consider whether to watch the video file according to the abstract.

For any event, denoted herein as a target event, the abstract of the target event may be selected from a plurality of text objects describing the target event in a variety of ways. These ways may include, for example: 1) selecting the earliest published text object from the plurality of text objects describing the target event as the abstract of the event; 2) selecting the text object with the largest forwarding amount or comment amount from the plurality of text objects describing the target event as the abstract of the event; 3) performing word segmentation on each of the plurality of text objects describing the target event, calculating, for each term obtained by the word segmentation (including words, characters, phrases, and the like), an index value under a set index, for example the TF-IDF (term frequency-inverse document frequency) index, then calculating a score for each text object according to the index values of the terms it contains, and finally selecting the one or more text objects with the highest scores as the abstract of the target event.

For the first approach above, since the description of the target event by the earliest published text object is not necessarily complete, it is not necessarily able to represent the entire event. In the second method, the forwarding or comment of the text object depends not only on the quality of the content of the text object itself, but also on the popularity of the user who published the text object, and therefore, the text object with a large forwarding amount or a large comment amount is not necessarily the best text capable of reflecting the complete content of the event. For the third method, since each text object may contain noise content unrelated to the event itself, the abstract text selected based on the index values of all words obtained by word segmentation may not represent an event well.
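For comparison, the third way can be sketched in Python as follows. TF-IDF has several common variants. This sketch assumes term frequency as count divided by text length, inverse document frequency as log(N / df), and a text score equal to the sum over its distinct terms. The function name `tfidf_scores` and the sample tokens are hypothetical.

```python
import math
from collections import Counter
from typing import List


def tfidf_scores(docs: List[List[str]]) -> List[float]:
    """Score each tokenized text as the sum of tf * idf over its distinct terms."""
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per text
    scores = []
    for tokens in docs:
        term_counts = Counter(tokens)
        score = 0.0
        for term, count in term_counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            score += tf * idf
        scores.append(score)
    return scores


# Three hypothetical tokenized text objects; "lol" is noise unrelated to the event.
docs = [["fire", "city", "building"], ["fire", "rescuer"], ["fire", "lol"]]
scores = tfidf_scores(docs)
```

In this toy example the noise term "lol" contributes exactly as much as the event-related term "rescuer", because both are equally rare, which illustrates why scoring over all segmented terms can be misled by noise.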

In this embodiment, an event summary generation method is provided that can more accurately select at least one text object from a plurality of text objects describing the target event as a summary of the target event. The method extracts a plurality of semantic keywords from the plurality of text objects describing the target event and, for any specific text object among them, obtains a score of the specific text object according to the probability distribution values, in the plurality of text objects, of the semantic keywords contained in the specific text object, so that at least one text object can be selected from the plurality of text objects according to the scores as the abstract of the target event. Because the method of this embodiment scores whether each text object can serve as an abstract describing the target event on the basis of semantic keywords, noise unrelated to the event content in the text objects can be effectively filtered out, which improves the accuracy of the obtained abstract.

< hardware configuration >

Fig. 2 illustrates a schematic block diagram of the hardware structure of an electronic device that can be used to execute the event summary generation method of any embodiment of the present invention.

As shown in fig. 2, the electronic device 1000 may be any type of terminal device, and may also be any type of server, including a cloud server, a server cluster, and the like, which is not limited herein.

The terminal device may be a portable computer, a desktop computer, a mobile phone, a tablet computer, and the like, which is not limited herein.

As shown in fig. 2, the electronic device 1000 may include a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, an input device 1600, a speaker 1700, a microphone 1800, and the like, all connected to the processor. The processor 1100 is adapted to execute computer programs. The computer program may be written in an instruction set of an architecture such as x86, Arm, RISC, MIPS, SSE, etc. The memory 1200 includes, for example, a ROM (read-only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1300 includes, for example, a USB interface, a headphone interface, and the like. The communication device 1400 is capable of, for example, wired or wireless communication, and may specifically support Wi-Fi communication, Bluetooth communication, 2G/3G/4G/5G communication, and the like. The display device 1500 is, for example, a liquid crystal display panel. The input device 1600 may include, for example, a touch screen, a keyboard, voice input, somatosensory input, and the like. The speaker 1700 is used to output audio signals. The microphone 1800 is used to collect audio signals.

As applied to the disclosed embodiments, the memory 1200 of the electronic device 1000 is used to store instructions (computer programs) for controlling the processor 1100 to operate so as to perform the event summary generation method provided according to any of the embodiments of the present invention. The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.

It should be understood by those skilled in the art that although a plurality of devices are shown in fig. 2 for the electronic device 1000, the present invention may relate to only some of them; for example, the electronic device 1000 may involve only the processor 1100 and the memory 1200.

In further embodiments, the electronic device 1000 may further include an event summary generation apparatus according to any embodiment of the present invention, and various modules of the event summary generation apparatus may be implemented by the processor 1100 of the electronic device 1000 in the above embodiments.

< method embodiment I >

Fig. 3 is a flowchart of an event summary generation method according to an embodiment. In this embodiment, the event summary generation method may be implemented by any electronic device, for example, the electronic device 1000 in fig. 2, where the electronic device may be a terminal device or a server. Fig. 3 takes obtaining a summary of an arbitrary event, referred to herein as a target event, as an example to illustrate the event summary generation method of this embodiment.

As shown in fig. 3, the event summary generation method of the present embodiment may include the following steps S3100 to S3500:

In step S3100, a plurality of text objects describing a target event are acquired.

In this embodiment, the target event may be any event, such as a news event or a commodity purchase event, which is not limited herein. For example, the event may be a news event about a person being elected to a post.

In this embodiment, the collection of the plurality of text objects describing the target event may be performed by any clustering algorithm, that is, the text objects in the text object set may be classified by the clustering algorithm, so that the plurality of text objects describing the same event (the target event) are stored as one object cluster, and are used for the acquisition in this step S3100, and the like.

The clustering algorithm may be, for example, a K-means algorithm, a hierarchical clustering algorithm, or a Gaussian mixture model (GMM) algorithm. Clustering algorithms are mature techniques and are therefore not described in detail herein.

In this embodiment, the text objects describing the target event may be collected through the unique identifier of the target event to which the text object is directed, for example, in the product review of the product purchase event, each product review includes the unique identifier of the purchased product, and therefore, the text objects describing the target event may be collected directly according to the identifier, so as to be obtained in step S3100.
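Collection by unique identifier can be sketched in Python as follows, for illustration only. The record layout with `product_id` and `text` fields, the function name `group_by_event`, and the sample reviews are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_event(reviews: List[dict]) -> Dict[str, List[str]]:
    """Group review texts by the identifier of the product they refer to."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for review in reviews:
        clusters[review["product_id"]].append(review["text"])
    return dict(clusters)


# Hypothetical product reviews; each carries the identifier of the purchased product.
reviews = [
    {"product_id": "P1", "text": "Arrived quickly."},
    {"product_id": "P2", "text": "Battery life is short."},
    {"product_id": "P1", "text": "Good value."},
]
clusters = group_by_event(reviews)
```

Each value in `clusters` is then the plurality of text objects for one target event, ready to be acquired in step S3100.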

Any text object can be directly acquired from the corresponding text information describing the target event, and can also be identified from the corresponding audio information, video information and picture information describing the target event, which is not limited herein.

In one embodiment, the acquiring a plurality of text objects describing the target event in step S3100 may include: acquiring the plurality of text objects describing the target event when triggered by a set event.

The set event may include an internal event and/or an external event.

The internal event may include, for example, at least one of the expiry of a set timer and the receipt of a new text object.

The external event may include, for example, receiving any request or command, etc., triggered by the user through the terminal device, indicating that a summary of the target event needs to be obtained.

In one embodiment, the acquiring a plurality of text objects describing the target event in step S3100 may include: acquiring a plurality of text objects which describe the target event and meet set conditions. This reduces the number of text objects that need to be scored and improves the response speed.

The set conditions may include at least one of the following: the length index value of the text object, which reflects the text length, is greater than or equal to a first set threshold; the number of times that the text object is forwarded is greater than or equal to a second set threshold; the publication time of the text object is within a set time range from the current time, for example, within one week from the current time.

The first set threshold may be set according to the shortest text length that a text object should have in order to describe the target event reasonably completely.

The second set threshold may be set according to the level of influence, measured by the number of times of being forwarded, that a text object is expected to have.
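The pre-filtering by set conditions can be sketched in Python as follows. The sketch makes several assumptions: the character count is used as the length index value, each condition is optional and is skipped when its threshold is `None`, and the `TextObject` fields, the function name `meets_conditions`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TextObject:
    text: str
    forward_count: int
    published_at: datetime


def meets_conditions(
    obj: TextObject,
    now: datetime,
    min_length: Optional[int] = None,
    min_forwards: Optional[int] = None,
    max_age: Optional[timedelta] = None,
) -> bool:
    """Return True if the text object passes every condition that is enabled."""
    if min_length is not None and len(obj.text) < min_length:
        return False
    if min_forwards is not None and obj.forward_count < min_forwards:
        return False
    if max_age is not None and now - obj.published_at > max_age:
        return False
    return True


# Made-up sample data for illustration.
now = datetime(2020, 4, 17, 12, 0)
objects = [
    TextObject("A fire broke out downtown; no casualties were reported.", 12, datetime(2020, 4, 16)),
    TextObject("wow", 300, datetime(2020, 4, 17)),
    TextObject("Firefighters contained the blaze within two hours.", 1, datetime(2020, 4, 15)),
    TextObject("Old report about the same building from last month.", 40, datetime(2020, 3, 1)),
]
kept = [
    o for o in objects
    if meets_conditions(o, now, min_length=20, min_forwards=5, max_age=timedelta(days=7))
]
```

With all three conditions enabled, only the first text object is kept: the second is too short, the third was forwarded too rarely, and the fourth is older than one week.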

In step S3200, a plurality of semantic keywords are extracted from the plurality of text objects obtained in step S3100.

According to this step S3200, for any text object among the plurality of text objects, a semantic keyword included in the text object is extracted. Any extracted semantic keyword may include a word, a character, or a character string, and is not limited herein.

Semantic keywords are words that reflect set event elements, so the extracted semantic keywords can well reflect how completely the corresponding text object describes the target event.

In one embodiment, the semantic keywords may include one or more of: keywords describing the content of the event, keywords describing the participants of the event, keywords describing the place where the event occurred, and keywords describing the time when the event occurred.

The above event occurrence time (When), event occurrence place (Where), event attendee (Who), and event content (What) constitute four elements of one event, which are simply referred to as 4W elements.

For example, for a fire event, the content of one text object is: "On 2020 + day, a fire occurred at city + building, and thanks to the rescue by rescuers, casualties were avoided …". Here, "2020 + day" is a semantic keyword describing the event occurrence time, "city + building" is a semantic keyword describing the event occurrence place, "fire" is a semantic keyword describing the event content, and "rescuers" is a semantic keyword describing the event participants.

According to this step S3200, for each text object of the plurality of text objects, a corresponding set of semantic keywords having a plurality of semantic keywords extracted from the corresponding text object is obtained. For example, 1000 text objects describing the target event are acquired in step S3100, then in step S3200, for each text object in the 1000 text objects, a plurality of semantic keywords included in the corresponding text object are extracted, so as to obtain 1000 sets of semantic keywords, where one set of semantic keywords corresponds to one text object.

In one embodiment, in order to be able to extract semantic keywords from text objects in a simple manner, a corresponding relationship between an event element and a semantic category may be established to extract a plurality of semantic keywords from a plurality of text objects according to the semantic category, wherein the event element includes an event occurrence time (When), an event occurrence place (Where), an event attendee (Who), an event content (What), and the like.

The semantic categories refer to: classes of language expressions divided according to semantic relationships.

In this embodiment, one event element may correspond to one semantic category, or may correspond to at least two semantic categories.

For example, in this embodiment, the following 7 semantic categories may be set:

Named Entity: entity words, such as the name of a person, corresponding to the event participant element;

Mention: username mentions, such as "@ person", corresponding to the event participant element;

Location: places, such as a city, corresponding to the event occurrence place element;

Verb: verbs, such as "robbery", corresponding to the event content element;

Noun: nouns, such as "merger cases", corresponding to the event content element;

Hashtag: topic tags, such as "# explode", corresponding to the event content element;

Temporal Info: time expressions, such as "morning today", corresponding to the event occurrence time element.

In this embodiment, the extracting a plurality of semantic keywords from a plurality of text objects in step S3200 may include the following steps: acquiring a plurality of set semantic categories; and extracting a plurality of semantic keywords which accord with the plurality of semantic categories from the plurality of text objects.

In this embodiment, semantic words that conform to the set semantic categories may be extracted from any text object according to the definitions of the plurality of semantic categories as semantic keywords that embody the set event elements.
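As an illustration only, definition-based extraction can be sketched for the two categories that have an unambiguous surface form, the topic tag and the username mention. The pattern table, the function name `extract_keywords`, and the lower-casing of matches are assumptions made for this sketch and are not taken from the embodiment; the other categories would need a tagger or the recognition model described below.

```python
import re

# Hypothetical patterns for two of the set semantic categories.
CATEGORY_PATTERNS = {
    "Hashtag": re.compile(r"#\w+"),
    "Mention": re.compile(r"@\w+"),
}


def extract_keywords(text):
    """Return the set of (category, keyword) pairs found in one text object."""
    found = set()
    for category, pattern in CATEGORY_PATTERNS.items():
        for match in pattern.findall(text):
            # Lower-case so that "#Explosion" and "#explosion" count as one keyword.
            found.add((category, match.lower()))
    return found


keywords = extract_keywords("Fire at the port #Explosion reported by @Alice")
```

Returning a set per text object matches the one-set-per-text-object arrangement described for step S3200.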

In this embodiment, a recognition model obtained through pre-training may also be used to extract semantic words belonging to multiple semantic categories from any text object, and the semantic words may be used as the extracted multiple semantic keywords.

The recognition model may be a model capable of recognizing multiple semantic categories, and the recognition model may also be formed by connecting multiple sub-recognition models in series, where one sub-recognition model is used to recognize a word of one semantic category, and is not limited herein.

In this embodiment, the method may further comprise the step of obtaining the recognition model. For example, obtaining the recognition model may include: obtaining semantic word samples, wherein each semantic word sample has a known semantic category label; determining model parameters of a set classification model according to the semantic word samples; and obtaining the recognition model according to the classification model and the determined model parameters.

In this embodiment, the classification model may be a model based on any classification algorithm, such as a binary or multi-classification model, and the like, which is not limited herein.

Step S3300, a probability distribution value of each semantic keyword in the plurality of text objects is calculated.

The probability distribution value of any semantic keyword in the plurality of text objects reflects how frequently that semantic keyword occurs in the plurality of text objects.

In one embodiment, calculating the probability distribution value of each semantic keyword in the plurality of text objects in this step may include: for any semantic keyword, searching the plurality of text objects acquired in step S3100 for the text objects containing that semantic keyword; and calculating the ratio of the number of text objects found to the total number of the plurality of text objects as the probability distribution value of that semantic keyword in the plurality of text objects.
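A minimal sketch of this ratio calculation follows. It assumes that each text object has already been reduced to a set of extracted keywords; the function name `probability_distribution` and the sample data are hypothetical.

```python
def probability_distribution(keyword_sets):
    """Map each keyword to the fraction of text objects that contain it.

    keyword_sets: one collection of extracted keywords per text object.
    """
    total = len(keyword_sets)
    if total == 0:
        return {}
    counts = {}
    for keywords in keyword_sets:
        # set() ensures a text object is counted once per keyword.
        for keyword in set(keywords):
            counts[keyword] = counts.get(keyword, 0) + 1
    return {keyword: n / total for keyword, n in counts.items()}


keyword_sets = [{"fire", "port"}, {"fire"}, {"port", "morning"}, {"fire"}]
p = probability_distribution(keyword_sets)
# "fire" appears in 3 of the 4 text objects, so p["fire"] is 0.75.
```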

For example, for any semantic keyword t_x extracted in step S3200, if N_x of the N text objects acquired in step S3100 include the semantic keyword t_x, the probability distribution value p(t_x) of the semantic keyword t_x among the N text objects is:

$$p(t_x) = \frac{N_x}{N} \tag{1}$$

Step S3400, for any specific text object among the plurality of text objects acquired in step S3100, a score of the specific text object is calculated from a probability distribution value corresponding to a semantic keyword included in the specific text object.

For example, if five semantic keywords are extracted for any specific text object among the plurality of text objects acquired in step S3100, in step S3400, a score of the specific text object may be obtained from a probability distribution value corresponding to each of the five semantic keywords.

In this embodiment, for the specific text object, for example, the sum of the probability distribution values of the semantic keywords included in the specific text object, or other parameter values reflecting the "sum" may be used as the score of the specific text object.

In this embodiment, for the specific text object, for example, an average value of probability distribution values of semantic keywords included in the specific text object, or other parameter values reflecting the "average value" may also be used as the score of the specific text object.
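The sum and average variants can be sketched as follows. The sketch assumes `p` is a mapping from keyword to probability distribution value; the function names and the choice to score a text object without keywords as 0 are assumptions.

```python
def score_sum(keywords, p):
    """Sum of the probability distribution values of a text object's keywords."""
    return sum(p.get(keyword, 0.0) for keyword in keywords)


def score_mean(keywords, p):
    """Average of the probability distribution values of a text object's keywords."""
    keywords = list(keywords)
    if not keywords:
        return 0.0  # assumption: a text object without keywords scores 0
    return score_sum(keywords, p) / len(keywords)


p = {"fire": 0.75, "port": 0.5}
total = score_sum(["fire", "port"], p)
average = score_mean(["fire", "port"], p)
```

The sum favors text objects that contain many frequent keywords, while the average does not reward the number of keywords as such.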

In this embodiment, for the specific text object, for example, a weighted sum of the probability distribution values of the semantic keywords may be calculated using the weight coefficient of each semantic keyword, so as to obtain the score of the specific text object. This can be expressed as the following formula (2):

$$Y = \sum_{t=1}^{m} W_t \cdot p(t) \tag{2}$$

In formula (2), Y represents the score of the specific text object, t indexes the semantic keywords of the specific text object, m represents the total number of semantic keywords extracted from the specific text object, W_t represents the weight coefficient of the t-th semantic keyword, and p(t) represents the probability distribution value of the t-th semantic keyword.

In this embodiment, the weight coefficient of any semantic keyword may be determined according to the semantic category to which the semantic keyword belongs, or may be directly determined according to the corresponding event element. Here, the event elements and the semantic categories have a certain mapping relationship, and one event element may correspond to one semantic category or at least two semantic categories.

In the example of determining the weight coefficient of any semantic keyword according to semantic categories, respective weight coefficients may be set for different semantic categories, and the sum of the weight coefficients of these semantic categories is equal to 1.

In this example, determining the weight coefficient of any semantic keyword may include: determining the semantic category to which that semantic keyword belongs; and acquiring the weight coefficient corresponding to that semantic category as the weight coefficient of that semantic keyword.
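The category-based weighting can be sketched as below. The weight values are purely illustrative (chosen only so that they sum to 1), and the category labels, the `(category, keyword)` pair representation, and the function name `weighted_score` are assumptions of this sketch.

```python
# Hypothetical per-category weight coefficients that sum to 1.
CATEGORY_WEIGHTS = {
    "NamedEntity": 0.2,
    "Mention": 0.1,
    "Location": 0.2,
    "Verb": 0.15,
    "Noun": 0.15,
    "Hashtag": 0.1,
    "TemporalInfo": 0.1,
}


def weighted_score(keywords, p, weights=CATEGORY_WEIGHTS):
    """Weighted sum of probability distribution values.

    keywords: iterable of (category, keyword) pairs of one text object.
    p:        mapping from keyword to its probability distribution value.
    """
    return sum(
        weights.get(category, 0.0) * p.get(keyword, 0.0)
        for category, keyword in keywords
    )


p = {"port": 0.5, "explode": 0.8}
y = weighted_score([("Location", "port"), ("Verb", "explode")], p)
# 0.2 * 0.5 + 0.15 * 0.8 = 0.22
```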

In the example of determining the weighting factor of any semantic keyword from event elements, respective weighting factors may be set for different event elements, and the sum of the weighting factors of these event elements is equal to 1 as well.

In this example, determining the weight coefficient of any semantic keyword may include: determining the event element to which that semantic keyword belongs; and acquiring the weight coefficient corresponding to that event element as the weight coefficient of that semantic keyword.

In this embodiment, the weight coefficients corresponding to different semantic categories or different event elements may be preset fixed values.

In this embodiment, the weighting coefficients corresponding to different semantic categories or different event elements may also be determined according to a score model obtained through pre-training.

In one embodiment, obtaining the weight coefficient of each of the set semantic categories may include the following steps S3011 to S3013:

Step S3011, a plurality of text object samples describing the event sample are acquired.

In this embodiment, any event may be used as an event sample. Multiple text object samples of one event sample may be selected to participate in model training, or multiple text object samples of multiple event samples may be selected to participate in model training, which is not limited herein.

In this embodiment, the text object sample is a text object with a known score, the score reflects an adaptation degree of taking the corresponding text object sample as the abstract of the corresponding event sample, and the higher the score value is, the more suitable the corresponding text object sample is as the abstract of the corresponding event sample, that is, the text object sample has a score label, and the label may be manually labeled or provided by a label model, which is not limited herein.

Step S3012, extracting semantic keywords conforming to the semantic categories from the text object samples, where the semantic categories include the semantic categories to which the text object samples belong.

The plurality of semantic categories includes, for example, the above 7 semantic categories.

Step S3013 is to determine a weight coefficient for each of the plurality of semantic categories based on the semantic keyword included in the plurality of text object samples and the score of the plurality of text object samples.

In step S3013, a polynomial model used for calculating the score of a text object sample, i.e., a score model, may be trained on the text object samples, and the model parameters determined by the training are the weight coefficients of the plurality of semantic categories.
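One possible way to fit such weight coefficients is sketched below. It assumes the score model is a weighted sum with one weight per semantic category, that each sample is summarized by one feature per category (the sum of the probability distribution values of its keywords in that category), and that the fit minimizes squared error by plain gradient descent. The function name, the feature layout, the learning rate, and the sample data are all hypothetical; the embodiment does not prescribe a training procedure.

```python
def fit_category_weights(features, scores, lr=0.1, epochs=2000):
    """Least-squares fit of one weight coefficient per semantic category.

    features[i][c]: feature of sample i for category c.
    scores[i]:      known (labelled) score of sample i.
    """
    n_samples = len(features)
    n_categories = len(features[0])
    weights = [0.0] * n_categories
    for _ in range(epochs):
        grads = [0.0] * n_categories
        for row, target in zip(features, scores):
            # Prediction error of the current weights on this sample.
            error = sum(w * x for w, x in zip(weights, row)) - target
            for c, x in enumerate(row):
                grads[c] += 2.0 * error * x / n_samples
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Toy data generated from the weights (0.7, 0.3), used only to show the call.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
scores = [0.7, 0.3, 1.0, 0.41]
weights = fit_category_weights(features, scores)
```

The sketch does not force the fitted weights to sum to 1; if that property is wanted, the weights could be divided by their sum afterwards.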

Similarly, in the example of determining the weight coefficient of any semantic keyword according to the event element, a scoring model using the weight coefficient of the event element as a model parameter may also be trained in a similar manner, so as to determine the weight coefficient of each of the event elements, which is not described herein again.

Step S3500, determining whether the specific text can be used as an abstract for describing the target event according to the score of the specific text object.

In step S3500, a set number of text objects with the highest score may be selected as the abstract describing the target event. The set number may be 1 or another integer greater than 1. The set number may also be determined according to the total number of the text objects acquired in step S3100, for example, the set number is a set percentage of the total number, and the like, and is not limited herein.
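Selecting the set number of highest-scoring text objects can be sketched as follows; the function name `select_summaries` and the sample data are hypothetical.

```python
def select_summaries(texts, scores, k=1):
    """Return the k text objects with the highest scores, best first."""
    ranked = sorted(zip(texts, scores), key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:k]]


texts = ["text a", "text b", "text c"]
scores = [0.2, 0.9, 0.5]
summaries = select_summaries(texts, scores, k=2)
```

Python's `sorted` is stable, so text objects with equal scores keep their original relative order.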

In one embodiment, each of the plurality of text objects is used as the specific text object, so that the respective scores of the plurality of text objects can be obtained. Thus, based on the score of the specific text object and its rank among the scores of the plurality of text objects, it may be determined whether the specific text object can serve as a summary describing the target event.

As can be seen from the above steps S3100 to S3500, the method of this embodiment may screen at least one text object from the plurality of text objects describing the event, according to the distribution of the semantic keywords describing the event among those text objects, as the abstract describing the event. Because the semantic keywords contained in the text objects are extracted, noise content contained in the text objects can be effectively filtered out, so that an abstract capable of describing the event can be selected from the overall content of the event. The obtained abstract can therefore reflect the important content of the event more comprehensively, which improves the accuracy and effectiveness of obtaining the abstract text of any event.

In one embodiment, on the basis of the scores of the text objects, the abstracts describing the target events can be further screened in combination with other index values of the text objects, so as to realize comprehensive consideration for different indexes or realize regularization of the scores.

In this embodiment, the method may further include the following steps S3600 to S3700:

Step S3600, scores of a plurality of specific text objects are obtained.

In this embodiment, according to step S3400, each of the plurality of text objects may be regarded as a specific text object to calculate a score of each of the plurality of text objects.

Step S3700, the plurality of scores are regularized.

In this embodiment, by regularizing the scores, each score may be scaled to a unit norm, thereby improving the accuracy of selecting the summary describing the target event according to the score. In one embodiment, regularizing the scores in step S3700 may include regularizing the plurality of scores according to:

$$Z = Y \times \frac{\log(1+r)}{\log(1+n)}$$

Wherein, Y is the score of the processed specific text object, r is the number of times the processed specific text object is forwarded, n is a length index value reflecting the text length of the processed specific text object, and Z is the score after the regularization processing of the processed specific text object.

The length index value is, for example, the number of words, the number of characters, or the like included in the specific text object to be processed, and is not limited herein.

In this embodiment, the scores of the specific text objects are regularized, so that the advantage of the text object with a long text length in selecting the abstract text can be weakened, and the purpose of screening the abstract describing the target event according to the text content can be achieved.
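The regularization can be sketched as below, with Z equal to Y multiplied by log(1 + r) and divided by log(1 + n). The function name `regularize` and the handling of a non-positive length index (returning 0 to avoid dividing by log(1) = 0) are assumptions of this sketch.

```python
import math


def regularize(score, forwards, length):
    """Scale a score by log(1 + forwards) / log(1 + length).

    score:    score Y of the text object.
    forwards: number of times r the text object was forwarded.
    length:   length index value n (e.g. number of words or characters).
    """
    if length <= 0:
        return 0.0  # assumption: empty text objects are never selected
    return score * math.log(1 + forwards) / math.log(1 + length)


z = regularize(2.0, forwards=9, length=99)
# log(10) / log(100) is 0.5, so z is approximately 1.0
```

Because the formula is a ratio of two logarithms, the base of the logarithm does not affect the result. Note that under this formula a text object that has never been forwarded (r = 0) receives a regularized score of 0.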

In one embodiment, after the above step S3500, the method may further include the following steps: and providing a summary which is screened out aiming at the target event and describes the target event.

In this embodiment, in a case where the method is implemented by the server 1100 shown in fig. 1, providing the summary describing the target event filtered for the target event may include: providing the summary describing the target event screened out for the target event to the terminal device 1200 for the terminal device 1200 to display.

In this embodiment, in a case where the method is implemented by the terminal device 1200 shown in fig. 1, providing the summary describing the target event filtered for the target event may include: the abstracts which are screened out for the target event and describe the target event are presented through the terminal device 1200, so that the abstracts of the target event are provided for the user.

< method embodiment two >

Fig. 4 is a flowchart illustrating an event summary generation method according to another embodiment. In this embodiment, the event summary generation method is implemented by a terminal device, for example, the terminal device 1200 shown in fig. 1.

As shown in fig. 4, the event summary generation method of this embodiment may include the following steps:

Step S4100: in accordance with a request to obtain a summary of a target event, a summary describing the target event is provided.

In this embodiment, the target event for which the request is directed may be one target event, or may include a plurality of target events conforming to the selected tag category, which is not limited herein.

In the case where the target event for which the request is directed includes a plurality of target events that conform to the selected tag category, the method of this embodiment may further include the steps of: and according to the user characteristics of the user sending the request, screening out target events matched with the user characteristics from the plurality of target events to be used as the target events of the abstract text to be acquired.

The user characteristics include age, gender, educational background, historical browsing data, and the like, but are not limited thereto.

In this embodiment, the operation of acquiring the summary text describing any target event may include the following steps S4211 to S4215:

Step S4211, a plurality of text objects describing the target event are acquired.

Step S4212, extracting a plurality of semantic keywords from the plurality of text objects.

Step S4213, calculating a probability distribution value of each semantic keyword in the plurality of text objects.

Step S4214, for any specific text object in the plurality of text objects, calculating a score of the specific text object according to a probability distribution value corresponding to a semantic keyword included in the specific text object.

Step S4215, selecting an abstract describing the target event from the plurality of text objects according to the scores of the plurality of text objects.

In one embodiment, the step S4100 of providing the summary describing the target event according to the request for obtaining the summary of the target event may include the following steps S4111 to S4113:

Step S4111, according to the request for obtaining the abstract of the target event, sending the information of the target event to a server to execute an operation of obtaining an abstract text describing the target event.

The information of the target event is used for determining the category of the target event, and the information may include at least one of a search keyword of the target event, a tag category to which the target event belongs, and the like.

Step S4112, obtain a summary describing the target event returned by the server by executing the operation.

Step S4113, providing a summary describing the target event returned by the server.

This step S4113 may include: and providing a list item of a summary describing the target event in a list interface of the target event. The user can enter the detail interface of the corresponding abstract by clicking any list item of the list interface.

In another embodiment, the operation of obtaining the summary describing the target event may also be implemented by the terminal device 1200, which is not limited herein.

< apparatus embodiment >

In one embodiment, an event summary generation apparatus is also provided, and fig. 5 shows a schematic block diagram of the event summary generation apparatus 5000 in one embodiment.

As shown in fig. 5, the event summary generation apparatus 5000 may include a text acquisition module 5100, an extraction module 5200, a calculation module 5300, a scoring module 5400, and a filtering module 5500.

The text acquisition module 5100 is configured to obtain a plurality of text objects describing a target event.

The extraction module 5200 is used for extracting a plurality of semantic keywords from a plurality of text objects.

The calculation module 5300 is configured to calculate a probability distribution value of each semantic keyword among the plurality of text objects.

The scoring module 5400 is configured to calculate, for any specific text object in the plurality of text objects, a score of the specific text object according to a probability distribution value corresponding to a semantic keyword included in the specific text object.

The filtering module 5500 is configured to determine whether the specific text can be used as a summary for describing the target event according to the score of the specific text object.

In one embodiment, the semantic keywords include one or more of: keywords describing the content of the event, keywords describing the participants of the event, keywords describing the place where the event occurred, and keywords describing the time when the event occurred.

In one embodiment, the scoring module 5400 may also be used to: obtaining scores for a plurality of specific textual objects; and regularizing the plurality of scores.

In one embodiment, the scoring module 5400, when regularizing the plurality of scores, may be configured to: regularize the plurality of scores according to $Z = Y \times \log(1+r) / \log(1+n)$, where Y is the score of any specific text object, r is the number of times that specific text object has been forwarded, n is a length index value reflecting the text length of that specific text object, and Z is the regularized score of that specific text object.

In one embodiment, the filtering module 5500, when determining whether the specific text can be used as a summary for describing the target event according to the score of the specific text object, may be configured to: sorting the plurality of specific text objects according to the scores of the plurality of specific text objects; and selecting a predetermined number of text objects among the sorted text objects as a summary describing the target event.

In one embodiment, the calculation module 5300, when calculating the probability distribution value for each semantic keyword among the plurality of text objects, may be configured to: for any semantic keyword, search the plurality of text objects for the text objects containing that semantic keyword; and calculate the ratio of the number of text objects found to the total number of the plurality of text objects as the probability distribution value of that semantic keyword in the plurality of text objects.

In one embodiment, the scoring module 5400, when calculating the score of the specific text object according to the probability distribution value corresponding to the semantic keyword included in the specific text object, may be configured to: and calculating the score of the specific text object according to the probability distribution value corresponding to the semantic keyword contained in the specific text object and the weight coefficient of the semantic keyword contained in the specific text object.

In one embodiment, the apparatus 5000 may further include a weight determination module configured to determine a weight coefficient of any semantic keyword. When determining the weight coefficient of any semantic keyword, the weight determination module may be configured to: determine the semantic category to which that semantic keyword belongs; and acquire the weight coefficient corresponding to that semantic category as the weight coefficient of that semantic keyword.

In one embodiment, the weight determination module is further configured to obtain a weight coefficient for each of the plurality of semantic categories. The weight determination module, when obtaining the weight coefficient of each of the plurality of semantic categories, may be configured to: obtain a plurality of text object samples describing an event sample, wherein each text object sample has a known score reflecting how suitable the corresponding text object sample is as a summary of the event sample; extract semantic keywords conforming to the plurality of semantic categories from the text object samples, wherein the plurality of semantic categories include the semantic categories to which the text object samples belong; and determine the weight coefficient of each semantic category according to the semantic keywords contained in the text object samples and the scores of the text object samples.

In one embodiment, the extraction module 5200, when extracting a plurality of semantic keywords in a plurality of text objects, can be configured to: acquiring a plurality of set semantic categories; and extracting a plurality of semantic keywords which accord with a plurality of semantic categories from the plurality of text objects.

In one embodiment, the extraction module 5200, when extracting a plurality of semantic keywords conforming to the plurality of semantic categories from the plurality of text objects, can be configured to: extract, according to a preset recognition model, a plurality of semantic words conforming to the semantic categories from the plurality of text objects as the plurality of semantic keywords.

In one embodiment, the apparatus 5000 may further include a model generation module configured to obtain the recognition model. The model generation module, when obtaining the recognition model, may be configured to: obtain semantic word samples, wherein each semantic word sample has a known semantic category label; determine model parameters of the selected classification model according to the semantic word samples; and obtain the recognition model according to the classification model and the determined model parameters.

In one embodiment, the apparatus 5000 may further include a summarization processing module configured to provide obtained summary text describing the target event.

In one embodiment, the text obtaining module 5100, when obtaining text objects describing a target event, may be configured to: and executing the operation of acquiring each text object describing the target event according to the detected set event.

< device embodiment >

In one embodiment, an electronic device is further provided, as shown in fig. 6, the electronic device 6000 may include the event summary generating apparatus 5000 according to any embodiment of the present invention.

In another embodiment, the electronic device 6000 may further include a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the event summary generation method according to any of the embodiments of the present invention.

In this embodiment, the electronic device may be, for example, the electronic device 1000 shown in fig. 2. The electronic device may be any terminal device, may also be any server, and is not limited herein.

The disclosed embodiments also provide a computer readable medium on which a computer program is stored, which when executed by a processor implements an event summary generation method according to any embodiment of the present invention.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the device and apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.

The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.

The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of computer-readable program instructions, which can execute the computer-readable program instructions.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, by software, and by a combination of software and hardware are equivalent.

Having described embodiments of the present invention, the foregoing description is exemplary, not exhaustive, and is not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims

1. An event summary generation method, comprising:

acquiring a plurality of text objects describing a target event;

extracting a plurality of semantic keywords from the plurality of text objects;

2. The method of claim 1, the semantic keywords comprising one or more of: keywords describing the content of the event, keywords describing the participants of the event, keywords describing the place where the event occurred, and keywords describing the time when the event occurred.

3. The method of claim 1, further comprising:

obtaining scores for a plurality of specific text objects;

and carrying out regularization processing on the plurality of scores.

4. The method of claim 3, the regularizing the plurality of scores comprising:

5. The method of claim 3, said determining whether the particular text can be used as a summary to describe the target event based on the score of the particular text object, comprising:

ranking the plurality of specific text objects according to their scores;

6. The method of claim 1, wherein said calculating a probability distribution value for each semantic keyword among the plurality of text objects comprises:

7. The method of claim 1, wherein the calculating the score of the specific text object according to the probability distribution value corresponding to the semantic keyword included in the specific text object comprises:

8. The method of claim 7, wherein the method further comprises the step of determining a weight coefficient for any semantic keyword, comprising:

determining the semantic category to which said any semantic keyword belongs;

9. The method of claim 8, wherein the method further comprises obtaining a weighting factor for each of a plurality of semantic categories, comprising:

10. The method of claim 1, wherein said extracting a plurality of semantic keywords among said plurality of text objects comprises:

acquiring a plurality of set semantic categories;

11. The method of claim 10, wherein said extracting, among the plurality of text objects, a plurality of semantic keywords that conform to the plurality of semantic categories comprises:

12. The method of claim 11, wherein the method further comprises the step of obtaining the recognition model, comprising:

13. An event summary generation method implemented by a terminal device includes:

acquiring a plurality of text objects describing a target event;

extracting a plurality of semantic keywords from the plurality of text objects;

and screening, from the plurality of text objects, a summary describing the target event according to the scores of the plurality of text objects.

14. The method of claim 13, wherein said providing a summary describing a target event in accordance with the request to obtain the summary of the target event comprises:

and providing the acquired abstract describing the target event.

15. The method of claim 13, wherein the method further comprises:

acquiring the user characteristics of the user who sends the request;

16. An event summary generation apparatus, comprising:

17. An electronic device comprising the apparatus of claim 16; or,

the electronic device comprises a memory and a processor;

the memory stores a computer program which, when executed by the processor, implements the method according to any one of claims 1-15.

18. A computer-readable storage medium, in which a computer program is stored which is readable and executable by a computer, the computer program being adapted to perform the method according to any one of claims 1-15 when read and executed by the computer.
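
For illustration only, the flow summarized in the abstract (keyword probability distribution, per-text scoring, regularization of scores, and ranking) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the keywords are supplied directly instead of being extracted by a recognition model, the probability distribution value is assumed to be the fraction of text objects containing a keyword, the optional per-keyword `weights` stand in for the weight coefficients, and regularization is assumed to be min-max scaling. All function and parameter names are hypothetical.

```python
from collections import Counter


def keyword_probabilities(texts, keywords):
    """Assumed definition: fraction of text objects that contain each keyword."""
    if not texts:
        return {}
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for kw in keywords:
            if kw in lowered:
                counts[kw] += 1
    return {kw: counts[kw] / len(texts) for kw in keywords}


def score_texts(texts, keywords, weights=None):
    """Score each text by summing the (optionally weighted) probability
    values of the keywords it contains. Weights default to 1.0."""
    probs = keyword_probabilities(texts, keywords)
    weights = weights or {}
    scores = []
    for text in texts:
        lowered = text.lower()
        scores.append(
            sum(probs[kw] * weights.get(kw, 1.0) for kw in keywords if kw in lowered)
        )
    return scores


def normalize(scores):
    """Assumed regularization: min-max scaling of the scores into [0, 1]."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def select_summaries(texts, keywords, top_k=1, weights=None):
    """Rank text objects by normalized score and keep the top_k candidates."""
    scores = normalize(score_texts(texts, keywords, weights))
    ranked = sorted(zip(texts, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    posts = [
        "Earthquake hits Tokyo on Monday",
        "Tokyo earthquake: trains halted",
        "So scary!",
    ]
    print(select_summaries(posts, ["earthquake", "tokyo", "monday"]))
```

In this sketch a text object that mentions more of the widely shared keywords ranks higher, so a post covering the content, place, and time of the event is preferred over a short reaction that contains none of them.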