CN111814770A

CN111814770A - Content keyword extraction method of news video, terminal device and medium

Info

Publication number: CN111814770A
Application number: CN202010919780.5A
Authority: CN
Inventors: 周凡
Original assignee: Shenzhen Research Institute of Sun Yat Sen University
Current assignee: Shenzhen Research Institute of Sun Yat Sen University
Priority date: 2020-09-04
Filing date: 2020-09-04
Publication date: 2020-10-23
Anticipated expiration: 2040-09-04
Also published as: CN111814770B

Abstract

The application is applicable to the technical field of video processing, and provides a content keyword extraction method of a news video, terminal equipment and a medium, wherein a news text is obtained by performing content extraction operation on a target news video, and a corresponding word set is obtained by performing word segmentation processing on the news text; determining a news title of a target news video, and acquiring a preset named entity set; inputting the news text, the word set, the news headline and the named entity set into a trained keyword extraction model for processing to obtain a word score value matrix corresponding to the word set; the total score of the words is determined according to the probability of the words appearing in the news text, the relevance of the words and news titles, the distribution position score of the words in the news text and the matching degree of the words and the named entity set; and determining the target words meeting the preset conditions in the word set as the content keywords of the news video, so that the accuracy of the extracted content keywords is improved.

Description

Content keyword extraction method of news video, terminal device and medium

Technical Field

The application belongs to the technical field of video processing, and particularly relates to a content keyword extraction method of a news video, a terminal device and a computer readable storage medium.

Background

News video conveys news information using video as the medium and information carrier. A news video usually presents the content of a news event clearly, including the people, time and place involved and the cause, course and outcome of the event. This key information can be summarized in a few words that capture the main content of the news video; these words are called the content keywords of the news video. To classify or search news videos, it is generally necessary to extract their content keywords.

In conventional methods, content keywords are generally extracted from the news text corresponding to the news video by using a document topic generation (LDA) model. However, news text is a special genre with specific natural language characteristics, and for such text the accuracy of the content keywords extracted by the LDA model is low.

Disclosure of Invention

In view of this, embodiments of the present application provide a method for extracting content keywords of a news video, a terminal device, and a computer-readable storage medium, so as to solve the problem that the accuracy of content keywords extracted by an existing method for extracting content keywords of a news video based on an LDA model is low.

In a first aspect, an embodiment of the present application provides a method for extracting content keywords of a news video, including:

performing content extraction operation on a target news video to obtain a news text for describing the target news video, and performing word segmentation processing on the news text to obtain a word set corresponding to the news text;

determining a news title of the target news video, and acquiring a preset named entity set;

inputting the news text, the word set, the news headline and the named entity set into a trained keyword extraction model for processing to obtain a word score value matrix corresponding to the word set; wherein the value of each element in the word score value matrix represents the total score value of the word in the word set that corresponds to the element; the total score value of a word is determined according to the probability of the word appearing in the news text, the relevance of the word to the news headline, the distribution position score value of the word in the news text, and the matching degree of the word with the named entity set;

and determining target words meeting preset conditions in the word set according to the word score value matrix, and determining the target words as content keywords of the news video.
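The final selection step can be sketched as follows. The text does not specify the preset condition here, so this minimal Python sketch assumes it is "the top_k highest total score values"; the function name `select_keywords`, the parameter `top_k` and the sample scores are hypothetical.

```python
from typing import List, Sequence


def select_keywords(words: Sequence[str], scores: Sequence[float], top_k: int = 3) -> List[str]:
    """Return the top_k words ranked by total score value.

    Assumption: the "preset condition" is a top-k cut; the source does not
    state which condition is used.
    """
    if len(words) != len(scores):
        raise ValueError("words and scores must have the same length")
    # sorted() is stable, so words with equal scores keep their original order.
    ranked = sorted(range(len(words)), key=lambda i: -scores[i])
    return [words[i] for i in ranked[:top_k]]


# Hypothetical word set and total score values (one score per word).
words = ["ceremony", "oscar", "weather", "award"]
scores = [2.5, 4.0, 0.3, 3.1]
keywords = select_keywords(words, scores, top_k=2)
print(keywords)
```

A score threshold would be an equally plausible preset condition; only the ranking key would change.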

Optionally, the inputting the news text, the word set, the news headline, and the named entity set into a trained keyword extraction model for processing to obtain a word score value matrix corresponding to the word set includes:

inputting the news text, the word set, the news headline and the named entity set into a trained keyword extraction model;

determining, by a document topic generation unit, for each term in the set of terms, a probability of the term appearing in the news text;

for each word in the word set, determining the coincidence quantity of the word and the word contained in the news title, and determining the relevancy of the word and the news title according to the coincidence quantity of the word and the word contained in the news title;

for each word in the word set, determining the position of the word in the news text, and determining the distributed position score value of the word according to the position of the word in the news text;

determining the number of named entities matched with the words in the named entity set aiming at each word in the word set, and determining the matching degree of the words and the named entity set according to the number of the named entities matched with the words in the named entity set;

and for each word in the word set, carrying out weighted summation operation on the probability of the word appearing in the news text, the correlation degree of the word and the news title, the distribution position score value of the word and the matching degree of the word and the named entity set to obtain the total score value of the word.

Optionally, the determining the relevancy of the word to the news headline according to the number of coincidence between the word and the word included in the news headline includes:

calculating the relevance of the word to the news headline according to a formula (the formula image is not reproduced in this text), whose terms denote the following: the relevance of the ith word in the word set to the news headline; the ith word in the word set; the news headline, written title; and the number of coincidences between the ith word in the word set and the words contained in the news headline.

Optionally, the determining the distributed location score value of the word according to the location of the word in the news text includes:

determining the distribution position score value of the word according to a formula (the formula image is not reproduced in this text), whose terms denote the following: the distribution position score value of the ith word in the word set; the ith word in the word set; an indicator that the ith word in the word set is distributed in the first three sentences of the news text; and an indicator that the ith word in the word set is distributed in the last three sentences of the news text. Here n is the number of sentences contained in the news text.

Optionally, the determining the matching degree of the term and the named entity set according to the number of the named entities matched with the term in the named entity set includes:

calculating the matching degree of the word with the named entity set according to a formula (the formula image is not reproduced in this text), whose terms denote the following: the matching degree of the ith word in the word set with the named entity set; the ith word in the word set; and the number of named entities in the named entity set that are matched by the ith word in the word set.

Optionally, before the news text, the word set, the news headline, and the named entity set are input into a trained keyword extraction model for processing to obtain a word score matrix corresponding to the word set, the method further includes:

training a pre-constructed keyword extraction model by adopting a principal component analysis method based on a preset sample set to obtain the trained keyword extraction model; each sample data in the preset sample set comprises a sample news text for describing news content of a sample news video, a word set corresponding to the sample news text, a news title of the sample news video, a preset named entity set and content keywords of the sample news video; when the pre-constructed keyword extraction model is trained, the sample news text, the word set corresponding to the sample news text, the news title of the sample news video and a preset named entity set in each sample datum are used as the input of the pre-constructed keyword extraction model, and the content keywords of the sample news video in each sample datum are used as the output of the pre-constructed keyword extraction model.

Optionally, the performing a content extraction operation on the target news video to obtain a news text for describing the target news video includes:

performing framing operation on the video stream of the target news video, and performing optical character recognition operation on a plurality of video frame images obtained by the framing operation to obtain a first text segment corresponding to each video frame image;

based on the video frame images, performing segmentation operation on the audio stream of the target news video to obtain an audio segment corresponding to each video frame image, and performing voice recognition operation on the audio segment corresponding to each video frame image to obtain a second text segment corresponding to each video frame image;

determining a target text segment corresponding to each video frame image according to a first text segment and a second text segment corresponding to each video frame image;

and splicing the target text segments corresponding to the video frame images according to the sequence of the time nodes corresponding to the video frame images from early to late to obtain the news text.

In a second aspect, an embodiment of the present application provides a terminal device, including:

the first processing unit is used for carrying out content extraction operation on a target news video to obtain a news text for describing the target news video, and carrying out word segmentation processing on the news text to obtain a word set corresponding to the news text;

the first acquisition unit is used for determining a news title of the target news video and acquiring a preset named entity set;

the second processing unit is used for inputting the news text, the word set, the news headline and the named entity set into a trained keyword extraction model for processing to obtain a word score value matrix corresponding to the word set; wherein the value of each element in the word score value matrix represents the total score value of the word in the word set that corresponds to the element; the total score value of a word is determined according to the probability of the word appearing in the news text, the relevance of the word to the news headline, the distribution position score value of the word in the news text, and the matching degree of the word with the named entity set;

and the first determining unit is used for determining a target word meeting a preset condition in the word set according to the word score value matrix and determining the target word as a content keyword of the news video.

In a third aspect, an embodiment of the present application provides a terminal device, where the terminal device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the method according to the first aspect or any optional manner of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program, which when executed by a processor implements the method according to the first aspect or any alternative manner of the first aspect.

In a fifth aspect, embodiments of the present application provide a computer program product, which, when run on a terminal device, causes the terminal device to perform the method of the first aspect or any alternative manner of the first aspect.

The method for extracting the content keywords of the news video, the terminal device and the computer-readable storage medium have the following advantages that:

according to the content keyword extraction method of the news video, provided by the embodiment of the application, a news text for describing a target news video is obtained by performing content extraction operation on the target news video, and word segmentation processing is performed on the news text to obtain a word set corresponding to the news text; determining a news title of a target news video, and acquiring a preset named entity set; inputting the news text, the word set, the news headline and the named entity set into a trained keyword extraction model for processing to obtain a word score value matrix corresponding to the word set; the value of each element in the term score value matrix is used for representing the total score value of the term corresponding to the element in the term set; and determining target words meeting preset conditions in the word set according to the word score value matrix, and determining the target words as content keywords of the news video. 
Since the content keywords of the news video generally have a great correlation with the news headlines, the frequency and distribution position of the content keywords of the news video in the news text generally have a certain regularity, and the content keywords of the news video belong to some fixed named entities, thus, the present scheme determines the total score value of each word by determining the probability of each word in the set of words appearing in the news text, the relevance of each word to the news headline, the distribution position score value of each word in the news text, and the matching degree of each word to the set of named entities, that is, the total score value is obtained by comprehensively considering the natural language characteristics of the news text, so that the content keywords determined based on the total score value of each word can more accurately summarize the content of the news text, and the accuracy of the extracted content keywords is improved.

Drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a schematic flow chart of a content keyword extraction method for a news video according to an embodiment of the present application;

Fig. 2 is a schematic structural diagram of a keyword extraction model provided in an embodiment of the present application;

Fig. 3 is a flowchart of a specific implementation of S13 in the method for extracting content keywords of a news video according to the embodiment of the present application;

Fig. 4 is a schematic structural diagram of a terminal device provided in an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a terminal device according to another embodiment of the present application.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It should be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items. Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.

It should also be appreciated that reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.

Referring to Fig. 1, Fig. 1 is a schematic flowchart of a content keyword extraction method for a news video according to an embodiment of the present application. The method is executed by a terminal device, which may be a mobile terminal such as a smartphone or a tablet computer. The content keyword extraction method shown in Fig. 1 may include S11 to S14, detailed as follows:

S11: Perform a content extraction operation on the target news video to obtain a news text describing the target news video, and perform word segmentation processing on the news text to obtain a word set corresponding to the news text.

In S11, the target news video may be any news video from which content keywords need to be extracted. Wherein, the news video refers to the video with the content of news reports.

In the embodiment of the application, the terminal equipment can acquire the target news video from various news websites or video websites. After the terminal equipment acquires the target news video, content extraction operation is carried out on the target news video so as to convert news content expressed by the target news video into news text.

In an embodiment of the present application, the news content expressed by the target news video may be composed of news content expressed by each video frame image included in the target news video, and based on this, the content extraction operation performed on the target news video by the terminal device may specifically include the following steps:

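performing framing operation on a video stream of a target news video, and performing optical character recognition operation on a plurality of video frame images obtained by the framing operation to obtain a first text segment corresponding to each video frame image;

based on the video frame images, performing segmentation operation on the audio stream of the target news video to obtain an audio segment corresponding to each video frame image, and performing voice recognition operation on the audio segment corresponding to each video frame image to obtain a second text segment corresponding to each video frame image;

determining a target text segment corresponding to each video frame image according to the first text segment and the second text segment corresponding to each video frame image;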

and splicing the target text segments corresponding to the video frame images according to the sequence of the time nodes corresponding to the video frame images from early to late to obtain a news text for describing a target news video.

In an embodiment of the application, the first text segment and the second text segment each comprise at least one sentence.

The news content expressed by each video frame image may be displayed as subtitles in the video frame image, or may be expressed as audio in the audio segment corresponding to the video frame image. Therefore, after the terminal device obtains the first text segment and the second text segment corresponding to each video frame image, it may combine the two to obtain a quasi text segment corresponding to each video frame image, and deduplicate the repeated sentences in that quasi text segment to obtain the target text segment corresponding to each video frame image.

For example, suppose the first text segment corresponding to a certain video frame image includes sentence S₁, sentence S₂ and sentence S₃, and the second text segment corresponding to that video frame image includes sentence S₁, sentence S₃ and sentence S₄. The quasi text segment obtained by combining the first text segment and the second text segment then includes sentence S₁, sentence S₁, sentence S₂, sentence S₃, sentence S₃ and sentence S₄. Deduplicating the repeated sentences in the quasi text segment yields a target text segment that includes sentence S₁, sentence S₂, sentence S₃ and sentence S₄.
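The combine-then-deduplicate step in this example can be sketched in Python as follows. The function name `merge_segments` is hypothetical, and keeping each sentence at its first occurrence is an assumption; the text only requires that repeated sentences be removed.

```python
from typing import List


def merge_segments(first: List[str], second: List[str]) -> List[str]:
    """Combine the OCR and speech text segments of one frame and drop repeats."""
    quasi = first + second  # quasi text segment, may contain repeated sentences
    # dict.fromkeys keeps the first occurrence of each sentence, in order.
    return list(dict.fromkeys(quasi))


ocr_segment = ["S1", "S2", "S3"]     # first text segment (from OCR)
speech_segment = ["S1", "S3", "S4"]  # second text segment (from speech recognition)
target_segment = merge_segments(ocr_segment, speech_segment)
print(target_segment)
```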

The time node corresponding to the video frame image refers to the time corresponding to the video frame image in the target news video. For example, if the total duration of the target news video is 10 minutes, and the time of the first video frame image included in the target news video corresponds to 3 minutes and 40 seconds in the target news video, the time node corresponding to the first video frame image is 3 minutes and 40 seconds; the time corresponding to the second video frame image included in the target news video is 4 minutes 01 seconds, and the time node corresponding to the second video frame image is 4 minutes 01 seconds.

It should be noted that the target text segment corresponding to the video frame image with the earlier time node is arranged before the target text segment corresponding to the video frame image with the later time node. For example, if the time node corresponding to the first video frame image is earlier than the time node corresponding to the second video frame image, the target text segment corresponding to the first video frame image is arranged before the target text segment corresponding to the second video frame image in the news text.
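Ordering by time node can be sketched as below. This is a minimal illustration that assumes each frame is represented as a pair of (time node in seconds, list of sentences); the function name `splice_news_text` and the sample sentences are hypothetical.

```python
from typing import List, Tuple


def splice_news_text(frames: List[Tuple[float, List[str]]]) -> str:
    """Join target text segments in order of time node, earliest first."""
    ordered = sorted(frames, key=lambda frame: frame[0])
    return " ".join(sentence for _, sentences in ordered for sentence in sentences)


# Time nodes 4 min 01 s and 3 min 40 s, expressed in seconds.
frames = [
    (241.0, ["Second frame sentence."]),
    (220.0, ["First frame sentence."]),
]
news_text = splice_news_text(frames)
print(news_text)
```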

After the terminal device obtains the news text describing the target news video, it performs a word segmentation operation on the news text to obtain a plurality of words. The terminal device then performs a stop-word removal operation and a deduplication operation on these words, and determines the set formed by the remaining words as the word set corresponding to the news text.
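The construction of the word set can be sketched as follows. This sketch assumes that the removal step is stop-word removal and uses a regular-expression tokenizer for English text as a stand-in for a real word segmenter; the stop-word list and the function name `build_word_set` are hypothetical.

```python
import re
from typing import List, Set

# Hypothetical stop-word list; a real system would load a much larger one.
STOP_WORDS: Set[str] = {"the", "a", "of", "in", "and", "to"}


def build_word_set(news_text: str, stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Segment the text, drop stop words, then deduplicate (order preserved)."""
    tokens = re.findall(r"[A-Za-z0-9]+", news_text.lower())  # stand-in segmenter
    kept = [token for token in tokens if token not in stop_words]
    return list(dict.fromkeys(kept))


word_set = build_word_set(
    "The ceremony of the Oscars opened in Los Angeles. The ceremony ran late."
)
print(word_set)
```

For Chinese news text, the tokenizer line would be replaced by a dedicated word segmentation library; the rest of the function is unchanged.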

S12: Determine a news title of the target news video, and acquire a preset named entity set.

Since the news headline usually appears in a plurality of video frame images of the news video, in an embodiment of the application, after the terminal device obtains the target text segment corresponding to each video frame image, if it is detected that at least a preset number of target text segments corresponding to the video frame images all include the same sentence, the terminal device may extract the sentence from any target text segment including the sentence, and determine the sentence as the news headline of the target news video. The preset number can be set according to actual requirements.
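The headline detection described above can be sketched as follows, assuming each target text segment is a list of sentences. The function name `detect_headline` is hypothetical, and returning the most frequent qualifying sentence is an assumption, since the text does not say what happens when several sentences reach the preset number.

```python
from collections import Counter
from typing import List, Optional


def detect_headline(target_segments: List[List[str]], preset_number: int) -> Optional[str]:
    """Return a sentence that appears in at least preset_number frame segments."""
    counts = Counter()
    for sentences in target_segments:
        counts.update(set(sentences))  # count each sentence once per frame
    for sentence, frame_count in counts.most_common():
        if frame_count >= preset_number:
            return sentence
    return None


segments = [
    ["2019 Oscar awards ceremony", "Host walks on stage"],
    ["2019 Oscar awards ceremony", "Winner announced"],
    ["2019 Oscar awards ceremony"],
]
headline = detect_headline(segments, preset_number=3)
print(headline)
```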

Since each news video typically has a video title, and the video title usually summarizes the news content of the news video well, in another embodiment of the present application the terminal device may obtain the video title of the target news video and determine it as the news title of the target news video. For example, if the video title of a certain target news video is "2019 Oscar awards ceremony", the terminal device may determine "2019 Oscar awards ceremony" as the news title of the target news video.

In the field of natural language processing, common entities with specific meanings, such as organization names, proper nouns, personal names, place names, and time and quantity phrases, are collectively called named entities, and words can be classified by named entity. In terms of word attributes, the content keywords of a news video generally all belong to named entities. Therefore, the likelihood that each word in the word set can serve as a content keyword can be assessed by determining the matching degree of each word with a preset named entity set.

The preset named entity set may be set according to actual requirements, for example, the preset named entity set may include named entities such as organization names, proper nouns, names of people, place names, time and quantity phrases, and the like. The matching degree of a certain word with a preset named entity set can be determined according to the matching degree of the word with each named entity included in the named entity set. For example, when a category to which a word belongs is a named entity in a set of named entities, it means that the word matches the named entity.
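A minimal sketch of this matching follows. It assumes that the category of each word is available from a lookup table and that the matching degree equals the number of matched named entities; the entity labels, the lookup table `WORD_CATEGORIES` and the function name `matching_degree` are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical named entity set and word-to-category lookup.
NAMED_ENTITY_SET: Set[str] = {"organization", "person", "place", "time", "quantity"}
WORD_CATEGORIES: Dict[str, Set[str]] = {
    "beijing": {"place"},
    "friday": {"time"},
    "quickly": {"adverb"},
}


def matching_degree(word: str, categories: Dict[str, Set[str]], entity_set: Set[str]) -> int:
    """Count the named entities in entity_set that the word's categories match."""
    return len(categories.get(word, set()) & entity_set)


degree_place = matching_degree("beijing", WORD_CATEGORIES, NAMED_ENTITY_SET)
degree_adverb = matching_degree("quickly", WORD_CATEGORIES, NAMED_ENTITY_SET)
print(degree_place, degree_adverb)
```

In practice the category lookup would come from a named entity recognizer rather than a hand-written table.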

The terminal device may store a preset set of named entities in its memory in advance.

In this embodiment, the terminal device may obtain a preset named entity set from a memory thereof.

S13: Input the news text, the word set, the news title and the named entity set into a trained keyword extraction model for processing to obtain a word score value matrix corresponding to the word set.

In the embodiment of the present application, each element in the word score value matrix corresponds to one word in the word set, and the value of each element indicates the total score value of the word corresponding to that element. The total score value of a word is determined according to the probability of the word appearing in the news text, the relevance of the word to the news headline, the distribution position score value of the word in the news text, and the matching degree of the word with the named entity set.

Specifically, after the terminal device inputs the news text, the word set, the news headline and the named entity set into the trained keyword extraction model, the keyword extraction model calculates the probability of each word in the word set appearing in the news text, calculates the degree of correlation between each word in the word set and the news headline, calculates the value of the distribution position score of each word in the word set in the news text, and calculates the degree of matching between each word in the word set and the named entity set.

Then, for each word in the word set, the keyword extraction model calculates the total score of the word according to the probability of the word appearing in the news text, the relevance of the word to the news headline, the distribution position score of the word in the news text, and the matching degree of the word and the named entity set. After the keyword extraction model obtains the total score values of all the words in the word set, the total score values of all the words can be output in a matrix form, and the total score values of all the words represented in the matrix form are the word score value matrix corresponding to the word set.
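The combination of the four per-word quantities into a total score value can be sketched as a weighted sum, which is how the summary of the method describes it. The weights below are hypothetical placeholders (in the described method they would come from training), and the function name `total_scores` and the feature values are illustrative only.

```python
from typing import Dict, List, Sequence

FEATURES = ("probability", "relevance", "position", "matching")


def total_scores(rows: Sequence[Dict[str, float]], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of the four features, one total score value per word."""
    return [sum(weights[name] * row[name] for name in FEATURES) for row in rows]


# Hypothetical weights; not values taken from the source.
weights = {"probability": 0.4, "relevance": 0.3, "position": 0.2, "matching": 0.1}
rows = [
    {"probability": 0.05, "relevance": 1.0, "position": 3.0, "matching": 1.0},
    {"probability": 0.01, "relevance": 0.0, "position": 1.0, "matching": 0.0},
]
score_matrix = total_scores(rows, weights)  # one element per word in the word set
print(score_matrix)
```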

Referring to fig. 2, fig. 2 is a schematic structural diagram of a preset keyword extraction model according to an embodiment of the present disclosure, and as shown in fig. 2, the preset keyword extraction model 20 may include a document topic generation (LDA) unit 21, a relevance calculation unit 22, a distribution location score value calculation unit 23, a matching degree calculation unit 24, and a total score value calculation unit 25.

Wherein, the input end of LDA unit 21, the input end of correlation calculation unit 22, the input end of distribution position score calculation unit 23, and the input end of matching degree calculation unit 24 constitute the input end of the keyword extraction model, the output end of LDA unit 21, the output end of correlation calculation unit 22, the output end of distribution position score calculation unit 23, and the output end of matching degree calculation unit 24 are all connected to the input end of total score calculation unit 25, and the output end of total score calculation unit 25 is the output end of the keyword extraction model.

Based on this, in an embodiment of the present application, S13 may specifically include S131 to S136 shown in fig. 3, which are detailed as follows:

S131: Input the news text, the word set, the news headline and the named entity set into a trained keyword extraction model.

S132: for each word in the set of words, determining, by a document topic generation unit, a probability of the word appearing in the news text.

In the embodiment of the present application, the probability that any word in the word set appears in the news text may refer to the frequency of the occurrence of the word in the news text.

After the terminal device inputs the news text and the word set corresponding to the news text into the keyword extraction model, for each word in the word set, the probability of the word appearing in the news text can be determined through the LDA unit 21.

In practical applications, the document topic generation unit may specifically be an existing LDA model.

It should be noted that, because the structure and principle of the LDA model are the prior art, the specific process of determining the probability of occurrence of a word in a news text by using the LDA model may refer to the related description in the prior art, and is not described herein again.
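Since the text notes that the probability of a word appearing in the news text may refer to its frequency of occurrence, a simple stand-in can be sketched with relative frequencies. This is not an LDA model; it only illustrates the kind of per-word value this step produces. The function name `occurrence_probability` and the sample tokens are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def occurrence_probability(tokens: List[str], word_set: List[str]) -> Dict[str, float]:
    """Relative frequency of each word among all tokens of the news text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: (counts[word] / total if total else 0.0) for word in word_set}


tokens = ["oscar", "ceremony", "oscar", "award"]  # segmented news text
probabilities = occurrence_probability(tokens, ["oscar", "ceremony", "award"])
print(probabilities)
```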

S133: for each word in the word set, determining the coincidence quantity of the word and the word contained in the news title, and determining the relevance of the word and the news title according to the coincidence quantity of the word and the word contained in the news title.

In this embodiment of the application, after the terminal device inputs the word set and the news headline into the keyword extraction model, for each word in the word set, the number of coincidences of the word and the word included in the news headline may be determined in the relevancy calculation unit 22, and the relevancy between the word and the news headline may be determined according to the number of coincidences of the word and the word included in the news headline.

In an embodiment of the present application, the terminal device may calculate the relevance between the word and the news headline from the number of coincidences between the word and the words contained in the news headline according to a formula (the formula image is not reproduced in this text), in which one term is the relevance of the ith word in the word set to the news headline.

In the embodiment of the present application, the coincidence-count term of the formula indicates that each time a word in the news headline coincides with the ith word in the word set, the relevance of the ith word to the news headline is increased by 1. Two words coincide when they are completely the same or when they are similar to each other.

For example, if the first word in the word set is identical to one word contained in the news headline, the two words coincide; the number of coincidences between the first word and the words contained in the news headline is then 1, and the relevance of the first word to the news headline is 1.
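The counting rule described above can be sketched in Python as follows. This is a minimal illustration and not the formula of the application: the function name `headline_relevance` and the optional `is_similar` predicate are assumptions introduced here, and the application does not specify how similarity between two words is judged.

```python
from typing import Callable, Iterable, Optional


def headline_relevance(
    word: str,
    headline_words: Iterable[str],
    is_similar: Optional[Callable[[str, str], bool]] = None,
) -> int:
    """Count the headline words that coincide with `word`.

    Two words coincide when they are identical or, if a (hypothetical)
    similarity predicate is supplied, when that predicate returns True.
    """
    count = 0
    for headline_word in headline_words:
        if headline_word == word:
            count += 1
        elif is_similar is not None and is_similar(word, headline_word):
            count += 1
    return count


headline = ["flood", "hits", "coastal", "city"]
print(headline_relevance("flood", headline))  # prints 1
```

Passing a stricter or looser `is_similar` predicate changes only which pairs count as coinciding; the relevance remains a plain count.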

S134: for each word in the word set, determining the position of the word in the news text, and determining the distribution position score value of the word according to the position of the word in the news text.

In this embodiment of the application, after the terminal device inputs the word set and the news text into the keyword extraction model, the distribution position score value calculation unit 23 may determine, for each word in the word set, the position of the word in the news text, and may determine the distribution position score value of the word according to that position.

In an embodiment of the present application, the terminal device may determine the distribution position score value of a word from the position of the word in the news text according to a formula [formula image not reproduced], in which one symbol denotes the distribution position score value of the i-th word in the word set, another denotes the i-th word in the word set, and n is the number of sentences contained in the news text.

In the embodiment of the present application, one term of the formula indicates that each time the i-th word in the word set appears in one of the first three sentences of the news text, 1 is added to the distribution position score value of the i-th word. Another term indicates that each time the i-th word in the word set appears in one of the last three sentences of the news text, 1 is added to the distribution position score value of the i-th word.

For example, if the first word in the word set appears in the first, second and third sentences of the news text, its distribution position score value is 3; if the third word in the word set appears in the fourth sentence of the news text, and the news text contains 15 sentences in total, its distribution position score value is 1 - log 3 / log 15; if the fifth word in the word set appears only in the penultimate sentence of the news text, its distribution position score value is 1.
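The worked example above can be reproduced with the following Python sketch. Because the formula itself is not available in this text, two points are assumptions made only for illustration: a word is counted once per sentence in which it appears, and an occurrence in a middle sentence with 1-based index p contributes 1 - log(p - 1) / log(n), a form chosen solely because it yields 1 - log 3 / log 15 for the fourth of 15 sentences. The function name `position_score` is likewise hypothetical.

```python
import math
from typing import Sequence


def position_score(word: str, sentences: Sequence[Sequence[str]]) -> float:
    """Sum a position-dependent contribution over the sentences containing `word`.

    `sentences` is the news text as a list of tokenized sentences.
    """
    n = len(sentences)
    score = 0.0
    for p, sentence_words in enumerate(sentences, start=1):
        if word not in sentence_words:
            continue
        if p <= 3 or p > n - 3:
            # First three or last three sentences: add 1 per occurrence.
            score += 1.0
        else:
            # Assumed middle-sentence term (see the lead-in above).
            score += 1.0 - math.log(p - 1) / math.log(n)
    return score


# A 15-sentence text: "x" in sentences 1-3, "y" in sentence 4, "z" in sentence 14.
text = [[] for _ in range(15)]
text[0], text[1], text[2] = ["x"], ["x"], ["x"]
text[3] = ["y"]
text[13] = ["z"]
print(position_score("x", text))  # prints 3.0
print(position_score("z", text))  # prints 1.0
```

Under these assumptions, words that appear near the start or end of the text score highest, and the contribution of a middle sentence shrinks the further it is from the start.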

S135: for each word in the word set, determining the number of named entities in the named entity set that match the word, and determining the matching degree between the word and the named entity set according to that number.

In this embodiment, after the terminal device inputs the word set and the preset named entity set into the keyword extraction model, for each word in the word set, the matching degree calculation unit 24 may determine the named entities in the named entity set that match the word, count the number of the named entities in the named entity set that match the word, and determine the matching degree between the word and the named entity set according to the number of the named entities in the named entity set that match the word.

In an embodiment of the present application, the terminal device may determine the matching degree between a word and the named entity set from the number of named entities in the named entity set that match the word according to a formula [formula image not reproduced], in which one symbol denotes the i-th word in the word set.

In the embodiment of the present application, a named entity in the named entity set matches a word in the word set when the named entity is a category to which the word belongs.

For example, if the named entity set includes the named entity "place name" and the category to which the first word in the word set belongs is place name, the named entity "place name" in the named entity set matches the first word in the word set.

The formula indicates that each time a named entity in the preset named entity set matches the i-th word in the word set, 1 is added to the matching degree between the i-th word and the named entity set.

For example, if two named entities in the preset named entity set both match a word in the word set, the matching degree between that word and the named entity set is 2.
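The matching rule can be illustrated with a short Python sketch. It assumes that the categories of each word are already known, for instance from a named entity tagger; the `word_categories` lookup table, the function name `entity_match_degree`, and the sample data are all hypothetical and are not taken from the application.

```python
from typing import Dict, Iterable, Set


def entity_match_degree(
    word: str,
    named_entity_set: Iterable[str],
    word_categories: Dict[str, Set[str]],
) -> int:
    """Count the named entities that are a category of `word`."""
    categories = word_categories.get(word, set())
    return sum(1 for entity in named_entity_set if entity in categories)


named_entities = ["place name", "person name", "organization name"]
categories = {
    "Shenzhen": {"place name"},
    "Washington": {"place name", "person name"},
}
print(entity_match_degree("Shenzhen", named_entities, categories))    # prints 1
print(entity_match_degree("Washington", named_entities, categories))  # prints 2
```

A word with no known category gets a matching degree of 0, so it contributes nothing through this term.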

S136: for each word in the word set, performing a weighted summation of the probability of the word appearing in the news text, the relevance of the word to the news headline, the distribution position score value of the word, and the matching degree between the word and the named entity set, to obtain the total score value of the word.

In the embodiment of the application, after the terminal device obtains, for each word in the word set, the probability of the word appearing in the news text, the relevance of the word to the news headline, the distribution position score value of the word, and the matching degree between the word and the named entity set, the terminal device may perform a weighted summation of these four items to obtain the total score value of the word. The weighting coefficient of each of the four items may be set according to actual requirements, or may be learned during the training of the keyword extraction model.
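The weighted summation can be sketched in Python as below. The function name `total_scores`, the order of the four features in each row, and the example weights are assumptions made for illustration; the application leaves the weighting coefficients to be set manually or learned in training.

```python
from typing import List, Sequence


def total_scores(
    feature_rows: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Return one weighted sum per word.

    Each row holds, for one word, the four items in an assumed order:
    (probability in text, headline relevance, position score, match degree).
    """
    if any(len(row) != len(weights) for row in feature_rows):
        raise ValueError("each feature row needs one value per weight")
    return [sum(w * f for w, f in zip(weights, row)) for row in feature_rows]


rows = [
    [0.5, 1.0, 3.0, 2.0],   # hypothetical features of word 1
    [0.25, 0.0, 1.0, 0.0],  # hypothetical features of word 2
]
print(total_scores(rows, [1.0, 1.0, 1.0, 1.0]))  # prints [6.5, 1.25]
```

The returned list plays the role of the word score value matrix: its k-th element is the total score value of the k-th word in the word set.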

In the embodiment of the present application, S132, S133, S134, and S135 may be parallel steps, that is, the terminal device may simultaneously execute S132, S133, S134, and S135.

In an embodiment of the application, the trained keyword extraction model may be obtained by training a pre-constructed keyword extraction model by a principal component analysis method based on a preset sample set.

Each sample data in the preset sample set comprises a sample news text for describing news content of a sample news video, a word set corresponding to the sample news text, a news title of the sample news video, a preset named entity set and content keywords of the sample news video. The content keywords of the sample news video included in each piece of sample data may be manually extracted from the sample news text of the sample news video. The structure of the pre-constructed keyword extraction model can be as shown in fig. 2, and is not described herein again.

When a pre-constructed keyword extraction model is trained, a sample news text for describing news content of a sample news video, a word set corresponding to the sample news text, a news title of the sample news video and a preset named entity set, which are included in each sample datum, can be used as input of the keyword extraction model, content keywords of the sample news video, which are included in each sample datum, are used as output of the keyword extraction model, and a principal component analysis method is adopted to train the pre-constructed keyword extraction model. In the process of training the keyword extraction model by adopting a principal component analysis method, the keyword extraction model can learn the weighting coefficients corresponding to the weighting items.

After the training of the keyword extraction model is completed, the terminal device may store the trained keyword extraction model. The model stored in this way is the trained keyword extraction model used in S13.

S14: determining target words meeting preset conditions in the word set according to the word score value matrix, and determining the target words as content keywords of the news video.

In the embodiment of the present application, the preset condition may be set according to an actual requirement, and is not limited herein.

The number of target words meeting the preset condition in the word set may be one or multiple, and the number of target words is not particularly limited in the embodiments of the present application.

For example, in an embodiment of the present application, the preset condition may be that the total score value of the word is greater than a preset score value threshold, where the threshold can be set according to actual requirements. On this basis, the terminal device may determine, as a target word, each word in the word set whose corresponding element in the word score value matrix has a value greater than the preset score value threshold.

In another embodiment of the present application, the preset condition may be that the word ranks among the top m when the words in the word set are sorted by total score value from high to low, where m is an integer greater than or equal to 1. On this basis, the terminal device may sort the elements of the word score value matrix in descending order of value and determine the words in the word set corresponding to the top m elements as the target words. In this embodiment, the number of determined target words is m.
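Both preset conditions can be sketched in a few lines of Python. The function names `select_by_threshold` and `select_top_m` and the sample scores are hypothetical; the word score value matrix is represented here as a flat list aligned with the word set.

```python
from typing import List, Sequence


def select_by_threshold(
    words: Sequence[str], scores: Sequence[float], threshold: float
) -> List[str]:
    """Keep the words whose total score value is greater than the threshold."""
    return [word for word, score in zip(words, scores) if score > threshold]


def select_top_m(words: Sequence[str], scores: Sequence[float], m: int) -> List[str]:
    """Keep the m words with the highest total score values."""
    if m < 1:
        raise ValueError("m must be an integer greater than or equal to 1")
    ranked = sorted(zip(words, scores), key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:m]]


words = ["flood", "city", "rescue", "today"]
scores = [6.5, 3.0, 4.25, 0.5]
print(select_by_threshold(words, scores, 3.0))  # prints ['flood', 'rescue']
print(select_top_m(words, scores, 2))           # prints ['flood', 'rescue']
```

The threshold variant returns a variable number of keywords, while the top-m variant returns m keywords whenever the word set contains at least m words.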

As can be seen from the above, in the content keyword extraction method for a news video provided in the embodiment of the present application, a news text for describing a target news video is obtained by performing content extraction operation on the target news video, and word segmentation processing is performed on the news text to obtain a word set corresponding to the news text; determining a news title of a target news video, and acquiring a preset named entity set; inputting the news text, the word set, the news headline and the named entity set into a trained keyword extraction model for processing to obtain a word score value matrix corresponding to the word set; the value of each element in the term score value matrix is used for representing the total score value of the term corresponding to the element in the term set; and determining target words meeting preset conditions in the word set according to the word score value matrix, and determining the target words as content keywords of the news video. 
The content keywords of a news video are generally strongly correlated with the news headline, their frequency and distribution position in the news text generally show a certain regularity, and they generally belong to certain fixed named entities. The present scheme therefore determines the total score value of each word in the word set from the probability of the word appearing in the news text, the relevance of the word to the news headline, the distribution position score value of the word in the news text, and the matching degree between the word and the named entity set. Because the total score value is obtained by comprehensively considering these natural language characteristics of the news text, the content keywords determined from the total score values summarize the content of the news text more accurately, which improves the accuracy of the extracted content keywords.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

Based on the content keyword extraction method for a news video provided by the foregoing embodiments, an embodiment of the present application further provides a terminal device for implementing the foregoing method embodiments.

Referring to fig. 4, fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application. In the embodiment of the present application, each unit included in the terminal device is configured to execute the steps in the embodiments corresponding to fig. 1 to fig. 3. For details, refer to fig. 1 to fig. 3 and the related descriptions of the corresponding embodiments. For convenience of explanation, only the portions related to the present embodiment are shown. As shown in fig. 4, the terminal device 40 includes: a first processing unit 41, a first obtaining unit 42, a second processing unit 43 and a first determining unit 44. Wherein:

the first processing unit 41 is configured to perform content extraction on a target news video to obtain a news text for describing the target news video, and perform word segmentation on the news text to obtain a word set corresponding to the news text.

The first obtaining unit 42 is configured to determine a news headline of the target news video, and obtain a preset named entity set.

The second processing unit 43 is configured to input the news text, the word set, the news headline, and the named entity set into a trained keyword extraction model for processing, so as to obtain a word score value matrix corresponding to the word set; wherein the value of each element in the term score value matrix is used for representing the total score value of the term corresponding to the element in the term set; the total score value of the term is determined according to the probability of the term appearing in the news text, the relevance of the term to the news headline, the distribution position score value of the term in the news text, and the matching degree of the term to the named entity set.

The first determining unit 44 is configured to determine, according to the word score matrix, a target word in the word set that meets a preset condition, and determine the target word as a content keyword of the news video.

Optionally, the second processing unit 43 may include a first input unit, a second determining unit, a third determining unit, a fourth determining unit, a fifth determining unit and a sixth determining unit. Wherein:

the first input unit is used for inputting the news text, the word set, the news headline and the named entity set into a trained keyword extraction model.

The second determining unit is used for determining the probability of the word appearing in the news text through the document theme generating unit for each word in the word set.

The third determining unit is used for determining, for each word in the word set, the number of coincidences between the word and the words contained in the news headline, and determining the relevance of the word to the news headline according to that number.

The fourth determining unit is used for determining the position of each word in the word set in the news text, and determining the distributed position score value of the word according to the position of the word in the news text.

The fifth determining unit is used for determining the number of named entities matched with the word in the named entity set aiming at each word in the word set, and determining the matching degree of the word and the named entity set according to the number of the named entities matched with the word in the named entity set.

The sixth determining unit is configured to perform, for each word in the word set, a weighted summation operation on a probability that the word appears in the news text, a correlation degree between the word and the news title, a distribution position score of the word, and a matching degree between the word and the named entity set, so as to obtain a total score of the word.

Optionally, the third determining unit is specifically configured to determine the relevance according to a formula [formula image not reproduced], in which one symbol denotes the relevance of the i-th word in the word set to the news headline.

Optionally, the fourth determining unit is specifically configured to determine the distribution position score value according to a formula [formula image not reproduced], in which one symbol denotes the distribution position score value of the i-th word in the word set, another denotes the i-th word in the word set, one term indicates that the i-th word in the word set appears in the last three sentences of the news text, and n is the number of sentences contained in the news text.

Optionally, the fifth determining unit is specifically configured to determine the matching degree according to a formula [formula image not reproduced], in which one symbol denotes the i-th word in the word set.

Optionally, the terminal device 40 further comprises a training unit.

The training unit is used for training a pre-constructed keyword extraction model by adopting a principal component analysis method based on a preset sample set to obtain the trained keyword extraction model; each sample data in the preset sample set comprises a sample news text for describing news content of a sample news video, a word set corresponding to the sample news text, a news title of the sample news video, a preset named entity set and content keywords of the sample news video; when the pre-constructed keyword extraction model is trained, the sample news text, the word set corresponding to the sample news text, the news title of the sample news video and a preset named entity set in each sample datum are used as the input of the pre-constructed keyword extraction model, and the content keywords of the sample news video in each sample datum are used as the output of the pre-constructed keyword extraction model.

Optionally, the first processing unit may include a framing unit, a segmenting unit, a seventh determining unit and a text splicing unit. Wherein:

the framing unit is used for performing a framing operation on the video stream of the target news video and performing an optical character recognition operation on the plurality of video frame images obtained by the framing operation, to obtain a first text segment corresponding to each video frame image.

The segmenting unit is used for segmenting the audio stream of the target news video based on the plurality of video frame images to obtain an audio segment corresponding to each video frame image, and performing voice recognition operation on the audio segment corresponding to each video frame image to obtain a second text segment corresponding to each video frame image.

The seventh determining unit is used for determining a target text segment corresponding to each video frame image according to the first text segment and the second text segment corresponding to each video frame image.

The text splicing unit is used for splicing the target text segments corresponding to the video frame images in order of the time nodes corresponding to the video frame images, from earliest to latest, to obtain the news text.
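The splicing step performed by the text splicing unit can be sketched in Python as follows. The representation of each target text segment as a `(time_node, text)` pair, the function name `splice_segments`, and the `separator` parameter are assumptions made for illustration only.

```python
from typing import Iterable, Tuple


def splice_segments(
    segments: Iterable[Tuple[float, str]], separator: str = ""
) -> str:
    """Join target text segments in order of their time nodes, earliest first."""
    ordered = sorted(segments, key=lambda segment: segment[0])
    return separator.join(text for _, text in ordered)


frames = [(2.0, "rescue teams arrived."), (0.5, "A flood hit the city."), (4.0, "Roads reopened.")]
print(splice_segments(frames, separator=" "))
# prints: A flood hit the city. rescue teams arrived. Roads reopened.
```

An empty separator suits languages written without spaces between sentences, such as Chinese, while a single space suits English text.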

It should be noted that, because the contents of information interaction, execution process, and the like between the modules are based on the same concept as that of the embodiment of the method of the present application, specific functions and technical effects thereof may be referred to specifically in the embodiment of the method, and are not described herein again.

Fig. 5 is a schematic structural diagram of a terminal device according to another embodiment of the present application. As shown in fig. 5, the terminal device 5 provided in this embodiment includes: a processor 50, a memory 51 and a computer program 52, such as a content keyword extraction program for news videos, stored in the memory 51 and executable on the processor 50. The processor 50, when executing the computer program 52, implements the steps in the above-mentioned embodiments of the content keyword extraction method for news videos, such as S11 to S14 shown in fig. 1. Alternatively, the processor 50, when executing the computer program 52, implements the functions of the modules/units in the terminal device embodiments, such as the functions of the units 41 to 44 shown in fig. 4.

Illustratively, the computer program 52 may be partitioned into one or more modules/units, which are stored in the memory 51 and executed by the processor 50 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 52 in the terminal device 5. For example, the computer program 52 may be divided into a first processing unit, a first obtaining unit, a second processing unit and a first determining unit, and specific functions of each unit refer to the description in the embodiment corresponding to fig. 4, which is not described herein again.

The terminal device may include, but is not limited to, a processor 50 and a memory 51. Those skilled in the art will appreciate that fig. 5 is merely an example of the terminal device 5 and does not constitute a limitation on the terminal device 5, which may include more or fewer components than shown, combine some components, or use different components; for example, the terminal device may also include input-output devices, network access devices, buses, etc.

The processor 50 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 51 may be an internal storage unit of the terminal device 5, such as a hard disk or a memory of the terminal device 5. The memory 51 may also be an external storage device of the terminal device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 5. Further, the memory 51 may also include both an internal storage unit and an external storage device of the terminal device 5. The memory 51 is used for storing the computer program and other programs and data required by the terminal device. The memory 51 may also be used to temporarily store data that has been output or is to be output.

An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the method for extracting content keywords of a news video may be implemented.

An embodiment of the present application further provides a computer program product which, when run on a terminal device, causes the terminal device to implement the method for extracting content keywords of a news video.

It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is merely used as an example, and in practical applications, the foregoing function distribution may be performed by different functional units and modules as needed, that is, the internal structure of the terminal device is divided into different functional units or modules to perform all or part of the above-described functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the above embodiments, the description of each embodiment has its own emphasis, and parts that are not described or illustrated in a certain embodiment may refer to the description of other embodiments.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims

1. A method for extracting content keywords of a news video is characterized by comprising the following steps:

2. The method of claim 1, wherein the inputting the news text, the word set, the news headline, and the named entity set into a trained keyword extraction model for processing to obtain a word score value matrix corresponding to the word set comprises:

3. The method of claim 2, wherein determining the relevance of the term to the news headline based on the number of coincidences of the term with terms contained in the news headline comprises:

[formula image not reproduced];

wherein one symbol of the formula denotes the relevance of the i-th word in the word set to the news headline.

4. The method of claim 2, wherein determining the distributed position score value for the term based on the position of the term in the news text comprises:

[formula image not reproduced];

wherein one symbol of the formula denotes the distribution position score value of the i-th word in the word set, and another denotes the i-th word in the word set.

5. The method of claim 2, wherein determining the degree of match of the term to the set of named entities based on the number of named entities in the set of named entities that match the term comprises:

[formula image not reproduced];

wherein one symbol of the formula denotes the i-th word in the word set.

6. The method according to any one of claims 1 to 5, wherein before the inputting the news text, the word set, the news headline, and the named entity set into a trained keyword extraction model for processing, and obtaining a word score value matrix corresponding to the word set, the method further comprises:

7. The method according to any one of claims 1 to 5, wherein the performing a content extraction operation on the target news video to obtain a news text describing the target news video comprises:

8. A terminal device, comprising:

9. A terminal device, characterized in that the terminal device comprises a processor, a memory and a computer program stored in the memory and executable on the processor, the processor implementing the method according to any of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.