CN112015928B - Information extraction method and device for multimedia resources, electronic equipment and storage medium - Google Patents
- Publication number
- CN112015928B (granted from application CN202010872925.0A)
- Authority
- CN
- China
- Prior art keywords
- description information
- sub
- information
- tag
- multimedia resource
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/45—Clustering; Classification
- G06F16/48—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
Abstract
The disclosure relates to an information extraction method and apparatus for multimedia resources, an electronic device, and a storage medium, belongs to the field of Internet technology, and aims to solve the problem in the related art that it is difficult to extract key information from multimedia resources that carry a text description. The disclosed method comprises the following steps: obtaining the category to which a multimedia resource to be processed belongs by performing feature analysis on the resource feature information of the multimedia resource; matching each piece of sub-description information, obtained by word segmentation of the description information of the multimedia resource, with the tags in a pre-configured tag set corresponding to that category; and selecting, according to the matching result between the description information and the tag set, at least one piece of sub-description information as key information of the multimedia resource. Because the description information is matched against a tag set specific to the category determined for the resource, the accuracy of fine-grained key information extraction is improved.
Description
Technical Field
The disclosure relates to the field of Internet technology, and in particular to an information extraction method and apparatus for multimedia resources, an electronic device, and a storage medium.
Background
With the rapid development of multimedia technology and the popularization of intelligent terminals, multimedia resources, which carry ever more information, can spread rapidly and have become one of the important channels through which people acquire information.
Taking short video as an example of a multimedia resource, short video content understanding plays an important role in short video recommendation, search, and operation. Content understanding generally means converting short video content into an embedding feature or a set of text tags. In the related art, tag or keyword extraction based on text analysis usually adopts methods such as TF-IDF (Term Frequency-Inverse Document Frequency) or TextRank. These methods work well on long texts but poorly on short video description information: a short video's description is typically short and condensed, so statistics-based analysis struggles to derive key information such as the video's tags.
Disclosure of Invention
The disclosure provides an information extraction method and apparatus for multimedia resources, an electronic device, and a storage medium, which at least solve the problem in the related art that it is difficult to extract key information from multimedia resources that carry a text description. The technical scheme of the disclosure is as follows:
According to a first aspect of an embodiment of the present disclosure, there is provided an information extraction method of a multimedia resource, including:
obtaining the category to which a multimedia resource to be processed belongs by performing feature analysis on the resource feature information of the multimedia resource;
matching each piece of sub-description information, obtained by word segmentation of the description information of the multimedia resource to be processed, with each tag in a pre-configured tag set corresponding to the category; and
selecting, according to the matching result between the description information of the multimedia resource to be processed and the tag set, at least one piece of sub-description information as key information of the multimedia resource to be processed.
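As a non-limiting illustration, the three claimed steps can be sketched as follows. The category names, tag sets, and helper functions below are hypothetical stand-ins; a real embodiment would use a trained classifier and a proper word segmenter rather than a dictionary lookup and whitespace split:

```python
# Toy sketch of the claimed pipeline: classify, segment, match, select.
# TAG_SETS, classify() and segment() are illustrative stand-ins only.

TAG_SETS = {  # pre-configured tag set per category (hypothetical)
    "food": {"squid", "barbecue", "recipe"},
    "pets": {"cat", "dog", "snow"},
}

def classify(resource_features):
    # Step 1: feature analysis -> category (stubbed as a lookup here)
    return resource_features["category_hint"]

def segment(description):
    # Step 2 (part): word-segment the description into sub-description pieces
    return description.lower().split()

def extract_key_info(resource_features, description):
    category = classify(resource_features)
    tags = TAG_SETS[category]
    # Steps 2-3: match each piece against the category's tags, keep the hits
    return [piece for piece in segment(description) if piece in tags]

key = extract_key_info({"category_hint": "pets"},
                       "Cat sees snow for the first time")
```

Matching by set membership is the simplest possible matcher; the later embodiments replace it with distances between information vectors.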
In an alternative embodiment, the tag set corresponding to the category is obtained as follows:
performing word segmentation on the description information of each sample multimedia resource that belongs to the same category as the multimedia resource to be processed, to obtain tags;
selecting at least one of the tags as a candidate tag according to its word frequency; and
after de-duplicating the candidate tags, taking the set formed by the remaining candidate tags as the tag set corresponding to the category.
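The three construction steps above can be sketched as follows; the whitespace segmenter and the frequency cutoff `min_freq` are illustrative assumptions, not values from the disclosure:

```python
from collections import Counter

def build_tag_set(sample_descriptions, min_freq=2):
    """Build the tag set for one category from same-category sample resources:
    word-segment every sample description, keep tokens whose word frequency
    reaches min_freq as candidate tags, and de-duplicate them into a set."""
    counts = Counter()
    for description in sample_descriptions:
        counts.update(description.split())  # stand-in word segmentation
    # frequency filter + set construction (the set also de-duplicates)
    return {token for token, freq in counts.items() if freq >= min_freq}

samples = ["grilled squid recipe", "spicy squid barbecue", "squid barbecue night"]
tag_set = build_tag_set(samples)
```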
In an optional implementation, matching each piece of sub-description information obtained by word segmentation of the description information of the multimedia resource to be processed with each tag in the pre-configured tag set corresponding to the category specifically includes:
obtaining an information vector for each piece of sub-description information obtained by word segmentation of the description information of the multimedia resource to be processed, and an information vector for each tag in the tag set; and
matching each piece of sub-description information with each tag according to their information vectors.
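A minimal sketch of vector-based matching, assuming the information vectors come from some pretrained embedding model; the two-dimensional vectors and the choice of cosine similarity are illustrative, since the disclosure does not fix a particular similarity measure:

```python
import math

# Hypothetical information vectors, e.g. looked up from a word-embedding model.
VEC = {
    "squid":    (0.9, 0.1),
    "barbecue": (0.8, 0.2),
    "first":    (0.1, 0.9),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def match_piece(piece, tags):
    # Match one piece of sub-description information against every tag.
    return {tag: cosine(VEC[piece], VEC[tag]) for tag in tags}

scores = match_piece("squid", ["barbecue", "first"])
```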
In an optional implementation, selecting at least one piece of sub-description information as key information of the multimedia resource to be processed, according to the matching result between the description information and the tag set, specifically includes:
for any piece of sub-description information, determining a matching parameter corresponding to that piece according to the matching result between it and each tag in the tag set; and
selecting at least one piece of sub-description information based on a comparison between a preset threshold and the matching parameter of each piece, wherein the key information includes at least one of a target tag and title description information for constructing a title.
In an optional implementation, determining the matching parameter corresponding to the sub-description information based on the matching result between the sub-description information and each tag in the tag set specifically includes:
taking the shortest distance between the information vector of the sub-description information and the information vectors of the tags in the tag set as the matching parameter corresponding to that piece of sub-description information.
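In code, the matching parameter of one piece then reduces to a minimum over tag distances; Euclidean distance is used here for illustration, since the disclosure only speaks of the "shortest distance" between vectors:

```python
import math

def matching_parameter(piece_vector, tag_vectors):
    # Shortest distance from the piece's information vector to any tag vector.
    return min(math.dist(piece_vector, tv) for tv in tag_vectors)

# A piece close to one tag gets a small matching parameter (a good match).
param = matching_parameter((0.9, 0.1), [(0.8, 0.2), (0.1, 0.9)])
```

Under this convention a smaller parameter means a better match, which is why the later embodiments keep pieces whose parameter falls below the threshold as tag candidates.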
In an alternative embodiment, the key information includes a target tag;
Selecting at least one piece of sub-description information as key information of the multimedia resource to be processed, based on a comparison between a preset threshold and the matching parameter of each piece, specifically includes:
selecting, from the sub-description information, first sub-description information whose matching parameter is smaller than the preset threshold;
analyzing the part of speech of each piece of first sub-description information, and retaining the pieces whose part of speech is a target part of speech; and
matching each remaining piece of first sub-description information with each tag in the tag set; if the tag set contains a tag identical to a remaining piece, taking that piece as a target tag of the multimedia resource to be processed.
In an alternative embodiment, the key information includes title description information;
Selecting at least one piece of sub-description information as key information of the multimedia resource to be processed, based on a comparison between a preset threshold and the matching parameter of each piece, specifically includes:
selecting, from the sub-description information, second sub-description information whose matching parameter is not smaller than the preset threshold, and taking the second sub-description information as the title description information of the multimedia resource to be processed.
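Taken together, the two branches above partition the segmented pieces by their matching parameter. A sketch follows, in which the POS lookup table, the target part of speech, and the threshold value are all illustrative assumptions:

```python
# Pieces below the threshold become target-tag candidates (then filtered by
# part of speech and exact membership in the tag set); the remaining pieces
# are kept as title description information.

POS = {"beauty": "noun", "grills": "verb", "squid": "noun", "tasty": "adj"}
TARGET_POS = {"noun"}  # e.g. keep only nouns as tag candidates

def select_key_info(pieces, params, tag_set, threshold=0.5):
    target_tags, title_words = [], []
    for piece in pieces:
        if params[piece] < threshold:          # "first sub-description info"
            if POS.get(piece) in TARGET_POS and piece in tag_set:
                target_tags.append(piece)      # exact tag hit -> target tag
        else:                                  # "second sub-description info"
            title_words.append(piece)          # kept for title construction
    return target_tags, title_words

tags, title = select_key_info(
    ["beauty", "grills", "squid", "tasty"],
    {"beauty": 0.1, "grills": 0.9, "squid": 0.2, "tasty": 0.7},
    tag_set={"beauty", "squid"},
)
```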
According to a second aspect of the embodiments of the present disclosure, there is provided an information extraction apparatus of a multimedia resource, including:
a classification unit configured to obtain the category to which a multimedia resource to be processed belongs by performing feature analysis on the resource feature information of the multimedia resource;
a matching unit configured to match each piece of sub-description information, obtained by word segmentation of the description information of the multimedia resource to be processed, with each tag in a pre-configured tag set corresponding to the category; and
a screening unit configured to select, according to the matching result between the description information of the multimedia resource to be processed and the tag set, at least one piece of sub-description information as key information of the multimedia resource to be processed.
In an alternative embodiment, the apparatus further comprises:
a construction unit configured to perform word segmentation on the description information of each sample multimedia resource that belongs to the same category as the multimedia resource to be processed, to obtain tags;
select at least one of the tags as a candidate tag according to its word frequency; and,
after de-duplicating the candidate tags, take the set formed by the remaining candidate tags as the tag set corresponding to the category.
In an alternative embodiment, the matching unit is specifically configured to perform:
obtaining an information vector for each piece of sub-description information obtained by word segmentation of the description information of the multimedia resource to be processed, and an information vector for each tag in the tag set; and
matching each piece of sub-description information with each tag according to their information vectors.
In an alternative embodiment, the screening unit is specifically configured to perform:
for any piece of sub-description information, determining a matching parameter corresponding to that piece according to the matching result between it and each tag in the tag set; and
selecting at least one piece of sub-description information based on a comparison between a preset threshold and the matching parameter of each piece, wherein the key information includes at least one of a target tag and title description information for constructing a title.
In an alternative embodiment, the matching unit is specifically configured to perform:
taking the shortest distance between the information vector of the sub-description information and the information vectors of the tags in the tag set as the matching parameter corresponding to that piece of sub-description information.
In an alternative embodiment, the key information includes a target tag;
the matching unit is specifically configured to perform:
selecting, from the sub-description information, first sub-description information whose matching parameter is smaller than the preset threshold;
analyzing the part of speech of each piece of first sub-description information, and retaining the pieces whose part of speech is a target part of speech; and
matching each remaining piece of first sub-description information with each tag in the tag set; if the tag set contains a tag identical to a remaining piece, taking that piece as a target tag of the multimedia resource to be processed.
In an alternative embodiment, the key information includes title description information;
the matching unit is specifically configured to perform:
selecting, from the sub-description information, second sub-description information whose matching parameter is not smaller than the preset threshold, and taking the second sub-description information as the title description information of the multimedia resource to be processed.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:
A processor;
a memory for storing instructions executable by the processor;
Wherein the processor is configured to execute the instructions to implement the method for extracting information of a multimedia resource according to any one of the first aspect of the embodiments of the present disclosure.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the information extraction method for a multimedia resource according to any one of the first aspect of the embodiments of the present disclosure.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a computer program product which, when run on an electronic device, causes the electronic device to perform the method of the first aspect of the embodiments of the present disclosure or any possible implementation of the first aspect.
The technical scheme provided by the embodiments of the disclosure brings at least the following beneficial effects:
In the embodiments of the disclosure, besides classifying a multimedia resource based on its resource feature information, the description information of the resource is further exploited: the sub-description information obtained by word segmentation of the description information is matched with each tag in a pre-configured tag set. Because the tag set corresponds to the category to which the multimedia resource belongs, its tags are relevant to that category, so screening the sub-description information by its matching results against the tags yields more effective sub-description information as key information. By extracting the key information contained in the description information on the basis of the resource's category, the embodiments of the disclosure can resolve a large share of fine-grained tags.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a schematic diagram of an application scenario shown in accordance with an exemplary embodiment;
FIG. 2 is a flow chart illustrating a method of information extraction of a multimedia asset according to an exemplary embodiment;
FIG. 3 is a schematic diagram of a classification model structure, shown in accordance with an exemplary embodiment;
FIG. 4 is a flowchart illustrating a method of building a set of tags, according to an exemplary embodiment;
FIG. 5 is a flowchart illustrating a complete method of extracting critical information, according to an exemplary embodiment;
fig. 6 is a block diagram of an information extraction apparatus of a multimedia asset, which is shown according to an exemplary embodiment;
Fig. 7 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Some words appearing hereinafter are explained:
Electronic device: may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, etc.
Tag: in the embodiments of the disclosure, a word carrying a part of speech. When the multimedia resource is a video, tags are the core elements involved in the video, and the video title is a sentence that makes the relation between the tags explicit; for example, "beauty" and "squid" are tags, while "a beauty cooks squid" or "a beauty eats squid" are titles. The embodiments of the disclosure mainly convert the description information of a video into tags or a title through NLP (Natural Language Processing) technology.
Noun: nouns divide into entity nouns and abstract nouns. Entity nouns refer to things that have a physical form (e.g., cat, bed, pyramid); abstract nouns refer to abstract things such as emotions, opinions, and concepts (e.g., freedom, feeling). The boundary between the two is not sharp, and many words belong to both.
Word frequency: a common weighting measure in information retrieval and text mining, used to evaluate how often a word recurs within a document or across a set of domain documents; word-frequency statistics offers a useful method and perspective for academic research. In the embodiments of the disclosure, the word frequency is the number of times one piece of sub-description information recurs among the pieces of sub-description information obtained by segmenting the description information of the sample multimedia resources.
Modality: each source or form of information may be called a modality. For example, a person has touch, hearing, vision, and smell; information media include voice, video, text, etc.; and there is a wide variety of sensors such as radar, infrared, and accelerometers. Each of these may be called a modality. The notion can also be defined very broadly: two different languages can be regarded as two modalities, and even data sets collected under two different conditions can be regarded as two modalities.
TF-IDF: a common weighting technique in information retrieval and data mining. TF is the Term Frequency and IDF is the Inverse Document Frequency. TF-IDF evaluates the importance of a word to one document in a document set or corpus: the importance increases proportionally with the number of times the word appears in the document, but decreases with the frequency of the word across the corpus. Various forms of TF-IDF weighting are often applied by search engines to measure or rate the degree of correlation between documents and user queries.
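The weighting just described can be written out directly; this is the textbook formula (with a common +1 smoothing in the denominator), not code from the disclosure:

```python
import math

def tf_idf(term, document, corpus):
    """Term frequency in one document, damped by how many corpus documents
    contain the term; documents are plain token lists."""
    tf = document.count(term) / len(document)
    containing = sum(1 for doc in corpus if term in doc)
    idf = math.log(len(corpus) / (1 + containing))  # +1 smoothing
    return tf * idf

docs = [["squid", "barbecue", "squid"], ["cat", "snow"], ["cat", "video"]]
high = tf_idf("squid", docs[0], docs)  # frequent here, rare elsewhere
low = tf_idf("cat", docs[0], docs)     # absent from this document
```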
TextRank: an algorithm for keyword extraction that can also be used for phrase extraction and automatic summarization. Its main idea: build a graph from the adjacency relations between words, iteratively compute the rank value of each node using PageRank, and sort the rank values to obtain the keywords.
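A compact sketch of that idea, building a co-occurrence graph over adjacent words and running a plain PageRank power iteration; the window size, damping factor, and iteration count are the usual illustrative defaults, not values from the disclosure:

```python
def textrank(words, window=2, damping=0.85, iterations=30):
    # Build an undirected co-occurrence graph over words within the window.
    neighbors = {w: set() for w in words}
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    # PageRank power iteration over the graph (edges are symmetric, so every
    # neighbor n of w has w as a neighbor in turn and a nonzero degree).
    rank = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        rank = {
            w: (1 - damping) + damping * sum(
                rank[n] / len(neighbors[n]) for n in neighbors[w]
            )
            for w in neighbors
        }
    # Sort by rank value to obtain the keywords.
    return sorted(rank, key=rank.get, reverse=True)

ranked = textrank("deep learning makes deep networks learn deep features".split())
```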
Multimedia resources: resources for digital transmission, such as video, short video, live broadcast, etc.
Resource feature information: in the embodiments of the disclosure, information describing the features of a multimedia resource. Since the embodiments cover many types of multimedia resources, the corresponding resource feature information is likewise varied, and specifically includes the content features of the multimedia resource and the features of the author who published it. For example, the resource feature information may include audio features, video sequences, cover images, text descriptions, and author features; the author information may be a user portrait of the publishing user, covering the author's age, gender, hobbies, and so on. As an effective tool for profiling target users and connecting user appeals with design direction, user portraits are widely used in many fields.
Description information: text that helps users understand a multimedia resource and facilitates search and information integration. For example, if the main content of a video is a cat's startled reaction on first seeing snow, its description might read: "A cat sees snow for the first time." In the embodiments of the disclosure, the description information may be the text an author attaches when publishing a multimedia resource, text within the resource itself, and so on.
The application scenarios described in the embodiments of the disclosure are meant to describe the technical solutions more clearly and do not limit them; as one of ordinary skill in the art will appreciate, the technical solutions provided by the embodiments of the disclosure apply equally to similar technical problems that arise with new application scenarios. In the description of the disclosure, unless otherwise indicated, "plurality" means two or more.
Fig. 1 is a schematic diagram of an application scenario of an embodiment of the disclosure. The diagram includes two terminal devices 110 and a server 130; a short video application interface 120 can be accessed through a terminal device 110. The terminal devices 110 and the server 130 communicate over a communication network. Each terminal device corresponds to one user; Fig. 1 takes one terminal device 110 for each of user A and user B as an example, and the number of terminal devices is not limited in practice. The terminal devices may communicate with each other through the server 130, or direct communication may be established between them; direct communication between terminal devices is known as point-to-point communication, in which case some interactions between the terminal devices need not be relayed by the server 130.
Wherein, each terminal device can be provided with a browser client provided in the embodiment of the disclosure. The client related to the embodiment of the disclosure may be a preinstalled client, a client (e.g. applet) embedded in a certain application, or a client of a web page, and is not limited to a specific type of client.
It should be noted that the information extraction method for multimedia resources of the embodiments of the disclosure may be performed by the server 130, in which case the information extraction apparatus is generally disposed in the server 130. Optionally, the method may also be performed by the terminal device 110, in which case the apparatus is generally disposed in the terminal device 110. The method may furthermore be executed jointly by the server 130 and the terminal device 110; for example, the step of obtaining the category to which the multimedia resource to be processed belongs by performing feature analysis on its resource feature information may be executed by the terminal device 110, with the remaining steps executed by the server 130. The disclosure is not limited in this regard.
In an alternative embodiment, the communication network is a wired network or a wireless network. The terminal device 110 and the server 130 may be directly or indirectly connected through wired or wireless communication, and the present disclosure is not limited herein.
In the embodiments of the disclosure, the terminal device 110 is an electronic device used by a user, such as a personal computer, mobile phone, tablet computer, notebook, television, or e-book reader, or another computing device with a certain computing capability that runs software or a website related to the multimedia resource platform. Each terminal device 110 connects to the server 130 through a wireless network. The server 130 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms.
In the embodiment of the present disclosure, after the key information of the multimedia resource to be processed is extracted, services such as recommendation, search, and operation can be effectively performed based on the key information.
Fig. 2 is a flowchart illustrating an information extraction method of a multimedia resource according to an exemplary embodiment; as shown in fig. 2, the method includes the following steps.
In step S21, the category to which the multimedia resource to be processed belongs is obtained by performing feature analysis on the resource feature information of the multimedia resource to be processed;
Here, multimedia resources refer to digitally transmitted assets, such as video, short video, and live broadcast; in the following, short video is mainly used as an example.
In step S22, each piece of sub-description information obtained by word segmentation processing on the description information of the multimedia resource to be processed is matched with each tag in the tag set corresponding to the pre-configured category;
In the embodiment of the present disclosure, the description information of the multimedia resource may include a series of character strings such as text, numbers, and English words. When the description information is segmented, word segmentation processing is performed on these character strings to obtain multiple pieces of sub-description information.
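As an illustration of this word segmentation step, the sketch below splits a mixed description string into English-word, number, and other-character tokens. It is a minimal, assumption-laden stand-in: a production system would use a dedicated word segmenter (for Chinese text, a tool such as jieba), and the regular expression here is purely illustrative.

```python
import re

def segment_description(description: str) -> list:
    """Toy word segmentation: split a description string into
    sub-description tokens (English words, numbers, and runs of
    other non-space characters).  Illustrative only; a real system
    would use a dedicated segmenter."""
    return re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]+", description)
```

For example, `segment_description("cute cat 2020")` yields three pieces of sub-description information: `["cute", "cat", "2020"]`.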
In step S23, at least one piece of sub-description information is selected from the sub-description information according to the matching result between the description information of the multimedia resource to be processed and the tag set, and is used as the key information of the multimedia resource to be processed.
By means of the above scheme, the embodiment of the present disclosure not only classifies the multimedia resource based on its resource feature information, but also further utilizes its description information: the sub-description information obtained after word segmentation of the description information is matched with each tag in the pre-configured tag set. Because the tag set corresponds to the category to which the multimedia resource belongs, the tags in the set are related to that category, so screening the sub-description information based on the matching results yields more effective sub-description information as key information. By extracting the key information contained in the description information based on the category to which the multimedia resource belongs, the embodiment of the present disclosure can solve the fine-grained labeling problem for a large proportion of resources.
The following mainly takes a multimedia resource as a short video as an example, and describes in detail an information extraction method of the multimedia resource in the embodiment of the present disclosure:
Wherein the resource characteristic information of the multimedia resource includes, but is not limited to, part or all of the following:
Audio features, video sequences, cover images, text descriptions, author features.
In step S21, feature analysis is performed on the resource feature information of the multimedia resource to be processed, and a deep learning method may be adopted to obtain the category to which the multimedia resource belongs. Referring to fig. 3, which shows a schematic structure of a classification model according to an embodiment of the present disclosure, the model consists, from the bottom up, of a single-modality input layer (single modality), a multi-modality fusion layer (multi-modality fusion), and an output layer.
The single-modality input layer converts the representation of the input resource feature information. The inputs shown in fig. 3 include a video sequence (video), a text description (title/subtitle, OCR: optical character recognition, ASR: automatic speech recognition), a cover image (cover), audio features (audio), and author features (author info); the single-modality models adopt a CSN (Channel Separated Network) model, a BERT model, a ResNet-50 model, a DNN (Deep Neural Network) model, and a DNN model, respectively, to convert each input feature into a feature vector.
The text description may specifically include a title and a subtitle configured when the author publishes the short video, and text information obtained by performing character recognition, voice recognition, and the like on the short video.
Specifically, expression conversion is performed on the video sequence based on the CSN model to obtain a feature vector A; on the text description based on the BERT model to obtain a feature vector B; on the cover image based on the ResNet-50 (Residual Network) model to obtain a feature vector C; and on the audio features and the author features based on the same or different DNN models to obtain a feature vector D and a feature vector E, respectively.
Then, feature fusion is performed on the feature vectors by the multi-modality fusion layer, where a concat (concatenation) mode may be adopted. For example, the five feature vectors above are fused to obtain the fusion feature.
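The concat fusion just described can be sketched as a plain concatenation of the per-modality feature vectors (represented here as Python lists; the names and dimensions are illustrative, not taken from the disclosure):

```python
def concat_fusion(feature_vectors):
    """Fuse per-modality feature vectors (e.g. A..E for video, text,
    cover, audio, author) by concatenation; the fused vector's length
    is the sum of the input lengths."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused
```

In a real model this concatenated vector is what the multi-level classification head consumes.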
Finally, multi-level classification (multi-level class) may be performed on the multimedia resource based on the fusion feature, specifically including primary classification (first class), secondary classification, and so on. Primary classification is coarser than secondary classification: for example, "game" is a primary class, while "intelligence", "shooting", etc. are secondary classes under it. It should be noted that, in the embodiment of the present disclosure, the category to which the multimedia resource belongs is the primary classification, and the secondary classification does not need to be used.
When short videos are classified in the embodiment of the present disclosure, they may be classified into 20 large vertical categories, including music, dance, entertainment, games, agriculture, health, sports, finance, law, home decoration, comedy, anime, photography, travel, fashion, makeup, automobiles, food, live broadcast, and news.
In the embodiment of the present disclosure, after 20-way classification is performed on the multimedia resource based on the fusion feature, a primary classification result is finally obtained and taken as the category to which the multimedia resource belongs. The method adopts multi-modality fusion and performs model learning end to end.
In the above embodiment, not only is the text description information of the video utilized, but other modal information of the video, such as the video sequence features and audio features listed above, is also indirectly utilized. After the video is assigned to a certain large vertical class, analysis guided by prior knowledge of that vertical class can greatly improve the accuracy and recall of key information extraction.
After the multimedia resource to be processed is classified through the model, for example, the category to which the multimedia resource belongs is a game, the description information of the multimedia resource needs to be matched with a label set of a preconfigured game category.
The tag sets in the embodiment of the present disclosure are pre-configured and correspond to the categories of multimedia resources, with one tag set per category. For example, when the multimedia resources are short videos, each short video may be classified into a corresponding category (vertical category) based on the classification model shown in fig. 3; then 100,000 short videos are selected for each vertical category, and based on their description information, keywords (tags) commonly used under each vertical category are counted to form the tag set.
Referring to fig. 4, the method for constructing a tag set in an embodiment of the present disclosure is described below, taking the multimedia resource to be processed as an example, to show how to construct the tag set corresponding to the category to which it belongs. The method specifically includes the following procedures:
S41: performing word segmentation processing on the description information of each sample multimedia resource belonging to the same category with the multimedia resource to be processed to obtain each label;
As exemplified in the above embodiments, the category of the multimedia resource to be processed is "game". A plurality of multimedia resources can be classified based on the classification model enumerated above, and multimedia resources of the game category are selected as sample multimedia resources, so that the sample multimedia resources and the multimedia resource to be processed belong to the same category.
Further, after the description information of each selected sample multimedia resource is subjected to word segmentation, a plurality of labels can be obtained.
S42: selecting at least one tag from the tags as a candidate tag according to the word frequency of each tag;
In this step, before selecting a candidate tag from among the tags according to the word frequency of each tag, part-of-speech filtering may be performed on each tag, and only tags whose part-of-speech is the target part-of-speech may be retained. In the embodiment of the present disclosure, considering that the tag is generally a noun, the target part of speech may be set as the noun.
For example, the text of each short video is segmented and part-of-speech filtering is performed, retaining tags whose part of speech is a noun; word frequency statistics are then carried out, and tags with higher word frequency are selected from the noun tags as candidate tags and added to a candidate set.
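The part-of-speech filtering and word-frequency selection just described can be sketched as follows; the `(word, pos)` input format and the `top_k` cutoff are illustrative assumptions, since the disclosure does not fix a concrete data layout:

```python
from collections import Counter

def select_candidate_tags(tagged_tokens, top_k=2, target_pos="noun"):
    """Keep tokens whose part of speech equals the target part of
    speech (nouns by default), then return the top_k most frequent
    ones as candidate tags for the candidate set."""
    nouns = [word for word, pos in tagged_tokens if pos == target_pos]
    return [word for word, _ in Counter(nouns).most_common(top_k)]
```

An upstream segmenter/POS tagger (not shown) is assumed to produce the `(word, pos)` pairs.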
After that, each candidate tag in the candidate set can be manually annotated to confirm which texts can serve as tags; tags mainly consist of nouns, including entity nouns and abstract nouns. For example, based on experience, words in the candidate set that are rarely used as tags, such as nouns uncommon in short videos, are screened out.
S43: and after performing de-duplication processing on each candidate label, taking a set formed by the rest candidate labels as a label set corresponding to the category.
When the de-duplication processing is performed, the candidate tags can be de-duplicated according to the distance between their vectors, and candidate tags with relatively close semantics are merged to construct the tag set.
Specifically, the distance between the information vectors of each candidate tag is calculated, if the distance between the information vectors of two candidate tags is smaller than a certain threshold value, the two candidate tags are determined to have relatively close semantics, duplication removal can be performed, and one candidate tag is reserved to be added into the tag set.
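A minimal sketch of this distance-based de-duplication, greedily keeping one representative per cluster of semantically close candidates; the vectors and the threshold value below are illustrative assumptions, not values from the disclosure:

```python
import math

def dedup_tags(candidates, vectors, threshold=0.5):
    """De-duplicate candidate tags: a candidate is dropped when its
    information vector lies within `threshold` (Euclidean distance)
    of an already-kept tag, i.e. the two are semantically close."""
    kept = []
    for tag in candidates:
        if not any(math.dist(vectors[tag], vectors[k]) < threshold
                   for k in kept):
            kept.append(tag)
    return kept
```

The greedy order here means the first tag of each close pair is the one retained; the disclosure does not specify which of two merged tags survives.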
In the embodiment, after the short videos are classified, the description information of the short videos belonging to the same class is statistically analyzed to construct the tag set, so that effective statistics of short videos in a certain class can be realized.
In the embodiment of the present disclosure, before matching the description information of the multimedia resource to be processed with the tag set of the pre-configured game category, word segmentation processing is first performed on the description information, and each piece of sub-description information obtained after segmentation is matched with each tag in the tag set. An optional matching manner is as follows:
acquiring information vectors of all sub description information obtained after word segmentation processing is carried out on description information of a multimedia resource to be processed, and information vectors of all tags in a tag set; and matching each piece of sub description information with each label according to the information vector of each piece of sub description information and the information vector of each label.
The information vector is a vector representation of the sub-description information or tag; generally, the sub-description information or tag is a word, and the corresponding information vector is a word vector. The information vectors of the sub-description information obtained after segmenting the description information of the multimedia resource to be processed are calculated and matched against the information vectors of the tags in the tag set.
Assume that the description information of the multimedia resource to be processed is a sentence; after word segmentation, 4 pieces of sub-description information are obtained, namely sub-description information A, B, C, and D, and the tag set contains 10 tags: tag A, tag B, tag C, tag D, tag E, tag F, tag G, tag H, tag I, and tag J. When matching is performed, the distance between the information vector of each of the 4 pieces of sub-description information and the information vectors of the 10 tags is calculated, and the matching result between the sub-description information and the tags is analyzed according to these distances.
In the above embodiment, the matching result between the sub-description information and the tags can be analyzed by calculating the distance between vectors, where the shorter the distance, the closer the semantics of the sub-description information and the tag. Matching is therefore more accurate when it compares the information vector of the sub-description information of the multimedia resource with the information vectors of the tags.
Wherein, when at least one piece of sub description information is selected from the sub description information according to the matching result between the description information of the multimedia resource to be processed and the tag set, the sub description information is used as the key information of the multimedia resource to be processed, the analysis can be performed according to the following modes:
For any piece of sub-description information, firstly, according to a matching result between the sub-description information and each tag in the tag set, determining a matching parameter corresponding to the sub-description information, wherein the matching parameter can be the shortest distance between an information vector of the sub-description information and an information vector of each tag in the tag set; and then, based on a comparison result between a preset threshold value and the matching parameters of each piece of sub-description information, selecting at least one piece of sub-description information from each piece of sub-description information as key information of the multimedia resource to be processed, wherein the key information comprises at least one of a target label and title description information for constructing a title.
That is, after the distances between the information vector of each piece of sub-description information and the information vectors of the tags in the tag set are calculated by the matching method enumerated above, the shortest of these distances can be used as the matching parameter of that piece of sub-description information. In other words, the tag closest in meaning to each piece of sub-description information is found in the tag set, and the matching parameter of the sub-description information is determined from that closest tag.
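The matching parameter computation can be sketched as follows (Euclidean distance over plain Python lists; the disclosure does not fix a particular distance metric, so this choice is an assumption):

```python
import math

def matching_parameter(sub_vector, tag_vectors):
    """Return the matching parameter of one piece of sub-description
    information: the shortest distance between its information vector
    and the information vectors of all tags in the tag set."""
    return min(math.dist(sub_vector, tag_vec) for tag_vec in tag_vectors)
```

For instance, a sub-description vector at the origin matched against tag vectors `[3, 4]` and `[1, 0]` has matching parameter 1.0, the distance to its nearest tag.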
After the matching parameters of each piece of sub description information are obtained through matching, the matching parameters of each piece of sub description information can be compared with a preset threshold value, and the sub description information is classified and screened according to the comparison result.
In the above embodiment, the shortest distance between the sub-description information and the tags in the tag set is used as the matching parameter of that sub-description information; that is, the tag closest in distance, and hence closest in semantics, to the sub-description information determines the matching parameter. Therefore, when the target tag and the title description information are screened based on the comparison result between the preset threshold and the matching parameter of each piece of sub-description information, key information that better matches the multimedia resource can be accurately screened out, improving the accuracy of key information extraction.
In the embodiment of the present disclosure, the sub-description information is mainly classified into two categories. One category consists of sub-description information whose matching parameter is smaller than the preset threshold, which may be called first sub-description information; the first sub-description information can be further screened to obtain the target tag of the multimedia resource to be processed.
The other category consists of sub-description information whose matching parameter is not smaller than the preset threshold, which may be called second sub-description information. Further, the logical relationship between the pieces of second sub-description information can be analyzed according to the description information and/or the content of the multimedia resource, and a title for the multimedia resource to be processed can be constructed from this title description information. For example, suppose the two pieces of second sub-description information are "beauty" and "squid", and the multimedia resource to be processed is a short video; by analyzing the description information or the content of the short video, the title of the multimedia resource is finally constructed as: "The beauty eats squid".
The following describes the determination manner of the target tag and the title description information, respectively:
Optionally, if the key information includes a target tag, at least one piece of sub-description information may be selected from the sub-description information as the target tag of the multimedia resource to be processed based on the following manner:
Selecting first sub description information with the matching parameter smaller than a preset threshold value from the sub description information; respectively analyzing the parts of speech of each first sub-description information, and reserving the first sub-description information with the parts of speech being the target parts of speech; and respectively matching each piece of residual first sub-description information with each piece of label in the label set, and taking the first sub-description information which is the same as the label in each piece of residual first sub-description information as a target label of the multimedia resource to be processed if the label which is the same as the first sub-description information exists in the label set.
Assume that the preset threshold is 0.5 and the target part of speech is the noun. Among the 4 pieces of sub-description information listed above, two, namely sub-description information A and sub-description information B, have matching parameters smaller than the preset threshold, so they are the first sub-description information. Part-of-speech analysis can then be carried out on them: first sub-description information whose part of speech is a noun is retained, and the rest is filtered out.
If the parts of speech of sub-description information A and sub-description information B are both nouns, each of the two is further matched against the 10 tags in the tag set. This matching is not matching between information vectors but hard matching: the first sub-description information is compared with each tag in the tag set to judge whether the tag set contains a tag completely identical to it. For example, if tag A in the tag set is identical to sub-description information A, but no tag in the set is identical to sub-description information B, the final target tag is sub-description information A.
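Putting the target-tag selection together (threshold on the matching parameter, part-of-speech filter, then hard matching against the tag set), a sketch might look like this; the dict layout of `sub_infos` is a hypothetical convenience, not the disclosure's own data structure:

```python
def select_target_tags(sub_infos, tag_set, threshold=0.5):
    """Select target tags: a sub-description word qualifies only if
    its matching parameter is below the threshold (i.e. it is first
    sub-description information), its part of speech is a noun, and
    it hard-matches (is identical to) some tag in the tag set."""
    return [s["word"] for s in sub_infos
            if s["match"] < threshold
            and s["pos"] == "noun"
            and s["word"] in tag_set]
```

Here `match` is the matching parameter (shortest vector distance) computed earlier, so all three screening steps of this embodiment appear in one predicate.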
In the embodiment of the present disclosure, the description information of the multimedia resource can be analyzed in detail, so that meaningful tags are extracted. For a short video platform, about 60% of videos contain a text description; based on the above embodiment, effective tags can be obtained after text analysis of this portion of videos, so the fine-grained labeling problem can be solved for a large proportion of them.
Optionally, if the key information includes header description information, at least one piece of sub description information may be selected from the sub description information as header description information of the multimedia resource to be processed, based on the following manner: and selecting second sub description information with the matching parameter not smaller than a preset threshold value from the sub description information, and taking the second sub description information as the title description information of the multimedia resource to be processed.
For example, two of the 4 pieces of sub-description information are the sub-description information C and the sub-description information D, respectively, where the corresponding matching parameters are not less than the preset threshold, so that the two pieces of sub-description information are second sub-description information, and finally the two pieces of second sub-description information can be directly used as the header description information of the multimedia resource to be processed.
Based on the above embodiment, text analysis can be performed on a large portion of the videos in a short video platform to obtain effective titles. It should be noted that the target tag or title description information obtained by the method in the embodiment of the present disclosure not only comes from analyzing the description information of the multimedia resource but is also matched and screened against the pre-configured tag set, so that after the key information is extracted, services such as recommendation, search, and operation can be performed more effectively for the video.
FIG. 5 is a flowchart of a complete method of extracting critical information, according to an exemplary embodiment, specifically including the steps of:
s51: extracting resource characteristic information of the multimedia resources to be processed;
s52: the method comprises the steps of obtaining the category of the multimedia resource to be processed by carrying out feature analysis on the resource feature information of the multimedia resource to be processed;
S53: word segmentation processing is carried out on the description information of the multimedia resource to be processed to obtain each piece of sub description information;
S54: acquiring information vectors of all sub description information obtained after word segmentation processing is carried out on description information of a multimedia resource to be processed, and information vectors of all tags in a tag set;
s55: judging whether the shortest distance between the information vector of each piece of sub-description information and the information vector of each tag in the tag set is smaller than a preset threshold value, if so, executing a step S56, otherwise, executing a step S58;
S56: taking sub-description information smaller than a preset threshold value as first sub-description information, respectively analyzing the parts of speech of each first sub-description information, and reserving the first sub-description information with the parts of speech being the target part of speech;
S57: respectively matching each piece of residual first sub-description information with each tag in the tag set, and reserving the first sub-description information which is completely matched with any tag in the tag set as a target tag of the multimedia resource to be processed;
S58: and taking the sub-description information which is not smaller than a preset threshold value as second sub-description information, and taking all the second sub-description information as title description information of the multimedia resource to be processed.
Fig. 6 is a block diagram illustrating an information extraction apparatus 600 of a multimedia asset according to an exemplary embodiment. Referring to fig. 6, the apparatus includes a classification unit 601, a matching unit 602, and a screening unit 603.
A classification unit 601 configured to obtain the category to which the multimedia resource to be processed belongs by performing feature analysis on the resource feature information of the multimedia resource to be processed;
a matching unit 602 configured to match each piece of sub-description information, obtained by performing word segmentation processing on the description information of the multimedia resource to be processed, with each tag in the tag set corresponding to the pre-configured category;
and a filtering unit 603 configured to perform selecting at least one piece of sub-description information from the sub-description information as key information of the multimedia resource to be processed according to a matching result between the description information of the multimedia resource to be processed and the tag set.
In the embodiment of the present disclosure, besides classifying the multimedia resource based on its resource feature information, the description information of the multimedia resource is further utilized: the sub-description information obtained after word segmentation of the description information is matched with each tag in the pre-configured tag set. Because the tag set corresponds to the category to which the multimedia resource belongs, the tags in the set are related to that category, so screening the sub-description information based on the matching results yields more effective sub-description information as key information. By extracting the key information contained in the description information based on the category to which the multimedia resource belongs, the embodiment of the present disclosure can solve the fine-grained labeling problem for a large proportion of resources.
In an alternative embodiment, the apparatus further comprises:
a construction unit 604 configured to perform word segmentation processing on the description information of each sample multimedia resource belonging to the same category as the multimedia resource to be processed, so as to obtain each tag;
Selecting at least one tag from the tags as a candidate tag according to the word frequency of each tag;
And after performing de-duplication processing on each candidate label, taking a set formed by the rest candidate labels as a label set corresponding to the category.
In an alternative embodiment, the matching unit 602 is specifically configured to perform:
acquiring information vectors of all sub description information obtained after word segmentation processing is carried out on description information of a multimedia resource to be processed, and information vectors of all tags in a tag set;
and matching each piece of sub description information with each label according to the information vector of each piece of sub description information and the information vector of each label.
In an alternative embodiment, the screening unit 603 is specifically configured to perform:
for any piece of sub-description information, determining matching parameters corresponding to the sub-description information according to matching results between the sub-description information and each tag in the tag set;
And selecting at least one piece of sub-description information from the sub-description information as the key information of the multimedia resource to be processed based on a comparison result between a preset threshold and the matching parameter of each piece of sub-description information, where the key information includes at least one of a target tag and title description information for constructing a title.
In an alternative embodiment, the matching unit 602 is specifically configured to perform:
And taking the shortest distance between the information vector of the sub-description information and the information vector of each tag in the tag set as a matching parameter corresponding to the sub-description information.
In an alternative embodiment, the key information includes a target tag;
The matching unit 602 is specifically configured to perform:
Selecting first sub description information with the matching parameter smaller than a preset threshold value from the sub description information;
Respectively analyzing the parts of speech of each first sub-description information, and reserving the first sub-description information with the parts of speech being the target parts of speech;
And respectively matching each piece of residual first sub-description information with each piece of label in the label set, and taking the first sub-description information which is the same as the label in each piece of residual first sub-description information as a target label of the multimedia resource to be processed if the label which is the same as the first sub-description information exists in the label set.
In an alternative embodiment, the key information includes title description information;
The matching unit 602 is specifically configured to perform:
And selecting second sub description information with the matching parameter not smaller than a preset threshold value from the sub description information, and taking the second sub description information as the title description information of the multimedia resource to be processed.
The specific manner in which the respective units execute the requests in the apparatus of the above embodiment has been described in detail in the embodiment concerning the method, and will not be described in detail here.
Fig. 7 is a block diagram of an electronic device 700, according to an example embodiment, the apparatus comprising:
A processor 710;
a memory 720 for storing instructions executable by the processor 710;
Wherein the processor 710 is configured to execute the instructions to implement the method for extracting information of any of the multimedia resources in the embodiments of the present disclosure.
In an exemplary embodiment, a storage medium including instructions is also provided, such as the memory 720 including instructions executable by the processor 710 of the electronic device 700 to perform the above-described method. Alternatively, the storage medium may be a non-transitory computer-readable storage medium, which may be, for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
The embodiments of the present disclosure also provide a computer program product which, when run on an electronic device, causes the electronic device to perform any one of the above-described information extraction methods for multimedia resources, or any method that may be involved in the information extraction methods for multimedia resources, of the embodiments of the present disclosure.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of the embodiments of the present disclosure may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a computing device. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (14)
1. An information extraction method for a multimedia resource, comprising:
performing feature analysis on resource feature information of a multimedia resource to be processed to obtain a category to which the multimedia resource to be processed belongs;
matching each piece of sub-description information, obtained by performing word segmentation processing on description information of the multimedia resource to be processed, with each tag in a pre-configured tag set corresponding to the category, wherein each tag is obtained by converting description information of a multimedia resource;
for any piece of sub-description information, determining a matching parameter corresponding to the sub-description information according to a matching result between the sub-description information and each tag in the tag set;
and selecting, based on a comparison result between a preset threshold and the matching parameter of each piece of sub-description information, at least one piece of sub-description information from the sub-description information as key information of the multimedia resource to be processed, wherein the key information comprises at least one of a target tag and title description information for constructing a title.
2. The method of claim 1, wherein the set of labels corresponding to the category is obtained according to the following manner:
performing word segmentation processing on description information of each sample multimedia resource belonging to the same category as the multimedia resource to be processed to obtain tags;
selecting at least one tag from the tags as a candidate tag according to word frequencies of the tags;
and after performing de-duplication processing on the candidate tags, taking a set formed by the remaining candidate tags as the tag set corresponding to the category.
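Claim 2's tag-set construction can be sketched as below. The function name, the `min_freq` cut-off, and the `tokenize` callable (e.g. a Chinese word segmenter) are all assumptions for illustration:

```python
from collections import Counter

def build_tag_set(sample_descriptions, tokenize, min_freq=2):
    """Build the tag set for one category: segment each same-category
    sample description into tokens, keep tokens whose word frequency
    reaches min_freq, and de-duplicate the survivors."""
    counts = Counter(tok for desc in sample_descriptions
                     for tok in tokenize(desc))
    # Returning a set performs the de-duplication step of the claim.
    return {t for t, c in counts.items() if c >= min_freq}
```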
3. The method of claim 1, wherein the matching each piece of sub-description information, obtained by performing word segmentation processing on the description information of the multimedia resource to be processed, with each tag in the pre-configured tag set corresponding to the category specifically comprises:
obtaining an information vector of each piece of sub-description information obtained after word segmentation processing is performed on the description information of the multimedia resource to be processed, and an information vector of each tag in the tag set;
and matching each piece of sub-description information with each tag according to the information vector of each piece of sub-description information and the information vector of each tag.
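The vector-based matching of claim 3 can be sketched as a pairwise distance computation. Names are hypothetical, and the choice of Euclidean distance is an assumption:

```python
import numpy as np

def match_by_vectors(sub_vectors, tag_vectors):
    """Pairwise distances between sub-description information vectors
    and tag information vectors; the minimum of each row can later
    serve as that sub-description information's matching parameter."""
    S = np.asarray(sub_vectors, dtype=float)[:, None, :]  # (n_sub, 1, d)
    T = np.asarray(tag_vectors, dtype=float)[None, :, :]  # (1, n_tag, d)
    return np.linalg.norm(S - T, axis=-1)                 # (n_sub, n_tag)
```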
4. The method of claim 1, wherein the determining, according to the matching result between the sub-description information and each tag in the tag set, a matching parameter corresponding to the sub-description information specifically includes:
taking the shortest distance between the information vector of the sub-description information and the information vectors of the tags in the tag set as the matching parameter corresponding to the sub-description information.
5. The method of claim 1, wherein the key information comprises a target tag;
the selecting, based on a comparison result between a preset threshold and the matching parameter of each piece of sub-description information, at least one piece of sub-description information from the sub-description information as key information of the multimedia resource to be processed specifically comprises:
selecting, from the sub-description information, first sub-description information whose matching parameter is smaller than the preset threshold;
analyzing the part of speech of each piece of first sub-description information, and retaining the first sub-description information whose part of speech is a target part of speech;
and matching each piece of remaining first sub-description information with each tag in the tag set, and, if a tag identical to a piece of the remaining first sub-description information exists in the tag set, taking that piece of first sub-description information as the target tag of the multimedia resource to be processed.
6. The method of claim 1, wherein the key information comprises title description information;
the selecting, based on a comparison result between a preset threshold and the matching parameter of each piece of sub-description information, at least one piece of sub-description information from the sub-description information as key information of the multimedia resource to be processed specifically comprises:
selecting, from the sub-description information, second sub-description information whose matching parameter is not smaller than the preset threshold, and taking the second sub-description information as the title description information of the multimedia resource to be processed.
7. An information extraction apparatus for a multimedia resource, comprising:
a classification unit configured to perform feature analysis on resource feature information of a multimedia resource to be processed to obtain a category to which the multimedia resource to be processed belongs;
a matching unit configured to match each piece of sub-description information, obtained by performing word segmentation processing on description information of the multimedia resource to be processed, with each tag in a pre-configured tag set corresponding to the category, wherein each tag is obtained by converting description information of a multimedia resource;
a screening unit configured to determine, for any piece of sub-description information, a matching parameter corresponding to the sub-description information according to a matching result between the sub-description information and each tag in the tag set; and to select, based on a comparison result between a preset threshold and the matching parameter of each piece of sub-description information, at least one piece of sub-description information from the sub-description information as key information of the multimedia resource to be processed, wherein the key information comprises at least one of a target tag and title description information for constructing a title.
8. The apparatus of claim 7, wherein the apparatus further comprises:
a construction unit configured to perform word segmentation processing on description information of each sample multimedia resource belonging to the same category as the multimedia resource to be processed to obtain tags;
select at least one tag from the tags as a candidate tag according to word frequencies of the tags;
and after performing de-duplication processing on the candidate tags, take a set formed by the remaining candidate tags as the tag set corresponding to the category.
9. The apparatus of claim 7, wherein the matching unit is specifically configured to perform:
obtaining an information vector of each piece of sub-description information obtained after word segmentation processing is performed on the description information of the multimedia resource to be processed, and an information vector of each tag in the tag set;
and matching each piece of sub-description information with each tag according to the information vector of each piece of sub-description information and the information vector of each tag.
10. The apparatus of claim 7, wherein the matching unit is specifically configured to perform:
taking the shortest distance between the information vector of the sub-description information and the information vectors of the tags in the tag set as the matching parameter corresponding to the sub-description information.
11. The apparatus of claim 7, wherein the key information comprises a target tag;
the matching unit is specifically configured to perform:
selecting, from the sub-description information, first sub-description information whose matching parameter is smaller than the preset threshold;
analyzing the part of speech of each piece of first sub-description information, and retaining the first sub-description information whose part of speech is a target part of speech;
and matching each piece of remaining first sub-description information with each tag in the tag set, and, if a tag identical to a piece of the remaining first sub-description information exists in the tag set, taking that piece of first sub-description information as the target tag of the multimedia resource to be processed.
12. The apparatus of claim 7, wherein the key information comprises title description information;
the matching unit is specifically configured to perform:
selecting, from the sub-description information, second sub-description information whose matching parameter is not smaller than the preset threshold, and taking the second sub-description information as the title description information of the multimedia resource to be processed.
13. An electronic device, comprising:
A processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the information extraction method for a multimedia resource as claimed in any one of claims 1 to 6.
14. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the information extraction method for a multimedia resource as claimed in any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010872925.0A CN112015928B (en) | 2020-08-26 | 2020-08-26 | Information extraction method and device for multimedia resources, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112015928A CN112015928A (en) | 2020-12-01 |
CN112015928B true CN112015928B (en) | 2024-07-09 |
Family
ID=73502581
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010872925.0A Active CN112015928B (en) | 2020-08-26 | 2020-08-26 | Information extraction method and device for multimedia resources, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112015928B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114625922A (en) * | 2020-12-10 | 2022-06-14 | 北京达佳互联信息技术有限公司 | Word stock construction method and device, electronic equipment and storage medium |
CN112597340B (en) * | 2020-12-23 | 2023-01-03 | 杭州知衣科技有限公司 | ASR text keyword extraction method, computer equipment and readable storage medium |
CN113094523A (en) * | 2021-03-19 | 2021-07-09 | 北京达佳互联信息技术有限公司 | Resource information acquisition method and device, electronic equipment and storage medium |
CN113204660B (en) * | 2021-03-31 | 2024-05-17 | 北京达佳互联信息技术有限公司 | Multimedia data processing method, tag identification device and electronic equipment |
CN113326385B (en) * | 2021-08-04 | 2021-12-07 | 北京达佳互联信息技术有限公司 | Target multimedia resource acquisition method and device, electronic equipment and storage medium |
CN114625897B (en) * | 2022-03-21 | 2024-08-20 | 腾讯科技(深圳)有限公司 | Multimedia resource processing method and device, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109918662A (en) * | 2019-03-04 | 2019-06-21 | 腾讯科技(深圳)有限公司 | A kind of label of e-sourcing determines method, apparatus and readable medium |
CN111274442A (en) * | 2020-03-19 | 2020-06-12 | 聚好看科技股份有限公司 | Method for determining video label, server and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9311386B1 (en) * | 2013-04-03 | 2016-04-12 | Narus, Inc. | Categorizing network resources and extracting user interests from network activity |
CN106778862B (en) * | 2016-12-12 | 2020-04-21 | 上海智臻智能网络科技股份有限公司 | Information classification method and device |
CN111125422B (en) * | 2019-12-13 | 2024-04-02 | 北京达佳互联信息技术有限公司 | Image classification method, device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112015928B (en) | Information extraction method and device for multimedia resources, electronic equipment and storage medium | |
JP7142737B2 (en) | Multimodal theme classification method, device, device and storage medium | |
CN111444326B (en) | Text data processing method, device, equipment and storage medium | |
CN110737801B (en) | Content classification method, apparatus, computer device, and storage medium | |
CN112131350B (en) | Text label determining method, device, terminal and readable storage medium | |
CN109697239B (en) | Method for generating teletext information | |
CN112163122A (en) | Method and device for determining label of target video, computing equipment and storage medium | |
CN113590850A (en) | Multimedia data searching method, device, equipment and storage medium | |
CN112307351A (en) | Model training and recommending method, device and equipment for user behavior | |
CN113094552A (en) | Video template searching method and device, server and readable storage medium | |
CN111625715B (en) | Information extraction method and device, electronic equipment and storage medium | |
CN112188312A (en) | Method and apparatus for determining video material of news | |
CN110717038A (en) | Object classification method and device | |
CN115114395A (en) | Content retrieval and model training method and device, electronic equipment and storage medium | |
CN113392179A (en) | Text labeling method and device, electronic equipment and storage medium | |
CN115408488A (en) | Segmentation method and system for novel scene text | |
CN113408282B (en) | Method, device, equipment and storage medium for topic model training and topic prediction | |
CN114860992A (en) | Video title generation method, device, equipment and storage medium | |
CN110162769B (en) | Text theme output method and device, storage medium and electronic device | |
CN114912011A (en) | Video recommendation method based on content extraction and scoring prediction | |
Wieczorek et al. | Semantic Image-Based Profiling of Users' Interests with Neural Networks | |
CN117933260A (en) | Text quality analysis method, device, equipment and storage medium | |
CN114595370A (en) | Model training and sorting method and device, electronic equipment and storage medium | |
CN115618873A (en) | Data processing method and device, computer equipment and storage medium | |
CN113569091A (en) | Video data processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||