CN108829893B - Method and device for determining video label, storage medium and terminal equipment

Info

Publication number
CN108829893B
CN108829893B (application CN201810712717.7A)
Authority
CN
China
Prior art keywords
video
candidate
processed
text
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810712717.7A
Other languages
Chinese (zh)
Other versions
CN108829893A (en)
Inventor
刘呈祥
何伯磊
吴甜
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810712717.7A
Publication of CN108829893A
Application granted
Publication of CN108829893B
Legal status: Active (current)

Classifications

    • G06F40/30 Semantic analysis (G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06F ELECTRIC DIGITAL DATA PROCESSING; G06F40/00 Handling natural language data)
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates (G06F40/20 Natural language analysis; G06F40/279 Recognition of textual entities)
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking (G06F40/20 Natural language analysis; G06F40/279 Recognition of textual entities)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, an apparatus, a storage medium, and a terminal device for determining a video tag. The method comprises: determining, according to the domain to which a video to be processed belongs, an acquisition mode for the associated text of the video, so as to extract that associated text; extracting candidate tags of the video to be processed from the associated text; ranking the candidate tags; and selecting, according to the ranking result, tags that match the video to be processed from among the candidate tags. With the method and the device, the accuracy with which tags describe the video is improved.

Description

Method and device for determining video label, storage medium and terminal equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for determining a video tag, a storage medium, and a terminal device.
Background
With the growth of internet information, recommending personalized information based on users' points of interest has become a new trend in information consumption, and video recommendation is an important component of personalized recommendation. To implement personalized video recommendation, the content of a video must be understood in advance and the video must be tagged. Tagging is the process of defining tags for a video; the tags describe the points of interest of the video content.
At present, schemes for defining video tags include:
1. Extracting and analyzing the title text of the video, and taking keywords extracted from the title as the video's tags.
2. Manually watching the video, understanding its content, and applying corresponding tags.
However, these schemes have the following disadvantages:
1. For scheme 1, video titles are usually short and colloquially phrased, so few keywords can be extracted from them; without understanding and verification based on the video content, the resulting tags can hardly describe that content accurately.
2. For scheme 2, human understanding of the video content improves tagging accuracy, but the efficiency is low and the cost is high.
Disclosure of Invention
Embodiments of the present invention provide a method, an apparatus, a storage medium, and a terminal device for determining a video tag, so as to solve or alleviate one or more of the above technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a method for determining a video tag, including:
determining, according to the domain to which a video to be processed belongs, an acquisition mode for the associated text of the video, so as to extract that associated text;
extracting candidate tags of the video to be processed from the associated text;
ranking the candidate tags; and
selecting, according to the ranking result, tags that match the video to be processed from among the candidate tags.
With reference to the first aspect, in a first implementation manner of the first aspect, determining the acquisition mode of the associated text of the video to be processed according to the domain to which it belongs, so as to extract the associated text, includes:
if the video to be processed belongs to the target domain, acquiring subtitle text from the video by image recognition; and
if the video to be processed does not belong to the target domain, acquiring the title of the video.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, extracting the candidate tags of the video to be processed from its associated text includes:
if the associated text is the subtitle text, performing structural analysis, semantic analysis, and topic classification on the subtitle text to obtain candidate tags of the video to be processed.
With reference to the second implementation manner of the first aspect, in a third implementation manner of the first aspect, performing structural analysis on the subtitle text includes:
determining the keywords that constitute the subtitle text according to its text structure;
counting the frequency with which each keyword appears in the subtitle text; and
selecting, according to those frequencies, keywords to serve as candidate tags of the video.
With reference to the second implementation manner of the first aspect, in a fourth implementation manner of the first aspect, performing semantic analysis on the subtitle text includes:
calculating the semantic similarity between preset tags and the subtitle text according to a semantic analysis model; and
selecting, according to that semantic similarity, tags from the preset tags to serve as candidate tags of the video.
With reference to the second implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the topic classification of the subtitle text includes:
acquiring, according to the similarity between the video's candidate tags and preset topic tags, tags from the preset topic tags to serve as candidate tags of the video.
With reference to the first aspect, in a sixth implementation manner of the first aspect, ranking the candidate tags includes:
setting a weight value for each candidate tag according to the frequency with which it appears in the associated text;
when a candidate tag's frequency in the associated text is zero, adjusting its weight value according to its semantic similarity to the associated text; and
sorting the candidate tags by their weight values.
With reference to the first aspect and any one of its implementation manners, in a seventh implementation manner of the first aspect, the method further comprises:
preprocessing the associated text, the preprocessing comprising at least one of: paragraph segmentation, sentence segmentation, word segmentation, part-of-speech tagging, and named entity recognition; and
post-processing the candidate tags, the post-processing comprising at least one of: deduplication, format unification, disambiguation, and tag-timeliness processing.
With reference to the first aspect, in an eighth implementation manner of the first aspect, extracting the candidate tags of the video to be processed from its associated text further includes:
if the associated text is the title, performing word segmentation on the title to obtain candidate words;
calculating a weight value for each candidate word; and
ranking the candidate words according to external comparison information and their weight values to obtain the tags of the video to be processed, wherein the external comparison information comprises each candidate word's search popularity in an external system.
In a second aspect, an embodiment of the present invention provides an apparatus for determining a video tag, including:
an associated text extraction module, configured to determine, according to the domain to which a video to be processed belongs, an acquisition mode for the associated text of the video, so as to extract that associated text;
a candidate tag extraction module, configured to extract candidate tags of the video to be processed from the associated text;
a candidate tag ranking module, configured to rank the candidate tags; and
a tag selection module, configured to select, according to the ranking result, tags that match the video to be processed from among the candidate tags.
With reference to the second aspect, in a first implementation manner of the second aspect, the associated text extraction module includes:
a subtitle text acquisition unit, configured to acquire subtitle text from the video to be processed by image recognition if the video belongs to the target domain; and
a video title acquisition unit, configured to acquire the title of the video to be processed if the video does not belong to the target domain.
With reference to the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the candidate tag extraction module includes:
a subtitle text analysis unit, configured to perform structural analysis, semantic analysis, and topic classification on the subtitle text to obtain candidate tags of the video to be processed if the associated text is the subtitle text.
With reference to the first implementation manner of the second aspect, in a third implementation manner of the second aspect, the candidate tag extraction module further includes:
a title word segmentation unit, configured to segment the title into candidate words if the associated text is the title;
a weight calculation unit, configured to calculate a weight value for each candidate word; and
a word ranking unit, configured to rank the candidate words according to external comparison information and their weight values to obtain the tags of the video to be processed, wherein the external comparison information comprises each candidate word's search popularity in an external system.
The functions of the apparatus may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In one possible design, the structure for determining a video tag includes a processor and a memory; the memory stores a program supporting the apparatus in executing the method for determining a video tag of the first aspect, and the processor is configured to execute the program stored in the memory. The apparatus may further comprise a communication interface for communicating with other devices or a communication network.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium for storing computer software instructions for an apparatus for determining a video tag, where the computer software instructions include a program for executing the method for determining a video tag according to the first aspect.
Any one of the above technical solutions has the following advantage or beneficial effect:
embodiments of the invention can acquire a video's associated text according to the domain to which the video belongs, extract tags from the associated text, and rank them to select the tags that match the video. Compared with obtaining tags from the video title alone, the embodiments can acquire more comprehensive video information according to the video's domain, and tags extracted from that information describe the video more accurately.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
FIG. 1 is a schematic flow chart diagram illustrating one embodiment of a method for determining video tags;
FIG. 2 is a flowchart illustrating an embodiment of a method for obtaining associated text based on a target field according to the present invention;
FIG. 3 is a flow diagram illustrating one embodiment of a process for extracting candidate tags provided by the present invention;
FIG. 4 is a flow diagram illustrating one embodiment of text structure analysis provided by the present invention;
FIG. 5 is a flow diagram illustrating one embodiment of semantic analysis provided by the present invention;
FIG. 6 is a flow diagram for one embodiment of tag ordering provided by the present invention;
FIGS. 7 to 9 are schematic diagrams of an application example of the method for determining a video tag provided by the present invention;
FIG. 10 is a schematic structural diagram of an embodiment of an apparatus for determining a video tag provided by the present invention;
FIG. 11 is a schematic structural diagram of an embodiment of a terminal device provided by the present invention.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
Referring to fig. 1, an embodiment of the present invention provides a method for determining a video tag, which can be applied to a terminal device such as a smartphone, tablet, or computer. The embodiment includes steps S100 to S400, as follows:
S100, determining, according to the domain to which the video to be processed belongs, an acquisition mode for the associated text of the video, so as to extract that associated text.
In the present embodiment, the domain to which a video belongs may distinguish, for example, subtitled videos, unsubtitled videos, drama videos, lyric videos, news videos, and so on. The associated text may include the video title, subtitle text describing the video content, and the like, and may be acquired directly or indirectly: directly obtainable text includes the video title and documents accompanying it, while indirectly obtainable text includes the video's audio and the subtitles in the video images. For example, the audio can be converted to text by speech recognition, and subtitle text can be obtained from the video images by image recognition techniques such as OCR (Optical Character Recognition).
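For illustration, a minimal sketch of this domain-based branching follows. The domain list, the Video structure, and the OCR stub are assumptions made for the example, not details fixed by this embodiment; a real system would run an OCR engine over sampled frames.

```python
from dataclasses import dataclass

# Illustrative assumption: which domains are suited to subtitle OCR.
TARGET_DOMAINS = {"military", "history"}

@dataclass
class Video:
    title: str     # directly obtainable associated text
    domain: str    # domain the video belongs to
    frames: list   # frame records; stand-in for the decoded video

def ocr_subtitles(video: Video) -> str:
    """Stub for an OCR pass over the video frames (indirect acquisition)."""
    return " ".join(f.get("caption", "") for f in video.frames)

def get_associated_text(video: Video) -> str:
    """Step S100: choose the acquisition mode by the video's domain."""
    if video.domain in TARGET_DOMAINS:
        return ocr_subtitles(video)  # S110: subtitle text via image recognition
    return video.title               # S120: fall back to the title

v = Video(title="Country B becomes Country C's new-weapon test field",
          domain="news", frames=[])
print(get_associated_text(v))  # not a target domain, so the title is used
```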
S200, extracting the candidate tags of the video to be processed from its associated text.
In this embodiment, tags describe the content or points of interest of a video. Candidate tags may include entity words (names of people, places, and times), proper nouns, and the like, for example: Guangdong Province, spring, May, navy, aircraft carrier.
S300, ranking the candidate tags.
S400, selecting, according to the ranking result, tags that match the video to be processed from among the candidate tags. The higher a candidate tag is ranked, the better it matches the video to be processed.
In this embodiment, the associated text of a video can be acquired according to the video's domain, and tags are extracted from the associated text and ranked so that matching tags can be selected. Compared with obtaining tags from the video title alone, this embodiment can acquire other associated texts of the video according to its domain; the information describing the video is more comprehensive, and the extracted tags describe it more accurately.
In one possible implementation, as shown in fig. 2, step S100 may include steps S110 and S120, as follows:
S110, if the video to be processed belongs to the target domain, acquiring subtitle text from the video by image recognition.
S120, if the video to be processed does not belong to the target domain, acquiring the title of the video.
In the present embodiment, videos belonging to the target domain are those from which subtitles can usefully be acquired, for example videos in the military or history domains; tags obtained from the subtitle content supplement the candidate tags with additional description, making the information more accurate. Videos outside the target domain include unsubtitled videos and videos such as dramas and lyric videos, for which introducing the subtitle text would add noise to the extracted tags; for these, tags can be extracted from the video title instead. For a video without usable subtitles but with a title that describes its content, short-text keyword analysis techniques can be used to extract tags from the title.
In a possible implementation manner, as shown in fig. 3, the process of extracting the candidate tags of the video to be processed in step S200 may include step S210, as follows:
S210, if the associated text is the subtitle text, performing structural analysis, semantic analysis, and topic classification on the subtitle text to obtain candidate tags of the video to be processed.
In this embodiment, processing the long text with structural analysis, semantic analysis, and topic classification yields a variety of candidate tags that describe the video from multiple angles.
In a possible implementation manner, as shown in fig. 4, the present embodiment may perform structural analysis on the subtitle text, comprising steps S510 to S530, as follows:
S510, determining the keywords that constitute the subtitle text according to its text structure.
In an embodiment, the text structure of the subtitle text may include syntax, parts of speech, and so on. The type of each word can be determined from the text structure, and thus the keywords of the subtitle text. For example, if the subtitle text contains 'the A aircraft carrier is expected to arrive at place XXX for the first time in the spring of next year', subject-predicate-object analysis yields the entity words 'A', 'aircraft carrier', 'next year', and 'place XXX', which are taken as keywords.
S520, counting the frequency with which each keyword appears in the subtitle text.
S530, selecting, according to those frequencies, keywords to serve as candidate tags of the video.
In this embodiment, the more frequently a keyword appears in the subtitle text, the better it matches the video. For example, when the keywords' frequencies are generally high, the selection threshold can be raised; when they are generally low, it can be lowered; and when the frequencies differ greatly, the high-frequency keywords can be selected. Through the keyword selection of steps S510 to S530, keywords can be extracted directly from the subtitle text as candidate tags of the video.
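As a sketch of steps S520 and S530, the snippet below counts keyword frequencies and keeps the keywords above a threshold. Tying the threshold to the mean frequency mirrors the adaptive thresholding described above, though the exact rule here is an assumption for illustration.

```python
from collections import Counter

def candidate_tags_by_frequency(keywords):
    """S520: count each keyword's occurrences; S530: keep the frequent ones."""
    counts = Counter(keywords)
    # Illustrative adaptive threshold: the mean keyword frequency.
    threshold = sum(counts.values()) / len(counts)
    return [word for word, n in counts.most_common() if n >= threshold]

# Keywords as they might come out of the S510 structure analysis:
keywords = ["navy", "aircraft carrier", "navy", "spring", "navy", "aircraft carrier"]
print(candidate_tags_by_frequency(keywords))  # ['navy', 'aircraft carrier']
```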
In a possible implementation manner, as shown in fig. 5, the present embodiment may perform semantic analysis on the subtitle text, comprising steps S610 to S620, as follows:
S610, calculating the semantic similarity between preset tags and the subtitle text according to a semantic analysis model.
S620, selecting, according to that semantic similarity, tags from the preset tags to serve as candidate tags of the video.
In this embodiment, the semantic analysis model may include a SimNet (semantic matching network) operator, trained in advance on training data consisting of processed tags and subtitle texts. The trained SimNet operator can then calculate the semantic similarity between a preset tag and the subtitle text; the higher the similarity, the better the tag matches the video.
In this embodiment, keywords that never appear in the subtitle text can still be selected as candidate tags through model-predicted tags. For example, for a text about autonomous driving in which the words 'artificial intelligence' never appear, keyword extraction from the text cannot produce the tag 'artificial intelligence', but model-based tag prediction can.
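A compact sketch of the S610/S620 selection flow follows. The bag-of-words cosine similarity is only a lexical stand-in for the trained semantic model: unlike the model, it cannot score a tag such as 'artificial intelligence' highly when those words never appear in the text, so it illustrates the control flow rather than the model's behavior, and the threshold is an assumed value.

```python
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    """Lexical stand-in for the trained semantic-matching model of S610."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def semantic_candidates(preset_tags, subtitle_text, threshold=0.1):
    """S620: keep preset tags whose similarity to the subtitle text clears
    an (assumed) threshold."""
    return [tag for tag in preset_tags
            if cosine_sim(tag, subtitle_text) >= threshold]

print(semantic_candidates(["aircraft carrier", "cooking"],
                          "the aircraft carrier joined the navy fleet"))
# -> ['aircraft carrier']
```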
In a possible implementation manner, the embodiment may perform topic classification on the subtitle text, including: acquiring, according to the similarity between the video's candidate tags and preset topic tags, tags from the preset topic tags to serve as candidate tags of the video.
In this embodiment, the similarity between candidate tags and preset topic tags may be calculated by a topic model, obtained by training the semantic analysis model in advance on training data comprising topic tags and videos' candidate tags. A topic tag describes the central idea or main content of a video, and the topic model generalizes that central idea or main content from the video's candidate tags.
In another possible implementation manner, the training data may comprise topic tags and video texts, and the model may calculate the similarity between a video's subtitle text and the preset topic tags, acquiring tags from the preset topic tags as candidate tags of the video.
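In the same spirit, the topic-classification step can be sketched as follows; the similarity function is passed in (the lexical stand-in from the previous sketch would do) and the threshold is again an assumption.

```python
def topic_candidates(candidate_tags, preset_topic_tags, sim, threshold=0.4):
    """Attach preset topic tags whose similarity to any existing candidate
    tag clears the (assumed) threshold, as additional candidate tags."""
    picked = []
    for topic in preset_topic_tags:
        best = max((sim(topic, tag) for tag in candidate_tags), default=0.0)
        if best >= threshold:
            picked.append(topic)
    return picked

# Usage with the lexical stand-in from the previous sketch:
# topic_candidates(["aircraft carrier", "navy"], ["military", "navy affairs"], cosine_sim)
```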
In a possible implementation manner, as shown in fig. 6, the process of ranking the candidate tags in step S300 may include steps S310 to S330, as follows:
S310, setting a weight value for each candidate tag according to the frequency with which it appears in the associated text.
S320, when a candidate tag's frequency in the associated text is zero, adjusting its weight value according to its semantic similarity to the associated text.
S330, sorting the candidate tags by their weight values.
In the embodiment of the present invention, the more frequently a candidate tag appears in the associated text, the higher its weight value, and the higher the weight value, the earlier the tag is ranked. To avoid tags that accurately describe the video but never appear in the associated text being ranked too low and thus missed, the weight values of such tags can be readjusted according to their semantic similarity to the associated text: the higher the similarity, the higher the weight value. For example, take the tag 'artificial intelligence' and a text about autonomous driving in which the words 'artificial intelligence' never appear: in step S310 the candidate tag's weight value is zero, but if its semantic similarity to the text is high, step S320 can raise it.
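The weighting-and-fallback logic of S310 to S330 can be sketched as below; counting substring occurrences as the frequency signal and using the similarity score directly as the fallback weight are simplifying assumptions.

```python
def rank_candidates(candidates, associated_text, sem_sim):
    """S310: weight by frequency in the associated text; S320: fall back to
    semantic similarity for zero-frequency tags; S330: sort by weight."""
    ranked = []
    for tag in candidates:
        freq = associated_text.count(tag)                             # S310
        weight = freq if freq > 0 else sem_sim(tag, associated_text)  # S320
        ranked.append((tag, weight))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)     # S330

# e.g. rank_candidates(["navy", "artificial intelligence"], text, cosine_sim)
```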
In a possible implementation manner, the embodiment further includes preprocessing and post-processing. Preprocessing operates on the associated text and comprises at least one of: paragraph segmentation, sentence segmentation, word segmentation, part-of-speech tagging, and named entity recognition. Post-processing operates on the candidate tags and comprises at least one of: deduplication, format unification, disambiguation, and tag-timeliness processing.
Illustratively, text without paragraph breaks can be segmented into paragraphs, and an overly long sentence can be split into several clauses. Word segmentation divides a sentence into words according to a dictionary and identifies each word's part of speech, such as noun, verb, preposition, or modal particle. Named Entity Recognition (NER) identifies proper nouns, for example person names, place names, organization names, times, and currencies in a text. Deduplication keeps one copy of identical content. Tags with inconsistent formats can be normalized to a preset or default format. For an ambiguous tag, the semantics can be re-identified: for the place name 'Texas/Dezhou', for instance, it must be determined whether the Dezhou in Shandong Province or Texas in the United States is meant. A tag whose timeliness has expired can be removed.
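A post-processing pass along these lines might look as follows; the canonical-form and expiry tables are illustrative assumptions standing in for the format-unification rules and timeliness data an actual system would maintain, and disambiguation is omitted here.

```python
import time

def post_process(tags, canonical=None, expiry=None, now=None):
    """Deduplicate, unify format, and drop stale tags (post-processing)."""
    canonical = canonical or {}  # variant surface form -> preferred form
    expiry = expiry or {}        # tag -> timestamp after which it is stale
    now = now if now is not None else time.time()
    seen, result = set(), []
    for tag in tags:
        tag = canonical.get(tag.strip(), tag.strip())  # format unification
        if tag in seen:                                # deduplication
            continue
        if tag in expiry and expiry[tag] < now:        # timeliness
            continue
        seen.add(tag)
        result.append(tag)
    return result

print(post_process([" Navy ", "Navy", "spring festival 2018"],
                   canonical={"Navy": "navy"},
                   expiry={"spring festival 2018": 0.0}))  # ['navy']
```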
In a possible implementation manner, as shown in fig. 3, the process of extracting the candidate tags of the video to be processed in step S200 may further include steps S220 to S240, as follows:
S220, if the associated text is the title, performing word segmentation on the title to obtain candidate words.
In this embodiment, assuming the title is 'Country B becomes Country C's new-weapon actual-combat test field', the title can be segmented at character or word granularity, and the candidate words obtained may include 'Country B', 'Country C', 'new weapon', 'actual combat', 'test field', 'actual-combat test field', and so on.
S230, calculating a weight value for each candidate word.
In this embodiment, the weight value may be calculated for each candidate word based on a word-vector algorithm, such as the wordrank algorithm.
S240, ranking the candidate words according to external comparison information and their weight values to obtain the tags of the video to be processed. The external comparison information comprises each candidate word's search popularity in an external system, for example the search popularity of the word 'Country B' on major mainstream search platforms such as Baidu or Google. The higher a candidate word's search popularity, and the larger its weight value, the earlier it is ranked.
In an embodiment, if the video is not suited to using subtitles as its associated text but does have a title, the title can be acquired for tag extraction. On the basis of calculating a weight value for each candidate word, this makes full use of external verification information and prevents the tag results extracted from the title from being overly general.
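Steps S230 and S240 can be sketched as the score combination below. The normalization and the mixing coefficient are assumptions; the wordrank-style weights and search-popularity figures are supplied by the caller and are invented here for illustration.

```python
def rank_title_candidates(candidates, weights, search_heat, alpha=0.5):
    """S240: combine each candidate word's weight (e.g. a wordrank score)
    with its external search popularity, then sort. alpha is an assumed
    mixing coefficient between the two normalized signals."""
    def norm(d):
        top = max(d.values(), default=0) or 1.0
        return {k: v / top for k, v in d.items()}
    w, h = norm(weights), norm(search_heat)
    scored = {c: alpha * w.get(c, 0.0) + (1 - alpha) * h.get(c, 0.0)
              for c in candidates}
    return sorted(scored, key=scored.get, reverse=True)

# Usage with the "Country B" example above (numbers invented):
cands = ["Country B", "Country C", "new weapon", "test field"]
print(rank_title_candidates(
    cands,
    weights={"Country B": 0.9, "Country C": 0.8, "new weapon": 0.6, "test field": 0.3},
    search_heat={"Country B": 1000, "Country C": 700, "new weapon": 150, "test field": 20},
))  # ['Country B', 'Country C', 'new weapon', 'test field']
```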
Fig. 7 to fig. 9 are schematic diagrams of application examples of determining a video tag according to an embodiment of the present invention. The application examples cover tag extraction for subtitled videos, unsubtitled videos, and videos whose subtitles are unsuitable to use, as follows:
(1) Subtitled video: the subtitles within the video can be recognized by OCR, yielding a document in text format; the video title corresponds to the document title, and the OCR subtitles correspond to the document content. The prediction result is obtained by analyzing the document's structure and semantics.
As shown in fig. 7, the content framed by the solid box in the video image is document content, including: 'when the A aircraft carrier reached place XXX three years ago' and 'does the D aircraft carrier need to back country E up? What is the intention?'. Processing this text content yields the prediction result indicated by the dashed box in fig. 7, comprising the tags: country D aircraft carrier, navy, warship, aircraft carrier. These tags may be displayed with the video.
As shown in fig. 8, the prediction system of the present application example includes a preprocessing layer, a core operator layer, and a post-processing layer.
(a) Preprocessing layer: mainly responsible for general text analysis work, i.e. preprocessing, for example segmenting the document into paragraphs, words, and sentences, part-of-speech tagging, named entity recognition, syntactic dependency analysis, and so on.
(b) Core operator layer: the core of the prediction system, which may comprise several operators with different functions, for example:
b1, semantic structure analysis operator: analyzes the document structure and the statistics of the words in the document, and extracts points of interest (the candidate tags) from it;
b2, SimNet operator: analyzes the document semantics and predicts the video's points of interest;
b3, topic model operator: analyzes and predicts the category to which the document's topic belongs;
b4, time-sensitive point-of-interest extraction: analyzes tag timeliness to select points of interest from the candidate points of interest.
(c) Post-processing layer: mainly responsible for post-processing the extracted tags; it may include merging (deduplication), normalized rewriting (format unification), disambiguation, ranking, and intervention functions.
It should be noted that not all videos are suited to point-of-interest prediction with the system of fig. 8. Videos in the military and history domains, for example, are suitable, since the subtitle content supplements the video's points of interest with information; but in domains such as drama and lyric videos, the OCR content would interfere with the prediction. For videos unsuited to OCR-based point-of-interest prediction, the following scheme for unsubtitled videos can be used.
(2) Unsubtitled video: the information in the video title can be analyzed, for example with short-text keyword analysis techniques. Title keywords can be extracted using external verification information on top of the wordrank algorithm, which avoids overly generalized extraction results.
As shown in fig. 9, the main logic of the short text keyword analysis is as follows:
(a) Candidate generation: the title is first segmented. For the input title 'Country B becomes Country C's new-weapon actual-combat test field', segmentation at character or word granularity may yield the candidate words 'Country B', 'Country C', 'new weapon', 'actual combat', 'test field', 'actual-combat test field'. Candidate tag results are then determined from the segmentation results, which may include recall policies at word granularity and phrase granularity; phrase candidates can be generated from word n-grams (n-gram language models). Parts of speech and some heuristic rules can also be used for an initial screening of the candidates.
(b) Candidate ranking: weight values are calculated for the candidate words, which are then ranked. The weight values (wordrank scores) of words and phrases can be calculated with the wordrank algorithm, and the words and phrases are then sorted by weight.
(c) Result verification: external verification information, for example a point-of-interest map, query popularity (query pv), encyclopedia search popularity (encyclopedia pv), and IDF (Inverse Document Frequency), is used to check the ranked results, which prevents overly general extraction.
This embodiment has the following advantages:
1. Good prediction. Introducing subtitle information through OCR subtitle recognition lets the prediction system acquire information that describes the video more comprehensively and helps, to a certain extent, in understanding the video frames; both the recall rate of tags and the accuracy of the recalled tags improve.
2. High efficiency and low cost. Once the model is trained it is used for prediction without manual labeling, so the efficiency is high while a good prediction effect is maintained.
Referring to fig. 10, an apparatus for determining a video tag according to an embodiment of the present invention includes:
an associated text extraction module 100, configured to determine, according to the domain to which the video to be processed belongs, an acquisition mode for the associated text of the video, so as to extract that associated text;
a candidate tag extraction module 200, configured to extract candidate tags of the video to be processed from the associated text;
a candidate tag ranking module 300, configured to rank the candidate tags; and
a tag selection module 400, configured to select, according to the ranking result, tags that match the video to be processed from among the candidate tags.
In one possible implementation, the associated text extraction module includes:
a subtitle text acquisition unit, configured to acquire subtitle text from the video to be processed by image recognition if the video belongs to the target domain; and
a video title acquisition unit, configured to acquire the title of the video to be processed if the video does not belong to the target domain.
In one possible implementation, the candidate tag extraction module includes:
a subtitle text analysis unit, configured to perform structural analysis, semantic analysis, and topic classification on the subtitle text to obtain candidate tags of the video to be processed if the associated text is the subtitle text.
In one possible implementation manner, the candidate tag extraction module further includes:
a title word segmentation unit, configured to segment the title into candidate words if the associated text is the title;
a weight calculation unit, configured to calculate a weight value for each candidate word; and
a word ranking unit, configured to rank the candidate words according to external comparison information and their weight values to obtain the tags of the video to be processed, wherein the external comparison information comprises each candidate word's search popularity in an external system.
The functions of the apparatus may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In one possible design, the structure for determining a video tag includes a processor and a memory; the memory stores a program supporting the apparatus in executing the method for determining a video tag described above, and the processor is configured to execute the program stored in the memory. The apparatus may further comprise a communication interface for communicating with other devices or a communication network.
An embodiment of the present invention further provides a terminal device. As shown in fig. 11, the terminal device includes a memory 21 and a processor 22, the memory 21 storing a computer program executable on the processor 22. When executing the computer program, the processor 22 implements the method of determining a video tag in the above embodiments. There may be one or more memories 21 and processors 22.
The apparatus further comprises:
a communication interface 23 for communication between the processor 22 and an external device.
The memory 21 may comprise high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
If the memory 21, the processor 22 and the communication interface 23 are implemented independently, the memory 21, the processor 22 and the communication interface 23 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 11, but this is not intended to represent only one bus or type of bus.
Optionally, in a specific implementation, if the memory 21, the processor 22 and the communication interface 23 are integrated on a chip, the memory 21, the processor 22 and the communication interface 23 may complete mutual communication through an internal interface.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
The computer readable media of embodiments of the present invention may be computer readable signal media or computer readable storage media or any combination of the two. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable storage medium may even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
In embodiments of the present invention, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, Radio Frequency (RF), etc., or any suitable combination of the preceding.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware that is related to instructions of a program, and the program may be stored in a computer-readable storage medium, and when executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A method for determining a video tag, comprising:
determining, according to the domain to which a video to be processed belongs, an acquisition mode for the associated text of the video, so as to extract that associated text;
extracting candidate tags of the video to be processed from the associated text;
ranking the candidate tags; and
selecting, according to the ranking result, tags that match the video to be processed from among the candidate tags;
wherein determining the acquisition mode of the associated text according to the domain to which the video belongs, so as to extract the associated text, comprises: if the video to be processed belongs to the target domain, acquiring subtitle text from the video by image recognition, wherein videos suitable for having subtitle text acquired from the video belong to the target domain; and if the video to be processed does not belong to the target domain, acquiring the title of the video;
and wherein extracting the candidate tags of the video to be processed from the associated text comprises: if the associated text is the subtitle text, performing topic classification on the subtitle text to obtain candidate tags of the video, the topic classification comprising: acquiring, according to the similarity between the video's candidate tags and preset topic tags, tags from the preset topic tags to serve as candidate tags of the video.
2. The method of claim 1, wherein extracting the candidate tags of the video to be processed from its associated text further comprises:
if the associated text is the subtitle text, performing structural analysis and semantic analysis on the subtitle text to obtain candidate tags of the video to be processed.
3. The method of determining a video tag of claim 2, wherein performing structural analysis on the subtitle text comprises:
determining the keywords that constitute the subtitle text according to its text structure;
counting the frequency with which each keyword appears in the subtitle text; and
selecting, according to those frequencies, keywords to serve as candidate tags of the video.
4. The method of determining a video tag of claim 2, wherein semantically analyzing the subtitle text comprises:
calculating the semantic similarity between preset tags and the subtitle text according to a semantic analysis model; and
selecting, according to that semantic similarity, tags from the preset tags to serve as candidate tags of the video.
5. The method of determining video tags of claim 1, wherein ranking the candidate tags comprises:
setting a weight value for each candidate tag according to the frequency with which it appears in the associated text;
when a candidate tag's frequency in the associated text is zero, adjusting its weight value according to its semantic similarity to the associated text; and
sorting the candidate tags by their weight values.
6. The method of determining a video tag according to any of claims 1 to 5, wherein the method further comprises:
preprocessing the associated text, the preprocessing comprising at least one of: paragraph segmentation, sentence segmentation, word segmentation, part-of-speech tagging, and named entity recognition; and
post-processing the candidate tags, the post-processing comprising at least one of: deduplication, format unification, disambiguation, and tag-timeliness processing.
7. The method of claim 1, wherein extracting the candidate tags of the video to be processed from its associated text further comprises:
if the associated text is the title, performing word segmentation on the title to obtain candidate words;
calculating a weight value for each candidate word; and
ranking the candidate words according to external verification information and their weight values to obtain the tags of the video to be processed, wherein the external verification information comprises each candidate word's search popularity in an external system.
8. An apparatus for determining a video tag, comprising:
an associated text extraction module, configured to determine, according to the domain to which a video to be processed belongs, an acquisition mode for the associated text of the video, so as to extract that associated text;
a candidate tag extraction module, configured to extract candidate tags of the video to be processed from the associated text;
a candidate tag ranking module, configured to rank the candidate tags; and
a tag selection module, configured to select, according to the ranking result, tags that match the video to be processed from among the candidate tags;
wherein the associated text extraction module comprises:
a subtitle text acquisition unit, configured to acquire subtitle text from the video to be processed by image recognition if the video belongs to the target domain, wherein videos suitable for having subtitle text acquired from the video belong to the target domain; and
a video title acquisition unit, configured to acquire the title of the video to be processed if the video does not belong to the target domain;
and wherein the candidate tag extraction module comprises: a subtitle text analysis unit, configured to perform topic classification on the subtitle text to obtain candidate tags of the video to be processed if the associated text is the subtitle text, the topic classification comprising: acquiring, according to the similarity between the video's candidate tags and preset topic tags, tags from the preset topic tags to serve as candidate tags of the video.
9. The apparatus for determining video tags according to claim 8, wherein the subtitle text analysis unit is further configured to perform structural analysis and semantic analysis on the subtitle text to obtain candidate tags of the video to be processed if the associated text is the subtitle text.
10. The apparatus for determining video tags of claim 8, wherein the candidate tag extraction module further comprises:
a title word segmentation unit, configured to segment the title into candidate words if the associated text is the title;
a weight calculation unit, configured to calculate a weight value for each candidate word; and
a word ranking unit, configured to rank the candidate words according to external verification information and their weight values to obtain the tags of the video to be processed, wherein the external verification information comprises each candidate word's search popularity in an external system.
11. A terminal device for implementing video tag determination, the terminal device comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of determining a video tag of any of claims 1-7.
12. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of determining a video tag according to any one of claims 1 to 7.
CN201810712717.7A 2018-06-29 2018-06-29 Method and device for determining video label, storage medium and terminal equipment Active CN108829893B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810712717.7A CN108829893B (en) 2018-06-29 2018-06-29 Method and device for determining video label, storage medium and terminal equipment

Publications (2)

Publication Number Publication Date
CN108829893A (en) 2018-11-16
CN108829893B (en) 2021-01-29

Family

ID=64134126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810712717.7A Active CN108829893B (en) 2018-06-29 2018-06-29 Method and device for determining video label, storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN108829893B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840292B (en) * 2018-12-17 2021-06-08 北京百度网讯科技有限公司 Video tag generation method and device
CN109933688A (en) * 2019-02-13 2019-06-25 北京百度网讯科技有限公司 Determine the method, apparatus, equipment and computer storage medium of video labeling information
CN109819284B (en) * 2019-02-18 2022-11-15 平安科技(深圳)有限公司 Short video recommendation method and device, computer equipment and storage medium
CN110287375B (en) * 2019-05-30 2022-02-15 北京百度网讯科技有限公司 Method and device for determining video tag and server
CN110730381A (en) * 2019-07-12 2020-01-24 北京达佳互联信息技术有限公司 Method, device, terminal and storage medium for synthesizing video based on video template
CN111078885B (en) * 2019-12-18 2023-04-07 腾讯科技(深圳)有限公司 Label classification method, related device, equipment and storage medium
CN111177462B (en) * 2020-01-03 2023-05-30 百度在线网络技术(北京)有限公司 Video distribution timeliness determination method and device
CN111241340B (en) * 2020-01-17 2023-09-08 Oppo广东移动通信有限公司 Video tag determining method, device, terminal and storage medium
CN111324771B (en) * 2020-02-26 2022-11-04 腾讯科技(深圳)有限公司 Video tag determination method and device, electronic equipment and storage medium
CN111368141B (en) * 2020-03-18 2023-06-02 腾讯科技(深圳)有限公司 Video tag expansion method, device, computer equipment and storage medium
CN111274442B (en) * 2020-03-19 2023-10-27 聚好看科技股份有限公司 Method for determining video tag, server and storage medium
CN113722540A (en) * 2020-05-25 2021-11-30 中国移动通信集团重庆有限公司 Knowledge graph construction method and device based on video subtitles and computing equipment
CN111639234B (en) * 2020-05-29 2023-06-27 北京百度网讯科技有限公司 Method and device for mining core entity attention points
CN111831854A (en) * 2020-06-03 2020-10-27 北京百度网讯科技有限公司 Video tag generation method and device, electronic equipment and storage medium
CN112199526B (en) * 2020-09-30 2023-03-14 抖音视界有限公司 Method and device for issuing multimedia content, electronic equipment and storage medium
CN112328833B (en) * 2020-11-09 2024-03-26 腾讯科技(深圳)有限公司 Label processing method, device and computer readable storage medium
CN112699237B (en) * 2020-12-24 2021-10-15 百度在线网络技术(北京)有限公司 Label determination method, device and storage medium
CN113038175B (en) * 2021-02-26 2023-03-24 北京百度网讯科技有限公司 Video processing method and device, electronic equipment and computer readable storage medium
CN113822013B (en) * 2021-03-08 2024-04-05 京东科技控股股份有限公司 Labeling method and device for text data, computer equipment and storage medium
CN113660541B (en) * 2021-07-16 2023-10-13 北京百度网讯科技有限公司 Method and device for generating abstract of news video
CN114298007A (en) * 2021-12-24 2022-04-08 北京字节跳动网络技术有限公司 Text similarity determination method, device, equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446135A (en) * 2016-09-19 2017-02-22 北京搜狐新动力信息技术有限公司 Method and device for generating multi-media data label
CN107436922A (en) * 2017-07-05 2017-12-05 北京百度网讯科技有限公司 Text label generation method and device
CN107977375A (en) * 2016-10-25 2018-05-01 央视国际网络无锡有限公司 A kind of video tab generation method and device
CN108009293A (en) * 2017-12-26 2018-05-08 北京百度网讯科技有限公司 Video tab generation method, device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9652675B2 (en) * 2014-07-23 2017-05-16 Microsoft Technology Licensing, Llc Identifying presentation styles of educational videos

Also Published As

Publication number Publication date
CN108829893A (en) 2018-11-16

Similar Documents

Publication Publication Date Title
CN108829893B (en) Method and device for determining video label, storage medium and terminal equipment
CN107436922B (en) Text label generation method and device
US11216504B2 (en) Document recommendation method and device based on semantic tag
CN106649818B (en) Application search intention identification method and device, application search method and server
US11521603B2 (en) Automatically generating conference minutes
US20180373692A1 (en) Method for parsing query based on artificial intelligence and computer device
CN108549656B (en) Statement analysis method and device, computer equipment and readable medium
CN111324771B (en) Video tag determination method and device, electronic equipment and storage medium
CN107480200B (en) Word labeling method, device, server and storage medium based on word labels
CN109918555B (en) Method, apparatus, device and medium for providing search suggestions
US20100318532A1 (en) Unified inverted index for video passage retrieval
CN109086265B (en) Semantic training method and multi-semantic word disambiguation method in short text
CN112765974B (en) Service assistance method, electronic equipment and readable storage medium
US11699034B2 (en) Hybrid artificial intelligence system for semi-automatic patent infringement analysis
CN108038099B (en) Low-frequency keyword identification method based on word clustering
CN107861948B (en) Label extraction method, device, equipment and medium
CN110263127A (en) Text search method and device is carried out based on user query word
CN114880447A (en) Information retrieval method, device, equipment and storage medium
CN112699686A (en) Semantic understanding method, device, equipment and medium based on task type dialog system
CN112668333A (en) Named entity recognition method and device, and computer-readable storage medium
CN115526171A (en) Intention identification method, device, equipment and computer readable storage medium
CN110795942B (en) Keyword determination method and device based on semantic recognition and storage medium
CN109657043B (en) Method, device and equipment for automatically generating article and storage medium
CN114880496A (en) Multimedia information topic analysis method, device, equipment and storage medium
CN111881695A (en) Audit knowledge retrieval method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant