CN112580696A - Advertisement label classification method, system and equipment based on video understanding

Info

Publication number: CN112580696A
Application number: CN202011393760.5A
Authority: CN
Inventors: 冯希宁
Original assignee: Star Media Ltd
Current assignee: Star Media Ltd
Priority date: 2020-12-03
Filing date: 2020-12-03
Publication date: 2021-03-30

Abstract

The invention provides a method, a system and equipment for classifying advertisement labels based on video understanding. The method comprises the following steps: labeling advertisement videos to complete the preparation of a data set; adopting ResNet-50 as a backbone network, inserting a temporal shift module (TSM) into the ResNet-50 network to construct a pre-video classification model, and then training the pre-video classification model with the data set to generate a video classification model; and performing classification prediction on the advertisement videos to be labeled by using the video classification model to obtain content classification results of the advertisement videos, and classifying their labels accordingly. The invention can analyze advertisement video content in multiple dimensions, understand the video semantics, and classify and tag the videos automatically, thereby greatly reducing manual review effort and cost.

Description

Advertisement label classification method, system and equipment based on video understanding

Technical Field

The invention relates to the technical field of image classification and identification, in particular to a method, a system and equipment for classifying advertisement labels based on video understanding.

Background

With the rapid development of network and multimedia technology, video advertising has also grown rapidly, and how to push an advertisement accurately to the users interested in it is an urgent problem to be solved.

The key to accurate advertisement pushing is to classify advertisements accurately and tag them with labels. Existing advertisement label classification methods on the market generally rely on manual review and labeling, which is time-consuming and labor-intensive and, with the explosive growth of advertisement video streams, cannot keep pace in real time. In addition, manual review is highly subjective, and for advertisement video streams with multiple characteristics a manually built tag system is incomplete, so accurate advertisement recommendation cannot be performed.

Disclosure of Invention

In view of the above problems, an object of the present invention is to provide a method, a system, and a device for classifying advertisement tags based on video understanding, which can analyze advertisement video content in multiple dimensions, understand video semantics, and classify and tag videos automatically, greatly reducing manual review effort and cost.

To achieve this purpose, the invention is realized by the following technical scheme: a video understanding-based advertisement label classification method comprising the following steps:

S1: labeling the advertisement video to complete the preparation of a data set;

S2: adopting ResNet-50 as a backbone network, inserting a temporal shift module (TSM) into the ResNet-50 network to construct a pre-video classification model, and then training the pre-video classification model with the data set to generate a video classification model;

S3: performing classification prediction on the advertisement videos to be labeled by using the video classification model to obtain content classification results of the advertisement videos, and performing label classification.

Further, the tag labeling categories include: an action class tag, a scene class tag, and an object class tag.

Further, the data set includes: clipped datasets and non-clipped datasets; label category labeling is directly carried out on the clipped data set; the non-clipped dataset is tagged with tag categories by time period.

Further, before S2, the method further includes:

performing data enhancement on the data set, wherein the data enhancement specifically comprises geometric transformation enhancement and color transformation enhancement.

Further, the geometric transformation enhancement comprises: flipping, rotating, cropping, distorting, and scaling each frame of the ad video.

Further, the color transform enhancement comprises: noise transformation, blur transformation, and color transformation for each frame of the advertisement video.

Correspondingly, the invention also discloses an advertisement label classification system based on video understanding, comprising:

a data set preparation unit, used for labeling the advertisement video to complete the preparation of the data set;

a model training unit, used for adopting ResNet-50 as a backbone network, inserting a temporal shift module (TSM) into the ResNet-50 network to construct a pre-video classification model, and then training the pre-video classification model with the data set to generate a video classification model;

and a model reasoning unit, used for performing classification prediction on the advertisement video to be labeled by using the video classification model to obtain a content classification result of the advertisement video, and performing label classification.

Further, the system also includes a data set enhancement unit, used for performing geometric transformation enhancement and color transformation enhancement on the data set.

Correspondingly, the invention also discloses advertisement label classification equipment based on video understanding, which comprises the following components:

a memory for storing a computer program;

a processor for implementing the video understanding-based advertisement tag classification method steps as described in any one of the above when the computer program is executed.

Compared with the prior art, the invention has the beneficial effects that:

1. According to the invention, custom labels can be applied to advertisement videos according to the actual application, generating a data set for advertisement video classification.

2. The invention classifies advertisement videos by using a Temporal Shift Module (TSM) with a 2D neural network. Adding the Temporal Shift Module ensures both accuracy and speed, realizing intelligent classification of advertisement videos.

3. The invention adopts content-based video understanding technology, which can analyze video content in multiple dimensions, understand video semantics, and classify and tag videos automatically. This greatly reduces manual review effort and cost and is of important guiding significance for intelligent advertisement recommendation.

Therefore, compared with the prior art, the invention has prominent substantive features and remarkable progress, and the beneficial effects of the implementation are also obvious.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below are only embodiments of the present invention; those skilled in the art can obtain other drawings from the provided drawings without creative effort.

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a system block diagram of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made with reference to the accompanying drawings.

As shown in FIG. 1, a video understanding-based advertisement tag classification method includes the following steps:

S101: Labeling the advertisement video to complete the preparation of the data set.

The label categories include: action class tags (e.g., tennis), scene class tags (e.g., beach), and object class tags (e.g., car). Various custom tags can also be applied according to the actual application. The data set includes clipped data sets and non-clipped data sets. A clipped data set can be labeled with categories directly, whereas a non-clipped data set may need different label categories for different time periods.
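
The labeling scheme of S101 can be pictured with a small sketch. It is only one possible representation, not the format used by the invention: the names TagAnnotation and tags_at, the field names, and the use of seconds for time periods are all assumptions made for illustration. A clipped video carries a tag with no time period, while a non-clipped video carries tags bound to a period.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TagAnnotation:
    """One label on one advertisement video (hypothetical record layout)."""
    video_id: str
    category: str                    # e.g. "action", "scene", "object", or a custom category
    tag: str                         # e.g. "tennis", "beach", "car"
    start_s: Optional[float] = None  # None for clipped videos: the tag covers the whole clip
    end_s: Optional[float] = None

    def __post_init__(self) -> None:
        # A time period must be given as a complete, non-empty interval.
        if (self.start_s is None) != (self.end_s is None):
            raise ValueError("start_s and end_s must be given together")
        if self.start_s is not None and self.end_s <= self.start_s:
            raise ValueError("end_s must be greater than start_s")


def tags_at(annotations: List[TagAnnotation], video_id: str, t: float) -> List[str]:
    """Return the tags that apply to the given video at time t (in seconds)."""
    result = []
    for ann in annotations:
        if ann.video_id != video_id:
            continue
        # Clipped videos have no period; non-clipped videos use [start_s, end_s).
        if ann.start_s is None or ann.start_s <= t < ann.end_s:
            result.append(ann.tag)
    return result


annotations = [
    TagAnnotation("clip1", "scene", "beach"),             # clipped: labeled directly
    TagAnnotation("raw1", "action", "tennis", 0.0, 5.0),  # non-clipped: labeled by period
    TagAnnotation("raw1", "object", "car", 5.0, 9.0),
]
```

With these records, looking up "raw1" at 2 seconds yields the tennis tag, and at 5 seconds the car tag.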

S102: Performing data enhancement on the data set.

The data enhancement specifically includes geometric transformation enhancement and color transformation enhancement. The geometric transformation enhancement comprises flipping, rotating, cropping, distorting, and scaling each frame of the advertisement video. The color transformation enhancement comprises noise transformation, blur transformation, and color transformation of each frame of the advertisement video.
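
A few of the enhancements named in S102 can be sketched in plain Python as follows. This is an illustrative sketch only: it covers flipping, cropping, a brightness change (standing in for color transformation) and Gaussian noise, and omits rotation, distortion, scaling and blur. Frames are assumed to be nested lists of 8-bit RGB tuples, and reusing the same random parameters for every frame of a clip is an assumption made here to keep the frames consistent; the description does not specify it. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Frame = List[List[Pixel]]  # rows of RGB pixels


def hflip(frame: Frame) -> Frame:
    """Geometric enhancement: mirror the frame horizontally."""
    return [row[::-1] for row in frame]


def crop(frame: Frame, top: int, left: int, height: int, width: int) -> Frame:
    """Geometric enhancement: cut out a height x width window."""
    return [row[left:left + width] for row in frame[top:top + height]]


def _clamp(value: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(value))))


def scale_brightness(frame: Frame, factor: float) -> Frame:
    """Color enhancement: multiply every channel by a factor."""
    return [[tuple(_clamp(c * factor) for c in px) for px in row] for row in frame]


def add_gaussian_noise(frame: Frame, sigma: float, rng: random.Random) -> Frame:
    """Noise enhancement: add zero-mean Gaussian noise to every channel."""
    return [[tuple(_clamp(c + rng.gauss(0.0, sigma)) for c in px) for px in row]
            for row in frame]


def augment_clip(frames: List[Frame], rng: random.Random,
                 crop_size: Tuple[int, int], noise_sigma: float = 5.0) -> List[Frame]:
    """Apply one randomly drawn set of parameters to every frame of a clip."""
    height, width = len(frames[0]), len(frames[0][0])
    crop_h, crop_w = crop_size
    do_flip = rng.random() < 0.5
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    factor = rng.uniform(0.8, 1.2)
    out = []
    for frame in frames:
        frame = crop(frame, top, left, crop_h, crop_w)
        if do_flip:
            frame = hflip(frame)
        frame = scale_brightness(frame, factor)
        frame = add_gaussian_noise(frame, noise_sigma, rng)
        out.append(frame)
    return out
```

In practice an image library would normally perform these operations; the nested-list form is used here only so that the sketch runs without dependencies.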

S103: Adopting ResNet-50 as a backbone network, inserting a temporal shift module (TSM) into the ResNet-50 network to construct a pre-video classification model, and then training the pre-video classification model with the data set to generate a video classification model.
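
The description does not spell out the shift operation itself, so the following is a minimal sketch of the temporal shift as it is commonly described for TSM, under simplifying assumptions: each frame is reduced to a flat list of channel values (spatial dimensions are dropped), one eighth of the channels is shifted in each direction (the fold_div parameter is an assumption), and vacated positions are zero-filled. Where the shift sits inside each ResNet-50 block is not stated in the source and is not modeled here.

```python
from typing import List


def temporal_shift(features: List[List[float]], fold_div: int = 8) -> List[List[float]]:
    """Shift part of the channels along the time axis.

    features[t][c] is the value of channel c at frame t.
    The first 1/fold_div of the channels takes its value from the next frame,
    the second 1/fold_div from the previous frame, and the rest is unchanged.
    Positions with no source frame are left at zero.
    """
    num_frames = len(features)
    num_channels = len(features[0])
    fold = num_channels // fold_div
    shifted = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t + 1   # pull information from the following frame
            elif c < 2 * fold:
                src = t - 1   # pull information from the preceding frame
            else:
                src = t       # leave the channel in place
            if 0 <= src < num_frames:
                shifted[t][c] = features[src][c]
    return shifted


# Three frames, eight channels; the value 10 * t + c makes the origin of each entry visible.
features = [[float(10 * t + c) for c in range(8)] for t in range(3)]
shifted = temporal_shift(features)
```

The shift moves data but performs no arithmetic and adds no parameters, which is why it lets a 2D network exchange information between neighboring frames at little extra cost.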

S104: Performing classification prediction on the advertisement videos to be labeled by using the video classification model to obtain content classification results of the advertisement videos, and performing label classification.
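
How the classification result of S104 might be turned into tags can be sketched as below. The source does not describe this step in detail, so everything here is an assumption: the model is taken to output one score vector per sampled frame, the scores are averaged over frames, a softmax converts them to probabilities, and the top-ranked labels become the tags. The names predict_tags, frame_logits and label_names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_tags(frame_logits: Sequence[Sequence[float]],
                 label_names: Sequence[str],
                 top_k: int = 1) -> List[Tuple[str, float]]:
    """Average per-frame scores, then return the top_k (label, probability) pairs."""
    num_frames = len(frame_logits)
    mean_logits = [sum(frame[c] for frame in frame_logits) / num_frames
                   for c in range(len(label_names))]
    probs = softmax(mean_logits)
    ranked = sorted(zip(label_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Two sampled frames scored against three example labels.
frame_logits = [[2.0, 0.0, -1.0], [4.0, 0.0, 1.0]]
label_names = ["tennis", "beach", "car"]
best = predict_tags(frame_logits, label_names, top_k=1)
```

If a video should receive several tags at once (for example one action tag and one scene tag), a per-label threshold on independent scores would be an alternative to the top-k selection shown here.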

Correspondingly, as shown in FIG. 2, the present invention also discloses an advertisement tag classification system based on video understanding, which includes:

A data set preparation unit, used for labeling the advertisement video to complete the preparation of the data set.

A data set enhancement unit, used for performing geometric transformation enhancement and color transformation enhancement on the data set.

A model training unit, used for adopting ResNet-50 as a backbone network, inserting a temporal shift module (TSM) into the ResNet-50 network to construct a pre-video classification model, and then training the pre-video classification model with the data set to generate a video classification model.

a memory for storing a computer program;

Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be embodied in the form of a software product. The computer software product is stored in a storage medium capable of storing program code, such as a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and includes instructions for enabling a computer terminal (which may be a personal computer, a server, a second terminal, a network terminal, and the like) to perform all or part of the steps of the method in the embodiments of the present invention. The same and similar parts of the various embodiments in this specification may be referred to each other. In particular, since the terminal embodiment is basically similar to the method embodiment, its description is relatively brief, and the relevant points can be found in the description of the method embodiment.

In the embodiments provided by the present invention, it should be understood that the disclosed system and method can be implemented in other ways. For example, the above-described system embodiments are merely illustrative. The division into units is only one logical functional division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, systems or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit.

Similarly, each processing unit in the embodiments of the present invention may be integrated into one functional module, or each processing unit may exist alone physically, or two or more processing units may be integrated into one functional module.

The invention is further described with reference to the accompanying drawings and specific embodiments. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and these equivalents also fall within the scope of the present application.

Claims

1. A video understanding-based advertisement label classification method is characterized by comprising the following steps:

S1: labeling the advertisement video to complete the preparation of a data set;

S2: adopting ResNet-50 as a backbone network, inserting a temporal shift module (TSM) into the ResNet-50 network to construct a pre-video classification model, and then training the pre-video classification model with the data set to generate a video classification model;

S3: performing classification prediction on the advertisement videos to be labeled by using the video classification model to obtain content classification results of the advertisement videos, and performing label classification.

2. The video understanding-based advertisement tag classification method according to claim 1, wherein the tag labeling category comprises: an action class tag, a scene class tag, and an object class tag.

3. The video understanding-based advertisement tag classification method according to claim 1, wherein the data set includes: clipped datasets and non-clipped datasets; label category labeling is directly carried out on the clipped data set; the non-clipped dataset is tagged with tag categories by time period.

4. The method for classifying advertisement tags according to claim 1, wherein said step of S2 is preceded by the step of:

performing data enhancement on the data set, wherein the data enhancement specifically comprises geometric transformation enhancement and color transformation enhancement.

5. The video understanding-based advertisement tag classification method according to claim 4, wherein the geometric transformation enhancement comprises: flipping, rotating, cropping, distorting, and scaling each frame of the ad video.

6. The video understanding-based advertisement tag classification method according to claim 4, wherein the color transformation enhancement comprises: noise transformation, blur transformation, and color transformation for each frame of the advertisement video.

7. An advertisement tag classification system based on video understanding, comprising:

a data set preparation unit, used for labeling the advertisement video to complete the preparation of the data set;

a model training unit, used for adopting ResNet-50 as a backbone network, inserting a temporal shift module (TSM) into the ResNet-50 network to construct a pre-video classification model, and then training the pre-video classification model with the data set to generate a video classification model;

and a model reasoning unit, used for performing classification prediction on the advertisement video to be labeled by using the video classification model to obtain a content classification result of the advertisement video, and performing label classification.

8. The video understanding-based advertisement tag classification system of claim 7, further comprising:

a data set enhancement unit, used for performing geometric transformation enhancement and color transformation enhancement on the data set.

9. An advertisement tag classification apparatus based on video understanding, comprising:

a memory for storing a computer program;

a processor for implementing the video understanding-based advertisement tag classification method steps of any one of claims 1 to 6 when executing said computer program.