CN114443901A - Label analysis method and device, computer equipment and storage medium


Info

Publication number: CN114443901A
Authority: CN (China)
Prior art keywords: label, video, data, video content, tag
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202111041764.1A
Other languages: Chinese (zh)
Inventors: 王喆, 范凌
Current and original assignee: Tezign Shanghai Information Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Priority and filing date: 2021-09-06 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Publication date: 2022-05-06
Application filed by Tezign Shanghai Information Technology Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; database structures therefor; file system structures therefor
    • G06F 16/70: Information retrieval of video data
    • G06F 16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/7867: Retrieval using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings


Abstract

The application discloses a label analysis method and device, computer equipment and a storage medium. The method comprises the following steps: acquiring video content labels of marked video data; performing effect data analysis on the marked video data to obtain an effect data analysis result; aligning the effect data analysis result with the video content labels to obtain combined data; and calling a pre-constructed label analysis model, inputting the combined data and the video content labels into the label analysis model respectively, and determining the importance degree of the video content labels. The method improves the accuracy of video analysis.

Description

Label analysis method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for analyzing a tag, a computer device, and a storage medium.
Background
With the explosive growth of marketing content and the proliferation of online channels, brands' demand for video content keeps rising, and the volume of video production grows with it. In video information-stream delivery, enterprises find that different agencies and different KOLs (Key Opinion Leaders) have different content production modes and widely varying delivery effects. In the traditional mode, the video delivery effect is analyzed by recording the number of users who enter the video delivery area and the number of users who watch the videos, and the video delivery is then optimized based on this feedback. However, the traditional method cannot determine which video content elements produce a good delivery effect, so the video analysis is inaccurate.
Disclosure of Invention
A primary object of the present application is to provide a label analysis method, an apparatus, a computer device and a storage medium that can improve the accuracy of video analysis.
In order to achieve the above object, according to one aspect of the present application, there is provided a method of analyzing a tag.
The label analysis method according to the application comprises the following steps:
acquiring a video content label of the marked video data;
analyzing effect data of the marked video data to obtain an effect data analysis result;
aligning the effect data analysis result with the video content label to obtain combined data;
and calling a pre-constructed label analysis model, respectively inputting the combined data and the video content label into the label analysis model, and determining the importance degree of the video content label.
Further, the label analysis model comprises a click rate prediction model and an attention model; inputting the combined data into the label analysis model, and determining the importance degree of the video content label, including:
determining the influence degree of each label in the combined data on the click rate through a click rate prediction model;
inputting the video content label into an attention model, and outputting attention distribution corresponding to the click rate of the marked video data;
and determining the importance degree of the video content label according to the attention distribution and the influence degree of each label in the combined data on the click rate.
Further, the inputting the video content tag into an attention model and outputting an attention distribution corresponding to the click rate of the marked video data includes:
inputting the video content tag into an attention model, and processing the combined data into token data through the attention model;
performing dimension reduction processing on each token data;
fusing the token data subjected to the dimensionality reduction to obtain a fusion sequence;
extracting the context characteristics of the fusion sequence, and calculating the attention distribution corresponding to the click rate of the marked video data according to the context characteristics.
Further, the analyzing the effect data of the marked video data to obtain an effect data analysis result includes:
calculating the click rate, delivery cost and conversion count of the marked video data;
and generating an effect data analysis result according to the click rate, the delivery cost and the conversion count.
Further, the method further comprises:
flattening the video content label;
and counting the basic label information corresponding to the flattened video content label, and obtaining a label data analysis result according to the basic label information.
Further, the method further comprises:
classifying video content labels corresponding to the marked video data based on the effect data analysis result and the label data analysis result to obtain a plurality of label categories;
performing time sequence analysis on the video content tags corresponding to the tag types to obtain time periods corresponding to the tag types;
and generating a video analysis result according to the importance degree of the video content label, the time period corresponding to each label category and the effect data analysis result.
Further, the obtaining of the video content tag of the marked video data includes:
acquiring video data to be extracted, and extracting video characteristics of the video data to be extracted;
acquiring a pre-constructed video label system;
performing multi-dimensional processing on the video features to obtain target features;
and matching the target characteristics with preset labels in the video label system, and determining video content labels corresponding to the video data to be extracted.
In order to achieve the above object, according to another aspect of the present application, there is provided an analysis apparatus of a tag.
The tag analysis device according to the present application includes:
the label obtaining module is used for obtaining a video content label of the marked video data;
the effect analysis module is used for analyzing effect data of the marked video data to obtain an effect data analysis result;
the alignment module is used for aligning the effect data analysis result with the video content label to obtain combined data;
and the label analysis module is used for calling a pre-constructed label analysis model, inputting the combined data and the video content label into the label analysis model respectively, and determining the importance degree of the video content label.
A computer device comprising a memory and a processor, the memory storing a computer program operable on the processor, the processor implementing the steps in the various method embodiments described above when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the respective method embodiment described above.
According to the label analysis method and device, the computer equipment and the storage medium, the effect data analysis result is aligned with the video content labels corresponding to the marked video data so that the importance degree of the labels can be analyzed subsequently, and analyzing the importance degree of the labels through a pre-constructed label analysis model allows the importance of each label to be predicted accurately.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and the description of the exemplary embodiments of the present application are for the purpose of explanation, and are not to be construed as limiting the present application. In the drawings:
FIG. 1 is a diagram of an application environment of a method of analyzing tags in one embodiment;
FIG. 2 is a schematic flow chart diagram illustrating a method for analyzing tags in one embodiment;
FIG. 3 is a schematic diagram of a tag structure of a video tag architecture in one embodiment;
FIG. 4 is a schematic diagram of a single piece of marked video data in one embodiment;
FIG. 5 is a diagram of a single piece of tag data in one embodiment;
FIG. 6 is a diagram illustrating the results of an analysis of effect data according to one embodiment;
FIG. 7 is a diagram illustrating combining data in one embodiment;
FIG. 8 is a flowchart illustrating the steps of inputting the combined data into a tag analysis model to determine the importance of the video content tag in one embodiment;
FIG. 9 is a scatter plot of location distributions corresponding to primary labels in an embodiment;
FIG. 10 is a distribution diagram of label categories in one embodiment;
FIG. 11 is an illustrative diagram of the time periods corresponding to each tag category for the spoken broadcast and subtitle content layer in one embodiment;
FIG. 12 is a block diagram showing the structure of an analyzing apparatus for a tag in one embodiment;
FIG. 13 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
The analysis method of the label provided by the application can be applied to the application environment shown in fig. 1. Wherein the terminal 102 and the server 104 communicate via a network. The server 104 obtains the tag analysis request sent by the terminal 102, analyzes the tag analysis request to obtain a video content tag of the marked video data, performs effect data analysis on the marked video data to obtain an effect data analysis result, aligns the effect data analysis result with the video content tag to obtain combined data, calls a pre-constructed tag analysis model, inputs the combined data and the video content tag into the tag analysis model respectively, and determines the importance degree of the video content tag. The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server 104 may be implemented as a stand-alone server or as a server cluster comprised of multiple servers.
In one embodiment, as shown in fig. 2, a method for analyzing a label is provided. The method is described here as applied to the server in fig. 1 and includes the following steps 202 to 208:
step 202, obtaining a video content label of the marked video data.
The marked video data refers to video data that has been annotated with video content labels.
Specifically, obtaining the video content label of the marked video data includes: acquiring video data to be extracted, and extracting video features of the video data to be extracted; acquiring a pre-constructed video label system; performing multi-dimensional processing on the video features to obtain target features; and matching the target features with preset labels in the video label system to determine the video content labels corresponding to the video data to be extracted. The video features of the video data to be extracted can be extracted through a feature extraction network, which may be an Inception-ResNet-v2 convolutional neural network model, a C3D network, or another network for extracting video features. Video features represent the content information of video images and may include features in the temporal dimension as well as the spatial dimension.
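As an illustrative sketch of this feature-extraction step, the following Python fragment assumes the timm library's inception_resnet_v2 backbone as the feature extraction network, with illustrative frame preprocessing; these are implementation assumptions, not choices prescribed by the application.

```python
# Sketch: frame-level video feature extraction with a pretrained CNN backbone.
# Assumptions: timm's inception_resnet_v2 stands in for the Inception-ResNet-v2
# model named above; normalization values and frame sampling are illustrative.
import timm
import torch
import torchvision.transforms as T

backbone = timm.create_model("inception_resnet_v2", pretrained=True, num_classes=0)
backbone.eval()  # num_classes=0 makes the model return pooled features, not logits

preprocess = T.Compose([
    T.Resize((299, 299)),                        # Inception-family input size
    T.ToTensor(),
    T.Normalize(mean=[0.5] * 3, std=[0.5] * 3),  # timm inception default stats
])

def extract_video_features(frames: list) -> torch.Tensor:
    """Map a list of sampled PIL frames to one feature vector per frame."""
    batch = torch.stack([preprocess(f) for f in frames])
    with torch.no_grad():
        return backbone(batch)                   # shape: (num_frames, feature_dim)
```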
The server stores a pre-constructed video label system, which is established by domain experts. The video label system may include content layers and the definition information of primary, secondary and tertiary labels. The label structure of the video label system may be a label tree, as shown in fig. 3.
The multidimensional processing may include hierarchical processing as well as classification processing. Performing multi-dimensional processing on video features, comprising: and carrying out layered processing on the video features according to the video tag system to obtain layered features corresponding to the video data to be extracted, and carrying out classified processing on the layered features according to the video tag system to obtain target features.
Because the video data to be extracted comprises content hierarchies and video contents of different levels, in order to determine the content hierarchies corresponding to the data to be extracted, the video features can be processed in a layering manner to obtain layering features, and the layering labels corresponding to the video data to be extracted can be determined through the layering features. And then, classifying the layered features according to a video label system to obtain target features. By layering and classifying the video features, more accurate and detailed video features can be obtained, so that the corresponding video content labels are matched according to the target features, and the extraction accuracy of the video labels is improved.
The target features are matched with the preset labels in the pre-constructed video label system, and the successfully matched labels are taken as the video content labels corresponding to the video data to be extracted, so that the video data can be labeled accordingly. A single piece of marked video data may be as shown in fig. 4, and its video content labels may include a content layer, a primary label, a secondary label, a tertiary label, and the like. The video content labels corresponding to the marked video data can also be counted separately to obtain various label data, including the content layer, the primary label, the secondary label and the original text. A single piece of label data may be as shown in fig. 5.
The video features of the video data to be extracted are extracted, multi-dimensional processing is performed on them to obtain the target features, and the target features are matched with the preset labels in the video label system to determine the video content labels corresponding to the video data to be extracted. By performing multi-dimensional processing on the video features, more accurate and detailed features can be obtained, so that the corresponding video content labels are matched according to the target features and the extraction accuracy of the video labels is improved.
And 204, analyzing the effect data of the marked video data to obtain an effect data analysis result.
The effect data analysis may be referred to as effect data EDA (Exploratory Data Analysis). The effect data analysis result can be obtained by calculating the click rate, conversion count and delivery cost of each video and plotting these quantities. In this embodiment, a two-dimensional coordinate diagram may be constructed with the IDs of the marked video data on the x-axis and the cost or duration on the y-axis. The marked video data may be ordered along the x-axis by any of the click rate, conversion count or delivery cost. As shown in fig. 6, the marked video data may be ordered along the x-axis by click rate and the delivery cost plotted on the y-axis, so that the relationship between the click rate and the delivery cost of each video can be observed.
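A minimal sketch of this effect-data EDA, assuming a pandas table with video_id, clicks, impressions and cost columns; the file name and column names are illustrative assumptions.

```python
# Sketch: compute the click rate per marked video and plot delivery cost
# against the CTR-ranked videos, mirroring the fig. 6 construction.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("marked_videos.csv")            # one row per marked video (assumed)
df["ctr"] = df["clicks"] / df["impressions"]     # click rate
df = df.sort_values("ctr", ascending=False)      # x-axis ordered by click rate

plt.bar(df["video_id"].astype(str), df["cost"])  # y-axis: delivery cost
plt.xlabel("video ID (sorted by click rate)")
plt.ylabel("delivery cost")
plt.title("Delivery cost vs. click-rate ranking")
plt.show()
```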
And step 206, aligning the effect data analysis result with the video content label to obtain combined data.
The alignment may be performed by splicing the levels of each video content label into a label column whose name concatenates the content layer, the primary label, the secondary label and the tertiary label; the column value records the appearance time and duration of the label, with time and duration separated by the "&" symbol, and the multiple time points at which a label appears within one video kept apart from each other. As shown in fig. 7, the columns are label names formed by splicing the content layer, primary label, secondary label and tertiary label; there are 291 such label columns, whose values are the appearance times and durations of the labels, separated by the "&" symbol. After alignment with the effect data analysis result, 19 pieces of combined data remain. From the combined data, the relationship between the labels and changes in the video CTR can be analyzed.
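A sketch of this alignment step, with "&" separating appearance time and duration as described above; the per-occurrence separator (";"), the column names and the join key are illustrative assumptions.

```python
# Sketch: pivot per-occurrence label rows into one row per video (columns are
# spliced label names, cells are "start&duration" strings), then align with
# the effect-data table on video_id.
import pandas as pd

def combine(tag_rows: pd.DataFrame, effect: pd.DataFrame) -> pd.DataFrame:
    # tag_rows (assumed layout): one row per label occurrence with columns
    # video_id, layer, l1, l2, l3, start, duration
    tag_rows = tag_rows.copy()
    tag_rows["tag_name"] = tag_rows[["layer", "l1", "l2", "l3"]].agg("-".join, axis=1)
    tag_rows["occurrence"] = (
        tag_rows["start"].astype(str) + "&" + tag_rows["duration"].astype(str)
    )
    wide = (
        tag_rows.groupby(["video_id", "tag_name"])["occurrence"]
        .agg(";".join)                       # several appearances in one video
        .unstack(fill_value="")
    )
    # The inner join is the "alignment": only videos present in both tables remain.
    return wide.join(effect.set_index("video_id"), how="inner")
```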
And step 208, calling a pre-constructed label analysis model, and respectively inputting the combined data and the video content label into the label analysis model to determine the importance degree of the video content label.
A label analysis model is constructed in the server in advance. The label analysis model may consist of a click rate prediction model and an attention model; the combined data and the video content labels are input into the label analysis model respectively, and the importance degree of the video content labels is output.
In this embodiment, the effect data analysis result and the video content tag corresponding to the marked video data are aligned, so that the importance degree of the tag can be analyzed subsequently, and the importance degree of the tag can be accurately predicted by analyzing the importance degree of the tag through a pre-constructed tag analysis model.
In one embodiment, the tag analysis model includes a click-through rate prediction model and an attention model, and the step of inputting the combined data into the tag analysis model to determine the importance of the video content tag includes:
and step 802, determining the influence degree of each label in the combined data on the click rate through a click rate prediction model.
And step 804, inputting the video content label to the attention model, and outputting attention distribution corresponding to the click rate of the marked video data.
And step 806, determining the importance degree of the video content label according to the attention distribution and the influence degree of each label in the combined data on the click rate.
The label analysis model comprises a click rate prediction model and an attention model.
The server can input the combined data into the click rate prediction model and determine the influence degree of each label in the combined data on the click rate. The click rate prediction model may be trained using a variety of regression models, for example KNeighborsUnif, KNeighborsDist, LightGBMXT, LightGBM, RandomForestMSE, CatBoost, ExtraTreesMSE, NeuralNetFastAI, XGBoost, NeuralNetMXNet, LightGBMLarge and WeightedEnsemble_L2. Specifically, the data feature considered in the training process is the number of times each label appears in one video. The combined data can be sorted according to this feature, the click rate of the sorted data is predicted by each regression model, and the final model is selected as the click rate prediction model by comparing the predicted click rate with the actual click rate. The combined data are then processed by the click rate prediction model, which outputs the influence degree of each label on the click rate, including an importance score (importance) and a significance score (p_value) for each label.
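The model names listed above match AutoGluon's default tabular model zoo, and AutoGluon's feature-importance output contains exactly an importance and a p_value column, so a sketch of this step could look as follows. Treating AutoGluon as the training harness is an inference from the names, not a statement of the application's actual tooling, and combined_train/combined_test are assumed splits of the combined data.

```python
# Sketch: fit a click-rate regressor over the combined data and read off the
# per-label importance and significance scores.
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(label="ctr", problem_type="regression")
predictor.fit(train_data=combined_train)   # tries the regression models listed above

# Permutation feature importance: one row per label column, with
# "importance" and "p_value" columns as described in the text.
fi = predictor.feature_importance(combined_test)
print(fi[["importance", "p_value"]].sort_values("importance", ascending=False))
```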
Among the video content labels, the labels form a time sequence, and the features that influence the video click rate (CTR) do not act independently; the effect is likely produced by the way several labels are combined in order. To address this, the server can cast the time-sequence labels as a natural-language task and use an attention model to output the attention distribution. The attention model in this embodiment is an attention model built on the label system, and it can perform video CTR attribution.
In one embodiment, inputting a video content tag into an attention model, and outputting an attention distribution corresponding to a click rate of the marked video data includes: inputting the video content label into an attention model, and processing the combined data into token data through the attention model; performing dimension reduction processing on each token data; fusing the token data subjected to the dimensionality reduction to obtain a fusion sequence; and extracting the context characteristics of the fusion sequence, and calculating the attention distribution corresponding to the click rate of the marked video data according to the context characteristics.
Specifically, the video content labels, the start time of each label and the duration ratio of each label are input into the attention model, and the input data are processed into token data: a label token, a label start-time token and a label duration-ratio token. Dimension reduction is performed on each token to obtain a label embedding, a label start-time embedding and a label duration-ratio embedding for each label, and these embeddings are fused into a label fusion feature per label, yielding the fusion sequence. The context features of the fusion sequence are extracted through a GRU network model, and the attention distribution (attention weights) of the video CTR score is obtained using attention pooling. The importance degree of all video content labels is then determined according to the attention distribution and the influence degree of each label in the combined data on the click rate. Important labels can be selected from all video content labels for video analysis; for example, the 20 most important labels can be selected, as shown in the following table:
[Table: the 20 most important labels; rendered as an image in the original publication]
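A minimal PyTorch sketch of the attention model described above; the embedding dimension, time-bucket counts and the absence of a padding mask are illustrative simplifications, not the application's exact architecture.

```python
# Sketch: label, start-time and duration-ratio tokens are embedded, fused by
# summation, passed through a GRU for context features, and attention pooling
# yields both a CTR score and the attention distribution over the labels.
import torch
import torch.nn as nn

class TagAttentionCTR(nn.Module):
    def __init__(self, n_tags: int, n_buckets: int = 32, dim: int = 64):
        super().__init__()
        self.tag_emb = nn.Embedding(n_tags, dim)       # label embedding
        self.start_emb = nn.Embedding(n_buckets, dim)  # start-time embedding
        self.dur_emb = nn.Embedding(n_buckets, dim)    # duration-ratio embedding
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.attn = nn.Linear(dim, 1)                  # attention pooling scorer
        self.head = nn.Linear(dim, 1)                  # CTR regression head

    def forward(self, tags, starts, durs):
        # tags/starts/durs: (batch, seq_len) integer token ids
        x = self.tag_emb(tags) + self.start_emb(starts) + self.dur_emb(durs)
        h, _ = self.gru(x)                             # context features
        w = torch.softmax(self.attn(h), dim=1)         # attention distribution
        pooled = (w * h).sum(dim=1)                    # attention pooling
        return self.head(pooled).squeeze(-1), w.squeeze(-1)
```

The returned weights give the per-label attention distribution used for the CTR attribution; the time-decay feature described next could be folded into the attention scores before the softmax.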
Further, a feature may be added to each label in the attention model: a weight that decays over time. This simulates the tendency of a viewer's attention to fade while actually watching the video, and improves the effectiveness of the label importance analysis.
In this embodiment, the click rate prediction model can accurately determine the influence of each label in the combined data on the click rate, while the attention model outputs the attention distribution corresponding to the click rate of the marked video data and captures the influence of ordered label combinations on the click rate. The importance degree of the video content labels is then determined from the attention distribution and the per-label influence degrees, so that the influence of each label on the video delivery effect can be accurately predicted.
In one embodiment, performing effect data analysis on the marked video data to obtain an effect data analysis result includes: calculating the click rate, delivery cost and conversion count of the marked video data; and generating the effect data analysis result according to the click rate, the delivery cost and the conversion count.
The effect data analysis may be referred to as effect data EDA (Exploratory Data Analysis). The effect data analysis result can be obtained by calculating the click rate, conversion count and delivery cost of each video and plotting these quantities. In this embodiment, a two-dimensional coordinate diagram may be constructed with the IDs of the marked video data on the x-axis and the cost or duration on the y-axis. The marked video data may be ordered along the x-axis by any of the click rate, conversion count or delivery cost.
In one embodiment, the method further comprises: and analyzing the label data of the marked video content label. Specifically, a video content label corresponding to the marked video data is flattened; and counting the basic label information corresponding to the flattened video content label, and obtaining a label data analysis result according to the basic label information.
The label data analysis may be referred to as label data EDA (Exploratory Data Analysis). The label data analysis may include analysis of the basic label information corresponding to the video content labels, including analysis of the video duration distribution, the number of times each video content label is mentioned, the position distribution of each video content label within the videos, and the like.
The video content labels corresponding to the marked video data can be flattened. The video content labels of each video are stored as an Excel file in which the labels form a two-dimensional table with a time dimension; the data are flattened for convenient analysis. The flattening takes the secondary label as the minimum label unit of the data analysis, and the tertiary label and subsequent labels as the specific values of that minimum unit.
For example, the video content tags corresponding to the marked video data may be as follows:
[Table: example video content labels corresponding to marked video data; rendered as an image in the original publication]
Here, the content layer represents the carrier position where the label appears, for example in the spoken broadcast or in the picture; the primary label is a root label of the label tree, which is subdivided into secondary labels; a secondary label is a leaf label of the label tree; and the tertiary label and so on represent specific values under the higher-level labels. The video content labels are flattened by taking the leaf label as the minimum label unit of the data analysis and the tertiary label as the specific value of that minimum unit. The flattened data are as follows:
[Table: the same label data after flattening; rendered as an image in the original publication]
After the flattening, each row is one piece of video data and each column is a content label; each cell holds the label value together with the appearance time and duration of the label.
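A sketch of this flattening step under the same assumed column layout as earlier; the "@"/"+" cell format is an illustrative stand-in for "label value plus time and duration".

```python
# Sketch: flatten the two-dimensional label table so the secondary label becomes
# the minimum analysis unit and the tertiary label becomes its cell value.
import pandas as pd

def flatten_tags(raw: pd.DataFrame) -> pd.DataFrame:
    # raw (assumed layout): one row per label occurrence with columns
    # video_id, layer, l1, l2, l3, start, duration
    raw = raw.copy()
    raw["cell"] = (
        raw["l3"] + " @ " + raw["start"].astype(str) + "+" + raw["duration"].astype(str)
    )
    # One row per video; one column per (layer, primary, secondary) label unit.
    return raw.pivot_table(
        index="video_id",
        columns=["layer", "l1", "l2"],
        values="cell",
        aggfunc="; ".join,                # several occurrences share one cell
    )
```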
The video duration distribution of the marked video data is also analyzed. Specifically, a video duration histogram is constructed from the marked video data, the average video duration is calculated from the histogram, and the shortest and longest video durations are determined.
The server can also count the number of times each video content label is mentioned and determine which labels are mentioned more or less frequently. For example, among the content layer labels, the most labels appear in the spoken broadcast and subtitle layer (137 times, about 44%), followed by the picture display layer (74 times, about 24%), while the fancy-word and content-creative layers have the fewest mentioned labels. Among the primary labels, labels of the conversion stimulation, product display, efficacy description, brand information and product basic information types are mentioned more often, while labels of the user psychology, preferential selling point and call-to-action types are mentioned less often.
Furthermore, the position distributions of the content layer labels, primary labels, secondary labels and tertiary labels within the videos are counted separately. Specifically, a position distribution analysis is performed for each label type to obtain the time point and duration at which each label appears in the videos. The duration is expressed with scatter points, and a position scatter diagram is constructed for each label type from these time points and durations, so that the distribution characteristics of each label can be determined from the diagram; the characteristics may include which labels last longer, which appear less often, and which are mentioned at the beginning, middle or end. Exemplarily, fig. 9 is the position scatter diagram corresponding to the primary labels. From this diagram it can be seen that at the beginning, the conversion stimulation, conversion purpose, brand information, product basic information and text labels are mentioned more and last longer, while at the end, pain-point descriptions, efficacy descriptions, influencer recommendations and the like are mentioned.
For another example, from the position scatter diagram corresponding to the content layer labels, the analysis shows that at the beginning, the content labels mainly appear in the frame brand area layer, the paragraph layer, the spoken broadcast and subtitle layer and the picture display layer, with the frame brand area and paragraph layers lasting longer and the spoken broadcast and subtitle layer lasting shorter. At the end, labels appear in the spoken broadcast and subtitle layer and the paragraph layer. Labels appear less often around the middle 40-50 seconds, and it is worth checking the code for statistical errors here.
After the basic label information has been analyzed, such as the video duration distribution, the number of times each video content label is mentioned and the position distribution of each video content label within the videos, the analysis results are taken as the label data analysis result.
In this embodiment, the video content tag corresponding to the marked video data is flattened, which is beneficial to the subsequent analysis of the tag basic information. The basic label information corresponding to the flattened video content labels is counted, label data analysis results are obtained according to the basic label information, and the conditions of the video content labels in the video, such as the occurrence time and the like, can be comprehensively and accurately analyzed through analysis of video time length distribution, analysis of the number of times that the video content labels are mentioned and analysis of the position distribution of the video content labels in the video.
In one embodiment, the method further comprises: and generating a video analysis result based on the effect data analysis result, the label data analysis result and the importance degree of the video content label. Specifically, based on the effect data analysis result and the label data analysis result, classifying video content labels corresponding to the marked video data to obtain a plurality of label categories; performing time sequence analysis on the video content tags corresponding to the tag categories to obtain time periods corresponding to the tag categories; and generating a video analysis result according to the importance degree of the video content label, the time period corresponding to each label category and the effect data analysis result.
And analyzing the quality of the label, the time of occurrence of the good/bad label and the importance degree of the label based on the effect data analysis result and the label data analysis result, and generating a target analysis result according to the analysis data, wherein the target analysis result can represent the relationship between the video content label and the delivery effect.
The number of hits of each label, the numbers of rising/falling hits, and the label CTR (Click-Through Rate) can be calculated based on the effect data analysis result and the label data analysis result. The hit count is the number of times a label appears in the videos; the rising/falling hit counts are the numbers of times a label appears during time periods in which the video CTR rises or falls; and the label CTR is the weighted-average CTR of all videos containing the label. The video content labels corresponding to the marked video data can then be classified according to the hit count, the rising/falling hit counts and the label CTR to obtain a plurality of label categories. Specifically, labels whose label CTR is above the median CTR of all labels and whose rising hits exceed their falling hits are classified as class A; labels whose label CTR is below the median but whose rising hits exceed their falling hits are class B; labels whose label CTR is above the median but whose rising hits are fewer than their falling hits are class C; and labels whose label CTR is below the median and whose rising hits are fewer than their falling hits are class D. This yields four label classes, of which class A is the best and class D the worst. A distribution diagram of the label categories may be as shown in fig. 10. For example, class A labels may include "spoken broadcast and subtitle / fragrance / nice (no aspect) / nice" and "spoken broadcast and subtitle / conversion purpose / purchase guidance / buy now"; class B labels may include "spoken broadcast and subtitle / efficacy description / fluffy / naturally fluffy"; class C labels may include "spoken broadcast and subtitle / efficacy description / fluffy / 2x fluffy"; and class D labels may include "spoken broadcast and subtitle / efficacy description / fluffy".
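A sketch of this A/B/C/D grading rule; the per-label statistics table and its column names (tag_ctr, rise_hits, fall_hits) are assumptions for illustration.

```python
# Sketch: grade each label by comparing its weighted-average CTR with the
# median over all labels, and its rising hits with its falling hits.
import pandas as pd

def grade_tags(stats: pd.DataFrame) -> pd.Series:
    high = stats["tag_ctr"] > stats["tag_ctr"].median()
    rising = stats["rise_hits"] > stats["fall_hits"]
    grades = pd.Series("D", index=stats.index)  # low CTR, mostly falling hits
    grades[high & rising] = "A"                 # best: high CTR, mostly rising
    grades[~high & rising] = "B"
    grades[high & ~rising] = "C"
    return grades
```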
The server then performs a time-sequence analysis on the video content labels corresponding to each label category. Specifically, it examines the time-sequence data of label appearances for the four classes A to D within each content layer, and obtains the time positions where the better labels (classes A and B) appear in each content layer. For example, for the spoken broadcast and subtitle content layer, a schematic diagram of the time periods corresponding to each label category may be as shown in fig. 11. By performing the time-sequence analysis on the video content labels of each label category, the better times for the labels to appear can be obtained.
And further generating a video analysis result according to the importance degree of the video content label, the time period corresponding to each label category and the effect data analysis result.
In this embodiment, the video content labels corresponding to the marked video data are classified based on the effect data analysis result and the label data analysis result to obtain a plurality of label categories, so that the quality of each label can be determined. Performing time-sequence analysis on the video content labels corresponding to each label category yields the time periods corresponding to each category, so the times at which good or bad labels appear can be determined. The importance degree of the video content labels is analyzed based on the effect data analysis result, and a video analysis result is generated according to the label importance, the time periods corresponding to each label category and the effect data analysis result, which gives the influence degree of each label on the delivery effect.
In one embodiment, generating the video analysis result according to the importance degree of the video content labels, the time periods corresponding to each label category and the effect data analysis result may include: taking the importance degree of the video content labels, the time periods corresponding to each label category and the effect data analysis result as a target analysis result, and extracting the key information corresponding to the video data according to the target analysis result to obtain an insight for each piece of video data. For example, the insight for each piece of video data may include: Insight 1: users drop off heavily after 3 seconds; the first 3 seconds and the first 10 seconds are the golden period of the video presentation. Specifically, the difference between a high and a low click rate is whether the key information is displayed densely within the first 10 seconds, and product display and brand information should be shown in the first 3 seconds. Insight 2: all video effects can climb to a peak within 3 seconds, so improving the content of the first 3 seconds as a whole, and reducing user drop-off between 3 and 30 seconds, are effective optimization directions. Insight 3: an overly long video duration tends to decrease the conversion rate; 20 to 30 seconds is a more reasonable video duration. Specifically, the effect of a 20 to 30 second mixed-cut video is more stable than that of a longer spoken-broadcast video.
Further, ranking information for the preset video content types can be counted, including: average click rate ranking, average conversion rate ranking, composite index ranking, ranking by the amount of existing video material, click rate variance ranking and conversion rate variance ranking. The composite index is calculated as: composite index = 60% x click rate + 30% x conversion rate - 10% x average click cost. The preset video content types may include scenario, spoken broadcast, mixed cut, single influencer/star and the like. A video delivery strategy is then determined from the ranking information of the preset video content types and the insights of each piece of video data. For example, since an overly long video duration tends to decrease the conversion rate and 20 to 30 seconds is a more reasonable duration, more 20 to 30 second mixed-cut videos can be adopted, whose effect is more stable than that of longer spoken-broadcast videos.
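A sketch of the composite index applied to rank the preset content types; the column names are illustrative assumptions.

```python
# Sketch: composite index = 60% * click rate + 30% * conversion rate
#                           - 10% * average click cost, then rank by it.
import pandas as pd

def rank_content_types(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["composite"] = 0.6 * df["ctr"] + 0.3 * df["cvr"] - 0.1 * df["avg_click_cost"]
    return df.sort_values("composite", ascending=False)
```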
The insights of the plurality of pieces of video data are summarized to obtain the video analysis result. For example, the summary table may be as follows:
[Table: summary of the insights across the video data; rendered as an image in the original publication]
In this embodiment, the key information corresponding to the video data to be analyzed is determined from the importance degree of the video content labels, the time periods corresponding to each label category and the effect data analysis result, so as to generate the video analysis result. This improves the accuracy of video analysis and makes it possible to quickly determine which content labels and which video content production modes deliver well.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
In one embodiment, as shown in fig. 12, there is provided a tag analysis apparatus including: a tag acquisition module 1202, an effect analysis module 1204, an alignment module 1206, and a tag analysis module 1208, wherein:
the tag obtaining module 1202 is configured to obtain a video content tag of the marked video data.
And the effect analysis module 1204 is configured to perform effect data analysis on the marked video data to obtain an effect data analysis result.
An aligning module 1206, configured to align the effect data analysis result with the video content tag to obtain combined data.
The tag analysis module 1208 is configured to invoke a pre-constructed tag analysis model, and input the combined data and the video content tag into the tag analysis model respectively to determine the importance degree of the video content tag.
In one embodiment, the tag analysis model includes a click-through rate prediction model and an attention model; the label analysis module 1208 is further configured to determine, through the click rate prediction model, the degree of influence of each label in the combined data on the click rate; inputting the video content label into an attention model, and outputting attention distribution corresponding to the click rate of the marked video data; and determining the importance degree of the video content label according to the attention distribution and the influence degree of each label in the combined data on the click rate.
In one embodiment, the tag analysis module 1208 is further configured to input the video content tag into the attention model, and process the combined data into token data through the attention model; performing dimension reduction processing on each token data; fusing the token data subjected to the dimensionality reduction to obtain a fusion sequence; and extracting the context characteristics of the fusion sequence, and calculating the attention distribution corresponding to the click rate of the marked video data according to the context characteristics.
In one embodiment, the effect analysis module 1204 is further configured to calculate the click rate, delivery cost and conversion count of the marked video data, and to generate the effect data analysis result according to the click rate, the delivery cost and the conversion count.
In one embodiment, the above apparatus further comprises: the label analysis module is used for flattening the video content label; and counting the basic label information corresponding to the flattened video content label, and obtaining a label data analysis result according to the basic label information.
In one embodiment, the above apparatus further comprises: the video analysis module is used for classifying video content labels corresponding to the marked video data based on the effect data analysis result and the label data analysis result to obtain a plurality of label categories; performing time sequence analysis on the video content tags corresponding to the tag categories to obtain time periods corresponding to the tag categories; and generating a video analysis result according to the importance degree of the video content label, the time period corresponding to each label type and the effect data analysis result.
In one embodiment, the tag obtaining module 1202 is further configured to obtain video data to be extracted, and extract video features of the video data to be extracted; acquiring a pre-constructed video label system; performing multi-dimensional processing on the video characteristics to obtain target characteristics; and matching the target characteristics with preset labels in a video label system, and determining video content labels corresponding to the video data to be extracted.
For the specific definition of the label analysis apparatus, reference may be made to the definition of the label analysis method above, which is not repeated here. The modules in the label analysis apparatus may be implemented in whole or in part by software, hardware, or combinations thereof. The modules can be embedded in hardware form in, or independent of, a processor in the computer device, or stored in software form in a memory in the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 13. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the nonvolatile storage medium. The database of the computer device is used to store the data of the label analysis method. The network interface of the computer device is used to connect and communicate with an external terminal through a network. The computer program is executed by the processor to implement a label analysis method.
Those skilled in the art will appreciate that the architecture shown in fig. 13 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the various embodiments described above when the computer program is executed by the processor.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the respective embodiments described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by hardware instructed by a computer program, which may be stored in a non-volatile computer-readable storage medium and which, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the protection scope of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for analyzing a label, comprising:
acquiring a video content label of the marked video data;
analyzing effect data of the marked video data to obtain an effect data analysis result;
aligning the effect data analysis result with the video content label to obtain combined data;
and calling a pre-constructed label analysis model, and respectively inputting the combined data and the video content label into the label analysis model to determine the importance degree of the video content label.
2. The method of claim 1, wherein the tag analysis model comprises a click-through rate prediction model and an attention model; inputting the combined data into the label analysis model, and determining the importance degree of the video content label, including:
determining the influence degree of each label in the combined data on the click rate through a click rate prediction model;
inputting the video content label into an attention model, and outputting attention distribution corresponding to the click rate of the marked video data;
and determining the importance degree of the video content label according to the attention distribution and the influence degree of each label in the combined data on the click rate.
3. The method of claim 2, wherein inputting the video content tag into an attention model and outputting an attention distribution corresponding to a click-through rate of the tagged video data comprises:
inputting the video content tag into an attention model, and processing the combined data into token data through the attention model;
performing dimension reduction processing on each token data;
fusing the token data subjected to the dimensionality reduction to obtain a fusion sequence;
extracting the context characteristics of the fusion sequence, and calculating the attention distribution corresponding to the click rate of the marked video data according to the context characteristics.
4. The method of claim 1, wherein the performing the effect data analysis on the marked video data to obtain the effect data analysis result comprises:
calculating the click rate, delivery cost and conversion count of the marked video data;
and generating an effect data analysis result according to the click rate, the delivery cost and the conversion count.
5. The method of claim 1, further comprising:
flattening the video content label;
and counting the basic label information corresponding to the flattened video content label, and obtaining a label data analysis result according to the basic label information.
6. The method of claim 5, further comprising:
classifying video content labels corresponding to the marked video data based on the effect data analysis result and the label data analysis result to obtain a plurality of label categories;
performing time sequence analysis on the video content tags corresponding to the tag categories to obtain time periods corresponding to the tag categories;
and generating a video analysis result according to the importance degree of the video content label, the time period corresponding to each label category and the effect data analysis result.
7. The method of any of claims 1 to 6, wherein obtaining the video content tag of the marked video data comprises:
acquiring video data to be extracted, and extracting video characteristics of the video data to be extracted;
acquiring a pre-constructed video label system;
carrying out multi-dimensional processing on the video characteristics to obtain target characteristics;
and matching the target characteristics with preset labels in the video label system, and determining video content labels corresponding to the video data to be extracted.
8. An apparatus for analyzing a label, the apparatus comprising:
the label obtaining module is used for obtaining a video content label of the marked video data;
the effect analysis module is used for analyzing effect data of the marked video data to obtain an effect data analysis result;
the alignment module is used for aligning the effect data analysis result with the video content label to obtain combined data;
and the label analysis module is used for calling a pre-constructed label analysis model, inputting the combined data and the video content label into the label analysis model respectively, and determining the importance degree of the video content label.
9. A computer device comprising a memory and a processor, the memory storing a computer program operable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202111041764.1A 2021-09-06 2021-09-06 Label analysis method and device, computer equipment and storage medium Pending CN114443901A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111041764.1A CN114443901A (en) 2021-09-06 2021-09-06 Label analysis method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111041764.1A CN114443901A (en) 2021-09-06 2021-09-06 Label analysis method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114443901A 2022-05-06

Family

ID=81362212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111041764.1A Pending CN114443901A (en) 2021-09-06 2021-09-06 Label analysis method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114443901A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination