CN117750059A

CN117750059A - Data element value evaluation method based on multi-mode media asset

Info

Publication number: CN117750059A
Application number: CN202311505079.9A
Authority: CN
Inventors: 卓越; 张玮; 魏小彬; 汪昊辰; 石乐芸; 唐志文; 沈闻佳; 张洁莹
Original assignee: Wasu Media & Network Co ltd; Wasu Media Holdings Co ltd
Current assignee: Wasu Media & Network Co ltd; Wasu Media Holdings Co ltd
Priority date: 2023-11-13
Filing date: 2023-11-13
Publication date: 2024-03-22

Abstract

S1, creating a program macroscopic value index model according to program basic data on a plurality of existing media platforms, wherein the macroscopic value indexes of a program comprise a revenue-generating value, a viewing value, an investment value, an influence value and a cost value; S2, integrating the program basic data of the plurality of media platforms, comprising: creating a qualitative label set containing all basic data of the current program, supplementing the basic data with information extracted from program pictures and videos, cleaning the program basic data, matching identical programs across platforms, and supplementing the basic data by taking unions; S3, calculating the macroscopic value indexes according to the program basic data. The method builds a value evaluation system with unified standards across different media platforms and sets macroscopic value indexes with unified standards, so that the value evaluation results of different platforms can be evaluated and analyzed uniformly and comprehensively.

Description

Data element value evaluation method based on multi-mode media asset

Technical Field

The invention belongs to the field of big data and relates to a data element value evaluation method based on multi-modal media assets.

Background

With the diversification of media platform types, traditional broadcast television has been challenged by internet television, internet platforms, mobile television platforms and the like, and the ways of obtaining media content have become diverse and portable. By evaluating the data element value of media assets, enterprises can identify industry development trends earlier.

Conventional value evaluation is ill-suited to the current multi-platform environment and has several defects. Programs that are essentially the same exist on different media platforms, but their attribute information differs from platform to platform, and on some platforms part of the information is even missing. If these programs are left unprocessed, the same program is evaluated repeatedly and the evaluation results differ, which makes the evaluation work cumbersome and produces contradictory results.

Disclosure of Invention

To overcome the defects of the prior art, the invention provides a data element value evaluation method based on multi-modal media assets.

To achieve the above purpose, the invention adopts the following technical scheme:

A data element value evaluation method based on multi-modal media assets comprises the following steps:

S1, creating a program macroscopic value index model according to program basic data on a plurality of existing media platforms, wherein the macroscopic value indexes of a program comprise a revenue-generating value, a viewing value, an investment value, an influence value and a cost value;

S2, integrating the program basic data of the plurality of media platforms, comprising: creating a qualitative label set containing all basic data of the current program, supplementing the basic data with information extracted from program pictures and videos, cleaning the program basic data, matching identical programs across platforms, and supplementing the basic data by taking unions;

S3, calculating the macroscopic value indexes according to the program basic data.

Further, in step S2, the matching of identical programs across platforms comprises the following steps:

S2.2.1, collating the program set P of the live broadcast service and the on-demand service on the media platforms of step S1;

S2.2.2, defining a subgroup to be matched for each program in the program set P, wherein the subgroup of a program comprises the programs whose release year lies in the same year range as that program, the programs in the subgroup are called sub-programs, and a similarity value $s$ between the program and each sub-program in its subgroup is calculated with a similarity algorithm;

S2.2.3, introducing a similarity threshold $s_t$ and the inequality $s \ge s_t$: if a sub-program and the program satisfy the inequality, the sub-program is judged to be the same program as the program; all programs are judged in this way to find their matching sub-programs, and each program and the sub-programs matched to it as the same program are marked with the same class mark.

Further, in step S2.2.3, introducing the similarity threshold $s_t$ further comprises: randomly selecting 50 programs; for each of $s_t = 0.9$, $s_t = 0.8$, $s_t = 0.7$ and $s_t = 0.6$, applying the inequality $s \ge s_t$ to the 50 programs to match their identical sub-programs; manually checking whether the matched sub-programs are accurate to obtain the matching accuracy under each of the four conditions; and selecting the $s_t$ with the highest matching accuracy as the $s_t$ introduced in step S2.2.3, wherein the matching accuracy is the ratio of the number of the 50 programs judged to be accurately matched after manual verification to the total number 50.

Further, the supplementing of the basic data by taking unions in step S2 is performed after step S2.2.3 and comprises: processing all programs in the program set P, specifically taking the union of the qualitative label sets of the two or more programs marked with the same class mark to generate the latest qualitative label set, and replacing the old qualitative label set of each of those programs with the latest qualitative label set.

Further, after step S2.2.1 and before step S2.2.2, the method further comprises cleaning the data of the program set P, wherein the cleaning comprises variable normalization, missing value filling, redundant variable deletion and outlier deletion.

Further, after the data cleaning of the program set P and before step S2.2.2, the method further comprises: extracting keywords from the long text labels of the programs with the TF-IDF algorithm to form program feature words, and adding the program feature words to the qualitative label set to form a new qualitative label set of the program.

Further, in step S2, supplementing the basic data with information extracted from program pictures and videos comprises the following steps:

S2.1.1, acquiring the promotional picture data set $I = \{i_1, i_2, \ldots, i_k\}$ and the video source Video of the program, where $i_1, i_2, \ldots, i_k$ are all the files in picture formats for the program in the platform program database;

S2.1.2, extracting the Chinese information Massages $= \{W_1{:}Loc_1, W_2{:}Loc_2, \ldots, W_k{:}Loc_k\}$ from the pictures, where $Loc_1, Loc_2, \ldots, Loc_k$ denote the positions in a picture where Chinese text exists, each position being a circular or rectangular area, and $W_1, W_2, \ldots, W_k$ denote the Chinese information extracted from the corresponding areas; forming labels for the program's Chinese title, English title, release year, director, actors, screenwriter, producer and publicity, and adding the label data to the qualitative label set;

S2.1.3, extracting the key-frame picture set $I' = \{i_1, i_2, \ldots, i_k\}$ from the video source Video, where $i_1, i_2, \ldots, i_k$ denote the pictures containing Chinese information selected from the pictures obtained by splitting the video source Video frame by frame;

S2.1.4, extracting the Chinese information Massages' from the key-frame picture set $I'$, extracting keywords to form the plot-synopsis label, and adding the plot-synopsis label data to the qualitative label set.

Furthermore, the keywords are extracted with RNNs combined with an Attention mechanism.

Further, the media platform in step S1 includes a broadcast television platform, an interactive television platform, an internet television platform, a mobile phone television platform and an internet platform.

In summary, the invention has the following advantages:

The value evaluation method of the invention builds a value evaluation system with unified standards across different media platforms and sets macroscopic value indexes with unified standards, so that the value evaluation results of different platforms can be evaluated and analyzed uniformly and comprehensively;

Before the macroscopic indexes are calculated, the program basic data on all media platforms are integrated and unified, and programs that are essentially the same on different media platforms are matched and marked, so that identical programs are accurately identified during value evaluation, repeated evaluation is avoided, and efficiency is improved;

Meanwhile, after the essentially identical programs on different media platforms are matched, their basic data are supplemented and unified, so that the data fit a value evaluation system with unified standards and the resulting index data are more accurate, comprehensive and reasonable.

Drawings

FIG. 1 is a flow chart of the steps of the data element value evaluation method based on multi-modal media assets.

FIG. 2 is a block diagram of the program macroscopic value index model.

FIG. 3 is a schematic diagram of the hierarchical structure of the revenue-generating value in FIG. 2.

FIG. 4 is a schematic diagram of the hierarchical structure of the viewing value in FIG. 2.

FIG. 5 is a schematic diagram of the hierarchical structure of the investment value in FIG. 2.

FIG. 6 is a schematic diagram of the hierarchical structure of the influence value in FIG. 2.

FIG. 7 is a schematic diagram of the hierarchical structure of the cost value in FIG. 2.

Detailed Description

Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the invention with reference to specific examples. The invention may also be practiced or applied in other embodiments, and the details of this description may be modified or varied without departing from the spirit and scope of the invention. It should be noted that the following embodiments and the features in them may be combined with each other where no conflict arises.

This embodiment discloses a data element value evaluation method based on multi-modal media assets, comprising the following steps:

S1, constructing a program value evaluation system;

On the all-media platform, a program value model is designed around the program data of the platform's resources and comprises five macroscopic indexes:

1) Revenue-generating value: reflects the amounts paid by users as a result of their viewing behavior; its lower level is the income index, which comprises single-point income and package income;

2) Viewing value: reflects the value of users' direct live and on-demand viewing behavior data; its lower level comprises a live viewing index and an on-demand viewing index; the live viewing index comprises live viewing breadth, live viewing depth, live audience richness and live viewing proximity, and the on-demand viewing index comprises on-demand viewing breadth, on-demand viewing depth, on-demand audience richness and on-demand viewing proximity;

3) Investment value: considers three aspects, namely popularity, audience word of mouth and promotion strength; the popularity index comprises director popularity, actor popularity (comprising the Baidu search index, Baidu media index and Baidu information index), screenwriter popularity, theme popularity and IP popularity; the word-of-mouth index comprises the Douban score, Maoyan score and Baidu score; and the promotion strength index comprises the broadcast channel and broadcast period;

4) Influence value: the influence brought by a program mainly concerns how it drives the company's income and is considered through the visual factor index. The visual factor refers to the effect of driving the viewing and income of other programs through the correlation of program attributes such as theme, actors and director; it is measured by, for example, the number of same-IP programs, the number of same outgoing programs, the number of same-actor programs, the number of same-theme programs, the number of same-era programs and the number of same-region programs;

5) Cost value: refers to the value of the company's program purchase agreements; its lower level is the program purchase cost index, which comprises the revenue-sharing type, agreement cost, revenue-sharing proportion, provider, asset type composition, asset effective date, copyright duration and asset quantity.

For the evaluation of the above five macroscopic value indexes, the basic data of the platform programs to be used include, but are not limited to, one or more of live real-time viewing data, on-demand viewing data, program purchase agreement data, user portrait tags, program tags, user sample sets and program sample sets.

The live real-time viewing data include, but are not limited to, the set-top box number, channel name, program name, live start time, live end time, program start time and program end time. The on-demand viewing data include, but are not limited to, the set-top box number, on-demand column, on-demand program, on-demand start time and on-demand end time. The program purchase agreement data include, but are not limited to, the agreement ID, program ID, agreement expiration time and purchase amount. The user portrait tags are mainly processed and abstracted from a series of customer-related basic data, such as the acceptance information table, user summary table, product subscription detail table, live viewing detail table, on-demand and playback log detail table, monthly bill, broadband traffic table, call center log table and activity participation information; they specifically include, but are not limited to, age, product line, preferred on-demand material in different time periods, preferred on-demand actors, current value and live activity. The program tags include, but are not limited to, content type, major genre, minor genre, release year, broadcast time (first-run time), director, actors, screenwriter, character keywords, IP keywords, Douban score, value score and running length (number of episodes).

S2, integrating the basic data of multi-platform programs;

S2.1, creating a qualitative label set A for each program, the qualitative label set being the set of all basic data of the program, and supplementing the basic data in the qualitative label set on the basis of multi-modal data; supplementing the qualitative label set of any program comprises the following steps:

S2.1.1, acquiring the promotional picture data set $I = \{i_1, i_2, \ldots, i_k\}$ and the video source Video of the program, where $i_1, i_2, \ldots, i_k$ are all the files in picture formats for the program in the platform program database;

S2.1.2, extracting the Chinese information Massages $= \{W_1{:}Loc_1, W_2{:}Loc_2, \ldots, W_k{:}Loc_k\}$ from the pictures (the pictures processed in this step are usually promotional pictures, posters and the like of the program, whose text has already been condensed, refined and polished, so it can be extracted directly to supplement the basic data; $Loc_1, Loc_2, \ldots, Loc_k$ denote the positions in a picture where Chinese text exists, each position being a circular or rectangular area, and $W_1, W_2, \ldots, W_k$ denote the Chinese information extracted from the corresponding areas), forming labels for the program's Chinese title, English title, release year, director, actors, screenwriter, producer and publicity, and adding the label data to the qualitative label set;

S2.1.3, extracting the key-frame picture set $I' = \{i_1, i_2, \ldots, i_k\}$ from the video source Video, where $i_1, i_2, \ldots, i_k$ denote the pictures containing Chinese information among the pictures obtained by splitting the video source Video frame by frame; when the Chinese information in several frames is the same, only the frame with the earliest time point is placed in the set $I'$;

S2.1.4, extracting the Chinese information Massages' from the key-frame picture set $I'$, extracting keywords from Massages' with RNNs combined with an Attention mechanism to form the plot-synopsis label, and adding the plot-synopsis label data to the qualitative label set;

The use of RNNs combined with an Attention mechanism for keyword extraction in step S2.1.4 is widely applied in the prior art and is adopted here as is, so it is not described further.
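The frame-selection rule of step S2.1.3 (keep only frames that contain text, and among frames with identical text keep the earliest) can be sketched as follows. This is a minimal illustration only: the function name `select_keyframes`, the `(timestamp, text)` input format and the sample strings are assumptions, and the text recognition step that produces the strings is outside the sketch.

```python
from typing import Dict, Iterable, List, Tuple


def select_keyframes(frames: Iterable[Tuple[float, str]]) -> List[Tuple[float, str]]:
    """Keep, for each distinct non-empty text, the frame with the earliest timestamp.

    `frames` holds (timestamp_seconds, recognized_text) pairs; the text is
    assumed to come from a recognition step that is not shown here.
    """
    earliest: Dict[str, float] = {}
    for ts, text in frames:
        text = text.strip()
        if not text:
            continue  # frames without text are not key frames
        if text not in earliest or ts < earliest[text]:
            earliest[text] = ts
    # Return the surviving frames in time order.
    return sorted((ts, text) for text, ts in earliest.items())


# Made-up sample: two frames share the same caption, one frame has no text.
frames = [(3.0, "第一集"), (1.0, "第一集"), (2.0, ""), (5.0, "主演 张三")]
keyframes = select_keyframes(frames)
print(keyframes)
```

The surviving frames would then be passed to the keyword extraction of step S2.1.4.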

S2.2, multi-platform program fusion;

S2.2.1, collating the program set P of the live broadcast service and the on-demand service on the five major platforms: the broadcast television platform, interactive television platform, internet television platform, mobile television platform and internet platform;

S2.2.2, cleaning the program set P, wherein the cleaning comprises variable normalization, missing value filling, redundant variable deletion, outlier deletion and the like;

S2.2.3, extracting keywords from the long text labels of the programs with the TF-IDF algorithm to form program feature words, and adding the program feature words to the qualitative label set to form a new qualitative label set A' of the program; the long text labels are program information usually entered when a program is registered on a platform, including but not limited to the description and introduction of the program; the originally entered text is too long for the format of the qualitative label set, so it is condensed by keyword extraction;

S2.2.4, defining a subgroup to be matched for each program, wherein the subgroup of a program comprises the programs whose release year lies in the same year range as that program (in this embodiment the year range is preferably 3 to 5 years), the programs in the subgroup are called sub-programs, and a similarity value $s$ between the program and each sub-program in its subgroup is calculated with a similarity algorithm;

S2.2.5, introducing a similarity threshold $s_t$: for any program, if a sub-program and the program satisfy the inequality $s \ge s_t$, the sub-program and the program are judged to be the same program; randomly selecting 50 programs and, for each of $s_t = 0.9$, $s_t = 0.8$, $s_t = 0.7$ and $s_t = 0.6$, matching identical sub-programs for these programs, manually checking the matching results to judge whether the matched sub-programs are accurate, and obtaining the matching accuracy under each of the four conditions, wherein the matching accuracy is the ratio of the number of the 50 programs judged to be accurately matched after manual verification to the total number 50;

S2.2.6, selecting the $s_t$ with the highest matching accuracy among the four conditions; under this $s_t$, applying the inequality $s \ge s_t$ to all programs, matching the corresponding sub-programs for each program, and marking each program and the sub-programs matched to it as the same program with the same class mark;

S2.2.7, processing all programs in the program set P, specifically taking the union of the qualitative label sets of the two or more programs marked with the same class mark to generate the latest qualitative label set A'', and replacing the old qualitative label set of each of those programs with the latest qualitative label set;
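Steps S2.2.5 to S2.2.7 can be sketched as below, under stated assumptions. The sketch treats a sampled program as "accurately matched" when the set of sub-programs with $s \ge s_t$ equals the manually verified set; that reading, all function names and the toy scores are illustrative assumptions, and a real run would use the 50 sampled programs and human review.

```python
from typing import Dict, Iterable, List, Set

Scores = Dict[str, Dict[str, float]]  # program -> {sub-program: similarity s}


def matched(sub_scores: Dict[str, float], s_t: float) -> Set[str]:
    """Sub-programs that satisfy the inequality s >= s_t."""
    return {sub for sub, s in sub_scores.items() if s >= s_t}


def matching_accuracy(scores: Scores, verified: Dict[str, Set[str]], s_t: float) -> float:
    """Share of sampled programs whose matches agree with the manual check."""
    correct = sum(matched(subs, s_t) == verified[p] for p, subs in scores.items())
    return correct / len(scores)


def choose_threshold(scores: Scores, verified: Dict[str, Set[str]],
                     candidates: Iterable[float] = (0.9, 0.8, 0.7, 0.6)) -> float:
    """Pick the candidate s_t with the highest matching accuracy."""
    return max(candidates, key=lambda t: matching_accuracy(scores, verified, t))


def merge_label_sets(label_sets: List[Set[str]]) -> Set[str]:
    """Union of the qualitative label sets of programs sharing a class mark."""
    return set().union(*label_sets)


# Toy sample standing in for the 50 manually verified programs.
scores = {"A": {"A1": 0.95, "A2": 0.75}, "B": {"B1": 0.85, "B2": 0.65}, "C": {"C1": 0.5}}
verified = {"A": {"A1"}, "B": {"B1"}, "C": set()}
best = choose_threshold(scores, verified)
merged = merge_label_sets([{"director:X"}, {"director:X", "year:2020"}])
print(best, sorted(merged))
```

Every program carrying the same class mark would then have its old label set replaced by the merged set.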

The similarity algorithm in step S2.2.4 comprises the following steps:

S2.2.4.1, subdividing the programs by release year and establishing an association model for the programs of adjacent years;

S2.2.4.2, selecting w attributes with distinctive features from the programs of adjacent years, wherein the m-th feature value of the i-th program is expressed by a formula [formula missing from the source text], the feature value being the associated information of the program and c denoting the total number of programs on the media platform;

S2.2.4.3, measuring the similarity of feature m [formula missing from the source text], where c and e denote the total numbers of programs on the two media platforms respectively, to obtain the similarity of the m-th attribute of program $SP_i$ ($i = 1, 2, \ldots, c$) and program $SP_j$ ($j = 1, 2, \ldots, e$);

S2.2.4.4, expressing the similarity of program $SP_i$ ($i = 1, 2, \ldots, c$) and program $SP_j$ ($j = 1, 2, \ldots, e$) as [formula missing from the source text], where k denotes the weight of the m-th feature;

S2.2.4.5, collating the similarities into a similarity matrix, in which $S_{c-1,e-1}$ denotes the similarity between the c-th program of one media platform and the e-th program of the other media platform;

S2.2.4.6, selecting the programs whose similarity value exceeds the threshold, marking them with the same mark, and letting them supplement each other's data.
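Because the per-attribute similarity formula is not available in this text, the following is only a sketch of the overall shape of steps S2.2.4.3 to S2.2.4.6: a per-attribute similarity, a weighted combination over attributes, and a cross-platform similarity matrix. The Jaccard overlap used for `attribute_similarity` is a swapped-in stand-in, not the patent's measure, and the attribute names, weights and threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Program = Dict[str, Set[str]]  # attribute name -> set of attribute values


def attribute_similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap; an assumed stand-in for the per-attribute measure."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def program_similarity(p: Program, q: Program, weights: Dict[str, float]) -> float:
    """Weighted sum of per-attribute similarities (weights play the role of k)."""
    return sum(k * attribute_similarity(p.get(m, set()), q.get(m, set()))
               for m, k in weights.items())


def similarity_matrix(platform_a: List[Program], platform_b: List[Program],
                      weights: Dict[str, float]) -> List[List[float]]:
    """S[i][j] is the similarity of program i on one platform and j on the other."""
    return [[program_similarity(p, q, weights) for q in platform_b] for p in platform_a]


def matches(S: List[List[float]], s_t: float) -> List[Tuple[int, int]]:
    """Index pairs whose similarity reaches the threshold."""
    return [(i, j) for i, row in enumerate(S) for j, s in enumerate(row) if s >= s_t]


weights = {"title": 0.5, "director": 0.3, "actors": 0.2}  # hypothetical weights
platform_a = [{"title": {"T1"}, "director": {"D1"}, "actors": {"X", "Y"}}]
platform_b = [{"title": {"T1"}, "director": {"D1"}, "actors": {"X"}},
              {"title": {"T2"}, "director": {"D2"}, "actors": {"Z"}}]
S = similarity_matrix(platform_a, platform_b, weights)
print(S, matches(S, 0.8))
```

With zero-based indexing, entry `S[c-1][e-1]` corresponds to the c-th program of one platform and the e-th program of the other, matching the subscript convention of step S2.2.4.5.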

S2.3, because of the particular nature of the broadcast television business, one household customer may have viewing behavior on several television set-top boxes; to extract the household customer's viewing characteristics for the corresponding programs, reasonable weights need to be assigned to the several users under the customer's name, so that their data are converted into basic labels of the customer;

The weighting rules are as follows: the more recent the behavior, the greater the behavior weight; the longer the average active period, the greater the user weight; the longer the time on the network, the greater the user weight.

S2.4, normalizing the data;

Dividing the program set P into an important program set PU and a secondary program set SU by the 80/20 rule, where PU contains the programs that bring the company its main profit;

Dividing the important program set PU into two parts and the secondary program set SU into two parts by the same method, so that P is finally divided into four sets of different levels;

Formulating different conversion formulas for the sets of different levels, of linear form $y = ax + b$ or nonlinear form $y = a + b e^{c(x + d)}$, to convert the basic data into values between 0 and 100, which removes the dimensional differences between data of different dimensions and makes the data comparable.
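The tiering and conversion of step S2.4 might look like the sketch below. It assumes the 80/20 split is taken by count (the top 20% of programs by some profit figure), which is only one possible reading, and the coefficient values passed to the two conversion functions are hypothetical; the text does not state how a, b, c and d are chosen for each tier.

```python
import math
from typing import Iterable, List, Tuple


def pareto_split(values: Iterable[float], top_share: float = 0.2) -> Tuple[List[float], List[float]]:
    """Split values into a top part (about 20% by count, an assumption) and the rest."""
    ordered = sorted(values, reverse=True)
    cut = max(1, round(len(ordered) * top_share))
    return ordered[:cut], ordered[cut:]


def four_tiers(values: Iterable[float]) -> List[List[float]]:
    """Apply the split once to P, then once more to each of PU and SU."""
    pu, su = pareto_split(values)
    return [*pareto_split(pu), *pareto_split(su)]


def linear(x: float, a: float, b: float) -> float:
    """Linear conversion y = a*x + b."""
    return a * x + b


def exponential(x: float, a: float, b: float, c: float, d: float) -> float:
    """Nonlinear conversion y = a + b*exp(c*(x + d))."""
    return a + b * math.exp(c * (x + d))


def clamp_0_100(y: float) -> float:
    """Keep a converted value inside the 0..100 range."""
    return min(100.0, max(0.0, y))


tiers = four_tiers(range(1, 11))        # ten made-up profit figures
print([len(t) for t in tiers])
print(clamp_0_100(linear(40.0, 2.0, 10.0)))                  # hypothetical a, b
print(clamp_0_100(exponential(20.0, 100.0, -100.0, -0.05, 0.0)))  # saturating curve
```

A saturating exponential such as the last line compresses very large raw values, which is one reason a nonlinear form may suit the top tier better than a straight line.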

S3, calculating the five upper-level macroscopic indexes;

a) Revenue-generating value

This macroscopic value index evaluates the revenue-related attribute data of a program. With the program's income indexes on package subscription and single-point service denoted $x_1$ and $x_2$ respectively, the revenue-generating value of the program is calculated as:

$H_{p1} = m_1 x_1 + m_2 x_2$

where the weight coefficients are set as $m_1 = m_2$. The revenue data are data of a historical period, whose duration may be 3 months, 6 months, etc.; in this embodiment the duration is 6 months.

b) Viewing value

Traditional programs are broadcast in two modes, live and on demand. For both kinds of data, the users' on-demand or live viewing duration, content and the like are first analyzed on the basis of the users' basic information. On this basis, the invention designs a "viewing value" index for the viewing data that the industry cares about most, with the viewing data divided into two dimensions: on-demand viewing and live viewing. The viewing value is calculated as follows:

$H_{p2} = m_3 x_3 + m_4 x_4$

where $x_3$ and $x_4$ denote the on-demand viewing value and the live viewing value respectively; $m_3$ and $m_4$ are weight coefficients that sum to 1 and are determined by the ratio of traffic between the live and on-demand viewing channels; in this embodiment $m_3 = 0.8$ is the weight of on-demand viewing and $m_4 = 0.2$ is the weight of live viewing.
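As a small worked instance of the viewing value formula, the sketch below combines an on-demand score and a live score with the embodiment's weights of 0.8 and 0.2. The function name and the input scores 70 and 40 are made up for illustration.

```python
def viewing_value(on_demand: float, live: float, m3: float = 0.8, m4: float = 0.2) -> float:
    """H_p2 = m3 * x3 + m4 * x4, with the two weights required to sum to 1."""
    if abs(m3 + m4 - 1.0) > 1e-9:
        raise ValueError("m3 and m4 must sum to 1")
    return m3 * on_demand + m4 * live


# Hypothetical sub-scores on the 0..100 scale.
h_p2 = viewing_value(on_demand=70.0, live=40.0)
print(h_p2)
```

With these inputs the on-demand score contributes 56 points and the live score 8 points, so the heavier on-demand weight dominates the result.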

The calculation of the viewing value comprises two parts: the calculation of the on-demand viewing value and the calculation of the live viewing value.

A. The on-demand viewing value is calculated as follows:

On-demand viewing value = $\alpha_1$ × on-demand proximity + $\alpha_2$ × on-demand breadth + $\alpha_3$ × on-demand richness + $\alpha_4$ × on-demand depth

where the weight factors $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_4$ are determined by expert scoring combined with the analytic hierarchy process (AHP).
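The text states only that the weight factors come from expert scoring combined with the analytic hierarchy process; it gives no matrix or procedure. As an illustration of how such weights are commonly derived, the sketch below uses the row geometric mean approximation on a pairwise comparison matrix. The matrix values are hypothetical and a consistency check, which AHP normally includes, is omitted.

```python
import math
from typing import List


def ahp_weights(pairwise: List[List[float]]) -> List[float]:
    """Approximate AHP priority weights by normalized row geometric means.

    pairwise[i][j] states how much more important factor i is than factor j,
    with pairwise[j][i] expected to be its reciprocal.
    """
    n = len(pairwise)
    geo = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo)
    return [g / total for g in geo]


# Hypothetical, perfectly consistent expert judgments for four factors
# (proximity, breadth, richness, depth) built from target weights.
target = [0.4, 0.3, 0.2, 0.1]
pairwise = [[wi / wj for wj in target] for wi in target]
alphas = ahp_weights(pairwise)
print([round(a, 3) for a in alphas])
```

For a perfectly consistent matrix the method recovers the underlying weights exactly; with real expert scores the result is an approximation of the principal eigenvector.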

To calculate the on-demand proximity, a cut-off time is determined first; counting backward from the cut-off time, the length of time over which the program was ordered x times is calculated, and from it the average interval between orders; a score is assigned according to this interval: the shorter the interval, the higher the score and the better the actual on-demand performance of the program.
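The proximity calculation can be sketched as follows. The averaging step follows the description (span covering the last x orders before the cut-off, divided by x); the scoring curve in `proximity_score` and its `scale_hours` parameter are assumptions, since the text says only that shorter intervals score higher.

```python
from datetime import datetime, timedelta
from typing import List


def average_interval_hours(order_times: List[datetime], cutoff: datetime, x: int) -> float:
    """Average interval per order over the span that covers the last x orders."""
    if x <= 0:
        raise ValueError("x must be positive")
    recent = sorted(t for t in order_times if t <= cutoff)[-x:]
    if len(recent) < x:
        raise ValueError("fewer than x orders before the cut-off time")
    span = cutoff - recent[0]  # counted backward from the cut-off time
    return span.total_seconds() / 3600.0 / x


def proximity_score(avg_hours: float, scale_hours: float = 24.0) -> float:
    """Assumed curve: 100 at zero interval, falling as the interval grows."""
    return 100.0 / (1.0 + avg_hours / scale_hours)


cutoff = datetime(2023, 11, 13)
orders = [cutoff - timedelta(hours=h) for h in (12, 24, 48, 100)]
avg = average_interval_hours(orders, cutoff, x=3)
print(avg, proximity_score(avg))
```

Here the last three orders span 48 hours, giving an average interval of 16 hours per order.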

The calculation mode of the on-demand breadth is as follows:

where history_vod_k denotes the number of on-demand users of the program in the k-th historical month, counted backward from the cut-off time; current_vod denotes the total number of on-demand users in the current month; $\varepsilon_k$ are the weight coefficients, which increase gradually from earlier to more recent months; this embodiment observes the user data of the last half year, so $k \in [1, 6]$. The value 100 in the on-demand breadth calculation converts the breadth to a percentage-scale value, so that the value ranges of on-demand proximity, on-demand breadth, on-demand richness and on-demand depth are unified.
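The breadth formula itself is not available in this text. One plausible reading of the surrounding definitions is a weighted sum of monthly user ratios scaled by 100, i.e. $100 \cdot \sum_k \varepsilon_k \cdot \text{history\_vod}_k / \text{current\_vod}$. The sketch below implements that assumed form only; the ordering of months (oldest first), the weights $k/21$ and the user counts are all illustrative.

```python
from typing import Sequence


def on_demand_breadth(history_vod: Sequence[float], current_vod: float,
                      eps: Sequence[float]) -> float:
    """Assumed form: 100 * sum_k eps_k * history_vod_k / current_vod."""
    if len(history_vod) != len(eps):
        raise ValueError("one weight is needed per month")
    if current_vod <= 0:
        raise ValueError("current_vod must be positive")
    return 100.0 * sum(e * h / current_vod for e, h in zip(eps, history_vod))


# Six months, oldest first; weights grow toward recent months and sum to 1.
history = [100, 120, 150, 160, 180, 200]
eps = [k / 21 for k in range(1, 7)]
breadth = on_demand_breadth(history, current_vod=1000, eps=eps)
print(round(breadth, 2))
```

Under this reading, a program ordered by every current on-demand user in every month would score 100.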

The on-demand richness focuses on the user groups that order the program and thereby evaluates the program's on-demand audience; the audience is described by user tags, which cover data dimensions such as the users' age group, gender and subscribed packages. The on-demand richness is calculated in the following steps:

Step a1: determining the user tags label_code$_i$ ($i = 1, 2, \ldots, n'$), where $n'$ merely denotes a number;

Step a2: determining the category values under user tag label_code$_i$, namely label_value$_{ij}$ ($j = 1, 2, \ldots, m'$), where $m'$ merely denotes a number;

Step a3: determining the number of users $N_{ij}$ of category value label_value$_{ij}$ and calculating the share of users $p_{ij}$ of that category value within the user tag [formula missing from the source text];

Step a4: calculating the standard deviation $\sigma_i$ of user tag label_code$_i$ from the user shares $p_{ij}$ of its category values [formula and accompanying definition missing from the source text];

Step a5: calculating the audience richness of the tag [formula missing from the source text];

Step a6: taking the historical data of k months as the statistical range, calculating the on-demand richness of program C [formula missing from the source text].

In step a2, the category values label_value$_{ij}$ under user tag label_code$_i$ are the categories within that tag; for the age-group tag, for example, the category values may be 1-10 years old, 11-20 years old and 21-30 years old.
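Steps a3 and a4 can be illustrated as below, under two assumptions that the text does not confirm: the user share is taken as $p_{ij} = N_{ij} / \sum_j N_{ij}$, and $\sigma_i$ is the population standard deviation of the shares. How $\sigma_i$ is turned into the richness of steps a5 and a6 is not recoverable here, so the sketch stops at step a4. The function names and the user counts are made up.

```python
import math
from typing import Dict


def category_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Step a3 (assumed form): share of users in each category value of one tag."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the tag has no users")
    return {value: n / total for value, n in counts.items()}


def share_std(shares: Dict[str, float]) -> float:
    """Step a4 (assumed form): population standard deviation of the shares."""
    values = list(shares.values())
    mean = sum(values) / len(values)
    return math.sqrt(sum((p - mean) ** 2 for p in values) / len(values))


# Made-up user counts for the age-group tag of one program.
age_counts = {"1-10": 10, "11-20": 30, "21-30": 60}
p = category_shares(age_counts)
sigma = share_std(p)
print(p, round(sigma, 4))
```

A smaller standard deviation means the audience is spread more evenly across the category values of the tag, and a perfectly even spread gives zero.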

The calculation of the on-demand depth is as follows:

where x denotes the x-th episode of the program and y the total number of episodes; for a film, a single-episode television drama or the like, the episode-dependent term [symbol missing from the source text] equals 1; $\alpha$ is merely a calculation parameter, whose defining conditions are missing from the source text, from which the $\alpha$ of programs with different numbers of episodes can be obtained; the viewing duration is the length of time a user watched when ordering the x-th episode of the program; the total duration is the total length of all y episodes of the program; $VP_n$ denotes the weight of the category of on-demand path by which the program was first ordered, and the greater the freedom of an on-demand path, the larger its weight $VP_n$; the on-demand paths include search, home-page recommendation, ranking list, category label and the like, of which search has the largest weight, and category-label on-demand means ordering through the program's year label, genre label and the like; n denotes different users.

B. The live viewing value is calculated as follows:

Live viewing value = $\beta_1$ × live proximity + $\beta_2$ × live breadth + $\beta_3$ × live richness + $\beta_4$ × live depth

where the weight factors $\beta_1$, $\beta_2$, $\beta_3$ and $\beta_4$ are determined by expert scoring combined with the analytic hierarchy process (AHP).

To calculate the live proximity, a cut-off time is determined first; counting backward from the cut-off time, the length of time over which the program was broadcast live y times is calculated, and from it the average interval between live broadcasts; a score is assigned according to this interval: the shorter the interval, the higher the score and the better the actual live performance of the program.

The live breadth is calculated as follows:

where history_live_k denotes the number of live users of the program in the k-th historical month; current_live denotes the total number of live users in the current month; $\varepsilon'_k$ are the weight coefficients, which increase gradually from earlier to more recent months; this embodiment observes the user data of the last half year, so $k \in [1, 6]$; the value 100 in the live breadth calculation converts the breadth to a percentage-scale value, so that the value ranges of live proximity, live breadth, live richness and live depth are unified.

The live broadcast richness is used for focusing on analyzing the condition of a user group watching live broadcast of the program, further evaluating the live broadcast adaptation group of the program, wherein the live broadcast adaptation group comprises user tags, the user tags comprise data dimensions of age layers, sexes, ordered packages and the like of users, and the calculation steps of the live broadcast richness are as follows:

Step b1: determine the user tags label_code_i' (i' = 1, 2, …, n''), where n'' merely denotes a number;

Step b2: determine the category values under user tag label_code_i', namely label_value_i'j' (j' = 1, 2, …, m''), where m'' merely denotes a number;

Step b3: determine the number of users N_i'j' under category value label_value_i'j', and calculate the share of users that the category value accounts for within the user tag:

Step b4: from the user share p_i'j' of each category value, calculate the standard deviation σ_i' of the user tag label_code_i';

Step b5: calculate the audience richness of the tag:

Step b6: taking k months of historical data as the statistical range, the live broadcast richness of program C is calculated as follows:

where the category values are the label_value_i'j' under the user tag label_code_i' determined in step b2.
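Steps b3 and b4 can be illustrated with a short sketch. It assumes that the user share of a category value is its user count divided by the total user count of the tag, and that the standard deviation is the population standard deviation of those shares; both are assumptions, since the original formulas are not reproduced in this text. The function names and the sample tag data are hypothetical. The richness formulas of steps b5 and b6 are not available here, so they are not implemented.

```python
from statistics import pstdev


def category_shares(user_counts):
    """Step b3 (assumed form): share p of each category value within one user tag."""
    total = sum(user_counts.values())
    if total == 0:
        return {value: 0.0 for value in user_counts}
    return {value: n / total for value, n in user_counts.items()}


def tag_std(user_counts):
    """Step b4 (assumed form): population standard deviation of the shares."""
    return pstdev(list(category_shares(user_counts).values()))


# Hypothetical user tag "age_group" with three category values.
age_counts = {"under_30": 200, "30_to_50": 500, "over_50": 300}
shares = category_shares(age_counts)
sigma = tag_std(age_counts)
```

A tag whose users are spread evenly over its category values gives a standard deviation of zero, while a tag dominated by one category value gives a larger one.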

The live broadcast depth evaluates how completely users watch a live program, as reflected in the viewing duration, the total duration of the program and similar measures. The live broadcast depth is calculated as follows:

When data such as the viewing duration and the total duration are counted, the playback data of the program are included. x denotes the x-th episode of the program, and y denotes the total number of episodes. α is merely a calculation parameter, defined so that a value of α can be obtained for programs with different numbers of episodes. The viewing duration is the length of time a user watches the x-th episode of the program via live broadcast in the corresponding period. The total duration of the period is the length of time the x-th episode is broadcast live in that period; if the x-th episode is not broadcast in that period, the corresponding term is zero. If the program is a movie, or a single episode of a TV series, variety show or the like, the corresponding parameter is 1. BT_q, BD_q and BC_q denote the influence factors of the broadcast time slot, the broadcast date and the broadcast channel, respectively; q indexes the different live broadcast periods; n indexes the different users.

The on-demand viewing value and the live viewing value of program C are then combined to obtain the viewing value.

The expert scoring method combined with the analytic hierarchy process comprises the following steps:

1) Experts score the relative importance matrix A, as shown in the following table:

where a_ij is the weight of dimension i relative to dimension j, and a_ij = 1/a_ji.

2) Calculate the eigenvector W of matrix A as the weights, and then normalize W:

3) Verifying whether the matrix A is consistent:

The consistency ratio CR is calculated; when CR ≤ 0.1, the consistency of matrix A is good.
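The three steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the formulas as filed: the principal eigenvector is approximated by power iteration, the consistency index is taken as CI = (λ_max − n)/(n − 1) and the ratio as CR = CI/RI, which is the usual analytic hierarchy process convention, and the RI values are the commonly cited random-index table, which does not appear in this text. The function name `ahp_weights` and the sample matrix are hypothetical.

```python
# Commonly cited random consistency index values for matrix orders 1..9
# (an assumption: the table is not given in the source text).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(A, iterations=100):
    """Return (normalized weights W, consistency ratio CR) for a pairwise matrix A."""
    n = len(A)
    w = [1.0 / n] * n
    # Power iteration: repeatedly apply A and renormalize so the weights sum to 1.
    for _ in range(iterations):
        v = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate the principal eigenvalue from A*w compared with w.
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(Aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr


# Hypothetical expert scores for three dimensions (a_ij = 1 / a_ji).
A = [[1.0, 2.0, 4.0],
     [0.5, 1.0, 2.0],
     [0.25, 0.5, 1.0]]
weights, cr = ahp_weights(A)
is_consistent = cr <= 0.1
```

If CR exceeded 0.1, the experts' pairwise scores would normally be revisited before the weights are used.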

c) Investment value

The investment value mainly analyzes the attention a program receives and its effect in society. The investment value is evaluated as follows:

H_p3 = γ₁ * heat + γ₂ * social public praise + γ₃ * propaganda strength

where γ₁, γ₂ and γ₃ are the weight coefficients of the respective items. Their values are determined by the correlation between each sub-item of the formula and the investment value, and are empirical values chosen by a person skilled in the art.

The heat covers two dimensions, the program's main creative team and its IP. The main creative team is further divided into four aspects: director, actors, drama and theme. The heat is calculated as follows:

heat = δ₁ * director heat + δ₂ * actor heat + δ₃ * drama heat + δ₄ * theme heat + δ₅ * IP heat

where δ_i is the weighting coefficient of each item, determined by the correlation between each sub-item and the heat; in this embodiment, every coefficient is set to 0.2. The heat value of each sub-item is obtained from the number of browser searches for the related terms: the more searches, the greater the users' attention to the sub-item and the higher its heat value.

The social public praise represents the program's scores in open scoring channels, including network platform ratings, mobile phone voting, offline questionnaires and the like. It reflects the audience's satisfaction with the program and is the most direct form of feedback.

The propaganda strength mainly comprises two aspects: the number of broadcast channels and the broadcast period score. The broadcast period score is the average of the scores of the time periods in which the program is broadcast on each channel; the larger the total audience and the traffic in a time period, the higher that period's score. The propaganda strength is calculated as follows:

propaganda strength = ω₁ * broadcast channel + ω₂ * broadcast period

where ω₁ and ω₂ are the weighting coefficients of the broadcast channel and the broadcast period, respectively, determined by the correlation between the corresponding sub-item and the propaganda strength; in this embodiment, ω₁ and ω₂ are both 0.5.
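The investment value is a chain of weighted sums, which can be sketched as below. The coefficients 0.2 and 0.5 come from this embodiment; the γ values and all input scores in the example are hypothetical, since the text leaves γ to the practitioner's experience and does not fix the scale of the sub-item scores. The function names are illustrative only.

```python
def weighted_sum(values, weights):
    """Sum of weight * value over paired items."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * v for v, w in zip(values, weights))


def heat(director, actor, drama, theme, ip, delta=(0.2, 0.2, 0.2, 0.2, 0.2)):
    """Heat from the five sub-item heats; delta defaults to the embodiment's 0.2."""
    return weighted_sum((director, actor, drama, theme, ip), delta)


def propaganda_strength(channel_score, period_score, omega=(0.5, 0.5)):
    """Propaganda strength; omega defaults to the embodiment's 0.5 and 0.5."""
    return weighted_sum((channel_score, period_score), omega)


def investment_value(heat_score, praise_score, propaganda_score, gamma):
    """H_p3; gamma must be supplied because the source gives no fixed values."""
    return weighted_sum((heat_score, praise_score, propaganda_score), gamma)


# Hypothetical scores on a 0-100 scale and hypothetical gamma values.
h = heat(80, 60, 70, 90, 100)
p = propaganda_strength(60, 80)
h_p3 = investment_value(h, 70, p, gamma=(0.4, 0.3, 0.3))
```

Keeping γ as a required argument makes it explicit that these weights are a modelling choice and not a constant of the method.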

d) Influence value

The influence value is determined by the program volume, that is, the number of programs that share a given factor with the program. In this embodiment, the program volume includes the number of programs by the same director N₁, the number of programs with the same actor N₂, the number of programs from the same year N₃, …. The influence value of program C is calculated as follows:

H_p4 = Σ θ_t * N_t

where N_t is the number of programs obtained when evaluating dimension t, that is, the programs sharing factor t with program C; θ_t is the weight coefficient of N_t, divided evenly according to the number of dimensions, with Σ θ_t = 1.
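With evenly divided weights, the influence value reduces to the mean of the per-dimension program counts, as this short sketch shows. The function name and the dimension names and counts are hypothetical sample data.

```python
def influence_value(program_counts):
    """H_p4 with theta_t divided evenly over the dimensions, so the thetas sum to 1."""
    if not program_counts:
        return 0.0
    theta = 1.0 / len(program_counts)
    return sum(theta * n for n in program_counts.values())


# Hypothetical counts of programs sharing a factor with program C.
counts = {"same_director": 12, "same_actor": 30, "same_year": 48}
h_p4 = influence_value(counts)
```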

e) Cost value

The cost value is used to analyze the value of the agreement to which the program belongs. The set of agreement attributes of the content in an agreement is denoted L = {l₁, l₂, l₃, …, l_n}. Depending on whether the purchase has been completed, the agreement content is divided into a purchased agreement part Q' and a to-be-purchased agreement part Q, and the agreement attributes L are distributed across the purchased agreement and the to-be-purchased agreement. First, the agreement value component S' is calculated from the contribution obtained under the purchased agreement Q'. Then the similarity of agreement attributes between the purchased agreement and the to-be-purchased agreement is calculated by the TF-IDF statistical method. The cost value of agreement Q is:

H_p5 = S'_t * t%

where t% is the maximum similarity between the agreement attributes in the purchased agreement and the agreement attributes in the to-be-purchased agreement, obtained by the TF-IDF statistical method; S'_t is the agreement value of the agreement attribute in the purchased agreement part Q' that corresponds to the maximum similarity.
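One possible reading of this step is sketched below. The text only says the similarity is obtained "according to the TF-IDF statistical method", so the specific choices here are assumptions: attributes are treated as token lists, TF-IDF uses a smoothed inverse document frequency, and similarity is the cosine between TF-IDF vectors. All names and the sample data are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Smoothed TF-IDF vectors (as dicts) for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cost_value(purchased, target_tokens):
    """Return (H_p5, max similarity) for a to-be-purchased attribute.

    purchased: list of (attribute tokens, agreement value S') pairs from Q'.
    """
    docs = [tokens for tokens, _ in purchased] + [target_tokens]
    vectors = tfidf_vectors(docs)
    target = vectors[-1]
    sims = [cosine(vec, target) for vec in vectors[:-1]]
    best = max(range(len(sims)), key=sims.__getitem__)
    return purchased[best][1] * sims[best], sims[best]


# Hypothetical purchased attributes with their values, and one target attribute.
purchased = [(["drama", "exclusive", "hd"], 100.0), (["sports", "live"], 50.0)]
h_p5, t = cost_value(purchased, ["drama", "exclusive", "4k"])
```

The value of the most similar purchased attribute is scaled by that similarity, which mirrors H_p5 = S'_t * t% with t% expressed as a fraction.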

In step S5, the program value evaluation model includes the comprehensive value of the program and five macroscopic value indexes, namely the revenue-generating value, the viewing value, the investment value, the influence value and the cost value, together with the actual data of the basic indicators underlying each macroscopic value index. When the comprehensive value of the program is calculated, the weight of the program value on each media platform is determined, giving a total evaluation value V that reflects the program value. The total program value can be expressed by the following formula:

V = Σ weight coefficient * V_P

The program value weight coefficient of each media platform is obtained from factors such as the total number of users of the platform and the activity of those users. In other embodiments, it can also be determined by data distribution, correlation with the dependent variable, the analytic hierarchy process, empirical values, fixed values or other means.
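The aggregation across platforms can be sketched as a normalized weighted sum. Normalizing the raw platform weights so that they sum to 1 is an assumption, as the text does not say how the weight coefficients are scaled; the platform names, values and raw weights are hypothetical.

```python
def total_value(platform_values, raw_weights):
    """V as the weighted sum of per-platform program values V_P.

    Assumption: raw weights (e.g. derived from user counts and activity)
    are normalized to sum to 1 before weighting.
    """
    total_weight = sum(raw_weights[p] for p in platform_values)
    if total_weight == 0:
        raise ValueError("platform weights must not all be zero")
    return sum(raw_weights[p] / total_weight * v
               for p, v in platform_values.items())


# Hypothetical per-platform program values and raw weights.
values = {"interactive_tv": 80.0, "internet_tv": 60.0}
weights = {"interactive_tv": 3.0, "internet_tv": 1.0}
v_total = total_value(values, weights)
```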

It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.

Claims

1. The data element value evaluation method based on the multi-mode media asset is characterized by comprising the following steps of:

S2, integrating the program basic data of a plurality of media platforms, comprising: establishing qualitative label sets containing all basic data of the current program, supplementing basic data with program pictures and video extraction information, matching the same programs of multiple platforms, and merging the supplementary basic data;

S3, calculating a macroscopic value index according to the program basic data.

2. The method for evaluating the value of a data element based on multi-modal media as claimed in claim 1, wherein in step S2, the multi-platform identical program matching includes the steps of:

S2.2.3, introducing a similarity comparison value s_t and the inequality s ≥ s_t; if the inequality holds for any sub-program and the program, judging that the sub-program and the program are matched as the same program; performing this judgment for every program against its corresponding sub-programs; and marking the programs and sub-programs matched as the same program with a corresponding same-class mark.

3. The method for evaluating the value of a data element based on multi-modal media as claimed in claim 2, wherein introducing the similarity comparison value s_t in step S2.2.3 further comprises: randomly selecting 50 programs; for each of s_t = 0.9, s_t = 0.8, s_t = 0.7 and s_t = 0.6, applying the inequality judgment s ≥ s_t to the 50 programs to match the same sub-programs of the programs; judging whether the matched sub-programs are accurate, and obtaining the matching accuracy under each of the four conditions; and selecting the s_t with the highest matching accuracy as the s_t introduced in step S2.2.3, wherein the matching accuracy is the ratio of the number of programs, among the 50, judged after manual verification to be accurately matched to the total number of 50.

4. A method for evaluating the value of data elements based on multi-modal media as claimed in claim 2 or 3, wherein the merging of the supplementary basic data in step S2 is performed after step S2.2.3 and comprises:

processing all programs in the program set P, specifically: performing a data union on the qualitative label sets of the two or more programs marked with the same class mark to generate a latest qualitative label set, and replacing the old qualitative label set of each of the two or more programs with the latest qualitative label set.

5. The method for evaluating the value of a data element based on multi-modal media as recited in claim 2, further comprising the step of performing data cleansing on the program set P after the step S2.2.1 and before the step S2.2.2, wherein cleansing includes variable normalization, missing value filling, redundant variable deletion, and outlier deletion.

6. The method for evaluating the value of a data element based on multi-modal media as recited in claim 5, wherein after the step of cleaning the data of the program set P, before the step of S2.2.2, further comprising:

and extracting keywords from the long text labels of the programs by using a TF-IDF algorithm to form program feature words, and supplementing the program feature words into a qualitative label set to form a new qualitative label set of the programs.

7. The method for evaluating the value of a data element based on multi-modal media as claimed in claim 1, wherein the step S2 of supplementing basic data with the program picture and video extraction information comprises the steps of:

8. The method for evaluating the value of a data element based on multi-modal media as recited in claim 7 wherein in step S2.1.4, the keyword extraction method employs RNNs combined with the Attention mechanism.

9. The method for evaluating the value of a data element based on multi-modal media as claimed in claim 1, wherein the media platform in step S1 includes a broadcast tv platform, an interactive tv platform, an internet tv platform, a mobile tv platform and an internet platform.