CN112291616B - Video advertisement identification method, device, storage medium and equipment - Google Patents

Video advertisement identification method, device, storage medium and equipment

Info

Publication number
CN112291616B
Authority
CN
China
Prior art keywords
advertisement
column
matrix
arithmetic progression
video
Prior art date
Legal status
Active
Application number
CN202010902794.6A
Other languages
Chinese (zh)
Other versions
CN112291616A (en)
Inventor
朱永亮
尹海沧
马文闯
刘利
刘殿龙
曹明阔
熊浩
Current Assignee
Potevio Peacetech Co ltd
Original Assignee
Potevio Peacetech Co ltd
Priority date
Filing date
Publication date
Application filed by Potevio Peacetech Co ltd filed Critical Potevio Peacetech Co ltd
Priority to CN202010902794.6A priority Critical patent/CN112291616B/en
Publication of CN112291616A publication Critical patent/CN112291616A/en
Application granted granted Critical
Publication of CN112291616B publication Critical patent/CN112291616B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/812 Monomedia components thereof involving advertisement data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8547 Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

This scheme discloses a video advertisement identification method, apparatus, storage medium and device. The steps of the method comprise: performing feature extraction on an advertisement template image subset constructed from an advertisement video template file and an image subset to be detected constructed from a video file to be detected, to obtain a first fingerprint feature vector matrix A and a second fingerprint feature vector matrix B; performing cosine similarity calculation between the transpose Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C; forming, from the elements of the comparison matrix C that are larger than a first threshold, a data group whose column numbers form an arithmetic progression; and, when the arithmetic progression of column numbers in the data group is unbroken, determining the video advertisement in the video file to be detected based on that data group. The scheme reduces the fluctuation interference of engineering application scenarios, makes the search for the arithmetic progression more flexible, achieves a higher and more stable recognition rate, and locates the advertisement playing time more accurately.

Description

Video advertisement identification method, device, storage medium and equipment
Technical Field
The invention relates to the technical field of video processing, and more particularly to a video advertisement identification method, apparatus, storage medium and device.
Background
With the development of television video, advertisements are continually inserted into the programs broadcast every day. Traditional television stations intersperse many advertisements throughout the programs they broadcast; this playback seriously affects the viewing experience, and users waste time on advertisements they are not interested in.
In the traditional video advertisement identification method, the audio/video MP4 file recorded over a 24-hour day is decoded frame by frame, and one frame is taken at a fixed interval of M frames to form the image subset to be detected. A manually cut and verified advertisement video file from the advertisement material library is likewise decoded frame by frame, and one frame is taken at a fixed interval of N frames to form the template image subset. The correspondence between the most similar images is then obtained using image quality evaluation criteria such as SAD, PSNR and SSIM. Ideally, as the image frame numbers of the image subset to be detected increase (the first data column in fig. 1), the image frame numbers of the template image subset (the fourth data column in fig. 1) form an arithmetic progression, and the correspondence between images (the third data column in fig. 1) is found by the principle of highest similarity between images. If the arithmetic progression can be extended continuously, the start playing time and end playing time of the video advertisement can be obtained accurately, in milliseconds: the start playing time is the play time of the frame to be detected in the first entry of the progression, and the end playing time is the play time of the frame to be detected in the last entry. On the basis of these accurately obtained times, it can be determined whether the advertisement exists in the video MP4 file to be detected, together with the start and end time of each playback of the advertisement (accurate to the millisecond).
However, the identification results obtained by this method are not accurate enough, many advertisements are missed, and the workload of manual review is not effectively reduced.
Disclosure of Invention
One object of the present solution is to provide a method, an apparatus, a storage medium, and a device for fast identification of a video advertisement segment that is repeatedly played.
Another object of the present solution is to provide a device and an apparatus for performing the above recognition method.
To achieve these objects, the present solution is as follows:
in a first aspect, the present disclosure provides a video advertisement recognition method, including:
performing feature extraction on an advertisement template image subset constructed according to an advertisement video template file and an image subset to be detected constructed according to a video file to be detected to obtain a first fingerprint feature vector matrix A and a second fingerprint feature vector matrix B;
performing cosine similarity calculation between the transpose Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C;
forming, from the elements of the comparison matrix C that are larger than a first threshold, a data group whose column numbers form an arithmetic progression;
and, when the arithmetic progression of column numbers in the data group is unbroken, determining the video advertisement in the video file to be detected based on that data group.
In a preferred embodiment, the step of constructing the subset of images to be detected includes:
taking M frames as an extraction interval, and acquiring a to-be-processed image subset with the resolution of n from an originally recorded to-be-detected video file;
and intercepting each image in the image subset to be processed to obtain an image of a central m area in each image, and forming the image subset to be detected.
In a preferred embodiment, the constructing step of the image subset to be detected includes:
taking 12 frames as an extraction interval, and acquiring a to-be-processed image subset with the resolution of 360x288 from an originally recorded to-be-detected video file with the resolution of 720x576;
and intercepting each image in the image subset to be processed, acquiring an image with the resolution of 224x224 in the middle area of each image, and forming the image subset to be detected.
In a preferred embodiment, the constructing of the subset of advertisement template images comprises:
and acquiring an advertisement template image subset from the advertisement video template file by taking the N frames as extraction intervals.
In a preferred example, the step of performing cosine similarity calculation between the transpose Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain the comparison matrix C comprises the following steps:
the matrix A is a matrix with a rows and 512 columns, and the matrix B is a matrix with b rows and 512 columns;
multiplying each element of the first column of the transpose Aᵀ of the first fingerprint feature vector matrix A by the corresponding element of the first row of the second fingerprint feature vector matrix B and summing the products to obtain a first parameter value p11;
taking the modulus of the first row of the second fingerprint feature vector matrix B and the modulus of the first column of the transpose Aᵀ, respectively, to obtain a second parameter value q11 and a third parameter value r11;
based on the first parameter value p11, the second parameter value q11 and the third parameter value r11, obtaining a cosine similarity value t11, where t11 = p11/q11/r11, and using the value of t11 as the value of the element in the first row, first column of the comparison matrix C;
based on the above steps, the values of t12, t13, …, t1a and so on up to tba are calculated in the same way and correspondingly used as the values of the remaining elements of the comparison matrix C.
In a preferred example, the step of forming the data group whose column numbers form an arithmetic progression from the elements of the comparison matrix C larger than the first threshold includes:
starting from the element in the first row, first column of the comparison matrix C, comparing the value of each element with the first threshold, searching for an element whose value is greater than the first threshold, and taking its value as a first expected value;
if the first expected value is found at the element in the ith row and rth column, searching for the next expected value in the sth column of the jth row, where j is larger than i, s = r + (j-i) × K, and K = M/N, M being the frame interval at which the originally recorded video file to be detected is sampled and N the frame interval at which the advertisement video template file is sampled;
and forming, from all expected values and their column numbers in the comparison matrix C, a data group whose column numbers are an arithmetic progression with common difference K.
In a preferred embodiment, if the first expected value is found at the element in the ith row and rth column but the next expected value is not found in the sth column of the jth row:
searching a next expected value in a preset range before and after the s column of the j row based on a local minimum searching algorithm;
if the next expected value is found in the preset range, continuously searching for the next expected value by taking the row where the expected value is located as a reference;
and if the next expected value is not found, jumping to the next row to continue searching until the next expected value is found.
In a preferred example, the predetermined range is within the region of s ± 20% K.
In a preferred embodiment, the method further comprises: when the arithmetic progression of column numbers in the data group is broken, processing the data fragments at the break, continuing the data group whose column numbers form an arithmetic progression, and determining the video advertisement in the video file to be detected based on the continued data group.
In a preferred example, when the arithmetic progression of column numbers in the data group is broken, the step of processing the data fragments at the break includes:
if the number of data rows whose column numbers form an arithmetic progression is smaller than a third threshold and those rows are separated by more than K rows from the next rows whose column numbers form an arithmetic progression, judging the rows smaller than the third threshold to be a fragment segment and discarding the fragment segment; and/or,
judging the relation between the duration of the gap separating two adjacent segments and the total duration of the advertisement template: if d ≤ e × T, where d is the duration of the gap between the two adjacent segments, T is the total duration of the advertisement template and e is a preset percentage, merging the two data groups whose column numbers form arithmetic progressions into one data group.
In a preferred embodiment, the step of determining the video advertisement in the video file to be detected based on the data group whose column numbers form an arithmetic progression comprises:
determining the image frames in the video file to be detected that correspond to the first expected value and the last expected value in the data group;
searching for the display timestamps corresponding to those image frames, based on the position sequence numbers in the video file to be detected of the image frame corresponding to the first expected value and of the image frame corresponding to the last expected value;
and determining, from the timestamps, the start and end times in milliseconds of the images in the video file to be detected that correspond to the data group, i.e. the total duration of the matched advertisement.
In a preferred example, the step of determining the video advertisement in the video file to be detected based on the data group further includes:
comparing the total duration of the matched advertisement with a second threshold value;
if it is greater than the second threshold value, confirming that the advertisement recognition result is valid; if it is smaller than the second threshold value, determining that the advertisement recognition result is invalid.
In a second aspect, the present disclosure provides a video advertisement recognition apparatus, including:
the extraction unit is used for extracting the characteristics of the advertisement template image subset constructed according to the advertisement video template file and the image subset to be detected constructed according to the video file to be detected to obtain a first fingerprint characteristic vector matrix A and a second fingerprint characteristic vector matrix B;
a computing unit for performing cosine similarity calculation between the transpose Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C;
a data group building unit for forming, from the elements of the comparison matrix C that are larger than a first threshold, a data group whose column numbers form an arithmetic progression;
and an identification unit for determining the video advertisement in the video file to be detected based on the data group when the arithmetic progression of column numbers in the data group is unbroken.
In a third aspect, the present solution provides a computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements the steps of the above video advertisement identification method.
In a fourth aspect, the present solution provides an apparatus comprising: a memory and one or more processors; the memory is connected with the processor through a communication bus; the processor is configured to execute instructions in the memory; and the memory stores instructions for executing the steps of the video advertisement identification method.
The scheme has the following beneficial effects:
the advertisement identification method based on the image fingerprints reduces fluctuation interference of engineering application scenes, enables searching of an arithmetic progression to be more flexible, is higher and stable in identification rate, and is more accurate in positioning of advertisement playing time.
Drawings
In order to illustrate the implementation of the solution more clearly, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the solution, and that other drawings may be derived from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram showing an example of a data group whose column numbers form an arithmetic progression according to the present embodiment;
FIG. 2 is a schematic processing flow diagram illustrating a video advertisement recognition method according to the present embodiment;
FIG. 3 is a diagram illustrating a comparison of fingerprint frames;
FIG. 4 is a schematic diagram showing an example of the comparison matrix according to the present embodiment;
fig. 5 shows a schematic diagram of a video advertisement recognition device according to the present scheme;
FIG. 6 shows a schematic diagram of an apparatus according to the present solution;
fig. 7 shows a schematic diagram of the recognition accuracy of the advertisement recognition by using the method of the present invention.
Detailed Description
Embodiments of the present solution will be described in further detail below with reference to the accompanying drawings. Clearly, the described embodiments are only some of the embodiments of the present solution, not all of them. It should be noted that the embodiments and the features of the embodiments may be combined with each other provided there is no conflict.
As described above, in the traditional video advertisement identification method, the audio/video MP4 file recorded over a 24-hour day is decoded frame by frame, and one frame is taken at a fixed interval of M frames to form the image subset to be detected; a manually cut and verified advertisement video file from the advertisement material library is decoded frame by frame, and one frame is taken at a fixed interval of N frames to form the template image subset. The correspondence between the most similar images is then obtained using image quality evaluation criteria such as SAD, PSNR and SSIM. Ideally, as the frame numbers of the image subset to be detected increase (the first data column in fig. 1), the frame numbers of the template image subset (the fourth data column in fig. 1) form an arithmetic progression, and the correspondence between images (the third data column in fig. 1) is found by the principle of highest similarity. If the arithmetic progression can be extended continuously, the start and end playing times of the video advertisement can be obtained accurately, in milliseconds, as the play times of the frames to be detected in the first and last entries of the progression, and on that basis it can be determined whether the advertisement exists in the video MP4 file to be detected, together with the start and end time of each playback (accurate to the millisecond).
Through research and analysis of the existing television advertisement video identification method, the following interference factors exist in practical application:
(1) The input signal recorded by the capture card is the CVBS analog audio/video signal output by the set-top box; analog-to-digital conversion, video encoding and similar operations slightly change the gray values of the image pixels, so the image fingerprint feature vectors change slightly;
(2) The manually cut advertisement template file requires the start/end positions of the advertisement segment to be accurate to the image frame, so the originally recorded material file has to be decoded, positioned to the specific frame and re-encoded. When fingerprints are extracted from the advertisement template, the encoding types of the frames (I frame, P frame) differ from those of the originally recorded material, so the gray values of the image pixels change slightly and the image fingerprint feature vectors change slightly;
(3) When the frame interval M is not equal to N, the image content of the image subset to be detected and of the template image subset is not consistent, so the image fingerprint feature vectors differ noticeably;
(4) Because of the station's broadcasting strategy, after an advertisement template segment has been broadcast for some days, a few seconds of its picture content may be changed locally to refresh the advertisement and attract consumers' attention. The advertisement picture played later therefore changes greatly in those local image frames, and the image fingerprint feature vectors change greatly;
(5) Because the originally recorded material MP4 files have different resolutions, the difference between the fingerprint vectors of two different advertisements sharing the same background is not prominent enough;
(6) The start and end times of the advertisement must be accurate to the image frame, i.e. to the millisecond. The traditional method of computing time directly from the image frame number (for example the frame count obtained with OpenCV decoding) and the video frame rate has a large error; the accumulated error over a one-day recorded MP4 file reaches tens of seconds, which cannot meet the requirement.
Because of interference factors (1) to (6), large errors and fluctuations exist, the arithmetic progression is often forced to break, and a large number of fragments appear; the resulting advertisement identification is not accurate enough, many advertisements are missed, and the workload of manual review cannot be effectively reduced.
Therefore, the present scheme aims to provide a video advertisement identification method in which the fingerprint feature vector matrix of an image is extracted with a VGG19 deep learning network model (VGG: Visual Geometry Group); cosine similarity is then used as the basis for comparing the fingerprint feature vector matrices to obtain a comparison matrix C, which simplifies the computation; the elements of the comparison matrix C larger than a preset threshold form a data group whose column numbers are an arithmetic progression, and the video advertisement in the video file to be detected is finally determined from that data group.
Hereinafter, a video advertisement recognition method proposed by the present solution is described in detail with reference to fig. 2 to 4. The method may comprise the steps of:
s1, constructing an image subset to be detected and an advertisement template image subset;
s2, performing feature extraction on a template image subset constructed according to the advertisement video template file and a to-be-detected image subset constructed according to the to-be-detected video file to obtain a first fingerprint feature vector matrix A and a second fingerprint feature vector matrix B;
S3, performing cosine similarity calculation between the transpose Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C;
S4, forming, from the elements of the comparison matrix C that are larger than a first threshold, a data group whose column numbers form an arithmetic progression;
S5, when the arithmetic progression of column numbers in the data group is unbroken, determining the video advertisement in the video file to be detected based on that data group;
S6, when the arithmetic progression of column numbers in the data group is broken, processing the data fragments at the break, continuing the data group whose column numbers form an arithmetic progression, and determining the video advertisement in the video file to be detected based on the continued data group.
When identifying video advertisements, the scheme first needs to prepare an image subset to be detected and an advertisement template image subset; these two subsets are constructed in step S1.
The image subset to be detected may be constructed as follows: with M frames as the extraction interval, an image subset to be processed with resolution n is obtained from the originally recorded video file to be detected; each image is then cropped to the image of its central m region, forming the image subset to be detected. In one embodiment, with 12 frames as the extraction interval, the video file to be processed, originally recorded at 720x576, is down-sampled by cubic convolution to obtain an image subset to be processed at 360x288; each image in that subset is then cropped to its central 224x224 region, forming the image subset to be detected. Using the subset formed in this way as input data for fingerprint feature extraction makes fuller use of the original image information: the utilization of image texture content rises from 12% (i.e. 224x224 / (720x576)) to 48% (i.e. 224x224 / (360x288)), which effectively distinguishes different advertisement templates that share a largely identical picture background and makes the differences between image fingerprint feature vectors more pronounced.
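By way of illustration, a minimal Python sketch of this sampling and cropping step, assuming OpenCV is used for decoding, cubic interpolation and cropping (the scheme does not mandate a particular library, and the function name build_subset_to_detect is only illustrative):

    import cv2

    def build_subset_to_detect(video_path, frame_interval=12):
        """Sample every 12th frame, downscale 720x576 -> 360x288 (cubic), crop the central 224x224."""
        frames, idx = [], 0
        cap = cv2.VideoCapture(video_path)
        while True:
            ok, img = cap.read()
            if not ok:
                break
            if idx % frame_interval == 0:
                small = cv2.resize(img, (360, 288), interpolation=cv2.INTER_CUBIC)
                h, w = small.shape[:2]                         # 288, 360
                y0, x0 = (h - 224) // 2, (w - 224) // 2
                frames.append(small[y0:y0 + 224, x0:x0 + 224])
            idx += 1
        cap.release()
        return frames

The 224x224 crop also matches the input size expected by the VGG19 model used in step S2.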
The advertisement template image subset may be constructed in the following manner: and acquiring an advertisement template image subset from the advertisement video template file by taking the N frames as extraction intervals. In one embodiment, the subset of advertisement template images is obtained from an advertisement video template file manually cut from an advertisement material library at 2-frame extraction intervals.
In step S2, feature extraction may be performed on the advertisement template image subset and the image subset to be detected using a VGG19 deep learning network model to obtain the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B. Matrix A has a rows and X columns and matrix B has b rows and X columns, where X is the fingerprint feature dimension. In one embodiment, matrix A is a matrix with a rows and 512 columns and matrix B is a matrix with b rows and 512 columns.
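A sketch of one possible way to obtain 512-dimensional fingerprint vectors with a VGG19 backbone, assuming a recent torchvision build and global average pooling of the last convolutional feature map (which has 512 channels); the patent text does not state which VGG19 layer supplies the fingerprint, so this pooling choice is an assumption:

    import torch
    import torchvision.models as models
    import torchvision.transforms as T

    # Assumption: the fingerprint of a frame is the globally averaged output of the
    # last convolutional block of VGG19, which has 512 channels.
    vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
    preprocess = T.Compose([T.ToTensor(),
                            T.Normalize(mean=[0.485, 0.456, 0.406],   # standard ImageNet stats
                                        std=[0.229, 0.224, 0.225])])

    @torch.no_grad()
    def fingerprint_matrix(frames_224):
        """frames_224: list of 224x224 BGR uint8 images. Returns an (n, 512) NumPy matrix."""
        batch = torch.stack([preprocess(f[:, :, ::-1].copy()) for f in frames_224])  # BGR -> RGB
        feat = vgg(batch)                      # (n, 512, 7, 7)
        return feat.mean(dim=(2, 3)).numpy()   # global average pooling -> (n, 512)

Applied to the advertisement template image subset and to the image subset to be detected, this would yield the a×512 matrix A and the b×512 matrix B respectively.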
In the scheme, a one-to-many mapping is used: the fingerprint feature vector of each row of the image subset to be detected is compared, by cosine similarity, against all fingerprint feature vectors of the advertisement template image subset, so that each image to be detected is matched within the advertisement template images and the position sequence number of the matched frame fingerprint is found. The position sequence number is the sequence number of each image within its subset, i.e. its sequential index. Since the original material image sequence is sampled at equal intervals of M frames, the images in the resulting subset carry sequential numbers, e.g. 0, 1, 2, 3. By analysing the originally recorded file with the general-purpose tool FFprobe, a correspondence table can be obtained, and for the image frames numbered 0, 1, 2, 3 the times Time0, Time1, Time2, Time3, accurate to the millisecond, can be obtained. Since the user requires time positioning accurate to the millisecond, this approach reaches millisecond accuracy more reliably.
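A sketch of how such a frame-index-to-millisecond table could be built with FFprobe; the exact frame entry name (pts_time here, pkt_pts_time in older FFmpeg builds) depends on the FFprobe version, and the helper name frame_pts_ms is only illustrative:

    import subprocess

    def frame_pts_ms(video_path):
        """Return a list mapping decoded-frame index -> presentation time in milliseconds."""
        out = subprocess.run(
            ["ffprobe", "-v", "error", "-select_streams", "v:0",
             "-show_entries", "frame=pts_time", "-of", "csv=p=0", video_path],
            capture_output=True, text=True, check=True).stdout
        return [int(float(t) * 1000) for t in out.splitlines() if t.strip() and t.strip() != "N/A"]

    # A frame of the subset to be detected with index k was sampled from decoded frame k*M,
    # so with M = 12 its play time is frame_pts_ms(path)[k * 12].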
In step S3, cosine similarity calculation is performed between the transpose Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain the comparison matrix C. Specifically, each element of the first column of the transpose Aᵀ is first multiplied by the corresponding element of the first row of the second fingerprint feature vector matrix B and the products are summed to obtain a first parameter value p11; the moduli of the first row of the second fingerprint feature vector matrix B and of the first column of Aᵀ are computed to obtain a second parameter value q11 and a third parameter value r11; from p11, q11 and r11, the cosine similarity value t11 = p11/q11/r11 is computed and taken as the value of the element in the first row, first column of the comparison matrix C; t12, t13, …, t1a and so on up to tba are computed in the same way and correspond to the remaining elements of the comparison matrix C. In this way the comparison matrix C is built. The element in the ith row and jth column of C is the cosine similarity between the fingerprint feature vector of the ith frame in the image subset to be detected and that of the jth frame in the advertisement template image subset. The cosine similarity formula used in the scheme is cos θ = (a·b)/(|a|·|b|), where a is a column of matrix A and b is a row of matrix B.
Here t11, t12, t13, …, t1a are results of the cos θ calculation; the result of cos θ is a matrix, and the comparison matrix C expands as follows:
t11, t12, t13, …, t1a
t21, t22, t23, …, t2a
t31, t32, t33, …, t3a
…
tb1, tb2, tb3, …, tba
A row of the expanded comparison matrix C, written as a vector, is (t1, t2, t3, …, ta).
Thus cos θ = (a·b)/(|a|·|b|) is the computing formula, and (t1, t2, t3, …, ta) is a vector representation of cos θ.
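A compact NumPy sketch of this computation: normalizing every fingerprint row to unit modulus and then taking a matrix product is equivalent to computing tij = pij/qij/rij element by element:

    import numpy as np

    def comparison_matrix(A, B):
        """A: (a, 512) template fingerprints, B: (b, 512) fingerprints to detect.
        Returns C of shape (b, a) with C[i, j] = cosine similarity of B[i] and A[j]."""
        A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)   # divides out the r factors
        B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)   # divides out the q factors
        return B_unit @ A_unit.T                                # dot products give the p factors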
In the scheme, the value of each element in the comparison matrix C is compared with a first threshold so as to build a data group whose column numbers form an arithmetic progression. In step S4, the elements of the comparison matrix C larger than the first threshold are assembled into such a data group. Specifically, starting from the element in the first row, first column of C, the value of each element is compared with the first threshold, an element exceeding the threshold is searched for, and its value is taken as the first expected value. If the first expected value is found at the element in row i, column r, the next expected value is looked for in column s of row j, where j > i, s = r + (j-i) × K, and K = M/N, M being the frame interval at which the originally recorded video file to be detected is sampled and N the frame interval at which the advertisement video template file is sampled. All expected values and their column numbers in C then form a data group whose column numbers are an arithmetic progression with common difference K.
Ideally, through the above steps, starting from some row of the comparison matrix C, a value greater than the first threshold can be found in a specific column of every subsequent row, and all such values form a data group whose column numbers are an arithmetic progression with common difference K = M/N, where M is the frame interval for sampling the originally recorded video file to be detected and N is the frame interval for sampling the advertisement video template file. In practice, however, data fluctuation can interrupt this progression, so additional processing is required to maintain its continuity.
Specifically, in step S4 of the present embodiment, if the first expected value is found at the element in row i, column r, but the next expected value is not found in column s of row j, the next expected value is searched for within a predetermined range before and after column s of row j using a local-minimum search algorithm. If the next expected value is found within that range, the search continues with the row containing that expected value as the new reference; if it is not found, the search jumps to the next row and continues until the next expected value is found. In a preferred embodiment, the predetermined range may be the region s ± 20% K.
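A simplified sketch of this progression search under the stated rules (threshold comparison, expected column s = r + (j-i)·K, local search within s ± 20% K, re-basing on each found value); a production implementation would likely track several candidate progressions at once, so this is an illustration rather than the scheme's exact algorithm:

    def find_progression(C, threshold, K):
        """Collect (row, column) positions of C whose value exceeds `threshold` and whose
        column numbers follow an arithmetic progression with common difference K."""
        b, a = C.shape
        start = next(((i, r) for i in range(b) for r in range(a) if C[i, r] > threshold), None)
        if start is None:
            return []
        group = [start]
        i, r = start
        for j in range(i + 1, b):
            s = r + (j - i) * K                              # expected column in row j
            lo, hi = int(s - 0.2 * K), int(s + 0.2 * K) + 1  # local search window s +/- 20% K
            cols = [c for c in range(max(lo, 0), min(hi, a)) if C[j, c] > threshold]
            if cols:
                c = min(cols, key=lambda col: abs(col - s))  # candidate closest to the expected column
                group.append((j, c))
                i, r = j, c                                  # re-base the progression on the latest match
        return group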
In the scheme, to guarantee the accuracy of advertisement identification, the video advertisement should be identified only when the data group whose column numbers form an arithmetic progression is unbroken. If the constructed data group is unbroken, step S5 can be executed directly, i.e. the video advertisement in the video file to be detected is determined from that data group. If the arithmetic progression of column numbers is broken, step S6 must also be executed: the data fragments at the break are processed, the data group whose column numbers form an arithmetic progression is continued, and the video advertisement in the video file to be detected is determined from the continued data group.
Specifically, whether a data segment is a fragment may be decided from its length: if the number of data rows whose column numbers form an arithmetic progression is smaller than a third threshold and those rows are separated by more than K rows from the next rows whose column numbers form an arithmetic progression, they are judged to be a fragment segment and discarded. In addition, the relation between the duration of the gap separating two adjacent segments and the total duration of the advertisement template is judged: if d ≤ e × T, where d is the duration of the gap between the two adjacent segments, T is the total duration of the advertisement template and e is a preset percentage, the two data groups whose column numbers form arithmetic progressions are merged into a single data group.
In one embodiment, suppose the rows whose column numbers form an arithmetic progression last only three rows, for example rows 99840-99842, and such rows do not reappear until much later, e.g. the next run of arithmetic-progression data begins around row 130000. The three rows 99840-99842 are then three "isolated", very short rows of arithmetic-progression data; they are assumed to be a shot in a television program that happens to resemble some advertisement rather than the target being searched for, so these isolated segments are treated as "fragments" and discarded.
In another embodiment, consider a provincial television station whose advertisement templates are extremely long, e.g. more than 40 minutes of advertisements for health products, medicines, medical instruments and the like. During playback the station may insert a 5-second time-check short video on the hour; or, to evade AI-based advertisement identification, the advertiser may temporarily insert a 10-second public service advertisement after every 20 minutes of the advertisement and then resume it. In both scenarios, the 5-second time check within the 40 minutes and the 10-second public service advertisement within the 20 minutes correspond to an intermittent, transient break in the long run of rows whose column numbers form an arithmetic progression. Such a break can be ignored and the advertisement regarded as still playing continuously; the two data groups whose column numbers form arithmetic progressions are therefore merged into one, so that the break has no effect. In a preferred embodiment, the advertisement is considered still playing continuously if the duration of the break does not exceed 10% of the duration of the advertisement itself.
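A sketch of these two repair rules, assuming each detected run is represented as a dict holding its matrix rows and its play-time span in milliseconds (this data layout and the helper name repair_runs are assumptions for illustration, not part of the scheme itself):

    def repair_runs(runs, K, template_ms, min_len=3, gap_ratio=0.10):
        """runs: time-ordered list of dicts like {"rows": [...], "start_ms": ..., "end_ms": ...}.
        Drops isolated fragments shorter than `min_len` rows and merges adjacent runs whose
        gap is at most gap_ratio * template_ms (the advertisement template duration)."""
        kept = []
        for idx, run in enumerate(runs):
            nxt = runs[idx + 1] if idx + 1 < len(runs) else None
            isolated = nxt is None or nxt["rows"][0] - run["rows"][-1] > K
            if len(run["rows"]) < min_len and isolated:
                continue                                           # rule 1: discard the fragment
            if kept and run["start_ms"] - kept[-1]["end_ms"] <= gap_ratio * template_ms:
                kept[-1]["rows"] = kept[-1]["rows"] + run["rows"]  # rule 2: bridge a short break
                kept[-1]["end_ms"] = run["end_ms"]
            else:
                kept.append(dict(run))
        return kept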
In the scheme, after it is determined that the data group whose column numbers form an arithmetic progression contains no break, the video advertisement identification step continues. Specifically, the image frames in the video file to be detected that correspond to the first expected value and the last expected value in the data group are determined; the display timestamps (PTS) of those image frames are looked up based on the position sequence numbers, in the video file to be detected, of the frame corresponding to the first expected value and of the frame corresponding to the last expected value; and the start and end times of the images in the video file to be detected that correspond to the data group, i.e. the total duration of the matched advertisement, are determined from the timestamps. These times can be accurate to the millisecond.
In the scheme, to further improve the accuracy of advertisement identification, the obtained total advertisement duration can be verified against a second threshold. Specifically, the total duration of the matched advertisement is compared with the second threshold: if it is greater than the second threshold, the advertisement recognition result is confirmed valid; if it is smaller than the second threshold, the recognition result is determined invalid.
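A sketch tying the progression back to play times and applying the second-threshold check, reusing the per-frame timestamp list described earlier; the helper name and the min_duration_ms parameter are illustrative assumptions:

    def locate_advertisement(group, pts_ms, M, min_duration_ms):
        """group: (row, column) pairs from the progression search; pts_ms: per-decoded-frame
        presentation times in milliseconds; M: sampling interval of the subset to be detected.
        Returns (start_ms, end_ms) or None when the match is shorter than the second threshold."""
        first_row, last_row = group[0][0], group[-1][0]
        start_ms = pts_ms[first_row * M]       # play time of the first matched frame
        end_ms = pts_ms[last_row * M]          # play time of the last matched frame
        if end_ms - start_ms < min_duration_ms:
            return None                        # recognition result judged invalid
        return start_ms, end_ms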
By the method, interference caused by data fluctuation can be effectively reduced, and therefore the identification rate and stability of the advertisements in the video are improved.
As shown in fig. 5, the present solution further provides a video advertisement recognition apparatus 101 implemented in cooperation with the video advertisement recognition method, the apparatus including: a first image set constructing unit 102, a second image set constructing unit 103, an extracting unit 104, a calculating unit 105, a data constructing unit 106, an identifying unit 107 and a compensating unit 108.
When the video advertisement recognition apparatus 101 operates, the first image set construction unit 102 constructs the image subset to be detected and the second image set construction unit 103 constructs the advertisement template image subset. The extraction unit 104 performs feature extraction on the advertisement template image subset and the image subset to be detected based on the VGG19 deep learning network model, obtaining the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B. The calculation unit 105 performs cosine similarity calculation between the transpose Aᵀ of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B, obtaining the comparison matrix C. The data group building unit 106 then forms, from the elements of the comparison matrix C larger than the first threshold, a data group whose column numbers form an arithmetic progression. If the data group is unbroken, the identification unit 107 determines the video advertisement in the video file to be detected based on the data group. If the data group is broken, the compensation unit 108 further processes the data fragments at the break and continues the data group whose column numbers form an arithmetic progression; after the compensation is completed, the identification unit 107 determines the video advertisement in the video file to be detected based on the continued data group.
On the basis of the above method embodiment, the present solution further provides a computer-readable storage medium. The computer-readable storage medium is a program product implementing the above video advertisement identification method; it may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a device such as a personal computer. However, the program product of the present invention is not limited in this respect, and in this document a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable storage medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user computing device, partly on a remote computing device, or entirely on the remote computing device or server. In situations involving remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
On the basis of the above embodiment of the video advertisement identification method, the scheme further provides an electronic device. The electronic device shown in fig. 6 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present invention.
As shown in fig. 6, the electronic device 201 is in the form of a general purpose computing device. The components of the electronic device 201 may include, but are not limited to: at least one memory unit 202, at least one processing unit 203, a display unit 206 and a bus 204 for connecting different system components.
The storage unit 202 stores program code which can be executed by the processing unit 203, so that the processing unit 203 executes the steps of the various exemplary embodiments described for the video advertisement identification method. For example, the processing unit 203 may perform the steps shown in fig. 2.
The memory unit 202 may include volatile memory units such as a random access memory unit (RAM) and/or a cache memory unit, and may further include a read only memory unit (ROM).
The storage unit 202 may also include programs/utilities including program modules, including but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The bus 204 may include a data bus, an address bus, and a control bus.
The electronic device 201 may also communicate with one or more external devices 207 (e.g., a keyboard, a pointing device, a bluetooth device, etc.), which may be through an input/output (I/O) interface 205. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 201, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The present solution is further illustrated by way of example below.
In this embodiment, the originally recorded MP4 file of the material to be detected is sampled with M = 12 to obtain the image subset to be detected, and the manually cropped advertisement MP4 file is sampled with N = 2 to obtain the template image subset; the video advertisement identification method of this example is described in detail with these values.
In this example, an original image at 720x576 resolution is resized with INTER_CUBIC interpolation to 360x288, and the central 224x224 region is then cropped as input data for the VGG19 deep learning network model, from which the fingerprint feature vector of the image is extracted. In this way the original image information is used more fully: the utilization of image texture content rises from 12% (i.e. 224x224 / (720x576)) to 48% (i.e. 224x224 / (360x288)), different advertisement templates sharing a largely identical picture background are effectively distinguished, and the differences between image fingerprint feature vectors become more pronounced.
One frame is taken from every 2 frames of the video advertisement template file, a frames in total, forming the advertisement template image subset. This subset is used as input data for the VGG19 deep learning network model, the fingerprint feature vectors of the images are extracted, and the first fingerprint feature vector matrix A is obtained: each frame yields a fingerprint feature vector, a data row of 512 values, so the result is an a×512 matrix called matrix A. One frame is taken from every M frames of the video to be detected, where M is an integer multiple of N, here M = 6N = 12; assuming b frames are taken from the video file in total, they form the image subset to be detected. This subset is used as input data for the VGG19 model in the same way, the fingerprint feature vectors are extracted, and the second fingerprint feature vector matrix B is obtained: each fingerprint is a data row of 512 values, so the result is a b×512 matrix called matrix B.
and matching the image to be detected in the advertisement template image based on the fingerprint characteristic vector, and searching the serial number of the matched frame fingerprint.
First, matrix A is transposed to obtain Aᵀ, a matrix of 512 rows and a columns; matrices B and Aᵀ then undergo the cosine similarity operation, giving a comparison matrix C of b rows and a columns. Specifically, each element of the first column of the transpose Aᵀ of the first fingerprint feature vector matrix A is multiplied by the corresponding element of the first row of the second fingerprint feature vector matrix B and the products are summed to obtain a first parameter value p11; the modulus of the first row of matrix B is computed to obtain a second parameter value q11, and the modulus of the first column of Aᵀ to obtain a third parameter value r11; from p11, q11 and r11, the cosine similarity value t11 = p11/q11/r11 is computed and taken as the value of the element in the first row, first column of the comparison matrix C; t12, t13, …, t1a and so on up to tba are computed in the same way and correspond to the remaining elements of the comparison matrix C. The cosine similarity formula is cos θ = (a·b)/(|a|·|b|).
In this example, the value of the element in the ith row and jth column of the comparison matrix C is the cosine similarity between the fingerprint feature vector of the ith frame of the video to be detected and that of the jth frame of the advertisement template video. Similarity values greater than a first threshold T are searched for from the first row, first column of the comparison matrix C to its bth row, ath column. In this example the first threshold T is set to 0.85.
In an ideal case free of data fluctuation, starting from the row of the comparison matrix C that holds the first element greater than the first threshold T, a value greater than T can be found in a specific column of every subsequent row, for example [100, 15], [101, 15 + (101-100)×K], [102, 15 + (102-100)×K], …, and all the column numbers together form an arithmetic progression with common difference K, where K = M/N. In the actual recognition process, however, data fluctuation interrupts this arithmetic progression of column numbers, so the progression has to be processed to restore the continuity it would have in the ideal case. The specific treatment is as follows:
if the first expected value is found in row i, column r, then the expected value should be found in column s of row j, where s = r + (j-i) K. If the expected position is not found, a search local minimum algorithm is adopted to expand the search range, namely, a desired value is searched within a certain range around s, and the search cost is minimum. In this example, one would look for within 20% (i.e., s ± 20% K), and if found, still consider the series of arithmetic numbers to continue; if not, jump to the next line to look for, and loop until the appropriate expected value is found.
Because of the interference of data jitter, the arithmetic progression of column numbers can break, and the broken parts then need to be processed. The specific method is as follows:
The processing consists of discarding fragments and repairing breaks:
1) Particularly short fragments, for example fragments shorter than 3 rows, are discarded. Specifically, suppose the column-number arithmetic progression continues for only three rows, say rows 99840 to 99842, and after those rows end no further row belonging to the progression appears for a long stretch, the next such row turning up only around row 130000. The three rows 99840–99842 are then an "isolated" run whose column-number progression is extremely short; such rows are presumed to be a shot in a television programme that happens to resemble some advertisement rather than the target being searched for, so these isolated pieces are treated as "fragments" and discarded.
2) Gaps between two adjacent segments are filled when the gap corresponds to a time length not exceeding a certain proportion, for example 10%, of the advertisement template duration (see the sketch after this list). In practice, for a provincial television station an advertisement template can be very long; an advertisement block for health products, medicines or medical instruments, for instance, may exceed 40 minutes. During playback the station may insert a 5-second time-signal clip on the hour. As another example, to evade AI-based advertisement recognition, an advertiser may temporarily insert a 10-second public service advertisement after every 20 minutes of the advertisement and then resume playing it. In both scenarios, the 5-second time signal inside the 40-minute block and the 10-second public service advertisement after 20 minutes show up as an intermittent, transient break in an otherwise long run of rows whose column numbers form an arithmetic progression. Such a break can be ignored and the advertisement regarded as still playing continuously; the two data groups with column numbers in arithmetic progression on either side of it are therefore merged into one data group, so that the break has no effect. In a preferred embodiment, the advertisement is considered to be still playing continuously whenever the time length of the break is no more than 10% of the advertisement's own duration.
3) Because different advertisement templates with similar content exist, segments of the video to be detected whose time periods overlap may match more than one template; in that case the match covering the longer time length is retained, in line with the specific requirements of the user.
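As referenced in item 2), the following is a rough sketch of the fragment-discarding and gap-filling steps of items 1) and 2), operating on matched segments expressed as (first_row, last_row) index pairs; the seconds-per-row conversion (one sampled row ≈ 0.48 s for M = 12 at 25 fps) and the 40-minute template duration are illustrative assumptions.

def clean_segments(segments, min_len=3, seconds_per_row=0.48,
                   template_seconds=2400.0, gap_ratio=0.10):
    """segments: list of (first_row, last_row) runs of the column-number progression."""
    # 1) discard isolated fragments shorter than min_len rows
    segs = [(s, e) for s, e in segments if (e - s + 1) >= min_len]
    # 2) merge adjacent segments whose gap is at most gap_ratio of the template
    #    duration (e.g. a 5 s time signal inside a 40 min advertisement block)
    merged = []
    for s, e in sorted(segs):
        if merged and (s - merged[-1][1]) * seconds_per_row <= gap_ratio * template_seconds:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged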
Finally, the video advertisement is identified from the resulting unbroken data group whose column numbers form an arithmetic progression. Specifically, the image frames corresponding to the first and last expected values of the column-number arithmetic progression are determined; the presentation time stamps (PTS) of these two image frames are then looked up from their sequence numbers, and from the time stamps the start and stop seconds of the matched images in the video file to be detected, i.e. the total number of seconds of the matched advertisement, are determined.
In this example, to improve the reliability of advertisement identification, the total number of seconds of the matched advertisement may further be compared with a second threshold: if it is greater than the second threshold, the advertisement identification result is confirmed as valid; if it is smaller than the second threshold, the result is deemed invalid. In this example the second threshold may be set on the basis of accumulated empirical values.
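A small sketch of this final step is given below, assuming a list pts_seconds that maps each sampled frame index of the video to be detected to its presentation time stamp in seconds; the function name and the 30-second default for the second threshold are assumed placeholders.

def locate_advertisement(chain, pts_seconds, second_threshold=30.0):
    """chain: (row, column) pairs of the unbroken column-number progression.
    Returns (start_s, end_s, duration_s) or None if the match is too short."""
    if not chain:
        return None
    first_row, last_row = chain[0][0], chain[-1][0]
    start_s, end_s = pts_seconds[first_row], pts_seconds[last_row]
    duration = end_s - start_s
    # compare the matched duration with the second threshold
    return (start_s, end_s, duration) if duration > second_threshold else None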
In this example, to support implementation of the video advertisement recognition method, an apparatus for carrying out the method is also provided. The apparatus comprises a memory and one or more processors, the memory being connected to the processor through a communication bus; the processor is configured to execute instructions in the memory, and the memory stores instructions for carrying out the steps of the method described above. The processor may be an 8-core, 16-thread CPU with a 2.1 GHz clock frequency and an 11 MB L3 cache, with 32 GB of RAM; the storage may be an 8 TB mechanical hard disk, and the apparatus may further be equipped with a discrete GTX 1080Ti graphics card. Experiments show that when the method is implemented on such equipment, the CPU occupancy is only about 15% and the graphics card occupancy only about 50%, with a maximum occupancy not exceeding 85%. The advertisement recognition results achieved are shown in Table 1.
TABLE 1 (reproduced as an image, Figure GDA0003792905250000191, in the original publication)
It should be understood that the above embodiments of the present invention are merely examples given to illustrate the invention clearly and are not intended to limit its embodiments. It will be obvious to those skilled in the art that other variations or modifications can be made on the basis of the above description; not all embodiments can be exhaustively listed here, and any obvious variation or modification derived therefrom falls within the scope of protection of the present invention.

Claims (10)

1. A method for identifying video advertisements, the method comprising the steps of:
performing feature extraction on an advertisement template image subset constructed according to an advertisement video template file and an image subset to be detected constructed according to a video file to be detected to obtain a first fingerprint feature vector matrix A and a second fingerprint feature vector matrix B;
performing cosine similarity calculation between a transpose matrix A^T of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C;
forming, from the element bits of the comparison matrix C that are larger than a first threshold value, a data group whose column numbers form an arithmetic progression;
and under the condition that the data group whose column numbers form an arithmetic progression has no break in the arithmetic progression, determining the video advertisement in the video file to be detected based on the data group whose column numbers form an arithmetic progression.
2. The method of claim 1, wherein the step of constructing the subset of images to be detected comprises:
taking 12 frames as the extraction interval, acquiring an image subset to be processed with a resolution of 360x288 from an originally recorded video file to be detected with a resolution of 720x576;
cropping each image in the image subset to be processed to obtain an image of resolution 224x224 from the central area of each image, the cropped images forming the image subset to be detected;
the construction step of the advertisement template image subset comprises the following steps:
and acquiring the advertisement template image subset from the advertisement video template file by taking N frames as the extraction interval.
3. The method of claim 1, wherein the step of performing cosine similarity calculation between the transpose matrix A^T of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C comprises:
the matrix A being a matrix of a rows and 512 columns, and the matrix B being a matrix of b rows and 512 columns;
multiplying each element bit in the first column of the transpose matrix A^T of the first fingerprint feature vector matrix A by the corresponding element bit in the first row of the second fingerprint feature vector matrix B and summing the products to obtain a first parameter value p11;
taking the modulus of the first row of the second fingerprint feature vector matrix B and of the first column of the transpose matrix A^T of the first fingerprint feature vector matrix A, respectively, to obtain a second parameter value q11 and a third parameter value r11;
calculating a cosine similarity value t11 based on the first parameter value p11, the second parameter value q11 and the third parameter value r11, t11 = p11 / q11 / r11, and using the value of t11 as the value of the element bit in the first row and first column of the comparison matrix C;
and based on the above steps, calculating t12, t13, …, t1a up to tba in the same way, corresponding to the value of each element bit of the comparison matrix C.
4. The method of claim 1, wherein the step of forming, from the element bits of the comparison matrix C that are larger than the first threshold value, the data group whose column numbers form an arithmetic progression comprises:
starting from the element bit in the first row and first column of the comparison matrix C, comparing the value of each element bit with the first threshold value, searching for an element bit whose value is greater than the first threshold value, and taking the value of that element bit as a first expected value;
if the first expected value is found at the element bit in the i-th row and r-th column, searching for the next expected value in the s-th column of the j-th row, wherein j is larger than i; s = r + (j - i) K; K = M / N, M being the frame interval at which images are extracted from the originally recorded video file to be detected and N being the frame interval at which images are extracted from the advertisement video template file;
and forming, from all the expected values and the column numbers of the rows in which they lie in the comparison matrix C, a data group whose column numbers form an arithmetic progression, the common difference of the arithmetic progression being K.
5. The method of claim 4, wherein, if the first expected value is found at the element bit in the i-th row and r-th column but the next expected value is not found in the s-th column of the j-th row:
the next expected value is searched for within a preset range before and after the s-th column of the j-th row based on a local-minimum search algorithm, the preset range being the region of s ± 20% K;
if the next expected value is found within the preset range, the search for the following expected value continues with the row in which that expected value lies as the reference;
and if the next expected value is not found, the search jumps to the next row and continues until the next expected value is found.
6. The method of claim 1, further comprising the step of: in the case that the data group whose column numbers form an arithmetic progression has a break in the arithmetic progression, processing the data fragments at the break, continuing the data group whose column numbers form an arithmetic progression, and determining the video advertisement in the video file to be detected based on the continued data group;
wherein, when the data group whose column numbers form an arithmetic progression has a break in the progression, the step of processing the data fragments at the break and continuing the data group comprises:
if the number of data rows whose column numbers form an arithmetic progression is smaller than a third threshold value, and the distance to the next data row belonging to the arithmetic progression of column numbers is larger than K rows, determining those data rows, fewer in number than the third threshold value, to be fragment segments and discarding the fragment segments; and/or,
judging the relation between the time length corresponding to the gap between two adjacent data groups whose column numbers form arithmetic progressions and the total time length of the advertisement template, and if d ≤ e × T, where d is the time length corresponding to the gap between the two adjacent segments, T is the total time length of the advertisement template and e is a preset percentage, merging the two data groups whose column numbers form arithmetic progressions into one data group.
7. The video advertisement identification method according to claim 1 or 6, wherein the step of determining the video advertisement in the video file to be detected based on the data group with the column number being an arithmetic progression comprises:
determining image frames in the video file to be detected corresponding to the first expected value and the last expected value in the data group;
searching for the display time stamps corresponding to the image frames based on the position sequence numbers, in the video file to be detected, of the image frame corresponding to the first expected value and the image frame corresponding to the last expected value;
determining, according to the time stamps, the start and stop times in milliseconds of the images in the video file to be detected corresponding to the data group, i.e. the total duration of the matched advertisement;
comparing the total duration of the matched advertisement with a second threshold value;
if it is greater than the second threshold value, confirming that the advertisement identification result is valid; and if it is smaller than the second threshold value, determining that the advertisement identification result is invalid.
8. An apparatus for identifying video advertisements, the apparatus comprising:
the extraction unit is used for extracting the characteristics of the advertisement template image subset constructed according to the advertisement video template file and the image subset to be detected constructed according to the video file to be detected to obtain a first fingerprint characteristic vector matrix A and a second fingerprint characteristic vector matrix B;
a computing unit, configured to perform cosine similarity calculation between the transpose matrix A^T of the first fingerprint feature vector matrix A and the second fingerprint feature vector matrix B to obtain a comparison matrix C;
a data group building unit, configured to form, from the element bits of the comparison matrix C that are larger than the first threshold value, a data group whose column numbers form an arithmetic progression;
and an identification unit, configured to determine the video advertisement in the video file to be detected based on the data group in the case that the data group whose column numbers form an arithmetic progression has no break in the arithmetic progression.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. A video advertisement recognition device, comprising: a memory, one or more processors; the memory is connected with the processor through a communication bus; the processor is configured to execute instructions in the memory; the memory has stored therein instructions for carrying out the steps of the method according to any one of claims 1 to 7.
CN202010902794.6A 2020-09-01 2020-09-01 Video advertisement identification method, device, storage medium and equipment Active CN112291616B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010902794.6A CN112291616B (en) 2020-09-01 2020-09-01 Video advertisement identification method, device, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010902794.6A CN112291616B (en) 2020-09-01 2020-09-01 Video advertisement identification method, device, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN112291616A CN112291616A (en) 2021-01-29
CN112291616B true CN112291616B (en) 2023-01-06

Family

ID=74419739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010902794.6A Active CN112291616B (en) 2020-09-01 2020-09-01 Video advertisement identification method, device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN112291616B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8731292B2 (en) * 2011-01-07 2014-05-20 Alcatel Lucent Method and apparatus for comparing videos
CN103235956B (en) * 2013-03-28 2016-05-11 天脉聚源(北京)传媒科技有限公司 A kind of commercial detection method and device
CN107609466A (en) * 2017-07-26 2018-01-19 百度在线网络技术(北京)有限公司 Face cluster method, apparatus, equipment and storage medium
CN110321958B (en) * 2019-07-08 2022-03-08 北京字节跳动网络技术有限公司 Training method of neural network model and video similarity determination method
CN111339368B (en) * 2020-02-20 2024-04-02 同盾控股有限公司 Video retrieval method and device based on video fingerprint and electronic equipment

Also Published As

Publication number Publication date
CN112291616A (en) 2021-01-29

Similar Documents

Publication Publication Date Title
US10381045B2 (en) Annotating media content for automatic content understanding
CN109874029B (en) Video description generation method, device, equipment and storage medium
CN111026915B (en) Video classification method, video classification device, storage medium and electronic equipment
CN108184135B (en) Subtitle generating method and device, storage medium and electronic terminal
CN103198293A (en) System and method for fingerprinting video
CN104754403A (en) Method and system for video sequential alignment
CN112291589B (en) Method and device for detecting structure of video file
WO2017075493A1 (en) Video frame difference engine
WO2019112616A1 (en) Modifying digital video content
CN103226571A (en) Method and device for detecting repeatability of advertisement library
CN113837083A (en) Video segment segmentation method based on Transformer
US20110268315A1 (en) Scalable Media Fingerprint Extraction
CN114339360A (en) Video processing method, related device and equipment
CN113301382B (en) Video processing method, device, medium, and program product
CN112291616B (en) Video advertisement identification method, device, storage medium and equipment
CN111368593A (en) Mosaic processing method and device, electronic equipment and storage medium
CN109101964B (en) Method, device and storage medium for determining head and tail areas in multimedia file
CN111629267A (en) Audio labeling method, device, equipment and computer readable storage medium
US20190279012A1 (en) Methods, systems, apparatuses and devices for facilitating inspection of industrial infrastructure by one or more industry experts
KR20210064587A (en) High speed split device and method for video section
CN113039805A (en) Accurately automatically cropping media content by frame using multiple markers
CN102905054A (en) Video synchronization method based on multidimensional image feature value comparison
CN113837047A (en) Video quality evaluation method, system, computer equipment and storage medium
CN114257840A (en) Method and system for repairing Matroska format video
Ram et al. Video Analysis and Repackaging for Distance Education

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant