CN111709762A - Information matching degree evaluation method, device, equipment and storage medium

Information matching degree evaluation method, device, equipment and storage medium

Info

Publication number
CN111709762A
Authority
CN
China
Prior art keywords
video frame
target
information
target material
target video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010518970.6A
Other languages
Chinese (zh)
Other versions
CN111709762B (en)
Inventor
陈世喆
李滇博
张奕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jilian Network Technology Co ltd
Original Assignee
Shanghai Jilian Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jilian Network Technology Co ltd filed Critical Shanghai Jilian Network Technology Co ltd
Priority to CN202010518970.6A
Publication of CN111709762A
Application granted
Publication of CN111709762B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q 30/0241 Advertisements
    • G06Q 30/0242 Determining effectiveness of advertisements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/73 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The invention discloses an information matching degree evaluation method, device, equipment and storage medium. The method comprises the following steps: determining, based on a pre-trained visual feature model, the position information and confidence information of a target material inserted into a video; determining at least one target video frame according to the position information and the confidence information, and extracting image content features of the target video frame; obtaining an association value between the target material and the target video frame by processing the material content features and the image content features; and determining a matching degree value between the target material and the target video frame based on the association value, the position information and the confidence information. This technical scheme solves the problems of low efficiency and high cost that arise when the matching degree between an advertisement and a video is determined by manually counting the advertisement's insertion position and playing frequency, so that the matching degree between the advertisement and the video can be determined conveniently and efficiently.

Description

Information matching degree evaluation method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of information processing, in particular to a method, a device, equipment and a storage medium for evaluating information matching degree.
Background
Currently, inserting advertisements into videos has become an important means of marketing. After an advertisement is inserted into a video, it is necessary to detect whether the advertisement matches the video and whether it can generate a certain advertising benefit.
In the prior art, whether the advertisement material matches the video and whether an advertising benefit is generated can be determined manually. For example, conventional advertisement monitoring generally involves locating the spatio-temporal position at which the advertisement is played, counting the playing frequency of the advertisement, and so on, and then judging whether the advertisement matches and yields a certain economic benefit according to its playing position and playing frequency. However, this approach has high labor cost and is time-consuming and laborious.
Disclosure of Invention
The invention provides an information matching degree evaluation method, device, equipment and storage medium, and aims to achieve the technical effect of efficiently and conveniently evaluating the matching degree between an advertisement and a video.
In a first aspect, an embodiment of the present invention provides an information matching degree evaluation method, where the method includes:
determining a target material, position information and confidence information of the target material inserted into a video based on a pre-trained visual feature model;
determining at least one target video frame corresponding to the target material according to the position information, and extracting image content characteristics corresponding to the at least one target video frame;
obtaining a correlation value between the target material and the target video frame by processing material content characteristics corresponding to the target material and the image content characteristics;
determining a matching degree value between the target material and the at least one target video frame based on the relevance degree value, the position information and the confidence degree information, so as to evaluate the matching degree between the target material and the at least one target video frame based on the matching degree value.
In a second aspect, an embodiment of the present invention further provides an apparatus for evaluating an information matching degree, where the apparatus includes:
the information determination module is used for determining a target material, position information and confidence information of the target material inserted into the video based on a pre-trained visual feature model;
the feature extraction module is used for determining at least one target video frame corresponding to the target material according to the position information and extracting image content features corresponding to the at least one target video frame;
the relevance value determining module is used for processing the material content characteristics corresponding to the target material and the image content characteristics to obtain the relevance value between the target material and the video;
a matching degree determination module for determining a matching degree between the target material and the at least one target video frame based on the relevance degree, the position information and the confidence degree information, so as to evaluate a matching degree between the target material and the at least one target video frame based on the matching degree.
In a third aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for evaluating matching degree of information according to any one of the embodiments of the present invention.
In a fourth aspect, the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the method for evaluating information matching degree according to any one of the embodiments of the present invention.
According to the technical scheme of this embodiment of the invention, a video including advertisement material is input into a pre-trained visual feature model, so that the position information and confidence information of the advertisement material in the video can be determined; the target video frames corresponding to the playing of the advertisement material can then be determined based on the position information, avoiding manual confirmation of the advertisement material's insertion position. The target video frames and the advertisement material are further processed, so that the matching degree between the advertisement material and the video can be determined. This solves the problems that determining the matching degree between advertisement material and a video currently requires manually confirming the material's insertion position and counting its playing frequency, which is time-consuming and laborious. The matching degree between the advertisement material and the video is determined intelligently, conveniently and efficiently, and the technical effect of reducing labor cost is also achieved.
Drawings
In order to illustrate the technical solutions of the exemplary embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. It should be clear that the described figures show only some, not all, of the embodiments of the invention, and a person skilled in the art can derive other figures from them without inventive effort.
Fig. 1 is a schematic flow chart illustrating an information matching degree evaluation method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart illustrating an information matching degree evaluation method according to a second embodiment of the present invention;
fig. 3 is a schematic flow chart illustrating an information matching degree evaluation method according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus for evaluating matching degree of information according to a fourth embodiment of the present invention;
fig. 5 is a schematic structural diagram of an apparatus according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a schematic flow chart of an information matching degree evaluation method according to an embodiment of the present invention. The method is applicable to determining whether advertisement material inserted into a video matches the video content; it may be executed by an information matching degree evaluation apparatus, which may be implemented in the form of software and/or hardware.
As shown in fig. 1, the method of this embodiment includes:
and S110, determining the target material, the position information of the target material inserted into the video and the confidence information based on the pre-trained visual feature model.
The target material may be advertisement material inserted into a video and may be displayed in the form of a video or a picture; for example, the advertisement material may be a video advertisement or a video image including an article, a scene, a brand identifier, and the like. The position information may be the playing position at which the target material is inserted into the video, and may also include the position of the target material in the display interface, such as the center-point coordinates of the advertisement video image or advertisement image. The visual feature model is pre-trained and is used to determine the position information, size information and confidence information of the target material inserted into the video from the input video data. The confidence information is a confidence coefficient indicating how reliable the detection associated with the target material is. The size information refers to the specific size of the advertisement material; for example, if the advertisement material inserted into the video is a hat, the length and width of the hat, i.e., the area the hat occupies in the video frame, can be obtained and used as the size information.
Specifically, the video data including the target material may be input into the visual feature model, and the visual feature model may determine a specific playing position of the target material in the video, a size of the target material, and a confidence coefficient of the target material.
For example, if advertisement material related to a hat is inserted into video A, a floating window showing the hat material may pop up on the playing interface during video playback. To determine the matching degree between the advertisement material and video A, the video including the hat material may be input into the visual feature model, and the playing position information of the advertisement material in the video, the length and width (or area) of the advertisement material, and the confidence coefficient of the advertisement material may be output based on the visual feature model.
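As a minimal sketch of this detection step (not the patent's actual implementation), the visual feature model could be wrapped as follows; the model's `predict` API, the box attributes, and all names here are assumptions:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MaterialDetection:
    frame_index: int             # playing position: frame where the material appears
    center: Tuple[float, float]  # center-point coordinates in the display interface
    size: Tuple[float, float]    # (width, height) of the material region
    confidence: float            # confidence coefficient d of the detection

def detect_material(visual_feature_model, video_frames) -> List[MaterialDetection]:
    """Run the pre-trained visual feature model over the video and collect,
    for each detected target material, its position, size and confidence.
    A real implementation would merge consecutive per-frame detections into
    playing intervals."""
    detections = []
    for idx, frame in enumerate(video_frames):
        for box in visual_feature_model.predict(frame):  # hypothetical API
            detections.append(MaterialDetection(
                frame_index=idx, center=box.center,
                size=box.size, confidence=box.score))
    return detections
```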
And S120, determining at least one target video frame corresponding to the target material according to the position information, and extracting image content characteristics corresponding to the at least one target video frame.
Since advertisement material is mostly displayed in a floating window or inserted at a certain position of the video interface, the video interface displayed while the advertisement material is playing can be determined from the position information, and such a video interface can be used as a target video frame. The key information in the target video frame is taken as its image content features, such as the faces and accessories of the persons in the frame and the environment they are in. There may be one or more target video frames; their specific number corresponds to the playing duration of the target material. For example, if the playing duration of the target material is 5 min, the video interfaces displayed during those 5 min may be used as the target video frames. Features are extracted from each target video frame as image content features.
In this embodiment, the image content features of the target video frame may be extracted as follows: obtain feature extraction models of different dimensions, perform feature extraction on the target video frame based on these models to obtain features corresponding to the different dimensions, and take these features as the image content features corresponding to the target video frame.
The image content features in the target video frame may cover multiple dimensions, such as the faces of persons, the environment, and emotions and actions in the video playing interface. Content features of multiple dimensions of the target video frame are obtained; after the target video frame is processed by feature extraction models of different dimensions, the image content features of the target video frame are obtained. The number of feature extraction models equals the number of predetermined dimensions; for example, if content features of 5 dimensions of the target video frame are to be acquired, the number of feature extraction models is 5. The feature extraction models of different dimensions are obtained by pre-training and are used to extract the image content features of the target video frame. The image content features may be keywords corresponding to the target video frame.
Specifically, the target video frame is input into each of the pre-trained feature extraction models, so that its content can be extracted and the image content features corresponding to the different dimensions can be obtained. For example, common dimensions are faces, objects, scenes, brand identifiers, emotions, actions, and the like; feature extraction models corresponding to these dimensions may be trained in advance so that features, i.e., keywords, are extracted from the image based on each model. The target video frame is input into the feature extraction models of the different dimensions to obtain its face features, object features, scene features, brand identifier features, emotion features, action features, and so on, and the features of each dimension obtained in this way are taken as the image content features.
It should be noted that the features extracted by each feature extraction model are keywords corresponding to the target video frame.
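To make the multi-dimensional extraction concrete, here is a minimal sketch assuming one pre-trained model per dimension exposing a hypothetical `extract_keywords` call; the dimension names follow the examples in the text:

```python
def extract_image_content_features(target_frames, dimension_models):
    """Apply one feature extraction model per dimension (face, object, scene,
    brand identifier, emotion, action) to every target video frame and pool
    the extracted keywords as the image content features."""
    features = {}
    for dim_name, model in dimension_models.items():
        keywords = set()
        for frame in target_frames:
            keywords.update(model.extract_keywords(frame))  # hypothetical call
        features[dim_name] = keywords
    return features
```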
S130, obtaining a relevance value between the target material and the target video frame by processing the material content characteristics and the image content characteristics corresponding to the target material.
In the process of determining the image content characteristics, the content characteristics corresponding to the target material content may also be determined.
The material content features may be pre-marked, or may be keywords obtained after processing the target material.
Optionally, obtaining the association value between the target material and the video by processing the material content features corresponding to the target material and the image content features includes: inputting the material content features and the image content features into a pre-trained semantic association model to obtain the association value between the material content features and the image content features.
The semantic association model is obtained by pre-training and is used to determine the association value between the material content features and the image content features. The semantic association model may be denoted R(W_A, W_V), where W_A represents the set of keywords corresponding to the target material, i.e., the set of material content features, and W_V represents the set of keywords corresponding to the target video frame, i.e., the set of keywords corresponding to the image content features. The semantic association model R can be obtained by training, for example, with a prior probability statistical model. The output value of the semantic association model may represent the association value between the target material and the video.
That is, after the material content features and the image content features are input into the semantic association model, the association value between the material content features and the image content features can be obtained.
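The patent does not fix the internal form of R(W_A, W_V); as one hedged possibility consistent with training it from a prior probability statistical model, the association value could be an average of pairwise co-occurrence probabilities:

```python
def association_value(material_keywords, frame_keywords, cooccur_prob):
    """Toy semantic association model R(W_A, W_V): average the prior
    co-occurrence probability over all keyword pairs. W_A is the material
    keyword set, W_V the frame keyword set; cooccur_prob maps a
    (material_keyword, frame_keyword) pair to a probability."""
    pairs = [(a, v) for a in material_keywords for v in frame_keywords]
    if not pairs:
        return 0.0
    return sum(cooccur_prob.get(pair, 0.0) for pair in pairs) / len(pairs)
```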
And S140, determining a matching degree value between the target material and the at least one target video frame based on the association degree value, the position information and the confidence degree information, and evaluating the matching degree between the target material and the at least one target video frame based on the matching degree value.
It should be noted that the confidence information of the target material may also reflect the positioning accuracy of the target material to a certain extent. The size information of the target material has a certain influence on the target video frame and also on the final matching degree, so that the size information of the target material needs to be considered when determining the matching degree between the target material and the video.
The matching degree value is used to represent how well the target material matches the video.
Specifically, the matching degree between the target material and the video can be determined based on the obtained association value, the position information and the confidence information.
In this embodiment, when the matching degree value is higher than a preset matching degree threshold, the matching degree between the target material and the video is high; when it is lower than the threshold, the matching degree is poor. Based on the obtained matching degree value, it can be decided whether to update the insertion position of the target material, so as to improve the fit between the target material and the video and achieve the technical effect of improving advertising benefit.
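A toy sketch of this evaluation rule (the threshold value is a placeholder assumption; the patent only speaks of a preset matching degree threshold):

```python
def evaluate_matching(matching_value, threshold=0.5):
    """Compare the matching degree value against a preset threshold and
    suggest whether the material's insertion position should be updated."""
    if matching_value >= threshold:
        return "high matching degree between target material and video"
    return "poor matching degree: consider updating the insertion position"
```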
According to the technical scheme of this embodiment of the invention, a video including advertisement material is input into a pre-trained visual feature model, so that the position information, size information and confidence information of the advertisement material in the video can be determined; the target video frames corresponding to the playing of the advertisement material can then be determined based on the position information, and the target video frames and the advertisement material are further processed so that the matching degree between the advertisement material and the video can be determined. This solves the problems that determining the matching degree currently requires manually confirming the material's insertion position and counting its playing frequency, which is time-consuming and laborious. The matching degree between the advertisement material and the video is determined intelligently, conveniently and efficiently, and the technical effect of reducing labor cost is also achieved.
Example two
Fig. 2 is a flowchart illustrating an evaluation method for information matching degree according to a second embodiment of the present invention. Before extracting the image content corresponding to the target video frame, the method further comprises the following steps: acquiring a material size occupied by a target material and a video size used by a target video frame, and determining the proportion occupied by the target video frame when the target material is played based on the material size and the video size; when the proportion is smaller than a preset threshold value, the target material is removed from the target video frame, and the removed target video frame is processed; and when the proportion is larger than the preset threshold value, restoring the at least one target video frame by adopting a preset restoring method, and processing the restored target video frame. The specific processing manner can be seen in this embodiment.
As shown in fig. 2, the method includes:
s201, based on a pre-trained visual feature model, determining a target material, and position information, material size information and confidence information of the target material inserted into a video.
Specifically, a video including a target material is input into a pre-trained visual feature model, the visual feature model can process the video, and can determine playing position information of the target material in the video, size information of the material and a confidence coefficient corresponding to the target material.
S202, determining at least one target video frame corresponding to the target material according to the position information.
Specifically, after the playing position information of the target material in the video is determined, each target video frame into which the target material is inserted can be determined.
S203, acquiring the material size information occupied by the target material and the video size used by the target video frame, and determining the proportion occupied by the target video frame when the target material is played based on the material size and the video size.
The target material may be advertisement material inserted into the video; for example, when a person in the playing video interface wears a hat, advertisement material for the hat may be inserted, and it may be displayed in a pop-up window. After the target material is determined, its size, i.e., its length and width, can be determined. Meanwhile, the length and width of the target video frame, i.e., the video size, can be obtained. Based on the pre-trained visual feature model, the material features of the target material can be determined.
Specifically, the length and width dimensions corresponding to the playing of the target material and the length and width dimensions of the target video frame, that is, the material dimensions and the video dimensions, may be obtained. Based on the material size and the video size, a size ratio between the target material and the target video frame may be determined.
S204, judging whether the determined proportion is smaller than a preset threshold; if so, executing S205; if not, executing S206.
The preset threshold is set in advance, for example, the preset threshold may be ten percent.
Specifically, according to the material size and the video size, the proportion of the target video frame that the target material occupies can be determined. When this proportion is smaller than the preset threshold, the target material has no significant influence on the target video frame; the target material may then be removed from the target video frame, i.e., S205 may be executed, and the video frame with the target material removed is used as the target video frame. When the proportion is greater than the preset threshold, the target material is large and may have a significant influence on the target video frame; a preset restoring method then needs to be adopted to restore the target video frame and obtain the restored target video frame, i.e., S206 is executed.
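A sketch of this branch, assuming numpy-style frames and hypothetical `remove_material` / `restore_frame` helpers; the ten percent threshold follows the example given above:

```python
def preprocess_target_frame(frame, material_box, threshold=0.10):
    """Branch on the material-to-frame size ratio: below the threshold the
    material is simply removed (S205), otherwise the frame is restored (S206)."""
    frame_h, frame_w = frame.shape[:2]      # numpy-style (height, width)
    mat_w, mat_h = material_box.size        # material (width, height)
    ratio = (mat_w * mat_h) / float(frame_w * frame_h)
    if ratio < threshold:
        return remove_material(frame, material_box)  # S205: crop the material away
    return restore_frame(frame, material_box)        # S206: image restoration
```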
S205, removing the target material from the target video frame, and processing the removed target video frame.
Whether to restore the target video frame can be decided according to the proportion of the advertisement coverage area to the original video area and the position of the target material in the video display interface. The reason is as follows: when the advertisement area is small and located at the edge of the video playing interface, important information in the target video frame is probably not occluded, so the influence of the advertisement material on the video frame image does not need to be considered; if the advertisement area is large or located at the center of the video display interface, information in the display interface image may be occluded, so the occluded image information needs to be recovered before the matching degree between the material and the video is determined. That is, when the target material is played, its position in the display interface and its size information need to be acquired, so as to determine based on this information how to process the target video frame.
Specifically, when the target material inserted into the video occupies a small video size and is located at the edge of the video picture, it has little influence on the target video frame; the target material can then be directly removed from the target video frame, the frame with the material removed is used as the target video frame, and this target video frame is processed further to determine the matching degree between the target material and the video into which it is inserted.
S206, restoring at least one target video frame by adopting a preset restoring method, and processing the restored target video frame.
When the proportion of the target video frame occupied by the target material is greater than the preset threshold, the target material will interfere with feature extraction from the target video frame. Therefore, a preset image recovery method can be adopted: the material is removed from the target video frame, the image of the target video frame is then recovered to obtain the original image, and the matching degree between the target material and the video is determined from the original image.
The restoring method includes restoration by a time-domain neighbor method. Correspondingly, restoring the at least one target video frame by a preset restoring method and processing the restored target video frame includes: acquiring an adjacent video frame of the target video frame, and when the background of the adjacent video frame is the same as that of the target video frame and no target material is inserted into it, taking the adjacent video frame as the restored target video frame.
The time-domain method is also called the time-domain neighbor method. In the time-domain neighbor method, after the position information corresponding to the target material is determined, the video frames located at the starting and ending positions of that position range can be identified; whether their backgrounds are the same as that of the target video frame is then determined, and a video frame with a matching background and no inserted target material is used as the restored target video frame.
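A minimal sketch of the time-domain neighbor method; `same_background` and `contains_material` are assumed helper predicates (e.g. histogram comparison and re-detection):

```python
def temporal_neighbor_restore(frames, start_idx, end_idx, material_box,
                              same_background, contains_material):
    """Check the frames just before the material's starting position and just
    after its ending position; if one has the same background as the target
    frame and no inserted material, use it as the restored target video frame."""
    target = frames[start_idx]
    for idx in (start_idx - 1, end_idx + 1):
        if 0 <= idx < len(frames):
            candidate = frames[idx]
            if same_background(candidate, target) and not contains_material(candidate, material_box):
                return candidate
    return None  # no usable neighbor: fall back to spatial-domain filling
```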
The restoring method also includes spatial-domain filling restoration. Restoring the at least one target video frame by a preset restoring method and processing the restored target video frame then includes: inputting the target video frame and the target video material into a pre-trained image processing model for restoration to obtain the restored target video frame, where the image processing model is used to determine the original video frame corresponding to each video frame.
Spatial-domain filling can be implemented with an offline-trained deep neural network, i.e., the target video frame is restored by an image processing model. The image processing model is obtained by pre-training, and a specific training method may be as follows: take an original picture and the same picture with an advertisement inserted as a group of training samples, use the picture with the advertisement inserted as the input of the model and the original picture as the output of the model, and train the image processing model with the pixel matching error between the network output picture and the original picture as the loss function; when the loss function is detected to converge, the obtained model is used as the image processing model.
Specifically, the target video frame into which the target material is inserted is input into the image processing model to obtain a restored image of the target video frame; the restored image can be used as the target video frame, the image content features in it are extracted, and the matching degree between the target material and the video frame is determined based on these image content features.
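The offline training described above could look as follows in a PyTorch-style sketch; the network architecture is left abstract, and the choice of L1 as the pixel matching error is an assumption (the patent only requires a pixel matching error loss):

```python
import torch
import torch.nn as nn

def train_image_processing_model(model, loader, epochs=10, lr=1e-4):
    """Each training sample pairs a picture with an advertisement inserted
    (model input) with the original picture (model target); training stops
    here after a fixed number of epochs rather than a convergence test."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    pixel_loss = nn.L1Loss()  # pixel matching error between output and original
    for _ in range(epochs):
        for ad_image, original in loader:
            optimizer.zero_grad()
            restored = model(ad_image)
            loss = pixel_loss(restored, original)
            loss.backward()
            optimizer.step()
    return model
```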
S207, feature extraction models with different dimensions are obtained, feature extraction is carried out on the target video frame based on the feature extraction models with different dimensions, features corresponding to the different dimensions are obtained, and the features are used as image content features corresponding to the target video frame.
Specifically, after the target video frame is obtained, the target video frame may be input into feature extraction models with different dimensions to obtain image content features with different dimensions, where the features may be displayed in the form of keywords.
S208, judging whether material content characteristics corresponding to the target material exist or not, and if not, executing S209; if yes, go to S210.
It should be noted that, in the process of obtaining the image content features of the target video frame, it may also be determined whether material content features corresponding to the target material already exist. If so, the material content features and the image content features can be processed directly, i.e., S210 is executed; if not, the target material needs to be processed by the same method as the target video frame, i.e., S209 is executed.
S209, processing the target material based on the feature extraction models with different dimensions to obtain material content features corresponding to the target material.
Specifically, the target material is input into the feature extraction models with different dimensions, so that feature descriptions with different dimensions corresponding to the target material can be obtained, and the feature descriptions with different dimensions obtained at this time are used as the material content features.
After the material content features are obtained, they are processed together with the image content features to obtain the association value between the target material and the video to which the target video frame belongs, and the matching degree between the target material and the video is then determined according to the association value. That is, after S209 is performed, S210 is performed to determine the association value between the material content features and the image content features.
S210, inputting the material content characteristics and the image content characteristics into a semantic association model obtained through pre-training to obtain an association value between the material content characteristics and the image content characteristics.
The semantic association model is pre-established and is used to determine the degree of association between keywords. For example, when the material content features and the image content features are input into the semantic association model, the association value between them can be obtained. The higher the association value, the more relevant the target material is to the target video frame; conversely, the lower the value, the less relevant they are.
And S211, determining a matching degree value between the target material and at least one target video frame based on the relevance degree value, the position information and the confidence degree information.
In this embodiment, determining a matching metric between the target material and the at least one target video frame includes: acquiring size information of the target material, and determining an interference value of the target material to the target video frame based on the size information and the position information; obtaining an intermediate value by calculating a product between the confidence information and the relevance value; determining a match value between the target material and the at least one target video frame by calculating a ratio between the intermediate value and the interference value.
When the target material is played, its display size can be acquired. The position information includes not only the specific playing position of the target material in the video but also the position of the target material in the playing interface, i.e., where the target material appears on the display interface. For example, taking the center point of the display interface as the origin of coordinates, the coordinates of the center point of the target material's video or image and the coordinates of its four vertices can be determined. The insertion position and size of the target material exert a certain influence and interference on the target video frame; by processing the size and position of the target material, the interference value of the target material on the target video frame can be determined.
The interference value can be represented by the function N(s, p) = s × p, where s is the ratio of the maximum side length of the target material to the maximum side length of the target video frame, and p is the maximum, over the horizontal and vertical coordinates, of the ratio of the distance from the center point of the target material to the nearest edge of the target video frame to the corresponding side length. The maximum side length can be determined from the coordinates of the four vertices.
The term "intermediate value" is merely a name; it simply denotes the product of the confidence information and the association value. The confidence information is a confidence coefficient, which can be denoted d. The matching degree value is used to represent the degree of matching between the target material and the target video frame.
The matching degree value may be denoted M and computed as:

M = (d × R) / N(s, p)
Through this formula, the matching degree value between the target material and the target video frame can be calculated, where d represents the confidence output by the visual feature model, R represents the association value between the material content features and the image content features, and N represents the interference value of the target material on the target video frame.
Specifically, based on the specific determination method of the matching degree value, the matching degree value between the target material and the target video frame can be obtained, and then whether the video in which the target material is inserted and the position in which the target material is inserted are appropriate or not can be determined based on the matching degree value.
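Putting the pieces together, a sketch of the computation under one reading of the definitions above (the handling of N = 0 is an added safeguard, not from the patent):

```python
def interference_value(material_box, frame_size):
    """N(s, p) = s * p: s is the ratio of the maximum side length of the
    material to that of the frame; p is the larger of the two ratios of the
    center-to-nearest-edge distance to the corresponding side length."""
    frame_w, frame_h = frame_size
    mat_w, mat_h = material_box.size
    cx, cy = material_box.center
    s = max(mat_w, mat_h) / max(frame_w, frame_h)
    p = max(min(cx, frame_w - cx) / frame_w,
            min(cy, frame_h - cy) / frame_h)
    return s * p

def matching_value(d, r, n):
    """M = (d * R) / N: confidence times association value over interference."""
    return (d * r) / n if n > 0 else 0.0
```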
And S212, evaluating the matching degree between the target material and at least one target video frame based on the matching degree value.
Specifically, whether the target material is matched with the target video frame can be determined based on the matching degree value. If the matching degree value is higher than the preset matching degree threshold value, the matching degree between the target material and the target video frame is better, otherwise, the matching degree between the target material and the target video frame is poorer.
Based on the matching value, the user can update the insertion position of the target material and the inserted video.
According to the technical scheme of this embodiment of the invention, a video including advertisement material is input into a pre-trained visual feature model, so that the position information and confidence information of the advertisement material in the video can be determined; the target video frames corresponding to the playing of the advertisement material can then be determined based on the position information, avoiding manual confirmation of the advertisement material's insertion position. The target video frames and the advertisement material are further processed, so that the matching degree between the advertisement material and the video can be determined. This solves the problems that determining the matching degree between advertisement material and a video currently requires manually confirming the material's insertion position and counting its playing frequency, which is time-consuming and laborious. The matching degree between the advertisement material and the video is determined intelligently, conveniently and efficiently, and the technical effect of reducing labor cost is also achieved.
On the basis of the above embodiment, the method further includes training the visual feature model. Training the visual feature model includes: acquiring at least one basic material, extracting key frames in the basic material according to a preset rule, and marking position information, label information, size information and confidence information of the key frames; the key frame is a video frame which comprises effective information in the basic material; changing the scale and the background of the key frame to obtain training sample data for training the visual feature model; the training sample data comprises position information, label information, size information and confidence coefficient information corresponding to the key frames; training the training sample data to obtain the visual feature model; the visual feature model is used to determine location information and confidence information of material in the video.
A plurality of advertisement videos and/or advertisement images may be acquired and used as basic material. The key frames may be images including effective information in the advertisement material; for example, if the advertisement material is a hat, pictures including the hat together with its link and price may be used as key frames. Identical key frames may exist in one advertisement video, and one of them may be selected as the key frame, where identical key frames are frames whose image, content and background are exactly the same. After the key frames are determined, their position information, label information, size information and confidence can be marked manually, so that the visual feature model can be trained on the marked sample data. It should be noted that, because this is sample data used for training the model, the confidence information may be uniformly set to 1, i.e., all of it is treated as credible data. The size information is the size of the advertisement material in the key frame. To increase the amount of basic material, the scale and scene of the key frames may be transformed; this transformation is described in the third embodiment and is not repeated here. The sample data obtained in this way is used as training sample data for training the visual feature model. The key frames in the training sample data are taken as the input of the visual feature model, and the marked information is taken as the output corresponding to each sample, so that the visual feature model is trained to determine the position information, size information and confidence information of target material in a video.
EXAMPLE III
As a preferred embodiment of the foregoing embodiment, fig. 3 is another schematic flow chart of an information matching degree evaluation method provided in a third embodiment of the present invention. As shown in fig. 3, the method includes:
and S1, determining training sample data for training the visual feature model.
Advertisement material is acquired, and a basic training library is made from it. The basic training library is composed of the key video frames and original images in the advertisement material. If two adjacent video frames differ, one of them may be used as a key video frame. Of course, key video frames may also be determined manually.
After the basic training library is built from the key video frames and original images, the data stored in it can be expanded to obtain training sample data. The expansion may include scale expansion and background expansion. Scale expansion scales the target video frames or images in the basic training library to a random scale. Background expansion superposes the scale-expanded base training picture at a random position in an arbitrary video, and different transparencies can be set to generate multiple background-expanded samples. Different choices of the four variables (scale, position, transparency and background picture) yield the sample training library for training the visual feature model. That is, after the key video frames and original images are obtained, scale transformation and background transformation can be performed on them to obtain the training sample data for training the visual feature model.
The training sample data includes position information, label information, size information, and confidence information of the key frame.
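A sketch of one expanded sample being generated; `resize` and `overlay` are assumed image helpers, and the scale and transparency ranges are illustrative. The confidence label is uniformly set to 1 for training data, as stated above:

```python
import random

def expand_sample(base_image, background_frames, scale_range=(0.3, 1.5)):
    """Scale-and-background expansion: scale the base training picture by a
    random factor and superpose it at a random position in a random background
    frame with a random transparency, returning the sample and its labels."""
    scale = random.uniform(*scale_range)
    scaled = resize(base_image, scale)                 # hypothetical helper
    frame = random.choice(background_frames)
    x = random.randint(0, max(0, frame.width - scaled.width))
    y = random.randint(0, max(0, frame.height - scaled.height))
    alpha = random.uniform(0.5, 1.0)                   # transparency
    sample = overlay(frame, scaled, x, y, alpha)       # hypothetical helper
    label = {"position": (x, y),
             "size": (scaled.width, scaled.height),
             "confidence": 1.0}                        # training data set to 1
    return sample, label
```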
And S2, training the visual feature model based on the training sample data.
Specifically, after training sample data is obtained, material key frames and original images of the training sample data can be used as input of the visual feature model, and position information, label information and confidence information corresponding to the training sample data can be used as output of the visual feature model to train the visual feature model.
And S3, determining the position information and the confidence coefficient information of the target material inserted into the video based on the visual feature model.
In actual application, to determine whether the target material matches the target video frames of the current video, the video may be input into the pre-trained visual feature model to determine the position information and confidence information of the target material in the current video, and the matching degree between the target material and the target video frames is then determined based on this information.
And S4, restoring the video image of the target video frame to which the target material belongs.
And the target video frame is a video frame to which the target material belongs. The main reason for performing the recovery processing on the target video frame is that the target material may cover the key content in the target video frame, so that the image processing can be performed on the target video frame to determine the video content covered by the target material, and further obtain the original video frame corresponding to the target video frame. The original video frame is an original image corresponding to the target video frame.
In this embodiment, the specific method of performing the restoration process on the target video frame may be determined by the proportional relationship between the size of the target material and the size of the video image.
When the inserted advertisement video image is small in scale and located at the edge of the picture, cropping can be used. The scale can be measured by the proportion of the advertisement coverage area to the original video area, and the position by the proportion, in the horizontal and vertical coordinates, of the distance between the advertisement center and the nearest edge of the original video to the length and width of the original video. When the proportion is smaller than the preset proportion threshold, the cropping condition is met: the advertisement coverage area can be cut away, and the largest rectangle containing no advertisement area is kept as the cropped original video image for subsequent content matching.
When the proportion is greater than the preset proportion threshold, the original image needs to be restored. The restoration modes include time-domain neighbor and spatial-domain filling. The time-domain neighbor method uses, as the restored image (i.e., the restored target video frame), one of the few frames before or after the video frames in which the advertisement material appears, provided its background is the same as or similar to that of the video frame containing the advertisement material. Spatial-domain filling can be implemented with an offline-trained deep neural network: a pair consisting of an original picture and the same picture with an advertisement inserted is used as a group of training samples, with the ad-inserted picture as the input of the neural network and the original picture as its output, to train the model. The original image corresponding to the target video frame is then determined based on the trained model.
And S5, extracting the target material characteristics and the image characteristics of the target video frame.
The target material features may be pre-marked or extracted from the image corresponding to the target material. The target video frames are processed by the multi-dimensional feature models, so that the image features in the target video frames can be extracted. The image features can be information such as vehicle identifiers, face features, scenes and brand identifiers; that is, the image features may be the keywords of the target video frame and of the target material.
Specifically, based on the multi-dimensional feature model, the image features in the target material and the image features in the target video frame may be extracted, that is, keywords corresponding to the target material and the target video frame may be extracted.
And S6, determining the matching degree between the target material characteristics and the image characteristics of the target video frame.
In this embodiment, determining a matching metric between the target material and the at least one target video frame includes: acquiring size information of the target material, and determining an interference value of the target material to the target video frame based on the size information and the position information; obtaining an intermediate value by calculating a product between the confidence information and the relevance value; determining a match value between the target material and the at least one target video frame by calculating a ratio between the intermediate value and the interference value.
When the target material is played, its display size can be acquired. The position information includes not only the specific playing position of the target material in the video but also the position of the target material in the playing interface, i.e., where the target material appears on the display interface. For example, taking the center point of the display interface as the origin of coordinates, the coordinates of the center point of the target material's video or image and the coordinates of its four vertices can be determined. The insertion position and size of the target material exert a certain influence and interference on the target video frame; by processing the size and position of the target material, the interference value of the target material on the target video frame can be determined.
The interference value can be represented by the function N(s, p) = s × p, where s is the ratio of the maximum side length of the target material to the maximum side length of the target video frame, and p is the maximum, over the horizontal and vertical coordinates, of the ratio of the distance from the center point of the target material to the nearest edge of the target video frame to the corresponding side length. The maximum side length can be determined from the coordinates of the four vertices.
The term "intermediate value" is merely a name; it simply denotes the product of the confidence information and the association value. The confidence information is a confidence coefficient, which can be denoted d. The matching degree value is used to represent the degree of matching between the target material and the target video frame.
The matching degree value may be denoted M and computed as:

M = (d × R) / N(s, p)
Through this formula, the matching degree value between the target material and the target video frame can be calculated, where d represents the confidence output by the visual feature model, R represents the association value between the material content features and the image content features, and N represents the interference value of the target material on the target video frame.
Specifically, based on the specific determination method of the matching degree value, the matching degree value between the target material and the target video frame can be obtained, and then whether the video in which the target material is inserted and the position in which the target material is inserted are appropriate or not can be determined based on the matching degree value.
According to the technical scheme of this embodiment of the invention, a video including advertisement material is input into a pre-trained visual feature model, so that the position information and confidence information of the advertisement material in the video can be determined; the target video frames corresponding to the playing of the advertisement material can then be determined based on the position information, avoiding manual confirmation of the advertisement material's insertion position. The target video frames and the advertisement material are further processed, so that the matching degree between the advertisement material and the video can be determined. This solves the problems that determining the matching degree between advertisement material and a video currently requires manually confirming the material's insertion position and counting its playing frequency, which is time-consuming and laborious. The matching degree between the advertisement material and the video is determined intelligently, conveniently and efficiently, and the technical effect of reducing labor cost is also achieved.
Example four
Fig. 4 is a schematic structural diagram of an apparatus for evaluating information matching degree according to a fourth embodiment of the present invention. The apparatus includes: an information determination module 410, a feature extraction module 420, a relevance value determination module 430, and a matching value determination module 440.
the information determining module 410 is configured to determine, based on a pre-trained visual feature model, a target material, and position information and confidence information of the target material inserted into a video; a feature extraction module 420, configured to determine at least one target video frame corresponding to the target material according to the location information, and extract an image content feature corresponding to the at least one target video frame; a relevance value determining module 430, configured to obtain a relevance value between the target material and the video by processing a material content feature corresponding to the target material and the image content feature; a matching degree determination module 440, configured to determine a matching degree between the target material and the at least one target video frame based on the relevance degree, the location information, and the confidence information, so as to evaluate a matching degree between the target material and the at least one target video frame based on the matching degree.
On the basis of the above technical solution, the apparatus further includes a model training module for training the visual feature model, which specifically includes:
a sample marking unit, configured to acquire at least one basic material, extract key frames from the basic material according to a preset rule, and mark the position information, label information, size information and confidence information of the key frames, where a key frame is a video frame in the basic material that contains effective information; a training data determining unit, configured to change the scale and background of the key frames to obtain training sample data for training the visual feature model, where the training sample data includes the position information, label information and confidence information corresponding to the key frames; and a model training unit, configured to perform training on the training sample data to obtain the visual feature model, where the visual feature model is used to determine the position information, size information and confidence information of material in a video.
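As an illustrative sketch only, the scale-and-background change performed by the training data determining unit could look as follows, using OpenCV; the helper name, the corner placement, and the omission of label bookkeeping are simplifying assumptions, not the patent's specification:

    import cv2
    import numpy as np

    def augment_key_frame(key_frame: np.ndarray, background: np.ndarray,
                          scale: float) -> np.ndarray:
        """Rescale a key frame and composite it onto a new background to
        produce one training sample for the visual feature model."""
        h, w = key_frame.shape[:2]
        resized = cv2.resize(key_frame, (int(w * scale), int(h * scale)))
        canvas = cv2.resize(background, (w, h))  # new background, same frame size
        rh, rw = resized.shape[:2]
        # Paste at the top-left corner; a real pipeline would randomize the
        # position and record it as the position label of the sample.
        canvas[:min(rh, h), :min(rw, w)] = resized[:min(rh, h), :min(rw, w)]
        return canvas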
On the basis of the above technical solutions, before extracting the image content features corresponding to the at least one target video frame, the feature extraction module is further configured to:
acquire the material size occupied by the target material and the video size of the target video frame, and determine, based on the material size and the video size, the proportion of the target video frame that the target material occupies while it is played; when the proportion is smaller than a preset threshold, remove the target material from the target video frame and process the resulting video frame; and when the proportion is larger than the preset threshold, restore the at least one target video frame by a preset restoration method and process the restored target video frame.
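A minimal sketch of this branching step, assuming a bounding-box representation of the material's position and size; the 0.05 threshold and the restore_frame helper are illustrative assumptions:

    import numpy as np

    def preprocess_target_frame(frame: np.ndarray, material_bbox,
                                threshold: float = 0.05) -> np.ndarray:
        """Branch on the fraction of the frame area the material occupies.
        The threshold value is assumed, not taken from the patent."""
        frame_h, frame_w = frame.shape[:2]
        x, y, w, h = material_bbox  # position and size from the visual feature model
        ratio = (w * h) / float(frame_w * frame_h)
        out = frame.copy()
        if ratio < threshold:
            # Small occlusion: blank out the material region before feature
            # extraction (one possible form of "removal").
            out[y:y + h, x:x + w] = 0
            return out
        # Large occlusion: hand off to a restoration method, such as the
        # temporal-neighborhood or spatial-filling sketches below.
        return restore_frame(out, material_bbox)  # hypothetical helper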
On the basis of the above technical solutions, the restoration method includes temporal neighborhood restoration; correspondingly, restoring the at least one target video frame by a preset restoration method and processing the restored target video frame includes: acquiring a video frame adjacent to the target video frame, and when the adjacent video frame has the same background as the target video frame and no target material inserted, using the adjacent video frame as the restored target video frame.
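A sketch of the temporal neighborhood search, assuming background comparison and material detection are available as predicates supplied by upstream components; the outward search order is an implementation choice, not the patent's:

    def temporal_neighborhood_restore(frames, idx, has_material, same_background):
        """Search outward from frames[idx] for the nearest neighbor that shares
        its background and contains no inserted material, and use it as the
        restored target video frame."""
        for offset in range(1, len(frames)):
            for j in (idx - offset, idx + offset):
                if 0 <= j < len(frames) and not has_material(j) and same_background(j, idx):
                    return frames[j]
        return frames[idx]  # no suitable neighbor: fall back to the original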
On the basis of the above technical solutions, the restoration method includes spatial-domain filling restoration; correspondingly, restoring the at least one target video frame by a preset restoration method and processing the restored target video frame includes: inputting the target video frame and the target material into a pre-trained image processing model for restoration processing to obtain the restored target video frame, where the image processing model is used to determine the original video frame corresponding to each video frame.
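The pre-trained image processing model itself is not specified; as a stand-in, a classical inpainting routine illustrates the same spatial-domain filling idea:

    import cv2
    import numpy as np

    def spatial_fill_restore(frame: np.ndarray, material_bbox) -> np.ndarray:
        """Erase the inserted material region and fill it from surrounding
        pixels. A learned image processing model would replace the classical
        cv2.inpaint call used here as a stand-in."""
        x, y, w, h = material_bbox
        mask = np.zeros(frame.shape[:2], dtype=np.uint8)
        mask[y:y + h, x:x + w] = 255  # mark the region occupied by the material
        return cv2.inpaint(frame, mask, 3, cv2.INPAINT_TELEA)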
On the basis of the above technical solutions, the feature extraction module is further configured to obtain feature extraction models of different dimensions, perform feature extraction on the target video frame based on the feature extraction models of different dimensions to obtain features corresponding to the different dimensions, and take these features as the image content features corresponding to the target video frame.
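As one possible reading, "feature extraction models of different dimensions" can be sketched with two off-the-shelf backbones whose feature dimensions differ; the specific networks, and torchvision ≥ 0.13 for the weights argument, are assumptions:

    import torch
    from torchvision import models

    # Two backbones stand in for the "feature extraction models of different
    # dimensions"; the choice of networks is illustrative, not the patent's.
    _extractors = [
        torch.nn.Sequential(*list(models.resnet18(weights=None).children())[:-1]),
        torch.nn.Sequential(*list(models.resnet50(weights=None).children())[:-1]),
    ]

    def image_content_features(frame: torch.Tensor) -> torch.Tensor:
        """Extract features of each dimension from one target video frame
        (a 3xHxW tensor) and concatenate them as the image content features."""
        feats = []
        with torch.no_grad():
            for net in _extractors:
                net.eval()
                out = net(frame.unsqueeze(0))        # (1, C, 1, 1) after avgpool
                feats.append(torch.flatten(out, 1))  # (1, C)
        return torch.cat(feats, dim=1)               # (1, 512 + 2048)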
On the basis of the foregoing technical solutions, before processing the material content features corresponding to the target material and the image content features to obtain the relevance value between the target material and the video, the relevance value determination module is further configured to: when no material content features corresponding to the target material are detected, process the target material based on the feature extraction models of different dimensions to obtain the material content features corresponding to the target material.
On the basis of the above technical solutions, the relevance value determining module is further configured to input the material content features and the image content features into a semantic relevance model obtained through pre-training to obtain a relevance value between the material content features and the image content features.
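The semantic association model is obtained by pre-training and is not detailed here; a cosine-similarity stand-in illustrates its interface:

    import numpy as np

    def relevance_value(material_features: np.ndarray,
                        image_features: np.ndarray) -> float:
        """Cosine-similarity stand-in for the pre-trained semantic association
        model: score the relevance of material content features against the
        image content features of a target video frame."""
        a = material_features / (np.linalg.norm(material_features) + 1e-12)
        b = image_features / (np.linalg.norm(image_features) + 1e-12)
        return float(np.dot(a, b))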
On the basis of the above technical solutions, the matching degree value determination module further includes: an interference value determining unit, configured to acquire the size information of the target material and determine the interference value of the target material on the target video frame based on the product of the size information and the position information; an intermediate value determining unit, configured to obtain an intermediate value by calculating the product of the confidence information and the relevance value; and a matching degree value determining unit, configured to determine the matching degree value between the target material and the at least one target video frame by calculating the ratio of the intermediate value to the interference value.
According to the technical solution of this embodiment of the invention, a video containing advertisement material is input into a pre-trained visual feature model to determine the position information and confidence information of the advertisement material in the video; the target video frames played while the advertisement material is shown are then determined from the position information, so that the insertion position no longer has to be confirmed manually. By further processing the target video frames and the advertisement material, the matching degree between the advertisement material and the video can be determined. This solves the problems that, at present, the insertion position of advertisement material must be confirmed manually and its playing frequency counted, making the determination of the matching degree between advertisement material and video time-consuming and labor-intensive; the matching degree is instead determined intelligently, conveniently and efficiently, which also achieves the technical effect of reducing labor cost.
The apparatus for evaluating information matching degree provided by this embodiment of the invention can execute the method for evaluating information matching degree provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects for executing that method.
It should be noted that the units and modules included in the above apparatus are divided only according to functional logic; other divisions are possible as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for convenience of distinguishing them from one another and are not intended to limit the protection scope of the embodiments of the invention.
Example Five
Fig. 5 is a schematic structural diagram of a device according to a fifth embodiment of the present invention, illustrating a block diagram of an exemplary device 50 suitable for implementing embodiments of the invention. The device 50 shown in Fig. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the invention.
As shown in FIG. 5, device 50 is embodied in a general purpose computing device. The components of the device 50 may include, but are not limited to: one or more processors or processing units 501, a system memory 502, and a bus 503 that couples the various system components (including the system memory 502 and the processing unit 501).
Bus 503 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Device 50 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by device 50 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 502 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 504 and/or cache memory 505. The device 50 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, the storage system 506 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in Fig. 5, commonly referred to as a "hard drive"). Although not shown in Fig. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 503 by one or more data media interfaces. The memory 502 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.
A program/utility 508 having a set (at least one) of program modules 507 may be stored, for example, in the memory 502. Such program modules 507 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 507 generally carry out the functions and/or methods of the embodiments of the invention described herein.
Device 50 may also communicate with one or more external devices 509 (e.g., keyboard, pointing device, display 510, etc.), with one or more devices that enable a user to interact with device 50, and/or with any devices (e.g., network card, modem, etc.) that enable device 50 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 511. Also, device 50 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 512. As shown, the network adapter 512 communicates with the other modules of the device 50 over a bus 503. It should be appreciated that although not shown in FIG. 5, other hardware and/or software modules may be used in conjunction with device 50, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 501 executes various functional applications and data processing by running programs stored in the system memory 502, for example implementing the information matching degree evaluation method provided by the embodiments of the present invention.
Example Six
The sixth embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a method for evaluating information matching degree.
The method comprises the following steps:
determining a target material, position information and confidence information of the target material inserted into a video based on a pre-trained visual feature model;
determining at least one target video frame corresponding to the target material according to the position information, and extracting image content characteristics corresponding to the at least one target video frame;
obtaining a relevance value between the target material and the target video frame by processing the material content characteristics corresponding to the target material and the image content characteristics;
determining a matching degree value between the target material and the at least one target video frame based on the relevance value, the position information and the confidence information, so as to evaluate the matching degree between the target material and the at least one target video frame based on the matching degree value.
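Putting the four steps together, an end-to-end sketch that reuses the earlier sketches where possible; every other helper called here is a hypothetical placeholder for a component described in the embodiments above:

    def evaluate_matching(video, material):
        """End-to-end sketch of the evaluation method; helpers are placeholders."""
        # Step 1: detect the material, its position and the model's confidence.
        position, confidence = visual_feature_model(video, material)
        # Step 2: gather the target video frames and their image content features.
        frames = frames_at(video, position)
        image_feats = [image_content_features(f) for f in frames]
        # Step 3: relevance between material content features and image features.
        material_feats = material_content_features(material)
        relevance = max(relevance_value(material_feats, f) for f in image_feats)
        # Step 4: matching degree = (confidence x relevance) / interference, with
        # interference derived from the material's size and position information.
        interference = interference_value(material, position)
        return (confidence * relevance) / interference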
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for embodiments of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. An information matching degree evaluation method is characterized by comprising the following steps:
determining a target material, position information and confidence information of the target material inserted into a video based on a pre-trained visual feature model;
determining at least one target video frame corresponding to the target material according to the position information, and extracting image content characteristics corresponding to the at least one target video frame;
obtaining a relevance value between the target material and the target video frame by processing material content features corresponding to the target material and the image content features;
determining a matching degree value between the target material and the at least one target video frame based on the relevance value, the position information and the confidence information, so as to evaluate the matching degree between the target material and the at least one target video frame based on the matching degree value.
2. The method of claim 1, further comprising: training the visual feature model;
wherein training the visual feature model comprises:
acquiring at least one basic material, extracting key frames in the basic material according to a preset rule, and marking position information, label information, size information and confidence information of the key frames; the key frame is a video frame which comprises effective information in the basic material;
changing the scale and the background of the key frame to obtain training sample data for training the visual feature model; the training sample data comprises the position information, size information, label information and confidence information corresponding to the key frames;
training the training sample data to obtain the visual feature model;
the visual feature model is used to determine location information, size information, and confidence information of material in the video.
3. The method of claim 1, wherein prior to said extracting image content features corresponding to said at least one target video frame, further comprising:
acquiring a material size occupied by the target material and a video size used by the target video frame, and determining, based on the material size and the video size, the proportion of the target video frame occupied by the target material when the target material is played;
when the proportion is smaller than a preset threshold value, the target material is removed from the target video frame, and the removed target video frame is processed;
and when the proportion is larger than the preset threshold value, restoring the at least one target video frame by adopting a preset restoring method, and processing the restored target video frame.
4. The method of claim 3, wherein the restoration method comprises temporal neighborhood restoration;
correspondingly, the restoring the at least one target video frame by using a preset restoring method, and processing the restored target video frame includes:
and acquiring a video frame adjacent to the target video frame, and when the adjacent video frame has the same background as the target video frame and no target material inserted, taking the adjacent video frame as the restored target video frame.
5. The method according to claim 3, wherein the restoration method comprises spatial domain filling restoration, and the restoring the at least one target video frame by using a preset restoration method and processing the restored target video frame comprises:
inputting the target video frame and the target material into a pre-trained image processing model for restoration processing to obtain a restored target video frame;
the image processing model is used for determining an original video frame corresponding to each video frame.
6. The method of claim 1, wherein said extracting image content features corresponding to the at least one target video frame comprises:
and acquiring feature extraction models with different dimensions, extracting features of the target video frame based on the feature extraction models with different dimensions to obtain features corresponding to different dimensions, and taking the features as image content features corresponding to the target video frame.
7. The method according to claim 6, wherein before the obtaining the relevance value between the target material and the video by processing the material content features corresponding to the target material and the image content features, further comprises:
and when the material content characteristics corresponding to the target material are not detected, processing the target material based on the feature extraction models with different dimensions to obtain the material content characteristics corresponding to the target material.
8. The method of claim 1, wherein the obtaining the relevance value between the target material and the video by processing material content features corresponding to the target material and the image content features comprises:
and inputting the material content features and the image content features into a semantic association model obtained by pre-training, to obtain the relevance value between the material content features and the image content features.
9. The method of claim 1, wherein determining a match score between a target material and the at least one target video frame based on the relevance score, the location information, and the confidence information comprises:
acquiring size information of the target material, and determining an interference value of the target material on the target video frame based on the product of the size information and the position information;
obtaining an intermediate value by calculating a product between the confidence information and the relevance value;
determining a matching degree value between the target material and the at least one target video frame by calculating a ratio between the intermediate value and the interference value.
10. An apparatus for evaluating a degree of matching of information, comprising:
the information determination module is used for determining a target material, position information and confidence information of the target material inserted into the video based on a pre-trained visual feature model;
the feature extraction module is used for determining at least one target video frame corresponding to the target material according to the position information and extracting image content features corresponding to the at least one target video frame;
the relevance value determining module is used for processing the material content characteristics corresponding to the target material and the image content characteristics to obtain the relevance value between the target material and the video;
a matching degree value determination module for determining a matching degree value between the target material and the at least one target video frame based on the relevance value, the position information and the confidence information, so as to evaluate the matching degree between the target material and the at least one target video frame based on the matching degree value.