CN113920463A - Video matching method, device and equipment based on video fingerprints and storage medium - Google Patents

Video matching method, device and equipment based on video fingerprints and storage medium

Info

Publication number
CN113920463A
CN113920463A CN202111217627.9A
Authority
CN
China
Prior art keywords
video
image frame
fingerprint
image
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111217627.9A
Other languages
Chinese (zh)
Inventor
陈家龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202111217627.9A priority Critical patent/CN113920463A/en
Publication of CN113920463A publication Critical patent/CN113920463A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The application relates to the technical field of artificial intelligence, and discloses a video matching method, device, equipment and storage medium based on video fingerprints. The method comprises the following steps: for each image frame to be recognized corresponding to a video to be recognized, determining each target object image and each target object name by using a preset image recognition and name prediction model; for each target object image corresponding to each image frame to be recognized, determining the target object identification point coordinate corresponding to that target object image; generating a single image frame fingerprint for each image frame to be recognized from its target object identification point coordinates, target object images and target object names, to obtain the single image frame fingerprints to be judged; and performing similar video matching in a preset video fingerprint database according to the single image frame fingerprints to be judged, to obtain a target similar video corresponding to the video to be recognized. The method for generating the single image frame fingerprint is therefore simple, practical and well suited to commercialization.

Description

Video matching method, device and equipment based on video fingerprints and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a video matching method, apparatus, device, and storage medium based on video fingerprints.
Background
Video fingerprinting is a software technique for identifying, extracting and compressing characteristic components of a video, which can represent a video file by a unique generated "fingerprint". It is an emerging technology that has been applied effectively in fields such as video data preprocessing, hash value comparison and digital watermarking. At present there is no industry or technical standard for video fingerprints. Common implementation schemes include hash value comparison, video watermark insertion and video fingerprint extraction. Among these, the schemes that are easy to implement have poor practicability and cannot be commercialized, while the schemes that have been commercialized are difficult to implement.
Disclosure of Invention
The application mainly aims to provide a video matching method, device, equipment and storage medium based on video fingerprints, so as to solve the technical problem that, among the prior-art schemes based on hash value comparison, video watermark insertion and video fingerprint extraction, the schemes that are easy to implement have poor practicability and cannot be commercialized, while the schemes that have been commercialized are difficult to implement.
In order to achieve the above object, the present application provides a video matching method based on video fingerprints, including:
acquiring a plurality of image frames to be identified corresponding to a video to be identified;
respectively carrying out object image identification and object name classification prediction on each image frame to be identified by adopting a preset image identification and name prediction model to obtain each target object image and each target object name corresponding to each image frame to be identified;
performing coordinate calculation of a preset identification point on each target object image corresponding to each image frame to be recognized to obtain a target object identification point coordinate corresponding to each target object image;
generating a single image frame fingerprint according to the coordinates of each target object identification point corresponding to each image frame to be identified, each target object image and each target object name to obtain a single image frame fingerprint to be judged;
and performing similar video matching in the acquired preset video fingerprint database according to each single image frame fingerprint to be judged to obtain a target similar video corresponding to the video to be identified.
Further, the step of obtaining a plurality of image frames to be identified corresponding to the video to be identified includes:
acquiring the video to be identified;
acquiring an extraction time range according to the video to be identified;
and extracting each image frame corresponding to the extraction time range from the video to be identified to obtain a plurality of image frames to be identified corresponding to the video to be identified.
Further, the step of generating a single image frame fingerprint according to the coordinates of the identification points of the target objects, the images of the target objects and the names of the target objects corresponding to the image frames to be recognized to obtain the single image frame fingerprint to be determined includes:
calculating an included angle of each target object identification point coordinate by adopting a preset included angle calculation rule to obtain a target object identification point included angle;
acquiring any image frame to be identified as an image frame to be analyzed;
acquiring any one of the target object images from each target object image corresponding to the image frame to be analyzed as an object image to be processed;
generating a single-object image fingerprint according to the object image to be processed and the target object identification point coordinates, the target object identification point included angle and the target object name corresponding to the object image to be processed to obtain a target single-object image fingerprint;
repeatedly executing the step of acquiring any one of the target object images from the target object images corresponding to the image frame to be analyzed as an object image to be processed until the acquisition of the target object images corresponding to the image frame to be analyzed is completed;
generating a single-object image fingerprint set according to each target single-object image fingerprint to obtain a target single-object image fingerprint set;
generating a single image frame fingerprint according to the time axis position corresponding to the image frame to be analyzed and the target single object image fingerprint set to obtain the single image frame fingerprint to be judged corresponding to the image frame to be analyzed;
and repeatedly executing the step of acquiring any image frame to be identified as an image frame to be analyzed until the acquisition of the image frame to be identified is completed.
Further, the step of performing similar video matching in the acquired preset video fingerprint database according to each single image frame fingerprint to be judged to obtain a target similar video corresponding to the video to be identified includes:
acquiring any one of the single image frame fingerprints to be judged as a target single image frame fingerprint;
acquiring the preset video fingerprint database;
similarity calculation is carried out on the target single image frame fingerprint and each single image frame fingerprint in the preset video fingerprint database, and similarity to be evaluated is obtained;
finding out a value larger than the acquired preset threshold value from each similarity to be evaluated to obtain a candidate similarity set;
finding out the similarity to be evaluated with the maximum value from the candidate similarity set to obtain a target similarity;
taking the single image frame fingerprint corresponding to the target similarity in the preset video fingerprint library as a single image frame fingerprint to be processed;
repeatedly executing the step of acquiring any one of the to-be-judged single image frame fingerprints as a target single image frame fingerprint until the acquisition of each to-be-judged single image frame fingerprint is completed;
according to the single image frame fingerprints to be processed, similarity calculation is carried out on each single image frame fingerprint set in the preset video fingerprint database to obtain video similarity to be analyzed;
and determining the target similar video corresponding to the video to be identified in the preset video fingerprint database according to the video similarity to be analyzed and the acquired preset video similarity threshold.
Further, before the step of obtaining the preset video fingerprint database, the method further includes:
acquiring a plurality of videos to be analyzed;
acquiring any one of the videos to be analyzed as a video to be processed;
acquiring image frames from the video to be processed by adopting the acquired preset sampling proportion to obtain a plurality of standard image frames;
acquiring any one of the standard image frames as an image frame to be processed;
respectively carrying out object image identification and object name classification prediction on the image frame to be processed by adopting the preset image identification and name prediction model to obtain each object image to be analyzed and each object name to be analyzed corresponding to the image frame to be processed;
according to the image frames to be processed, carrying out coordinate calculation of the preset identification points on each object image to be analyzed to obtain the coordinates of the object identification points to be analyzed corresponding to each object image to be analyzed;
performing single image frame fingerprint generation according to the identification point coordinates of each object to be analyzed, each image of the object to be analyzed and each name of the object to be analyzed to obtain standard single image frame fingerprints;
repeatedly executing the step of acquiring any one standard image frame as an image frame to be processed until the acquisition of each standard image frame is completed;
generating a video fingerprint according to the preset sampling proportion and each standard single image frame fingerprint to obtain a standard video fingerprint corresponding to the image frame to be processed;
and repeatedly executing the step of acquiring any one of the videos to be analyzed as the video to be processed until the acquisition of each video to be analyzed is completed, and generating the preset video fingerprint database according to each standard video fingerprint.
Further, the step of calculating the similarity between the target single image frame fingerprint and each single image frame fingerprint in the preset video fingerprint database to obtain the similarity to be evaluated includes:
acquiring one single image frame fingerprint from the preset video fingerprint database to serve as a single image frame fingerprint to be matched;
acquiring any object name from the target single image frame fingerprint as an object name to be identified;
carrying out object name matching on the object name to be identified in the single image frame fingerprint to be matched to obtain a matching result;
when the matching result is successful, performing difference calculation on the object identification point included angle corresponding to the object name to be identified and the object identification point included angle corresponding to the matching result in the single image frame fingerprint to be matched to obtain a difference value of the included angles to be analyzed;
when the matching result is failure, taking the obtained preset included angle difference value as the included angle difference value to be analyzed;
repeatedly executing the step of acquiring any object name from the target single image frame fingerprint as the object name to be identified until the acquisition of each object name in the target single image frame fingerprint is completed;
determining the similarity to be evaluated corresponding to the fingerprint of the single image frame to be matched according to the difference value of the included angles to be analyzed;
and repeatedly executing the step of acquiring one single image frame fingerprint from the preset video fingerprint database as the single image frame fingerprint to be matched until the acquisition of each single image frame fingerprint in the preset video fingerprint database is completed.
Further, the step of finding out a value greater than the obtained preset threshold value from each similarity to be evaluated to obtain a candidate similarity set includes:
finding out a value larger than the acquired preset threshold value from each similarity to be evaluated to obtain a similarity set to be processed;
acquiring any one of the similarity to be evaluated from the similarity set to be processed as the similarity to be analyzed;
according to each object image corresponding to the similarity to be analyzed in the preset video fingerprint library and each object image corresponding to the target single image frame fingerprint, carrying out shape similarity judgment on object images with the same object name to obtain a shape similarity judgment result;
repeatedly executing the step of obtaining any one of the similarities to be evaluated from the similarity sets to be processed as the similarity to be analyzed until the obtaining of each similarity to be evaluated in the similarity sets to be processed is completed;
and taking each similarity to be evaluated corresponding to each successful shape similarity judgment result as the candidate similarity set.
The application also provides a video matching device based on video fingerprint, the device includes:
the data acquisition module is used for acquiring a plurality of image frames to be identified corresponding to the video to be identified;
the object image identification and object name classification prediction module is used for respectively carrying out object image identification and object name classification prediction on each image frame to be identified by adopting a preset image identification and name prediction model to obtain each target object image and each target object name corresponding to each image frame to be identified;
the target object identification point coordinate determination module is used for performing coordinate calculation of preset identification points on each target object image corresponding to each image frame to be identified to obtain a target object identification point coordinate corresponding to each target object image;
the to-be-judged single image frame fingerprint determining module is used for generating a single image frame fingerprint according to the identification point coordinates of each target object, each target object image and each target object name corresponding to each to-be-identified image frame to obtain the to-be-judged single image frame fingerprint;
and the target similar video determining module is used for performing similar video matching in the acquired preset video fingerprint database according to each single image frame fingerprint to be judged to obtain a target similar video corresponding to the video to be identified.
The present application further proposes a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of any of the above methods when executing the computer program.
The present application also proposes a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any of the above.
In the video matching method, device, equipment and storage medium based on video fingerprints, the method first obtains a plurality of image frames to be recognized corresponding to a video to be recognized. It then adopts a preset image recognition and name prediction model to perform object image recognition and object name classification prediction on each image frame to be recognized, obtaining each target object image and each target object name corresponding to each image frame; calculates the coordinates of a preset identification point for each target object image, obtaining the target object identification point coordinate corresponding to each target object image; and generates a single image frame fingerprint from the target object identification point coordinates, target object images and target object names corresponding to each image frame to be recognized, obtaining the single image frame fingerprints to be judged. Finally, it performs similar video matching in an acquired preset video fingerprint database according to each single image frame fingerprint to be judged, obtaining the target similar video corresponding to the video to be recognized. In this way, the single image frame fingerprint is generated from object identification point coordinates, object images and object names, and similar video matching is performed in the preset video fingerprint database according to the generated fingerprints; the method for generating the single image frame fingerprint is simple, practical and well suited to commercialization.
Drawings
Fig. 1 is a schematic flowchart of a video matching method based on video fingerprints according to an embodiment of the present application;
FIG. 2 is a block diagram illustrating a video matching apparatus based on video fingerprints according to an embodiment of the present application;
fig. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, an embodiment of the present application provides a video matching method based on video fingerprints, where the method includes:
s1: acquiring a plurality of image frames to be identified corresponding to a video to be identified;
s2: respectively carrying out object image identification and object name classification prediction on each image frame to be identified by adopting a preset image identification and name prediction model to obtain each target object image and each target object name corresponding to each image frame to be identified;
s3: performing coordinate calculation of a preset identification point on each target object image corresponding to each image frame to be recognized to obtain a target object identification point coordinate corresponding to each target object image;
s4: generating a single image frame fingerprint according to the coordinates of each target object identification point corresponding to each image frame to be identified, each target object image and each target object name to obtain a single image frame fingerprint to be judged;
s5: and performing similar video matching in the acquired preset video fingerprint database according to each single image frame fingerprint to be judged to obtain a target similar video corresponding to the video to be identified.
In this embodiment, a plurality of image frames to be recognized corresponding to a video to be recognized are first obtained. A preset image recognition and name prediction model is then used to perform object image recognition and object name classification prediction on each image frame to be recognized, obtaining each target object image and each target object name corresponding to each image frame; the coordinates of a preset identification point are calculated for each target object image, obtaining the target object identification point coordinate corresponding to each target object image; and a single image frame fingerprint is generated from the target object identification point coordinates, target object images and target object names corresponding to each image frame to be recognized, obtaining the single image frame fingerprints to be judged. Finally, similar video matching is performed in the acquired preset video fingerprint library according to each single image frame fingerprint to be judged, obtaining the target similar video corresponding to the video to be recognized. In this way, the single image frame fingerprint is generated from the object identification point coordinates, object image and object name, and similar video matching is performed in the preset video fingerprint database according to the generated fingerprints; the method for generating the single image frame fingerprint is simple, practical and well suited to commercialization.
For S1, the plurality of image frames to be recognized corresponding to the video to be recognized may be acquired from user input, from a database, or from a third-party application system.
The video to be identified is the video needing to search for similar videos.
The plurality of image frames to be identified are a plurality of continuous image frames in the video to be identified. An image frame is the smallest unit constituting a video.
It is to be understood that the plurality of image frames to be identified may also be a plurality of non-consecutive image frames in the video to be identified, and is not limited herein.
For S2, a preset image recognition and name prediction model is adopted to perform object image recognition on the image frame to be recognized, and each object image obtained through recognition is used as a target object image; and carrying out object name classification prediction on the target object image by adopting a preset image recognition and name prediction model, wherein the object name obtained by classification prediction is used as the target object name. That is, one or more target object images correspond to one image frame to be recognized, and each target object image corresponds to one target object name.
The preset image recognition and name prediction model is obtained by training a convolutional neural network by adopting a plurality of training samples. The training samples include: image sample, object image calibration and object name calibration values.
The object image is an image of an object. Objects include, but are not limited to: animals, plants, vehicles, household goods.
The object name is the name of an object, such as a rabbit, a vase, a table, and a chair.
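The mapping from one frame's model output to "target object images" and "target object names" can be illustrated with a minimal sketch. The detection format `(object_name, bounding_box)` and the `detect` stub are assumptions for illustration; the patent fixes no concrete model interface, only that a trained CNN returns recognized object images and classified names.

```python
from typing import List, Tuple

# Assumed detection format: (object_name, bounding_box), where the box
# is (x_min, y_min, x_max, y_max) in pixel coordinates of the frame.
Detection = Tuple[str, Tuple[int, int, int, int]]

def detect(frame) -> List[Detection]:
    """Placeholder for the preset image recognition and name prediction
    model; a real system would run a trained CNN here."""
    raise NotImplementedError

def split_detections(detections: List[Detection]):
    """Separate one frame's detections into the target object images
    (represented here by their bounding boxes) and target object names."""
    names = [name for name, _ in detections]
    boxes = [box for _, box in detections]
    return boxes, names

# Hand-written detections standing in for model output on one frame:
dets = [("rabbit", (10, 20, 60, 90)), ("vase", (100, 40, 140, 120))]
boxes, names = split_detections(dets)  # names: ["rabbit", "vase"]
```

Each box in `boxes` then pairs with the name at the same index, matching the statement that every target object image corresponds to one target object name.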
For S3, coordinate calculation of the preset identification point is performed on each target object image corresponding to each image frame to be recognized, obtaining the target object identification point coordinate corresponding to each target object image. That is, the target object identification point coordinate is the pixel coordinate, within the image frame to be recognized, of the pixel point in the target object image that corresponds to the preset identification point.
The preset identification point is a predefined reference point on the target object image. The identification point may be any one of: the upper left corner, the lower right corner, the center point, the lower left corner and the upper right corner.
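Computing the identification point coordinate reduces to simple arithmetic on the object's bounding box. The sketch below assumes the box format `(x_min, y_min, x_max, y_max)`; the supported point choices follow the list above.

```python
def identification_point(box, point="center"):
    """Return the pixel coordinate of the preset identification point
    for a bounding box (x_min, y_min, x_max, y_max) of a target object
    image within the frame."""
    x0, y0, x1, y1 = box
    points = {
        "top_left": (x0, y0),
        "top_right": (x1, y0),
        "bottom_left": (x0, y1),
        "bottom_right": (x1, y1),
        "center": ((x0 + x1) / 2, (y0 + y1) / 2),
    }
    return points[point]

# Center of a box spanning x 10..60, y 20..90 is (35.0, 55.0).
coord = identification_point((10, 20, 60, 90), "center")
```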
For S4, the object identification point included angle is calculated according to the target object identification point coordinates; a single image frame fingerprint is generated according to each object identification point included angle, each target object identification point coordinate, each target object image and each target object name corresponding to each image frame to be recognized; and the generated single image frame fingerprint is taken as the single image frame fingerprint to be judged.
Calculating the object identification point included angle according to the target object identification point coordinates means calculating the angle between the target axis and the line connecting the origin of the image frame to be recognized with the pixel point corresponding to the preset identification point in the target object image.
The target axis includes: any one of the positive direction of the x axis of the image frame to be recognized, the negative direction of the x axis of the image frame to be recognized, the positive direction of the y axis of the image frame to be recognized and the negative direction of the y axis of the image frame to be recognized.
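With the frame origin at (0, 0), this angle follows directly from `atan2`. The patent names the quantity but gives no formula, so the degree convention and the per-axis offsets below are assumptions for illustration.

```python
import math

def identification_point_angle(coord, axis="x_pos"):
    """Angle in degrees between the line from the frame origin (0, 0)
    to the identification point and the chosen target axis
    (positive/negative x or y axis of the frame)."""
    x, y = coord
    angle = math.degrees(math.atan2(y, x))  # measured from positive x-axis
    offsets = {"x_pos": 0.0, "y_pos": 90.0, "x_neg": 180.0, "y_neg": 270.0}
    return (angle - offsets[axis]) % 360.0

# A point at (35, 35) lies at 45 degrees from the positive x-axis.
theta = identification_point_angle((35, 35))
```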
The single image frame fingerprint includes: a time axis position and a single-object image fingerprint set, wherein the single-object image fingerprint set comprises zero or more single-object image fingerprints. The single-object image fingerprint includes: the object name, the object identification point coordinates, the object image and the object identification point included angle. That is, each single-object image fingerprint set corresponds to one image frame.
The time axis position is the time point of the image frame in the video.
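The fingerprint structure described in the two preceding paragraphs can be written down as plain data types. The class and field names below are illustrative choices, not terms fixed by the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SingleObjectFingerprint:
    object_name: str                  # e.g. "rabbit"
    point_coord: Tuple[float, float]  # object identification point coordinate
    angle: float                      # object identification point included angle
    object_image: object = None       # cropped object image (any array type)

@dataclass
class SingleImageFrameFingerprint:
    timeline_position: float          # time point of the frame in the video
    objects: List[SingleObjectFingerprint] = field(default_factory=list)

# One frame at 12.5 s containing a single recognized object:
fp = SingleImageFrameFingerprint(
    timeline_position=12.5,
    objects=[SingleObjectFingerprint("rabbit", (35.0, 55.0), 45.0)],
)
```

An empty `objects` list is valid, matching the statement that a single-object image fingerprint set may contain zero fingerprints.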
For S5, the preset video fingerprint library includes: a plurality of standard video fingerprints. Standard video fingerprints include: video identification, sampling proportion and a single image frame fingerprint set, wherein the single image frame fingerprint set comprises one or more single image frame fingerprints.
Similarity calculation is performed between each single image frame fingerprint to be judged and each single image frame fingerprint in the acquired preset video fingerprint database; the video identifier of the video in the preset video fingerprint database that is most similar to the video to be recognized is determined according to the resulting similarity data; and the video corresponding to that video identifier is taken as the target similar video. That is, the target similar video is a video corresponding to a video identifier in the preset video fingerprint library.
In an embodiment, the step of obtaining a plurality of image frames to be recognized corresponding to the video to be recognized includes:
s11: acquiring the video to be identified;
s12: acquiring an extraction time range according to the video to be identified;
s13: and extracting each image frame corresponding to the extraction time range from the video to be identified to obtain a plurality of image frames to be identified corresponding to the video to be identified.
In this embodiment, continuous image frames are used as the plurality of image frames to be identified corresponding to the video to be identified, which helps reduce the time span of the image frames and improves the matching efficiency of similar videos.
For S11, the video to be recognized may be obtained from user input, from a database, or from a third-party application system.
For S12, a preset time range determination rule is adopted, and an extraction time range is obtained according to the total duration of the video to be identified.
The preset time range determination rule includes two parameters: the start-time duration proportion and the extraction duration proportion.
The start-time duration proportion is the ratio of the time point corresponding to the first image frame to be identified to the total duration of the video to be identified. For example, if the start-time duration proportion is 5% and the total duration of the video to be recognized is 100 minutes, the image frame corresponding to the 5th minute (100 minutes × 5%) of the video is taken as the first image frame to be recognized; this example is not limiting.
The extraction duration proportion determines the time span of the plurality of image frames to be identified corresponding to the video to be identified. For example, if the extraction duration proportion is 20%, the total duration of the video to be recognized is 100 minutes, and the start-time duration proportion is 5%, then starting from the image frame corresponding to the 5th minute (100 minutes × 5%), the image frames within the following 20 minutes (100 minutes × 20%) are obtained, and each obtained image frame is taken as an image frame to be identified; this example is not limiting.
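As a minimal illustrative sketch of the rule above (the function and parameter names `start_ratio` and `span_ratio` are assumptions for this example, not terms from the specification):

```python
# Sketch of the preset time range determination rule described above.
# start_ratio  : start-time duration proportion (e.g. 0.05 for 5%)
# span_ratio   : extraction duration proportion (e.g. 0.20 for 20%)

def extraction_time_range(total_minutes, start_ratio, span_ratio):
    """Return (start, end) in minutes of the frame-extraction window."""
    start = total_minutes * start_ratio        # e.g. 100 min * 5%  -> minute 5
    end = start + total_minutes * span_ratio   # e.g. + 100 min * 20% -> minute 25
    return start, end
```

With the 100-minute example from the text, the window runs from minute 5 to minute 25.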
For S13, extracting respective image frames corresponding to the extraction time range from the video to be recognized, and regarding each extracted image frame as one image frame to be recognized.
In an embodiment, the step of performing single-image-frame fingerprint generation according to the coordinates of the identification point of each target object, the image of each target object, and the name of each target object corresponding to each image frame to be recognized to obtain a single-image-frame fingerprint to be determined includes:
s41: calculating an included angle of each target object identification point coordinate by adopting a preset included angle calculation rule to obtain a target object identification point included angle;
s42: acquiring any image frame to be identified as an image frame to be analyzed;
s43: acquiring any one of the target object images from each target object image corresponding to the image frame to be analyzed as an object image to be processed;
s44: generating a single-object image fingerprint according to the object image to be processed and the target object identification point coordinates, the target object identification point included angle and the target object name corresponding to the object image to be processed to obtain a target single-object image fingerprint;
s45: repeatedly executing the step of acquiring any one of the target object images from the target object images corresponding to the image frame to be analyzed as an object image to be processed until the acquisition of the target object images corresponding to the image frame to be analyzed is completed;
s46: generating a single-object image fingerprint set according to each target single-object image fingerprint to obtain a target single-object image fingerprint set;
s47: generating a single image frame fingerprint according to the time axis position corresponding to the image frame to be analyzed and the target single object image fingerprint set to obtain the single image frame fingerprint to be judged corresponding to the image frame to be analyzed;
s48: and repeatedly executing the step of acquiring any image frame to be identified as an image frame to be analyzed until the acquisition of the image frame to be identified is completed.
In this method, the identification point included angle of each object is calculated first; single-object image fingerprints are then generated from the object images, the object identification point coordinates, the object identification point included angles, and the object names; and the single image frame fingerprint is finally generated from the generated single-object image fingerprints.
For S41, a preset included angle calculation rule is adopted to calculate the object identification point included angle of each target object identification point coordinate in its corresponding image frame to be recognized, and the calculated object identification point included angle is taken as the target object identification point included angle.
Optionally, the preset included angle calculation rule is: connect the object identification point coordinates to the origin, and calculate the included angle between this connecting line and the positive x-axis direction of the image frame.
Optionally, the preset included angle calculation rule is: connect the object identification point coordinates to the origin, and calculate the included angle between this connecting line and the positive y-axis direction of the image frame.
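A hedged sketch of these two optional rules, measuring the angle of the line from the frame origin to the identification point against the positive x- or y-axis; note the specification does not fix the coordinate convention (image coordinates with y pointing down would flip the sign), so this is an illustration only:

```python
import math

def identification_point_angle(x, y, axis="x"):
    """Angle in degrees between the line joining the frame origin to the
    identification point (x, y) and the positive x- or y-axis."""
    angle_from_x = math.degrees(math.atan2(y, x))
    if axis == "x":
        return angle_from_x
    # The angle to the positive y-axis is the complement of the x-axis angle.
    return 90.0 - angle_from_x
```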
For S42, acquiring any one of the image frames to be recognized, and taking the acquired image frame to be recognized as an image frame to be analyzed.
For S43, any one of the target object images is acquired from each of the target object images corresponding to the image frame to be analyzed, and the acquired target object image is taken as an object image to be processed.
For S44, a preset single-object image fingerprint generation specification is adopted, single-object image fingerprint generation is carried out according to the to-be-processed object image and the target object identification point coordinates, the target object identification point included angle and the target object name corresponding to the to-be-processed object image, and the generated single-object image fingerprint is used as the target single-object image fingerprint corresponding to the to-be-processed object image.
The preset single object image fingerprint generation specification is as follows: { object name, object identification point coordinates, object image, object identification point included angle }. That is, the single object image fingerprint includes: { object name, object identification point coordinates, object image, object identification point included angle }.
For S45, the steps S43 to S44 are repeatedly executed until the acquisition of each target object image corresponding to the image frame to be analyzed is completed. And when the acquisition of each target object image corresponding to the image frame to be analyzed is finished, generating a single object image fingerprint for each object image in the image frame to be analyzed is finished.
For S46, a single object image fingerprint set is generated from each of the target single object image fingerprints, and the generated single object image fingerprint set is used as the target single object image fingerprint set.
For S47, a preset single image frame fingerprint generation specification is adopted to generate a single image frame fingerprint from the time axis position corresponding to the image frame to be analyzed and the target single object image fingerprint set, and the generated single image frame fingerprint is taken as the to-be-judged single image frame fingerprint corresponding to the image frame to be analyzed.
The preset single image frame fingerprint generation specification is as follows: [ time axis position, { object name, object identification point coordinates, object image, object identification point included angle } … { object name, object identification point coordinates, object image, object identification point included angle } ]. That is, a single image frame fingerprint includes: [ time axis position, { object name, object identification point coordinates, object image, object identification point included angle } … { object name, object identification point coordinates, object image, object identification point included angle } ].
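For illustration only, the two specifications above might be rendered as plain data structures; the field and function names here are assumptions, not mandated by the text:

```python
# Illustrative rendering of the single-object and single-image-frame
# fingerprint specifications as Python dictionaries.

def make_single_object_fingerprint(name, point_xy, image, angle):
    # { object name, object identification point coordinates,
    #   object image, object identification point included angle }
    return {"name": name, "point": point_xy, "image": image, "angle": angle}

def make_single_frame_fingerprint(timeline_position, object_fingerprints):
    # [ time axis position, { ... } ... { ... } ]
    return {"t": timeline_position, "objects": list(object_fingerprints)}
```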
For S48, steps S42 to S48 are repeatedly executed until the acquisition of the image frames to be recognized is completed. When this acquisition is completed, the single image frame fingerprints of all the image frames to be identified corresponding to the video to be identified have been generated.
In an embodiment, the step of performing similar video matching in the acquired preset video fingerprint database according to each single image frame fingerprint to be determined to obtain a target similar video corresponding to the video to be identified includes:
s51: acquiring any one of the single image frame fingerprints to be judged as a target single image frame fingerprint;
s52: acquiring the preset video fingerprint database;
s53: similarity calculation is carried out on the target single image frame fingerprint and each single image frame fingerprint in the preset video fingerprint database, and similarity to be evaluated is obtained;
s54: finding out a value larger than the acquired preset threshold value from each similarity to be evaluated to obtain a candidate similarity set;
s55: finding out the similarity to be evaluated with the maximum value from the candidate similarity set to obtain a target similarity;
s56: taking the single image frame fingerprint corresponding to the target similarity in the preset video fingerprint library as a single image frame fingerprint to be processed;
s57: repeatedly executing the step of acquiring any one of the to-be-judged single image frame fingerprints as a target single image frame fingerprint until the acquisition of each to-be-judged single image frame fingerprint is completed;
s58: according to the single image frame fingerprints to be processed, similarity calculation is carried out on each single image frame fingerprint set in the preset video fingerprint database to obtain video similarity to be analyzed;
s59: and determining the target similar video corresponding to the video to be identified in the preset video fingerprint database according to the video similarity to be analyzed and the acquired preset video similarity threshold.
According to the embodiment, similar video matching is carried out on the single image frame fingerprint to be judged in the acquired preset video fingerprint database, and the method for generating the single image frame fingerprint is simple, good in practicability and beneficial to commercialization.
For S51, any one of the to-be-judged single image frame fingerprints is acquired, and the acquired fingerprint is taken as the target single image frame fingerprint.
For S52, the preset video fingerprint database may be obtained from a database, or may be obtained from a third-party application system.
For step S53, similarity calculation of each object image is performed on the target single image frame fingerprint and each single image frame fingerprint in the preset video fingerprint library, the similarity between the target single image frame fingerprint and one single image frame fingerprint in the preset video fingerprint library is determined according to the calculated similarity of each object image, and the determined similarity is used as the similarity to be evaluated. That is, the similarity to be evaluated is the similarity between two single image frame fingerprints.
For step S54, a value greater than the obtained preset threshold is found from each similarity to be evaluated, and each similarity to be evaluated corresponding to each found value is taken as the candidate similarity set.
The candidate similarity set may contain zero, one, or multiple similarities to be evaluated.
For step S55, the similarity to be evaluated with the largest value is found from the candidate similarity set, and the found similarity to be evaluated is taken as a target similarity.
For S56, the single image frame fingerprint corresponding to the target similarity in the preset video fingerprint library is taken as a single image frame fingerprint to be processed, so that a similar single image frame fingerprint of the target single image frame fingerprint is found from the preset video fingerprint library.
For S57, the steps S51 to S57 are repeatedly executed until the acquisition of the fingerprint of each to-be-determined single image frame is completed. And when the acquisition of the fingerprints of the single image frames to be judged is finished, finding out similar fingerprints of the single image frames to be identified corresponding to the videos to be identified from the preset video fingerprint database.
For S58, a weighted summation is performed over the target similarities of the to-be-processed single image frame fingerprints corresponding to the same standard video fingerprint, and the result of the weighted summation is taken as the to-be-analyzed video similarity corresponding to that standard video fingerprint. That is, the to-be-analyzed video similarity is the similarity between the standard video fingerprint and the video to be identified.
For step S59, finding out a value greater than the preset video similarity threshold from the video similarities to be analyzed, and taking the video similarities to be analyzed corresponding to the found values as a candidate video similarity set; finding out the largest video similarity to be analyzed from the candidate video similarity set, and taking the found video similarity to be analyzed as a target video similarity; taking a video identifier corresponding to the similarity of the target video in the preset video fingerprint database as a target video identifier; and taking the video corresponding to the target video identification as the target similar video corresponding to the video to be identified.
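The selection in S59 can be sketched as follows, assuming the to-be-analyzed video similarities have already been computed per video identifier (the mapping layout and function name are assumptions for this example):

```python
# Minimal sketch of S59: threshold the per-video similarities, then pick
# the video identifier with the largest remaining similarity.

def pick_target_video(similarities, threshold):
    """similarities: {video_id: to-be-analyzed video similarity}.
    Returns the identifier of the most similar video above the preset
    video similarity threshold, or None if nothing passes."""
    candidates = {vid: s for vid, s in similarities.items() if s > threshold}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```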
In an embodiment, before the step of obtaining the preset video fingerprint database, the method further includes:
s521: acquiring a plurality of videos to be analyzed;
s522: acquiring any one of the videos to be analyzed as a video to be processed;
s523: acquiring image frames from the video to be processed by adopting the acquired preset sampling proportion to obtain a plurality of standard image frames;
s524: acquiring any one of the standard image frames as an image frame to be processed;
s525: respectively carrying out object image identification and object name classification prediction on the image frame to be processed by adopting the preset image identification and name prediction model to obtain each object image to be analyzed and each object name to be analyzed corresponding to the image frame to be processed;
s526: according to the image frames to be processed, carrying out coordinate calculation of the preset identification points on each object image to be analyzed to obtain the coordinates of the object identification points to be analyzed corresponding to each object image to be analyzed;
s527: performing single image frame fingerprint generation according to the identification point coordinates of each object to be analyzed, each image of the object to be analyzed and each name of the object to be analyzed to obtain standard single image frame fingerprints;
s528: repeatedly executing the step of acquiring any one standard image frame as an image frame to be processed until the acquisition of each standard image frame is completed;
s529: generating a video fingerprint according to the preset sampling proportion and each standard single image frame fingerprint to obtain a standard video fingerprint corresponding to the video to be processed;
s5210: and repeatedly executing the step of acquiring any one of the videos to be analyzed as the video to be processed until the acquisition of each video to be analyzed is completed, and generating the preset video fingerprint database according to each standard video fingerprint.
According to the embodiment, single image frame fingerprint generation is carried out according to the identification point coordinates of each object to be analyzed, each object image to be analyzed and each object name to be analyzed, and the video fingerprint is generated according to the preset sampling proportion and each standard single image frame fingerprint.
For S521, a plurality of videos to be analyzed input by the user may be acquired, or a plurality of videos to be analyzed may be acquired from a third-party application system.
A video to be analyzed is a video from which a video fingerprint needs to be extracted.
For step S522, any one of the videos to be analyzed is acquired, and the acquired video to be analyzed is used as a video to be processed.
For step S523, the preset sampling ratio may be obtained from the database, obtained from a third-party application system, or written into the program file implementing the present application.
Optionally, the predetermined sampling ratio is a time interval.
Optionally, the preset sampling ratio is a ratio of the number of extracted image frames to the total number of image frames in the video.
Image frames are extracted from the video to be processed at the obtained preset sampling ratio using an average (evenly spaced) extraction method, and each extracted image frame is taken as a standard image frame.
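A minimal sketch of average extraction under the frame-count variant of the sampling ratio (the function name and the rounding choice are illustrative assumptions; the time-interval variant would instead step by a fixed number of seconds):

```python
# Sketch of evenly spaced ("average") frame sampling: pick
# round(total_frames * sampling_ratio) indices spread over the video.

def sample_frame_indices(total_frames, sampling_ratio):
    """Return evenly spaced frame indices covering the whole video."""
    n = max(1, round(total_frames * sampling_ratio))
    step = total_frames / n
    return [int(i * step) for i in range(n)]
```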
For S524, acquiring any one of the standard image frames, and taking the acquired standard image frame as an image frame to be processed.
For step 525, a preset image recognition and name prediction model is adopted to perform object image recognition on the image frame to be processed, so as to obtain each object image to be analyzed corresponding to the image frame to be processed; and carrying out object name classification prediction on the object image to be analyzed by adopting a preset image recognition and name prediction model to obtain the object name to be analyzed.
For step S526, performing coordinate calculation of a preset identification point on each object image to be analyzed corresponding to the image frame to be processed, to obtain the coordinates of the identification point of the object to be analyzed corresponding to each object image to be analyzed.
For S527, a preset included angle calculation rule is adopted to calculate the included angle of each to-be-analyzed object identification point coordinate, obtaining the standard object identification point included angle.
Then, using the preset single object image fingerprint generation specification, a single object image fingerprint is generated from each to-be-analyzed object image together with its corresponding to-be-analyzed object identification point coordinates, standard object identification point included angle, and to-be-analyzed object name, yielding a to-be-analyzed single object image fingerprint; a single object image fingerprint set is generated from the to-be-analyzed single object image fingerprints and taken as the to-be-analyzed single object image fingerprint set; finally, using the preset single image frame fingerprint generation specification, a single image frame fingerprint is generated from the time axis position corresponding to the image frame to be processed and the to-be-analyzed single object image fingerprint set, and the generated single image frame fingerprint is taken as the standard single image frame fingerprint corresponding to the image frame to be processed.
For S528, steps S524 to S528 are repeatedly executed until the acquisition of each standard image frame is completed. When the acquisition of each standard image frame is completed, the generation of the single image frame fingerprint of each standard image frame corresponding to the video to be processed is completed.
For S529, a preset video fingerprint generation specification is adopted to generate a video fingerprint from the preset sampling ratio and each standard single image frame fingerprint, and the generated video fingerprint is taken as the standard video fingerprint corresponding to the video to be processed.
The preset video fingerprint generation specification is as follows: video identification, sampling proportion, [ time axis position, { object name, object identification point coordinates, object image, object identification point included angle } … { object name, object identification point coordinates, object image, object identification point included angle } ] … [ time axis position, { object name, object identification point coordinates, object image, object identification point included angle } … { object name, object identification point coordinates, object image, object identification point included angle } ]. That is, the video fingerprint includes: video identification, sampling proportion, and the sequence of single image frame fingerprints in the above form.
The video identifier may be a video name, a video ID, or the like, which uniquely identifies a video.
For S5210, steps S522 to S5210 are repeatedly performed until the acquisition of each of the videos to be analyzed is completed. When the acquisition of each video to be analyzed is completed, the generation of the video fingerprint of each video to be analyzed is completed, and therefore the preset video fingerprint database is generated according to each standard video fingerprint.
In an embodiment, the step of calculating the similarity between the target single image frame fingerprint and each single image frame fingerprint in the preset video fingerprint database to obtain the similarity to be evaluated includes:
s531: acquiring one single image frame fingerprint from the preset video fingerprint database to serve as a single image frame fingerprint to be matched;
s532: acquiring any object name from the target single image frame fingerprint as an object name to be identified;
s533: carrying out object name matching on the object name to be identified in the single image frame fingerprint to be matched to obtain a matching result;
s534: when the matching result is successful, performing difference calculation on the object identification point included angle corresponding to the object name to be identified and the object identification point included angle corresponding to the matching result in the single image frame fingerprint to be matched to obtain a difference value of the included angles to be analyzed;
s535: when the matching result is failure, taking the obtained preset included angle difference value as the included angle difference value to be analyzed;
s536: repeatedly executing the step of acquiring any object name from the target single image frame fingerprint as the object name to be identified until the acquisition of each object name in the target single image frame fingerprint is completed;
s537: determining the similarity to be evaluated corresponding to the fingerprint of the single image frame to be matched according to the difference value of the included angles to be analyzed;
s538: and repeatedly executing the step of acquiring one single image frame fingerprint from the preset video fingerprint database as the single image frame fingerprint to be matched until the acquisition of each single image frame fingerprint in the preset video fingerprint database is completed.
In the embodiment, the included angle difference value is calculated based on the object name and the included angle of the object identification point, the similarity between the fingerprints of the single image frames is determined according to the calculated included angle difference value, and the method for generating the similarity between the fingerprints of the single image frames is simple, good in practicability and beneficial to commercialization.
For S531, one single image frame fingerprint is acquired from the preset video fingerprint library, and the acquired single image frame fingerprint is taken as the to-be-matched single image frame fingerprint.
For step S532, any object name is obtained from the target single image frame fingerprint, and the obtained object name is used as the object name to be identified.
For step S533, the object name to be identified is subjected to object name matching in the single image frame fingerprint to be matched, and when the object name is matched, the matching result is determined to be successful, otherwise, the matching result is determined to be failed.
For S534, when the matching result is successful, the difference between the object identification point included angle corresponding to the object name to be identified and the object identification point included angle corresponding to the matching result in the to-be-matched single image frame fingerprint is calculated, and the calculated difference is taken as the to-be-analyzed included angle difference.
For step S535, when the matching result is failure, the obtained preset included angle difference is used as the included angle difference to be analyzed, so as to facilitate the similarity calculation between two single image frame fingerprints accurately according to the included angle between each object name and each object identification point.
For S536, the steps S532 to S536 are repeatedly executed until the acquisition of each object name in the target single image frame fingerprint is completed. When the acquisition of the object names in the target single image frame fingerprint is completed, the matching of each object in the target single image frame fingerprint and the single image frame fingerprint to be matched is completed.
For S537, the to-be-analyzed included angle differences are weighted and summed, and the resulting value is taken as the to-be-evaluated similarity corresponding to the to-be-matched single image frame fingerprint.
For S538, steps S531 to S538 are repeated until the acquisition of each single image frame fingerprint in the preset video fingerprint library is completed. When this acquisition is completed, the similarity calculation between the target single image frame fingerprint and each single image frame fingerprint in the preset video fingerprint library is complete.
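Steps S532 to S537 can be sketched as follows; the equal weights, the preset penalty value, and the mapping from the summed angle difference to a similarity score are illustrative assumptions, since the text does not fix them:

```python
# Sketch of S532-S537: per-object-name angle differences, with a preset
# penalty for unmatched names, combined by a weighted sum and mapped to a
# similarity score. The 90-degree penalty and [0, 1] mapping are assumptions.

def frame_similarity(target, candidate, preset_diff=90.0):
    """target/candidate: {object name: identification-point angle in degrees}."""
    diffs = []
    for name, angle in target.items():
        if name in candidate:                      # S534: name match succeeded
            diffs.append(abs(angle - candidate[name]))
        else:                                      # S535: name match failed
            diffs.append(preset_diff)
    if not diffs:
        return 0.0
    weight = 1.0 / len(diffs)                      # equal weights for the sketch
    mean_diff = sum(d * weight for d in diffs)
    return 1.0 - mean_diff / 180.0                 # smaller difference -> higher score
```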
In an embodiment, the step of finding out a value greater than the obtained preset threshold from each of the similarities to be evaluated to obtain a candidate similarity set includes:
s541: finding out a value larger than the acquired preset threshold value from each similarity to be evaluated to obtain a similarity set to be processed;
s542: acquiring any one of the similarity to be evaluated from the similarity set to be processed as the similarity to be analyzed;
s543: according to each object image corresponding to the similarity to be analyzed in the preset video fingerprint library and each object image corresponding to the target single image frame fingerprint, carrying out shape similarity judgment on object images with the same object name to obtain a shape similarity judgment result;
s544: repeatedly executing the step of obtaining any one of the similarities to be evaluated from the similarity sets to be processed as the similarity to be analyzed until the obtaining of each similarity to be evaluated in the similarity sets to be processed is completed;
s545: and taking each similarity to be evaluated corresponding to each successful shape similarity judgment result as the candidate similarity set.
In this embodiment, the to-be-processed similarity set is first determined from the to-be-evaluated similarities, and object image shape similarity is then used to further verify object similarity, thereby improving the accuracy of the single image frame fingerprint similarity.
For step S541, a value greater than the obtained preset threshold is found from each similarity to be evaluated, and the similarity to be evaluated corresponding to each found value is used as a similarity set to be processed.
For step S542, any one of the similarities to be evaluated is obtained from the similarity set to be processed, and the obtained similarity to be evaluated is used as the similarity to be analyzed.
For step S543, shape similarity determination is performed on object images with the same object name, comparing the object images corresponding to the similarity to be analyzed in the preset video fingerprint library with the object images corresponding to the target single image frame fingerprint. When the shape similarity of every such pair of object images satisfies a preset shape similarity threshold, the shape similarity determination result is determined to be successful; otherwise, it is determined to be failed.
For S544, step S542 to step S544 are repeatedly executed until the acquisition of each to-be-evaluated similarity in the to-be-processed similarity set is completed. When the acquisition of each to-be-evaluated similarity in the to-be-processed similarity set is completed, the shape similarity judgment of the object image with the same object name is performed on the single image frame fingerprint corresponding to each to-be-evaluated similarity in the to-be-processed similarity set and the target single image frame fingerprint.
For S545, the similarity to be evaluated corresponding to each successful shape similarity determination result is used as the candidate similarity set, so that the accuracy of the similarity in the candidate similarity set is improved.
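Steps S541 to S545 can be sketched as below; `shape_similar` stands in for any shape comparison satisfying the preset shape similarity threshold, which the text does not specify (in practice this could be, e.g., a contour or moment comparison):

```python
# Sketch of S541-S545: keep only threshold-passing similarities whose
# same-name object images also pass the shape similarity check.

def candidate_similarities(scored, threshold, shape_similar):
    """scored: list of (similarity, target_images, library_images), where the
    image dicts map object name -> image. Returns the candidate similarity set."""
    kept = []
    for sim, tgt_imgs, lib_imgs in scored:
        if sim <= threshold:                       # S541: below preset threshold
            continue
        shared = set(tgt_imgs) & set(lib_imgs)     # object names present in both
        if all(shape_similar(tgt_imgs[n], lib_imgs[n]) for n in shared):
            kept.append(sim)                       # S545: shape check succeeded
    return kept
```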
Referring to fig. 2, the present application also provides a video matching apparatus based on video fingerprints, the apparatus including:
a data obtaining module 100, configured to obtain a plurality of image frames to be identified corresponding to a video to be identified;
an object image recognition and object name classification prediction module 200, configured to perform object image recognition and object name classification prediction on each image frame to be recognized by using a preset image recognition and name prediction model, respectively, to obtain each target object image and each target object name corresponding to each image frame to be recognized;
a target object identification point coordinate determination module 300, configured to perform coordinate calculation of a preset identification point on each target object image corresponding to each image frame to be recognized, so as to obtain a target object identification point coordinate corresponding to each target object image;
a to-be-determined single-image-frame fingerprint determining module 400, configured to perform single-image-frame fingerprint generation according to the coordinates of the identification point of each target object, each target object image, and each target object name corresponding to each to-be-determined image frame, so as to obtain a to-be-determined single-image-frame fingerprint;
and the target similar video determining module 500 is configured to perform similar video matching in the acquired preset video fingerprint database according to each single image frame fingerprint to be determined, so as to obtain a target similar video corresponding to the video to be identified.
In this embodiment, a plurality of image frames to be recognized corresponding to a video to be recognized are first obtained; then a preset image recognition and name prediction model is used to perform object image recognition and object name classification prediction on each image frame to be recognized, obtaining each target object image and each target object name corresponding to each image frame to be recognized; coordinate calculation of a preset identification point is performed on each target object image corresponding to each image frame to be recognized, obtaining the target object identification point coordinates corresponding to each target object image; single image frame fingerprint generation is then performed according to the target object identification point coordinates, target object images, and target object names corresponding to each image frame to be recognized, obtaining the to-be-judged single image frame fingerprints; and finally, similar video matching is performed in the obtained preset video fingerprint library according to each to-be-judged single image frame fingerprint, to obtain a target similar video corresponding to the video to be identified. Thus, the single image frame fingerprint is generated based on the object identification point coordinates, the object image, and the object name, and similar video matching is carried out in the preset video fingerprint library according to the generated single image frame fingerprint; the method for generating the single image frame fingerprint is simple, practical, and conducive to commercialization.
Referring to fig. 3, a computer device, which may be a server and whose internal structure may be as shown in fig. 3, is also provided in an embodiment of the present application. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store data involved in the video matching method based on video fingerprints. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements the video matching method based on video fingerprints.
The video matching method based on the video fingerprints comprises the following steps: acquiring a plurality of image frames to be identified corresponding to a video to be identified; respectively carrying out object image identification and object name classification prediction on each image frame to be identified by adopting a preset image identification and name prediction model to obtain each target object image and each target object name corresponding to each image frame to be identified; performing coordinate calculation of a preset identification point on each target object image corresponding to each image frame to be recognized to obtain a target object identification point coordinate corresponding to each target object image; generating a single image frame fingerprint according to the coordinates of each target object identification point corresponding to each image frame to be identified, each target object image and each target object name to obtain a single image frame fingerprint to be judged; and performing similar video matching in the acquired preset video fingerprint database according to each single image frame fingerprint to be judged to obtain a target similar video corresponding to the video to be identified.
In this embodiment, a plurality of image frames to be recognized corresponding to a video to be recognized are first obtained; a preset image recognition and name prediction model is then used to perform object image recognition and object name classification prediction on each image frame to be recognized, so as to obtain each target object image and each target object name corresponding to each image frame to be recognized; coordinate calculation of a preset identification point is performed on each target object image corresponding to each image frame to be recognized, so as to obtain the target object identification point coordinates corresponding to each target object image; single-image-frame fingerprint generation is then performed according to the target object identification point coordinates, each target object image, and each target object name corresponding to each image frame to be recognized, so as to obtain the single image frame fingerprints to be determined; and finally, similar video matching is performed in the acquired preset video fingerprint library according to each single image frame fingerprint to be determined, so as to obtain a target similar video corresponding to the video to be identified. In this way, the single image frame fingerprint is generated based on the object identification point coordinates, the object image, and the object name, and similar video matching is performed in the preset video fingerprint library according to the generated single image frame fingerprints; the method of generating the single image frame fingerprint is simple, highly practical, and conducive to commercialization.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements a video matching method based on video fingerprints, including the steps of: acquiring a plurality of image frames to be identified corresponding to a video to be identified; respectively carrying out object image identification and object name classification prediction on each image frame to be identified by adopting a preset image identification and name prediction model to obtain each target object image and each target object name corresponding to each image frame to be identified; performing coordinate calculation of a preset identification point on each target object image corresponding to each image frame to be recognized to obtain a target object identification point coordinate corresponding to each target object image; generating a single image frame fingerprint according to the coordinates of each target object identification point corresponding to each image frame to be identified, each target object image and each target object name to obtain a single image frame fingerprint to be judged; and performing similar video matching in the acquired preset video fingerprint database according to each single image frame fingerprint to be judged to obtain a target similar video corresponding to the video to be identified.
The video matching method based on video fingerprints first obtains a plurality of image frames to be recognized corresponding to a video to be recognized; then adopts a preset image recognition and name prediction model to perform object image recognition and object name classification prediction on each image frame to be recognized, obtaining each target object image and each target object name corresponding to each image frame to be recognized; performs coordinate calculation of a preset identification point on each target object image corresponding to each image frame to be recognized, obtaining the target object identification point coordinates corresponding to each target object image; then performs single-image-frame fingerprint generation according to the target object identification point coordinates, each target object image, and each target object name corresponding to each image frame to be recognized, obtaining the single image frame fingerprints to be determined; and finally performs similar video matching in the acquired preset video fingerprint library according to the single image frame fingerprints to be determined, obtaining a target similar video corresponding to the video to be identified. In this way, the single image frame fingerprint is generated based on the object identification point coordinates, the object image, and the object name, and similar video matching is performed in the preset video fingerprint library according to the generated single image frame fingerprints; the method of generating the single image frame fingerprint is simple, highly practical, and conducive to commercialization.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the scope of the present application; all equivalent structures and equivalent process modifications made on the basis of the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of protection of the present application.

Claims (10)

1. A video matching method based on video fingerprints is characterized by comprising the following steps:
acquiring a plurality of image frames to be identified corresponding to a video to be identified;
respectively carrying out object image identification and object name classification prediction on each image frame to be identified by adopting a preset image identification and name prediction model to obtain each target object image and each target object name corresponding to each image frame to be identified;
performing coordinate calculation of a preset identification point on each target object image corresponding to each image frame to be recognized to obtain a target object identification point coordinate corresponding to each target object image;
generating a single image frame fingerprint according to the coordinates of each target object identification point corresponding to each image frame to be identified, each target object image and each target object name to obtain a single image frame fingerprint to be judged;
and performing similar video matching in the acquired preset video fingerprint database according to each single image frame fingerprint to be judged to obtain a target similar video corresponding to the video to be identified.
2. The video matching method based on video fingerprints as claimed in claim 1, wherein the step of obtaining a plurality of image frames to be identified corresponding to the video to be identified comprises:
acquiring the video to be identified;
acquiring an extraction time range according to the video to be identified;
and extracting each image frame corresponding to the extraction time range from the video to be identified to obtain a plurality of image frames to be identified corresponding to the video to be identified.
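The extraction-time-range step of claim 2 amounts to mapping a time interval to frame indices. A minimal sketch, assuming a known frame rate and an inclusive range (both are assumptions; the claim fixes neither):

```python
import math

def frames_in_range(fps, start_s, end_s):
    """Indices of the frames whose timestamps lie within [start_s, end_s]."""
    first = math.ceil(start_s * fps)
    last = math.floor(end_s * fps)
    return list(range(first, last + 1))

indices = frames_in_range(fps=25, start_s=1.0, end_s=2.0)
print(indices[0], indices[-1], len(indices))  # 25 50 26
```

In practice the indices would then be passed to a video decoder to pull out the actual image frames to be identified.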
3. The video matching method based on video fingerprints according to claim 1, wherein the step of generating a single-image-frame fingerprint according to the coordinates of the identification point of each target object, the image of each target object and the name of each target object corresponding to each image frame to be identified to obtain the single-image-frame fingerprint to be determined includes:
calculating an included angle of each target object identification point coordinate by adopting a preset included angle calculation rule to obtain a target object identification point included angle;
acquiring any image frame to be identified as an image frame to be analyzed;
acquiring any one of the target object images from each target object image corresponding to the image frame to be analyzed as an object image to be processed;
generating a single-object image fingerprint according to the object image to be processed and the target object identification point coordinates, the target object identification point included angle and the target object name corresponding to the object image to be processed to obtain a target single-object image fingerprint;
repeatedly executing the step of acquiring any one of the target object images from the target object images corresponding to the image frame to be analyzed as an object image to be processed until the acquisition of the target object images corresponding to the image frame to be analyzed is completed;
generating a single-object image fingerprint set according to each target single-object image fingerprint to obtain a target single-object image fingerprint set;
generating a single image frame fingerprint according to the time axis position corresponding to the image frame to be analyzed and the target single object image fingerprint set to obtain the single image frame fingerprint to be judged corresponding to the image frame to be analyzed;
and repeatedly executing the step of acquiring any image frame to be identified as an image frame to be analyzed until the acquisition of the image frame to be identified is completed.
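Claim 3's fingerprint generation can be illustrated as follows. The angle rule used here (bearing of the identification point about the frame centre) is an assumption — the claim only requires some preset included-angle calculation rule — and every name below is illustrative:

```python
import math

def identification_angle(point, center):
    """Assumed included-angle rule: bearing of the identification point about the frame centre."""
    return math.degrees(math.atan2(point[1] - center[1], point[0] - center[0])) % 360

def single_object_fingerprint(name, point, center):
    """Target single-object image fingerprint: object name, identification point, and its angle."""
    return {"name": name, "point": point, "angle": identification_angle(point, center)}

def single_frame_fingerprint(timeline_pos, objects, center):
    """Single image frame fingerprint: time-axis position plus the set of object fingerprints."""
    return {
        "t": timeline_pos,
        "objects": [single_object_fingerprint(n, p, center) for n, p in objects],
    }

fp = single_frame_fingerprint(3.5, [("car", (200, 60)), ("tree", (40, 90))], center=(160, 90))
```

The outer loops of the claim (over frames, and over objects within a frame) correspond to the two comprehensions above.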
4. The video matching method based on video fingerprints according to claim 1, wherein the step of performing similar video matching in an acquired preset video fingerprint library according to each single image frame fingerprint to be determined to obtain a target similar video corresponding to the video to be identified comprises:
acquiring any one of the single image frame fingerprints to be judged as a target single image frame fingerprint;
acquiring the preset video fingerprint database;
performing similarity calculation between the target single image frame fingerprint and each single image frame fingerprint in the preset video fingerprint database to obtain similarities to be evaluated;
finding out a value larger than the acquired preset threshold value from each similarity to be evaluated to obtain a candidate similarity set;
finding out the similarity to be evaluated with the maximum value from the candidate similarity set to obtain a target similarity;
taking the single image frame fingerprint corresponding to the target similarity in the preset video fingerprint library as a single image frame fingerprint to be processed;
repeatedly executing the step of acquiring any one of the to-be-judged single image frame fingerprints as a target single image frame fingerprint until the acquisition of each to-be-judged single image frame fingerprint is completed;
performing similarity calculation on each single image frame fingerprint set in the preset video fingerprint database according to the single image frame fingerprints to be processed to obtain video similarities to be analyzed;
and determining the target similar video corresponding to the video to be identified in the preset video fingerprint database according to the video similarity to be analyzed and the acquired preset video similarity threshold.
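The per-frame matching of claim 4 — score every library fingerprint, keep those above a preset threshold, take the maximum — might look like this. The Jaccard-of-names similarity is a deliberately simple stand-in for the claim's similarity calculation, and all identifiers are illustrative:

```python
def name_similarity(fp_a, fp_b):
    """Toy similarity: Jaccard overlap of object names between two frame fingerprints."""
    a = {o["name"] for o in fp_a["objects"]}
    b = {o["name"] for o in fp_b["objects"]}
    return len(a & b) / len(a | b) if a | b else 0.0

def best_match(target, library, threshold=0.5):
    """Best library fingerprint strictly above the preset threshold, else None."""
    scored = [(name_similarity(target, cand), cand) for cand in library]
    candidates = [sc for sc in scored if sc[0] > threshold]
    return max(candidates, key=lambda sc: sc[0])[1] if candidates else None

target = {"objects": [{"name": "car"}, {"name": "person"}]}
library = [
    {"id": 1, "objects": [{"name": "car"}]},
    {"id": 2, "objects": [{"name": "car"}, {"name": "person"}]},
]
match = best_match(target, library)
print(match["id"])  # 2
```

The claim then aggregates these per-frame winners into a video-level similarity before applying the preset video similarity threshold.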
5. The video matching method based on video fingerprint according to claim 4, wherein said step of obtaining said preset video fingerprint library further comprises:
acquiring a plurality of videos to be analyzed;
acquiring any one of the videos to be analyzed as a video to be processed;
acquiring image frames from the video to be processed by adopting the acquired preset sampling proportion to obtain a plurality of standard image frames;
acquiring any one of the standard image frames as an image frame to be processed;
respectively carrying out object image identification and object name classification prediction on the image frame to be processed by adopting the preset image identification and name prediction model to obtain each object image to be analyzed and each object name to be analyzed corresponding to the image frame to be processed;
according to the image frames to be processed, carrying out coordinate calculation of the preset identification points on each object image to be analyzed to obtain the coordinates of the object identification points to be analyzed corresponding to each object image to be analyzed;
performing single image frame fingerprint generation according to the identification point coordinates of each object to be analyzed, each image of the object to be analyzed and each name of the object to be analyzed to obtain standard single image frame fingerprints;
repeatedly executing the step of acquiring any one standard image frame as an image frame to be processed until the acquisition of each standard image frame is completed;
generating a video fingerprint according to the preset sampling proportion and each standard single image frame fingerprint to obtain a standard video fingerprint corresponding to the video to be processed;
and repeatedly executing the step of acquiring any one of the videos to be analyzed as the video to be processed until the acquisition of each video to be analyzed is completed, and generating the preset video fingerprint database according to each standard video fingerprint.
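Building the preset video fingerprint library in claim 5 reduces to sampling frames by a preset proportion and fingerprinting each sampled frame. A sketch under the assumption that a proportion `p` means "keep one frame in every `1/p`"; the helper names are illustrative:

```python
def sample_frames(num_frames, proportion):
    """Evenly sample frame indices according to a preset sampling proportion."""
    step = max(1, round(1 / proportion))
    return list(range(0, num_frames, step))

def build_library(videos, proportion, fingerprint_fn):
    """Map each video name to the fingerprints of its sampled frames."""
    return {
        name: [fingerprint_fn(frames[i]) for i in sample_frames(len(frames), proportion)]
        for name, frames in videos.items()
    }

videos = {"clip": [["a"], ["b"], ["c"], ["d"]]}
lib = build_library(videos, proportion=0.5, fingerprint_fn=tuple)
print(lib)  # {'clip': [('a',), ('c',)]}
```

Here `fingerprint_fn` stands in for the full single-image-frame fingerprint generation of the embodiment.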
6. The video matching method based on video fingerprints according to claim 4, wherein the step of calculating the similarity between the target single-image-frame fingerprint and each single-image-frame fingerprint in the preset video fingerprint database to obtain the similarity to be evaluated comprises:
acquiring one single image frame fingerprint from the preset video fingerprint database to serve as a single image frame fingerprint to be matched;
acquiring any object name from the target single image frame fingerprint as an object name to be identified;
carrying out object name matching on the object name to be identified in the single image frame fingerprint to be matched to obtain a matching result;
when the matching result is successful, performing difference calculation on the object identification point included angle corresponding to the object name to be identified and the object identification point included angle corresponding to the matching result in the single image frame fingerprint to be matched to obtain a difference value of the included angles to be analyzed;
when the matching result is failure, taking the obtained preset included angle difference value as the included angle difference value to be analyzed;
repeatedly executing the step of acquiring any object name from the target single image frame fingerprint as the object name to be identified until the acquisition of each object name in the target single image frame fingerprint is completed;
determining the similarity to be evaluated corresponding to the fingerprint of the single image frame to be matched according to the difference value of the included angles to be analyzed;
and repeatedly executing the step of acquiring one single image frame fingerprint from the preset video fingerprint database as the single image frame fingerprint to be matched until the acquisition of each single image frame fingerprint in the preset video fingerprint database is completed.
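Claim 6's similarity can be illustrated directly: pair objects by name, take included-angle differences, substitute a preset difference when name matching fails, and turn the mean difference into a score. The 90° default and the 180° normalisation are illustrative choices, not values from the claim:

```python
DEFAULT_ANGLE_DIFF = 90.0  # assumed preset included-angle difference for failed name matches

def angle_similarity(target_objs, candidate_objs, default_diff=DEFAULT_ANGLE_DIFF):
    """Score two frame fingerprints from per-object included-angle differences."""
    cand = {o["name"]: o["angle"] for o in candidate_objs}
    diffs = [
        abs(o["angle"] - cand[o["name"]]) if o["name"] in cand else default_diff
        for o in target_objs
    ]
    # Mean difference mapped to [0, 1]; smaller differences score higher.
    return max(0.0, 1.0 - (sum(diffs) / len(diffs)) / 180.0)

sim = angle_similarity([{"name": "car", "angle": 10.0}],
                       [{"name": "car", "angle": 20.0}])
```

A matched `car` 10° apart scores 1 − 10/180 ≈ 0.944, while an unmatched object contributes the preset 90° difference.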
7. The video matching method based on video fingerprints as claimed in claim 4, wherein the step of finding out a value greater than an acquired preset threshold value from each of the similarities to be evaluated to obtain a candidate similarity set comprises:
finding out a value larger than the acquired preset threshold value from each similarity to be evaluated to obtain a similarity set to be processed;
acquiring any one of the similarity to be evaluated from the similarity set to be processed as the similarity to be analyzed;
according to each object image corresponding to the similarity to be analyzed in the preset video fingerprint library and each object image corresponding to the target single image frame fingerprint, carrying out shape similarity judgment on object images with the same object name to obtain a shape similarity judgment result;
repeatedly executing the step of obtaining any one of the similarities to be evaluated from the similarity sets to be processed as the similarity to be analyzed until the obtaining of each similarity to be evaluated in the similarity sets to be processed is completed;
and taking each similarity to be evaluated corresponding to each successful shape similarity judgment result as the candidate similarity set.
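The shape similarity judgment in claim 7 is left open by the claim; a crude bounding-box stand-in (compare aspect ratio and area within a tolerance) is enough to show where such a filter slots in. In practice one might compare contour moments instead; the tolerance and box format here are assumptions:

```python
def shape_similar(box_a, box_b, tol=0.2):
    """Crude shape check: compare aspect ratio and area of two (x1, y1, x2, y2) boxes."""
    def ratio(b):
        return (b[2] - b[0]) / (b[3] - b[1])

    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    r_diff = abs(ratio(box_a) - ratio(box_b)) / max(ratio(box_a), ratio(box_b))
    a_diff = abs(area(box_a) - area(box_b)) / max(area(box_a), area(box_b))
    return r_diff <= tol and a_diff <= tol

print(shape_similar((0, 0, 10, 20), (0, 0, 11, 22)))  # True
print(shape_similar((0, 0, 10, 20), (0, 0, 20, 10)))  # False
```

Only the similarities whose same-named objects pass this check survive into the candidate similarity set.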
8. An apparatus for video matching based on video fingerprints, the apparatus comprising:
the data acquisition module is used for acquiring a plurality of image frames to be identified corresponding to the video to be identified;
the object image identification and object name classification prediction module is used for respectively carrying out object image identification and object name classification prediction on each image frame to be identified by adopting a preset image identification and name prediction model to obtain each target object image and each target object name corresponding to each image frame to be identified;
the target object identification point coordinate determination module is used for performing coordinate calculation of preset identification points on each target object image corresponding to each image frame to be identified to obtain a target object identification point coordinate corresponding to each target object image;
the to-be-judged single image frame fingerprint determining module is used for generating a single image frame fingerprint according to the identification point coordinates of each target object, each target object image and each target object name corresponding to each to-be-identified image frame to obtain the to-be-judged single image frame fingerprint;
and the target similar video determining module is used for performing similar video matching in the acquired preset video fingerprint database according to each single image frame fingerprint to be judged to obtain a target similar video corresponding to the video to be identified.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202111217627.9A 2021-10-19 2021-10-19 Video matching method, device and equipment based on video fingerprints and storage medium Pending CN113920463A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111217627.9A CN113920463A (en) 2021-10-19 2021-10-19 Video matching method, device and equipment based on video fingerprints and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111217627.9A CN113920463A (en) 2021-10-19 2021-10-19 Video matching method, device and equipment based on video fingerprints and storage medium

Publications (1)

Publication Number Publication Date
CN113920463A true CN113920463A (en) 2022-01-11

Family

ID=79241620

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111217627.9A Pending CN113920463A (en) 2021-10-19 2021-10-19 Video matching method, device and equipment based on video fingerprints and storage medium

Country Status (1)

Country Link
CN (1) CN113920463A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101695655B1 (en) * 2016-02-23 2017-01-12 이정선 Method and apparatus for analyzing video and image
US20170034542A1 (en) * 2014-07-17 2017-02-02 Panasonic Intellectual Property Management Co., Ltd. Recognition data generation device, image recognition device, and recognition data generation method
CN107193820A (en) * 2016-03-14 2017-09-22 腾讯科技(深圳)有限公司 Location information acquisition method, device and equipment
CN110162665A (en) * 2018-12-28 2019-08-23 腾讯科技(深圳)有限公司 Video searching method, computer equipment and storage medium
CN110362714A (en) * 2019-07-25 2019-10-22 腾讯科技(深圳)有限公司 The searching method and device of video content
CN111078924A (en) * 2018-10-18 2020-04-28 深圳云天励飞技术有限公司 Image retrieval method, device, terminal and storage medium
CN111522989A (en) * 2020-07-06 2020-08-11 南京梦饷网络科技有限公司 Method, computing device, and computer storage medium for image retrieval
CN111666907A (en) * 2020-06-09 2020-09-15 北京奇艺世纪科技有限公司 Method and device for identifying object information in video and server
CN111898416A (en) * 2020-06-17 2020-11-06 绍兴埃瓦科技有限公司 Video stream processing method and device, computer equipment and storage medium
CN112434185A (en) * 2020-10-26 2021-03-02 国家广播电视总局广播电视规划院 Method, system, server and storage medium for searching similar video clips


Similar Documents

Publication Publication Date Title
CN111860670B (en) Domain adaptive model training method, image detection method, device, equipment and medium
US8270723B2 (en) Recognition device, recognition method, and program
CN109325412B (en) Pedestrian recognition method, device, computer equipment and storage medium
CN111445517A (en) Robot vision end positioning method and device and computer readable storage medium
CN110458007B (en) Method, device, computer equipment and storage medium for matching human faces
CN110647885B (en) Test paper splitting method, device, equipment and medium based on picture identification
JP2020027504A (en) Object recognition apparatus, object recognition learning apparatus, method, and program
CN113160087A (en) Image enhancement method and device, computer equipment and storage medium
CN114639165B (en) Pedestrian re-identification method, device, equipment and storage medium based on artificial intelligence
CN112528882B (en) Method, device, equipment and medium for determining property certificate information based on OCR (optical character recognition)
CN112163110B (en) Image classification method and device, electronic equipment and computer-readable storage medium
CN113920463A (en) Video matching method, device and equipment based on video fingerprints and storage medium
CN111178162B (en) Image recognition method, device, computer equipment and storage medium
CN116304179B (en) Data processing system for acquiring target video
CN111931672A (en) Handwriting recognition method and device, computer equipment and storage medium
CN114973368A (en) Face recognition method, device, equipment and storage medium based on feature fusion
JP5755516B2 (en) Object shape estimation device
CN109165587B (en) Intelligent image information extraction method
CN115205861B (en) Method for acquiring abnormal character recognition area, electronic equipment and storage medium
CN111191706A (en) Picture identification method, device, equipment and storage medium
CN113657145B (en) Fingerprint retrieval method based on sweat pore characteristics and neural network
CN113591607B (en) Station intelligent epidemic situation prevention and control system and method
CN113220859A (en) Image-based question and answer method and device, computer equipment and storage medium
CN114723986A (en) Text image matching method, device, equipment and storage medium
CN112395988A (en) Finger vein recognition method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination