CN114282052A - Video image positioning method and system based on frame characteristics - Google Patents

Video image positioning method and system based on frame characteristics

Info

Publication number
CN114282052A
CN114282052A (application CN202111599413.2A)
Authority
CN
China
Prior art keywords
image, target, video, features, similarity
Legal status: Pending (the status listed is an assumption by Google Patents and is not a legal conclusion)
Application number
CN202111599413.2A
Other languages
Chinese (zh)
Inventor
王晶 (Wang Jing)
Current Assignee (as listed by Google Patents; accuracy not guaranteed)
Space Shichuang Chongqing Technology Co ltd
Original Assignee
Space Shichuang Chongqing Technology Co ltd
Priority date: 2021-12-24 (an assumption, not a legal conclusion)
Filing date: 2021-12-24
Publication date: 2022-04-05
Application filed by Space Shichuang Chongqing Technology Co ltd filed Critical Space Shichuang Chongqing Technology Co ltd
Priority to CN202111599413.2A
Publication of CN114282052A

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image data processing, and in particular to a video image positioning method and system based on frame features. The method comprises the following steps. Video analysis step: split the video to be matched into video frames, extract image features from the video frames, and store the image features. Image processing step: acquire a target image, segment the target image to generate a target subject image, and extract target features from the target subject image. Image matching step: match the target features against the stored image features in sequence and calculate the similarity. Image screening and pushing step: screen pushed images from the video frames according to the similarity, then acquire and push image information for the pushed images. This scheme addresses the technical problem of low image matching accuracy in the prior art.

Description

Video image positioning method and system based on frame characteristics
Technical Field
The invention relates to the technical field of image data processing, in particular to a video image positioning method and system based on frame characteristics.
Background
With the rapid development of multimedia, a wide variety of videos is presented to users. While watching a video, a user can save and share favorite or interesting pictures as screenshots; when later viewing such an image, the user often wants to watch the complete video or find the image's source. In the prior art, the common approach is to match the screenshot against the video frames of a video so as to determine the screenshot's source. However, a screenshot contains considerable noise, which acts as interference information and reduces the accuracy of the matching process; as a result, the video frame corresponding to the screenshot cannot be matched accurately and the video source cannot be reliably identified.
Disclosure of Invention
One objective of the present invention is to provide a video image positioning method based on frame characteristics, so as to solve the technical problem of low image matching accuracy in the prior art.
The invention provides a first basic scheme: a video image positioning method based on frame characteristics, comprising the following steps:
a video analysis step: splitting the video to be matched into video frames, extracting image features from the video frames, and storing the image features;
an image processing step: acquiring a target image, segmenting the target image to generate a target subject image, and extracting target features from the target subject image;
an image matching step: matching the target features against the image features in sequence, and calculating the similarity;
an image screening and pushing step: screening pushed images from the video frames according to the similarity, and acquiring and pushing image information for the pushed images.
The beneficial effects of the first basic scheme are as follows:
The target image is the image whose source the user wants to identify. The image processing step segments the target image to obtain a target subject image, which contains the key information of the target image; segmenting out the subject removes noise from the target image and reduces interference information during matching. The video analysis step and the image processing step extract image features and target features respectively, and the image matching step matches the two; matching extracted features rather than raw images makes the matching faster.
The image screening and pushing step screens the video frames by similarity to obtain pushed images similar to the target image, then acquires and pushes the image information of those pushed images, from which the position of each pushed image in the video can be determined. Compared with the prior art, this scheme reduces interference during matching through segmentation of the target image and improves the accuracy of image matching, so that the target image is accurately positioned in the video.
Further, segmenting the target image to generate the target subject image specifically comprises:
performing contour recognition on the target image; and screening a target contour from the recognized contours according to a preset screening condition, to serve as the target subject image.
Beneficial effects: in most images, the effective information is concentrated in the individual figures (shapes) appearing in the image. When the target image is segmented, the figures in the target image are obtained through contour recognition, and the recognized contours are filtered by the screening condition, thereby obtaining the target subject image, i.e., the effective information in the image.
Further, extracting image features from the video frames specifically comprises:
preprocessing each video frame;
acquiring the low-frequency signal of each pixel in the preprocessed video frame, and assigning a value to each pixel of the video frame according to the low-frequency signal;
and generating the image features from the assignment result.
Beneficial effects: when image features are extracted, each pixel is assigned a value according to the low-frequency signal of the video frame, thereby obtaining the image features.
Further, matching the target features against the image features in sequence and calculating the similarity specifically comprises:
sequentially calculating the Hamming distance between the target features and each set of image features, the Hamming distance serving as the similarity measure.
Beneficial effects: calculating the Hamming distance gives the distance between the target features and the image features, and thus the similarity between the target image and the video frame (a smaller distance indicates a more similar frame).
Further, the image information includes a time point and a similarity, and the image information is formatted as JSON.
Beneficial effects: the time point is the position of the video frame in the corresponding video, and the similarity is the similarity between that video frame and the target image. The image information lets the user quickly locate the pushed image in the corresponding video and verify whether the pushed image and the target image are the same picture. Using the JSON format makes the image information fast to read and write, easy to use, and widely compatible.
It is another object of the present invention to provide a video image positioning system based on frame characteristics.
The invention provides a second basic scheme: a video image localization system based on frame features, comprising:
the video splitting module is used for splitting a video to be matched into video frames;
further comprising:
the target segmentation module is used for segmenting the acquired target image to generate a target subject image;
the feature extraction module is used for extracting image features from the video frames and extracting target features from the target subject image;
the similarity calculation module is used for matching the target features against the image features in sequence and calculating the similarity;
and the information screening module is used for screening the pushed images from the video frames according to the similarity and acquiring image information according to the pushed images.
The beneficial effects of the second basic scheme are as follows:
The target image is the image whose source the user wants to identify. The target segmentation module segments the target image to obtain a target subject image containing the key information of the target image; segmenting out the subject removes noise from the target image and reduces interference information during matching.
The feature extraction module extracts target features and image features from the target image and the split video frames respectively. The similarity calculation module matches the target features against the image features; matching extracted features rather than raw images makes the matching faster.
The information screening module screens the video frames by similarity to obtain pushed images similar to the target image, then acquires the image information of those pushed images for pushing, from which the position of each pushed image in the video can be determined. Compared with the prior art, this scheme reduces interference during matching through segmentation of the target image and improves the accuracy of image matching, so that the target image is accurately positioned in the video.
Further, the target segmentation module is used for performing contour recognition on the acquired target image and screening a target contour from the recognized contours according to a preset screening condition, to serve as the target subject image.
Beneficial effects: in most images, the effective information is concentrated in the individual figures appearing in the image. When segmenting the target image, the target segmentation module obtains the figures in the target image through contour recognition and filters the recognized contours by the screening condition, thereby obtaining the target subject image, i.e., the effective information in the image.
Further, the feature extraction module is used for preprocessing the video frames; it is also used for acquiring the low-frequency signal of each pixel in the preprocessed video frame, assigning a value to each pixel of the video frame according to the low-frequency signal, and generating the image features from the assignment result.
Beneficial effects: when extracting image features, the feature extraction module re-assigns each pixel according to its low-frequency signal, thereby obtaining the image features.
Further, the similarity calculation module is used for sequentially calculating the Hamming distance between the target features and each set of image features, the Hamming distance serving as the similarity measure.
Beneficial effects: by calculating the Hamming distance, the similarity calculation module obtains the distance between the target features and the image features, and thus the similarity between the target image and the video frame.
Further, the image information includes a time point and a similarity, and the image information is formatted as JSON.
Beneficial effects: the time point is the position of the video frame in the corresponding video, and the similarity is the similarity between that video frame and the target image. The image information lets the user quickly locate the pushed image in the corresponding video and verify whether the pushed image and the target image are the same picture. Using the JSON format makes the image information fast to read and write, easy to use, and widely compatible.
Drawings
FIG. 1 is a logic diagram of an embodiment of a video image positioning system based on frame features according to the present invention.
Detailed Description
The following provides further detail by way of a specific embodiment:
Embodiment
The video image positioning method based on the frame characteristics comprises the following steps:
Video analysis step: split the video to be matched into video frames, extract image features from the video frames, and store the image features. In other embodiments, when the image to be positioned is a screenshot saved by the user, the video to be matched is the video the user has browsed.
Extracting image features from a video frame specifically comprises: preprocessing the video frame; acquiring the low-frequency signal of each pixel in the preprocessed video frame and assigning a value to each pixel according to that signal; and generating the image features from the assignment result. Specifically, an image size is preset and the video frame is scaled to that size; the scaled video frame is then converted to a grayscale image. A preset discrete cosine transform is invoked to compute a DCT matrix from the grayscale image, the DCT matrix containing the DCT value of every pixel. The mean DCT value of the grayscale image is computed from the DCT matrix, and each pixel is assigned a value by comparing its DCT value with that mean: when a pixel's DCT value is greater than or equal to the mean, its hash value is set to 1; when it is less than the mean, its hash value is set to 0. A hash code is then generated from the per-pixel assignment results: in this embodiment, the hash values are concatenated into an integer from left to right and top to bottom, that integer is the hash code, and the hash code is stored in an .npy file; this hash code is the image feature. An .npy file is NumPy's binary file format, NumPy being a Python package for data processing.
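The DCT-and-mean-threshold procedure described above is essentially a perceptual hash. The following is a minimal sketch of that computation, not the patentee's actual code; the orthonormal DCT implementation and the square input frame are illustrative assumptions:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]   # frequency index
    i = np.arange(n)[None, :]   # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)     # rescale the DC row for orthonormality
    return m

def frame_hash(gray):
    """Hash code of one square grayscale frame: 2-D DCT over all pixels,
    then each DCT value thresholded against the mean DCT value."""
    n = gray.shape[0]
    m = dct_matrix(n)
    coeffs = m @ gray.astype(float) @ m.T              # DCT matrix of the frame
    bits = (coeffs >= coeffs.mean()).astype(np.uint8)  # >= mean -> 1, < mean -> 0
    return bits.flatten()   # concatenated left-to-right, top-to-bottom

# The embodiment stores the resulting hash code in an .npy file, e.g.:
# np.save('frame_0001.npy', frame_hash(gray))
```

The preprocessing (scaling the frame to the preset size and converting it to grayscale) would happen before `frame_hash` is called.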
Image processing step: acquire the target image, segment it to generate a target subject image, and extract target features from the target subject image.
Segmenting the target image to generate the target subject image specifically comprises: presetting a screening condition, here the contour with the largest area, i.e., the contour covering the most pixels; performing contour recognition on the target image; and screening a target contour from the recognized contours according to the preset screening condition, to serve as the target subject image. That is, the pixels of each recognized contour are counted, and the contour with the most pixels is selected as the target contour.
Extracting target features from the target subject image is performed in the same way as extracting image features from a video frame: preprocess the target subject image; acquire the low-frequency signal of each pixel in the preprocessed target subject image and assign a value to each pixel according to that signal; and generate the target features from the assignment result. Specifically, an image size is preset and the target subject image is scaled to that size; the scaled image is converted to grayscale; the preset discrete cosine transform is invoked to compute a DCT matrix containing the DCT value of every pixel; the mean DCT value is computed from the DCT matrix; each pixel whose DCT value is greater than or equal to the mean is assigned a hash value of 1, and each pixel whose DCT value is less than the mean is assigned 0. A hash code is generated from the assignment results: the hash values are concatenated from left to right and top to bottom into an integer, stored in binary form; this binary value is the hash code, which is stored in an .npy file and constitutes the target feature.
Image matching step: match the target features against the stored image features in sequence and calculate the similarity. Specifically, the Hamming distance between the target features and each set of image features is calculated in turn, i.e., the Hamming distance between the hash code of the target features and the hash code of the image features. The Hamming distance expresses how much the two hash codes differ and serves as the similarity between the target features and the image features (a smaller distance means a closer match).
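A minimal sketch of this matching step (illustrative, not the patentee's code), assuming each hash code is held as a NumPy bit array as in the feature-extraction step above:

```python
import numpy as np

def hamming_distance(target_bits, frame_bits):
    """Number of positions at which the two hash codes differ; the
    embodiment uses this distance directly as the similarity measure
    (a smaller distance means a closer match)."""
    return int(np.count_nonzero(target_bits != frame_bits))

def match_all(target_bits, frame_features):
    """Match the target feature against every stored image feature in turn."""
    return [hamming_distance(target_bits, f) for f in frame_features]
```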
Image screening and pushing step: screen pushed images from the video frames according to the similarity, then acquire and push the image information of the pushed images. The image information includes a time point and a similarity and is formatted as JSON. Specifically, the video frames are sorted from most to least similar, and a number of frames determined by a preset push threshold are selected as pushed images. For each pushed image, the time point of the corresponding video frame in the video is obtained, image information is generated from the time point and the similarity, and the image information is assembled into JSON. In this embodiment, the push threshold is 4, i.e., the 4 top-ranked video frames are selected as pushed images.
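The screening-and-pushing step can be sketched as below. Since the Hamming distance serves as the similarity measure, "most similar first" means ascending distance; the field names `time_point` and `similarity` are illustrative assumptions, as the patent only states that the JSON contains a time point and a similarity:

```python
import json

def screen_and_push(distances, time_points, push_threshold=4):
    """Rank the video frames from most to least similar (smallest Hamming
    distance first), keep the top `push_threshold` frames as pushed images,
    and assemble their image information into JSON."""
    ranked = sorted(range(len(distances)), key=lambda i: distances[i])
    pushed = ranked[:push_threshold]
    info = [{"time_point": time_points[i], "similarity": distances[i]}
            for i in pushed]
    return json.dumps(info)
```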
The video image positioning system based on frame characteristics uses the above video image positioning method and, as shown in FIG. 1, comprises a video splitting module, a target segmentation module, a feature extraction module, a similarity calculation module and an information screening module.
The video splitting module is used for splitting the video to be matched into video frames, extracting image features from the video frames, and storing the image features. In other embodiments, when the image to be positioned is a screenshot saved by the user, the video to be matched is the video the user has browsed. Specifically, the video splitting module preprocesses each split video frame. The preprocessing comprises: presetting an image size and scaling the video frame to that size; and converting the scaled video frame to a grayscale image. The video splitting module further acquires the low-frequency signal of each pixel in the preprocessed video frame, assigns a value to each pixel according to that signal, and generates the image features from the assignment result: the preset discrete cosine transform is invoked to compute a DCT matrix from the grayscale image, the DCT matrix containing the DCT value of every pixel; the mean DCT value of the grayscale image is computed from the DCT matrix; each pixel whose DCT value is greater than or equal to the mean is assigned a hash value of 1, and each pixel whose DCT value is less than the mean is assigned 0. A hash code is generated from the per-pixel assignments: in this embodiment, the hash values are concatenated into an integer from left to right and top to bottom, that integer is the hash code, and the hash code is stored in an .npy file; this hash code is the image feature.
An .npy file is NumPy's binary file format, NumPy being a Python package for data processing.
The target segmentation module is used for segmenting the acquired target image to generate the target subject image. Specifically, the target segmentation module is preset with a screening condition, namely the contour with the largest area, i.e., the contour covering the most pixels. The module performs contour recognition on the acquired target image and screens a target contour from the recognized contours according to the screening condition, to serve as the target subject image; that is, it counts the pixels of each recognized contour and selects the contour with the most pixels as the target contour.
The feature extraction module is used for extracting target features from the target subject image. Specifically, it preprocesses the target subject image: an image size is preset, the target subject image is scaled to that size, and the scaled image is converted to grayscale. The module then acquires the low-frequency signal of each pixel in the preprocessed target subject image, assigns a value to each pixel according to that signal, and generates the target features from the assignment result: the preset discrete cosine transform is invoked to compute a DCT matrix containing the DCT value of every pixel; the mean DCT value is computed from the DCT matrix; each pixel whose DCT value is greater than or equal to the mean is assigned a hash value of 1, and each pixel whose DCT value is less than the mean is assigned 0. A hash code is generated from the assignments: the hash values are concatenated from left to right and top to bottom into an integer, stored in binary form; this hash code is stored in an .npy file and constitutes the target feature.
The similarity calculation module is used for matching the target features against the image features in sequence and calculating the similarity. Specifically, it sequentially calculates the Hamming distance between the target features and each set of image features, i.e., the Hamming distance between the corresponding hash codes; the Hamming distance expresses how much the two hash codes differ and represents the similarity between the target features and the image features.
The information screening module is used for screening pushed images from the video frames according to the similarity, and for acquiring and pushing the image information of the pushed images. The image information includes a time point and a similarity and is formatted as JSON. Specifically, the information screening module is preset with a push threshold; in this embodiment the push threshold is 4, i.e., the 4 top-ranked video frames are selected as pushed images. The module sorts the video frames from most to least similar, selects the number of frames given by the push threshold as pushed images, obtains the time point of each corresponding video frame in the video, generates image information from the time point and the similarity, and assembles the image information into JSON.
The foregoing is merely an embodiment of the present invention; common general knowledge, such as well-known specific structures and characteristics, is not described here in detail. It should be noted that a person skilled in the art may make several changes and modifications without departing from the structure of the present invention, and these should also be regarded as falling within the protection scope of the present invention without affecting the effect of the implementation or the practicability of the patent. The scope of protection claimed by this application shall be determined by the contents of the claims, and the detailed description in the specification may be used to interpret the contents of the claims.

Claims (10)

1. The video image positioning method based on the frame characteristics is characterized by comprising the following steps:
video analysis step: splitting a video to be matched into video frames, extracting image features from the video frames, and storing the image features;
an image processing step: acquiring a target image, segmenting the target image to generate a target subject image, and extracting target features from the target subject image;
an image matching step: matching the target features against the image features in sequence, and calculating the similarity;
image screening and pushing: and screening a pushed image from the video frame according to the similarity, and acquiring and pushing image information according to the pushed image.
2. The method of claim 1, wherein: segmenting the target image to generate the target subject image specifically comprises the following:
performing contour recognition on the target image; and screening a target contour from the recognized contours according to a preset screening condition, to serve as the target subject image.
3. The method of claim 1, wherein: extracting image features from the video frame, specifically including the following:
preprocessing a video frame;
acquiring low-frequency signals of all pixel points in the preprocessed video frame, and assigning values to all pixel points in the video frame according to the low-frequency signals;
and generating image characteristics according to the assignment result.
4. The method of claim 1, wherein: matching the target features against the image features in sequence and calculating the similarity specifically comprises the following:
and sequentially calculating the Hamming distance of the target feature and the image feature, wherein the Hamming distance is the similarity.
5. The method of claim 1, wherein: the image information comprises time points and similarity, and the format of the image information is json.
6. A video image localization system based on frame features, comprising:
the video splitting module is used for splitting a video to be matched into video frames;
it is characterized by also comprising:
the target segmentation module is used for segmenting the acquired target image to generate a target subject image;
the feature extraction module is used for extracting image features from the video frames and extracting target features from the target subject image;
the similarity calculation module is used for matching the target features against the image features in sequence and calculating the similarity;
and the information screening module is used for screening the pushed images from the video frames according to the similarity and acquiring image information according to the pushed images.
7. The frame feature based video image positioning system of claim 6, wherein: the target segmentation module is used for performing contour recognition on the acquired target image and screening a target contour from the recognized contours according to a preset screening condition, to serve as the target subject image.
8. The frame feature based video image positioning system of claim 6, wherein: the feature extraction module is used for preprocessing the video frames; and is further used for acquiring the low-frequency signal of each pixel in the preprocessed video frame, assigning a value to each pixel of the video frame according to the low-frequency signal, and generating the image features from the assignment result.
9. The frame feature based video image positioning system of claim 6, wherein: the similarity calculation module is used for sequentially calculating the Hamming distance between the target features and the image features, the Hamming distance serving as the similarity.
10. The frame feature based video image positioning system of claim 6, wherein: the image information comprises time points and similarity, and the format of the image information is json.
CN202111599413.2A (priority date 2021-12-24, filed 2021-12-24) — Video image positioning method and system based on frame characteristics — Pending — published as CN114282052A

Priority Applications (1)

Application Number: CN202111599413.2A · Priority Date: 2021-12-24 · Filing Date: 2021-12-24 · Title: Video image positioning method and system based on frame characteristics


Publications (1)

Publication Number: CN114282052A · Publication Date: 2022-04-05

Family

ID=80875052

Family Applications (1)

Application Number: CN202111599413.2A (priority/filing date 2021-12-24) · Title: Video image positioning method and system based on frame characteristics · Status: Pending

Country Status (1)

Country Link
CN (1) CN114282052A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117001715A (en) * 2023-08-30 2023-11-07 哈尔滨工业大学 Intelligent auxiliary system and method for visually impaired people



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination