CN111144241A - Target identification method and device based on image verification and computer equipment

Info

Publication number: CN111144241A
Application number: CN201911278754.2A
Authority: CN
Inventors: 岑俊毅; 傅东生
Original assignee: Miracle Intelligent Network Co ltd
Current assignee: Miracle Intelligent Network Co ltd
Priority date: 2019-12-13
Filing date: 2019-12-13
Publication date: 2020-05-12
Anticipated expiration: 2039-12-13
Also published as: CN111144241B

Abstract

The application relates to a target identification method and device based on image verification, and to computer equipment. The method comprises the following steps: acquiring video data, and extracting multi-frame sample images from the video data; comparing the sample images to obtain a plurality of sample similarities; when the sample similarity is larger than a sample threshold value, determining the sample image as a reference image; verifying an image to be identified in the video data according to the reference image; and when the image similarity between the image to be identified and the reference image is smaller than an image threshold value, performing target identification on the image to be identified to obtain a target area corresponding to the image to be identified. With this method, target identification can be skipped for images to be identified in the video data that are highly similar to the reference image, which reduces the number of images that need to be identified and effectively saves the resources consumed by image identification.

Description

Target identification method and device based on image verification and computer equipment

Technical Field

The present application relates to the field of computer technologies, and in particular, to a target identification method and apparatus based on image verification, a computer device, and a storage medium.

Background

With the development of computer technology, image recognition technology is widely applied to a plurality of application fields such as face recognition, automatic driving, safety monitoring and the like. Real-time image recognition can continuously and rapidly recognize target objects in images from captured images such as pictures, videos and the like. For example, in the field of security monitoring, multiple frames of monitoring images may be continuously identified, and a target object to be monitored may be identified in each frame of monitoring image.

However, in real-time image recognition scenes such as security monitoring, images often contain few target objects; for example, during off-peak hours or at night, some images contain few or no target objects. Repeatedly identifying images that do not include a target object is meaningless, and continuously identifying every frame wastes resources such as computation.

Disclosure of Invention

In view of the above technical problem of unnecessarily wasting resources such as computation, it is necessary to provide an image verification-based object recognition method, apparatus, computer device, and storage medium that can save resources.

An object recognition method based on image verification, the method comprising:

acquiring video data, and extracting multi-frame sample images from the video data;

comparing the sample images to obtain a plurality of sample similarities;

when the sample similarity is larger than a sample threshold value, determining the sample image as a reference image;

verifying an image to be identified in the video data according to the reference image;

and when the image similarity between the image to be recognized and the reference image is smaller than an image threshold value, performing target recognition on the image to be recognized to obtain a target area corresponding to the image to be recognized.

In one embodiment, the comparing the sample images to obtain a plurality of sample similarities includes:

preprocessing the sample image;

acquiring gray values corresponding to a plurality of pixel points in the processed sample image;

determining characteristic information corresponding to the sample image according to the gray value;

and comparing the characteristic information corresponding to the sample images to obtain the sample similarity between the sample images.

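In one embodiment, the comparing the sample images to obtain a plurality of sample similarities includes:

acquiring preset frame images in a plurality of frames of sample images;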

comparing the sample image with the preset frame image to obtain a plurality of sample similarities;

the determining the sample image as the reference image when the sample similarity is greater than a sample threshold comprises:

and when any sample similarity in the sample similarities is larger than the sample threshold, determining the preset frame image as a reference image.

In one embodiment, after the comparing the sample images to obtain a plurality of sample similarities, the method further includes:

when the sample similarity smaller than or equal to a sample threshold exists in the plurality of sample similarities, counting the sample image quantity corresponding to the sample similarity smaller than or equal to the sample threshold;

acquiring a corresponding preset time period according to the sample image quantity;

and carrying out target identification on the image to be identified in the video data within the preset time period.

In one embodiment, when there is a sample similarity smaller than or equal to a sample threshold among the plurality of sample similarities, the method further includes:

recording the sample image with the sample similarity smaller than or equal to a sample threshold as an image to be identified;

and carrying out target identification on the image to be identified to obtain a target area corresponding to the image to be identified.

In one embodiment, the verifying the image to be identified in the video data according to the reference image includes:

extracting an image behind the sample image from the video data according to a preset sampling rate to serve as an image to be identified;

comparing the image to be identified with the reference image to obtain image similarity;

and when the image similarity is larger than or equal to the image threshold, repeating the step of extracting the image after the sample image from the video data according to a preset sampling rate as the image to be identified.

In one embodiment, the method further comprises:

counting the number of the images to be identified, wherein the image similarity is greater than or equal to an image threshold;

adjusting the preset sampling rate according to the number of the images to be identified;

and extracting the image to be identified from the video data according to the adjusted sampling rate.

An image verification-based object recognition apparatus, the apparatus comprising:

the video acquisition module is used for acquiring video data and extracting multi-frame sample images from the video data;

the reference image determining module is used for comparing the sample images to obtain a plurality of sample similarities; when the sample similarity is larger than a sample threshold value, determining the sample image as a reference image;

the image checking module is used for checking an image to be identified in the video data according to the reference image;

and the image identification module is used for carrying out target identification on the image to be identified when the image similarity between the image to be identified and the reference image is smaller than an image threshold value to obtain a target area corresponding to the image to be identified.

A computer device comprising a memory storing a computer program and a processor implementing the steps of the above-described image verification-based object identification method when the processor executes the computer program.

A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned object identification method based on image verification.

According to the target identification method and device based on image verification, the computer equipment and the storage medium, multi-frame sample images are extracted from the acquired video data and compared with each other to obtain a plurality of sample similarities, and the reference image is determined from the multi-frame sample images according to the sample similarities. The image to be identified in the video data is verified according to the reference image so as to screen the images to be identified. Target identification is performed only on an image to be identified whose image similarity to the reference image is smaller than the image threshold, so that repeated target identification of highly similar images in the video data is avoided, the number of images that need to be identified is reduced, and the resources consumed by image identification are effectively saved.

Drawings

FIG. 1 is a diagram of an exemplary embodiment of an application environment for an image verification-based object recognition method;

FIG. 2 is a schematic flow chart illustrating a method for object recognition based on image verification according to an embodiment;

FIG. 3 is a schematic flowchart illustrating the steps of comparing sample images to obtain a plurality of sample similarities according to an embodiment;

FIG. 4 is a block diagram of an embodiment of an apparatus for object recognition based on image verification;

FIG. 5 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The target identification method based on image verification can be applied to the application environment shown in fig. 1. The application environment includes at least one monitoring device 102 and a server 104, and the monitoring device 102 may communicate with the server 104 via a network. The monitoring device 102 may be disposed in a variety of application environments to capture the corresponding video data. For example, the monitoring device 102 may be used to capture video including, but not limited to, road monitoring video, shop monitoring video, campus monitoring video, indoor monitoring video, and the like. The server 104 may obtain video data collected by the monitoring device 102. In one embodiment, the server 104 may also obtain video data from video capture devices, computers, servers, or the like in other application scenarios. The server 104 extracts a plurality of frames of sample images from the video data. The server 104 compares the sample images to obtain a plurality of sample similarities, and determines the sample image as the reference image when the sample similarities are greater than the sample threshold. The server 104 verifies the image to be recognized in the video data according to the reference image, determines that verification fails when the image similarity between the image to be recognized and the reference image is smaller than the image threshold, and performs target recognition on the image to be recognized to obtain a target area corresponding to the image to be recognized. The monitoring device 102 may include, but is not limited to, various video capture devices and image capture devices, and the server 104 may be implemented by a stand-alone server or a server cluster composed of a plurality of servers.

In one embodiment, as shown in fig. 2, an object recognition method based on image verification is provided, which is described by taking the method as an example applied to the server 104 in fig. 1, and includes the following steps:

Step 202, acquiring video data, and extracting multi-frame sample images from the video data.

The server can acquire video data, which comprises multi-frame image data, and can perform target identification on the multi-frame image data to obtain the target area where a target object to be identified is located. Specifically, the server may obtain complete video data sent by a computer or another server, may obtain pre-stored video data from a database, or may obtain video data acquired by an acquisition device in real time. When the server acquires video data collected in real time, the video data can be transmitted as a video stream over one of a plurality of transmission protocols. For example, the transmission protocol of the video data may include, but is not limited to, RTSP (Real Time Streaming Protocol), RTMP (Real Time Messaging Protocol), and the like. Video data refers to a continuous image sequence comprising multiple consecutive frames of image data in chronological order. A frame is the smallest visual unit of video data, and each frame in the video data corresponds to one piece of image data.

The server may analyze the video data after acquiring the video data, to obtain multi-frame image data included in the video data. The server may extract current multi-frame image data from the video data according to the order of the image data, and record the extracted multi-frame image data as a sample image. The number of sample images may be preset according to actual needs.
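The extraction of sample images can be illustrated with a minimal Python sketch. It assumes the decoded video is already available as an iterable of frames in chronological order; the function name `take_sample_frames` and the parameter `sample_count` are illustrative and do not appear in the application.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple, TypeVar

Frame = TypeVar("Frame")


def take_sample_frames(
    frames: Iterable[Frame], sample_count: int
) -> Tuple[List[Frame], Iterator[Frame]]:
    """Take the next `sample_count` frames, in order, as sample images.

    Returns the sample images together with an iterator positioned just
    after them, so the caller can keep reading the frames that follow.
    """
    if sample_count < 1:
        raise ValueError("sample_count must be at least 1")
    stream = iter(frames)
    samples = list(islice(stream, sample_count))
    return samples, stream


# Strings stand in for decoded frames in this illustration.
samples, remaining = take_sample_frames(["f0", "f1", "f2", "f3", "f4"], 3)
```

Returning the remaining iterator reflects the description above, in which the frames located after the sample images are later treated as images to be identified.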

Step 204, comparing the sample images to obtain a plurality of sample similarities.

The server can judge, from the extracted multi-frame sample images, whether the image contents in the corresponding time period are similar. Specifically, the server may compare the extracted sample images with each other to obtain the sample similarities between them. The server can pair the sample images for comparison in one of several ways according to actual requirements. For example, the server may sort the multiple frames of sample images chronologically to obtain a sample image sequence, and compare each pair of adjacent sample images in the sequence to obtain a plurality of sample similarities. The server can also select one frame of sample image from the multiple frames and compare each of the other sample images with the selected frame to obtain a plurality of sample similarities. The selected frame may be the first sample image among the multiple frames of sample images.
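The two pairing modes described above (adjacent frames, or every frame against one selected frame) can be sketched as follows. This is an illustration only: the function names and the toy similarity function are assumptions, and any similarity measure returning a value between 0 and 1 could be substituted.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def adjacent_similarities(
    samples: Sequence[Frame], similarity: Callable[[Frame, Frame], float]
) -> List[float]:
    """Compare each sample image with the one that follows it in time."""
    return [similarity(a, b) for a, b in zip(samples, samples[1:])]


def anchored_similarities(
    samples: Sequence[Frame],
    similarity: Callable[[Frame, Frame], float],
    anchor_index: int = 0,
) -> List[float]:
    """Compare every other sample image with one selected frame (the first by default)."""
    anchor = samples[anchor_index]
    return [similarity(anchor, s) for i, s in enumerate(samples) if i != anchor_index]


def toy_similarity(a: float, b: float) -> float:
    """Toy measure for demonstration: frames are numbers, closer means more similar."""
    return 1.0 - min(abs(a - b), 1.0)


frames = [0.0, 0.25, 0.5]
adjacent = adjacent_similarities(frames, toy_similarity)
anchored = anchored_similarities(frames, toy_similarity)
```

With N sample images, both modes produce N - 1 similarities. The adjacent mode is sensitive to frame-to-frame change, while the anchored mode also captures slow drift away from the selected frame.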

The sample similarity between sample images indicates how similar the image contents of the two sample images are. When the sample similarity is large, the image contents are highly similar, the real scene changes little during the time period covered by the sample images, and such highly similar image data does not need to be identified repeatedly. For example, in a road monitoring video, when no vehicle or pedestrian appears on the monitored road for a period of time, the corresponding image data is highly similar. When the sample similarity is small, the image contents differ, the real scene changes considerably during the time period covered by the sample images, and the changed image data needs to be identified promptly. In this way, image identification remains fast and accurate while unnecessary resource consumption, such as the computing resources of the server, is avoided, and the resource cost of image identification is reduced.

Step 206, when the sample similarity is greater than the sample threshold, determining the sample image as the reference image.

The server may compare the sample similarity between sample images to a sample threshold. The sample threshold may be preset by the user according to actual needs, and the sample threshold may be a constant. For example, the sample threshold may be set to 95%. When the sample similarity between the sample images is greater than the sample threshold, the server may determine the sample image as the reference image. Specifically, the server may extract multiple frames of sample images, and a plurality of sample similarities may be obtained after the sample images are compared with each other. The server may compare each of the sample similarities with the sample threshold. When any sample similarity in the multiple sample similarities is greater than the sample threshold, it indicates that the similarity between the multiple frame sample images is high, and the environmental content corresponding to the multiple frame sample images does not change greatly, and the server may record the sample image as a reference image. The reference image can be compared with the images to be identified to judge whether the environmental content corresponding to an image to be identified has changed greatly, so that the images to be identified are screened.

In one embodiment, when any sample similarity in the multiple sample similarities is smaller than or equal to the sample threshold, it indicates that at least one sample image in the multiple frame sample images has a greater difference from other sample images, and the server may perform target recognition on the extracted multiple frame sample images, and re-extract the sample images from the video data after performing the target recognition on the extracted sample images.

In one embodiment, when the server compares the sample images, the server may obtain a preset frame image from the extracted multiple frames of sample images. The preset frame image may be the sample image at a preset position in an image sequence formed by arranging the multiple frames of sample images in chronological order. The position of the preset frame image in the image sequence can be preset according to actual requirements. For example, the preset frame image may be the first sample image in the image sequence, or may be the last sample image in the image sequence. The server can call the similarity model and use it to compare each sample image other than the preset frame image with the preset frame image one by one, obtaining the sample similarity between each of those sample images and the preset frame image. The similarity model may be pre-trained and configured in the server. The server may compare the plurality of sample similarities to the sample threshold in sequence. The server may also compare each sample similarity with the sample threshold as soon as it is obtained and, when the sample similarity is greater than the sample threshold, continue with the sample similarity between the next sample image and the preset frame image until all the sample images other than the preset frame image have been compared with the preset frame image. When any one of the plurality of sample similarities is greater than the sample threshold, the server may determine the preset frame image as the reference image.
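The incremental comparison against a preset frame image can be sketched in Python as below. The sketch assumes the reading in which every sample similarity must exceed the sample threshold before the preset frame image is accepted as the reference image, so the loop stops at the first similarity that does not; the function name `choose_reference`, its parameters, and the toy similarity function are illustrative assumptions.

```python
from typing import Callable, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def choose_reference(
    samples: Sequence[Frame],
    similarity: Callable[[Frame, Frame], float],
    sample_threshold: float = 0.95,
    preset_index: int = 0,
) -> Optional[Frame]:
    """Return the preset frame image as the reference image, or None.

    Each other sample image is compared with the preset frame image in turn.
    The comparison stops early as soon as one similarity is not greater than
    the sample threshold (assumed interpretation, see the lead-in).
    """
    if len(samples) < 2:
        raise ValueError("at least two sample images are needed for a comparison")
    preset = samples[preset_index]
    for index, sample in enumerate(samples):
        if index == preset_index:
            continue
        if similarity(preset, sample) <= sample_threshold:
            return None  # scene is changing; no reference image is determined
    return preset


def same(a: int, b: int) -> float:
    """Toy similarity for demonstration: identical frames score 1.0, others 0.0."""
    return 1.0 if a == b else 0.0


stable_reference = choose_reference([7, 7, 7], same)
changing_reference = choose_reference([7, 7, 8], same)
```

Passing `preset_index=-1` selects the last sample image as the preset frame image, which corresponds to the other position mentioned above.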

Step 208, verifying the image to be identified in the video data according to the reference image.

After the server determines the sample image as the reference image, the server may verify the image to be identified in the video data according to the reference image, so as to filter the images to be identified in the video data. The image to be identified is image data that the server extracts from the video data and that is located after the sample images. It is understood that the server may sequentially extract the image data from the video data according to the chronological order of the image data. Depending on when it is extracted, the server records the extracted image data either as a sample image or as an image to be identified. For example, after the server acquires the video data and before a reference image has been determined, the server may record the extracted image data as sample images and determine the reference image from the multiple frames of sample images. After the server determines the reference image, the server may record the extracted image data as an image to be identified, and verify the image to be identified according to the reference image.

Specifically, the server may sequentially extract the image data after the sample images from the video data as images to be identified, and compare the images to be identified with the reference image one by one, so as to verify them. When an image to be identified is successfully verified against the reference image, the server repeats the verification with the next image to be identified, until an image to be identified fails verification against the reference image. The server compares the image to be identified with the reference image to obtain the image similarity between them. The server can also call the similarity model and use it to determine the image similarity between the image to be identified and the reference image.

The server may compare the image similarity between the image to be recognized and the reference image with an image threshold. The image threshold may be preset according to actual requirements. In one embodiment, the image threshold may be the same as the sample threshold. When the image similarity is greater than or equal to the image threshold, the server may determine that the comparison between the image to be recognized and the reference image is successful. When the image similarity is smaller than the image threshold, the server may determine that the comparison between the image to be recognized and the reference image fails.
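The verification loop described above can be sketched as a scan that skips frames until one fails verification. This is a minimal illustration; the function name `first_failed_frame` and the toy similarity function are assumptions rather than part of the application.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")


def first_failed_frame(
    frames: Iterable[Frame],
    reference: Frame,
    similarity: Callable[[Frame, Frame], float],
    image_threshold: float = 0.95,
) -> Optional[Tuple[int, Frame]]:
    """Verify frames against the reference image until one fails.

    A frame passes verification when its image similarity is greater than or
    equal to the image threshold, and fails when the similarity is smaller.
    Returns (position, frame) for the first failure, or None if all pass.
    """
    for position, frame in enumerate(frames):
        if similarity(frame, reference) < image_threshold:
            return position, frame
    return None


def same(a: int, b: int) -> float:
    """Toy similarity for demonstration: identical frames score 1.0, others 0.0."""
    return 1.0 if a == b else 0.0


failure = first_failed_frame([5, 5, 6, 5], reference=5, similarity=same)
no_failure = first_failed_frame([5, 5, 5], reference=5, similarity=same)
```

Only the frame returned by the scan would be passed on to target recognition; the frames before it are treated as similar to the reference image and skipped.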

Step 210, when the image similarity between the image to be recognized and the reference image is smaller than the image threshold, performing target recognition on the image to be recognized to obtain a target area corresponding to the image to be recognized.

When the image similarity between the image to be recognized and the reference image is smaller than the image threshold, it indicates that the image content corresponding to the image to be recognized is significantly changed compared with the reference image, the image to be recognized does not belong to the image data similar to the reference image, and the server may determine that the verification between the image to be recognized and the reference image fails. And when the image similarity between the image to be recognized and the reference image is greater than or equal to the image threshold, determining that the verification between the image to be recognized and the reference image is successful. The server can perform target identification on the image to be identified which fails to be verified, so that image data in the video data are filtered, and a target area corresponding to the target object in the image to be identified is obtained.

Specifically, the server may call the image recognition model to perform target recognition on the screened image to be recognized. The image recognition model may be pre-established and trained, and may include at least one of a plurality of image recognition algorithms. The server can input the image to be recognized that remains after filtering into the image recognition model, which processes the image and outputs the target area corresponding to the image to be recognized.

In one embodiment, after the image to be identified which fails to be verified is screened, the server may clear the reference image, and repeat the step of extracting the multi-frame sample image from the video data, thereby re-determining the reference image and effectively improving the accuracy of image verification.

In this embodiment, the server extracts multiple frames of sample images from the acquired video data, compares them, and determines the sample image as the reference image when the sample similarities between the sample images are all greater than the sample threshold. The server can verify the image to be identified in the video data according to the reference image. Because the image data in the video data is verified before image recognition is carried out, the images to be recognized are screened, and repeated target recognition of highly similar images in the video data is avoided. When the image similarity between the image to be recognized and the reference image is smaller than the image threshold, the server performs target recognition on the screened image to be recognized to obtain the corresponding target area, so that the number of images that need to be recognized is reduced and the resources consumed by image recognition are effectively saved.
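Steps 202 to 210 can be combined into a single screening loop, sketched below in Python. The sketch makes several assumptions that are not fixed by the application: the first sample image serves as the reference image, all sample similarities must exceed the sample threshold, the reference image is cleared after each verification failure, and a trailing group with fewer frames than `sample_count` is sent to recognition. The function name and the toy similarity function are illustrative.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")


def frames_to_recognize(
    frames: Iterable[Frame],
    similarity: Callable[[Frame, Frame], float],
    sample_count: int = 3,
    sample_threshold: float = 0.95,
    image_threshold: float = 0.95,
) -> Iterator[Frame]:
    """Yield only the frames that should be passed to target recognition."""
    if sample_count < 2:
        raise ValueError("sample_count must be at least 2")
    stream = iter(frames)
    while True:
        # Step 202: extract the next group of sample images.
        samples = list(islice(stream, sample_count))
        if not samples:
            return
        # Steps 204 and 206: compare samples with the first one and check the threshold.
        reference = samples[0]
        stable = len(samples) == sample_count and all(
            similarity(reference, s) > sample_threshold for s in samples[1:]
        )
        if not stable:
            # Dissimilar samples are recognized directly, then samples are re-extracted.
            yield from samples
            continue
        # Steps 208 and 210: skip frames that pass verification against the reference.
        for frame in stream:
            if similarity(frame, reference) < image_threshold:
                yield frame  # verification failed: this frame needs recognition
                break  # clear the reference and determine a new one


def same(a: int, b: int) -> float:
    """Toy similarity for demonstration: identical frames score 1.0, others 0.0."""
    return 1.0 if a == b else 0.0


selected = list(frames_to_recognize([1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3], same))
```

In this toy run only the two frames at which the scene changes are selected, so the recognition model would be called twice instead of eleven times.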

In one embodiment, as shown in fig. 3, the step of comparing the sample images to obtain a plurality of sample similarities includes:

Step 302, preprocessing the sample image.

Step 304, acquiring gray values corresponding to a plurality of pixel points in the processed sample image.

Step 306, determining the characteristic information corresponding to the sample image according to the gray value.

Step 308, comparing the characteristic information corresponding to the sample images to obtain the sample similarity between the sample images.

The server can call the similarity model and use it to compare the extracted sample images, obtaining the sample similarity between the sample images as output by the similarity model. The similarity model may be pre-established and trained, and may be pre-configured in the server for the server to call. The similarity model includes a similarity function, which may include at least one of a plurality of similarity algorithms. For example, the server may specifically use a Difference Hash Algorithm (DHA), an Average Hash Algorithm (AHA), a Perceptual Hash Algorithm (PHA), a Scale-Invariant Feature Transform (SIFT) algorithm, and the like. The server can process two sample images through the similarity function to calculate the sample similarity between the two frames of sample images.

Specifically, the server may perform preprocessing on the two compared sample images, where the preprocessing includes at least one of a plurality of processing manners. For example, the preprocessing may specifically include, but is not limited to, scaling processing and graying processing. The server may perform scaling processing on the extracted sample image to scale the sample image into image data of a preset size. Wherein, the scaled size of the sample image can be preset according to actual requirements. For example, according to different actual requirements, the server may reduce the sample image into image data with a size of 32 pixels or 72 pixels, so as to avoid the difference of the sample image caused by different sizes or different scales.
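The scaling and graying steps can be illustrated with a dependency-free Python sketch. The application does not specify the interpolation method or the graying formula; this sketch assumes nearest-neighbour scaling and the common integer luma weights 299/587/114, and represents an image as rows of (R, G, B) tuples. All function names are illustrative.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # (R, G, B), each in 0..255
ColorImage = List[List[Pixel]]    # rows of pixels
GrayImage = List[List[int]]       # rows of gray values in 0..255


def resize_nearest(image: ColorImage, width: int, height: int) -> ColorImage:
    """Scale an image to width x height using nearest-neighbour sampling."""
    if width < 1 or height < 1:
        raise ValueError("target size must be positive")
    src_height, src_width = len(image), len(image[0])
    return [
        [image[y * src_height // height][x * src_width // width] for x in range(width)]
        for y in range(height)
    ]


def to_gray(pixel: Pixel) -> int:
    """Convert one RGB pixel to a gray value (assumed luma weights)."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def preprocess(image: ColorImage, width: int, height: int) -> GrayImage:
    """Scale first, then gray, so graying only touches the reduced image."""
    return [[to_gray(p) for p in row] for row in resize_nearest(image, width, height)]


white, black = (255, 255, 255), (0, 0, 0)
checker = [[white, black], [black, white]]
gray = preprocess(checker, width=4, height=4)
```

Scaling every sample image to the same preset size before graying means that the later pixel-by-pixel comparison always works on grids of identical shape.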

The server can perform graying processing on the scaled sample image to convert it into a grayscale image, thereby reducing the amount of calculation for the sample image. The server can obtain the gray value corresponding to each pixel point in the processed sample image; in the grayed sample image, the values of the three color channels of each pixel point are equal. The server can determine the characteristic information corresponding to the sample image according to the gray value corresponding to each pixel point. Specifically, the pixel points in the sample image are arranged in a rectangular grid, and the server can traverse the gray values of each row of pixel points in sequence. The server can compare the gray values of adjacent pixel points in each row according to the arrangement order of the pixel points, and judge whether the gray value of the previous pixel point is greater than or equal to the gray value of the next pixel point. The server may mark each comparison result as "0" or "1". When the gray value of the previous pixel point is greater than or equal to the gray value of the next pixel point, the server may record the comparison result as "1". When the gray value of the previous pixel point is smaller than the gray value of the next pixel point, the server may record the comparison result as "0". After traversing the pixel points in each row, the server obtains a hash value consisting of "0" and "1", and the server can record the hash value of the sample image as the characteristic information corresponding to the sample image.
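The row-wise comparison described above corresponds to a difference hash, which can be sketched in Python as follows. The gray image is assumed to be a list of rows of integer gray values, and the function name `difference_hash` is illustrative.

```python
from typing import List


def difference_hash(gray: List[List[int]]) -> str:
    """Build a bit string by comparing horizontally adjacent gray values.

    For each row, every pixel is compared with its right-hand neighbour:
    "1" if the previous value is greater than or equal to the next value,
    "0" otherwise. A row of W pixels therefore contributes W - 1 bits.
    """
    bits = []
    for row in gray:
        for previous, following in zip(row, row[1:]):
            bits.append("1" if previous >= following else "0")
    return "".join(bits)


feature = difference_hash([[10, 20, 20], [30, 5, 9]])
```

Because only the ordering of neighbouring gray values is recorded, a uniform brightness shift applied to the whole image leaves the hash unchanged.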

The server can compare the characteristic information corresponding to two frames of sample images to obtain the sample similarity between them. The server may specifically compare the hash values of the two frames of sample images bit by bit to see whether they are the same. For example, the server may calculate the Hamming distance between the sample images based on their hash values. The Hamming distance is the number of positions at which two strings of the same length differ. The server determines the sample similarity between the two frames of sample images according to the Hamming distance between them.
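A minimal Python sketch of this comparison is given below. The application does not state how the Hamming distance is converted into a similarity; the sketch assumes the fraction of matching bits, that is, 1 minus the distance divided by the hash length. The function names are illustrative.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hash values must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(hash_a, hash_b))


def hash_similarity(hash_a: str, hash_b: str) -> float:
    """Map the Hamming distance to a similarity in [0, 1] (assumed formula)."""
    if not hash_a:
        raise ValueError("hash values must not be empty")
    return 1.0 - hamming_distance(hash_a, hash_b) / len(hash_a)


distance = hamming_distance("0110", "0111")
similarity = hash_similarity("0110", "0111")
```

Under this assumed formula, a sample threshold of 95% on a 64-bit hash would tolerate at most 3 differing bits, since 4 differing bits already give a similarity of 93.75%.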

In this embodiment, the server may perform preprocessing on the sample image, and determine the characteristic information corresponding to the sample image according to the gray values of the pixel points in the processed sample image. The server compares the characteristic information corresponding to the sample images to obtain the sample similarity between them. The server can determine from the sample similarity whether the extracted multi-frame sample images are similar; if so, identification of the similar image data can be skipped, which avoids repeated target identification of highly similar image data in the video data, reduces unnecessary image identification, and effectively saves the resources consumed by image identification.

In an embodiment, after the step of comparing the sample images to obtain a plurality of sample similarities, the target identification method based on image verification further includes: when a sample similarity smaller than or equal to the sample threshold exists among the plurality of sample similarities, counting the sample image amount corresponding to the sample similarities smaller than or equal to the sample threshold; acquiring a corresponding preset time period according to the sample image amount; and carrying out target recognition on the image to be recognized in the video data within the preset time period.

After the server compares the sample images to obtain the corresponding sample similarities, the server can compare each sample similarity with the sample threshold to determine whether it is greater than the sample threshold. The server can first compare all the sample images with each other and then compare each of the resulting sample similarities with the sample threshold. The server can also compare each sample similarity with the sample threshold as soon as it is obtained. When the sample similarity is greater than the sample threshold, the server continues to compare the remaining sample images; checking the similarities one at a time in this way further saves the operation resources of the server.

In one embodiment, when the sample similarity is smaller than or equal to the sample threshold, it indicates that the sample images corresponding to the sample similarity have a large difference, and the change of the target object exists in the sample images, and it is necessary to perform target identification on the sample images with the sample similarity smaller than or equal to the sample threshold. The server can record the corresponding sample image as an image to be recognized, and perform target recognition on the image to be recognized, so as to obtain a target area corresponding to the target object in the image to be recognized. Therefore, omission of image data with dissimilar image contents is avoided, server resources are saved, and accuracy and effectiveness of target object identification in the image are guaranteed.

When a sample similarity smaller than or equal to the sample threshold exists among the sample similarities corresponding to the multi-frame sample images, the server may count the sample similarities smaller than or equal to the sample threshold to obtain the corresponding sample image amount. The sample image amount may be used to represent the number of sample images with dissimilar image content. It can be understood that the sample image amount indicates the number of sample images whose sample similarity is smaller than or equal to the sample threshold, and that the sample image amount can be reset when the server re-extracts the sample images.

The server can obtain the corresponding preset time period according to the sample image quantity. The preset time period is a time length preset by a user according to actual requirements, a corresponding association relation is preset between the time length and the sample image quantity, and different sample image quantities can correspond to time periods with different lengths. The sample image amount may correspond to a discrete preset time period or a continuous preset time period. The preset time period may increase as the number of sample images increases. The server may acquire a preset time period associated with the sample image amount according to a preset association relationship. When the sample similarity smaller than or equal to the sample threshold exists between multi-frame sample images extracted from the video data by the server, the change of the target object exists in a scene corresponding to the video data, and different image data are necessary to be identified.

According to the acquired preset time period, the server can record the image data of the video data within that period as images to be recognized and directly perform target recognition on them. Following the time sequence of the image data, the server can record the image data within the corresponding time length as images to be recognized, starting from the last extracted sample image frame. The server can perform target recognition on these images directly, without verifying the image data, until all of them have been recognized, and then again extracts multi-frame sample images in sequence for comparison. In this way, images are no longer verified while the image content is changing substantially, which further saves resource costs.
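The selection of image data within the preset time period can be sketched as follows. This is only an illustration under stated assumptions: frames are represented by their timestamps in seconds, the window is taken to begin just after the last sample frame and to include its end point, and the name `frames_in_period` is hypothetical.

```python
from typing import List, Sequence


def frames_in_period(
    timestamps: Sequence[float],
    last_sample_time: float,
    period_seconds: float,
) -> List[int]:
    """Indices of frames that fall inside the preset time period.

    Assumption: the window is (last_sample_time, last_sample_time + period],
    so the last sample frame itself is not selected again.
    """
    window_end = last_sample_time + period_seconds
    return [
        index
        for index, timestamp in enumerate(timestamps)
        if last_sample_time < timestamp <= window_end
    ]


# Illustrative timestamps at two frames per second.
timestamps = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
selected = frames_in_period(timestamps, last_sample_time=0.5, period_seconds=1.0)
```

The frames at the selected indices would be passed straight to target recognition, with no similarity verification, before sample extraction resumes.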

For example, in a road surveillance video, the target object may be a vehicle, a pedestrian, or the like. When there are few vehicles and pedestrians at night, multiple frames of image data in the surveillance video may be the same or similar, and the server need not perform unnecessary image recognition on the repeated image data. When vehicles or pedestrians appear in the monitored area, the similarity between the corresponding image data is low, and the server can perform target identification on the image data with low similarity, so that the target area where the target object is located in the image data is accurately identified. When many of the multi-frame sample images have low similarity, this can indicate that moving vehicles or pedestrians are continuously present on the monitored road, for example during peak periods. In that case the images need to be recognized continuously, and the server may stop performing unnecessary similarity verification on the image data. The fewer highly similar images there are among the sample images, the more the target objects on the monitored road are moving, and so the longer the server can suspend verification of the image data, further saving server resources.

In this embodiment, when the sample similarity smaller than or equal to the sample threshold exists among the plurality of sample similarities, the server may count the sample image amount corresponding to the sample similarity smaller than or equal to the sample threshold, obtain a corresponding preset time period according to the sample image amount, and directly perform target identification on the image to be identified within the preset time period, thereby avoiding performing unnecessary image verification on the image data with smaller similarity, and further saving resources of the server.
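The counting step and the lookup of the associated preset time period could look like the sketch below. The association table `PERIOD_TABLE`, its values, and the function names are hypothetical; the embodiment only states that a user-defined association exists and that the period may grow with the sample image amount.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Hypothetical association: (minimum sample image amount, period in seconds),
# sorted by amount so that larger amounts map to longer periods.
PERIOD_TABLE: List[Tuple[int, float]] = [(1, 5.0), (3, 15.0), (5, 30.0)]


def count_dissimilar(similarities: Sequence[float], sample_threshold: float) -> int:
    """Sample image amount: similarities at or below the sample threshold."""
    return sum(1 for value in similarities if value <= sample_threshold)


def preset_period(amount: int, table: Sequence[Tuple[int, float]] = PERIOD_TABLE) -> float:
    """Period associated with the largest table entry not exceeding `amount`."""
    minimum_amounts = [minimum for minimum, _ in table]
    index = bisect_right(minimum_amounts, amount) - 1
    if index < 0:
        return 0.0  # no dissimilar samples: no verification-free period
    return table[index][1]


amount = count_dissimilar([0.95, 0.40, 0.80, 0.99], sample_threshold=0.80)
period = preset_period(amount)
```

A discrete table is used here for simplicity; a continuous mapping, which the description also allows, would replace the lookup with a formula.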

In one embodiment, the step of verifying the image to be identified in the video data according to the reference image further comprises: extracting an image after a sample image from video data according to a preset sampling rate to be used as an image to be identified; comparing the image to be identified with the reference image to obtain image similarity; and when the image similarity is larger than or equal to the image threshold, repeating the step of extracting the image after the sample image from the video data according to the preset sampling rate as the image to be identified.

After determining the reference image, the server may sequentially extract image data following the sample image from the video data according to a preset sampling rate as an image to be recognized. The preset sampling rate is the frequency of extracting image data preset by a user according to actual requirements. The sampling rate at which the server extracts the image to be identified may be the same as the sampling rate at which the sample image is extracted. In one embodiment, the sampling rate of the image data extracted by the server may be consistent with the frame rate corresponding to the video data.

The server can sequentially compare the extracted image to be recognized with the reference image to obtain the image similarity between the image to be recognized and the reference image. The manner in which the server compares the image to be recognized with the reference image may be the same as the manner in which the server compares the sample image in the above embodiment, and therefore, the description thereof is omitted here. The server may compare the image similarity between the image to be recognized and the reference image with an image threshold, and determine whether the image to be recognized is similar to the reference image. When the image similarity is greater than or equal to the image threshold, it is determined that the image verification is successful, which indicates that the image to be recognized is similar to the reference image, and it is not necessary to repeatedly recognize similar image data. The server can skip the target recognition of the image to be recognized and repeatedly extract the image to be recognized of the next frame from the video data to be compared with the reference image. In one embodiment, when the image similarity is smaller than the image threshold, it is determined that the image verification fails, which indicates that the image to be recognized is not similar to the reference image, a moving target object may appear in the image to be recognized, and it is necessary for the server to perform target recognition on different images to be recognized. The server can perform target recognition on the image to be recognized which fails to be verified, clear the reference image and extract the sample image from the video data again.

In this embodiment, after determining the reference image, the server may record image data after the sample image in the video data as an image to be recognized, and perform a similarity check on the image to be recognized by comparing it with the reference image. When the image similarity is greater than or equal to the image threshold, the image to be recognized is similar to the reference image, and it is not necessary to repeatedly recognize similar image data. The server can skip target recognition for that image, extract the next frame as the image to be recognized, and compare it with the reference image. This avoids unnecessary repeated recognition of similar image data and effectively saves server resource costs such as computing resources and power consumption. Resources are thus devoted to the recognition processing that is actually necessary, and the efficiency of image recognition is improved.
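The verification loop described in this embodiment can be sketched as below. The function name `verify_frames` and the toy similarity function are assumptions for illustration; any similarity measure returning larger values for more similar frames could be substituted.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")


def verify_frames(
    frames: Iterable[Frame],
    reference: Frame,
    similarity: Callable[[Frame, Frame], float],
    image_threshold: float,
) -> Tuple[int, Optional[Frame]]:
    """Skip frames similar to the reference image.

    Returns the number of skipped frames and the first frame that fails
    verification (similarity below the threshold), or None if every frame
    passed. The caller would run target recognition on the failing frame,
    clear the reference image, and extract sample images again.
    """
    skipped = 0
    for frame in frames:
        if similarity(frame, reference) >= image_threshold:
            skipped += 1  # verification succeeded: skip recognition
            continue
        return skipped, frame  # verification failed: needs recognition
    return skipped, None


# Toy stand-in: frames are single brightness values, not real images.
def toy_similarity(a: int, b: int) -> float:
    return 1.0 - abs(a - b) / 100


skipped, to_recognize = verify_frames([50, 52, 49, 90, 51], 50, toy_similarity, 0.9)
```

Returning at the first failed verification mirrors the description, in which the reference image is cleared once a dissimilar image is found.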

In one embodiment, the above target identification method based on image verification further includes: counting the number of the images to be identified, wherein the image similarity is greater than or equal to an image threshold; adjusting the preset sampling rate according to the number of the images to be identified; and extracting the image to be identified from the video data according to the adjusted sampling rate.

When the image similarity between the image to be recognized and the reference image is greater than or equal to the image threshold, the server can count the number of images to be recognized whose image similarity is greater than or equal to the image threshold. This number indicates how many of the images to be recognized extracted by the server were skipped for recognition. A small number indicates that the image content changed substantially within a short time. A large number indicates that the image content has remained largely unchanged over a longer period.

The server can adjust the preset sampling rate according to the number of the images to be identified. Specifically, the server may store in advance a correspondence between the number of images to be recognized and the sampling rate adjustment amplitude, where the correspondence between the number of images to be recognized and the sampling rate adjustment amplitude may be preset by the user according to actual requirements. The adjustment range of the sampling rate with different sizes can be set according to the number of the images to be identified. The sample rate adjustment magnitude may be discrete. For example, the sample rate adjustment magnitude may include a reduction of one-half, one-fifth, and one-tenth of the original sample rate. The server can extract the image to be identified from the video data according to the adjusted sampling rate. It is to be understood that the number of images to be recognized is used to indicate the number of images to be recognized that are similar to the reference image, and the server clears the reference image when the image similarity is smaller than the image threshold. Correspondingly, the server can reset the number of images to be identified. After the reference image is re-determined, the server may re-count the number of the images to be recognized, of which the image similarity is greater than or equal to the image threshold.

When the number of the images to be recognized is large, the image content in a long time period is not changed greatly, and the scene corresponding to the images to be recognized is stable. Therefore, the server can reduce the preset sampling rate, reduce the frequency of comparison between the extracted image to be identified and the reference image, reduce the comparison operation between the image to be identified and the reference image, and further save server resources.

In this embodiment, the server may count the number of images to be recognized whose image similarity is greater than or equal to the image threshold, adjust the preset sampling rate according to the number of images to be recognized, and extract the images to be recognized from the video data according to the adjusted sampling rate, so that the number of times of comparing the images to be recognized with the reference image is reduced under the condition that the scene corresponding to the images to be recognized is relatively stable, and server resources are further saved.
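A minimal sketch of the sampling rate adjustment follows. The count thresholds in `ADJUSTMENT_STEPS` are hypothetical, and the sketch assumes that the adjustment reduces the rate to one-half, one-fifth, or one-tenth of the original rate; the description could also be read as reducing the rate by those fractions.

```python
from typing import Sequence, Tuple

# Hypothetical correspondence, ordered from the largest count downward:
# (minimum number of skipped images, factor applied to the original rate).
ADJUSTMENT_STEPS: Sequence[Tuple[int, float]] = ((100, 0.1), (50, 0.2), (10, 0.5))


def adjusted_sampling_rate(
    original_rate: float,
    skipped_count: int,
    steps: Sequence[Tuple[int, float]] = ADJUSTMENT_STEPS,
) -> float:
    """Lower the sampling rate as more consecutive images pass verification."""
    for minimum_count, factor in steps:
        if skipped_count >= minimum_count:
            return original_rate * factor
    return original_rate  # scene not yet considered stable


# Illustrative use with a 30 frames-per-second original rate.
rate_when_stable = adjusted_sampling_rate(30.0, skipped_count=60)
```

When the reference image is cleared, the skipped count would be reset and the original rate restored, consistent with the counter reset described above.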

It should be understood that although the steps in the flow charts of fig. 2-3 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the execution of the steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 2-3 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 4, there is provided an object recognition apparatus based on image verification, including: a video acquisition module 402, a reference image determination module 404, an image verification module 406, and an image recognition module 408, wherein:

A video acquisition module 402, configured to obtain video data and extract a plurality of frames of sample images from the video data.

A reference image determining module 404, configured to compare the sample images to obtain a plurality of sample similarities; and when the sample similarity is larger than the sample threshold value, determining the sample image as the reference image.

An image verification module 406, configured to verify the image to be identified in the video data according to the reference image.

The image recognition module 408 is configured to perform target recognition on the image to be recognized when the image similarity between the image to be recognized and the reference image is smaller than the image threshold, so as to obtain a target area corresponding to the image to be recognized.

In one embodiment, the reference image determining module 404 is further configured to pre-process the sample image; acquiring gray values corresponding to a plurality of pixel points in the processed sample image; determining characteristic information corresponding to the sample image according to the gray value; and comparing the characteristic information corresponding to the sample images to obtain the sample similarity between the sample images.

In one embodiment, the reference image determining module 404 is further configured to obtain a preset frame image in the multiple frame sample images; comparing the sample image with a preset frame image to obtain a plurality of sample similarities; and when any sample similarity in the multiple sample similarities is larger than a sample threshold value, determining the preset frame image as the reference image.

In one embodiment, the image recognition module 408 is further configured to count the sample image amount corresponding to sample similarities smaller than or equal to the sample threshold when a sample similarity smaller than or equal to the sample threshold exists among the plurality of sample similarities; acquire a corresponding preset time period according to the sample image amount; and perform target recognition on the image to be recognized in the video data within the preset time period.

In one embodiment, the image recognition module 408 is further configured to record the sample image with the sample similarity smaller than or equal to the sample threshold as the image to be recognized; and carrying out target identification on the image to be identified to obtain a target area corresponding to the image to be identified.

In an embodiment, the image verification module 406 is further configured to extract an image after the sample image from the video data according to a preset sampling rate as an image to be identified; comparing the image to be identified with the reference image to obtain image similarity; and when the image similarity is larger than or equal to the image threshold, repeating the step of extracting the image after the sample image from the video data according to the preset sampling rate as the image to be identified.

In one embodiment, the image verification module 406 is further configured to count the number of images to be identified, of which the image similarity is greater than or equal to an image threshold; adjusting the preset sampling rate according to the number of the images to be identified; and extracting the image to be identified from the video data according to the adjusted sampling rate.

For specific limitations of the target recognition device based on image verification, reference may be made to the above limitations of the target recognition method based on image verification, and details are not repeated here. The modules in the target identification device based on image verification can be wholly or partially implemented in software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing object identification data based on image verification. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of object recognition based on image verification.

Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above-mentioned target identification method embodiment based on image verification when executing the computer program.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned image verification-based object recognition method embodiment.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. An object recognition method based on image verification, the method comprising:

comparing the sample images to obtain a plurality of sample similarities;

2. The method of claim 1, wherein comparing the sample images to obtain a plurality of sample similarities comprises:

preprocessing the sample image;

3. The method of claim 1, wherein comparing the sample images to obtain a plurality of sample similarities comprises:

acquiring preset frame images in a plurality of frames of sample images;

4. The method of claim 1, wherein after comparing the sample images to obtain a plurality of sample similarities, the method further comprises:

5. The method of claim 4, wherein when there is a sample similarity of the plurality of sample similarities that is less than a sample threshold, the method further comprises:

6. The method of claim 1, wherein the verifying the image to be identified in the video data according to the reference image comprises:

7. The method of claim 6, further comprising:

8. An object recognition apparatus based on image verification, the apparatus comprising:

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.