CN113869364A - Image processing method, image processing apparatus, electronic device, and medium - Google Patents


Info

Publication number
CN113869364A
CN113869364A (application number CN202110990326.3A)
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110990326.3A
Other languages
Chinese (zh)
Inventor
吴曌 (Wu Zhao)
Current Assignee
Beijing Kuangshi Technology Co Ltd
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Kuangshi Technology Co Ltd
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co Ltd and Beijing Megvii Technology Co Ltd
Priority application: CN202110990326.3A
Publication: CN113869364A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217 — Validation; Performance evaluation; Active pattern learning techniques
    • G06F 18/24 — Classification techniques

Abstract

The application provides an image processing method, an image processing apparatus, an electronic device, and a storage medium. The method comprises the following steps: obtaining a test sample, wherein the test sample comprises a sample image and marking information of a target object in the sample image, and the marking information comprises detection frame marking information and identification result marking information corresponding to the target object; detecting the sample image through a detection model to obtain detection frame information corresponding to the target object, and identifying the target object through an identification model based on the detection frame information to obtain identification result information corresponding to the sample image; and, if identification of the test sample is determined to have failed based on the identification result marking information and the identification result information of the sample image, analyzing the reason for the failure according to the detection frame information and the detection frame marking information corresponding to the sample image. The method and apparatus realize joint analysis of the detection model and the identification model using the same test sample, so the reason for an identification failure can be determined accurately.

Description

Image processing method, image processing apparatus, electronic device, and medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
In a service scenario of target recognition (such as face recognition), an image is usually passed through a detection model to implement target detection, and the detected target is then sent to a recognition model and compared with images in a base library to obtain a recognition result.
During algorithm optimization, the defects of a model need to be determined so that the model can be optimized accordingly. In the prior art, however, when identification fails, the reason for the failure cannot be reliably determined.
Disclosure of Invention
In view of the above problems, embodiments of the present application are proposed to provide an image processing method, an apparatus, an electronic device, and a storage medium that overcome or at least partially solve the above problems.
According to a first aspect of embodiments of the present application, there is provided an image processing method, including:
obtaining a test sample; wherein the test sample comprises a sample image and marking information of a target object in the sample image; the marking information comprises detection frame marking information and identification result marking information corresponding to the target object;
detecting the sample image through a detection model to obtain detection frame information corresponding to the target object, and identifying the target object through an identification model based on the detection frame information to obtain identification result information corresponding to the sample image;
and if the test sample identification failure is determined based on the identification result marking information and the identification result information of the sample image, analyzing the reason of the test sample identification failure according to the detection frame information and the detection frame marking information corresponding to the sample image.
According to a second aspect of embodiments of the present application, there is provided an image processing apparatus comprising:
the acquisition module is used for acquiring a test sample; wherein the test sample comprises a sample image and marking information of a target object in the sample image; the marking information comprises detection frame marking information and identification result marking information corresponding to the target object;
the detection and identification module is used for detecting the sample image through a detection model to obtain detection frame information corresponding to the target object, and identifying the target object through an identification model based on the detection frame information to obtain identification result information corresponding to the sample image;
and the reason analysis module is used for analyzing the reason of the identification failure of the test sample according to the detection frame information and the detection frame mark information corresponding to the sample image if the identification failure of the test sample is determined based on the identification result mark information and the identification result information of the sample image.
According to a third aspect of embodiments of the present application, there is provided an electronic apparatus, including: a processor, a memory and a computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, implements the image processing method as described in the first aspect.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method according to the first aspect.
According to the image processing method and apparatus, the electronic device, and the storage medium, a test sample is obtained; the sample image is detected through the detection model to obtain the detection frame information corresponding to the target object; the target object is identified through the identification model based on the detection frame information to obtain the identification result information corresponding to the sample image; and, if identification of the test sample is determined to have failed based on the identification result marking information and the identification result information of the sample image, the reason for the failure is analyzed according to the detection frame information and the detection frame marking information corresponding to the sample image. Joint analysis of the detection model and the identification model using the same test sample is thereby realized, and the reason for an identification failure can be determined accurately.
The foregoing description is only an overview of the technical solutions of the present application. In order that the technical means of the present application may be more clearly understood and implemented according to the content of this description, and in order that the above and other objects, features, and advantages of the present application may be more readily apparent, a detailed description of the present application is given below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application.
Fig. 1 is a flowchart illustrating steps of an image processing method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating steps of another image processing method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating steps of another image processing method according to an embodiment of the present disclosure;
fig. 4 is a block diagram of an image processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition, has developed rapidly. Artificial Intelligence (AI) is an emerging science and technology that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. Artificial intelligence is a comprehensive discipline involving a wide range of technologies, such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks. Computer vision, an important branch of artificial intelligence that enables machines to perceive the world, generally covers technologies such as face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, target detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, character recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, and robot navigation and positioning.
With the research and progress of artificial intelligence technology, the technology has been applied in many fields, such as security, city management, traffic management, building management, park management, face-based access control, face-based attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone photography, cloud services, smart home, wearable devices, unmanned and automatic driving, smart medical care, face payment, face unlocking, fingerprint unlocking, identity verification, smart screens, smart televisions, cameras, the mobile Internet, live streaming, beauty and medical cosmetology, and intelligent temperature measurement.
Fig. 1 is a flowchart of steps of an image processing method provided in an embodiment of the present application, and as shown in fig. 1, the method may include:
step 101, obtaining a test sample; wherein the test sample comprises a sample image and marking information of a target object in the sample image; the labeling information comprises detection frame labeling information and identification result labeling information corresponding to the target object.
The detection frame labeling information may include position labeling information of the detection frame, or may include both position labeling information of the detection frame and position labeling information of key points. The identification result labeling information comprises a base library image identifier and a corresponding identification score, where the identification score is the similarity between the sample image in the test sample and the base library image. The detection frame characterizes the position of the target object in the sample image. The test sample is used for testing the detection model and the identification model for the target object. The target object may be a human face, a human body, a license plate, a vehicle, an animal, and the like.
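One possible in-memory layout for a test sample and its labeling information, matching the fields described above, can be sketched as follows. The class and field names are illustrative assumptions, not part of the patent:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Annotation:
    """Labeling information for one target object in a sample image."""
    box: tuple                        # (x1, y1, x2, y2) detection frame labeling
    keypoints: Optional[list] = None  # optional key point position labeling
    gallery_id: str = ""              # identifier of the matching base library image
    score: float = 0.0                # labeled identification score (similarity)

@dataclass
class TestSample:
    """A sample image together with its labeling information."""
    image: object                     # the sample image (array, path, ...)
    annotation: Annotation
```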
When the detection model and the identification model of the target object need to be subjected to combined optimization testing, a test sample with the labeling information can be obtained.
102, detecting the sample image through a detection model to obtain detection frame information corresponding to the target object, and identifying the target object through an identification model based on the detection frame information to obtain identification result information corresponding to the sample image.
The detection frame information may include position information of the detection frame, or may include both position information of the detection frame and position information of key points. The identification result information may include the identifier and identification score of the base library image with the highest identification score, and may also include the target identity.
A sample image in the test sample is detected through the detection model to obtain detection frame information corresponding to the target object in the sample image; the target object is cropped from the sample image according to the detection frame information to obtain a screenshot of the target object; the screenshot is input into the identification model; and the identification model identifies the screenshot to obtain identification result information corresponding to the sample image.
For example, when the target object is a human face, a sample image of the test sample is input into the detection model, and the detection model outputs the largest human face frame corresponding to the sample image, that is, the detection frame information corresponding to the sample image. A face screenshot is cropped based on the detection frame information and input into the identification model. The identification model compares the face screenshot with the face images in the base library to obtain the base library face image with the highest similarity, and outputs the similarity score, which is the identification score. If the identification score exceeds an identification threshold, the face in the sample image and the face in the base library image can be considered to be the same person.
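The base-library comparison step above can be sketched as follows. Only the gallery-matching step is concrete; feature extraction is assumed to have been done by the identification model, and the cosine-similarity measure and 0.8 threshold are illustrative assumptions:

```python
def best_match(probe_feature, gallery):
    """Compare a probe feature vector against the base library and return
    (image_id, similarity) for the most similar base library entry."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)
    return max(((gid, cosine(probe_feature, feat)) for gid, feat in gallery.items()),
               key=lambda item: item[1])

def recognize(probe_feature, gallery, threshold=0.8):
    """Return the matched identity if the identification score exceeds the
    threshold, otherwise None (the target is treated as not identified)."""
    gid, score = best_match(probe_feature, gallery)
    return (gid, score) if score >= threshold else (None, score)

# Toy base library of two feature vectors (assumed data).
gallery = {"person_a": [1.0, 0.0], "person_b": [0.0, 1.0]}
identity, score = recognize([0.9, 0.1], gallery)  # matches "person_a"
```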
Step 103, if the test sample identification failure is determined based on the identification result labeling information and the identification result information of the sample image, analyzing the reason of the test sample identification failure according to the detection frame information and the detection frame labeling information corresponding to the sample image.
If the identification result information of the sample image in the test sample is the same as the identification result marking information, determining that the test sample is successfully identified; and if the identification result information of the sample image in the test sample is different from the identification result marking information, determining that the identification of the test sample fails.
After it is determined that identification of the test sample has failed, the reason for the failure can be analyzed based on the detection frame information and the detection frame marking information corresponding to the sample image. That is, the detection frame information is compared with the detection frame marking information, and whether the failure is caused by a problem in the detection model is determined based on the difference between them. If the difference between the detection frame information and the detection frame marking information satisfies a target difference condition, it is determined that the detection model has no problem; if the difference does not satisfy the target difference condition, it is determined that the identification failure is caused by a problem in the detection model. If the detection model has no problem, it can be determined that the identification failure is caused by a problem in the image quality of the sample or a problem in the identification model.
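The comparison of detection frame information with detection frame marking information can be sketched as below. The patent does not specify the target difference condition; modelling it as an intersection-over-union (IoU) threshold is an assumption for this sketch:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def detection_ok(pred_box, annotated_box, iou_threshold=0.5):
    """True when the predicted detection frame is close enough to the
    annotation, i.e. the detection model is not the cause of the failure."""
    return iou(pred_box, annotated_box) >= iou_threshold
```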
According to the image processing method provided in this embodiment of the application, a test sample is obtained; the sample image is detected through the detection model to obtain the detection frame information corresponding to the target object; the target object is identified through the identification model based on the detection frame information to obtain the identification result information corresponding to the sample image; and, if identification of the test sample is determined to have failed based on the identification result marking information and the identification result information of the sample image, the reason for the failure is analyzed according to the detection frame information and the detection frame marking information corresponding to the sample image. Joint analysis of the detection model and the identification model using the same test sample is thereby realized, and the reason for an identification failure can be determined accurately.
Fig. 2 is a flowchart of steps of an image processing method according to an embodiment of the present application, in this embodiment, based on the foregoing embodiment, the test sample is test video data, where the test video data includes multiple frame sample images and annotation information of a target object in each frame of the sample images, and as shown in fig. 2, the method may include:
step 201, obtaining a test sample; wherein the test sample comprises a sample image and marking information of a target object in the sample image; the labeling information comprises detection frame labeling information and identification result labeling information corresponding to the target object.
Step 202, detecting the sample image through a detection model to obtain detection frame information corresponding to the target object, and identifying the target object through an identification model based on the detection frame information to obtain identification result information corresponding to the sample image.
Each frame of sample image in the test sample is passed through the detection model to obtain the detection frame information of that frame; the target object is cropped from each frame according to its detection frame information to obtain a screenshot of the target object; the screenshot is input into the identification model; and the identification model identifies the screenshot to obtain the identification result information of each frame of sample image. Alternatively, after the detection frame information of each frame is obtained, frames whose quality attributes meet certain requirements can be screened from the test sample, and a certain proportion of these frames can be selected as push frames; the target object in each push frame is cropped according to that frame's detection frame information, the screenshot is input into the identification model, and the identification model identifies the target object to obtain the identification result information of the selected sample images.
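The frame-screening alternative described above can be sketched as follows. The quality threshold and the proportion are illustrative assumptions; the patent only says the quality attributes must "meet certain requirements" and that "a certain proportion" of frames is selected:

```python
def select_push_frames(frames, quality_threshold=0.5, proportion=0.5):
    """Select push frames from a list of (frame_id, quality) pairs.

    Frames whose quality attribute meets the threshold are kept, and a
    fixed proportion of the qualifying frames is pushed to the
    identification model. Returns the ids of the selected frames.
    """
    qualified = [fid for fid, q in frames if q >= quality_threshold]
    count = max(1, int(len(qualified) * proportion)) if qualified else 0
    return qualified[:count]
```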
Step 203, for each frame of sample image in the multiple frames of sample images, verifying the identification result of the sample image based on the identification result labeling information and the identification result information corresponding to the sample image.
And verifying the identification result of each frame of sample image in the multi-frame sample image based on the identification result marking information and the identification result information respectively so as to determine that the identification of each frame of sample image is successful, failed or not identifies the target object.
In an embodiment of the present application, the verifying, for each of the multiple frames of sample images, the identification result of the sample image based on the identification result tagging information and the identification result information corresponding to the sample image includes:
traversing the multi-frame sample image;
aiming at the sample image traversed currently, checking whether a target object is identified according to the identification result information corresponding to the sample image;
if the verification result indicates that the target object is identified, verifying whether the sample image is correctly identified based on the identification result information and the identification result marking information corresponding to the sample image;
if the checking result indicates that the target object is not identified, continuously executing the operation of traversing the multi-frame sample image until the condition of traversing ending is met;
wherein the traversal end condition comprises any one of the following conditions: there are sample images that identify correctly, sample images that identify incorrectly, or none of the sample images identify the target object.
The identification result of each frame of sample image in the multi-frame sample images is verified in a traversal manner. For the currently traversed sample image, whether a target object has been identified is checked according to the identification result information corresponding to that image. If the check result indicates that a target object has been identified, the identification result information corresponding to the sample image is compared with the identification result marking information: if they indicate the same target object, the sample image is determined to be identified correctly; if they indicate different target objects, the sample image is determined to be identified incorrectly. If the check result indicates that no target object has been identified, the traversal of the multi-frame sample images continues until the traversal end condition is met. During the traversal, if the currently traversed sample image is identified correctly or incorrectly, the traversal ends and the identification result of that sample image is taken as the identification result of the test sample; that is, the test sample is determined to be identified correctly when the currently traversed sample image is identified correctly, and identified incorrectly when the currently traversed sample image is identified incorrectly. If all of the multi-frame sample images in the test sample have been traversed and no target object has been identified in any sample image, the traversal ends and it is determined that no target object has been identified in the test sample.
The test sample that identifies an error or that does not identify the target object is the test sample that failed identification.
The multi-frame sample images are verified in a traversing mode, so that the multi-frame sample images can be sequentially verified, and the identification result of the test sample can be accurately determined.
It should be noted that, in addition to the verification of the identification result of each frame of sample image by using the traversal method, the identification result of each frame of sample image may also be verified by using other methods, for example, the multi-frame sample images may also be verified simultaneously, or the multi-frame sample images may be divided into multiple batches and verified in batches.
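The traversal verification described above can be sketched as follows. Representing each frame's verification result as one of the strings "correct", "wrong", or "none" (target not identified) is an assumption of this sketch, not the patent's data model:

```python
def verify_test_sample(frame_results):
    """Traverse per-frame verification results and return the outcome
    for the whole test sample.

    The traversal ends as soon as one frame is identified correctly or
    incorrectly; if no frame identifies the target object, the whole
    test sample is treated as "target not identified".
    """
    for result in frame_results:
        if result == "correct":
            return "sample identified correctly"
        if result == "wrong":
            return "sample identified incorrectly"
        # result == "none": target not identified in this frame, keep going
    return "target not identified in any frame"
```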
And 204, if the verification result indicates that the correctly identified sample image exists in the multi-frame sample images, determining that the test sample is successfully identified.
Step 205, if the verification result indicates that there is a sample image with an identification error in the multi-frame sample images, determining that the identification of the test sample fails.
Step 207 is then performed.
And step 206, if the verification result indicates that the target object is not identified in each frame of sample image in the multi-frame sample images, determining that the identification of the test sample fails.
Step 207 is then performed.
And step 207, analyzing the reason of the test sample identification failure according to the detection frame information and the detection frame mark information corresponding to the sample image.
In an embodiment of the present application, the analyzing, according to the detection frame information and the detection frame label information corresponding to the sample image, the reason for the failure of the identification of the test sample includes: determining a key video frame corresponding to the test sample based on the identification result marking information and the identification result information of the sample image of each frame; and analyzing the reason of the test sample identification failure based on the detection frame information and the detection frame mark information corresponding to the key video frame.
When a test sample is test video data that includes multiple frames of sample images, the key video frame corresponding to the test sample may be determined first, and the reason for the identification failure may then be analyzed based on the key video frame. The key video frame may be a sample image in the test sample that was identified incorrectly; or, when no target object is identified in any of the multiple frames, it may be a sample image selected from the frames according to a condition, or a randomly chosen frame.
Analysis of the reason for identification failure is carried out only for test samples whose identification has failed; no analysis is required for test samples that are identified successfully.
In an embodiment of the application, the determining, based on the identification result tagging information and the identification result information of the sample image of each frame, a key video frame corresponding to the test sample includes: for each frame of sample image in the multi-frame sample image, verifying the identification result of the sample image based on the identification result marking information and the identification result information corresponding to the sample image; and determining the sample image with the highest recognition score in the sample images with the verification results indicating that the target object is not recognized as the key video frame.
And verifying the identification result of each frame of sample image in the multi-frame sample image based on the identification result marking information and the identification result information respectively so as to determine that the identification of each frame of sample image is successful, failed or not identifies the target object. And determining the sample image with the verification result indicating that the identification is wrong as the key video frame of the test sample, or determining the sample image with the highest identification score in the sample images with the verification result indicating that the target object is not identified as the key video frame of the test sample. By determining the key video frames based on the verification result, the key video frames of the test sample can be accurately determined, and accurate analysis data can be provided for subsequent analysis.
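The key-video-frame selection above can be sketched as a small function. Modelling each frame as an (index, status, score) tuple is an assumption of this sketch:

```python
def key_video_frame(frames):
    """Select the key video frame from (index, status, score) tuples.

    A frame identified incorrectly (status "wrong") becomes the key frame;
    otherwise the frame with the highest identification score among frames
    where no target was identified (status "none") is chosen.
    """
    wrong = [f for f in frames if f[1] == "wrong"]
    if wrong:
        return wrong[0]
    unidentified = [f for f in frames if f[1] == "none"]
    return max(unidentified, key=lambda f: f[2]) if unidentified else None
```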
In an embodiment of the present application, analyzing the reason for the failure of the identification of the test sample based on the detection frame information and the detection frame label information corresponding to the key video frame includes: based on the detection frame information and the detection frame mark information corresponding to the key video frame, performing detection performance and/or quality attribute analysis on the key video frame to determine the reason of the identification failure of the test sample; wherein the detection performance reflects detection-related issues of the detection model, and the quality attribute reflects quality-related issues of the test sample.
When analyzing the reason for the identification failure of the test sample, only the key video frame needs to be analyzed, which reduces the amount of data to be processed. Detection performance and/or quality attribute analysis can be performed on the key video frame based on its detection frame information and detection frame marking information, so as to determine whether the identification failure is caused by a problem in the detection model, a problem in the identification model, or a problem in video quality. The detection frame information corresponding to the key video frame is compared with the detection frame marking information, and whether the failure is caused by the detection model is determined based on the difference between them: if the difference satisfies the target difference condition, the detection model is determined to have no problem; if it does not, the failure is determined to be caused by a problem in the detection model. Quality attribute analysis is performed on the key video frame to judge whether its quality attribute satisfies a target attribute condition: if not, the identification failure is determined to be caused by the quality attribute of the test sample; if so, the failure is determined to be caused by a problem in the identification model.
For a test sample, only the detection performance analysis or only the quality attribute analysis may be performed; if that analysis reveals a problem with the detection model or with the video quality, the reason for the identification failure can be determined to be the detection model or the video quality accordingly.
When the reason for the identification failure is determined to be a problem with the detection model or the recognition model, the corresponding model can be optimized to eliminate the problem; when the reason is determined to be the video quality, the camera used to capture the test sample can be adjusted so as to eliminate the quality problem as far as possible.
In one embodiment of the present application, performing detection performance and quality attribute analysis on the key video frames to determine the reason for the failure of the test sample identification includes:
performing detection performance analysis on the key video frame according to the detection frame information and the detection frame mark information of the key video frame;
if the detection performance analysis result indicates that there is a problem with the detection performance, determining that the reason for the identification failure of the test sample is the problem of the detection model;
if the detection performance analysis result indicates that the detection performance is not problematic, checking whether the quality attribute of the key video frame meets a target attribute condition;
if the quality attribute does not meet the target attribute condition, determining that the reason of the identification failure of the test sample is the problem of the video quality;
and if the quality attribute meets the target attribute condition, determining that the reason of the identification failure of the test sample is the problem of the identification model.
The difference between the detection frame information and the detection frame annotation information of the key video frame is determined, and the detection performance analysis result of the key video frame is obtained from this difference. If the difference does not satisfy the target difference condition, the detection performance is judged to be problematic, the reason for the failed identification of the test sample is determined to be a problem of the detection model, and no subsequent analysis is performed. If the difference satisfies the target difference condition, the detection performance is judged to have no problem; the quality attribute of the key video frame is then obtained and checked against the target attribute condition: if the quality attribute does not satisfy the condition, the reason for the failed identification is determined to be the video quality, and if it does, the reason is determined to be a problem of the recognition model. The quality attribute may include the angle of the target object; for example, when the target object is a human face, the angle may be an upward-looking or downward-looking (pitch) angle. When the reason for the failure is determined to be the video quality, the quality attribute that does not satisfy the target attribute condition may also be identified, i.e., the specific quality attribute that caused the identification failure can be determined.
By sequentially carrying out detection performance analysis and quality attribute analysis, the reason of the identification failure of the test sample can be accurately determined.
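The sequential attribution described above can be sketched as a short decision routine. This is a minimal illustration rather than the patent's implementation; the function name and the boolean inputs (precomputed detection and quality checks) are assumptions.

```python
def attribute_failure(detection_ok, quality_ok):
    """Return the cause of a failed identification, checking the
    detection performance first, then the video quality, and only
    then attributing the failure to the recognition model."""
    if not detection_ok:
        return "detection model problem"   # detection performance is problematic
    if not quality_ok:
        return "video quality problem"     # quality attribute fails the condition
    return "recognition model problem"     # the remaining possible cause
```

Because the checks are ordered, a detection problem short-circuits the analysis, mirroring the "no subsequent analysis is performed" behavior described above.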
In an optional implementation manner, the detection frame information includes position information of the detection frame, and the detection frame annotation information includes position annotation information of the detection frame; the analyzing the detection performance of the key video frame according to the detection frame information and the detection frame annotation information of the key video frame includes the following steps: determining the overlapping degree of a first detection frame and a second detection frame corresponding to the key video frame; the first detection frame is determined based on the position information of the detection frame corresponding to the key video frame, and the second detection frame is determined based on the position annotation information corresponding to the key video frame; if the overlapping degree is greater than or equal to the overlapping degree threshold, determining that there is no problem with the detection performance; and if the overlapping degree is smaller than the overlapping degree threshold, determining that there is a problem with the detection performance.
When the detection frame information is the position information of the detection frame and the detection frame annotation information is the position annotation information of the detection frame, a first detection frame is determined from the position information corresponding to the key video frame, and a second detection frame is determined from the position annotation information. The overlapping degree of the two frames is then calculated, i.e., the ratio of the area where the first and second detection frames overlap to the total area they cover. If the overlapping degree is greater than or equal to the overlapping degree threshold, the two frames mostly coincide and the detection performance is judged to have no problem; if the overlapping degree is smaller than the threshold, the overlapping portion is insufficient and the detection performance is judged to be problematic. Because the judgment is based only on the detection frames, the amount of data processed is small, which improves the efficiency of the optimization analysis.
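The overlap computation described above can be sketched as follows, assuming axis-aligned boxes given as (x1, y1, x2, y2); the function names and the 0.5 default threshold are illustrative, not taken from the patent.

```python
def overlap_degree(box_a, box_b):
    """Ratio of the overlap area of two boxes to the total area they
    cover (intersection over union); boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union if union else 0.0

def detection_ok(detected_box, labeled_box, overlap_threshold=0.5):
    """Detection performance passes when the overlap meets the threshold."""
    return overlap_degree(detected_box, labeled_box) >= overlap_threshold
```

Identical boxes yield an overlap of 1.0 and disjoint boxes 0.0, so the threshold cleanly separates "mostly coincide" from "overlap insufficient."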
In another optional implementation, the detection frame information includes position information of the detection frame and position information of the key points, and the detection frame annotation information includes position annotation information of the detection frame and position annotation information of the key points; the analyzing the detection performance of the key video frame according to the detection frame information and the detection frame annotation information of the key video frame includes the following steps: determining the overlapping degree of a first detection frame and a second detection frame corresponding to the key video frame, wherein the first detection frame is determined based on the position information of the detection frame corresponding to the key video frame, and the second detection frame is determined based on the position annotation information corresponding to the key video frame; if the overlapping degree is smaller than the overlapping degree threshold, determining that a first type of problem exists in the detection performance, wherein the first type of problem is related to target detection; if the overlapping degree is greater than or equal to the overlapping degree threshold, determining the distance between the key point position information of the key video frame and the key point position annotation information; if the distance is greater than the distance threshold, determining that a second type of problem exists in the detection performance, wherein the second type of problem is related to key point detection; and if the distance is smaller than or equal to the distance threshold, determining that no problem exists in the detection performance.
When the detection frame information includes the position information of the detection frame and the position information of the key points, and the detection frame annotation information includes the position annotation information of the detection frame and of the key points, the first and second detection frames are compared first, i.e., the overlapping degree of the first and second detection frames corresponding to the key video frame is determined. If the overlapping degree is smaller than the overlapping degree threshold, the detection performance is judged to have a first-type problem; if it is greater than or equal to the threshold, the detection model has no first-type problem, and the key point detection is then examined: the distance between the key point position information of the key video frame and the key point position annotation information is determined. If the distance is greater than the distance threshold, i.e., the predicted key point positions deviate significantly from the annotated positions, the detection performance is judged to have a second-type problem; if the distance is smaller than or equal to the distance threshold, the detection performance is judged to have no problem.
The first type of problem relates to target detection, such as inaccurate target detection, for example a missed detection (a target object is not detected) or a false detection (another object is erroneously detected as the target object); the second type of problem relates to key point detection, such as inaccurate key point positions.
By judging the detection frame and the key points at the same time, a relatively accurate and refined analysis result can be given.
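The two-stage judgment above (box overlap first, then key point distance) might look like the following sketch; the mean-distance aggregation, function names, and default thresholds are assumptions for illustration, not details from the patent.

```python
import math

def keypoint_distance(detected_pts, labeled_pts):
    """Mean Euclidean distance between matched (x, y) key point pairs."""
    dists = [math.dist(p, q) for p, q in zip(detected_pts, labeled_pts)]
    return sum(dists) / len(dists)

def classify_detection(overlap, kpt_dist, overlap_thr=0.5, dist_thr=5.0):
    """Box overlap decides first-type problems; key point distance is
    only consulted when the overlap already passes its threshold."""
    if overlap < overlap_thr:
        return "first-type problem (target detection)"
    if kpt_dist > dist_thr:
        return "second-type problem (keypoint detection)"
    return "no problem"
```

Separating the two stages gives the refined result described above: a failing frame is attributed either to the box or to the key points, never ambiguously to both.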
In the image processing method provided by this embodiment, when the test sample is test video data, the identification result of each frame of sample image in the multiple frames is verified based on the identification result annotation information and the identification result information corresponding to that image; test samples that failed identification are determined from the verification results, and the reason for the failure is then analyzed based on the identification result annotation information and the identification result information of the sample images. Joint optimization analysis of the detection model and the recognition model with the same test sample is thereby realized, and the reason for the identification failure can be determined accurately.
Fig. 3 is a flowchart of steps of an image processing method according to an embodiment of the present application, in this embodiment, on the basis of the foregoing embodiment, a plurality of sets of models may be subjected to comparative analysis, and a set of models may include a detection model and a recognition model, as shown in fig. 3, the method may include:
step 301, obtaining a test sample; wherein the test sample comprises a sample image and marking information of a target object in the sample image; the labeling information comprises detection frame labeling information and identification result labeling information corresponding to the target object.
Step 302, respectively identifying sample images in the test sample through a plurality of sets of models to obtain a plurality of detection frame information and a plurality of identification result information corresponding to the sample images; each set of models includes a detection model and a recognition model.
When multiple sets of models are compared and analyzed, the sample image is input into each set of models separately to obtain each set's detection frame information and identification result information for the sample image. That is, the sample image is input into the detection model of each set, which detects it to obtain the detection frame information corresponding to the target object; the recognition model of the same set then identifies the target object based on that detection frame information to obtain the identification result information corresponding to the sample image. The multiple sets of models thus yield multiple pieces of detection frame information and multiple pieces of identification result information.
Step 303, if it is determined that the recognition results of the multiple sets of models for the test sample are different based on the recognition result tagging information and the multiple sets of recognition result information corresponding to the sample image, analyzing the reason for the recognition failure of each set of models that failed in recognition in the test sample according to the multiple sets of detection frame information and the detection frame tagging information corresponding to the sample image.
Each piece of identification result information among the multiple pieces corresponds to one set of models. By comparing the identification result annotation information with each piece of identification result information in turn, it can be determined whether each set of models identified the test sample successfully or not. If the identification results of the multiple sets of models on the test sample differ, the sets that failed on the test sample are determined, and the reason why each failed set failed on the test sample is analyzed.
The number of the multiple sets of models can be two sets, or three sets or more than three sets.
When there are two sets of models, they may be a first set and a second set: the first set includes a first detection model and a first recognition model, the second set includes a second detection model and a second recognition model, and the two sets may be the models before and after an upgrade or entirely different models. For the comparative analysis, the sample image is input into the first and second detection models separately. The first detection model detects the sample image to obtain first detection frame information corresponding to the target object, and the first recognition model identifies the target object based on that information to obtain first identification result information for the sample image; likewise, the second detection model yields second detection frame information, and the second recognition model yields second identification result information. Whether the first set's identification of the test sample succeeded or failed is determined from the identification result annotation information and the first identification result information, and likewise for the second set using the second identification result information. If the two sets' identification results for the test sample differ, the reason why the failing set failed on the test sample is analyzed.
When there are multiple test samples, the sample images in each test sample are identified with both sets of models, yielding, for each test sample, first detection frame information and first identification result information from the first set, and second detection frame information and second identification result information from the second set. Based on the identification result annotation information and the first and second identification result information of each test sample, the set of test samples on which the two model sets disagree is determined. Within that set, a first test-sample subset is determined on which the first set of models failed while the second succeeded, and a second subset on which the second set failed while the first succeeded. The first set of models is then analyzed for the causes of its failures on the first subset, and the second set of models for the causes of its failures on the second subset.
When there are three or more sets of models, the identification result annotation information is compared with each of the multiple pieces of identification result information to determine whether each set's identification of the test sample succeeded or failed; if the sets' results differ, the sets that failed on the test sample are determined and the reason why each failed set failed is analyzed. When there are multiple test samples, each is identified by all of the model sets, so each test sample corresponds to the detection frame information and identification result information of every set, i.e., multiple pieces of each. For each test sample, the identification result annotation information is compared with the multiple pieces of identification result information to determine whether the sets' results agree. The set of test samples on which the model sets disagree is thus determined, the subset of test samples on which each model set failed is determined within it, and the reason each model set failed on its corresponding subset is analyzed.
In the image processing method provided by this embodiment, the sample images in the test samples are identified by multiple sets of models, yielding multiple pieces of detection frame information and multiple pieces of identification result information for each sample image. If the identification result annotation information and the multiple pieces of identification result information show that the sets' identification results for a test sample differ, the reason why each failed set failed on the test sample is analyzed from the multiple pieces of detection frame information and the detection frame annotation information. Comparative analysis of multiple model sets is thereby realized, the reason for the differences between the sets can be determined, and models before and after an upgrade can be conveniently compared.
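The per-sample comparison of several model sets might be sketched as below, assuming each sample's ground-truth label and each set's recognition output can be compared for equality; the data shapes and names are illustrative assumptions.

```python
def split_disagreements(labels, results_per_set):
    """For each sample, compare every model set's result against the
    annotation; keep only samples on which the sets disagree, and
    record, per model set, the samples that set failed on."""
    subsets = {}  # model-set index -> indices of samples it failed on
    for i, label in enumerate(labels):
        verdicts = [res[i] == label for res in results_per_set]
        if len(set(verdicts)) > 1:  # recognition results differ across sets
            for s, ok in enumerate(verdicts):
                if not ok:
                    subsets.setdefault(s, []).append(i)
    return subsets
```

Each returned subset is exactly the test-sample subset on which one model set failed while at least one other succeeded, which is then fed into the failure-reason analysis.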
It should be noted that, for simplicity of description, the method embodiments are described as a series or combination of acts, but those skilled in the art will recognize that the embodiments of the application are not limited by the order of acts described, as some steps may be performed in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily required by the embodiments of the application.
Fig. 4 is a block diagram of an image processing apparatus according to an embodiment of the present application, and as shown in fig. 4, the image processing apparatus may include:
an obtaining module 401, configured to obtain a test sample; wherein the test sample comprises a sample image and marking information of a target object in the sample image; the marking information comprises detection frame marking information and identification result marking information corresponding to the target object;
a detection and identification module 402, configured to detect the sample image through a detection model to obtain detection frame information corresponding to the target object, and identify the target object through an identification model based on the detection frame information to obtain identification result information corresponding to the sample image;
a reason analyzing module 403, configured to, if it is determined that the test sample fails to be identified based on the identification result tagging information and the identification result information of the sample image, analyze a reason why the test sample fails to be identified according to the detection frame information and the detection frame tagging information corresponding to the sample image.
Optionally, the test sample is test video data, and the test video data includes multiple frames of sample images and annotation information of a target object in each frame of the sample images;
the apparatus further comprises a recognition result determination module comprising:
the verification unit is used for verifying the identification result of the sample image based on the identification result marking information and the identification result information corresponding to the sample image aiming at each frame of sample image in the multi-frame sample image;
the identification result determining unit is used for determining that the test sample is successfully identified if the verification result indicates that the correctly identified sample image exists in the multi-frame sample images; if the check result indicates that the sample image with the identification error exists in the multi-frame sample image, determining that the identification of the test sample fails; and if the verification result indicates that the target object is not identified in each frame of sample image in the multi-frame sample images, determining that the identification of the test sample fails.
Optionally, the verification unit is specifically configured to:
traversing the multi-frame sample image;
aiming at the sample image traversed currently, checking whether a target object is identified according to the identification result information corresponding to the sample image;
if the verification result indicates that the target object is identified, verifying whether the sample image is correctly identified based on the identification result information and the identification result marking information corresponding to the sample image;
if the checking result indicates that the target object is not identified, continuously executing the operation of traversing the multi-frame sample image until the condition of traversing ending is met;
wherein the traversal end condition comprises any one of the following conditions: there are sample images that identify correctly, sample images that identify incorrectly, or none of the sample images identify the target object.
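The traversal with its end conditions can be sketched as follows, with each frame represented by an assumed (recognized, correct) pair derived from the per-frame checks described above; the function name and return strings are illustrative.

```python
def verify_frames(frames):
    """Traverse per-frame verification results.  Each frame is a
    (recognized, correct) pair.  Traversal ends as soon as a frame
    recognizes the target object; otherwise it falls through to the
    'no frame recognized the target object' case."""
    for recognized, correct in frames:
        if recognized:
            return "identified correctly" if correct else "identified incorrectly"
    return "target object not recognized in any frame"
```

Only the first outcome counts as a successful identification of the test sample; the other two are the failure cases that trigger the reason analysis.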
Optionally, the reason analyzing module includes:
the key frame determining unit is used for determining a key video frame corresponding to the test sample based on the identification result marking information and the identification result information of the sample image of each frame;
and the reason analysis unit is used for analyzing the reason of the test sample identification failure based on the detection frame information and the detection frame mark information corresponding to the key video frame.
Optionally, the key frame determining unit is specifically configured to:
for each frame of sample image in the multi-frame sample image, verifying the identification result of the sample image based on the identification result marking information and the identification result information corresponding to the sample image;
and determining the sample image with the highest recognition score in the sample images with the verification results indicating that the target object is not recognized as the key video frame.
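Selecting the key video frame as described above might look like this minimal sketch; the (frame_id, recognized, score) representation of a verified frame is an assumption for illustration.

```python
def select_key_frame(frames):
    """Among frames whose verification indicates the target object was
    not recognized, pick the one with the highest recognition score as
    the key video frame.  Each frame is (frame_id, recognized, score)."""
    candidates = [(score, fid) for fid, recognized, score in frames
                  if not recognized]
    return max(candidates)[1] if candidates else None
```

Taking the highest-scoring unrecognized frame focuses the later detection-performance and quality analysis on the frame that came closest to succeeding.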
Optionally, the reason analyzing unit is specifically configured to:
based on the detection frame information and the detection frame mark information corresponding to the key video frame, performing detection performance and/or quality attribute analysis on the key video frame to determine the reason of the identification failure of the test sample;
wherein the detection performance reflects detection-related issues of the detection model, and the quality attribute reflects quality-related issues of the test sample.
Optionally, the reason analyzing unit includes:
the performance analysis subunit is used for carrying out detection performance analysis on the key video frame according to the detection frame information and the detection frame mark information of the key video frame;
the detection model problem determining subunit is used for determining that the reason of the failure in the identification of the test sample is the problem of the detection model if the detection performance analysis result indicates that the detection performance has a problem;
a quality attribute checking subunit, configured to check whether the quality attribute of the key video frame meets a target attribute condition if the detection performance analysis result indicates that there is no problem with the detection performance;
the quality problem determining subunit is configured to determine that the reason for the failed identification of the test sample is a problem of video quality if the quality attribute does not satisfy the target attribute condition;
and the identification model problem determining subunit is used for determining that the reason of the identification failure of the test sample is the problem of the identification model if the quality attribute meets the target attribute condition.
Optionally, the detection frame information includes position information of the detection frame, and the detection frame label information includes position label information of the detection frame;
the performance analysis subunit is specifically configured to:
determining the overlapping degree of a first detection frame and a second detection frame corresponding to the key video frame; the first detection frame is determined based on the position information of the detection frame corresponding to the key video frame, and the second detection frame is determined based on the position marking information corresponding to the key video frame;
if the overlapping degree is larger than or equal to the overlapping degree threshold value, determining that no problem exists in the detection performance;
and if the overlapping degree is smaller than the overlapping degree threshold value, determining that the detection performance has a problem.
Optionally, the detection frame information includes position information of the detection frame and position information of the key point, and the detection frame label information includes position label information of the detection frame and position label information of the key point;
the performance analysis subunit is specifically configured to:
determining the overlapping degree of a first detection frame and a second detection frame corresponding to the key video frame, wherein the first detection frame is determined based on the position information of the detection frame corresponding to the key video frame, and the second detection frame is determined based on the position marking information corresponding to the key video frame;
if the overlapping degree is smaller than the overlapping degree threshold value, determining that the first type of problem exists in the detection performance; wherein the first type of problem is associated with target detection;
if the overlapping degree is larger than or equal to an overlapping degree threshold value, determining the distance between the key point position information of the key video frame and the key point position marking information;
if the distance is greater than the distance threshold, determining that a second type of problem exists in the detection performance; wherein the second type of problem is related to keypoint detection;
and if the distance is smaller than or equal to the distance threshold, determining that no problem exists in the detection performance.
Optionally, the detection and identification module is specifically configured to:
respectively identifying sample images in the test sample through a plurality of sets of models to obtain a plurality of detection frame information and a plurality of identification result information corresponding to the sample images; each set of models comprises a detection model and a recognition model;
the reason analysis module is specifically configured to:
and if it is determined, based on the recognition result labeling information and the multiple pieces of recognition result information corresponding to the sample image, that the recognition results of the multiple sets of models on the test sample differ, analyzing, according to the multiple pieces of detection frame information and the detection frame labeling information corresponding to the sample image, the reason why each set of models that failed recognition failed on the test sample.
The specific implementation process of the functions corresponding to each module and unit in the apparatus provided in the embodiment of the present application may refer to the method embodiment shown in fig. 1 to 3, and the specific implementation process of the functions corresponding to each module and unit of the apparatus is not described herein again.
The image processing apparatus provided in this embodiment obtains the test sample, detects the sample image through the detection model, obtains the detection frame information corresponding to the target object, identifies the target object through the identification model based on the detection frame information, obtains the identification result information corresponding to the sample image, and if it is determined that the test sample fails to be identified based on the identification result label information and the identification result information of the sample image, analyzes the reason of the test sample failing to be identified according to the detection frame information and the detection frame label information corresponding to the sample image, so that the detection model and the identification model are jointly analyzed by using the same test sample, and the reason of the identification failure can be accurately determined.
Since the apparatus embodiments are substantially similar to the method embodiments, they are described briefly; for relevant details, reference may be made to the description of the method embodiments.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in Fig. 5, the electronic device 500 may include one or more processors 510 and one or more memories 520 connected to the processors 510. The electronic device 500 may also include an input interface 530 and an output interface 540 for communicating with another apparatus or system. The program code executed by the processor 510 may be stored in the memory 520.
The processor 510 in the electronic device 500 calls the program code stored in the memory 520 to execute the image processing method in the above-described embodiment.
According to an embodiment of the present application, there is also provided a computer readable storage medium including, but not limited to, a disk memory, a CD-ROM, an optical memory, etc., having stored thereon a computer program which, when executed by a processor, implements the image processing method of the foregoing embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one of skill in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The image processing method, image processing apparatus, electronic device, and storage medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and core ideas of the present application. Meanwhile, a person skilled in the art may, based on the ideas of the present application, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (13)

1. An image processing method, comprising:
obtaining a test sample, wherein the test sample comprises a sample image and annotation information of a target object in the sample image, and the annotation information comprises detection frame annotation information and recognition result annotation information corresponding to the target object;
detecting the sample image through a detection model to obtain detection frame information corresponding to the target object, and recognizing the target object through a recognition model based on the detection frame information to obtain recognition result information corresponding to the sample image; and
if it is determined, based on the recognition result annotation information and the recognition result information of the sample image, that the test sample fails to be recognized, analyzing the reason for the recognition failure of the test sample according to the detection frame information and the detection frame annotation information corresponding to the sample image.
2. The method according to claim 1, wherein the test sample is test video data, and the test video data comprises a plurality of frames of sample images and annotation information of the target object in each frame of sample image;
whether the test sample fails to be recognized is determined based on the recognition result annotation information and the recognition result information of the sample images by:
for each frame of sample image in the plurality of frames of sample images, verifying the recognition result of the sample image based on the recognition result annotation information and the recognition result information corresponding to the sample image;
if the verification result indicates that a correctly recognized sample image exists in the plurality of frames of sample images, determining that the test sample is successfully recognized;
if the verification result indicates that an incorrectly recognized sample image exists in the plurality of frames of sample images, determining that the test sample fails to be recognized; and
if the verification result indicates that the target object is not recognized in any frame of sample image in the plurality of frames of sample images, determining that the test sample fails to be recognized.
3. The method according to claim 2, wherein the verifying, for each frame of sample image in the plurality of frames of sample images, the recognition result of the sample image based on the recognition result annotation information and the recognition result information corresponding to the sample image comprises:
traversing the plurality of frames of sample images;
for the currently traversed sample image, checking, according to the recognition result information corresponding to the sample image, whether the target object is recognized;
if the check result indicates that the target object is recognized, verifying, based on the recognition result information and the recognition result annotation information corresponding to the sample image, whether the sample image is correctly recognized; and
if the check result indicates that the target object is not recognized, continuing to traverse the plurality of frames of sample images until a traversal end condition is met,
wherein the traversal end condition comprises any one of the following: a correctly recognized sample image exists, an incorrectly recognized sample image exists, or the target object is not recognized in any of the sample images.
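The per-frame verification loop of claims 2 and 3 can be sketched as below. The frame representation and field names are hypothetical; the claims do not prescribe a data format. Traversal stops at the first frame in which the target object is recognized, and the outcome of that frame decides the whole sample; if no frame recognizes the target at all, the sample fails.

```python
def verify_test_video(frames):
    """frames: list of dicts with 'recognized' (a result, or None when the
    target object was not recognized) and 'annotation' (the ground truth)."""
    for frame in frames:
        if frame["recognized"] is None:
            continue  # target not recognized in this frame: keep traversing
        if frame["recognized"] == frame["annotation"]:
            return "success"                  # a correctly recognized frame exists
        return "failure: wrong recognition"   # an incorrectly recognized frame exists
    # traversal ended without any frame recognizing the target object
    return "failure: target never recognized"
```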
4. The method according to claim 1, wherein the analyzing the reason for the recognition failure of the test sample according to the detection frame information and the detection frame annotation information corresponding to the sample image comprises:
determining a key video frame corresponding to the test sample based on the recognition result annotation information and the recognition result information of each frame of sample image; and
analyzing the reason for the recognition failure of the test sample based on the detection frame information and the detection frame annotation information corresponding to the key video frame.
5. The method according to claim 4, wherein the determining the key video frame corresponding to the test sample based on the recognition result annotation information and the recognition result information of each frame of sample image comprises:
for each frame of sample image in the plurality of frames of sample images, verifying the recognition result of the sample image based on the recognition result annotation information and the recognition result information corresponding to the sample image; and
determining, among the sample images whose verification results indicate that the target object is not recognized, the sample image with the highest recognition score as the key video frame.
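Key-frame selection per claim 5 reduces to an arg-max over the failing frames. A minimal sketch, with hypothetical field names ('recognized_ok' for the verification outcome, 'score' for the recognition score):

```python
def pick_key_frame(frames):
    """Return the frame with the highest recognition score among the frames
    whose verification result indicates the target was not correctly
    recognized, or None if every frame was recognized correctly."""
    candidates = [f for f in frames if not f["recognized_ok"]]
    if not candidates:
        return None
    return max(candidates, key=lambda f: f["score"])
```

Choosing the highest-scoring failing frame focuses the later detection/quality analysis on the frame that came closest to succeeding, which is where the failure cause is most informative.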
6. The method according to claim 4 or 5, wherein the analyzing the reason for the recognition failure of the test sample based on the detection frame information and the detection frame annotation information corresponding to the key video frame comprises:
performing detection performance analysis and/or quality attribute analysis on the key video frame based on the detection frame information and the detection frame annotation information corresponding to the key video frame, to determine the reason for the recognition failure of the test sample,
wherein the detection performance reflects detection-related problems of the detection model, and the quality attribute reflects quality-related problems of the test sample.
7. The method according to claim 6, wherein the performing detection performance analysis and quality attribute analysis on the key video frame to determine the reason for the recognition failure of the test sample comprises:
performing detection performance analysis on the key video frame according to the detection frame information and the detection frame annotation information of the key video frame;
if the detection performance analysis result indicates that there is a problem with the detection performance, determining that the reason for the recognition failure of the test sample is a problem with the detection model;
if the detection performance analysis result indicates that there is no problem with the detection performance, checking whether the quality attribute of the key video frame meets a target attribute condition;
if the quality attribute does not meet the target attribute condition, determining that the reason for the recognition failure of the test sample is a problem with the video quality; and
if the quality attribute meets the target attribute condition, determining that the reason for the recognition failure of the test sample is a problem with the recognition model.
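The decision order of claim 7 is: detection performance first, video quality second, recognition model by elimination. A sketch with the two analyses collapsed to boolean inputs for illustration (real inputs would be the overlap and quality checks of the surrounding claims):

```python
def analyze_cause(detection_ok, quality_ok):
    """Attribute a recognition failure following the ordering of claim 7.

    detection_ok: result of the detection performance analysis.
    quality_ok:   whether the key frame's quality attribute meets the
                  target attribute condition.
    """
    if not detection_ok:
        return "detection model problem"
    if not quality_ok:
        return "video quality problem"
    # Detection and quality both pass, so the failure is attributed
    # to the recognition model by elimination.
    return "recognition model problem"
```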
8. The method according to claim 7, wherein the detection frame information comprises position information of the detection frame, and the detection frame annotation information comprises position annotation information of the detection frame;
the performing detection performance analysis on the key video frame according to the detection frame information and the detection frame annotation information of the key video frame comprises:
determining the degree of overlap between a first detection frame and a second detection frame corresponding to the key video frame, wherein the first detection frame is determined based on the position information of the detection frame corresponding to the key video frame, and the second detection frame is determined based on the position annotation information corresponding to the key video frame;
if the degree of overlap is greater than or equal to an overlap threshold, determining that there is no problem with the detection performance; and
if the degree of overlap is less than the overlap threshold, determining that there is a problem with the detection performance.
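The degree of overlap in claim 8 is typically computed as intersection-over-union (IoU), though the claim does not fix the overlap measure or the threshold value. A sketch assuming `(x1, y1, x2, y2)` corner-format boxes and an illustrative threshold of 0.5:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def detection_ok(pred_box, gt_box, threshold=0.5):
    """Claim 8's test: the detection performance has no problem iff the
    overlap between predicted and annotated frames meets the threshold."""
    return iou(pred_box, gt_box) >= threshold
```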
9. The method according to claim 7, wherein the detection frame information comprises position information and key point position information of the detection frame, and the detection frame annotation information comprises position annotation information and key point position annotation information of the detection frame;
the performing detection performance analysis on the key video frame according to the detection frame information and the detection frame annotation information of the key video frame comprises:
determining the degree of overlap between a first detection frame and a second detection frame corresponding to the key video frame, wherein the first detection frame is determined based on the position information of the detection frame corresponding to the key video frame, and the second detection frame is determined based on the position annotation information corresponding to the key video frame;
if the degree of overlap is less than an overlap threshold, determining that the detection performance has a first type of problem, wherein the first type of problem is related to target detection;
if the degree of overlap is greater than or equal to the overlap threshold, determining the distance between the key point position information and the key point position annotation information of the key video frame;
if the distance is greater than a distance threshold, determining that the detection performance has a second type of problem, wherein the second type of problem is related to key point detection; and
if the distance is less than or equal to the distance threshold, determining that there is no problem with the detection performance.
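Claim 9's two-stage check, box overlap first and key point distance second, can be sketched as follows. The IoU helper, the use of mean Euclidean distance over corresponding key points, and both threshold values are illustrative assumptions; the claim fixes only the comparison order.

```python
import math

def _iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def check_detection(pred_box, gt_box, pred_kpts, gt_kpts,
                    iou_thresh=0.5, dist_thresh=5.0):
    """Two-stage detection performance analysis per claim 9."""
    if _iou(pred_box, gt_box) < iou_thresh:
        return "first type: target detection problem"
    # Boxes agree; compare key points via mean Euclidean distance.
    dist = sum(math.dist(p, g) for p, g in zip(pred_kpts, gt_kpts)) / len(pred_kpts)
    if dist > dist_thresh:
        return "second type: key point detection problem"
    return "no problem"
```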
10. The method according to any one of claims 1 to 9, wherein the detecting the sample image through a detection model to obtain the detection frame information corresponding to the target object, and recognizing the target object through a recognition model based on the detection frame information to obtain the recognition result information corresponding to the sample image comprises:
respectively recognizing the sample images in the test sample through a plurality of sets of models to obtain a plurality of pieces of detection frame information and a plurality of pieces of recognition result information corresponding to the sample images, wherein each set of models comprises a detection model and a recognition model;
the analyzing, if it is determined based on the recognition result annotation information and the recognition result information of the sample image that the test sample fails to be recognized, the reason for the recognition failure of the test sample according to the detection frame information and the detection frame annotation information corresponding to the sample image comprises:
if it is determined, based on the recognition result annotation information and the plurality of pieces of recognition result information corresponding to the sample image, that the recognition results of the plurality of sets of models on the test sample differ, analyzing, according to the plurality of pieces of detection frame information and the detection frame annotation information corresponding to the sample image, the reason why each set of models that failed to recognize the test sample failed.
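The multi-model comparison of claim 10 can be sketched as below. The model interface (a `(detect, recognize)` pair per set) and the return convention are hypothetical; the point is that differential analysis is triggered only when the model sets disagree, and then only the failing sets are analyzed.

```python
def compare_model_sets(sample, model_sets):
    """Run each model set on the sample; when their recognition results
    differ, return the indices of the failing sets for cause analysis.

    sample: dict with 'image' and 'result_annotation'.
    model_sets: list of (detect, recognize) callables.
    """
    results = []
    for detect, recognize in model_sets:
        box = detect(sample["image"])
        results.append(recognize(sample["image"], box))
    outcomes = [r == sample["result_annotation"] for r in results]
    if all(outcomes) or not any(outcomes):
        return []  # all sets agree in outcome: no differential analysis
    # Sets disagree: analyze each set that failed to recognize the sample.
    return [i for i, ok in enumerate(outcomes) if not ok]
```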
11. An image processing apparatus, characterized by comprising:
an acquisition module, configured to acquire a test sample, wherein the test sample comprises a sample image and annotation information of a target object in the sample image, and the annotation information comprises detection frame annotation information and recognition result annotation information corresponding to the target object;
a detection and recognition module, configured to detect the sample image through a detection model to obtain detection frame information corresponding to the target object, and recognize the target object through a recognition model based on the detection frame information to obtain recognition result information corresponding to the sample image; and
a reason analysis module, configured to, if it is determined based on the recognition result annotation information and the recognition result information of the sample image that the test sample fails to be recognized, analyze the reason for the recognition failure of the test sample according to the detection frame information and the detection frame annotation information corresponding to the sample image.
12. An electronic device, comprising: a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the image processing method according to any one of claims 1 to 10.
13. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, wherein the computer program, when executed by a processor, implements the image processing method according to any one of claims 1 to 10.
CN202110990326.3A 2021-08-26 2021-08-26 Image processing method, image processing apparatus, electronic device, and medium Pending CN113869364A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110990326.3A CN113869364A (en) 2021-08-26 2021-08-26 Image processing method, image processing apparatus, electronic device, and medium


Publications (1)

Publication Number Publication Date
CN113869364A true CN113869364A (en) 2021-12-31

Family

ID=78988436

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110990326.3A Pending CN113869364A (en) 2021-08-26 2021-08-26 Image processing method, image processing apparatus, electronic device, and medium

Country Status (1)

Country Link
CN (1) CN113869364A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114529768A (en) * 2022-02-18 2022-05-24 阿波罗智联(北京)科技有限公司 Method and device for determining object class, electronic equipment and storage medium
CN114529768B (en) * 2022-02-18 2023-07-21 阿波罗智联(北京)科技有限公司 Method, device, electronic equipment and storage medium for determining object category


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination