WO2022062379A1 - Image detection method and related apparatus, device, storage medium, and computer program - Google Patents


Info

Publication number
WO2022062379A1
WO2022062379A1 · PCT/CN2021/088718 · CN2021088718W
Authority
WO
WIPO (PCT)
Prior art keywords
detected
image
preset
feature
target
Prior art date
Application number
PCT/CN2021/088718
Other languages
French (fr)
Chinese (zh)
Inventor
时占
闫研
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司 filed Critical 北京市商汤科技开发有限公司
Priority to JP2021564951A priority Critical patent/JP2022552754A/en
Priority to KR1020217035770A priority patent/KR20220042301A/en
Publication of WO2022062379A1 publication Critical patent/WO2022062379A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/10: Terrestrial scenes
    • G06V40/107: Static hand or arm
    • G08: SIGNALLING
    • G08B: SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00: Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/18: Status alarms

Definitions

  • the present application relates to the technical field of image processing, and in particular, to an image detection method and related apparatus, equipment, storage medium, and computer program.
  • In the field of image processing, a central task is the detection and recognition of objects in images.
  • Typically, a camera device collects an image or video stream containing a human face; the human face is automatically detected in the image, face recognition is then performed on the detected face, and corresponding processing is performed based on the recognition result.
  • the embodiments of the present application provide at least one image detection method and related apparatus, equipment, storage medium, and computer program.
  • An embodiment of the present application provides an image detection method, including: acquiring a first image containing a target to be detected; detecting the first image to obtain a detection result of the first image, wherein the detection result includes whether the target to be detected in the first image is occluded by a preset object; and performing a preset operation that matches the detection result.
  • the detection result is obtained by using a neural network to detect the first image.
  • the detection is performed by a neural network trained in advance, so that the detection result is more accurate and the detection speed is faster.
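The two-stage flow described above (locate the target, then ask a pre-trained network whether it is occluded by the preset object) can be sketched as follows. This is a minimal illustration only: the patent does not name a concrete architecture, so `fake_detector` and `fake_occlusion_model` are hypothetical stand-ins for the trained detector and occlusion classifier.

```python
def detect(first_image, detector, occlusion_model):
    """Run the two-stage detection: locate targets in the first image, then
    ask the occlusion model whether each one is occluded by the preset object
    (e.g. a mask)."""
    results = []
    for box in detector(first_image):
        results.append({
            "box": box,
            "occluded_by_preset_object": occlusion_model(first_image, box),
        })
    return results

# Toy stand-ins for the trained networks (assumptions, not the patent's models).
fake_detector = lambda image: [(10, 10, 50, 50)]   # one face bounding box
fake_occlusion_model = lambda image, box: True     # "occluded by a mask"
```

In a real deployment both callables would wrap neural-network inference; keeping them as injected parameters mirrors the text's point that any occlusion detection model can be plugged in.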
  • In some embodiments, performing a preset operation matching the detection result includes: when the target to be detected is not occluded by the preset object, issuing a first reminder, wherein the first reminder is used to prompt the user to occlude the target to be detected with the preset object.
  • In this way, a timely reminder is given when the target to be detected is not occluded by the preset object, so that the person being reminded can take corresponding measures in time.
  • In some embodiments, the detection result further includes whether the occlusion mode in which the target to be detected is occluded by the preset object is a preset occlusion mode; performing a preset operation matching the detection result includes: when the target to be detected is occluded by the preset object but the occlusion mode is not the preset occlusion mode, issuing a second reminder, wherein the second reminder is used to prompt adjustment of the occlusion mode of the preset object.
  • In some embodiments, performing a preset operation matching the detection result includes: when the target to be detected is occluded by the preset object, extracting at least a first feature of the unoccluded part of the target to be detected from the first image as the feature to be identified of the target to be detected; and using the feature to be identified to recognize the target to be detected and obtain a recognition result.
  • In this way, the features of the unoccluded part are extracted for recognition, which realizes recognition based on local features of the target to be detected; since those local features are not occluded, they can represent the target to be detected and ensure the accuracy of recognition to a certain extent.
  • In some embodiments, extracting at least the first feature of the unoccluded part of the target to be detected from the first image as the feature to be identified includes: extracting the first feature of the unoccluded part of the target to be detected from the first image, acquiring a second feature of the occluded part of the target to be detected, and using the first feature and the second feature together as the feature to be identified of the target to be detected.
  • the feature of the occluded portion is also combined, thereby improving the feature richness of the object to be detected.
  • In some embodiments, acquiring the second feature of the occluded part of the target to be detected includes: extracting the feature of the occluded part from the first image as the second feature; or acquiring a preset feature of the occluded part as the second feature, wherein the preset feature is obtained based on at least one reference feature, and each reference feature is extracted from the area corresponding to the occluded part in a reference target that is not occluded.
  • In this way, the feature of the occluded part can be extracted directly; since this feature can differ to some extent between different targets to be detected, doing so can improve recognition accuracy. Alternatively, a preset feature can be used as the feature of the occluded part; this avoids feature extraction on the occluded part, which reduces the consumption of processing resources and improves processing efficiency.
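The two options above (extract the occluded-part feature directly, or fall back to a preset feature built from reference targets) can be sketched as a small helper. The function name and the averaging of reference features are illustrative assumptions; the patent only says the preset feature is "obtained based on at least one reference feature".

```python
def build_feature_to_identify(first_feature, occluded_feature=None,
                              reference_features=None):
    """Assemble the feature to be identified: the unoccluded-part feature
    (first feature) concatenated with a second feature for the occluded part,
    which is either extracted from this image or a preset feature derived
    (here: averaged) from reference targets that have no occlusion."""
    if occluded_feature is not None:
        second = occluded_feature                     # option 1: extract directly
    else:
        n = len(reference_features)                   # option 2: preset feature
        second = [sum(dim) / n for dim in zip(*reference_features)]
    return list(first_feature) + list(second)
```

Option 1 preserves per-target variation at the cost of extra extraction work; option 2 trades some accuracy for lower processing cost, matching the trade-off described above.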
  • In some embodiments, using the feature to be identified to recognize the target to be detected and obtain a recognition result includes at least one of the following: when there is one preset target, acquiring a first similarity between the feature to be identified and a pre-stored feature of the preset target, and, if the first similarity satisfies a first preset condition, determining that the recognition result includes the target to be detected passing identity authentication; when there are multiple preset targets, acquiring second similarities between the feature to be identified and the pre-stored feature of each preset target, and determining that the recognition result includes the identity of the target to be detected being the identity of the preset target whose second similarity satisfies a second preset condition.
  • That is, the feature to be identified is compared against a single preset target, or against multiple preset targets in a database.
  • the method includes at least one of the following: the first preset condition includes that the first similarity is greater than a first similarity threshold; the second preset condition includes that the second similarity is greater than a second similarity threshold.
  • In some embodiments, the first similarity threshold used when the feature to be identified includes the second feature of the occluded part of the target to be detected is smaller than the threshold used when the feature to be identified does not include the second feature.
  • When the feature to be identified contains the second feature, the second feature may differ from the real feature of the key points of the occluded part of the target to be detected. Therefore, in this case, appropriately lowering the similarity threshold can improve recognition accuracy.
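The 1:1 and 1:N comparisons, with the occlusion-dependent threshold described above, can be sketched as follows. Cosine similarity and the concrete threshold values (0.8 full-face, 0.7 occluded) are assumptions for illustration; the patent does not fix a similarity measure or numeric thresholds.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

def verify(feature, stored, occluded, t_full=0.8, t_occluded=0.7):
    """1:1 authentication: use a lower threshold when the feature to be
    identified contains the less reliable second feature of an occluded part."""
    threshold = t_occluded if occluded else t_full
    return cosine_similarity(feature, stored) > threshold

def identify(feature, gallery, occluded, t_full=0.8, t_occluded=0.7):
    """1:N identification against a database of pre-stored features: return
    the identity of the best match above the threshold, else None."""
    threshold = t_occluded if occluded else t_full
    best_id, best_sim = None, threshold
    for identity, stored in gallery.items():
        sim = cosine_similarity(feature, stored)
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id
```

Passing `occluded=True` lowers the bar, reflecting the point that a preset or reconstructed second feature cannot be expected to match the stored feature as closely as a directly observed one.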
  • In some embodiments, before acquiring the first similarity between the feature to be identified and the pre-stored feature of the preset target, the method further includes: in response to an account registration request, registering an account for the user; from multiple frames of second images, determining a second image that meets preset quality requirements and extracting the feature of the user's preset part from the determined second image; and associating the feature of the preset part with the account and saving it as the pre-stored feature of the preset target.
  • In this way, the second image that meets the quality requirements is determined first and the feature of the preset part is then extracted from it, so the extracted feature is more accurate.
  • In some embodiments, when the target to be detected is occluded by the preset object, before extracting at least the first feature of the unoccluded part of the target to be detected from the first image, the method further includes at least one of the following steps: from multiple frames of first images containing the target to be detected, determining a first image that meets preset quality requirements as the first image for subsequent feature extraction; preprocessing the first image for subsequent feature extraction; and performing living-body detection on the first image for subsequent feature extraction, and, when the living-body detection result is that the target to be detected is a living body, determining to perform the step of extracting at least the first feature of the unoccluded part of the target to be detected from the first image and its subsequent steps.
  • preprocessing is performed before feature extraction, so that the extracted features are more accurate.
  • In some embodiments, determining a first image that meets preset quality requirements as the first image for subsequent feature extraction includes: based on the quality factors of each frame of first image, correspondingly obtaining a quality score for each frame, wherein the quality factors of the first image include at least one of the following: pose information of the target to be detected relative to the photographing device, parameter information reflecting the size of the target to be detected in the first image, and brightness information of the first image; and, based on the quality scores, determining the first image that meets the preset quality requirements as the first image for subsequent feature extraction, wherein the quality score of the selected first image is higher than the quality scores of the other first images.
  • feature extraction is performed by determining images whose quality scores meet the requirements, so that the extracted features can better represent the target to be detected.
  • In some embodiments, preprocessing the first image for subsequent feature extraction includes: when the first image includes multiple targets to be detected, determining the target area where the target to be detected that meets preset extraction requirements is located, and removing the image portion outside the target area from the first image; and/or, when it is detected that the tilt angle of the target to be detected in the first image is greater than a preset angle, rotating the first image until the tilt angle of the target to be detected is less than or equal to the preset angle.
  • Here, the preset extraction requirements include that the area corresponding to the target to be detected is larger than the areas corresponding to the other targets to be detected, where the other targets to be detected include all targets other than this one.
  • In this way, the detection result is more accurate.
  • the target to be detected includes a human face
  • the preset object includes a mask
  • For example, if the face is not wearing a mask, or is not wearing it correctly, a corresponding reminder can be issued so that the user can adjust in time; if the face wears a mask correctly, operations such as face recognition can be performed.
  • An embodiment of the present application provides an image detection device, comprising: an image acquisition module configured to acquire a first image containing a target to be detected; a target detection module configured to detect the first image to obtain a detection result of the first image , wherein the detection result includes whether the target to be detected in the first image is blocked by a preset object; the operation execution module is configured to execute a preset operation matching the detection result.
  • An embodiment of the present application provides an electronic device, including a memory and a processor, where the processor is configured to execute program instructions stored in the memory, so as to implement the above image detection method.
  • An embodiment of the present application provides a computer-readable storage medium, on which program instructions are stored, and the above-mentioned image detection method is implemented when the program instructions are executed by a processor.
  • An embodiment of the present application provides a computer program, including computer-readable code, when the computer-readable code is executed in an electronic device, the processor in the electronic device executes the image detection method described above.
  • The embodiments of the present application provide an image detection method and related apparatus, device, storage medium, and computer program. By detecting a first image containing a target to be detected, whether the target to be detected is occluded can be determined, so that a subsequent preset operation matching the detection result can be performed, enabling flexible processing based on the occlusion state of the target to be detected in the image.
  • FIG. 1 is a schematic flowchart of an embodiment of an image detection method according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of a first image in an embodiment of an image detection method according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of a preprocessed first image in an embodiment of an image detection method according to an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of an embodiment of an image detection apparatus according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of an electronic device according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an embodiment of a computer-readable storage medium according to an embodiment of the present application.
  • the image detection method in the embodiment of the present application can be applied to a device with image processing capability.
  • The device may have an image capture or video capture function; for example, it may include components such as a camera for capturing images or videos. Alternatively, the device can obtain the required video stream or image from other devices through data transmission or data interaction with them, or access the required video stream or image from the storage resources of other devices.
  • Here, the other devices have image or video capture functions and a communication connection with the device; for example, the device can perform data transmission or data interaction with other devices through Bluetooth, a wireless network, etc. The embodiments of the present application do not limit the communication mode between the two, which may include but is not limited to the above-mentioned cases.
  • the device may include a mobile phone, a tablet computer, an interactive screen, etc., which is not limited herein.
  • FIG. 1 is a schematic flowchart of an embodiment of an image detection method according to an embodiment of the present application.
  • the image detection method is performed by an electronic device, and the method may include the following steps:
  • Step S11 Acquire a first image containing the target to be detected.
  • The first image containing the target to be detected may be an initial image containing the target collected by directly calling the camera of the device; of course, it may also be an image obtained from another device, or an image obtained after preprocessing such as frame selection, brightness adjustment, or resolution adjustment.
  • The target to be detected may include a human face, or the face or limbs of other animals, and so on. The manner of acquiring the first image containing the target to be detected is therefore not limited.
  • Here, other devices refer to devices that operate using separate central processing units, i.e., devices independent of this device.
  • Step S12 Detect the first image to obtain a detection result of the first image, where the detection result includes whether the target to be detected in the first image is blocked by a preset object.
  • the preset object refers to any object that can block the target to be detected, such as masks, scarves, glasses, or visible objects such as arms and paper.
  • The first image is detected. During detection, it is necessary to determine whether the first image contains a target to be detected; if there is a target to be detected, it is further determined whether the target to be detected is occluded by a preset object.
  • One way of judging whether the target to be detected is occluded by a preset object is to train an occlusion detection model (for example, a neural network with an occlusion detection function) before detecting the first image; by inputting the first image into the occlusion detection model, it can be determined whether the target to be detected in the first image is occluded by a preset object.
  • Another way of judging whether the target to be detected is occluded by a preset object is to judge whether a preset detection position in the target to be detected is occluded, and whether the object occluding the preset detection position satisfies the conditions of a preset object.
  • Specifically, the features of the occluding object and of the preset object can be extracted and their similarity judged, so as to obtain a detection result including whether the target to be detected is occluded by the preset object.
  • Step S13 Execute a preset operation matching the detection result.
  • The detection result may be that the target to be detected is occluded but not by the preset object, that it is occluded by the preset object but the occlusion mode is not the preset mode, that it is occluded by the preset object and the occlusion mode matches the preset mode, or that it is not occluded at all. The embodiments of the present application consider whether the target to be detected in the first image is occluded by a preset object.
  • The preset object may be set to any object; that is, with an occlusion detection model, as long as it is determined that the target to be detected is occluded, a corresponding preset operation is performed.
  • the preset operation can be any operation related to object detection, such as recognition, etc.
  • the detection result is obtained by using a neural network to detect the first image.
  • a preset object occlusion model is trained first, so that the trained preset object occlusion model can detect whether the target to be detected in the first image is occluded by the preset object.
  • There may be one or more preset objects, for example two or three different objects. When there are multiple preset objects, the preset object occlusion model can not only determine whether the target to be detected is occluded by a preset object, but also detect which preset object occludes the target to be detected.
  • the target to be detected may be a face, and the preset object may be a mask.
  • the preset object occlusion model is a mask detection model.
  • the mask detection model can detect whether the target to be detected wears a mask. Of course, in some embodiments, it can also simultaneously detect whether the target to be detected is wearing a mask in the correct way.
  • the detection is performed by a neural network trained in advance, which makes the detection result more accurate and the detection speed faster.
  • When the target to be detected is not occluded by the preset object, a first reminder is issued, wherein the first reminder is used to prompt the user to occlude the target to be detected with the preset object.
  • The first reminder can take various forms, including framing with a face frame: if it is detected that the target to be detected is not occluded by the preset object, the face area is framed with a face frame, which can be given a warning color such as red or yellow.
  • the first reminder can also be a combination of the face frame and the prompt text.
  • The prompt text may be, for example, "You are not wearing a mask; please wear a mask." Of course, the reminder may also take the form of a voice prompt or flashing indicator lights.
  • these forms can be used in combination or alone, which is not specified here.
  • For example, a first reminder is issued to remind the user to wear a mask covering the mouth and nose.
  • the detection result further includes whether the occlusion mode in which the target to be detected is occluded by the preset object is a preset occlusion mode.
  • Here, the preset occlusion mode can be marked in the training samples, where the preset occlusion mode can be the correct occlusion mode; the preset object occlusion model is then trained so that, when it detects that the target to be detected is occluded by the preset object, it can also determine whether the occlusion mode of the preset object is the preset occlusion mode.
  • a second reminder is issued.
  • the second reminder is used to prompt to adjust the occlusion mode of the preset object.
  • the preset object is a mask
  • the preset occlusion method is the correct way of wearing a mask.
  • the preset occlusion mode may be various occlusion modes, such as the correct occlusion mode, the first wrong occlusion mode, the second wrong occlusion mode, etc.
  • When it is detected that the occlusion mode in which the target to be detected is occluded by the preset object is the first wrong occlusion mode, a reminder corresponding to the first wrong occlusion mode is issued; when it is detected that the occlusion mode is the second wrong occlusion mode, a reminder corresponding to the second wrong occlusion mode is issued, prompting adjustment of the occlusion mode to the correct occlusion mode.
  • the correct occlusion method is that the mask covers the nose and mouth at the same time
  • the first wrong occlusion method is that the mask covers the nose but does not cover the mouth.
  • The reminder corresponding to the first wrong occlusion mode is to prompt the user to also cover the mouth.
  • the second wrong occlusion method is that the mask covers the mouth but not the nose.
  • The reminder corresponding to the second wrong occlusion mode is to prompt the user to also cover the nose.
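The mapping from detection result to reminder in the mask example above can be sketched as a lookup table. The mode keys and message strings below are illustrative assumptions, not wording from the patent:

```python
# Hypothetical mapping from the detection result to the reminder to issue;
# None means the mask is worn correctly and no reminder is needed.
REMINDERS = {
    "not_occluded":      "First reminder: please wear a mask.",
    "first_wrong_mode":  "Second reminder: please also cover your mouth.",
    "second_wrong_mode": "Second reminder: please also cover your nose.",
    "correct_mode":      None,
}

def reminder_for(detection_result):
    """Return the text (or voice/indicator) reminder for a detection result."""
    return REMINDERS[detection_result]
```

Keeping the reminders in one table makes it easy to add further wrong occlusion modes, or to swap the strings for voice prompts or indicator-light patterns, as the surrounding text allows.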
  • the method of the second reminder is similar to that of the first reminder, and can also be in the form of a face frame and a text reminder, and a face frame and a voice reminder or a separate text reminder or a separate voice reminder or a warning light flashing, etc.
  • The content of the text or voice reminder should be set according to the detected occlusion mode; for example, if the detected occlusion mode is the first wrong occlusion mode, the text reminder corresponds to the first wrong occlusion mode.
  • In some business scenarios, when the target to be detected is occluded by the preset object, the target is identified; if the target is not occluded by the preset object, it is not identified. For example, if it is detected that a face is not wearing a mask, face recognition is not performed, and a person without a mask cannot enter the station through face recognition. Of course, depending on the needs of the business scenario, the target may still be identified even if it is detected not to be occluded by the preset object.
  • In some embodiments, a first image that meets preset quality requirements can be determined, from multiple frames of first images containing the target to be detected, as the first image for subsequent feature extraction.
  • The first image that meets the preset quality requirements may be determined as follows: based on the quality factors of each frame of first image, a quality score is obtained for each frame, wherein the quality factors of the first image include at least one of the following: pose information of the target to be detected relative to the photographing device, parameter information reflecting the size of the target to be detected in the first image, and brightness information of the first image.
  • the pose information of the target to be detected relative to the photographing device may be angle information of the target to be detected relative to the photographing device.
  • The angle information of the target to be detected relative to the photographing device here may be the angle information of the target relative to the lens during shooting. For example, taking the lens as the origin, a three-dimensional coordinate system is established, in which the line connecting the lens and the center of the earth is the X axis, the line extending directly in front of the lens and perpendicular to the X axis is the Y axis, and the line perpendicular to both the X axis and the Y axis is the Z axis.
  • the three-dimensional coordinate system is only used to represent the angle between the target to be detected and the photographing device.
  • the selection of the origin of the three-dimensional coordinate system or the selection of three directions may be different from the embodiments of the present application.
  • The angle can be decomposed into angles relative to the X, Y, and Z directions of the lens. For example, if the target to be detected directly faces the lens, the angles along the X, Y, and Z directions are all 0° (degrees); if instead the side of the target to be detected faces the first image acquisition component, then the angle of the target relative to the component in the X direction is 90°, while the angles along the Y and Z directions are 0°.
  • The parameter information reflecting the size of the target to be detected in the first image includes the size of the area of the first image occupied by the target, which can be represented by the proportion of the first image that the target occupies.
  • Here, the premise is that the target to be detected is completely contained in the first image.
  • If the target to be detected occupies only a small area of a frame of the first image, the score of this quality factor for that frame is correspondingly low.
  • As for the brightness information of the first image, higher is not necessarily better; the closer the brightness is to that of natural light at the current moment, the higher the score of this quality factor.
  • The weights of the above three quality factors are set according to their degree of influence on image quality; for example, the weight of the angle is set to 0.4 and the other two are each set to 0.3. Of course, this is only an example.
  • the weights between the various quality factors can be set according to the needs.
  • the extracted features can better represent the target to be detected.
  • the setting of the weight can take into account the actual image detection accuracy requirements, the processing capability of the image detection device, and the resource occupancy.
  • If the processing capability of the image detection device is high and its resource occupancy is low, multiple quality factors may be considered when calculating the quality score; if the processing capability is limited, an appropriate subset of quality factors may be used instead, selected, for example, according to the time required to calculate each quality factor or the memory space it occupies. The choice of how many and which quality factors to use can therefore be made flexibly.
  • In other embodiments, a quality score threshold may also be set: if the quality score of a first image is below the threshold, that image is excluded, and only first images whose quality scores are above the threshold are retained.
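The frame-selection step can be sketched as a weighted sum of the three quality factors followed by threshold filtering. The 0.4/0.3/0.3 weights follow the example weights given in the text; the assumption that each factor is pre-scored in [0, 1] and the 0.5 threshold are illustrative:

```python
def quality_score(pose, size, brightness, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three quality factors (pose relative to the camera,
    target size in the image, image brightness), each pre-scored in [0, 1]."""
    w_pose, w_size, w_brightness = weights
    return w_pose * pose + w_size * size + w_brightness * brightness

def select_frames(frames, threshold=0.5):
    """Drop frames scoring at or below the threshold and return the rest
    best-first, so the top frame can be used for feature extraction."""
    scored = [(quality_score(*f["factors"]), f["id"]) for f in frames]
    kept = sorted((s, i) for s, i in scored if s > threshold)
    return [frame_id for _, frame_id in reversed(kept)]
```

As the text notes, the weights and the number of factors used can be tuned to the device's processing capability; passing a different `weights` tuple is all that changes.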
  • the first image for subsequent feature extraction may also be preprocessed.
  • The preprocessing may be: when the first image includes multiple targets to be detected, determining the target area where the target to be detected that meets preset extraction requirements is located, and removing the image portion outside the target area from the first image.
  • the target area here may be an area containing a target to be detected. That is to say, when the first image contains multiple targets to be detected, the identification is not performed on the complete first image, but only on the target area of the target to be detected that meets the preset extraction requirements.
  • The preset extraction requirement may be that the area corresponding to the target to be detected is larger than the areas corresponding to the other targets to be detected, where the other targets include all targets other than this one. If the first image contains multiple targets to be detected, the areas they occupy may differ, and targets with larger areas have a relatively higher recognition rate during recognition; therefore, the target to be detected with the largest area is selected for identification.
  • In other embodiments, the target to be detected whose center is closest to the center of the first image may be identified instead; or the target areas of all qualifying targets to be detected may be obtained separately, where the qualifying targets are those whose areas are tied for the largest or are larger than a preset area-extraction threshold.
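The largest-area selection and the cropping that removes everything outside the chosen target area can be sketched as follows; the `(x, y, w, h)` box convention and the helper names are assumptions for illustration:

```python
def pick_target_region(boxes):
    """Among multiple detected targets, given as (x, y, w, h) boxes, return
    the one with the largest area, per the preset extraction requirement."""
    return max(boxes, key=lambda b: b[2] * b[3])

def crop_to_region(image_size, box):
    """Clamp the chosen box to the image bounds so the portion outside the
    target area can be removed before recognition; image_size is (w, h)."""
    x, y, w, h = box
    iw, ih = image_size
    x0, y0 = max(0, x), max(0, y)
    return (x0, y0, min(iw, x + w) - x0, min(ih, y + h) - y0)
```

Swapping the `key` function (e.g. to distance from the image center) gives the alternative selection rule mentioned above.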
  • the preprocessing of the first image for subsequent feature extraction may also be: upon detecting that the inclination angle of the target to be detected in the first image is greater than a preset angle, rotating the first image until the inclination angle of the target to be detected is less than or equal to the preset angle.
  • as for the rotation method, in addition to rotating the entire first image, only the target to be detected, or the target area containing the target to be detected, may be rotated. Therefore, the method of aligning the target to be detected is not limited here.
  • the preset angle may be within 0° to 180°, clockwise or counterclockwise. In this embodiment of the present application, the preset angle is set to 0°.
  • in other embodiments, the preset angle may also be 30°, 35°, or the like.
  • one way of judging whether the target to be detected is inclined beyond the preset angle is to obtain the included angle between a vertical line and the line connecting a preset first key point and a preset second key point in the target to be detected, and to determine whether that included angle is greater than the preset angle. If it is, the first image is rotated so that the included angle becomes less than or equal to the preset angle, with the preset first key point located above the preset second key point after rotation, where "above" is determined relative to the bottom edge of the first image.
  • the inclination angle may also be the inclination angle of the target to be detected relative to a certain position in the first image, for example, relative to the center of the first image.
  • the preset angle here can be set according to the requirements of different scenarios; for example, it can be determined according to the area of the region of the first image where the target to be detected is located. For example, when that area is larger than a first preset area value, the preset angle can be set to be greater than 30°, and when that area is smaller than a second preset area value, the preset angle can be set to be less than 30°.
  • FIG. 2 is a schematic diagram of a first image in an embodiment of an image detection method according to an embodiment of the present application, and FIG. 3 is a schematic diagram of the preprocessed first image in the same embodiment.
  • as shown in FIG. 2, the lower half of the target to be detected 21 in the first image 20 is blocked by a preset object 22, and the target to be detected 21 is obviously inclined to the left; that is, the included angle between a vertical line and the line connecting the upper-left corner point (the first preset key point) and the lower-left corner point (the second preset key point) of the target to be detected 21 is 30°. Since this inclination angle of 30° is greater than the preset angle of 0°, the first image 20 is rotated to the right, that is, rotated 30° clockwise.
  • the rotated first image is shown in FIG. 3, in which the included angle between a vertical line and the line connecting the upper-left corner point (the first preset key point) and the lower-left corner point (the second preset key point) of the target to be detected 21 is 0°, equal to the preset angle of 0°.
  • when the inclination angle of the target to be detected in the first image is greater than the preset angle, the target to be detected is straightened, which reduces the influence of the inclination on the subsequent liveness detection or target recognition performed on the target to be detected.
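A minimal sketch of the tilt check described above, assuming image coordinates with y growing downward and two key points on a line (e.g. the left edge of the face) that should be vertical after alignment; the function names and sign convention are illustrative:

```python
import math

def tilt_angle(p_top, p_bottom):
    """Signed angle in degrees between the vertical and the line from
    p_bottom to p_top. Points are (x, y) with image y growing downward."""
    dx = p_top[0] - p_bottom[0]
    dy = p_bottom[1] - p_top[1]  # positive when p_top is above p_bottom
    return math.degrees(math.atan2(dx, dy))

def correction_angle(p_top, p_bottom, preset_angle=0.0):
    """Rotation (degrees, counterclockwise positive) needed to bring the
    inclination to at most the preset angle; 0.0 if already aligned."""
    angle = tilt_angle(p_top, p_bottom)
    if abs(angle) > preset_angle:
        return -angle  # rotate back so the key-point line becomes vertical
    return 0.0
```

The returned correction angle would then be applied to the whole first image, to the target only, or to the target area, as discussed above. With an image library such as OpenCV, the rotation itself could be performed with a rotation matrix about the target center.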
  • liveness detection may also be performed on the first image for subsequent feature extraction, and, when the liveness detection result is that the target to be detected is a living body, it is determined to execute the step of extracting at least the first feature of the unoccluded part of the target to be detected from the first image, and the subsequent steps.
  • for example, the target to be detected with the largest area is selected for liveness detection.
  • the liveness detection can be performed by inputting the target area corresponding to the target to be detected into a liveness detection model, where the liveness detection model is obtained by training on a number of images containing targets to be detected that are occluded by the preset object.
  • the first feature of the unoccluded part of the object to be detected is first extracted from the first image as the feature to be identified of the object to be detected.
  • the first feature refers to the feature of the key point that is not occluded in the target to be detected.
  • alternatively, the first feature of the unoccluded part of the target to be detected can be extracted from the first image, the second feature of the occluded part of the target to be detected can be obtained, and the first feature and the second feature together are used as the feature to be identified of the target to be detected.
  • the second feature here is the feature of the key points of the occluded part of the target to be detected.
  • there are two ways to obtain the second feature of the occluded part.
  • one is to extract the feature of the occluded part from the first image as the second feature. That is, although this part is occluded by the preset object, the second feature of the occluded part is extracted in the same way as the first feature of the unoccluded part; in other words, the same processing mechanism is adopted regardless of whether the target to be detected is occluded by the preset object, so occlusion does not affect the feature extraction process.
  • even when part of the target is occluded, this method can still be used to extract the key point features of the target to be detected.
  • taking a face wearing a mask as an example, this way of extracting the second feature treats the face as if it were not blocked by the mask and extracts the features of every key point on the face; that is, a face wearing a mask is handled by the same processing mechanism as a face without a mask, so wearing a mask does not affect the feature extraction process.
  • the other way is to obtain a preset feature of the occluded part as the second feature, where the preset feature may be a feature obtained based on at least one reference feature, and each reference feature is extracted from the region, corresponding to the occluded part, of a reference target that has no occluded part.
  • that is, the reference features of the key points of the occluded part are preset, so the features of the occluded part are complemented.
  • for example, features corresponding to the preset part may be pre-extracted from a number of targets to be detected whose detection results are that they are not occluded by the preset object, and the average of these extracted features is then filled in as the reference feature of the part occluded by the preset object.
  • taking a face wearing a mask as an example, the features of the corresponding preset parts, i.e. the parts covered when wearing a mask, such as the nose and mouth, are pre-extracted from several faces without masks, and the average of the extracted features is used as the preset reference feature of the part covered by the mask.
  • in the first way, the features of the occluded part are directly extracted; since the features of the occluded part may differ to a certain extent between different targets to be detected, this method can improve the accuracy of recognition;
  • in the second way, the preset feature is obtained as the feature of the occluded part; this method does not need to perform feature extraction on the occluded part, which can reduce the consumption of processing resources and improve processing efficiency.
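The second way above can be sketched as follows: the preset feature is built as the average of reference features from unoccluded reference targets, and the feature to be identified combines the first and second features. The averaging follows the text; representing the combination as concatenation of NumPy vectors is an illustrative assumption:

```python
import numpy as np

def build_preset_feature(reference_features):
    """Average the reference features extracted, from unoccluded reference
    targets, at the region corresponding to the occluded part."""
    return np.mean(np.stack(reference_features), axis=0)

def feature_to_identify(first_feature, second_feature=None, preset_feature=None):
    """Combine the unoccluded-part (first) feature with the occluded-part
    (second) feature: either one extracted from the image, or the preset
    one built above. Concatenation is an illustrative choice."""
    occluded = second_feature if second_feature is not None else preset_feature
    return np.concatenate([first_feature, occluded])
```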
  • the to-be-identified feature is used to identify the to-be-detected object.
  • recognition scenarios can be divided into 1:1 scenarios and 1:N scenarios, where 1:1 refers to a comparison between two features, and 1:N refers to a comparison between one feature and multiple features.
  • in a 1:1 scenario, that is, when there is one preset target, the first similarity between the feature to be identified and the pre-stored feature of the preset target is obtained, and when the first similarity satisfies the first preset condition, it is determined that the identification result includes that the target to be detected has passed identity authentication.
  • the first preset condition may be that the first similarity is greater than the first similarity threshold.
  • the first similarity threshold when the feature to be identified includes the second feature of the occluded part of the target to be detected is smaller than the first similarity threshold when the feature to be identified does not include the second feature. If the feature to be identified contains a second feature, the second feature may differ from the real features of the key points of the occluded part of the target to be detected; therefore, in this case, appropriately reducing the similarity threshold can improve the accuracy of identification.
  • the selection of the first similarity threshold may be determined according to the ratio of the number of occluded key points to the total number of key points of the target to be detected.
  • for example, the first similarity threshold may be set to one third of the similarity threshold used for an unoccluded target to be detected.
  • alternatively, the first similarity threshold can be set to be 0.1, or some other small value, lower than the first similarity threshold used when the feature to be identified does not include the second feature; this is not limited here.
  • generally, the similarity threshold for identifying an unoccluded target to be detected may be between 0.6 and 1; of course, this is only an example.
  • in other embodiments, the first similarity threshold when the feature to be identified includes the second feature of the occluded part of the target to be detected may also be equal to the first similarity threshold when the feature to be identified does not include the second feature; that is, whether to lower the first similarity threshold when the feature to be identified includes the second feature can be determined according to the actual situation and the method described above.
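The 1:1 decision with a threshold lowered for occluded-part features can be sketched as follows. Cosine similarity, the base threshold 0.8, and the 0.1 reduction are illustrative choices (the 0.1 reduction echoes the example above; the exact values are scenario-dependent):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_1_to_1(feature, pre_stored, has_second_feature,
                  base_threshold=0.8, occluded_delta=0.1):
    """1:1 verification: pass when the first similarity exceeds the
    threshold. When the feature to be identified includes the second
    (occluded-part) feature, the threshold is lowered by occluded_delta."""
    threshold = base_threshold - (occluded_delta if has_second_feature else 0.0)
    return cosine_similarity(feature, pre_stored) > threshold
```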
  • an association between the user account and the pre-stored feature of the preset target is established first.
  • the implementation is as follows: in response to an account registration request, register an account for the user.
  • the account here can be, for example, an electronic payment account; any application program capable of performing target recognition can respond to the account registration request and register an account for the user.
  • for example, the user can register with a mobile phone number in the corresponding application; after the registration succeeds, the user obtains a user name, password, and other information.
  • a second image that meets the preset quality requirements is determined, and the feature of the preset part of the user is extracted from the determined second image.
  • the preset part here is the same as the preset part of the target to be detected.
  • the step of selecting the second image that meets the preset quality requirement is the same as the above-mentioned step of selecting the first image that meets the preset quality requirement.
  • the feature of the preset part is associated with the account, and the feature of the preset part is saved as the pre-stored feature of the preset target. That is, the preset part of the user is the preset target.
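The registration and association steps above might be sketched as a minimal in-memory registry; the class and method names are hypothetical, and a real system would persist the association:

```python
class FaceRegistry:
    """Minimal sketch: register an account in response to a registration
    request, then associate it with the pre-stored feature of the preset
    part extracted from a quality-checked second image."""

    def __init__(self):
        self._accounts = {}

    def register(self, account_id):
        self._accounts[account_id] = None  # account created, no feature yet

    def bind_feature(self, account_id, feature):
        if account_id not in self._accounts:
            raise KeyError("account not registered")
        self._accounts[account_id] = feature  # saved as pre-stored feature

    def pre_stored_feature(self, account_id):
        return self._accounts[account_id]
```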
  • in a 1:N scenario, that is, when there are multiple preset targets, the second similarity between the feature to be identified and the pre-stored feature of each preset target is obtained respectively, and it is determined that the recognition result includes determining the identity of the target to be detected as the identity of the preset target corresponding to the second similarity that satisfies the second preset condition.
  • the second preset condition may be that the second similarity is greater than the second similarity threshold.
  • note that satisfying the second preset condition referred to here means not only being greater than the second similarity threshold, but often also taking the maximum value among all the second similarities; that is, the identity of the preset target corresponding to the largest second similarity is selected as the identity of the target to be detected.
  • similarly, the second similarity threshold when the feature to be identified includes the second feature is smaller than the second similarity threshold when the feature to be identified does not include the second feature. If the feature to be identified contains a second feature, the second feature may differ from the real features of the key points of the occluded part of the target to be detected; therefore, appropriately reducing the similarity threshold in this case can improve the accuracy of identification.
  • the method for determining the second similarity threshold is the same as the method for determining the first similarity threshold.
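The 1:N rule above — take the preset target with the largest second similarity, and accept it only if that similarity exceeds the threshold — can be sketched as follows; cosine similarity and the threshold value are illustrative:

```python
import numpy as np

def identify_1_to_n(feature, gallery, threshold=0.7):
    """gallery maps identity -> pre-stored feature. Return the identity
    with the largest second similarity, provided it exceeds the
    threshold; otherwise return None (no match, e.g. unregistered)."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    best_id, best_sim = None, -1.0
    for identity, stored in gallery.items():
        sim = cos(np.asarray(feature, dtype=float),
                  np.asarray(stored, dtype=float))
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id if best_sim > threshold else None
```

Returning `None` models the gate case described below: an unregistered person's comparison fails and the gate does not respond.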
  • a 1:N scenario may be one involving many faces.
  • an office building or a company has installed face recognition gates at the entrance and exit.
  • when a user appears in front of the gate, the camera on the gate detects and captures the face, and compares the captured face with the face database. If the comparison succeeds, the gate is opened; when an unregistered person appears at the gate, the comparison fails and the gate does not respond.
  • by executing the preset operation matching the detection result, flexible processing based on the occlusion state of the target to be detected in the image is realized.
  • the embodiments of the present application further provide an image detection method. The method utilizes the detection capability of models based on deep learning algorithms and, starting from the face verification scenario, covers both 1:1 and 1:N scenarios; it provides a method for checking whether a mask is worn, as well as implementations of face comparison and face retrieval while a mask is worn.
  • the face verification scenarios mainly include 1:1 and 1:N scenarios.
  • 1:1 in the payment scenario refers to 1:1 verification between the face photo captured in real time and the base library image bound to the member; if they are confirmed to be the same user, the authentication passes.
  • the 1:N scenario is more related to face retrieval. For example, an office building or a company installs face recognition gates at the entrance and exit, and users register to form a face database. When a user appears in front of the gate, the camera on the gate detects and captures the face and compares the captured face image with the pictures in the face database. When the comparison succeeds, the gate opens; when it fails, the gate does not respond, that is, the gate remains closed.
  • dummies include, for example, photos (including photos obtained by photographing people, electronically synthesized photos, etc.), masks, and the like.
  • the user has an account, such as an account for electronic payment.
  • the user registers through the mobile phone number in the application system corresponding to the electronic payment account. After the registration is successful, the user obtains information such as user name and password, that is, an electronic payment account.
  • the application system guides the user to perform the operation of binding the face through some activities.
  • after the user passes liveness authentication, face recognition is performed, and when the quality of the face in the video meets the face collection requirements, the user's face image is collected.
  • for example, a frame with the highest quality in the shooting process can be selected, where the evaluation criteria for quality include one or more dimensions such as the angle of the face, the intensity of the illumination, and the size of the face.
  • the collected face image of the user can be associated with the account, specifically the identification of the account, as a comparison picture of the face base library.
  • the order is confirmed and the payment link is entered.
  • if the user has bound a face image and chooses face payment, the face capture link is entered.
  • during face detection, the wearing of a mask is checked (that is, face attribute detection). If a mask is worn, the subsequent process continues; if not, the user can be reminded to wear a mask by playing a voice prompt or displaying text.
  • there are two ways to extract the facial feature value. The first is to extract the facial feature value including the mask part. Suppose the feature value of the unmasked face in the face base library image is A; the feature value A1 is extracted from the face wearing a mask in the same way, and the feature vectors of the masked feature value A1 and the unmasked feature value A are compared.
  • the second is to extract the facial feature values of the visible part above the mask. Assuming that each face has 128 key points, of which 64 are above the mask, the feature values are extracted from the 64 key points of the visible part above the mask and compared with the feature values extracted from the corresponding 64 key points in the face base library picture.
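Following the example above, with 128 key points per face of which 64 are above the mask, the visible-part comparison might look like the sketch below. Which 64 indices are visible, the per-keypoint feature layout `(128, d)`, and the use of cosine similarity on the flattened vectors are all illustrative assumptions:

```python
import numpy as np

VISIBLE_KEYPOINTS = list(range(64))  # assumed: the 64 keypoints above the mask

def visible_part_feature(all_keypoint_features):
    """From a (128, d) array of per-keypoint features, keep only the
    features of the visible keypoints above the mask."""
    return np.asarray(all_keypoint_features)[VISIBLE_KEYPOINTS]

def compare_visible(probe, gallery_entry):
    """Compare the visible-part features of the captured face and the
    base-library face via cosine similarity on the flattened vectors."""
    a = visible_part_feature(probe).ravel()
    b = visible_part_feature(gallery_entry).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```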
  • the generated feature value is compared with the feature value of the picture in the user's face base library, and when the similarity exceeds the comparison threshold (for example, with the comparison threshold set to 0.8, a similarity exceeding 0.8 is considered to indicate the same user), the comparison passes and the face verification process ends.
  • the pictures in the face base library are bound and associated with the face recognition device of the gate, so that the face recognition device of the gate can read the pictures in the face base library.
  • when a user appears in front of the gate, the face recognition device on the gate detects the face information and enters the face capture state (at this time, the camera module on the face recognition device can stay on to perform face tracking, with the face frame always moving as the face moves).
  • in the 1:N scenario there are also two ways to extract and retrieve the facial feature value.
  • the first is to extract the facial feature value including the mask part. Suppose the comparison pictures of the face base library contain the five unmasked face feature values A, B, C, D, and E; the features of the faces wearing masks are extracted in the same way, yielding the five feature values A1, B1, C1, D1, and E1.
  • if the distance between the feature vector of the masked feature value A1 and that of the feature value A is the shortest, the corresponding feature value A is retrieved.
  • the second is to extract the facial feature value of the visible part above the mask, where the feature value is generated in the same way as in the 1:1 scenario above; refer to the above description.
  • the generated eigenvalues are then used for facial feature retrieval (1:N search) in the pictures of the face base database of all users in the entire building.
  • if the comparison succeeds, the retrieval is considered successful; the face recognition device then transmits an opening signal, the gate opens, and the face verification process ends.
  • the execution subject of the image detection method may be an image detection apparatus; for example, the image detection method may be executed by a terminal device, a server, or other processing device, where the terminal device may be user equipment (User Equipment, UE), a mobile device, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, etc.
  • the image detection method may be implemented by the processor calling computer-readable instructions stored in the memory.
  • FIG. 4 is a schematic structural diagram of an embodiment of an image detection apparatus according to an embodiment of the present application.
  • the image detection device 40 includes an image acquisition module 41 , a target detection module 42 and an operation execution module 43 .
  • the image acquisition module 41 is configured to acquire a first image containing the target to be detected;
  • the target detection module 42 is configured to detect the first image to obtain a detection result of the first image, wherein the detection result includes the target to be detected in the first image Whether it is blocked by a preset object;
  • the operation execution module 43 is configured to execute a preset operation matching the detection result.
  • by executing the preset operation matching the detection result, flexible processing based on the occlusion state of the target to be detected in the image is realized.
  • the detection result is obtained by detecting the first image by the target detection module 42 using a neural network.
  • detection is performed by a neural network trained in advance, so that the detection result is more accurate and the detection speed is faster.
  • the operation execution module 43 performing a preset operation matching the detection result includes: when the target to be detected is not blocked by the preset object, issuing a first reminder, where the first reminder is used to remind the user to occlude the target to be detected with the preset object.
  • the detection result further includes whether the occlusion mode in which the target to be detected is occluded by the preset object is a preset occlusion mode. The operation execution module 43 performing a preset operation matching the detection result includes: when the target to be detected is blocked by the preset object and the occlusion mode does not belong to the preset occlusion mode, issuing a second reminder, where the second reminder is used to prompt adjustment of the occlusion mode of the preset object.
  • the operation execution module 43 performing a preset operation matching the detection result includes: when the target to be detected is occluded by the preset object, extracting at least the first feature of the unoccluded part of the target to be detected from the first image as the feature to be identified of the target to be detected, and using the feature to be identified to identify the target to be detected, obtaining the identification result.
  • in the above solution, the features of the unoccluded part are extracted for identification, which realizes recognition based on the local features of the target to be detected; since the local features are not occluded, they can represent the target to be detected, ensuring the accuracy of recognition to a certain extent.
  • the operation execution module 43 extracting at least the first feature of the unoccluded part of the target to be detected from the first image as the feature to be identified includes: extracting the first feature of the unoccluded part of the target to be detected from the first image, obtaining the second feature of the occluded part of the target to be detected, and using the first feature and the second feature as the feature to be identified of the target to be detected.
  • the feature of the occluded part is also combined, thereby improving the feature richness of the object to be detected.
  • the operation execution module 43 obtaining the second feature of the occluded part of the target to be detected includes: extracting the feature of the occluded part from the first image as the second feature; or obtaining a preset feature of the occluded part as the second feature, where the preset feature includes a feature obtained based on at least one reference feature, and each reference feature is obtained by extraction from the region, corresponding to the occluded part, of a reference target that has no occluded part.
  • the features of the occluded part can be directly extracted. Since the features of the occluded part can be different to a certain extent with different targets to be detected, this method can improve the accuracy of recognition. It is also possible to obtain a preset feature as the feature of the occluded part. This method does not need to perform feature extraction on the occluded part, which can reduce the consumption of processing resources and improve the processing efficiency.
  • the operation execution module 43 using the feature to be identified to identify the target to be detected and obtaining the recognition result includes at least one of the following: when there is one preset target, obtaining the first similarity between the feature to be identified and the pre-stored feature of the preset target, and, when the first similarity satisfies the first preset condition, determining that the identification result includes that the target to be detected has passed identity authentication; when there are multiple preset targets, respectively obtaining the second similarity between the feature to be identified and the pre-stored feature of each preset target, and determining that the recognition result includes determining the identity of the target to be detected as the identity of the preset target corresponding to the second similarity that satisfies the second preset condition.
  • in the above solution, by calculating the first similarity with the pre-stored feature of a specific preset target, or calculating the similarities with the pre-stored features of multiple preset targets, the target to be detected can, according to actual scenario requirements, be compared with a specific preset target or with the preset targets in a database.
  • the first preset condition includes that the first similarity is greater than a first similarity threshold; the second preset condition includes that the second similarity is greater than a second similarity threshold.
  • the first similarity threshold when the feature to be identified includes the second feature of the occluded portion of the object to be detected is smaller than the first similarity threshold when the feature to be identified does not include the second feature Threshold; the second similarity threshold when the feature to be identified includes the second feature is smaller than the second similarity threshold when the feature to be identified does not include the second feature.
  • if the feature to be identified contains a second feature, the second feature may differ from the real features of the key points of the occluded part of the target to be detected; therefore, in this case, appropriately reducing the similarity threshold can improve the recognition accuracy.
  • the image detection apparatus 40 further includes a pre-stored module (not shown).
  • the pre-store module is configured to: in response to an account registration request, register an account for the user; determine, from several second images, a second image that meets the preset quality requirements, and extract the features of the user's preset part from the determined second image; and associate the features of the preset part with the account, saving the features of the preset part as the pre-stored features of the preset target.
  • the features of the preset part are extracted by first determining the second image that meets the quality requirements, so that the extracted features are more accurate.
  • the operation execution module 43 is further configured to perform at least one of the following steps: determining, from multiple frames of first images containing the target to be detected, a first image that meets the preset quality requirements as the first image for subsequent feature extraction; preprocessing the first image for subsequent feature extraction; and performing liveness detection on the first image for subsequent feature extraction and, when the liveness detection result is that the target to be detected is a living body, determining to execute the step of extracting at least the first feature of the unoccluded part of the target to be detected from the first image and the subsequent steps.
  • in the above solution, preprocessing is performed before feature extraction, so that the extracted features are more accurate, and the target to be detected is identified only when it is a living body, thereby enhancing the security of identification and preventing prosthesis attacks to some extent.
  • the operation execution module 43 determining, from the multiple frames of first images containing the target to be detected, the first image that meets the preset quality requirements as the first image for subsequent feature extraction includes: obtaining, based on the quality factors of each frame of the first image, the quality score corresponding to that frame, where the quality factors of the first image include at least one of the following: pose information of the target to be detected relative to the photographing device, parameter information used to reflect the size of the target to be detected in the first image, and brightness information of the first image; and, based on the quality scores, determining the first image that meets the preset quality requirements as the first image for subsequent feature extraction, where the quality score of the selected first image is higher than the quality scores of the other first images.
  • feature extraction is performed by determining images whose quality scores meet the requirements, so that the extracted features can better represent the target to be detected.
  • the operation execution module 43 preprocessing the first image for subsequent feature extraction includes: when the first image includes multiple targets to be detected, determining the target area, in the first image, of the target to be detected that meets the preset extraction requirements, and removing the image portion other than the target area from the first image; and/or, upon detecting that the inclination angle of the target to be detected in the first image is greater than the preset angle, rotating the first image until the inclination angle of the target to be detected is less than or equal to the preset angle.
  • the preset extraction requirements include that the area of the area corresponding to the target to be detected is larger than the area of the area corresponding to other targets to be detected, and the other targets to be detected include targets other than the target to be detected.
  • the target to be detected includes a human face
  • the preset object includes a mask
  • by executing the preset operation matching the detection result, flexible processing based on the occlusion state of the target to be detected in the image is realized.
  • FIG. 5 is a schematic structural diagram of an embodiment of an electronic device according to an embodiment of the present application.
  • the electronic device 50 includes a memory 51 and a processor 52, and the processor 52 is configured to execute program instructions stored in the memory 51, so as to implement the steps in any of the above image detection method embodiments.
  • the electronic device 50 may include, but is not limited to, a microcomputer and a server.
  • the electronic device 50 may also include mobile devices such as a notebook computer and a tablet computer, which are not limited herein.
  • the processor 52 is configured to control itself and the memory 51 to implement the steps in any of the image detection method embodiments described above.
  • the processor 52 may also be referred to as a central processing unit (Central Processing Unit, CPU).
  • the processor 52 may be an integrated circuit chip with signal processing capability.
  • the processor 52 may also be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the processor 52 may also be implemented jointly by multiple integrated circuit chips.
  • performing a preset operation matching the detection result enables flexible processing based on the occlusion state of the target to be detected in the image.
  • FIG. 6 is a schematic structural diagram of an embodiment of a computer-readable storage medium according to an embodiment of the present application.
  • the computer-readable storage medium 60 stores program instructions 61 that can be executed by the processor, and the program instructions 61 are used to implement the steps in any of the above image detection method embodiments.
  • performing a preset operation matching the detection result enables flexible processing based on the occlusion state of the target to be detected in the image.
  • the embodiments of the present application provide a computer program, including computer-readable codes, when the computer-readable codes are executed in an electronic device, a processor in the electronic device executes the above method.
  • the functions or modules included in the apparatus provided in the embodiments of the present application may be used to execute the methods described in the above method embodiments; for specific implementation, reference may be made to the descriptions of the above method embodiments, which are not repeated here for brevity.
  • the disclosed method and apparatus may be implemented in other manners.
  • the device implementations described above are only illustrative.
  • the division of modules or units is only a division by logical function; in actual implementation, there may be other ways of division.
  • units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces; the indirect coupling or communication connection between devices or units may be in electrical, mechanical, or other forms.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit, if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium.
  • the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods in the various implementation manners of the embodiments of this application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media that can store program codes.
  • Embodiments of the present application provide an image detection method and related apparatus, device, storage medium, and computer program.
  • the method includes: acquiring a first image containing a target to be detected; detecting the first image to obtain a detection result of the first image, where the detection result includes whether the target to be detected in the first image is blocked by a preset object; and performing a preset operation matching the detection result.
  • with the image detection method provided by the embodiments of the present application, it is possible to determine whether the target to be detected is blocked and to perform a subsequent preset operation matching the detection result, enabling flexible processing based on the occlusion state of the target to be detected in the image.


Abstract

An image detection method and a related apparatus, a device, and a storage medium. The image detection method comprises: acquiring a first image containing a target to be detected (S11); performing detection on the first image, and obtaining a detection result of the first image, the detection result comprising whether said target in the first image is blocked by a preset object (S12); and executing a preset operation matching the detection result (S13).

Description

Image Detection Method and Related Apparatus, Device, Storage Medium, and Computer Program

Cross-Reference to Related Applications

This application is based on, and claims priority to, the Chinese patent application with application number 202011002322.1, filed on September 22, 2020 and entitled "Image detection method and related apparatus, device, storage medium". The entire content of that Chinese patent application is incorporated into this application by reference.
Technical Field

The present application relates to the technical field of image processing, and in particular to an image detection method and related apparatus, device, storage medium, and computer program.
Background

At present, image processing, especially the detection and recognition of targets in images, is widely used in various scenarios. Taking human faces as an example, the detection and recognition of faces in images has been widely applied in finance, border inspection, government, aerospace, electric power, factories, education, medical care, and other fields. In the prior art, a camera device captures an image or video stream containing a human face, the face is automatically detected in the image, face recognition is then performed on the detected face, and corresponding processing is carried out based on the recognition result.
Summary of the Invention

The embodiments of the present application provide at least an image detection method and related apparatus, device, storage medium, and computer program.

An embodiment of the present application provides an image detection method, including: acquiring a first image containing a target to be detected; detecting the first image to obtain a detection result of the first image, where the detection result includes whether the target to be detected in the first image is blocked by a preset object; and performing a preset operation matching the detection result.

Therefore, by detecting the first image containing the target to be detected to obtain a detection result indicating whether the target is blocked, and then performing a preset operation matching the detection result, it is possible to determine whether the target to be detected is blocked and to carry out the subsequent preset operation matching the detection result, enabling flexible processing based on the occlusion state of the target to be detected in the image.
In some embodiments, the detection result is obtained by detecting the first image with a neural network.

Therefore, performing detection with a pre-trained neural network makes the detection result more accurate and the detection faster.
In some embodiments, performing a preset operation matching the detection result includes: when the target to be detected is not blocked by a preset object, issuing a first reminder, where the first reminder prompts the use of the preset object to cover the target to be detected.

Therefore, issuing the first reminder when the target to be detected is not covered promptly signals that the target is not blocked by the preset object, so that the person being reminded can take corresponding measures in time.
In some embodiments, the detection result further includes whether the manner in which the target to be detected is blocked by the preset object is a preset occlusion manner; performing a preset operation matching the detection result includes: when the target to be detected is blocked by the preset object but the occlusion manner is not the preset occlusion manner, issuing a second reminder, where the second reminder prompts adjusting the occlusion manner of the preset object.

Therefore, when the occlusion manner is incorrect, a second reminder is issued so that the occlusion manner of the target to be detected can be adjusted in time.
In some embodiments, performing a preset operation matching the detection result includes: when the target to be detected is blocked by the preset object, extracting from the first image at least a first feature of the unblocked portion of the target to be detected as a feature to be identified of the target to be detected; and identifying the target to be detected using the feature to be identified to obtain a recognition result.

Therefore, when the target to be detected is blocked by the preset object, features of the unblocked portion are extracted for recognition, achieving recognition based on local features of the target to be detected; and since these local features are not occluded, they can represent the target to be detected and guarantee the accuracy of recognition to a certain extent.

In some embodiments, extracting from the first image at least a first feature of the unblocked portion of the target to be detected as the feature to be identified of the target to be detected includes: extracting the first feature of the unblocked portion of the target to be detected from the first image, and acquiring a second feature of the blocked portion of the target to be detected; and using the first feature and the second feature as the feature to be identified of the target to be detected.

Therefore, in addition to the features of the unblocked portion of the target to be detected, the features of the blocked portion are also incorporated, which improves the feature richness of the target to be detected.

In some embodiments, acquiring the second feature of the blocked portion of the target to be detected includes: extracting the feature of the blocked portion from the first image as the second feature; or acquiring a preset feature of the blocked portion as the second feature, where the preset feature includes a feature obtained based on at least one reference feature, each reference feature being extracted from the region, corresponding to the blocked portion, of a reference target that has no blocked portion.

Therefore, the feature of the blocked portion may be determined by directly extracting it from the first image; since the feature of the blocked portion can vary to some extent with the target to be detected, this approach can improve recognition accuracy. Alternatively, a preset feature may be acquired as the feature of the blocked portion; this approach avoids feature extraction on the blocked portion, reducing the consumption of processing resources and improving processing efficiency.
In some embodiments, identifying the target to be detected using the feature to be identified to obtain a recognition result includes at least one of the following: when there is one preset target, acquiring a first similarity between the feature to be identified and the pre-stored feature of the preset target, and, when the first similarity satisfies a first preset condition, determining that the recognition result includes the target to be detected passing identity authentication; when there are multiple preset targets, acquiring a second similarity between the feature to be identified and the pre-stored feature of each preset target, and determining that the recognition result includes setting the identity of the target to be detected to the identity of the preset target corresponding to the second similarity that satisfies a second preset condition.

Therefore, by computing the first similarity with the pre-stored feature of a specific preset target, or computing similarities with the pre-stored features of multiple preset targets, the target to be detected can, according to actual scene requirements, be compared with one specific preset target or with the preset targets in a database.

In some embodiments, the method includes at least one of the following: the first preset condition includes the first similarity being greater than a first similarity threshold; the second preset condition includes the second similarity being greater than a second similarity threshold.

Therefore, by setting the similarity thresholds separately for different scenarios, the recognition result is made more accurate.

In some embodiments, the method includes at least one of the following: the first similarity threshold used when the feature to be identified includes the second feature of the blocked portion of the target to be detected is smaller than the first similarity threshold used when the feature to be identified does not include the second feature; the second similarity threshold used when the feature to be identified includes the second feature is smaller than the second similarity threshold used when the feature to be identified does not include the second feature.

Therefore, if the feature to be identified includes the second feature, the second feature may deviate from the real features of the key points of the blocked portion of the target to be detected; appropriately lowering the similarity threshold in this case can improve recognition accuracy.
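The threshold-lowering logic above can be sketched as follows. Cosine similarity and the concrete threshold values are illustrative assumptions; the embodiments do not fix a particular similarity measure or thresholds.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(query_feat, gallery, uses_second_feature,
             base_threshold=0.8, occluded_threshold=0.7):
    # One-to-many matching: compare the feature to be identified against
    # every preset target's pre-stored feature. A lower threshold is used
    # when the query includes the (less reliable) second feature of the
    # blocked portion.
    threshold = occluded_threshold if uses_second_feature else base_threshold
    best_id, best_sim = None, -1.0
    for identity, stored_feat in gallery.items():
        sim = cosine_similarity(query_feat, stored_feat)
        if sim > best_sim:
            best_id, best_sim = identity, sim
    # Only a similarity strictly greater than the threshold counts as a match.
    return (best_id, best_sim) if best_sim > threshold else (None, best_sim)
```

With a single preset target, the same function applies with a one-entry gallery, corresponding to the identity-authentication case.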
In some embodiments, before acquiring the first similarity between the feature to be identified and the pre-stored feature of the preset target, the method further includes: in response to an account registration request, registering an account for the user; from at least one frame of a second image captured of the user, determining a second image that meets preset quality requirements, and extracting features of a preset part of the user from the determined second image; and associating the features of the preset part with the account and saving the features of the preset part as the pre-stored feature of the preset target.

Therefore, by first determining a second image that meets the quality requirements and then extracting the features of the preset part, the extracted features are more accurate.

In some embodiments, when the target to be detected is blocked by the preset object, before extracting from the first image at least the first feature of the unblocked portion of the target to be detected, the method further includes at least one of the following steps: from multiple frames of first images containing the target to be detected, determining a first image that meets preset quality requirements as the first image for subsequent feature extraction; preprocessing the first image for subsequent feature extraction; and performing liveness detection on the first image for subsequent feature extraction, and, when the liveness detection result indicates that the target to be detected is a living body, determining to perform the step of extracting from the first image at least the first feature of the unblocked portion of the target to be detected and its subsequent steps.

Therefore, preprocessing before feature extraction makes the extracted features more accurate, and identifying the target to be detected only when it is a living body enhances the security of recognition and can, to a certain extent, prevent spoofing attacks.
In some embodiments, determining, from multiple frames of first images containing the target to be detected, a first image that meets preset quality requirements as the first image for subsequent feature extraction includes: obtaining a quality score for each frame of the first image based on its quality factor, where the quality factor of the first image includes at least one of the following: pose information of the target to be detected relative to the photographing device, parameter information reflecting the size of the target to be detected in the first image, and brightness information of the first image; and, based on the quality scores, determining a first image that meets the preset quality requirements as the first image for subsequent feature extraction, where the quality score of the selected first image is higher than the quality scores of the other first images.

Therefore, by performing feature extraction on images whose quality scores meet the requirements, the extracted features can better represent the target to be detected.
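The frame-selection step above can be sketched as follows. The weighting scheme, the [0, 1] normalization of each quality factor, and the minimum score are hypothetical; the embodiments only require that the selected frame's score meet the preset requirement and be higher than the other frames' scores.

```python
def quality_score(frame, w_pose=0.4, w_size=0.3, w_brightness=0.3):
    # frame: dict of per-frame quality factors, each assumed normalized
    # to [0, 1]:
    #   'pose'       - frontality of the target relative to the camera
    #   'size'       - relative size of the target in the image
    #   'brightness' - closeness of brightness to a comfortable range
    return (w_pose * frame['pose']
            + w_size * frame['size']
            + w_brightness * frame['brightness'])

def select_best_frame(frames, min_score=0.5):
    # Keep the frame with the highest quality score, provided it meets
    # the preset quality requirement; return None if no frame qualifies.
    best = max(frames, key=quality_score)
    return best if quality_score(best) >= min_score else None
```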
In some embodiments, preprocessing the first image for subsequent feature extraction includes: when the first image includes multiple targets to be detected, determining the target area, in the first image, of the target to be detected that meets preset extraction requirements, and removing the image portion outside the target area in the first image; and/or, upon detecting that the inclination angle of the target to be detected in the first image is greater than a preset angle, rotating the first image until the inclination angle of the target to be detected is less than or equal to the preset angle.

Therefore, when there are multiple targets to be detected in the first image, only the target that meets the preset extraction requirements is retained and the others are discarded, reducing the influence of the non-qualifying targets on the recognition result; and when the inclination angle of the target to be detected in the first image is too large, the image is straightened, reducing the influence of the tilt of the target to be detected.

In some embodiments, the preset extraction requirements include the area of the region corresponding to the target to be detected being larger than the areas of the regions corresponding to the other targets to be detected, where the other targets to be detected include targets other than the target to be detected.

Therefore, because the larger the area of the target to be detected, the more accurate the extracted features, selecting the target to be detected with the larger area makes the detection result more accurate.
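The two preprocessing steps above can be sketched as follows. This is a pure-Python illustration with assumed box and image representations; the "largest area" rule follows the preset extraction requirement described above, and the actual rotation of pixels is omitted.

```python
def largest_target(boxes):
    # boxes: list of (x, y, w, h) candidate regions for targets to be
    # detected; the preset extraction requirement here is "largest area".
    return max(boxes, key=lambda b: b[2] * b[3])

def crop(image, box):
    # image: 2-D list of pixel rows; keep only the target area and
    # remove the image portion outside it.
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def needs_deskew(tilt_deg, preset_angle=15.0):
    # Rotate the image only when the target's inclination angle exceeds
    # the preset angle (the 15-degree value is illustrative).
    return abs(tilt_deg) > preset_angle
```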
In some embodiments, the target to be detected includes a human face, and the preset object includes a mask.

Therefore, by judging whether the face is wearing a mask and performing the corresponding operation, for example: if the face is not wearing a mask, or the mask is worn incorrectly, a corresponding reminder can be issued so that the user can adjust in time; if the face is wearing a mask, face recognition is performed.
An embodiment of the present application provides an image detection apparatus, including: an image acquisition module configured to acquire a first image containing a target to be detected; a target detection module configured to detect the first image to obtain a detection result of the first image, where the detection result includes whether the target to be detected in the first image is blocked by a preset object; and an operation execution module configured to perform a preset operation matching the detection result.

An embodiment of the present application provides an electronic device, including a memory and a processor, where the processor is configured to execute program instructions stored in the memory to implement the above image detection method.

An embodiment of the present application provides a computer-readable storage medium on which program instructions are stored; when the program instructions are executed by a processor, the above image detection method is implemented.

An embodiment of the present application provides a computer program, including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the above image detection method.

The embodiments of the present application provide an image detection method and related apparatus, device, storage medium, and computer program. By detecting a first image containing a target to be detected to determine whether the target is blocked, and then performing a preset operation matching the detection result, it can be determined whether the target to be detected is blocked so that the subsequent preset operation matching the detection result can be performed, enabling flexible processing based on the occlusion state of the target to be detected in the image.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.

Description of the Drawings

The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the technical solutions of the present application.
FIG. 1 is a schematic flowchart of an embodiment of an image detection method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a first image in an embodiment of an image detection method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a preprocessed first image in an embodiment of an image detection method according to an embodiment of the present application;

FIG. 4 is a schematic structural diagram of an embodiment of an image detection apparatus according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of an embodiment of an electronic device according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of an embodiment of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description

The solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.

In the following description, for purposes of illustration rather than limitation, specific details such as particular system structures, interfaces, and techniques are set forth in order to provide a thorough understanding of the present application.

The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects. Furthermore, "multiple" herein means two or more than two. The term "at least one" herein means any one of multiple items, or any combination of at least two of multiple items; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.

The image detection method in the embodiments of the present application can be applied to a device with image processing capability. The device may have an image or video capture function; for example, it may include a component such as a camera for capturing images or videos. Alternatively, the device may obtain the required video stream or image from another device through data transmission or data interaction, or access the required video stream or image in the storage resources of another device. In that case, the other device has an image or video capture function and a communication connection with the device; for example, data transmission or data interaction between them may be performed via Bluetooth, a wireless network, or the like. The embodiments of the present application do not limit the communication manner between the two, which may include but is not limited to the cases listed above. In one implementation, the device may include a mobile phone, a tablet computer, an interactive screen, etc., which is not limited herein.
请参阅图1,图1是本申请实施例图像检测方法一实施例的流程示意图。其中,所述图像检测方法由电子设备执行,所述方法可以包括如下步骤:Please refer to FIG. 1 . FIG. 1 is a schematic flowchart of an embodiment of an image detection method according to an embodiment of the present application. Wherein, the image detection method is performed by an electronic device, and the method may include the following steps:
步骤S11:获取包含待检测目标的第一图像。Step S11: Acquire a first image containing the target to be detected.
The first image containing the target to be detected may be an initial image containing the target object captured directly by the camera of the device itself; it may also be an image obtained from another device, or an image that has undergone frame selection or adjustment of brightness, resolution, and so on. The target to be detected may include a human face, the face or limbs of other animals, and so on. Therefore, the manner of acquiring the first image containing the target to be detected is not limited. Here, other devices refer to devices that operate using separate central processing units.
Step S12: Detect the first image to obtain a detection result of the first image, where the detection result includes whether the target to be detected in the first image is occluded by a preset object.
The preset object refers to any object capable of occluding the target to be detected, for example a mask, a scarf, glasses, or a visible object such as an arm or a sheet of paper.
During detection of the first image, it is first determined whether the first image contains a target to be detected; if such a target exists, it is then determined whether the target is occluded by a preset object. One way to make this judgment is to train an occlusion detection model before detecting the first image; by inputting the first image into the occlusion detection model (for example, a neural network with an occlusion detection function), it can be determined whether the target to be detected in the first image is occluded by a preset object. Of course, in some embodiments, the judgment may also be made by checking whether a preset detection position on the target to be detected is occluded and whether the object occluding that position satisfies the conditions of the preset object. In that case, features of the occluding object and of the preset object can be extracted and compared for similarity, yielding a detection result that includes whether the target to be detected is occluded.
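As a minimal illustration of the model-based approach above, the sketch below wraps a hypothetical occlusion detection model; the model interface (image in, per-class scores out), the class names, and the toy stand-in model are assumptions for illustration only, not part of the application.

```python
# Hypothetical wrapper around a trained occlusion-detection model.
# The model interface (image -> per-class scores) is an assumption.
CLASSES = ["no_target", "unoccluded", "occluded_by_preset_object"]

def detect_occlusion(image, occlusion_model):
    """Return (target_present, occluded_by_preset_object)."""
    scores = occlusion_model(image)  # e.g. softmax scores, one per class
    label = CLASSES[max(range(len(scores)), key=scores.__getitem__)]
    if label == "no_target":
        return False, False
    return True, label == "occluded_by_preset_object"

# Toy stand-in model: always reports a target occluded by the preset object.
fake_model = lambda img: [0.1, 0.2, 0.7]
print(detect_occlusion(None, fake_model))  # -> (True, True)
```

A real model would be a trained classifier; the wrapper only shows how its scores map onto the two questions asked in this step.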
Step S13: Perform a preset operation matching the detection result.
The detection result may be that the target to be detected is occluded but not by the preset object; that it is occluded by the preset object but not in the preset manner; that it is occluded by the preset object in the preset manner; or that it is not occluded at all. The embodiments of the present application consider whether the target to be detected in the first image is occluded by a preset object. Of course, in some embodiments, the preset object may be set to any object, that is, an occlusion detection model is used, and as long as the target to be detected is determined to be occluded, the corresponding preset operation is performed. The preset operation may be any operation related to target detection, such as recognition.
In the above solution, the first image containing the target to be detected is detected to determine whether the target is occluded, and a preset operation matching the detection result is then performed. This makes it possible to judge whether the target to be detected is occluded and to carry out the subsequent matching preset operation, achieving flexible processing based on the occlusion state of the target in the image.
In some embodiments, the detection result is obtained by detecting the first image with a neural network. Before detecting the first image, a preset-object occlusion model is trained so that the trained model can detect whether the target to be detected in the first image is occluded by a preset object. There may be one preset object or several, for example two or three different objects; when there are several, a single preset-object occlusion model can both judge whether the target is occluded and detect which preset object occludes it. The target to be detected may be a human face and the preset object may be a mask; correspondingly, the preset-object occlusion model is a mask detection model. The mask detection model can detect whether the target is wearing a mask, and in some embodiments it can also simultaneously detect, when a mask is worn, whether it is worn correctly. Performing detection with a neural network trained in advance makes the detection result more accurate and the detection faster.
When the target to be detected is not occluded by the preset object, a first reminder is issued, where the first reminder prompts the use of the preset object to occlude the target. The first reminder may take several forms, including frame selection by a face frame: if the target is detected as not occluded by the preset object, the face region is framed by a face frame, which may be given a warning color such as red or yellow. The first reminder may also combine the face frame with prompt text, for example "You are not wearing a mask; please wear a mask". It may also take the form of a voice reminder, a flashing indicator light, and so on; these forms may be used in combination or alone, and are not restricted here. For example, when the mask detection model detects that a face is not wearing a mask, a first reminder is issued to prompt the person to wear a mask covering the mouth and nose. Issuing the first reminder when the target is not occluded promptly flags the situation, so that the person being reminded can take corresponding measures in time.
In some embodiments, the detection result further includes whether the manner in which the target to be detected is occluded by the preset object is a preset occlusion manner. When training the neural network serving as the preset-object occlusion model, the preset occlusion manner can be annotated in the training samples, where the preset occlusion manner may be the correct occlusion manner; the trained model can then, upon detecting that the target is occluded by the preset object, judge whether the occlusion manner is the preset one. When the target is occluded by the preset object but the occlusion manner is not the preset occlusion manner, a second reminder is issued. The second reminder prompts adjustment of the occlusion manner of the preset object. For example, the target to be detected is a face, the preset object is a mask, and the preset occlusion manner is the correct way of wearing a mask. When the face is detected to be wearing a mask, it is judged whether the mask is worn correctly; if not, a second reminder is issued to prompt the person to adjust the occlusion manner of the preset object. Of course, in some embodiments, there may be multiple preset occlusion manners, for example the correct occlusion manner, a first incorrect occlusion manner, a second incorrect occlusion manner, and so on. When the detected occlusion manner is the first incorrect manner, a reminder corresponding to the first incorrect manner is issued; when it is the second incorrect manner, a reminder corresponding to the second incorrect manner is issued, prompting the target to adjust to the correct occlusion manner. For example, again with a face as the target and a mask as the preset object, the correct occlusion manner is the mask covering both the nose and the mouth; the first incorrect manner is the mask covering the nose but not the mouth, with a corresponding reminder to also cover the mouth; the second incorrect manner is the mask covering the mouth but not the nose, with a corresponding reminder to also cover the nose. When the occlusion manner is incorrect, the second reminder is issued so that the occlusion manner of the target can be adjusted in time. The form of the second reminder is similar to that of the first: a face frame combined with a text reminder, a face frame with a voice reminder, a text reminder alone, a voice reminder alone, a flashing warning light, and so on. Of course, if there are several different preset occlusion manners, multiple text or voice reminders are set accordingly; for example, if the detected manner is the first incorrect occlusion manner, the text reminder corresponds to that manner.
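The mapping from detection result to reminder described above can be sketched as a simple dispatch table; the status labels and prompt strings below are illustrative assumptions, not wording fixed by the application.

```python
# Hypothetical mapping from detected occlusion status to reminder text.
REMINDERS = {
    "unoccluded": "You are not wearing a mask; please wear a mask.",
    "nose_only": "Please also cover your mouth.",    # first incorrect manner
    "mouth_only": "Please also cover your nose.",    # second incorrect manner
}

def pick_reminder(status):
    """Return the reminder matching the detection result, or None if correct."""
    if status == "correct":
        return None
    return REMINDERS.get(status, "Please adjust how the mask is worn.")

print(pick_reminder("mouth_only"))  # -> Please also cover your nose.
```

Each entry could equally drive a voice prompt or a warning-light pattern; the table only shows that each incorrect manner gets its own matching reminder.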
In some embodiments, in some business scenarios the target to be detected is recognized when it is occluded by the preset object; in other scenarios, if the target is not occluded by the preset object, it is not recognized. For example, during a period in which masks must be worn in public places such as high-speed rail stations or airports, if a face is detected without a mask, face recognition is not performed on it, so a maskless face cannot pass through face recognition to enter the station. Of course, depending on the needs of the business scenario, the target may still be recognized even when it is detected as not occluded by the preset object. Recognizing the target requires feature extraction on it, and before feature extraction, a first image satisfying a preset quality requirement can be selected from multiple frames of first images containing the target as the first image for subsequent feature extraction. One way to do this is to compute a quality score for each frame of the first image based on its quality factors, where the quality factors include at least one of the following: pose information of the target relative to the capture device, parameter information reflecting the size of the target in the first image, and brightness information of the first image. The pose information of the target relative to the capture device may be angle information of the target relative to the capture device, for example relative to the lens used during shooting. For example, a three-dimensional coordinate system can be established with the lens as the origin: the line connecting the lens and the center of the earth is the X axis, the line extending straight ahead of the lens and perpendicular to the X axis is the Y axis, and the axis perpendicular to both the X and Y axes is the Z axis. The three-dimensional coordinate system serves only to express the angle between the target and the capture device; in some embodiments the choice of origin or of the three directions may differ from this example. The angle can be decomposed into angles about the X, Y, and Z directions: for example, when the target directly faces the lens, the angles about all three directions are 0° (degrees); when the target presents its side squarely to the first image acquisition component, the angle about the X direction is 90° and the angles about the Y and Z directions are 0°, because the target has rotated 90° about the X axis. Of course, the smaller the angle in each direction, the better. The parameter information reflecting the size of the target includes the area of the first image occupied by the target, which can be expressed as the size of the region of the first image the target occupies. This presupposes that the target is completely contained in the first image; if the first image contains only part of the target, this frame scores relatively low on the size quality factor. As for brightness, higher is not necessarily better: the closer the brightness is to the natural light at the current moment, the higher the score for this quality factor. The weights of the three quality factors are set according to how strongly each influences image quality; for example, the angle may be weighted 0.4 and the other two 0.3 each. Of course, this is only an example: the weights can be set as needed, and in some embodiments other factors, such as the blur of the first image, may also be included; any factor that affects image quality can be used to compute the quality score. By selecting images whose quality scores meet the requirement for feature extraction, the extracted features better represent the target. The weight settings may also take into account the required image detection accuracy and the processing capability and resource usage of the image detection device: if the device has high processing capability and low resource usage, many quality factors can be considered in computing the quality score; if its capability is limited, a suitable subset can be chosen, for example according to the time or memory each factor costs to compute. How many and which quality factors to use can therefore be chosen flexibly. Of course, in some embodiments, a lower quality score threshold may instead be set: first images whose quality score falls below the threshold are excluded, and those whose score exceeds it are retained.
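The weighted quality score described above can be sketched as follows; the factor names, the normalization of each factor to [0, 1] (1.0 best), the example weights 0.4/0.3/0.3, and the selection threshold are illustrative assumptions.

```python
# Illustrative weighted quality score. Each factor is assumed to be
# pre-normalized so that 1.0 is best and 0.0 is worst.
WEIGHTS = {"angle": 0.4, "size": 0.3, "brightness": 0.3}

def quality_score(factors, weights=WEIGHTS):
    """Weighted sum of normalized quality factors."""
    return sum(weights[name] * value for name, value in factors.items())

def select_frames(frames, threshold=0.5):
    """Keep frames whose quality score exceeds the threshold."""
    return [f for f in frames if quality_score(f) > threshold]

frames = [
    {"angle": 1.0, "size": 0.8, "brightness": 0.9},  # good frame
    {"angle": 0.2, "size": 0.3, "brightness": 0.4},  # poor frame
]
print([round(quality_score(f), 2) for f in frames])  # -> [0.91, 0.29]
```

Dropping a factor (e.g. on a low-capability device) amounts to passing a smaller `weights` dictionary, matching the flexible factor selection the paragraph describes.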
In some embodiments, before feature extraction and recognition are performed on the first image, the first image used for subsequent feature extraction may be preprocessed. One preprocessing approach is, when the first image contains multiple targets to be detected, to determine the target region in the first image of the target that satisfies a preset extraction requirement and to remove the parts of the first image outside that target region. The target region here may be a region containing one target to be detected. That is, when the first image contains multiple targets, recognition is not performed on the complete first image but only on the target region of the target satisfying the preset extraction requirement; this reduces, to a certain extent, the noise that other targets introduce during recognition, and thus the influence of targets that do not meet the requirement on the recognition result. The preset extraction requirement may be that the area of the region corresponding to the target is larger than the areas of the regions corresponding to the other targets, the other targets being all targets other than this one. When the first image contains multiple targets, the areas they occupy may differ, and a target with a larger area has a relatively higher recognition rate during recognition, so the target with the larger area is selected for recognition. If multiple targets have the same area, the target whose center is closer to the center of the first image may be recognized; or, in other embodiments, the target regions of all such targets may be obtained separately for target detection, where "all such targets" refers to the targets tied for the largest area, or all targets whose area exceeds a preset area extraction threshold.
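The region-selection rule above (largest area wins, with distance to the image center as tie-breaker) can be sketched as below; the `(x, y, w, h)` box representation is an assumption about how target regions are described.

```python
# Pick the target region to recognize: largest area wins, ties broken by
# distance from the box center to the image center. Boxes are (x, y, w, h).
def pick_target(boxes, image_size):
    cx, cy = image_size[0] / 2, image_size[1] / 2

    def key(box):
        x, y, w, h = box
        area = w * h
        dist = ((x + w / 2 - cx) ** 2 + (y + h / 2 - cy) ** 2) ** 0.5
        return (-area, dist)  # larger area first, then closer to center

    return min(boxes, key=key)

boxes = [(10, 10, 50, 50), (100, 100, 80, 80), (0, 0, 80, 80)]
print(pick_target(boxes, (200, 200)))  # -> (100, 100, 80, 80)
```

The two 80x80 boxes tie on area, so the one whose center lies closer to the image center (100, 100) is chosen, matching the tie-breaking rule in the text.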
In some embodiments, before feature extraction and recognition are performed on the first image, preprocessing the first image for subsequent feature extraction may also consist of detecting that the tilt angle of the target in the first image is greater than a preset angle and rotating the first image until the tilt angle of the target is less than or equal to the preset angle. In some embodiments, rather than rotating the entire first image, only the target, or the target region containing it, may be rotated; the manner of straightening the target is therefore not limited here. The preset angle may lie within 0° to 180° clockwise or counterclockwise; the embodiment of the present application sets the preset angle to 0°, and in some embodiments it may also be 30°, 35°, and so on. One way to judge whether the target is tilted beyond the preset angle is to obtain the angle between a vertical line and the line connecting a preset first key point and a preset second key point of the target; if this angle is greater than the preset angle, the first image is rotated so that the angle becomes less than or equal to the preset angle, with the preset first key point lying above the preset second key point after rotation, "above" being defined relative to the bottom edge of the first image. The tilt angle may also be the tilt of the target relative to some position in the first image, for example relative to the center of the first image. The preset angle here can be set according to the needs of different scenarios, for example according to the proportion of the first image occupied by the region containing the target: when the area of that region is larger than a first preset area value, the preset angle may be set greater than 30°; when it is smaller than a second preset area value, the preset angle may be set less than 30°. The larger the region containing the target, the larger the target itself, and the less it is affected by the angle, so the more tolerant the tilt requirement can be; conversely, the more strongly the target is affected by tilt, the stricter the tilt-angle requirement. Of course, this is only an example; in some embodiments, other correspondences between area and preset angle may be set, which are not specified here.
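The keypoint-based tilt test can be sketched as below; the keypoint coordinates, the image-style y-down convention, and the use of `atan2` against the vertical are illustrative assumptions.

```python
import math

def tilt_angle(p1, p2):
    """Angle in degrees between the p1->p2 line and the vertical axis."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    return abs(math.degrees(math.atan2(dx, dy)))

def needs_straightening(p1, p2, preset_angle=0.0):
    """True when the keypoint line tilts beyond the preset angle."""
    return tilt_angle(p1, p2) > preset_angle

# Target tilted 30 degrees from vertical (as in FIG. 2): straighten it.
print(needs_straightening((0, 0), (5, 5 * 3 ** 0.5)))  # -> True
```

When the test returns True, the image (or just the target region) would be rotated by the measured angle, e.g. with an affine rotation about the region center, so that the keypoint line becomes vertical.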
For example, referring to FIG. 2 and FIG. 3, FIG. 2 is a schematic diagram of a first image in an embodiment of the image detection method according to an embodiment of the present application, and FIG. 3 is a schematic diagram of the preprocessed first image in that embodiment. As shown in FIG. 2, the lower half of the target 21 to be detected in the first image 20 is occluded by the preset object 22, and the target 21 leans visibly to the left: the angle between a vertical line and the line connecting the upper-left corner point (first preset key point) and the lower-left corner point (second preset key point) of the target 21 is 30°, i.e., the tilt angle of the target 21 is 30°, greater than the preset angle of 0°. The first image 20 is therefore rotated 30° to the right, i.e., clockwise; the rotated first image is shown in FIG. 3, where the angle between the same connecting line of the target 21 and a vertical line is 0°, equal to the preset angle of 0°.
By straightening the target to be detected when its tilt angle in the first image is greater than the preset angle, the influence of the tilt on subsequent liveness detection or target recognition of the target is reduced.
In some embodiments, before feature extraction and recognition are performed on the first image, liveness detection may also be performed on the first image used for subsequent feature extraction; when the liveness detection result indicates that the target to be detected is a living body, the step of extracting at least the first feature of the unoccluded part of the target from the first image, and the subsequent steps, are executed. If the first image contains multiple targets, the target with the largest area is selected for liveness detection. Liveness detection can be performed by inputting the target region corresponding to the target into a liveness detection model, which is trained with a number of images containing targets occluded by the preset object. Recognizing the target only when it is a living body enhances the security of recognition and can, to a certain extent, prevent spoofing attacks with prostheses.
In the process of recognizing the object to be detected, at least a first feature of the unoccluded part of the object to be detected is first extracted from the first image as the feature to be recognized of the object. The first feature refers to the features of the unoccluded key points of the object to be detected. In some implementations, the first feature of the unoccluded part may be extracted from the first image and a second feature of the occluded part of the object may additionally be acquired, the first feature and the second feature together serving as the feature to be recognized. The second feature refers to the features of the key points of the occluded part. The second feature of the occluded part may be acquired in either of two ways. In the first way, the feature of the occluded part is extracted from the first image as the second feature. That is, although this part is occluded by the preset object, its second feature is extracted in the same manner as the first feature of the unoccluded part; the same processing mechanism is applied regardless of whether the object to be detected is occluded by the preset object, so occlusion by the preset object does not affect the feature extraction process. Of course, in some embodiments, this manner of extracting key-point features may also be used when no preset object occludes the object to be detected. For example, when the object to be detected is a human face and the preset object is a mask, the second feature is extracted as if the face were not covered by the mask, i.e., the features of all key points of the face are extracted; a masked face and an unmasked face are processed by the same mechanism, so wearing a mask does not affect the feature extraction process. In the second way, a preset feature of the occluded part is acquired as the second feature, where the preset feature may be obtained based on at least one reference feature, and each reference feature is extracted from the region, corresponding to the occluded part, of a reference target that has no occluded part. That is, before the first image is recognized, reference features are preset for the key points of the occluded part, thereby complementing the features of the occluded part. For example, features of the corresponding preset parts may be extracted in advance from several targets whose detection results indicate no occluded part, or no occlusion by the preset object, and the average of the extracted features is used as the reference feature of the occluded part. For instance, features of the parts that would be covered by a mask, such as the nose and the mouth, are extracted in advance from several unmasked faces, and the average of these features serves as the preset reference feature of the mask-covered part. As for determining the feature of the occluded part: directly extracting the feature of the occluded part can improve recognition accuracy, since this feature varies to some extent from target to target; using a preset feature as the feature of the occluded part avoids feature extraction for that part, which reduces the consumption of processing resources and improves processing efficiency.
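The averaging of reference features into a preset feature, and the assembly of the feature to be recognized from the unoccluded-part and occluded-part features, can be sketched as follows. This is a minimal pure-Python illustration, not the claimed implementation: it assumes features are fixed-length lists of floats, and all function names are hypothetical.

```python
def build_preset_feature(reference_features):
    """Average the occluded-region features extracted from several
    unoccluded reference targets to obtain the preset (fill-in) feature."""
    n = len(reference_features)
    dim = len(reference_features[0])
    return [sum(f[i] for f in reference_features) / n for i in range(dim)]

def assemble_query_feature(first_feature, second_feature):
    """Combine the unoccluded-part feature (first feature) and the
    occluded-part feature (second feature) into the feature to be recognized."""
    return list(first_feature) + list(second_feature)
```

In practice the reference features would come from, e.g., the nose and mouth regions of several unmasked faces, as described above.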
In some embodiments, after the feature to be recognized of the object to be detected is obtained, the object is recognized by using this feature. Recognition scenarios can be divided into 1:1 scenarios and 1:N scenarios, where 1:1 refers to comparison between two features and 1:N refers to comparison between one feature and multiple features. In the 1:1 scenario, that is, when there is a single preset target, a first similarity between the feature to be recognized and the pre-stored feature of the preset target is obtained, and when the first similarity satisfies a first preset condition, the recognition result is determined to include that the object to be detected passes identity authentication. The first preset condition may be that the first similarity is greater than a first similarity threshold. The first similarity threshold used when the feature to be recognized includes the second feature of the occluded part of the object to be detected is smaller than the first similarity threshold used when the feature to be recognized does not include the second feature. If the feature to be recognized contains the second feature, the second feature may deviate from the true features of the key points of the occluded part of the object to be detected; appropriately lowering the similarity threshold in this case can therefore improve recognition accuracy. When the feature to be recognized does not include the second feature, the first similarity threshold may be selected according to the ratio of the number of occluded key points to the total number of key points of the object to be detected. For example, if the occluded key points account for one third of the total key points of the object to be detected, the first similarity threshold may be determined as one third of the similarity threshold used for recognizing an unoccluded target. When the feature to be recognized includes the second feature, the first similarity threshold may be set 0.1 smaller than the first similarity threshold used when the second feature is not included, or smaller by some other value, which is not limited here. For example, the similarity threshold for recognizing an unoccluded target may lie between 0.6 and 1. Of course, this is only an example; in some embodiments, the first similarity threshold used when the feature to be recognized includes the second feature of the occluded part may also be equal to the threshold used when the second feature is not included, or may be determined by the above method or according to the actual situation.
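The 1:1 check with an occlusion-dependent threshold can be sketched as follows. This is an illustrative sketch only: the cosine-similarity metric, the base threshold of 0.8, and the 0.1 margin are assumptions consistent with the examples above, not a definitive implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def verify_1_to_1(query, enrolled, base_threshold=0.8,
                  uses_preset_feature=False, margin=0.1):
    # Lower the threshold when the query contains a preset (filled-in)
    # second feature, since that feature may deviate from the true
    # features of the occluded key points.
    threshold = base_threshold - margin if uses_preset_feature else base_threshold
    return cosine_similarity(query, enrolled) > threshold
```

A similarity of, say, 0.75 would then fail the strict check but pass once the query is known to contain a preset second feature.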
In some embodiments, before the first similarity between the feature to be recognized and the pre-stored feature of the preset target is obtained, an association between a user account and the pre-stored feature of the preset target is established. This may be implemented as follows. In response to an account registration request, an account is registered for the user. The account here may be, for example, an electronic payment account; any application capable of target recognition may respond to the account registration request and register an account for the user. The user may register in the corresponding application with a mobile phone number and, after successful registration, obtains information such as a user name and password. From at least one frame of second image captured of the user, a second image satisfying a preset quality requirement is determined, and the feature of a preset part of the user is extracted from the determined second image. The preset part here is the same as the preset part of the object to be detected, and the step of selecting a second image satisfying the preset quality requirement is the same as the above-described step of selecting a first image satisfying the preset quality requirement. Finally, the feature of the preset part is associated with the account, and the feature of the preset part is saved as the pre-stored feature of the preset target; that is, the preset part of the user serves as the preset target.
In the 1:N scenario, that is, when there are multiple preset targets, a second similarity between the feature to be recognized and the pre-stored feature of each preset target is obtained, and the recognition result is determined to include determining the identity of the object to be detected as the identity of the preset target corresponding to the second similarity that satisfies a second preset condition. The second preset condition may be that the second similarity is greater than a second similarity threshold. In general, satisfying the second preset condition requires not only exceeding the second similarity threshold but usually also being the maximum among all the second similarities; that is, the identity of the preset target corresponding to the largest second similarity is selected as the identity of the object to be detected. The second similarity threshold used when the feature to be recognized includes the second feature is smaller than the second similarity threshold used when the feature to be recognized does not include the second feature. If the feature to be recognized contains the second feature, the second feature may deviate from the true features of the key points of the occluded part of the object to be detected; appropriately lowering the similarity threshold in this case can therefore improve recognition accuracy. The second similarity threshold is determined in the same way as the first similarity threshold.
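The 1:N decision rule just described — take the preset target with the largest similarity, provided that similarity exceeds the threshold — can be sketched as follows. A minimal pure-Python sketch; the gallery layout (a dict from identity to feature vector) and the cosine metric are illustrative assumptions.

```python
import math

def identify_1_to_n(query, gallery, threshold=0.7):
    """Return the identity whose pre-stored feature has the highest
    similarity to the query, provided that similarity exceeds the
    threshold; otherwise return None (no match)."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    best_id = max(gallery, key=lambda k: cos(query, gallery[k]))
    return best_id if cos(query, gallery[best_id]) > threshold else None
```

Returning None models the gate case below where an unregistered person fails the comparison and the gate does not respond.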
The 1:N scenario may involve many faces. For example, an office building or a company installs face recognition gates at its entrances and exits. In such a scenario, every person who needs access within the building or company is registered, forming a face database. When a registered person appears in front of the gate, the camera on the gate detects and captures the face and compares the captured face with those in the face database; when the comparison succeeds, the gate opens. When an unregistered person appears at the gate, the comparison should fail and the gate does not respond.
In the above solution, the first image containing the object to be detected is detected to determine whether the object to be detected is occluded, and a preset operation matching the detection result is then performed. It can thus be determined whether the object to be detected is occluded so that the subsequent preset operation matching the detection result can be carried out, realizing flexible processing based on the occlusion state of the object to be detected in the image.
Based on the foregoing embodiments, an embodiment of the present application further provides an image detection method. The method utilizes the detection capability of models based on deep learning algorithms and, starting from the face verification scenario, implements the image detection flows of the 1:1 and 1:N scenarios. It also provides a method for checking whether a mask is worn, as well as implementations of face comparison and face retrieval while a mask is worn.
(1) The face verification scenarios mainly include the 1:1 scenario and the 1:N scenario.
For example, 1:1 in a payment scenario refers to a 1:1 check between a face photo captured in real time and the base-library picture bound to the member; if they are confirmed to be the same user, the authentication passes. As another example, the 1:N scenario more often involves face retrieval. For instance, an office building or a company installs face recognition gates at its entrances and exits; in this scenario, every user who needs access within the building or company is registered, forming a face database. When a user appears in front of the gate, the camera on the gate detects and captures the face and compares the captured face image with the pictures in the face database. When the comparison succeeds, the gate opens; when the comparison fails, the gate does not respond, i.e., the gate remains closed.
In some embodiments, whether in the 1:1 scenario or in the 1:N scenario, liveness protection is required, for example against attacks by fakes such as photos (including photos taken of a person, electronically synthesized photos, etc.) and masks.
(2) The image detection method in the 1:1 scenario may be implemented as follows.
First, the user has an account, for example an electronic payment account. The user registers in the application system corresponding to the electronic payment account with a mobile phone number; after successful registration, the user obtains information such as a user name and password, i.e., an electronic payment account. Second, the application system guides the user through certain activities to perform a face-binding operation. When the user passes liveness authentication, face recognition is performed, and when the quality of the face in the video meets the face collection requirements, the user's face image is collected. The frame of the highest quality during shooting may be selected, where the quality criteria include one or more dimensions such as the angle of the face, the intensity of the illumination, and the size of the face.
The collected face image of the user may be associated with the account, specifically with the identifier of the account, as the comparison picture of the face base library. When the user makes a purchase online and the order is confirmed, the payment step is entered; at this point, if the user has bound a face image and chooses face payment, the face capture step is entered. During face detection, a mask-wearing check (i.e., face attribute detection) is performed: if a mask is worn, the subsequent flow continues; if not, the user may be reminded to wear a mask by playing a voice prompt or displaying text.
When the user wears a mask, face detection is performed, and within the shooting period the frame of the highest quality is selected. Face alignment is performed on the selected frame (for example, if the face is tilted at a certain angle, it is straightened), and face feature value extraction is performed based on the aligned face image.
There are two ways of face feature value extraction and comparison. The first is face feature value extraction including the mask part. For example, if the feature value of the unmasked face in the base-library picture is A, features are extracted from the masked face in the same way to obtain a feature value A1, and the feature vectors of the masked feature value A1 and the unmasked feature value A are compared. The second is face feature value extraction of the visible part above the mask. Assuming the whole face has 128 key points, of which 64 are above the mask, the feature values extracted from the 64 key points of the visible part above the mask are compared with the feature values extracted from the corresponding 64 key points in the base-library picture. The generated feature value is then compared with the feature value of the user's picture in the face base library; when the comparison threshold is exceeded (for example, the comparison threshold is set to 0.8, and a similarity above 0.8 is considered to indicate the same user), the comparison passes and the face verification flow ends.
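The second way — comparing only the key points of the visible part above the mask — can be sketched as follows. This is an illustrative sketch under stated assumptions: features are modeled as one small vector per key point, and the visible-key-point indices and threshold are hypothetical.

```python
import math

def compare_visible_keypoints(probe, gallery, visible_idx, threshold=0.8):
    """Flatten and compare only the per-key-point feature vectors of the
    visible (above-mask) key points, e.g. 64 of 128 key points."""
    p = [v for i in visible_idx for v in probe[i]]
    g = [v for i in visible_idx for v in gallery[i]]
    dot = sum(x * y for x, y in zip(p, g))
    sim = dot / (math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in g)))
    return sim > threshold
```

The occluded key points (e.g. nose and mouth under the mask) simply never enter the comparison, which is the essence of this second way.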
(3) The image detection method in the 1:N scenario may be implemented as follows.
First, in the office building personnel management business system, a photo of each user authorized to enter the building is entered (forming the comparison pictures of the face base library for the building access scenario); at this point the pictures in the face base library correspond one-to-one to the building users. Second, the pictures in the face base library are bound to the face recognition device of the gate, so that the face recognition device of the gate can read the pictures in the face base library. When a person needs to pass through the gate, the face recognition device on the gate detects the face information and then enters the face capture state (at this point the camera module on the face recognition device may stay on to perform face tracking, with the face frame following the movement of the face).
Then, during face detection, a mask-wearing check (i.e., face attribute detection) is performed, and different operations are performed according to the different check results; for the specific operations, refer to the description of the 1:1 scenario above.
There are likewise two ways of face feature value extraction and retrieval in the 1:N scenario. The first is face feature value extraction including the mask part. Assuming the comparison pictures of the face base library contain the five unmasked face feature values A, B, C, D and E, features are extracted from the masked faces in the same way to obtain the five feature values A1, B1, C1, D1 and E1. When the masked feature value A1 is used for retrieval, the distance between the feature vectors of A1 and A is the shortest, so the corresponding feature value A is retrieved. The second is face feature value extraction of the visible part above the mask, where the generation of the feature value is the same as in the 1:1 scenario described above. After the feature value is generated, face feature retrieval (a 1:N search) is performed with the generated feature value over the pictures of the face base library of all users in the building; when a feature value exceeding the comparison threshold exists, the retrieval is considered successful. If the retrieval succeeds, the face recognition device transmits an opening signal, the gate opens, and the face verification flow ends.
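The shortest-distance retrieval described above (A1 retrieving A because their feature vectors are closest) can be sketched as a simple nearest-neighbor search. A minimal pure-Python sketch; the dict-based base library and Euclidean distance are illustrative assumptions.

```python
import math

def retrieve_nearest(query, base_library):
    """Return the base-library entry whose feature vector is at the
    shortest Euclidean distance from the query feature value."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(base_library, key=lambda k: dist(query, base_library[k]))
```

A production system would additionally reject the nearest match if it still fails the comparison threshold, as in the 1:N decision rule shown earlier.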
In the embodiments of the present application, with the above method, it is possible to check before face recognition whether the user is wearing a mask and issue a reminder, and, when the user is wearing a mask, to perform accurate face recognition and complete the face verification flow, which greatly improves passage efficiency while reducing the risk brought by removing the mask.
The execution subject of the image detection method may be an image detection apparatus. For example, the image detection method may be executed by a terminal device, a server, or another processing device, where the terminal device may be a user equipment (UE), a mobile device, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In some possible implementations, the image detection method may be implemented by a processor calling computer-readable instructions stored in a memory.
Please refer to FIG. 4, which is a schematic structural diagram of an embodiment of an image detection apparatus according to an embodiment of the present application. The image detection apparatus 40 includes an image acquisition module 41, a target detection module 42 and an operation execution module 43. The image acquisition module 41 is configured to acquire a first image containing an object to be detected; the target detection module 42 is configured to detect the first image to obtain a detection result of the first image, where the detection result includes whether the object to be detected in the first image is occluded by a preset object; and the operation execution module 43 is configured to perform a preset operation matching the detection result.
In the above solution, the first image containing the object to be detected is detected to determine whether the object to be detected is occluded, and a preset operation matching the detection result is then performed. It can thus be determined whether the object to be detected is occluded so that the subsequent preset operation matching the detection result can be carried out, realizing flexible processing based on the occlusion state of the object to be detected in the image.
In some embodiments, the detection result is obtained by the target detection module 42 detecting the first image with a neural network.
In the above solution, detection is performed by a neural network trained in advance, so that the detection result is more accurate and the detection is faster.
In some embodiments, the operation execution module 43 performing a preset operation matching the detection result includes: when the object to be detected is not occluded by the preset object, issuing a first reminder, where the first reminder is used to prompt occluding the object to be detected with the preset object.
In the above solution, a first reminder is issued when the object to be detected is not occluded, so that the situation in which the object to be detected is not occluded by the preset object is promptly reported, and the person reminded can take corresponding measures in time.
In some embodiments, the detection result further includes whether the manner in which the object to be detected is occluded by the preset object is a preset occlusion manner; the operation execution module 43 performing a preset operation matching the detection result includes: when the object to be detected is occluded by the preset object and the occlusion manner is not the preset occlusion manner, issuing a second reminder, where the second reminder is used to prompt adjusting the occlusion manner of the preset object.
In the above solution, when the occlusion manner is incorrect, a second reminder is issued so that the occlusion manner of the object to be detected can be adjusted in time.
In some embodiments, the operation execution module 43 performing a preset operation matching the detection result includes: when the object to be detected is occluded by the preset object, extracting from the first image at least a first feature of the unoccluded part of the object to be detected as the feature to be recognized of the object to be detected; and recognizing the object to be detected by using the feature to be recognized to obtain a recognition result.
In the above solution, when the object to be detected is occluded by the preset object, the features of the unoccluded part are extracted for recognition, realizing recognition based on local features of the object to be detected; and since these local features are not occluded, they can represent the object to be detected, ensuring recognition accuracy to a certain extent.
In some embodiments, the operation execution module 43 extracting from the first image at least a first feature of the unoccluded part of the object to be detected as the feature to be recognized includes: extracting from the first image the first feature of the unoccluded part of the object to be detected, and acquiring a second feature of the occluded part of the object to be detected; and using the first feature and the second feature as the feature to be recognized of the object to be detected.
In the above solution, in addition to the features of the unoccluded part of the object to be detected, the features of the occluded part are also combined, thereby improving the feature richness of the object to be detected.
In some embodiments, the operation execution module 43 acquiring the second feature of the occluded part of the object to be detected includes: extracting the feature of the occluded part from the first image as the second feature; or acquiring a preset feature of the occluded part as the second feature, where the preset feature includes a feature obtained based on at least one reference feature, and each reference feature is extracted from the region, corresponding to the occluded part, of a reference target that has no occluded part.
In the above solution, regarding the manner of determining the feature of the occluded part: directly extracting the feature of the occluded part can improve recognition accuracy, since that feature varies to some extent from target to target; alternatively, a preset feature may be acquired as the feature of the occluded part, which avoids feature extraction for the occluded part, reduces the consumption of processing resources, and improves processing efficiency.
In some embodiments, the operation execution module 43 recognizing the object to be detected by using the feature to be recognized and obtaining a recognition result includes at least one of the following: when there is a single preset target, obtaining a first similarity between the feature to be recognized and the pre-stored feature of the preset target, and when the first similarity satisfies a first preset condition, determining that the recognition result includes that the object to be detected passes identity authentication; and when there are multiple preset targets, respectively obtaining a second similarity between the feature to be recognized and the pre-stored feature of each preset target, and determining that the recognition result includes determining the identity of the object to be detected as the identity of the preset target corresponding to the second similarity that satisfies a second preset condition.
In the above solution, by computing the first similarity to the pre-stored feature of a specific preset target, or computing similarities to the pre-stored features of multiple preset targets, the object to be detected can be compared, according to the requirements of the actual scenario, with a specific preset target or with the preset targets in a database.
In some embodiments, the first preset condition includes the first similarity being greater than a first similarity threshold, and the second preset condition includes the second similarity being greater than a second similarity threshold.
In the above solution, by separately setting the similarity thresholds for different scenarios, the recognition result is made more accurate.
In some embodiments, the first similarity threshold used when the feature to be recognized includes the second feature of the occluded part of the object to be detected is smaller than the first similarity threshold used when the feature to be recognized does not include the second feature; and the second similarity threshold used when the feature to be recognized includes the second feature is smaller than the second similarity threshold used when the feature to be recognized does not include the second feature.
In the above solution, if the feature to be recognized contains the second feature, the second feature may deviate from the true features of the key points of the occluded part of the object to be detected; therefore, appropriately lowering the similarity threshold in this case can improve recognition accuracy.
In some embodiments, the image detection apparatus 40 further includes a pre-storage module (not shown). Before the operation execution module 43 obtains the first similarity between the feature to be recognized and the pre-stored feature of the preset target, the pre-storage module is configured to: in response to an account registration request, register an account for the user; from at least one frame of second image captured of the user, determine a second image satisfying a preset quality requirement, and extract the feature of a preset part of the user from the determined second image; and associate the feature of the preset part with the account, and save the feature of the preset part as the pre-stored feature of the preset target.
上述方案,通过先确定满足质量要求的第二图像来提取预设部位的特征以使得提取到的特征更准确。In the above solution, the features of the preset part are extracted by first determining the second image that meets the quality requirements, so that the extracted features are more accurate.
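A minimal sketch of this registration flow (the function names, the `db` dictionary, and the 0.8 threshold are assumptions for illustration, not taken from the patent):

```python
def register_user(account_id, frames, quality_fn, extract_fn, db,
                  quality_threshold=0.8):
    """Register an account: pick the captured second image that best
    meets the preset quality requirement, extract the preset-part
    feature from it, and store the feature keyed by the account."""
    best = max(frames, key=quality_fn)
    if quality_fn(best) < quality_threshold:
        raise ValueError("no frame meets the preset quality requirement")
    db[account_id] = extract_fn(best)  # pre-stored feature of the preset target
    return db[account_id]
```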
In some embodiments, in a case where the target to be detected is occluded by the preset object, before extracting, from the first image, at least the first feature of the unoccluded portion of the target to be detected, the operation execution module 43 is further configured to perform at least one of the following steps: determining, from multiple frames of first images containing the target to be detected, a first image that meets a preset quality requirement as the first image for subsequent feature extraction; preprocessing the first image for subsequent feature extraction; and performing liveness detection on the first image for subsequent feature extraction, and in a case where the liveness detection result is that the target to be detected is a living body, determining to perform the step of extracting, from the first image, at least the first feature of the unoccluded portion of the target to be detected, together with its subsequent steps.
In the above solution, preprocessing before feature extraction makes the extracted features more accurate, and identifying the target to be detected only when it is a living body enhances the security of the identification and can, to a certain extent, prevent spoofing attacks.
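The three optional pre-extraction steps can be sketched as a small pipeline (the callables are placeholders; the patent does not prescribe this structure):

```python
def prepare_for_extraction(frames, pick_best, preprocess, is_live):
    """Run the optional steps described above, in order: choose a
    frame meeting the quality requirement, preprocess it, and proceed
    only if liveness detection says the target is a living body."""
    frame = pick_best(frames)
    frame = preprocess(frame)
    if not is_live(frame):
        return None  # do not proceed to feature extraction
    return frame
```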
In some embodiments, the operation execution module 43 determining, from multiple frames of first images containing the target to be detected, the first image that meets the preset quality requirement as the first image for subsequent feature extraction includes: obtaining a quality score of each frame of first image based on its quality factor, wherein the quality factor of the first image includes at least one of the following: pose information of the target to be detected relative to the photographing device, parameter information reflecting the size of the target to be detected in the first image, and brightness information of the first image; and determining, based on the quality scores, the first image that meets the preset quality requirement as the first image for subsequent feature extraction, wherein the quality score of the selected first image is higher than the quality scores of the other first images.
In the above solution, performing feature extraction on an image whose quality score meets the requirement makes the extracted features better represent the target to be detected.
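One plausible way to combine the listed quality factors into a per-frame score (the weighting scheme is an assumption for illustration; the patent only names the factors):

```python
def quality_score(pose_score, size_score, brightness_score,
                  weights=(0.4, 0.3, 0.3)):
    """Combine per-factor scores (each assumed in [0, 1]) into one
    quality score. The factors mirror those listed above: pose
    relative to the photographing device, target size in the image,
    and image brightness."""
    w_pose, w_size, w_bright = weights
    return w_pose * pose_score + w_size * size_score + w_bright * brightness_score

def pick_best_frame(frame_scores):
    """Return the index of the frame whose quality score is higher
    than those of the other frames."""
    return max(range(len(frame_scores)), key=lambda i: frame_scores[i])
```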
In some embodiments, the operation execution module 43 preprocessing the first image for subsequent feature extraction includes: in a case where the first image includes multiple targets to be detected, determining a target region, in the first image, of a target to be detected that meets a preset extraction requirement, and removing the image portion of the first image other than the target region; and/or detecting that the inclination angle of the target to be detected in the first image is greater than a preset angle, and rotating the first image until the inclination angle of the target to be detected is smaller than the preset angle.
In the above solution, when there are multiple targets to be detected in the first image, only the target to be detected that meets the preset extraction requirement is kept and the targets that do not meet it are discarded, reducing their influence on the identification result; secondly, when the inclination angle of the target to be detected in the first image exceeds the preset angle, the image is straightened, reducing the influence caused by the tilt of the target to be detected.
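A simplified sketch of the two preprocessing steps (representing detections as `(x, y, w, h)` boxes and using a 15° preset angle are illustrative assumptions; real image cropping and rotation are omitted):

```python
def preprocess(detections, tilt_angle, max_tilt=15.0):
    """Apply the two steps described above: keep only the target
    region with the largest area when several targets are present,
    and compute a corrective rotation when the tilt exceeds the
    preset angle."""
    # Step 1: select the detection whose area is larger than the others'.
    target = max(detections, key=lambda b: b[2] * b[3])
    # Step 2: rotate only when the tilt exceeds the preset angle.
    rotation = -tilt_angle if abs(tilt_angle) > max_tilt else 0.0
    return target, rotation
```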
In some embodiments, the preset extraction requirement includes that the area of the region corresponding to the target to be detected is larger than the areas of the regions corresponding to other targets to be detected, the other targets to be detected including targets other than the target to be detected.
In the above solution, because the larger the area of the target to be detected, the more accurate the extracted features, selecting the target to be detected with the larger area makes the detection result more accurate.
In some embodiments, the target to be detected includes a human face, and the preset object includes a mask.
In the above solution, whether the face is wearing a mask is determined and a corresponding operation is performed; for example, if the face is not wearing a mask, or is wearing it improperly, a corresponding reminder can be issued so that the user can adjust in time, and if the face is wearing a mask, face recognition or other operations are performed.
In the above solution, the first image containing the target to be detected is detected to determine whether the target to be detected is occluded, and then a preset operation matching the detection result is performed; this makes it possible to determine whether the target is occluded and to carry out the subsequent preset operation matching the detection result, realizing flexible processing based on the occlusion state of the target to be detected in the image.
Please refer to FIG. 5, which is a schematic structural diagram of an embodiment of an electronic device according to an embodiment of the present application. The electronic device 50 includes a memory 51 and a processor 52, and the processor 52 is configured to execute program instructions stored in the memory 51 to implement the steps in any of the above image detection method embodiments. In an implementation scenario, the electronic device 50 may include, but is not limited to, a microcomputer and a server; in addition, the electronic device 50 may also include mobile devices such as a notebook computer and a tablet computer, which are not limited herein.
The processor 52 is configured to control itself and the memory 51 to implement the steps in any of the above image detection method embodiments. The processor 52 may also be referred to as a central processing unit (CPU). The processor 52 may be an integrated circuit chip with signal processing capability. The processor 52 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device, discrete gate or transistor logic device, or discrete hardware component. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. In addition, the processor 52 may be jointly implemented by multiple integrated circuit chips.
In the above solution, the first image containing the target to be detected is detected to determine whether the target to be detected is occluded, and then a preset operation matching the detection result is performed; this makes it possible to determine whether the target is occluded and to carry out the subsequent preset operation matching the detection result, realizing flexible processing based on the occlusion state of the target to be detected in the image.
Please refer to FIG. 6, which is a schematic structural diagram of an embodiment of a computer-readable storage medium according to an embodiment of the present application. The computer-readable storage medium 60 stores program instructions 61 executable by a processor, and the program instructions 61 are used to implement the steps in any of the above image detection method embodiments.
In the above solution, the first image containing the target to be detected is detected to determine whether the target to be detected is occluded, and then a preset operation matching the detection result is performed; this makes it possible to determine whether the target is occluded and to carry out the subsequent preset operation matching the detection result, realizing flexible processing based on the occlusion state of the target to be detected in the image.
In some embodiments, the embodiments of the present application provide a computer program including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the above method.
In some embodiments, the functions or modules of the apparatus provided in the embodiments of the present application may be used to execute the methods described in the above method embodiments; for their implementation, reference may be made to the descriptions of the above method embodiments, which are not repeated here for brevity.
The above descriptions of the various embodiments tend to emphasize the differences between them; for their common or similar parts, the embodiments may be referred to one another, and details are not repeated for brevity.
In the several embodiments provided in this application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus implementations described above are merely illustrative; the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation: units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods in the various implementations of the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Industrial Applicability
Embodiments of the present application provide an image detection method and a related apparatus, device, storage medium, and computer program. The method includes: acquiring a first image containing a target to be detected; detecting the first image to obtain a detection result of the first image, wherein the detection result includes whether the target to be detected in the first image is occluded by a preset object; and performing a preset operation matching the detection result. According to the image detection method provided by the embodiments of the present application, it is possible to determine whether the target to be detected is occluded and then carry out the subsequent preset operation matching the detection result, realizing flexible processing based on the occlusion state of the target to be detected in the image.
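The overall flow summarized above, detecting the occlusion state and then performing the matching preset operation, can be sketched as a small dispatcher (the result labels and `actions` mapping are illustrative assumptions):

```python
def handle_frame(image, detect_occlusion, actions):
    """Top-level flow: detect whether the target in the first image
    is occluded by the preset object, then perform the preset
    operation matched to that detection result. `actions` maps each
    possible result label to a callable."""
    result = detect_occlusion(image)  # e.g. "occluded" or "not_occluded"
    return actions[result](image)
```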

Claims (20)

  1. An image detection method, comprising:
    acquiring a first image containing a target to be detected;
    detecting the first image to obtain a detection result of the first image, wherein the detection result comprises whether the target to be detected in the first image is occluded by a preset object; and
    performing a preset operation matching the detection result.
  2. The method according to claim 1, wherein the detection result is obtained by detecting the first image with a neural network.
  3. The method according to claim 1 or 2, wherein performing the preset operation matching the detection result comprises:
    issuing a first reminder in a case where the target to be detected is not occluded by the preset object, wherein the first reminder is used to prompt that the preset object be used to occlude the target to be detected.
  4. The method according to any one of claims 1 to 3, wherein the detection result comprises whether a manner in which the target to be detected is occluded by the preset object is a preset occlusion manner, and performing the preset operation matching the detection result comprises:
    issuing a second reminder in a case where the target to be detected is occluded by the preset object and the occlusion manner does not belong to the preset occlusion manner, wherein the second reminder is used to prompt adjustment of the manner in which the preset object occludes the target.
  5. The method according to any one of claims 1 to 4, wherein performing the preset operation matching the detection result comprises:
    in a case where the target to be detected is occluded by the preset object, extracting, from the first image, at least a first feature of an unoccluded portion of the target to be detected as a feature to be identified of the target to be detected; and
    identifying the target to be detected by using the feature to be identified, and obtaining an identification result.
  6. The method according to claim 5, wherein extracting, from the first image, at least the first feature of the unoccluded portion of the target to be detected as the feature to be identified of the target to be detected comprises:
    extracting the first feature of the unoccluded portion of the target to be detected from the first image, and acquiring a second feature of an occluded portion of the target to be detected; and
    taking the first feature and the second feature as the feature to be identified of the target to be detected.
  7. The method according to claim 6, wherein acquiring the second feature of the occluded portion of the target to be detected comprises:
    extracting a feature of the occluded portion from the first image as the second feature; or
    acquiring a preset feature of the occluded portion as the second feature, wherein the preset feature comprises a feature obtained based on at least one reference feature, and each reference feature is extracted from a region, corresponding to the occluded portion, of a reference target in which the occluded portion is absent.
  8. The method according to any one of claims 5 to 7, wherein identifying the target to be detected by using the feature to be identified and obtaining the identification result comprises at least one of the following:
    in a case where there is one preset target, acquiring a first similarity between the feature to be identified and a pre-stored feature of the preset target, and in a case where the first similarity satisfies a first preset condition, determining that the identification result comprises that the target to be detected passes identity authentication; and
    in a case where there are multiple preset targets, respectively acquiring a second similarity between the feature to be identified and the pre-stored feature of each preset target, and determining that the identification result comprises determining the identity of the target to be detected as the identity of the preset target corresponding to a second similarity satisfying a second preset condition.
  9. The method according to claim 8, wherein the method comprises at least one of the following:
    the first preset condition comprises that the first similarity is greater than a first similarity threshold; and
    the second preset condition comprises that the second similarity is greater than a second similarity threshold.
  10. The method according to claim 9, wherein the method comprises at least one of the following:
    the first similarity threshold in a case where the feature to be identified comprises the second feature of the occluded portion of the target to be detected is smaller than the first similarity threshold in a case where the feature to be identified does not comprise the second feature; and
    the second similarity threshold in a case where the feature to be identified comprises the second feature is smaller than the second similarity threshold in a case where the feature to be identified does not comprise the second feature.
  11. The method according to any one of claims 8 to 10, wherein before acquiring the first similarity between the feature to be identified and the pre-stored feature of the preset target, the method further comprises:
    registering an account for a user in response to an account registration request;
    determining, from at least one frame of second image captured of the user, a second image that meets a preset quality requirement, and extracting a feature of a preset part of the user from the determined second image; and
    associating the feature of the preset part with the account, and saving the feature of the preset part as the pre-stored feature of the preset target.
  12. The method according to any one of claims 5 to 11, wherein in a case where the target to be detected is occluded by the preset object, before extracting, from the first image, at least the first feature of the unoccluded portion of the target to be detected, the method further comprises at least one of the following steps:
    determining, from multiple frames of first images containing the target to be detected, a first image that meets a preset quality requirement as the first image for subsequent feature extraction;
    preprocessing the first image for subsequent feature extraction; and
    performing liveness detection on the first image for subsequent feature extraction, and in a case where the liveness detection result is that the target to be detected is a living body, determining to perform the step of extracting, from the first image, at least the first feature of the unoccluded portion of the target to be detected, together with its subsequent steps.
  13. The method according to claim 12, wherein determining, from the multiple frames of first images containing the target to be detected, the first image that meets the preset quality requirement as the first image for subsequent feature extraction comprises:
    obtaining a quality score of each frame of first image based on a quality factor of that frame, wherein the quality factor of the first image comprises at least one of the following: pose information of the target to be detected relative to a photographing device, parameter information reflecting the size of the target to be detected in the first image, and brightness information of the first image; and
    determining, based on the quality scores, the first image that meets the preset quality requirement as the first image for subsequent feature extraction, wherein the quality score of the selected first image is higher than the quality scores of the other first images.
  14. The method according to claim 12 or 13, wherein preprocessing the first image for subsequent feature extraction comprises:
    in a case where the first image includes multiple targets to be detected, determining a target region, in the first image, of a target to be detected that meets a preset extraction requirement, and removing the image portion of the first image other than the target region; and/or
    detecting that the inclination angle of the target to be detected in the first image is greater than a preset angle, and rotating the first image until the inclination angle of the target to be detected is smaller than or equal to the preset angle.
  15. The method according to claim 14, wherein the preset extraction requirement comprises that the area of the region corresponding to the target to be detected is larger than the areas of the regions corresponding to other targets to be detected, the other targets to be detected comprising targets other than the target to be detected.
  16. The method according to claim 10, wherein the target to be detected comprises a human face, and the preset object comprises a mask.
  17. An image detection apparatus, comprising:
    an image acquisition module, configured to acquire a first image containing a target to be detected;
    a target detection module, configured to detect the first image to obtain a detection result of the first image, wherein the detection result comprises whether the target to be detected in the first image is occluded by a preset object; and
    an operation execution module, configured to perform a preset operation matching the detection result.
  18. An electronic device, comprising a memory and a processor, wherein the processor is configured to execute program instructions stored in the memory to implement the method according to any one of claims 1 to 16.
  19. A computer-readable storage medium having program instructions stored thereon, wherein the program instructions, when executed by a processor, implement the method according to any one of claims 1 to 16.
  20. A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the method according to any one of claims 1 to 16.
PCT/CN2021/088718 2020-09-22 2021-04-21 Image detection method and related apparatus, device, storage medium, and computer program WO2022062379A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021564951A JP2022552754A (en) 2020-09-22 2021-04-21 IMAGE DETECTION METHOD AND RELATED DEVICE, DEVICE, STORAGE MEDIUM, AND COMPUTER PROGRAM
KR1020217035770A KR20220042301A (en) 2020-09-22 2021-04-21 Image detection method and related devices, devices, storage media, computer programs

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011002322.1A CN112115886A (en) 2020-09-22 2020-09-22 Image detection method and related device, equipment and storage medium
CN202011002322.1 2020-09-22

Publications (1)

Publication Number Publication Date
WO2022062379A1 true WO2022062379A1 (en) 2022-03-31

Family

ID=73801500

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/088718 WO2022062379A1 (en) 2020-09-22 2021-04-21 Image detection method and related apparatus, device, storage medium, and computer program

Country Status (4)

Country Link
JP (1) JP2022552754A (en)
KR (1) KR20220042301A (en)
CN (1) CN112115886A (en)
WO (1) WO2022062379A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114955772A (en) * 2022-05-30 2022-08-30 阿里云计算有限公司 Processing method and device for electric vehicle

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115886A (en) * 2020-09-22 2020-12-22 北京市商汤科技开发有限公司 Image detection method and related device, equipment and storage medium
CN113158732A (en) * 2020-12-31 2021-07-23 深圳市商汤科技有限公司 Image processing method and related device
CN113065394B (en) * 2021-02-26 2022-12-06 青岛海尔科技有限公司 Method for image recognition of article, electronic device and storage medium
CN113449696B (en) * 2021-08-27 2021-12-07 北京市商汤科技开发有限公司 Attitude estimation method and device, computer equipment and storage medium
CN113792662B (en) * 2021-09-15 2024-05-21 北京市商汤科技开发有限公司 Image detection method, device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095829A (en) * 2014-04-29 2015-11-25 华为技术有限公司 Face recognition method and system
CN111444862A (en) * 2020-03-30 2020-07-24 深圳信可通讯技术有限公司 Face recognition method and device
CN111597910A (en) * 2020-04-22 2020-08-28 深圳英飞拓智能技术有限公司 Face recognition method, face recognition device, terminal equipment and medium
CN112115886A (en) * 2020-09-22 2020-12-22 北京市商汤科技开发有限公司 Image detection method and related device, equipment and storage medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007148872A (en) * 2005-11-29 2007-06-14 Mitsubishi Electric Corp Image authentication apparatus
JP4957056B2 (en) * 2006-04-11 2012-06-20 パナソニック株式会社 Face authentication system and face authentication method
JP5480532B2 (en) * 2009-04-30 2014-04-23 グローリー株式会社 Image processing apparatus, image processing method, and program for causing computer to execute the method
US9177130B2 (en) * 2012-03-15 2015-11-03 Google Inc. Facial feature detection
JP5871764B2 (en) * 2012-09-28 2016-03-01 セコム株式会社 Face recognition device
US20140282269A1 (en) * 2013-03-13 2014-09-18 Amazon Technologies, Inc. Non-occluded display for hover interactions
JP2017224186A (en) * 2016-06-16 2017-12-21 株式会社 日立産業制御ソリューションズ Security system
JP7015216B2 (en) * 2018-06-25 2022-02-02 株式会社日立製作所 Biometric program, biometric method
JP2020052788A (en) * 2018-09-27 2020-04-02 キヤノン株式会社 Image processing system and method therefor, and program
CN110110681A (en) * 2019-05-14 2019-08-09 哈尔滨理工大学 It is a kind of for there is the face identification method blocked
CN110826519B (en) * 2019-11-14 2023-08-18 深圳华付技术股份有限公司 Face shielding detection method and device, computer equipment and storage medium
JP6852779B2 (en) * 2019-12-12 2021-03-31 日本電気株式会社 Image recognition device, image recognition method, and image recognition program
CN111598065A (en) * 2020-07-24 2020-08-28 上海肇观电子科技有限公司 Depth image acquisition method, living body identification method, apparatus, circuit, and medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114955772A (en) * 2022-05-30 2022-08-30 阿里云计算有限公司 Processing method and device for electric vehicle

Also Published As

Publication number Publication date
JP2022552754A (en) 2022-12-20
KR20220042301A (en) 2022-04-05
CN112115886A (en) 2020-12-22

Similar Documents

Publication Publication Date Title
WO2022062379A1 (en) Image detection method and related apparatus, device, storage medium, and computer program
US8705813B2 (en) Identification device, identification method, and storage medium
CN105612533B (en) Living body detection method, living body detection system, and computer program product
WO2021036436A1 (en) Facial recognition method and apparatus
CN109858371A (en) Face recognition method and device
US8422746B2 (en) Face authentication system and authentication method thereof
TWI439951B (en) Facial gender identification system and method, and computer program products thereof
CN109858439A (en) A face-based liveness detection method and device
CN113366487A (en) Operation determination method and device based on expression group and electronic equipment
CN105868689A (en) Cascaded convolutional neural network based human face occlusion detection method
CN107169458B (en) Data processing method, device and storage medium
Wang et al. InSight: recognizing humans without face recognition
WO2018192448A1 (en) People-credentials comparison authentication method, system and camera
CN107844742B (en) Method, device and storage medium for removing glasses from facial images
CN108108711B (en) Face control method, electronic device and storage medium
US11227171B2 (en) Detection system, detection device and method therefor
US11074469B2 (en) Methods and systems for detecting user liveness
CN107609515B (en) Double-verification face comparison system and method based on Feiteng platform
CN110612530A (en) Method for selecting a frame for use in face processing
CN107992845A (en) A face recognition method and device, and computer equipment
US20210182584A1 (en) Methods and systems for displaying a visual aid and enhancing user liveness detection
WO2021082548A1 (en) Living body testing method and apparatus, server and facial recognition device
JP2012212968A (en) Image monitoring apparatus
Zhang et al. Detection of Face Wearing Mask Based on AdaBoost and YCrCb
Schneider et al. Feature based face localization and recognition on mobile devices

Legal Events

Date Code Title Description
ENP Entry into the national phase
  Ref document number: 2021564951; Country of ref document: JP; Kind code of ref document: A
121 Ep: the epo has been informed by wipo that ep was designated in this application
  Ref document number: 21870768; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
  Ref country code: DE
122 Ep: pct application non-entry in european phase
  Ref document number: 21870768; Country of ref document: EP; Kind code of ref document: A1