WO2022111044A1 - Image processing method and apparatus, and terminal control method and apparatus - Google Patents

Image processing method and apparatus, and terminal control method and apparatus Download PDF

Info

Publication number
WO2022111044A1
WO2022111044A1 (PCT/CN2021/121457)
Authority
WO
WIPO (PCT)
Prior art keywords
object detection
detection frame
translated
face
target
Prior art date
Application number
PCT/CN2021/121457
Other languages
French (fr)
Chinese (zh)
Inventor
黄耿石
滕家宁
邵婧
Original Assignee
Shenzhen SenseTime Technology Co., Ltd. (深圳市商汤科技有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen SenseTime Technology Co., Ltd. (深圳市商汤科技有限公司)
Publication of WO2022111044A1 publication Critical patent/WO2022111044A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/166Detection; Localisation; Normalisation using acquisition arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular, to an image processing method and device, and a terminal control method and device.
  • live body detection can be performed based on a set of images collected by a binocular camera.
  • liveness detection on a set of images collected by each module can be implemented by using a liveness detection model.
  • the embodiments of the present disclosure provide at least an image processing method and apparatus, and a terminal control method and apparatus.
  • an embodiment of the present disclosure provides an image processing method, including: acquiring two images to be detected, obtained by each camera of a binocular camera shooting a target object; performing target object detection on the two images to be detected respectively, to obtain the object detection frame of the target object in each image to be detected; performing expansion processing on each object detection frame and translating at least one expanded object detection frame, to obtain a translated object detection frame; and determining the recognition result of the target object based at least on the translated object detection frame.
  • the target object detection can be performed on the images to be detected to obtain the object detection frame of the target object in each image to be detected.
  • after the two object detection frames are expanded, one or both of the expanded object detection frames can be translated, so that the recognition result of the target object can be determined based on the processed object detection frames.
  • the above image processing method can focus on the target object based on target object detection, which initially reduces the influence of the baseline. Considering that recognizing the target object from the two images to be detected collected by the binocular camera requires depth information about the target object, which is determined from the parallax formed by the module's baseline, the embodiments of the present disclosure further achieve the effect of simulating the parallax of the human eye through the cooperative operations of expanding and translating the object detection frame, obtaining processed object detection frames for identifying the target object.
  • the embodiments of the present disclosure are highly versatile, and can achieve the purpose of different modules sharing a set of pseudo-baselines to simulate human eye parallax, thereby improving the generalization capability of the modules and reducing the time cost of subsequent applications such as target object recognition.
  • the binocular camera includes a first camera and a second camera; in the case of translating one expanded object detection frame, translating the at least one expanded object detection frame to obtain the translated object detection frame includes: expanding the object detection frame detected in the image to be detected collected by the first camera, to obtain a detection frame to be translated; and translating the detection frame to be translated in a direction away from the second camera, to obtain the translated object detection frame.
  • the translation direction of the detection frame to be translated can be determined based on the relative positional relationship between the two cameras included in the binocular camera.
  • the detection frame to be translated corresponds to the first camera.
  • the to-be-translated detection frame may be translated in a direction away from the second camera, and the to-be-translated detection frame moved based on this translation direction may meet the parallax requirement of the pseudo baseline.
  • the step of translating the detection frame to be translated in a direction away from the second camera to obtain the translated object detection frame includes: determining a translation distance based on size information of the detection frame to be translated; and moving the detection frame to be translated by the translation distance in the direction away from the second camera, to obtain the translated object detection frame.
  • determining the translation distance based on the size information of the detection frame to be translated includes: determining the translation distance based on the width value in the size information of the detection frame to be translated and a preset translation coefficient.
  • the size information of the object detection frame after imaging can be used to determine the translation distance for simulating parallax, and the pseudo-baseline constructed from this translation distance then achieves the recognition effect of binocular parallax.
  • translating the at least one expanded object detection frame to obtain the translated object detection frame includes: translating each of the two expanded object detection frames in a direction away from the other object detection frame, to obtain two translated object detection frames.
  • performing expansion processing on each object detection frame includes: for each object detection frame, determining, in the image to be detected corresponding to the object detection frame, the position coordinates of the corner points of the object detection frame; and performing the expansion processing on the object detection frame based on the determined corner position coordinates and a preset expansion ratio, to obtain the expanded object detection frame corresponding to that object detection frame.
  • the object detection frame can be expanded based on the corner position coordinates of the object detection frame and the preset expansion ratio.
  • the obtained object detection frame can not only help construct a pseudo-baseline, but also improve the accuracy of subsequent result recognition.
  • determining the recognition result of the target object based at least on the translated object detection frame includes: in the case where one expanded object detection frame is translated, determining the recognition result of the target object based on the one translated object detection frame and one expanded but untranslated object detection frame; or, in the case where both expanded object detection frames are translated, determining the recognition result of the target object based on the two translated object detection frames.
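  • The expand-then-translate flow summarized above can be sketched in code. This is only an illustration of the claimed steps, not the patent's implementation: the (x1, y1, x2, y2) box format, the expansion ratio, and the translation coefficient are all assumed values.

```python
def expand_box(box, ratio):
    """Expand a box (x1, y1, x2, y2) outward by `ratio` of its size on each side."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 - w * ratio, y1 - h * ratio, x2 + w * ratio, y2 + h * ratio)

def translate_box(box, distance):
    """Shift a box horizontally; a positive distance moves it right."""
    x1, y1, x2, y2 = box
    return (x1 + distance, y1, x2 + distance, y2)

def process_pair(box_left, box_right, expand_ratio=0.4, shift_coeff=0.1):
    """Expand both detection boxes, then shift the right-camera box away
    from the left camera (rightward), as in the single-frame variant."""
    left = expand_box(box_left, expand_ratio)
    right = expand_box(box_right, expand_ratio)
    shift = (right[2] - right[0]) * shift_coeff  # distance = width * coefficient
    return left, translate_box(right, shift)
```

The processed pair would then be handed to whatever recognition model the module uses.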
  • an embodiment of the present disclosure further provides a terminal control method, where the terminal is provided with a binocular camera, and the method includes: acquiring a set of face images of a target face shot by the binocular camera, the set of face images including a first face image captured by a first camera of the binocular camera and a second face image captured by a second camera of the binocular camera; obtaining, through the image processing method of the first aspect and its embodiments, a recognition result corresponding to the set of face images, where the recognition result includes whether the target face is a real face; and, in response to the recognition result including that the target face is a real face and the person corresponding to the target face having passed identity authentication, controlling the terminal to perform a specified operation.
  • an embodiment of the present disclosure further provides an image processing apparatus, including: an acquisition module, configured to acquire two images to be detected, obtained by each camera of a binocular camera shooting a target object; a detection module, configured to perform target object detection on the two images to be detected respectively, to obtain an object detection frame of the target object in each image to be detected; an expansion module, configured to perform expansion processing on each object detection frame and translate at least one expanded object detection frame, to obtain a translated object detection frame; and a determining module, configured to determine a recognition result of the target object based at least on the translated object detection frame.
  • an embodiment of the present disclosure further provides a terminal control device, including: an acquisition module, configured to acquire a set of face images of a target face captured by the binocular camera, the set of face images including a first face image captured by the first camera of the binocular camera and a second face image captured by the second camera of the binocular camera; a determination module, configured to obtain, through the image processing method according to any one of the first aspect and its embodiments, a recognition result corresponding to the set of face images, where the recognition result includes whether the target face is a real face; and a control module, configured to control the terminal to perform a specified operation in response to the recognition result including that the target face is a real face and the person corresponding to the target face having passed identity authentication.
  • embodiments of the present disclosure further provide an electronic device, including a processor, a memory, and a bus, where the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate through the bus, and when the machine-readable instructions are executed by the processor, they perform the steps of the image processing method according to any one of the first aspect and its embodiments, or the steps of the terminal control method according to the second aspect.
  • an embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by an electronic device, the electronic device executes the steps of the image processing method according to any one of the first aspect and its embodiments, or the steps of the terminal control method according to the second aspect.
  • an embodiment of the present disclosure further provides a computer program, including computer-readable code, which, when run in an electronic device, causes a processor in the electronic device to execute the steps of the method according to the first aspect and its embodiments.
  • FIG. 1 shows a flowchart of an image processing method provided by Embodiment 1 of the present disclosure
  • FIG. 2( a ) shows a schematic diagram of an application of an image processing method provided by Embodiment 1 of the present disclosure
  • FIG. 2(b) shows a schematic diagram of the application of an image processing method provided by Embodiment 1 of the present disclosure
  • Fig. 2(c) shows a schematic diagram of the application of an image processing method provided by Embodiment 1 of the present disclosure
  • FIG. 2(d) shows an application schematic diagram of an image processing method provided by Embodiment 1 of the present disclosure
  • FIG. 3 shows a flowchart of a terminal control method provided by Embodiment 1 of the present disclosure
  • FIG. 4 shows a schematic diagram of an image processing apparatus provided by Embodiment 2 of the present disclosure
  • FIG. 5 shows a schematic diagram of a terminal control apparatus provided by Embodiment 2 of the present disclosure
  • FIG. 6 shows a schematic diagram of an electronic device according to Embodiment 3 of the present disclosure
  • FIG. 7 shows a schematic diagram of another electronic device provided by Embodiment 3 of the present disclosure.
  • liveness detection on a set of images collected by each module can often be performed using a liveness detection model. For example, a picture of the object obtained by the left-eye camera of the binocular camera and another picture of the object obtained by the right-eye camera are input into the liveness detection model at the same time, so as to finally obtain a result indicating whether the object is a living body.
  • the present disclosure provides an image processing method and device, and a terminal control method and device, so as to improve the generalization capability of the module and reduce the time cost of subsequent applications such as target object recognition.
  • the execution body of the image processing method provided by the embodiments of the present disclosure is generally an electronic device with certain computing capability. Such devices include, for example, terminal devices, servers, or other processing devices; the terminal devices may be user equipment (UE), mobile devices, user terminals, cellular phones, cordless phones, personal digital assistants (PDA), handheld devices, computing devices, in-vehicle devices, wearable devices, and the like.
  • the image processing method may be implemented by the processor calling computer-readable instructions stored in the memory.
  • FIG. 1 is a flowchart of an image processing method provided by an embodiment of the present disclosure, the method includes steps S101-S104, wherein:
  • the binocular camera may include an RGB (Red Green Blue) camera and a near-infrared camera, that is, the two images to be detected may be an RGB image and an infrared image, respectively.
  • alternatively, the binocular camera may include two near-infrared cameras, in which case the two images to be detected are both infrared images.
  • the binocular camera may also include two RGB cameras, in which case the two images to be detected are both RGB images. The present application does not limit the specific structure of the binocular camera.
  • the above image processing method can be mainly applied to related applications of target recognition based on binocular cameras. For example, it can be used to perform liveness detection on faces captured by binocular cameras, and it can also be used to perform license plate recognition on vehicles captured by binocular cameras. It may also be other related applications, which are not specifically limited here.
  • two images may be acquired based on the binocular camera shooting the target object in the same scene.
  • a disparity map is obtained by using a stereo matching algorithm, and a depth map is then obtained to realize target recognition.
  • because the relative distance between the two cameras (the baseline) differs across modules, even if the same recognition method is used on the same target, the recognition results may differ due to the baseline. This is especially true when a target detection model is used for recognition: since training the model requires a large number of image samples, if the baseline corresponding to the image samples differs from the baseline of the module on which the target detection model is deployed, the accuracy of the model will be greatly reduced.
  • although different object detection models could be trained for different modules to ensure model accuracy, this approach would greatly increase the training cost.
  • the embodiments of the present disclosure provide an image processing method that can provide a general pseudo-baseline for different modules, and then can perform target recognition based on images collected by different modules.
  • target object detection can reduce the influence of the baseline of the current module; then, to facilitate subsequent target recognition, a pseudo-baseline can be constructed through the cooperation of expansion and translation, so that target recognition is achieved on a pseudo-baseline that eliminates the influence of the original baseline.
  • the two to-be-detected images collected by the binocular camera in the embodiment of the present disclosure may be determined based on the application scenario where the binocular camera is located.
  • for example, in a face recognition application, the two images to be detected may be images containing a human face; for another example, in an intelligent transportation application, they may be images containing vehicles.
  • performing target object detection on the images to be detected obtained by the two cameras yields object detection frames that eliminate the influence of the original baseline caused by the parallax.
  • the object detection frame where the target object is located can be detected from the image to be detected based on the traditional target object detection method.
  • the target object detection method here may be a frame difference method, a background subtraction method, an optical flow method, and the like.
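  • Of the traditional methods listed, the frame difference method is the simplest to illustrate; the following is a minimal sketch, where the nested-list grayscale frames and the threshold value are assumptions for illustration only:

```python
def frame_difference(prev_frame, curr_frame, threshold=25):
    """Return a binary mask marking pixels whose grayscale intensity
    changed by more than `threshold` between two consecutive frames
    (frames given as nested lists of equal shape)."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
        for prow, crow in zip(prev_frame, curr_frame)
    ]
```

A bounding box around the connected region of changed pixels would then serve as the object detection frame.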
  • in addition to using the above-mentioned traditional methods, object detection can also be performed based on a trained detection model.
  • the detection model here can be obtained by training on image samples annotated with object detection frames, where the training learns the correspondence between an input image sample and the output object detection frame.
  • the object detection frame in the image to be detected can be determined.
  • the detection model may be a separate neural network, or may be included in the above target detection model.
  • on this basis, the embodiments of the present disclosure can create a common pseudo-baseline through the translation operation.
  • the target object selected by the object detection frame is a complete object, such as a target face containing the facial features, hair, and neck. If the object detection frame were translated directly, the target face might become incomplete, which is not conducive to subsequent object recognition. Based on this, in the embodiments of the present disclosure, the expansion operation may be performed before the translation operation.
  • the above expansion operation enlarges the image area framed by the object detection frame to a certain extent. Considering that a larger image area contains more information, this can on the one hand improve the accuracy of subsequent target recognition, and on the other hand provide room for the subsequent translation operation.
  • the translation operation in this embodiment of the present disclosure may be performed on one of the two expanded object detection frames, in which case the other camera is used as the reference for translation; the translation operation may also be performed on both object detection frames, in which case the center position of the two cameras may be used as the reference for translation.
  • the recognition result of the target object may be determined based on one translated object detection frame and one object detection frame that has not been translated, or based on two translated object detection frames.
  • the trained target recognition model can be used to perform target recognition on the object detection frame after translation processing, so as to determine the recognition result of the target object.
  • the target recognition model in the embodiments of the present disclosure may be a liveness detection model related to face recognition. Inputting the above one translated object detection frame and one untranslated object detection frame into the trained liveness detection model can determine whether the target face corresponding to the object detection frames is a real face; alternatively, inputting the above two translated object detection frames into the trained liveness detection model can likewise determine whether the target face is a real face.
  • the above-mentioned target recognition model can also be a vehicle detection model related to vehicle recognition.
  • inputting the above one translated object detection frame and one untranslated object detection frame into the trained vehicle detection model can determine the type information of the target vehicle corresponding to the object detection frames; alternatively, inputting the above two translated object detection frames into the trained vehicle detection model can likewise determine the type information of the target vehicle.
  • for example, two images to be detected are collected by a binocular camera set up for face liveness detection: the right image is the face image collected by the right-eye camera (such as a near-infrared camera) of the binocular camera. After target face detection is performed on the two face images, the object detection frame of the target face can be generated in each of the two images, as shown in Figure 2(b).
  • the embodiment of the present disclosure may then perform expansion processing on the object detection frames, as shown in FIG. 2(c).
  • the object detection frame in the right image of Figure 2(c) can be translated to obtain the translated object detection frame, as shown in the right image of Figure 2(d); the left image of Figure 2(d) is the same as the left image of Figure 2(c), containing the expanded but untranslated object detection frame.
  • the two images shown in FIG. 2(d) can be cropped based on the object detection frames to obtain two corresponding face images.
  • it can be determined whether the target face is a real face.
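  • The cropping step just described can be sketched as follows; the helper and its (x1, y1, x2, y2) box format are hypothetical, and a real pipeline would also resize each crop to the liveness model's input size:

```python
def crop_to_box(image, box):
    """Crop a grayscale image (nested lists) to a box (x1, y1, x2, y2),
    clipping to the image bounds so an expanded or translated box never
    indexes outside the image."""
    h, w = len(image), len(image[0])
    x1, y1, x2, y2 = box
    x1, y1 = max(0, int(x1)), max(0, int(y1))
    x2, y2 = min(w, int(x2)), min(h, int(y2))
    return [row[x1:x2] for row in image[y1:y2]]
```

The two crops (one per camera) would then be fed together to the liveness detection model.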
  • the image on the right in Figure 2(d) is an image collected by a near-infrared camera
  • the embodiments of the present disclosure may, as in the above example, perform translation processing on the image collected by the right-eye camera, or instead on the image collected by the left-eye camera; the choice can be made based on the application scenario and is not specifically limited here.
  • the expansion processing may be performed on the object detection frame of each to-be-detected image by the following operations:
  • the position coordinates of the corner points of the object detection frame are determined; the object detection frame is expanded based on the determined position coordinates of the corner points and the preset expansion ratio to obtain the expanded object detection frame.
  • the position coordinates of the corner points of the object detection frame in the image to be detected can be determined, and the position coordinates of the corner points can correspond to the image coordinates of the four corners of the object detection frame.
  • the object detection frame may be expanded based on the position coordinates of the corner points and the preset expansion ratio to obtain the processed object detection frame.
  • the above-mentioned preset expansion ratio may be determined based on the actual size of the target object, mainly considering that different target objects have different actual sizes and thus different imaging sizes. In the same scene, when the binocular camera captures a face and a vehicle at the same position, the vehicle detection frame corresponding to the vehicle is much larger than the face detection frame corresponding to the face.
  • accordingly, a larger expansion coefficient can be set for a target object with a larger size, so as to cover the surrounding-area information of a target object with a larger imaging size, and a smaller expansion coefficient can be set for a target object with a smaller size, sufficient to cover the surrounding-area information of a target object with a smaller imaging size.
  • the expansion processing can also be realized by dragging the four corner points of the object detection frame outward. Alternatively, the lengths of the four borders (the upper, lower, left, and right borders) can be determined based on the position coordinates of the four corner points, and each border can then be expanded outward according to a preset expansion ratio set for that border; in a specific application, this can be achieved by dragging the borders outward.
  • for example, the left, right, upper, and lower borders can be expanded by 0.4, 0.4, 0.8, and 0.4 times respectively, so that the dimensions of all four borders of the expanded face object frame change.
  • the upper border is expanded by a larger multiple mainly because the eyes, which the upper border adjoins, lie in the upper part of the facial features detected by the face detection frame; therefore the upper border can appropriately be expanded by a larger multiple.
  • in addition, the expansion processing may be performed on one of the four borders alone, or on a pair of borders (e.g., the left and right borders) at the same time, or in other expansion manners, which are not repeated here.
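  • A per-border expansion following the 0.4/0.4/0.8/0.4 example above might look like this; the (x1, y1, x2, y2) box format and the coordinate convention (y increasing downward) are assumptions for illustration:

```python
def expand_box_per_edge(box, left=0.4, right=0.4, top=0.8, bottom=0.4):
    """Expand each border of a box (x1, y1, x2, y2) outward by a per-border
    fraction of the box's width/height; the larger `top` ratio mirrors the
    example of expanding the upper border by a bigger multiple."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    # y increases downward, so the upper border moves in the -y direction.
    return (x1 - w * left, y1 - h * top, x2 + w * right, y2 + h * bottom)
```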
  • the binocular camera includes a first camera and a second camera.
  • the translation processing may be performed by the following operations:
  • a detection frame to be translated can be selected from the two expanded object detection frames.
  • for example, if the object detection frame corresponding to the right-eye camera (the right detection frame) is selected as the detection frame to be translated, its translation direction can be determined from the relative positional relationship between the right-eye and left-eye cameras: since the right-eye camera is on the right, the right detection frame is translated to the right (i.e., away from the left-eye camera). If the object detection frame corresponding to the left-eye camera (the left detection frame) is selected instead, its translation direction is to the left.
  • in addition to translating in the above-mentioned translation direction, the translation may also be performed in combination with a translation distance.
  • the translation distance may be determined based on the size information of the detection frame to be translated.
  • when the size information of the detection frame to be translated is large, the target object is relatively close to the camera, and a closer target object has a larger parallax, so the required translation distance is large. When the size information of the detection frame to be translated is small, the target object is relatively far from the camera, its parallax is small, and so the required translation distance is small.
  • the translation distance can be determined based on a preset translation coefficient and the width value in the size information of the detection frame; that is, the translation distance is the product of the preset translation coefficient and the width value.
  • the preset translation coefficient is related to the distance between the two cameras in the binocular camera.
  • the width value of the detection frame corresponds to the size of the horizontal side of the detection frame.
  • the translation distance is thus proportional to the width: a detection frame with a larger width yields a larger translation distance, to balance the larger parallax of a large-sized target object, while a detection frame with a smaller width yields a smaller translation distance, to balance the smaller parallax of a small-sized target object. In this way, a common pseudo-baseline suitable for various target objects can be constructed.
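  • The width-proportional translation can be sketched as follows; the coefficient value and the side-naming convention are assumptions for illustration:

```python
def translation_distance(box, coeff=0.1):
    """distance = preset coefficient * box width, so larger (closer) targets
    get the larger shift needed to mimic their larger parallax."""
    x1, _, x2, _ = box
    return (x2 - x1) * coeff

def translate_away(box, other_camera_side, coeff=0.1):
    """Shift the box (x1, y1, x2, y2) away from the other camera: if the
    other camera is to the left, shift right (+); otherwise shift left (-)."""
    d = translation_distance(box, coeff)
    sign = 1 if other_camera_side == "left" else -1
    x1, y1, x2, y2 = box
    return (x1 + sign * d, y1, x2 + sign * d, y2)
```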
  • each of the two expanded object detection frames may be translated in the direction away from the other object detection frame, to obtain the processed object detection frames.
  • The translation directions of the left detection frame and the right detection frame can be determined based on the relative positional relationship between the right-eye camera and the left-eye camera included in the binocular camera.
  • Since the right-eye camera is relatively to the right, it can be determined that the translation direction of the right detection frame is to the right (that is, the direction away from the left detection frame); since the left-eye camera is relatively to the left, it can be determined that the translation direction of the left detection frame is to the left (that is, the direction away from the right detection frame).
  • the translation processing may also be implemented in combination with the translation distance.
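The two-frame case above (left frame moves left, right frame moves right, each away from the other) might be sketched as follows, assuming frames are given as (x1, y1, x2, y2) tuples; the width-proportional distance and its coefficient are assumptions for illustration.

```python
def translate_both(left_box, right_box, coeff=0.1):
    """Move each expanded detection frame away from the other,
    simulating binocular parallax: the left-camera frame shifts
    left, the right-camera frame shifts right."""
    def shift(box, sign):
        x1, y1, x2, y2 = box
        d = sign * coeff * (x2 - x1)  # distance proportional to width
        return (x1 + d, y1, x2 + d, y2)
    return shift(left_box, -1), shift(right_box, +1)

left, right = translate_both((50, 0, 150, 100), (60, 0, 160, 100))
```

The sign convention encodes the camera layout: the frame from the left-eye camera moves in the direction away from the right-eye camera, and vice versa.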
  • the image processing method provided by the embodiment of the present disclosure can overcome the problem of inaccurate recognition results caused by different baselines of different binocular cameras, and has strong robustness, so it can be widely used in various technical fields.
  • The above-mentioned image processing method can be applied to terminal control applications; as shown in FIG. 3, it can be implemented according to the following steps:
  • S301: Acquire a set of face images captured by a binocular camera of a target face, where the set of face images includes a first face image captured by a first camera in the binocular camera and a second face image captured by a second camera in the binocular camera;
  • The terminal can be controlled to perform the specified operation in combination with a successful identity-authentication result for the person corresponding to the target face; otherwise, the terminal can refuse to perform the specified operation and issue an alarm prompt.
  • the terminal control method when the terminal control method is applied to different terminals, the corresponding specified operations may also be different.
  • the terminal here may be a user terminal, a gate device terminal, a payment terminal, or the like.
  • For example, when the target face corresponding to a group of face images captured by the binocular camera of the user terminal is a real face, and the identity of the person corresponding to the target face is legal, the user terminal can be controlled to perform the specified operation.
  • If the target face is recognized as a real face and the identity is legal, the access switch connected to the gate equipment terminal can be controlled to open, thereby realizing automatic passage through the gate; if a non-real face is identified or the identity is illegal, passage is refused.
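The gate-control decision described above reduces to a simple conditional; the function name and return values below are hypothetical, used only to illustrate the two-condition check.

```python
def control_gate(is_real_face, identity_legal):
    """Open the access switch only when the target face is a real face
    AND the person's identity passes authentication; otherwise refuse
    passage and issue an alarm prompt (names are illustrative)."""
    if is_real_face and identity_legal:
        return "open_access_switch"
    return "refuse_and_alarm"

decision = control_gate(is_real_face=True, identity_legal=True)
```

Both conditions must hold: a real face with an illegal identity, or a legal identity presented via a non-real face (e.g. a photo), is refused.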
  • The image processing method provided by the embodiments of the present disclosure can not only be applied to the above unlocking applications and gate verification applications, but can also be applied to other scenarios, for example, pedestrian detection in a video surveillance scenario, or be embedded in financial equipment to perform liveness detection for financial services, which will not be repeated here.
  • The writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • the embodiments of the present disclosure also provide an image processing apparatus corresponding to the image processing method and a terminal control apparatus corresponding to the terminal control method.
  • Since the principle by which the apparatus solves the problem is similar to that of the above-mentioned methods, the implementation of the apparatus may refer to the implementation of the methods, and repeated descriptions are omitted.
  • the apparatus includes: an acquisition module 401, a detection module 402, an external expansion module 403, and a determination module 404; wherein,
  • an acquisition module 401 configured to acquire two images to be detected obtained by shooting the target object by each camera in the binocular camera;
  • a detection module 402 configured to perform target object detection on the two to-be-detected images respectively, to obtain an object detection frame of the target object in each of the to-be-detected images;
  • the expansion module 403 is configured to perform expansion processing on each of the object detection frames, and translate at least one object detection frame after the expansion processing to obtain the translated object detection frame;
  • a determination module 404 configured to determine a recognition result of the target object based on at least the translated object detection frame.
  • The embodiments of the present disclosure can focus on the target object based on target object detection, which can initially reduce the influence of the baseline. Considering that, in the process of performing target object recognition on the two images to be detected collected by the binocular camera, the parallax formed by a module (corresponding to one binocular camera) needs to be referenced to determine the depth information of the target object, the embodiments of the present disclosure further achieve the effect of simulating human-eye parallax through the cooperative processing operations of expanding and translating the object detection frames, obtaining processed object detection frames for identifying the target object.
  • the embodiments of the present disclosure are highly versatile, and can achieve the purpose of sharing a set of pseudo-baselines for different modules to simulate human eye parallax, thereby improving the generalization ability of the modules and reducing the time cost of subsequent applications such as target object recognition.
  • In a possible implementation, the binocular camera includes a first camera and a second camera. In the case of translating one expanded object detection frame, the expansion module 403 is configured to translate the at least one expanded object detection frame according to the following steps: perform expansion processing on the object detection frame detected in the to-be-detected image collected by the first camera to obtain a to-be-translated detection frame; and translate the to-be-translated detection frame in a direction away from the second camera to obtain the translated object detection frame.
  • The expansion module 403 is configured to translate the to-be-translated detection frame in a direction away from the second camera according to the following steps to obtain the translated object detection frame: determine a translation distance based on the size information of the to-be-translated detection frame; and move the to-be-translated detection frame by the translation distance in the direction away from the second camera to obtain the translated object detection frame.
  • The expansion module 403 is configured to determine the translation distance based on the size information of the to-be-translated detection frame according to the following steps: determine the translation distance based on the width value in the size information of the to-be-translated detection frame and a preset translation coefficient.
  • In the case of translating the two expanded object detection frames, the expansion module 403 is configured to translate the at least one expanded object detection frame according to the following steps to obtain the translated object detection frames: respectively translate each of the two expanded object detection frames in a direction away from the other object detection frame, to obtain the two translated object detection frames.
  • The expansion module 403 is configured to perform expansion processing on each of the object detection frames according to the following steps: for each object detection frame, determine, in the to-be-detected image corresponding to the object detection frame, the position coordinates of the corner points of the object detection frame; and perform expansion processing on the object detection frame based on the determined corner position coordinates and a preset expansion ratio, to obtain the expanded object detection frame corresponding to the object detection frame.
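The corner-based expansion the module performs might look like the following sketch; the expansion ratio value, the image size, and the clipping of the expanded frame to the image boundary are illustrative assumptions.

```python
def expand_box(box, ratio=0.5, img_w=640, img_h=480):
    """Expand a detection frame (x1, y1, x2, y2) outward by a preset
    ratio of its own size, clipped to the image boundary so the
    expanded frame stays inside the to-be-detected image."""
    x1, y1, x2, y2 = box
    dw = (x2 - x1) * ratio / 2  # grow each side by ratio/2 of width
    dh = (y2 - y1) * ratio / 2  # grow each side by ratio/2 of height
    return (max(0, x1 - dw), max(0, y1 - dh),
            min(img_w, x2 + dw), min(img_h, y2 + dh))

expanded = expand_box((100, 100, 200, 200))
```

Keeping some background inside the expanded frame matters for liveness detection, as the description notes that background regions also influence the recognition result.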
  • The determining module 404 is configured to determine the recognition result of the target object based on at least the translated object detection frame according to the following steps: in the case of translating one expanded object detection frame, determine the recognition result of the target object based on the translated object detection frame and an untranslated, expanded object detection frame; or, in the case of translating the two expanded object detection frames, determine the recognition result of the target object based on the two translated object detection frames.
  • In a possible implementation, the target object is a target face, and the determining module 404 is configured to determine the recognition result of the target object based on at least the translated object detection frame according to the following steps: use the trained liveness detection neural network to perform target face recognition on at least the translated object detection frame, and determine whether the target face corresponding to the object detection frame is a real face.
  • the apparatus includes: an acquisition module 501 , a determination module 502 and a control module 503 ; wherein,
  • the acquisition module 501 is configured to acquire a group of face images captured by the binocular camera on the target face, where the group of face images includes a first face image captured by a first camera in the binocular camera, and a second face image captured by the second camera in the binocular camera;
  • a determination module 502 configured to obtain a recognition result corresponding to the group of face images through the above-mentioned image processing method, where the recognition result includes whether the target face is a real face;
  • the control module 503 is configured to control the terminal to perform a specified operation in response to the identification result of the person including that the target face is a real face and the person corresponding to the target face has passed identity authentication.
  • An embodiment of the present disclosure further provides an electronic device.
  • a schematic structural diagram of the electronic device provided by an embodiment of the present disclosure includes: a processor 601 , a memory 602 , and a bus 603 .
  • The memory 602 stores machine-readable instructions executable by the processor 601 (for example, the execution instructions corresponding to the acquisition module 401, the detection module 402, the expansion module 403, and the determination module 404 in the image processing apparatus in FIG. 4, etc.).
  • When the electronic device is running, the processor 601 and the memory 602 communicate through the bus 603, and the machine-readable instructions are executed by the processor 601 to perform the following processing: acquire two images to be detected obtained by shooting the target object with each camera in the binocular camera; perform target object detection on the two to-be-detected images respectively to obtain an object detection frame of the target object in each of the to-be-detected images; perform expansion processing on each of the object detection frames and translate at least one expanded object detection frame to obtain a translated object detection frame; and determine a recognition result of the target object based on at least the translated object detection frame.
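The processing steps above can be sketched end to end as follows; the box format, expansion ratio, and translation coefficient are illustrative assumptions, and `detect` stands in for any object detector.

```python
def process_pair(img_left, img_right, detect, ratio=0.5, coeff=0.1):
    """Sketch of the flow: detect the target object in each image,
    expand both detection frames, then translate each frame away from
    the other to simulate binocular parallax."""
    def expand(box):
        x1, y1, x2, y2 = box
        dw, dh = (x2 - x1) * ratio / 2, (y2 - y1) * ratio / 2
        return (x1 - dw, y1 - dh, x2 + dw, y2 + dh)

    def shift(box, sign):  # sign: -1 moves left, +1 moves right
        x1, y1, x2, y2 = box
        d = sign * coeff * (x2 - x1)  # distance proportional to width
        return (x1 + d, y1, x2 + d, y2)

    left = shift(expand(detect(img_left)), -1)
    right = shift(expand(detect(img_right)), +1)
    return left, right  # processed frames fed to the recognizer
```

The two returned frames form the pseudo-baseline pair that a recognizer (e.g. the liveness detection network) would consume in place of raw detections.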
  • In a possible implementation, the binocular camera includes a first camera and a second camera. In the case of translating one expanded object detection frame, in the instructions executed by the processor 601, translating the at least one expanded object detection frame to obtain the translated object detection frame includes: performing expansion processing on the object detection frame detected in the to-be-detected image collected by the first camera to obtain a to-be-translated detection frame; and translating the to-be-translated detection frame in a direction away from the second camera to obtain the translated object detection frame.
  • In a possible implementation, translating the to-be-translated detection frame in a direction away from the second camera to obtain the translated object detection frame includes: determining a translation distance based on the size information of the to-be-translated detection frame; and moving the to-be-translated detection frame by the translation distance in the direction away from the second camera to obtain the translated object detection frame.
  • In a possible implementation, determining the translation distance based on the size information of the to-be-translated detection frame includes: determining the translation distance based on the width value in the size information of the to-be-translated detection frame and a preset translation coefficient.
  • In a possible implementation, in the case of translating the two expanded object detection frames, translating the at least one expanded object detection frame to obtain the translated object detection frames includes: respectively translating each of the two expanded object detection frames in a direction away from the other object detection frame, to obtain the two translated object detection frames.
  • In a possible implementation, performing expansion processing on each of the object detection frames includes: for each object detection frame, determining, in the to-be-detected image corresponding to the object detection frame, the position coordinates of the corner points of the object detection frame; and performing expansion processing on the object detection frame based on the determined corner position coordinates and a preset expansion ratio, to obtain the expanded object detection frame corresponding to the object detection frame.
  • In a possible implementation, determining the recognition result of the target object includes: in the case of translating one expanded object detection frame, determining the recognition result of the target object based on the translated object detection frame and an untranslated, expanded object detection frame; or, in the case of translating the two expanded object detection frames, determining the recognition result of the target object based on the two translated object detection frames.
  • An embodiment of the present disclosure further provides an electronic device.
  • a schematic structural diagram of the electronic device provided by an embodiment of the present disclosure includes: a processor 701 , a memory 702 , and a bus 703 .
  • The memory 702 stores machine-readable instructions executable by the processor 701 (for example, the execution instructions corresponding to the acquisition module 501, the determination module 502, and the control module 503 in the terminal control apparatus in FIG. 5, etc.).
  • When the electronic device is running, the processor 701 and the memory 702 communicate through the bus 703, and the machine-readable instructions are executed by the processor 701 to perform the following processing: acquire a group of face images captured by the binocular camera of a target face, the group of face images including a first face image captured by the first camera in the binocular camera and a second face image captured by the second camera in the binocular camera; obtain, through the image processing method described in the above embodiments, a recognition result corresponding to the group of face images, the recognition result including whether the target face is a real face; and, in response to the recognition result including that the target face is a real face and the person corresponding to the target face passing identity authentication, control the terminal to perform a specified operation.
  • Embodiments of the present disclosure further provide a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the image processing method and the terminal control method described in the foregoing method embodiments are executed.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • The computer program product of the image processing method and the terminal control method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the steps of the image processing method and the terminal control method described in the above method embodiments. For details, reference may be made to the foregoing method embodiments, which will not be repeated here.
  • Embodiments of the present disclosure also provide a computer program, which implements any one of the methods in the foregoing embodiments when the computer program is executed by a processor.
  • the computer program product can be specifically implemented by hardware, software or a combination thereof.
  • In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), etc.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a processor-executable non-volatile computer-readable storage medium.
  • The computer software product is stored in a storage medium and includes several instructions used to cause an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage medium includes: a U disk, a mobile hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media that can store program code.

Abstract

Provided are an image processing method and apparatus, and a terminal control method and apparatus. The image processing method comprises: acquiring two images to be detected which are obtained by means of cameras in a binocular camera photographing a target object; respectively performing target object detection on the two images to be detected, so as to obtain an object detection frame of the target object in each of the images to be detected; externally expanding each of the object detection frames and translating at least one externally expanded object detection frame so as to obtain translated object detection frames; and determining a recognition result for the target object at least on the basis of the translated object detection frames.

Description

Image processing method and apparatus, and terminal control method and apparatus
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the priority of Chinese patent application No. 202011377063.0 filed on November 30, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of image processing, and in particular, to an image processing method and apparatus, and a terminal control method and apparatus.
Background
With the continuous development of computer vision technology and the wide application of binocular cameras, image processing technology based on binocular cameras is widely used in various fields such as liveness detection and intelligent transportation. Taking liveness detection as an example, liveness detection can be performed based on a set of images collected by a binocular camera; for instance, liveness detection can be performed on the set of images collected by each module (each corresponding to one binocular camera) using a liveness detection model.
SUMMARY OF THE INVENTION
The embodiments of the present disclosure provide at least an image processing method and apparatus, and a terminal control method and apparatus.
In a first aspect, an embodiment of the present disclosure provides an image processing method, including: acquiring two images to be detected obtained by shooting a target object with each camera in a binocular camera; performing target object detection on the two images to be detected respectively, to obtain an object detection frame of the target object in each of the images to be detected; performing expansion processing on each of the object detection frames, and translating at least one expanded object detection frame to obtain a translated object detection frame; and determining a recognition result of the target object based on at least the translated object detection frame.
With the above image processing method, when two images to be detected collected by the binocular camera are acquired, target object detection can first be performed on the images to be detected to obtain an object detection frame of the target object in each image to be detected. On the premise that the two object detection frames are expanded, one or both of the expanded object detection frames can be translated, so that the recognition result of the target object is determined based on the processed object detection frames.
The above image processing method can focus on the target object based on target object detection, which can initially reduce the influence of the baseline. Considering that, in the process of performing target object recognition on the two images to be detected collected by the binocular camera, the parallax formed by a module (corresponding to one binocular camera) needs to be referenced to determine the depth information of the target object, the embodiments of the present disclosure further achieve the effect of simulating human-eye parallax through the cooperative processing operations of expanding and translating the object detection frames, obtaining processed object detection frames for identifying the target object. The embodiments of the present disclosure are highly versatile, and allow different modules to share one set of pseudo-baselines to simulate human-eye parallax, thereby improving the generalization ability of the modules and reducing the time cost of subsequent applications such as target object recognition.
In a possible implementation manner, the binocular camera includes a first camera and a second camera; in the case of translating one expanded object detection frame, translating the at least one expanded object detection frame to obtain the translated object detection frame includes: performing expansion processing on the object detection frame detected in the to-be-detected image collected by the first camera to obtain a to-be-translated detection frame; and translating the to-be-translated detection frame in a direction away from the second camera to obtain the translated object detection frame.
Here, once the detection frame to be translated has been selected, its translation direction can be determined based on the relative positional relationship between the two cameras included in the binocular camera. When the detection frame to be translated corresponds to the first camera, the detection frame may be translated in a direction away from the second camera, and the detection frame moved in this translation direction can meet the parallax requirement of the pseudo-baseline.
In a possible implementation manner, translating the to-be-translated detection frame in a direction away from the second camera to obtain the translated object detection frame includes: determining a translation distance based on the size information of the to-be-translated detection frame; and moving the to-be-translated detection frame by the translation distance in the direction away from the second camera to obtain the translated object detection frame.
In a possible implementation manner, determining the translation distance based on the size information of the to-be-translated detection frame includes: determining the translation distance based on the width value in the size information of the to-be-translated detection frame and a preset translation coefficient.
Considering that the parallax determined by the binocular camera is related to the distance between the target object and the binocular camera, and that the imaging size is also related to this distance, in the embodiments of the present disclosure, the translation distance for simulating parallax can be determined based on the size information of the imaged object detection frame, and the binocular-parallax recognition effect can then be achieved based on the pseudo-baseline constructed from the translation distance.
In a possible implementation manner, in the case of translating the two expanded object detection frames, translating the at least one expanded object detection frame to obtain the translated object detection frames includes: respectively translating each of the two expanded object detection frames in a direction away from the other object detection frame, to obtain the two translated object detection frames.
In a possible implementation manner, performing expansion processing on each of the object detection frames includes: for each object detection frame, determining, in the to-be-detected image corresponding to the object detection frame, the position coordinates of the corner points of the object detection frame; and performing expansion processing on the object detection frame based on the determined corner position coordinates and a preset expansion ratio, to obtain the expanded object detection frame corresponding to the object detection frame.
Here, in addition to the influence of the target object framed by the object detection frame on the recognition result, image regions other than the target object (for example, the background) also have a certain influence on the recognition result; in particular, for applications such as liveness detection, the background information can to a certain extent help determine the recognition result. Therefore, in the process of constructing the pseudo-baseline, the object detection frame can be expanded based on its corner position coordinates and the preset expansion ratio; the object detection frame obtained in this way not only helps construct the pseudo-baseline, but also improves the accuracy of subsequent result recognition.
In a possible implementation manner, determining the recognition result of the target object based on at least the translated object detection frame includes: in the case of translating one expanded object detection frame, determining the recognition result of the target object based on the translated object detection frame and an untranslated, expanded object detection frame; or, in the case of translating the two expanded object detection frames, determining the recognition result of the target object based on the two translated object detection frames.
In a possible implementation manner, the target object is a target face, and determining the recognition result of the target object based on at least the translated object detection frame includes: using a trained liveness detection neural network to perform target face recognition on at least the translated object detection frame, and determining whether the target face corresponding to the object detection frame is a real face.
In a second aspect, an embodiment of the present disclosure further provides a terminal control method, where the terminal is provided with a binocular camera, and the method includes: acquiring a group of face images captured by the binocular camera of a target face, the group of face images including a first face image captured by a first camera in the binocular camera and a second face image captured by a second camera in the binocular camera; obtaining, through the image processing method according to the first aspect or any of its implementations, a recognition result corresponding to the group of face images, the recognition result including whether the target face is a real face; and, in response to the recognition result including that the target face is a real face and the person corresponding to the target face passing identity authentication, controlling the terminal to perform a specified operation.
第三方面，本公开实施例还提供了一种图像处理装置，包括：获取模块，用于获取通过双目摄像头中各摄像头拍摄目标对象得到的两个待检测图像；检测模块，用于对所述两个待检测图像分别进行目标对象检测，得到所述目标对象在每个所述待检测图像中的对象检测框；外扩模块，用于对每个所述对象检测框进行外扩处理，并对外扩处理后的至少一个对象检测框进行平移，得到平移后的对象检测框；确定模块，用于至少基于所述平移后的对象检测框，确定对所述目标对象的识别结果。In a third aspect, an embodiment of the present disclosure further provides an image processing apparatus, including: an acquisition module configured to acquire two to-be-detected images obtained by capturing a target object with each camera of a binocular camera; a detection module configured to perform target object detection on each of the two to-be-detected images to obtain an object detection frame of the target object in each to-be-detected image; an expansion module configured to perform expansion processing on each object detection frame and translate at least one expanded object detection frame to obtain a translated object detection frame; and a determination module configured to determine a recognition result of the target object based on at least the translated object detection frame.
第四方面，本公开实施例还提供了一种终端控制装置，包括：获取模块，用于获取所述双目摄像头对目标人脸拍摄的一组人脸图像，所述一组人脸图像包括通过所述双目摄像头中第一摄像头拍摄的第一人脸图像，以及通过所述双目摄像头中第二摄像头拍摄的第二人脸图像；确定模块，用于通过第一方面及其各种实施方式任一项所述的图像处理方法，得到所述一组人脸图像对应的识别结果，所述识别结果包括所述目标人脸是否为真实人脸；控制模块，用于响应于所述人物的识别结果包括所述目标人脸为真实人脸，且所述目标人脸对应的人物通过身份认证，控制所述终端执行指定操作。In a fourth aspect, an embodiment of the present disclosure further provides a terminal control apparatus, including: an acquisition module configured to acquire a set of face images captured of a target face by the binocular camera, the set of face images including a first face image captured by a first camera of the binocular camera and a second face image captured by a second camera of the binocular camera; a determination module configured to obtain, by the image processing method of the first aspect or any of its implementations, a recognition result corresponding to the set of face images, the recognition result including whether the target face is a real face; and a control module configured to, in response to the recognition result including that the target face is a real face and that the person corresponding to the target face has passed identity authentication, control the terminal to perform a specified operation.
第五方面，本公开实施例还提供了一种电子设备，包括：处理器、存储器和总线，所述存储器存储有所述处理器可执行的机器可读指令，当电子设备运行时，所述处理器与所述存储器之间通过总线通信，所述机器可读指令被所述处理器执行时执行如第一方面及其各种实施方式任一所述的图像处理方法的步骤或者第二方面所述的终端控制方法的步骤。In a fifth aspect, an embodiment of the present disclosure further provides an electronic device, including a processor, a memory, and a bus, where the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor and the memory communicate via the bus, and when the machine-readable instructions are executed by the processor, the steps of the image processing method of the first aspect or any of its implementations, or the steps of the terminal control method of the second aspect, are performed.
第六方面，本公开实施例还提供了一种计算机可读存储介质，所述计算机可读存储介质上存储有计算机程序，所述计算机程序被电子设备运行时，所述电子设备执行如第一方面及其各种实施方式任一所述的图像处理方法的步骤或者第二方面所述的终端控制方法的步骤。In a sixth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program, where, when the computer program is run by an electronic device, the electronic device performs the steps of the image processing method of the first aspect or any of its implementations, or the steps of the terminal control method of the second aspect.
第七方面，本公开实施例还提供了一种计算机程序，包括计算机可读代码，当所述代码在电子设备中执行时，促使所述电子设备中的处理器执行如第一方面及其各种实施方式任一所述的图像处理方法的步骤或者第二方面所述的终端控制方法的步骤。In a seventh aspect, an embodiment of the present disclosure further provides a computer program including computer-readable code, where, when the code is executed in an electronic device, a processor in the electronic device is caused to perform the steps of the image processing method of the first aspect or any of its implementations, or the steps of the terminal control method of the second aspect.
关于上述装置、电子设备、及计算机可读存储介质的效果描述参见上述方法的说明,这里不再赘述。For the description of the effects of the foregoing apparatus, electronic device, and computer-readable storage medium, reference may be made to the description of the foregoing method, and details are not repeated here.
为使本公开的上述目的、特征和优点能更明显易懂,下文特举较佳实施例,并配合所附附图,作详细说明如下。In order to make the above-mentioned objects, features and advantages of the present disclosure more obvious and easy to understand, the preferred embodiments are exemplified below, and are described in detail as follows in conjunction with the accompanying drawings.
附图说明Description of drawings
为了更清楚地说明本公开实施例的技术方案，下面将对实施例中所需要使用的附图作简单地介绍，此处的附图被并入说明书中并构成本说明书中的一部分，这些附图示出了符合本公开的实施例，并与说明书一起用于说明本公开的技术方案。应当理解，以下附图仅示出了本公开的某些实施例，因此不应被看作是对范围的限定，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他相关的附图。To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings required in the embodiments are briefly introduced below; these drawings are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, and together with the description serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting the scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
图1示出了本公开实施例一所提供的一种图像处理方法的流程图;FIG. 1 shows a flowchart of an image processing method provided by Embodiment 1 of the present disclosure;
图2(a)示出了本公开实施例一所提供的一种图像处理方法的应用示意图;FIG. 2( a ) shows a schematic diagram of an application of an image processing method provided by Embodiment 1 of the present disclosure;
图2(b)示出了本公开实施例一所提供的一种图像处理方法的应用示意图;FIG. 2(b) shows a schematic diagram of the application of an image processing method provided by Embodiment 1 of the present disclosure;
图2(c)示出了本公开实施例一所提供的一种图像处理方法的应用示意图;Fig. 2(c) shows a schematic diagram of the application of an image processing method provided by Embodiment 1 of the present disclosure;
图2(d)示出了本公开实施例一所提供的一种图像处理方法的应用示意图;FIG. 2(d) shows an application schematic diagram of an image processing method provided by Embodiment 1 of the present disclosure;
图3示出了本公开实施例一所提供的一种终端控制方法的流程图;FIG. 3 shows a flowchart of a terminal control method provided by Embodiment 1 of the present disclosure;
图4示出了本公开实施例二所提供的一种图像处理装置的示意图;FIG. 4 shows a schematic diagram of an image processing apparatus provided by Embodiment 2 of the present disclosure;
图5示出了本公开实施例二所提供的一种终端控制装置的示意图;FIG. 5 shows a schematic diagram of a terminal control apparatus provided by Embodiment 2 of the present disclosure;
图6示出了本公开实施例三所提供的一种电子设备的示意图;FIG. 6 shows a schematic diagram of an electronic device according to Embodiment 3 of the present disclosure;
图7示出了本公开实施例三所提供的另一种电子设备的示意图。FIG. 7 shows a schematic diagram of another electronic device provided by Embodiment 3 of the present disclosure.
具体实施方式Detailed ways
为使本公开实施例的目的、技术方案和优点更加清楚，下面将结合本公开实施例中附图，对本公开实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本公开一部分实施例，而不是全部的实施例。通常在此处描述和示出的本公开实施例的组件可以以各种不同的配置来布置和设计。因此，以下对本公开的实施例的详细描述并非旨在限制要求保护的本公开的范围，而是仅仅表示本公开的选定实施例。基于本公开的实施例，本领域技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例，都属于本公开保护的范围。To make the purposes, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure generally described and illustrated herein may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative work fall within the protection scope of the present disclosure.
目前，往往可以利用活体检测模型对每个模组(对应一个双目摄像头)采集的一组图像实现活体检测。例如，将双目摄像头的左目摄像头对物体拍摄获得的一幅图片和双目摄像头的右目摄像头对该物体拍摄获得的另一幅图片同时输入活体检测模型，从而最终得到对该物体是否为活体的结果。At present, liveness detection is often performed with a liveness detection model on a set of images collected by each module (each module corresponding to one binocular camera). For example, a picture of an object captured by the left camera of the binocular camera and another picture of the same object captured by the right camera are input into the liveness detection model simultaneously, finally yielding a result of whether the object is a living body.
通常，针对不同模组会训练不同的活体检测模型。这是由于不同的模组的基线不同，即，不同双目摄像头中的两个摄像头的相对距离不同，这将导致在一个模组上表现良好的模型，在另外一个基线不同的模组上的精度表现却较差。也就是，同一活体检测模型对不同模组的适配能力较弱，因此往往需要针对不同的模组训练不同的活体检测模型，模型训练的时间成本巨大。Usually, different liveness detection models are trained for different modules. This is because different modules have different baselines, that is, the relative distance between the two cameras differs between binocular cameras, so a model that performs well on one module shows poor accuracy on another module with a different baseline. In other words, the same liveness detection model adapts poorly across modules, so a separate liveness detection model often has to be trained for each module, at an enormous cost in training time.
基于此,本公开提供了一种图像处理方法及装置、终端控制方法及装置,以提升模组的泛化能力,降低后续进行目标对象识别等应用的时间成本。Based on this, the present disclosure provides an image processing method and device, and a terminal control method and device, so as to improve the generalization capability of the module and reduce the time cost of subsequent applications such as target object recognition.
为便于对本实施例进行理解，首先对本公开实施例所公开的一种图像处理方法进行详细介绍，本公开实施例所提供的图像处理方法的执行主体一般为具有一定计算能力的电子设备，该电子设备例如包括：终端设备或服务器或其它处理设备，终端设备可以为用户设备(User Equipment,UE)、移动设备、用户终端、蜂窝电话、无绳电话、个人数字助理(Personal Digital Assistant,PDA)、手持设备、计算设备、车载设备、可穿戴设备等。在一些可能的实现方式中，该图像处理方法可以通过处理器调用存储器中存储的计算机可读指令的方式来实现。To facilitate understanding of this embodiment, an image processing method disclosed in an embodiment of the present disclosure is first introduced in detail. The execution body of the image processing method provided by the embodiments of the present disclosure is generally an electronic device with certain computing capability, for example a terminal device, a server, or another processing device; the terminal device may be a user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In some possible implementations, the image processing method may be implemented by a processor calling computer-readable instructions stored in a memory.
下面对本公开实施例提供的图像处理方法加以说明。The image processing method provided by the embodiments of the present disclosure will be described below.
参见图1所示,为本公开实施例提供的图像处理方法的流程图,方法包括步骤S101~S104,其中:Referring to FIG. 1, which is a flowchart of an image processing method provided by an embodiment of the present disclosure, the method includes steps S101-S104, wherein:
S101、获取通过双目摄像头中各摄像头拍摄目标对象得到的两个待检测图像;S101. Acquire two images to be detected obtained by shooting the target object by each camera in the binocular camera;
S102、对所述两个待检测图像分别进行目标对象检测,得到目标对象在每个待检测图像中的对象检测框;S102, performing target object detection on the two images to be detected, respectively, to obtain an object detection frame of the target object in each image to be detected;
S103、对每个所述对象检测框进行外扩处理,并对外扩处理后的至少一个对象检测框进行平移,得到平移后的对象检测框;S103, performing external expansion processing on each of the object detection frames, and translating at least one object detection frame after the external expansion processing to obtain a translated object detection frame;
S104、至少基于平移后的对象检测框,确定对目标对象的识别结果。S104. Determine the recognition result of the target object based on at least the translated object detection frame.
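Steps S101–S104 above can be sketched as a single pipeline. This is a minimal illustration, not the claimed implementation: the function `process_pair` and the caller-supplied callables `detect`, `expand`, `shift`, and `recognize` are hypothetical stand-ins for the detector, the expansion processing, the translation, and the recognition model described in the text, and the choice to translate only the second image's frame is just one of the variants the disclosure allows.

```python
def process_pair(img_left, img_right, detect, expand, shift, recognize):
    """Sketch of S101-S104: detect, expand, translate, recognize.

    `detect`, `expand`, `shift` and `recognize` are hypothetical
    caller-supplied callables standing in for the components the
    disclosure describes.
    """
    # S102: detect the target object in each of the two images
    box_l = detect(img_left)
    box_r = detect(img_right)
    # S103: expand both detection frames, then translate at least one
    box_l, box_r = expand(box_l), expand(box_r)
    box_r = shift(box_r)
    # S104: determine the recognition result from (at least) the
    # translated detection frame
    return recognize(img_left, box_l, img_right, box_r)
```

The same skeleton covers the two-frame-translation variant by also applying `shift` to `box_l` with the opposite direction.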
在一些例子中,所述双目摄像头可以包括一个RGB(Red Green Blue)摄像头和一个近红外摄像头,即所述两个待检测图像可以分别为RGB图像和红外图像。在另一些例子中,还可以包括两个红外摄像头,即所述两个待检测图像都为红外图像。在再一些例子中,还可以包括两个RGB摄像头,即所述两个待检测图像都为RGB图像。本申请对双目摄像头的具体构成不做限定。In some examples, the binocular camera may include an RGB (Red Green Blue) camera and a near-infrared camera, that is, the two images to be detected may be an RGB image and an infrared image, respectively. In other examples, two infrared cameras may also be included, that is, the two images to be detected are both infrared images. In still other examples, two RGB cameras may also be included, that is, the two images to be detected are both RGB images. This application does not limit the specific structure of the binocular camera.
这里,为了便于理解本公开实施例提供的图像处理方法,可以首先对该图像处理方法的应用场景进行简单描述。上述图像处理方法主要可以应用于基于双目摄像头进行目标识别的相关应用中,例如,可以是针对双目摄像头抓拍的人脸进行活体检测,还可以是针对双目摄像头抓拍的车辆进行车牌识别,还可以是其它相关应用,在此不做具体的限制。Here, in order to facilitate understanding of the image processing method provided by the embodiments of the present disclosure, an application scenario of the image processing method may be briefly described first. The above image processing method can be mainly applied to related applications of target recognition based on binocular cameras. For example, it can be used to perform liveness detection on faces captured by binocular cameras, and it can also be used to perform license plate recognition on vehicles captured by binocular cameras. It may also be other related applications, which are not specifically limited here.
本公开实施例中，可以基于双目摄像头拍摄同一场景下的目标对象来获取两幅图像。运用立体匹配算法获取视差图，进而获取深度图实现目标识别。考虑到在不同模组(每个模组对应一个双目摄像头)所包括的两个摄像头之间的相对距离不同的情况下，即使利用同一识别方法针对同一目标进行目标识别，也可能会因为基线(两个摄像头之间的相对距离)的不同而导致识别结果的不同，特别是在利用目标检测模型进行目标识别的过程中，由于训练模型本身需要输入大量的图像样本，若采用的图像样本所对应的基线与目标检测模型所对应的模组的基线不同，将导致模型准确度大大降低。尽管可以针对不同的模组训练不同的目标检测模型以确保模型准确度，然而这种方式将导致训练成本大为增加。In the embodiments of the present disclosure, two images may be acquired by capturing a target object in the same scene with a binocular camera; a disparity map is obtained with a stereo matching algorithm, and a depth map is then obtained to realize target recognition. Considering that the relative distance between the two cameras differs between modules (each module corresponding to one binocular camera), even if the same recognition method is applied to the same target, the recognition result may differ because of the different baselines (the relative distance between the two cameras). In particular, when a target detection model is used for target recognition, training the model requires a large number of image samples; if the baseline of the image samples differs from the baseline of the module on which the model is used, model accuracy drops greatly. Although a separate target detection model could be trained for each module to ensure accuracy, doing so greatly increases the training cost.
正是为了解决这一问题，本公开实施例才提供了一种能够为不同模组提供通用的伪基线，进而可以基于不同模组采集的图像进行目标识别的图像处理方法，该方法首先可以利用目标对象检测来降低当前模组所存在的基线的影响，而后为了便于进行后续的目标识别，可以基于外扩和平移的配合处理来构建伪基线，从而确保在消除原始基线的影响下，基于统一的伪基线来实现目标识别。It is precisely to solve this problem that the embodiments of the present disclosure provide an image processing method that constructs a common pseudo-baseline for different modules, so that target recognition can be performed on images collected by any module. The method first uses target object detection to reduce the influence of the actual baseline of the current module, and then, to facilitate subsequent target recognition, constructs a pseudo-baseline through coordinated expansion and translation processing, thereby realizing target recognition on a unified pseudo-baseline with the influence of the original baseline eliminated.
其中，本公开实施例中双目摄像头采集的两张待检测图像可以是基于双目摄像头所在应用场景确定的，例如，在人脸识别应用中，这里所采集的两张待检测图像可以是包含人脸的图像；再如，在智能交通应用中，这里所采集的两张待检测图像可以是包含车辆的图像。The two to-be-detected images collected by the binocular camera in the embodiments of the present disclosure may be determined by the application scenario of the binocular camera. For example, in a face recognition application, the two to-be-detected images may be images containing a human face; in an intelligent transportation application, they may be images containing a vehicle.
考虑到双目摄像头所包括的两个摄像头之间的相对距离(对应基线)使得两个摄像头摄取同一个目标对象时存在有方向差异(即视差)，为了构建统一的伪基线，首先可以基于对两个摄像头获取的待检测图像分别进行目标对象检测所得到的对象检测框消除视差所带来的原始基线的影响。Considering that the relative distance (the baseline) between the two cameras of the binocular camera causes a directional difference (i.e., parallax) when the two cameras capture the same target object, to construct a unified pseudo-baseline, the influence of the original baseline caused by parallax can first be eliminated based on the object detection frames obtained by performing target object detection on the to-be-detected images acquired by the two cameras.
本公开实施例中,可以基于传统的目标对象检测方法从待检测图像中检测出目标对象所在对象检测框。这里的目标对象检测方法可以是帧差法、背景减除法、光流法等。In the embodiment of the present disclosure, the object detection frame where the target object is located can be detected from the image to be detected based on the traditional target object detection method. The target object detection method here may be a frame difference method, a background subtraction method, an optical flow method, and the like.
本公开实施例除了可以采用上述传统方法实现对象检测，还可以基于训练好的检测模型进行对象检测。这里的检测模型可以是带有对象检测框标注的图像样本训练得到的，训练的可以是输入的图像样本与输出的对象检测框之间的对应关系，这样，在将待检测图像输入到训练好的检测模型的情况下，即可以确定出待检测图像中的对象检测框。该检测模型可以是单独的一个神经网络，也可以包含在上述目标检测模型之中。In addition to the traditional methods above, object detection in the embodiments of the present disclosure may also be performed with a trained detection model. The detection model may be trained on image samples annotated with object detection frames, learning the correspondence between an input image sample and an output object detection frame; thus, when a to-be-detected image is input into the trained detection model, the object detection frame in that image can be determined. The detection model may be a separate neural network, or may be included in the target detection model described above.
在消除原始基线的影响下，考虑到由基线所产生的视差对于目标识别的关键作用，本公开实施例可以基于平移操作创建出共用的伪基线以消除视差。With the influence of the original baseline eliminated, and considering the key role that baseline-induced parallax plays in target recognition, the embodiments of the present disclosure can create a shared pseudo-baseline based on a translation operation so as to eliminate the parallax.
考虑到在进行对象检测框平移之前，对象检测框内框选的目标对象是完整的对象，如可以是一个包含人的五官、头发、脖子的目标人脸，若直接进行对象检测框的平移，可能导致目标人脸的不完整，从而不利于进行后续的对象识别。基于此，本公开实施例在进行平移操作之前，可以先进行外扩操作。Before the object detection frame is translated, the target object framed by the detection frame is a complete object, for example a target face including the facial features, hair, and neck; translating the object detection frame directly may leave the target face incomplete, which is detrimental to subsequent object recognition. Therefore, in the embodiments of the present disclosure, the expansion operation may be performed before the translation operation.
其中,上述外扩操作一定程度上可以扩大对象检测框所框选的图像区域。考虑到图像区域越大,一定程度上说明该区域包含的信息内容也越多,这一方面可以提升后续目标识别的准确率,另一方面还可以为后续的平移操作提供平移依据。Wherein, the above expansion operation can expand the image area framed by the object detection frame to a certain extent. Considering that the larger the image area, the more information content the area contains to a certain extent. This can improve the accuracy of subsequent target recognition, and on the other hand, can provide translation basis for subsequent translation operations.
需要说明的是，本公开实施例中的平移操作可以是针对外扩处理后的两个对象检测框中的其中一个进行平移操作，这里，可以以另一个未平移的对象检测框所对应的摄像头作为平移的参考依据，还可以是针对两个对象检测框均进行平移操作，这里，可以以两个摄像头的中心位置作为参考依据进行平移。It should be noted that the translation operation in the embodiments of the present disclosure may be performed on one of the two expanded object detection frames, in which case the camera corresponding to the other, untranslated object detection frame serves as the reference for the translation; alternatively, both object detection frames may be translated, in which case the center position between the two cameras serves as the reference for the translation.
本公开实施例提供的图像处理方法在经过平移处理之后，可以基于一个平移处理后的对象检测框和一个未进行平移的对象检测框，确定对目标对象的识别结果，也可以基于两个平移处理后的对象检测框，确定对目标对象的识别结果。After the translation processing, the image processing method provided by the embodiments of the present disclosure may determine the recognition result of the target object based on one translated object detection frame and one untranslated object detection frame, or based on two translated object detection frames.
在具体应用中,可以利用训练好的目标识别模型对平移处理后的对象检测框进行目标识别,以确定目标对象的识别结果。In a specific application, the trained target recognition model can be used to perform target recognition on the object detection frame after translation processing, so as to determine the recognition result of the target object.
本公开实施例中的目标识别模型可以是有关人脸识别的活体检测模型，将上述一个平移处理后的对象检测框和一个未进行平移的对象检测框输入到训练好的活体检测模型，可以判断出对象检测框对应的目标人脸是否为真实人脸，或者将上述两个平移处理后的对象检测框输入到训练好的活体检测模型，可以判断出对象检测框对应的目标人脸是否为真实人脸。The target recognition model in the embodiments of the present disclosure may be a liveness detection model for face recognition: inputting the one translated object detection frame and the one untranslated object detection frame into the trained liveness detection model, or inputting the two translated object detection frames into it, makes it possible to determine whether the target face corresponding to the object detection frames is a real face.
另外，上述目标识别模型还可以是有关车辆识别的车辆检测模型，将上述一个平移处理后的对象检测框和一个未进行平移的对象检测框输入到训练好的车辆检测模型，可以判断出对象检测框对应的目标车辆的类型信息，或者将上述两个平移处理后的对象检测框输入到训练好的车辆检测模型，可以判断出对象检测框对应的目标车辆的类型信息。In addition, the target recognition model may also be a vehicle detection model for vehicle recognition: inputting the one translated object detection frame and the one untranslated object detection frame into the trained vehicle detection model, or inputting the two translated object detection frames into it, makes it possible to determine the type information of the target vehicle corresponding to the object detection frames.
为了便于进一步理解上述目标识别的过程,接下来可以以人脸识别为例,结合图2(a)~2(d)对上述过程进行详细说明。In order to further understand the above process of target recognition, the above process can be described in detail with reference to Figures 2(a) to 2(d) by taking face recognition as an example.
如图2(a)所示为针对人脸活体检测所设置的双目摄像头采集的两张待检测图像，左边图像为双目摄像头所包括的左目摄像头(如RGB摄像头)所采集的人体图像1，右边图像为双目摄像头所包括的右目摄像头(如近红外摄像头)所采集的人体图像2，在针对上述两张人脸图像进行目标人脸检测之后，可以生成目标人脸在两张人体图像中的对象检测框，如图2(b)所示。Figure 2(a) shows two to-be-detected images collected by a binocular camera configured for face liveness detection: the left image is human body image 1 collected by the left camera (e.g., an RGB camera) of the binocular camera, and the right image is human body image 2 collected by the right camera (e.g., a near-infrared camera). After target face detection is performed on the two images, the object detection frames of the target face in the two human body images can be generated, as shown in Figure 2(b).
针对图2(b)所示的两个对象检测框，本公开实施例可以进行外扩处理，如图2(c)所示。这里，可以针对图2(c)的右边图像中的对象检测框进行平移处理，得到平移处理后的对象检测框，如图2(d)的右边图像所示，图2(d)的左边图像则与图2(c)的左边图像相同，包含有未平移处理的外扩处理后的对象检测框。For the two object detection frames shown in Figure 2(b), the embodiments of the present disclosure may perform expansion processing, as shown in Figure 2(c). Here, the object detection frame in the right image of Figure 2(c) may be translated to obtain the translated object detection frame shown in the right image of Figure 2(d); the left image of Figure 2(d) is the same as the left image of Figure 2(c) and contains the expanded, untranslated object detection frame.
本公开实施例中，针对图2(d)所示的两张图像可以基于对象检测框进行剪切，获得对应的两张人脸图像，在将两张人脸图像输入到活体检测模型的情况下，即可以确定出目标人脸是否为真实人脸。In the embodiments of the present disclosure, the two images shown in Figure 2(d) may be cropped along the object detection frames to obtain two corresponding face images; when the two face images are input into the liveness detection model, whether the target face is a real face can be determined.
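The cropping step described here is a plain rectangular cut along the (possibly expanded and translated) detection frame. A minimal, library-free sketch, where the `(x1, y1, x2, y2)` box convention and the row-major list-of-rows image representation are illustrative assumptions:

```python
def crop_box(img, box):
    """Crop a rectangular region from an image stored as a list of rows.

    img: a list of rows (row-major), e.g. img[y][x] is one pixel.
    box: (x1, y1, x2, y2) corner coordinates, an assumed convention;
    fractional coordinates from expansion/translation are truncated.
    """
    x1, y1, x2, y2 = (int(v) for v in box)
    return [row[x1:x2] for row in img[y1:y2]]
```

With array libraries the same cut is typically `img[y1:y2, x1:x2]`; the cropped pair would then be fed to the liveness detection model.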
这里，考虑到图2(d)的右边图像是利用近红外摄像头所采集的图像，在确定对象检测框内存在纸张、屏幕等非活体元素的情况下，可以直接确定上述目标人脸不是真实人脸，对应的活体检测得分为0分。Here, considering that the right image in Figure 2(d) was collected by the near-infrared camera, if non-living elements such as paper or a screen are determined to be present within the object detection frame, it can be directly determined that the target face is not a real face, and the corresponding liveness detection score is 0.
需要说明的是，本公开实施例可以按照上述示例方式，针对右目摄像头所采集的图像进行平移处理，还可以是针对左目摄像头所采集的图像进行平移处理，本公开实施例可以基于不同的应用场景来选取，这里不做具体的限制。It should be noted that the embodiments of the present disclosure may translate the image collected by the right camera, as in the example above, or translate the image collected by the left camera; the choice may be made according to the application scenario and is not specifically limited here.
考虑到外扩处理对于目标识别的关键影响,接下来可以详细描述有关检测框外扩处理的相关内容。本公开实施例中,具体可以通过如下操作针对每个待检测图像的对象检测框进行外扩处理:Considering the key influence of the expansion processing on target recognition, the relevant content of the expansion processing of the detection frame can be described in detail next. In the embodiment of the present disclosure, the expansion processing may be performed on the object detection frame of each to-be-detected image by the following operations:
在待检测图像中,确定对象检测框的角点位置坐标;基于确定的角点位置坐标、以及预设外扩比例对对象检测框进行外扩处理,得到外扩处理后的对象检测框。In the image to be detected, the position coordinates of the corner points of the object detection frame are determined; the object detection frame is expanded based on the determined position coordinates of the corner points and the preset expansion ratio to obtain the expanded object detection frame.
这里，首先可以确定对象检测框在待检测图像中的角点位置坐标，该角点位置坐标可以对应的是对象检测框四个边角的图像坐标，在确定这些角点位置坐标的情况下，可以基于角点位置坐标以及预设外扩比例对对象检测框进行外扩处理以得到处理后的对象检测框。Here, the corner position coordinates of the object detection frame in the to-be-detected image can first be determined; these corner position coordinates may correspond to the image coordinates of the four corners of the object detection frame. Once these corner position coordinates are determined, the object detection frame can be expanded based on them and a preset expansion ratio to obtain the processed object detection frame.
其中，上述预设外扩比例，可以是基于目标对象的实际尺寸所确定的。这主要是考虑到针对不同的目标对象，其实际尺寸不同，对应的成像尺寸也不同。在同一场景下利用双目摄像头抓拍同一位置处的人脸和车辆，车辆所对应的车辆检测框远远大于人脸所对应的人脸检测框。这里，可以针对较大尺寸的目标对象设置更大比例的外扩系数，以可以实现对较大成像尺寸的目标对象周边区域信息的覆盖，针对较小尺寸的目标对象设置更小比例的外扩系数，以足以实现对较小成像尺寸的目标对象周边区域信息的覆盖。The preset expansion ratio may be determined based on the actual size of the target object, mainly because different target objects have different actual sizes and hence different imaging sizes: when a binocular camera captures a face and a vehicle at the same position in the same scene, the vehicle detection frame is far larger than the face detection frame. Here, a larger expansion coefficient can be set for a larger target object, so as to cover the information in the surrounding region of a target object with a larger imaging size, and a smaller expansion coefficient can be set for a smaller target object, sufficient to cover the information in the surrounding region of a target object with a smaller imaging size.
以对对象检测框外扩0.5倍为例，还可以是通过拖动对象检测框的四个边角的方式实现外扩处理；还可以是先基于四个边角的角点位置坐标确定四个边框(对应上边框、下边框、左边框和右边框)所对应的长度，进而通过为每个边框设置的预设外扩比例实现针对边框的外扩处理，在具体应用时，可以是通过边框的向外拖动方式来实现。Taking expanding the object detection frame by 0.5 times as an example, the expansion processing may be realized by dragging the four corners of the object detection frame; alternatively, the lengths of the four borders (the upper, lower, left, and right borders) may first be determined based on the corner position coordinates of the four corners, and the expansion of each border may then be realized through the preset expansion ratio set for that border; in a specific application, this may be realized by dragging the border outward.
需要说明的是，针对对象检测框的四个边角所形成的四个边框而言，可以设置不同的比例系数以适应目标对象所包括关键部位的外扩需求。针对人脸对象框，可以分别针对上述四个边框在其外扩方向上，按照一定的外扩比例进行外扩，例如，可以将左边框、右边框、上边框、下边框四个边框分别外扩0.4倍、0.4倍、0.8倍、0.4倍，外扩后的人脸对象框的四个边框的尺寸均发生了变化。这里之所以针对上边框所对应的眼睛这一部位进行更大倍数的外扩，主要是考虑到眼睛是人脸检测框所检测到的五官中更为靠上的部位，这一部分之上还有额头等活体元素，为了提升后续的活体检测准确率，可以适当的对上边框外扩更大的倍数。It should be noted that different ratio coefficients may be set for the four borders formed by the four corners of the object detection frame, to suit the expansion needs of the key parts of the target object. For a face object frame, each of the four borders may be expanded in its outward direction by a certain expansion ratio; for example, the left, right, upper, and lower borders may be expanded by 0.4, 0.4, 0.8, and 0.4 times respectively, so that the sizes of all four borders of the expanded face object frame change. The reason the upper border, corresponding to the eyes, is expanded by a larger multiple is mainly that the eyes are the uppermost of the facial features detected by the face detection frame, and living-body elements such as the forehead lie above them; to improve the accuracy of subsequent liveness detection, the upper border can appropriately be expanded by a larger multiple.
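The per-border expansion above can be sketched as follows. The `(x1, y1, x2, y2)` corner convention, the function name, and the interpretation of each ratio as a fraction of the box width (left/right) or height (top/bottom) are illustrative assumptions; the default ratios follow the 0.4/0.4/0.8/0.4 face example in the text, with the larger top ratio keeping the forehead region:

```python
def expand_box(box, img_w, img_h, ratios=(0.4, 0.4, 0.8, 0.4)):
    """Expand a detection box outward by per-border ratios.

    box: (x1, y1, x2, y2) corner coordinates (assumed convention).
    ratios: (left, right, top, bottom) expansion ratios; left/right are
    fractions of the box width, top/bottom fractions of the box height.
    The result is clipped to the image bounds (img_w x img_h).
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    left, right, top, bottom = ratios
    nx1 = max(0, x1 - left * w)
    nx2 = min(img_w, x2 + right * w)
    ny1 = max(0, y1 - top * h)        # larger top ratio keeps the forehead
    ny2 = min(img_h, y2 + bottom * h)
    return (nx1, ny1, nx2, ny2)
```

A single uniform ratio (e.g. the 0.5x example above) is the special case `ratios=(r, r, r, r)`.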
需要说明的是,本公开实施例中可以针对四个边框中的一个边框单独进行外扩处理, 还可以是针对四个边框中的一对边框(例如左边框和右边框)同时进行外扩处理,还可以是其它外扩处理方式,在此不再赘述。It should be noted that, in the embodiment of the present disclosure, the expansion processing may be performed on one frame of the four frames alone, or the expansion processing may be performed on a pair of frames (eg, the left frame and the right frame) among the four frames at the same time. , and may also be other external expansion processing methods, which will not be repeated here.
考虑到平移处理对于伪基线构建的关键作用,接下来可以详细描述有关检测框平移处理的相关内容。Considering the key role of translation processing in the construction of pseudo-baselines, the related content of detection frame translation processing can be described in detail next.
In the embodiments of the present disclosure, the binocular camera includes a first camera and a second camera. When translation is applied to a single expanded object detection frame, it may be performed through the following operations:
The object detection frame detected in the to-be-detected image captured by the first camera is expanded to obtain a to-be-translated detection frame; the to-be-translated detection frame is then translated in the direction away from the second camera to obtain the translated object detection frame.
In the embodiments of the present disclosure, one to-be-translated detection frame can be selected from the expanded object detection frames. If the object detection frame corresponding to the right camera of the binocular pair (i.e., the right detection frame, the first camera) is selected as the to-be-translated detection frame, its translation direction can be determined from the relative position of the right camera with respect to the left camera: since the right camera lies to the right, the right detection frame is translated to the right (i.e., away from the left camera). Conversely, if the object detection frame corresponding to the left camera (i.e., the left detection frame) is selected as the to-be-translated detection frame, it is translated to the left.
In the embodiments of the present disclosure, in addition to translating along the direction described above, the translation may also incorporate a translation distance, which can be determined from the size information of the to-be-translated detection frame.
This accounts for the fact that target objects at different distances from the camera differ in imaged size. When the size of the to-be-translated detection frame is large, the target object is, to some extent, close to the camera; a close target exhibits larger disparity, so a larger translation distance is required. When the size of the to-be-translated detection frame is small, the target object is, to some extent, far from the camera; a distant target exhibits smaller disparity, so a smaller translation distance suffices.
To construct a pseudo-baseline usable for both large and small target objects, the translation distance can be determined from a preset translation coefficient and the width value in the size information of the detection frame; for example, the translation distance is the product of the preset translation coefficient and the width value. The preset translation coefficient is related to the distance between the two cameras of the binocular pair. Here, the width value of a detection frame corresponds to the length of its horizontal side.
The translation distance is thus proportional to the width: a wider detection frame yields a larger translation distance, balancing the larger disparity of large target objects, while a narrower detection frame yields a smaller translation distance, balancing the smaller disparity of small target objects. In this way a shared pseudo-baseline adapted to various target objects can be constructed.
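The translation rule above (distance = preset coefficient × box width, direction away from the other camera) can be sketched as follows; the coefficient value of 0.1 and the function name are hypothetical placeholders:

```python
def translate_box(box, other_camera_side, coeff=0.1):
    """Translate an expanded detection box away from the other camera.

    box: (x1, y1, x2, y2); other_camera_side: 'left' or 'right', the side
    on which the other camera of the binocular pair lies. The translation
    distance is proportional to the box width, so near (large) faces are
    shifted more than far (small) faces, forming a shared pseudo-baseline.
    The coefficient 0.1 is a stand-in; per the text it would depend on the
    physical distance between the two cameras of the module.
    """
    x1, y1, x2, y2 = box
    dist = coeff * (x2 - x1)                             # distance ∝ width
    dx = dist if other_camera_side == 'left' else -dist  # move away
    return (x1 + dx, y1, x2 + dx, y2)

# A right-camera box (other camera on the left) shifts to the right:
print(translate_box((60, 20, 240, 240), 'left'))  # (78.0, 20, 258.0, 240)
```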
When translation is applied to both expanded object detection frames, in the embodiments of the present disclosure each of the two expanded object detection frames may be translated in the direction away from the other, yielding the processed object detection frames.
Here, the translation directions of the left and right detection frames can each be determined from the relative positions of the left and right cameras of the binocular pair: the right camera lies to the right, so the right detection frame is translated to the right (i.e., away from the left detection frame); the left camera lies to the left, so the left detection frame is translated to the left (i.e., away from the right detection frame).
Here too, the translation may incorporate a translation distance; the details follow the description above and are not repeated.
The image processing method provided by the embodiments of the present disclosure overcomes the inaccurate recognition results caused by the differing baselines of different binocular cameras and is highly robust, so it can be widely applied in various technical fields.
First, the above image processing method can be applied to terminal control, as shown in FIG. 3, through the following steps:
S301: Acquire a set of face images captured of a target face by a binocular camera, the set including a first face image captured by a first camera of the binocular camera and a second face image captured by a second camera of the binocular camera;
S302: Obtain, through the image processing method above, a recognition result corresponding to the set of face images, the recognition result including whether the target face is a real face;
S303: In response to the recognition result including that the target face is a real face, and the person corresponding to the target face passing identity authentication, control the terminal to perform a specified operation.
Here, when the image processing method above determines that the target face corresponding to a set of face images is a real face, and identity authentication of the person corresponding to that face succeeds, the terminal can be controlled to perform the specified operation; otherwise, the terminal may refuse to perform the specified operation and issue an alarm.
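The control flow of steps S301 to S303 can be sketched as follows. The liveness check and authentication steps are injected as callables because their implementations (the image processing method above and the identity check) are described elsewhere in the disclosure; all names here are illustrative:

```python
def control_terminal(left_img, right_img,
                     is_real_face, authenticate, perform_op, alarm):
    """Terminal control flow of steps S301-S303.

    is_real_face: applies the image processing method above to the pair
    of face images; authenticate: identity check for the person.
    The specified operation (unlock, open the gate, pay, ...) runs only
    when both the liveness check and the identity check succeed.
    """
    if is_real_face(left_img, right_img) and authenticate():
        perform_op()   # S303: e.g. unlock the terminal or open the gate
        return True
    alarm()            # refuse the specified operation and raise an alert
    return False

# A spoofed face is rejected even if authentication would have succeeded:
print(control_terminal(None, None,
                       is_real_face=lambda a, b: False,
                       authenticate=lambda: True,
                       perform_op=lambda: None,
                       alarm=lambda: None))  # False
```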
In the embodiments of the present disclosure, when this terminal control method is applied to different terminals, the specified operations performed may differ accordingly. The terminal here may be a user terminal, a gate device terminal, a payment terminal, or the like.
For example, in an unlocking scenario, if the target face corresponding to the set of face images captured by the binocular camera of the user terminal is determined to be a real face and the identity of the corresponding person is legitimate, unlocking succeeds; if a non-real face or an illegitimate identity is detected, unlocking fails, in which case an unlock-failure notification can be returned to the user terminal to remind the user.
As another example, in a gate verification scenario, when the recognition result indicates a real face with a legitimate identity, the passage switch connected to the gate device terminal is opened, enabling automatic passage through the gate; if a non-real face or an illegitimate identity is detected, passage is denied.
It should be noted that the image processing method provided by the embodiments of the present disclosure can be applied not only to the unlocking and gate verification applications above, but also to other scenarios, such as pedestrian detection in video surveillance, or embedded in financial equipment for liveness detection in financial services, which are not detailed here.
Those skilled in the art can understand that, in the above methods of the specific embodiments, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, the embodiments of the present disclosure further provide an image processing apparatus corresponding to the image processing method and a terminal control apparatus corresponding to the terminal control method. Since the principles by which these apparatuses solve the problem are similar to those of the above methods, the implementation of the apparatuses may refer to the implementation of the methods, and repeated descriptions are omitted.
Referring to FIG. 4, a schematic diagram of an image processing apparatus provided by an embodiment of the present disclosure, the apparatus includes: an acquisition module 401, a detection module 402, an expansion module 403 and a determination module 404; wherein
the acquisition module 401 is configured to acquire two to-be-detected images obtained by capturing a target object with each camera of a binocular camera;
the detection module 402 is configured to perform target object detection on each of the two to-be-detected images to obtain an object detection frame of the target object in each to-be-detected image;
the expansion module 403 is configured to expand each object detection frame and to translate at least one expanded object detection frame to obtain a translated object detection frame;
the determination module 404 is configured to determine a recognition result for the target object based on at least the translated object detection frame.
In the embodiments of the present disclosure, target object detection focuses processing on the target object and preliminarily reduces the influence of the baseline. Considering further that recognizing the target object in the two to-be-detected images captured by a binocular camera requires the disparity formed by one module (corresponding to one binocular camera) to determine depth information about the target object, the embodiments of the present disclosure additionally expand and translate the object detection frames in coordination, so as to simulate the disparity of human eyes and obtain processed object detection frames for recognizing the target object. The embodiments of the present disclosure are highly general: different modules can share one set of pseudo-baselines to simulate human-eye disparity, improving the generalization ability of the modules and reducing the time cost of subsequent applications such as target object recognition.
In a possible implementation, the binocular camera includes a first camera and a second camera; when a single expanded object detection frame is translated, the expansion module 403 is configured to translate the at least one expanded object detection frame as follows: expand the object detection frame detected in the to-be-detected image captured by the first camera to obtain a to-be-translated detection frame; translate the to-be-translated detection frame in the direction away from the second camera to obtain the translated object detection frame.
In a possible implementation, the expansion module 403 is configured to translate the to-be-translated detection frame in the direction away from the second camera as follows: determine a translation distance based on the size information of the to-be-translated detection frame; move the to-be-translated detection frame by the translation distance in the direction away from the second camera to obtain the translated object detection frame.
In a possible implementation, the expansion module 403 is configured to determine the translation distance based on the size information of the to-be-translated detection frame as follows: determine the translation distance based on the width value in the size information of the to-be-translated detection frame and a preset translation coefficient.
In a possible implementation, when both expanded object detection frames are translated, the expansion module 403 is configured to translate the at least one expanded object detection frame as follows: translate each of the two expanded object detection frames in the direction away from the other object detection frame, obtaining the two translated object detection frames.
In a possible implementation, the expansion module 403 is configured to expand each object detection frame as follows: for each object detection frame, determine the corner position coordinates of that frame in its corresponding to-be-detected image; based on the determined corner position coordinates and a preset expansion ratio, expand the object detection frame to obtain the corresponding expanded object detection frame.
In a possible implementation, the determination module 404 is configured to determine the recognition result for the target object based on at least the translated object detection frame as follows: when a single expanded object detection frame is translated, determine the recognition result for the target object based on one translated object detection frame and one untranslated, expanded object detection frame; or, when both expanded object detection frames are translated, determine the recognition result for the target object based on the two translated object detection frames.
In a possible implementation, the target object is a target face; the determination module 404 is configured to determine the recognition result for the target object based on at least the translated object detection frame as follows: use a trained liveness detection neural network to recognize the target face on at least the translated object detection frame, and determine whether the target face corresponding to the object detection frame is a real face.
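A minimal sketch of how the processed frames might be fed to a liveness network follows. The cropping step, the network interface (a callable mapping a pair of crops to a real-face score in [0, 1]) and the 0.5 threshold are all assumptions for illustration, not the disclosure's exact pipeline:

```python
import numpy as np

def crop(img, box):
    """Crop a detection box from an image, clamped to the image bounds."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    h, w = img.shape[:2]
    return img[max(0, y1):min(h, y2), max(0, x1):min(w, x2)]

def is_real_face(liveness_net, left_img, right_img, left_box, right_box):
    """Run a trained liveness network on the two processed crops.

    liveness_net is assumed to return a real-face score in [0, 1];
    the 0.5 decision threshold is a placeholder.
    """
    score = liveness_net(crop(left_img, left_box), crop(right_img, right_box))
    return score > 0.5
```

Because the translation pushes the boxes partly outside the image, the clamping in `crop` matters in practice; a real implementation might instead pad the image before cropping.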
Referring to FIG. 5, a schematic diagram of a terminal control apparatus provided by an embodiment of the present disclosure, the apparatus includes: an acquisition module 501, a determination module 502 and a control module 503; wherein
the acquisition module 501 is configured to acquire a set of face images captured of a target face by the binocular camera, the set including a first face image captured by a first camera of the binocular camera and a second face image captured by a second camera of the binocular camera;
the determination module 502 is configured to obtain, through the image processing method above, a recognition result corresponding to the set of face images, the recognition result including whether the target face is a real face;
the control module 503 is configured to control the terminal to perform a specified operation in response to the recognition result including that the target face is a real face and the person corresponding to the target face passing identity authentication.
For descriptions of the processing flow of each module in the apparatuses and the interaction flow between the modules, reference may be made to the relevant descriptions in the above method embodiments, which are not detailed here.
An embodiment of the present disclosure further provides an electronic device. As shown in FIG. 6, a schematic structural diagram of the electronic device, it includes: a processor 601, a memory 602 and a bus 603. The memory 602 stores machine-readable instructions executable by the processor 601 (for example, execution instructions corresponding to the acquisition module 401, the detection module 402, the expansion module 403 and the determination module 404 in the image processing apparatus in FIG. 4). When the electronic device runs, the processor 601 and the memory 602 communicate via the bus 603, and the machine-readable instructions, when executed by the processor 601, perform the following processing: acquire two to-be-detected images obtained by capturing a target object with each camera of a binocular camera; perform target object detection on each of the two to-be-detected images to obtain an object detection frame of the target object in each to-be-detected image; expand each object detection frame and translate at least one expanded object detection frame to obtain a translated object detection frame; determine a recognition result for the target object based on at least the translated object detection frame.
In a possible implementation, the binocular camera includes a first camera and a second camera; in the instructions executed by the processor 601, when a single expanded object detection frame is translated, translating the at least one expanded object detection frame to obtain the translated object detection frame includes: expanding the object detection frame detected in the to-be-detected image captured by the first camera to obtain a to-be-translated detection frame; translating the to-be-translated detection frame in the direction away from the second camera to obtain the translated object detection frame.
In a possible implementation, in the instructions executed by the processor 601, translating the to-be-translated detection frame in the direction away from the second camera to obtain the translated object detection frame includes: determining a translation distance based on the size information of the to-be-translated detection frame; moving the to-be-translated detection frame by the translation distance in the direction away from the second camera to obtain the translated object detection frame.
In a possible implementation, in the instructions executed by the processor 601, determining the translation distance based on the size information of the to-be-translated detection frame includes: determining the translation distance based on the width value in the size information of the to-be-translated detection frame and a preset translation coefficient.
In a possible implementation, when both expanded object detection frames are translated, in the instructions executed by the processor 601, translating at least one expanded object detection frame to obtain the translated object detection frame includes: translating each of the two expanded object detection frames in the direction away from the other object detection frame, obtaining the two translated object detection frames.
In a possible implementation, in the instructions executed by the processor 601, expanding each object detection frame includes: for each object detection frame, determining the corner position coordinates of that frame in its corresponding to-be-detected image; based on the determined corner position coordinates and a preset expansion ratio, expanding the object detection frame to obtain the corresponding expanded object detection frame.
In a possible implementation, in the instructions executed by the processor 601, determining the recognition result for the target object based on at least the translated object detection frame includes: when a single expanded object detection frame is translated, determining the recognition result for the target object based on one translated object detection frame and one untranslated, expanded object detection frame; or, when both expanded object detection frames are translated, determining the recognition result for the target object based on the two translated object detection frames.
In a possible implementation, the target object is a target face; determining the recognition result for the target object based on at least the translated object detection frame includes: using a trained liveness detection neural network to recognize the target face on at least the translated object detection frame, and determining whether the target face corresponding to the object detection frame is a real face.
An embodiment of the present disclosure further provides an electronic device. As shown in FIG. 7, a schematic structural diagram of the electronic device, it includes: a processor 701, a memory 702 and a bus 703. The memory 702 stores machine-readable instructions executable by the processor 701 (for example, execution instructions corresponding to the acquisition module 501, the determination module 502 and the control module 503 in the terminal control apparatus in FIG. 5). When the electronic device runs, the processor 701 and the memory 702 communicate via the bus 703, and the machine-readable instructions, when executed by the processor 701, perform the following processing: acquire a set of face images captured of a target face by the binocular camera, the set including a first face image captured by a first camera of the binocular camera and a second face image captured by a second camera of the binocular camera; obtain, through the image processing method described in the above embodiments, a recognition result corresponding to the set of face images, the recognition result including whether the target face is a real face; in response to the recognition result including that the target face is a real face, and the person corresponding to the target face passing identity authentication, control the terminal to perform a specified operation.
For the specific execution process of the above instructions, reference may be made to the steps of the methods described in the embodiments of the present disclosure, which are not repeated here.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the image processing method and the terminal control method described in the above method embodiments are executed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the image processing method and the terminal control method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the steps of the image processing method and the terminal control method described in the above method embodiments, for which reference may be made to the above method embodiments, and which are not repeated here.
An embodiment of the present disclosure further provides a computer program which, when executed by a processor, implements any one of the methods of the foregoing embodiments. The computer program product may be implemented by hardware, software or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems and apparatuses described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here. In the several embodiments provided by the present disclosure, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical functional division, and there may be other divisions in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Further, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some communication interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solutions of the present disclosure, in essence, the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments described above are merely specific implementations of the present disclosure, intended to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features; such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (14)

  1. An image processing method, comprising:
    acquiring two images to be detected, each obtained by photographing a target object with a respective camera of a binocular camera;
    performing target object detection on each of the two images to be detected to obtain an object detection frame of the target object in each of the images to be detected;
    performing expansion processing on each of the object detection frames, and translating at least one expanded object detection frame to obtain a translated object detection frame; and
    determining a recognition result for the target object based at least on the translated object detection frame.
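The four claimed steps can be illustrated with a minimal sketch. Everything here is an assumption for illustration only, not the patented implementation: the `(x1, y1, x2, y2)` box representation, the `expand_box`/`translate_box` helpers, the symmetric 0.1 expansion ratio, and the 0.25 translation coefficient.

```python
# Illustrative sketch of the claimed expand-then-translate pipeline.
# Boxes are (x1, y1, x2, y2) tuples in image coordinates.

def expand_box(box, ratio):
    """Grow a detection box outward by `ratio` of its size on each side."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 - w * ratio, y1 - h * ratio, x2 + w * ratio, y2 + h * ratio)

def translate_box(box, dx):
    """Shift a box horizontally by dx pixels (the sign encodes direction)."""
    x1, y1, x2, y2 = box
    return (x1 + dx, y1, x2 + dx, y2)

def process(left_box, right_box, ratio=0.1, coeff=0.25):
    """Expand both detection boxes, then translate the left-camera box
    away from the second camera (claims 3-4 describe a width-based
    translation distance; the leftward direction is assumed)."""
    left = expand_box(left_box, ratio)
    right = expand_box(right_box, ratio)
    dx = -(left[2] - left[0]) * coeff  # move away from the other camera
    return translate_box(left, dx), right
```

The returned pair would then feed the recognition step of the final claimed operation.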
  2. The image processing method according to claim 1, wherein the binocular camera comprises a first camera and a second camera; and
    in a case where one expanded object detection frame is translated, the translating at least one expanded object detection frame to obtain a translated object detection frame comprises:
    performing expansion processing on the object detection frame detected in the image to be detected captured by the first camera, to obtain a detection frame to be translated; and
    translating the detection frame to be translated in a direction away from the second camera, to obtain the translated object detection frame.
  3. The image processing method according to claim 2, wherein the translating the detection frame to be translated in a direction away from the second camera, to obtain the translated object detection frame, comprises:
    determining a translation distance based on size information of the detection frame to be translated; and
    moving the detection frame to be translated by the translation distance in the direction away from the second camera, to obtain the translated object detection frame.
  4. The image processing method according to claim 3, wherein the determining a translation distance based on the size information of the detection frame to be translated comprises:
    determining the translation distance based on a width value in the size information of the detection frame to be translated and a preset translation coefficient.
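Claims 3 and 4 together amount to a width-proportional horizontal shift. A minimal sketch, where the 0.25 coefficient and the sign convention (`-1` for leftward) are illustrative assumptions, since the claims do not fix concrete values:

```python
def translation_distance(box, coeff=0.25):
    """Translation distance = box width * preset coefficient (claim 4).
    The 0.25 coefficient is only an example value."""
    x1, _, x2, _ = box
    return (x2 - x1) * coeff

def translate_away(box, distance, away_sign):
    """Move the box by `distance` in the direction away from the other
    camera (claim 3); away_sign is -1 for leftward, +1 for rightward."""
    x1, y1, x2, y2 = box
    dx = away_sign * distance
    return (x1 + dx, y1, x2 + dx, y2)
```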
  5. The image processing method according to claim 1, wherein, in a case where two expanded object detection frames are translated, the translating at least one expanded object detection frame to obtain a translated object detection frame comprises:
    translating each of the two expanded object detection frames in a direction away from the other object detection frame, to obtain two translated object detection frames.
  6. The image processing method according to any one of claims 1-5, wherein the performing expansion processing on each of the object detection frames comprises:
    for each object detection frame:
    determining corner position coordinates of the object detection frame in the image to be detected corresponding to the object detection frame; and
    performing expansion processing on the object detection frame based on the determined corner position coordinates and a preset expansion ratio, to obtain an expanded object detection frame corresponding to the object detection frame.
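The corner-based expansion of claim 6 could look like the following sketch. The symmetric per-side growth and the clamping to the image bounds are illustrative choices; the claim itself only specifies corner coordinates and a preset expansion ratio:

```python
def expand_by_corners(corners, ratio, img_w, img_h):
    """Expand a box given its top-left and bottom-right corner
    coordinates ((x1, y1), (x2, y2)) by `ratio` per side (claim 6).
    Clamping to the image rectangle is an added assumption."""
    (x1, y1), (x2, y2) = corners
    w, h = x2 - x1, y2 - y1
    nx1 = max(0, x1 - w * ratio)
    ny1 = max(0, y1 - h * ratio)
    nx2 = min(img_w, x2 + w * ratio)
    ny2 = min(img_h, y2 + h * ratio)
    return ((nx1, ny1), (nx2, ny2))
```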
  7. The image processing method according to any one of claims 1-6, wherein the determining a recognition result for the target object based at least on the translated object detection frame comprises:
    in a case where one expanded object detection frame is translated, determining the recognition result for the target object based on the one translated object detection frame and one expanded object detection frame that has not been translated; or
    in a case where two expanded object detection frames are translated, determining the recognition result for the target object based on the two translated object detection frames.
  8. The image processing method according to any one of claims 1-7, wherein the target object is a target face, and the determining a recognition result for the target object based at least on the translated object detection frame comprises:
    performing target face recognition on at least the translated object detection frame using a trained liveness detection neural network, to determine whether the target face corresponding to the object detection frame is a real face.
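The liveness step of claim 8 can be sketched as a crop-and-classify call. The claim does not fix a network architecture, so the `liveness_net` callable, the nested-list image, and the 0.5 threshold are all placeholders for whatever trained model an implementation would use:

```python
def is_real_face(image, box, liveness_net, threshold=0.5):
    """Crop the translated detection box from the image and run an
    (assumed) liveness network returning a real-face probability."""
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    crop = [row[x1:x2] for row in image[y1:y2]]  # image as rows of pixels
    return liveness_net(crop) >= threshold
```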
  9. A terminal control method, wherein the terminal is provided with a binocular camera, and the method comprises:
    acquiring a group of face images captured of a target face by the binocular camera, the group of face images comprising a first face image captured by a first camera of the binocular camera and a second face image captured by a second camera of the binocular camera;
    obtaining, by the image processing method according to any one of claims 1-8, a recognition result corresponding to the group of face images, the recognition result comprising whether the target face is a real face; and
    in response to the recognition result comprising that the target face is a real face and the person corresponding to the target face passing identity authentication, controlling the terminal to perform a specified operation.
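The control flow of claim 9 reduces to a conjunction: the specified operation runs only when the liveness result AND identity authentication both pass. A sketch with assumed callables (`detect_liveness`, `authenticate`, and `perform_op` are placeholders not named in the claim):

```python
def control_terminal(face_images, detect_liveness, authenticate, perform_op):
    """Gate the terminal operation on liveness and identity (claim 9)."""
    is_real = detect_liveness(face_images)   # result of the claims 1-8 pipeline
    if is_real and authenticate(face_images):
        perform_op()                          # e.g. unlock the terminal
        return True
    return False
```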
  10. An image processing apparatus, comprising:
    an acquisition module configured to acquire two images to be detected, each obtained by photographing a target object with a respective camera of a binocular camera;
    a detection module configured to perform target object detection on each of the two images to be detected to obtain an object detection frame of the target object in each of the images to be detected;
    an expansion module configured to perform expansion processing on each of the object detection frames and translate at least one expanded object detection frame to obtain a translated object detection frame; and
    a determination module configured to determine a recognition result for the target object based at least on the translated object detection frame.
  11. A terminal control apparatus, comprising:
    an acquisition module configured to acquire a group of face images captured of a target face by a binocular camera, the group of face images comprising a first face image captured by a first camera of the binocular camera and a second face image captured by a second camera of the binocular camera;
    a determination module configured to obtain, by the image processing method according to any one of claims 1-8, a recognition result corresponding to the group of face images, the recognition result comprising whether the target face is a real face; and
    a control module configured to control the terminal to perform a specified operation in response to the recognition result comprising that the target face is a real face and the person corresponding to the target face passing identity authentication.
  12. An electronic device, comprising a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the memory through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the image processing method according to any one of claims 1-8 or the steps of the terminal control method according to claim 9.
  13. A computer-readable storage medium having a computer program stored thereon, wherein, when the computer program is run by an electronic device, the electronic device performs the steps of the image processing method according to any one of claims 1-8 or the steps of the terminal control method according to claim 9.
  14. A computer program, comprising computer-readable code which, when executed in an electronic device, causes a processor in the electronic device to perform the steps of the image processing method according to any one of claims 1-8 or the steps of the terminal control method according to claim 9.
PCT/CN2021/121457 2020-11-30 2021-09-28 Image processing method and apparatus, and terminal control method and apparatus WO2022111044A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011377063.0 2020-11-30
CN202011377063.0A CN112560592A (en) 2020-11-30 2020-11-30 Image processing method and device, and terminal control method and device

Publications (1)

Publication Number Publication Date
WO2022111044A1 true WO2022111044A1 (en) 2022-06-02

Family

ID=75046804

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/121457 WO2022111044A1 (en) 2020-11-30 2021-09-28 Image processing method and apparatus, and terminal control method and apparatus

Country Status (2)

Country Link
CN (1) CN112560592A (en)
WO (1) WO2022111044A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117523431A (en) * 2023-11-17 2024-02-06 中国科学技术大学 Firework detection method and device, electronic equipment and storage medium

Families Citing this family (4)

CN112560592A (en) * 2020-11-30 2021-03-26 深圳市商汤科技有限公司 Image processing method and device, and terminal control method and device
CN113159161A (en) * 2021-04-16 2021-07-23 深圳市商汤科技有限公司 Target matching method and device, equipment and storage medium
CN112949661B (en) * 2021-05-13 2021-08-06 北京世纪好未来教育科技有限公司 Detection frame self-adaptive external expansion method and device, electronic equipment and storage medium
CN113392800A (en) * 2021-06-30 2021-09-14 浙江商汤科技开发有限公司 Behavior detection method and device, computer equipment and storage medium

Citations (4)

US20080037838A1 (en) * 2006-08-11 2008-02-14 Fotonation Vision Limited Real-Time Face Tracking in a Digital Image Acquisition Device
CN106127170A (en) * 2016-07-01 2016-11-16 重庆中科云丛科技有限公司 A kind of merge the training method of key feature points, recognition methods and system
CN110619656A (en) * 2019-09-05 2019-12-27 杭州宇泛智能科技有限公司 Face detection tracking method and device based on binocular camera and electronic equipment
CN112560592A (en) * 2020-11-30 2021-03-26 深圳市商汤科技有限公司 Image processing method and device, and terminal control method and device

Family Cites Families (1)

CN107813310B (en) * 2017-11-22 2020-10-20 浙江优迈德智能装备有限公司 Multi-gesture robot control method based on binocular vision


Also Published As

Publication number Publication date
CN112560592A (en) 2021-03-26

Similar Documents

Publication Publication Date Title
WO2022111044A1 (en) Image processing method and apparatus, and terminal control method and apparatus
US10956714B2 (en) Method and apparatus for detecting living body, electronic device, and storage medium
US10896518B2 (en) Image processing method, image processing apparatus and computer readable storage medium
CN108764091B (en) Living body detection method and apparatus, electronic device, and storage medium
TWI554976B (en) Surveillance systems and image processing methods thereof
TW505892B (en) System and method for promptly tracking multiple faces
WO2020063100A1 (en) Augmented reality image display method and apparatus, and device
WO2021027537A1 (en) Method and apparatus for taking identification photo, device and storage medium
WO2021147418A1 (en) Image dehazing method and apparatus, device and computer storage medium
CN106981078B (en) Sight line correction method and device, intelligent conference terminal and storage medium
TW202026948A (en) Methods and devices for biological testing and storage medium thereof
TWI669664B (en) Eye state detection system and method for operating an eye state detection system
WO2018082389A1 (en) Skin colour detection method and apparatus, and terminal
WO2020024737A1 (en) Method and apparatus for generating negative sample of face recognition, and computer device
WO2023011013A1 (en) Splicing seam search method and apparatus for video image, and video image splicing method and apparatus
CN111814564A (en) Multispectral image-based living body detection method, device, equipment and storage medium
KR102338984B1 (en) System for providing 3D model augmented reality service using AI and method thereof
CN114615480A (en) Projection picture adjusting method, projection picture adjusting device, projection picture adjusting apparatus, storage medium, and program product
CN113642639B (en) Living body detection method, living body detection device, living body detection equipment and storage medium
WO2021238163A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN109726613B (en) Method and device for detection
US11620759B2 (en) Systems and methods for machine learning enhanced image registration
WO2024001617A1 (en) Method and apparatus for identifying behavior of playing with mobile phone
CN111383255B (en) Image processing method, device, electronic equipment and computer readable storage medium
US20210150745A1 (en) Image processing method, device, electronic apparatus, and computer readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21896546; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 20.09.2023))
122 Ep: pct application non-entry in european phase (Ref document number: 21896546; Country of ref document: EP; Kind code of ref document: A1)