CN109447049B - Light source quantitative design method and stereoscopic vision system - Google Patents

Light source quantitative design method and stereoscopic vision system

Info

Publication number
CN109447049B
CN109447049B (application CN201811629354.7A)
Authority
CN
China
Prior art keywords
face
depth
light source
image
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811629354.7A
Other languages
Chinese (zh)
Other versions
CN109447049A (en)
Inventor
彭莎
刘关松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haowei Technology Wuhan Co ltd
Original Assignee
Haowei Technology Wuhan Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haowei Technology Wuhan Co ltd filed Critical Haowei Technology Wuhan Co ltd
Priority to CN201811629354.7A
Publication of CN109447049A
Application granted
Publication of CN109447049B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a light source quantitative design method and a stereoscopic vision system. N depth images corresponding to N groups of face images are first calculated from the N groups of face images. The faces in the N images are then aligned to obtain normalization parameters, which are used to acquire an average depth image of the faces of the N depth images. A depth gradient image is obtained from the average depth image, and the distribution density of the dot-matrix light spots emitted by the light source is constrained according to the depth gradient image, so that the spot distribution better fits the face and the accuracy of face depth measurement by the stereoscopic vision system is improved. The illumination range of the dot-matrix light spots is further constrained by the size of the average depth image, so that the spot density in regions outside the face is reduced and energy consumption is lowered.

Description

Light source quantitative design method and stereoscopic vision system
Technical Field
The invention relates to the technical field of visual image processing, in particular to a quantitative design method of a light source and a stereoscopic vision system.
Background
Binocular stereo vision is an important form of machine vision. Based on the parallax principle, it acquires two images of a visual scene from different positions with imaging devices and obtains the three-dimensional geometric information of an object by computing the positional offset between corresponding points in the images. By fusing the two original images and observing their differences, a distinct sense of depth is obtained; correspondences between features are established so that the projections of the same physical point in space are matched across the images, and the disparity image is obtained by computing the differences.
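For background, the relation between disparity and depth in a rectified binocular setup is the standard stereo-geometry formula (quoted here for illustration; it does not appear in the patent text):

$$Z = \frac{f \cdot B}{d}$$

where $Z$ is the depth of a scene point, $f$ is the focal length, $B$ is the baseline between the two cameras, and $d$ is the disparity, i.e. the positional offset between the corresponding image points.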
Depending on the light source, binocular stereo vision can be divided into active stereo vision (which actively provides a light source) and passive stereo vision (which relies on ambient light). Active stereo vision adds a pseudo-random light source that emits dot-matrix light spots to texture the scene. Compared with passive stereo vision, it compensates for scenes with repetitive or weak texture: the pseudo-random light source enhances the texture of the scene, reduces the complexity of the stereo matching algorithm, and improves the accuracy of the depth image.
Existing pseudo-random light sources are dot-matrix lasers with uniform spot intensity and uniform spot density, which are suitable for providing texture information for active stereo matching and ranging in arbitrary, unknown scenes. For a known target such as a human face, however, a uniform spot distribution is not tailored to the face geometry, so part of the emitted energy falls on regions where it contributes little to measurement accuracy.
Disclosure of Invention
The invention aims to provide a light source quantitative design method and a stereoscopic vision system, which can reduce the energy consumption of the stereoscopic vision system and improve the accuracy of face recognition.
In order to achieve the above object, the present invention provides a light source quantitative design method for a stereoscopic vision system, wherein the light source is configured to emit dot-matrix light spots to illuminate a face, and a plurality of camera modules shoot the face actively illuminated by the light source to obtain a set of face images corresponding to different shooting angles of the face. The method comprises:
providing N groups of face images and respectively calculating N depth images corresponding to the N groups of face images, wherein N is greater than or equal to 1 and each group of face images comprises at least two face images corresponding to the same face;
aligning the human faces in the N depth images to obtain an average depth image;
obtaining a depth gradient image according to the average depth image;
and quantitatively designing the distribution density of the lattice light spots emitted by the light source according to the depth gradient image.
Optionally, the distribution density of the lattice light spots is positively correlated with the gradient value in the depth gradient map.
Optionally, the step of aligning the faces of the N depth images includes:
recording one face image with the same shooting angle in each group of face images as an image to be transformed, and marking k key pixels of each image to be transformed, wherein k is greater than or equal to 4;
obtaining a normalization parameter corresponding to each image to be transformed according to k key pixels of each image to be transformed;
and normalizing the N depth images by using the normalization parameters so as to align the face in each depth image.
Optionally, the k key pixels are pixels where contours of two eyes, a nose, and a mouth of the human face are located, respectively.
Optionally, the normalization parameter includes one or more of a scaling coefficient, a translation coefficient, and a rotation coefficient.
Optionally, the depth value of each pixel on the average depth image is an average depth value of the N normalized depth images on the corresponding pixel.
Optionally, the average depth image is convolved by a gradient operator to obtain the depth gradient image.
Optionally, after the distribution density of the dot matrix light spots emitted by the light source is quantitatively designed, the illumination range of the dot matrix light spots emitted by the light source is also constrained according to the size of the average depth image.
The invention also provides a stereoscopic vision system comprising a light source and a plurality of camera modules. The light source is configured to emit dot-matrix light spots to illuminate a face, and the camera modules shoot the face actively illuminated by the light source to obtain a set of face images corresponding to different shooting angles of the face. The distribution density of the dot-matrix light spots is quantitatively designed by the light source quantitative design method described above.
Optionally, the distribution density of the lattice light spots is positively correlated with a gradient value in a depth gradient map.
In the light source quantitative design method and the stereoscopic vision system provided by the invention, N depth images corresponding to N groups of face images are first calculated from the N groups of face images. The faces of the N groups of face images are then aligned to obtain normalization parameters, with which the average depth image of the faces of the N depth images is acquired. A depth gradient image is then obtained from the average depth image, and the distribution density of the dot-matrix light spots emitted by the light source is constrained according to the depth gradient image, so that the spot distribution better fits the face and the accuracy of face depth measurement by the stereoscopic vision system is improved. Furthermore, the illumination range of the dot-matrix light spots is constrained according to the size of the average depth image, so that the spot density in regions outside the face is reduced and energy consumption is lowered.
Drawings
Fig. 1 is a flowchart of a quantitative design method of a light source according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a depth gradient image according to an embodiment of the present invention.
Detailed Description
The following describes embodiments of the present invention in more detail with reference to the schematic drawings. The advantages and features of the present invention will become apparent from the following description and the claims. It should be noted that the drawings are in a greatly simplified form and are not drawn to precise scale; they are provided only to facilitate a convenient and clear description of the embodiments of the present invention.
As shown in fig. 1, the present invention provides a light source quantitative design method for a stereoscopic vision system. The light source is configured to emit dot-matrix light spots to illuminate a face, and a plurality of camera modules shoot the face actively illuminated by the light source to obtain a set of face images corresponding to different shooting angles of the face. The method comprises:
S1: providing N groups of face images and respectively calculating N depth images corresponding to the N groups of face images, wherein N is greater than or equal to 1 and each group of face images comprises at least two face images corresponding to the same face;
S2: aligning the faces of the N depth images to obtain an average face, and acquiring an average depth image of the average face;
S3: obtaining a depth gradient image according to the average depth image;
S4: quantitatively designing the distribution density of the dot-matrix light spots emitted by the light source according to the depth gradient image.
In this embodiment, the stereoscopic vision system is a binocular active stereo vision system applied to the face recognition function of a mobile device (such as a mobile phone or a tablet computer). The system has a plurality of camera modules that shoot the same face to obtain a set of face images corresponding to different shooting angles of the face, and the face is actively illuminated by the light source of the stereoscopic vision system during shooting. Optionally, the light source of the stereoscopic vision system may be a dot-matrix laser that forms dot-matrix light spots, the spots may be infrared, and the camera modules may be infrared camera modules.
Based on this binocular active stereo vision system, step S1 is executed first: N (N is greater than or equal to 1) groups of face images are collected from a face sample library (shot by the stereoscopic vision system). In this embodiment, each group contains two face images of the same face shot from different viewing angles, i.e., each group is a pair of face images corresponding to the same face. A depth image is then calculated for each group, yielding N depth images in total.
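The patent does not specify which stereo-matching algorithm computes the depth images in step S1. As a minimal, hedged sketch, the following Python snippet uses OpenCV's semi-global block matching on one rectified face image pair; the camera parameters (focal_px, baseline_m) and the matcher settings are illustrative assumptions, not values from the patent.

import cv2
import numpy as np

def depth_from_stereo_pair(left_gray, right_gray, focal_px=1000.0, baseline_m=0.05):
    """Compute a depth map (metres) for one rectified face image pair (step S1)."""
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=64,   # must be divisible by 16
        blockSize=7,
    )
    # StereoSGBM returns a fixed-point disparity map scaled by 16.
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    disparity[disparity <= 0] = np.nan            # mark invalid matches
    return focal_px * baseline_m / disparity      # Z = f * B / d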
Further, different groups of face images in the sample library may differ in image size, face size, and the positions of the facial features. To eliminate the differences in position, size, and orientation of the face regions among the groups, the face images are normalized by coordinate transformation, so that all face images in every group are normalized to a uniform standard, namely the average face.
Specifically, step S2 is executed to align the faces of the N depth images and obtain the average face, i.e., the average depth image. One face image with the same shooting angle is selected from each group and recorded as an image to be transformed; for example, the face image shot from the left viewing angle (or the one shot from the right viewing angle) is selected from every group, giving N images to be transformed. Then k key pixels (k greater than or equal to 4) are marked on each image to be transformed; the k key pixels are the pixels where the contours of the two eyes, the nose, and the mouth of the face lie, i.e., every facial organ except the ears has at least one key pixel on its contour. Of course, several key pixels may be placed on the contour of each organ to improve the alignment accuracy.
Optionally, in this embodiment the N images to be transformed are aligned as follows. First, the maximum and minimum abscissa and the maximum and minimum ordinate among the k key pixels of each image to be transformed are obtained, giving a scaling coefficient along the abscissa direction and a scaling coefficient along the ordinate direction. Next, the mean abscissa and the mean ordinate of the k key pixels of each image to be transformed are obtained, giving a translation coefficient along the abscissa direction and a translation coefficient along the ordinate direction. The scaling and translation coefficients along the two directions form the normalization parameters. Finally, the N depth images are normalized with these parameters so that the face in each depth image is aligned; after the alignment step every depth image has the same size, and the k key pixels of each depth image are located at the same positions as the k key pixels of the average face. Of course, the normalization parameters may also include rotation coefficients, and the normalization may be a rigid transformation, an affine transformation, or the like; the invention is not limited in this respect.
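The following sketch illustrates the normalization step described above. The patent derives the scaling coefficients from the coordinate extrema of the k key pixels and the translation coefficients from their coordinate means, but does not fix the exact mapping onto the average-face frame; mapping the key-pixel centroid to the image centre, and the names key_pixels and target_size, are assumptions made here for illustration.

import cv2
import numpy as np

def normalization_params(key_pixels, target_size):
    """key_pixels: (k, 2) array of (x, y) coordinates, k >= 4; target_size: (W, H)."""
    xs, ys = key_pixels[:, 0], key_pixels[:, 1]
    scale_x = target_size[0] / float(xs.max() - xs.min())   # scaling coefficients
    scale_y = target_size[1] / float(ys.max() - ys.min())
    shift_x = xs.mean() * scale_x - target_size[0] / 2.0    # translation coefficients
    shift_y = ys.mean() * scale_y - target_size[1] / 2.0
    return scale_x, scale_y, shift_x, shift_y

def normalize_depth(depth, params, target_size):
    """Resample one depth image so that the faces of all depth images align."""
    scale_x, scale_y, shift_x, shift_y = params
    M = np.float32([[scale_x, 0.0, -shift_x],
                    [0.0, scale_y, -shift_y]])
    return cv2.warpAffine(depth.astype(np.float32), M, target_size)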
Further, corresponding pixels of the N depth images are averaged to obtain the average face. In this embodiment, the depth values of corresponding pixels of the N normalized depth images are averaged to obtain the average depth image of the average face; that is, the depth value of each pixel of the average depth image is the average depth value of the N depth images at the corresponding pixel. The calculation of the average depth image may be represented by the following formula:
$$\bar{I}(p) = \frac{1}{N}\sum_{s=1}^{N} I_s(p)$$

where $\bar{I}(p)$ is the depth distribution of the average depth image and $I_s(p)$ is the depth distribution of the s-th normalized depth image.
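As a direct translation of the formula above, the average depth image can be computed as a per-pixel mean over the N aligned depth maps (a minimal sketch; invalid pixels are assumed to be marked as NaN):

import numpy as np

def average_depth_image(aligned_depths):
    """aligned_depths: list of N equally sized depth maps after alignment."""
    stack = np.stack(aligned_depths, axis=0)   # shape (N, H, W)
    return np.nanmean(stack, axis=0)           # ignore NaN (invalid) pixels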
After the average depth image is obtained, step S3 is executed: the average depth image may be convolved with a gradient operator to obtain the depth gradient image of the average face, as shown in fig. 2. Step S4 is then executed to design the distribution density of the dot-matrix light spots emitted by the light source according to the depth gradient image. In this embodiment, the distribution density of the dot-matrix light spots is positively correlated with the gradient value in the depth gradient map: where the gradient value is higher, the spot density is set larger; where the gradient value is lower, the spot density is set smaller. Reducing the spot density in low-gradient regions lowers the energy consumption, while increasing the spot density in high-gradient regions improves the recognition accuracy.
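A minimal sketch of steps S3 and S4 follows. The patent does not name the gradient operator or the exact density mapping; the Sobel operator and a linear, positively correlated mapping of gradient magnitude to relative spot density are assumptions used here for illustration.

import cv2
import numpy as np

def depth_gradient(avg_depth):
    """Convolve the average depth image with a gradient operator (Sobel here)."""
    d = np.nan_to_num(avg_depth).astype(np.float32)
    gx = cv2.Sobel(d, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(d, cv2.CV_32F, 0, 1, ksize=3)
    return np.hypot(gx, gy)                     # gradient magnitude

def spot_density(gradient, min_density=0.1, max_density=1.0):
    """Map gradient values to a relative spot density, positively correlated."""
    g = (gradient - gradient.min()) / (np.ptp(gradient) + 1e-9)
    return min_density + (max_density - min_density) * g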
Optionally, in this embodiment an effective recognition region may be determined from the calculated average depth image. After the distribution density of the dot-matrix light spots emitted by the light source of the stereoscopic vision system is quantitatively designed, the illumination range of the dot-matrix light spots emitted by the light source may also be quantitatively designed according to this effective recognition region (the region occupied by the face), so that the dot-matrix light spots emitted by the light source do not become invalid spots.
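One way to realise this constraint, assuming the effective recognition region is taken to be the set of pixels with valid (finite, non-zero) depth in the average depth image, is to zero the spot density outside that region; the threshold choice is illustrative, not specified by the patent.

import numpy as np

def constrain_illumination(density, avg_depth):
    """Suppress spot density outside the effective recognition region (the face)."""
    face_mask = np.isfinite(avg_depth) & (avg_depth > 0)
    return np.where(face_mask, density, 0.0)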
Based on this, the embodiment further provides a stereoscopic vision system comprising a light source and a plurality of camera modules. The light source is configured to emit dot-matrix light spots to illuminate a face, and the camera modules shoot the face actively illuminated by the light source to obtain a set of face images corresponding to different shooting angles of the face. The distribution density of the dot-matrix light spots is quantitatively designed by the light source quantitative design method described above. In this embodiment, the distribution density of the dot-matrix light spots is positively correlated with the gradient value in the depth gradient map, and the illumination range of the light source can also be determined from the average depth image: the region illuminated by the dot-matrix light spots is kept within the range of the average face as far as possible (the spots fall within the effective recognition region). This prevents the spots emitted by the light source from being projected onto regions other than the face when the stereoscopic vision system is in use, where they would become invalid spots that waste energy without improving the recognition accuracy.
In summary, in the light source quantitative design method and the stereoscopic vision system provided by the embodiments of the present invention, N depth images corresponding to N groups of face images are first calculated from the N groups of face images. The faces of the N images are then aligned to obtain normalization parameters, with which the average depth image of the faces of the N depth images is acquired. A depth gradient image is obtained from the average depth image, and the distribution density of the dot-matrix light spots emitted by the light source is constrained according to the depth gradient image, so that the spot distribution better fits the face and the accuracy of face depth measurement by the stereoscopic vision system is improved. The illumination range of the dot-matrix light spots is further constrained by the size of the average depth image, so that the spot density in regions outside the face is reduced and energy consumption is lowered.
The above description is only a preferred embodiment of the present invention, and does not limit the present invention in any way. It will be understood by those skilled in the art that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A method for quantitatively designing a light source of a stereoscopic vision system, the light source being used for emitting dot matrix spots to illuminate a face and shooting the face actively illuminated by the light source through a plurality of camera modules to obtain a set of face images corresponding to different shooting angles of the face, the method comprising:
providing N groups of face images and respectively calculating N depth images corresponding to the N groups of face images, wherein N is greater than or equal to 1, and each group of face images comprises at least two face images corresponding to the same face;
aligning the human faces in the N depth images to obtain an average depth image;
obtaining a depth gradient image according to the average depth image;
quantitatively designing the distribution density of the dot matrix light spots emitted by the light source according to the depth gradient image;
wherein the distribution density of the lattice light spots is positively correlated with the gradient values in the depth gradient map.
2. The method for quantitatively designing a light source according to claim 1, wherein the step of aligning the faces of the N depth images comprises:
recording one face image with the same shooting angle in each group of face images as an image to be transformed, and marking k key pixels of each image to be transformed, wherein k is greater than or equal to 4;
obtaining a normalization parameter corresponding to each image to be transformed according to k key pixels of each image to be transformed;
and normalizing the N depth images by using the normalization parameters so as to align the face in each depth image.
3. The method according to claim 2, wherein the k key pixels are pixels where the outlines of the two eyes, nose and mouth of the human face are located.
4. The method of claim 2, wherein the normalization parameters comprise one or more of scaling coefficients, translation coefficients, and rotation coefficients.
5. The method of claim 2, wherein the depth value of each pixel in the average depth image is an average of the depths of the N normalized depth images in the corresponding pixels.
6. The method of claim 1, wherein the average depth image is convolved with a gradient operator to obtain the depth gradient image.
7. The method for quantitatively designing a light source according to any one of claims 1 to 6, wherein after quantitatively designing the distribution density of the lattice spots emitted by the light source, the illumination range of the lattice spots emitted by the light source is further constrained according to the size of the average depth image.
8. A stereoscopic vision system, comprising a light source and a plurality of camera modules, wherein the light source is used for emitting dot matrix light spots to illuminate a face, the plurality of camera modules are used for shooting the face actively illuminated by the light source to obtain a set of face images corresponding to different shooting angles of the face, and the distribution density of the dot matrix light spots is quantitatively designed by using the quantitative design method of the light source according to any one of claims 1 to 7.
9. The stereo vision system of claim 8, wherein the distribution density of the lattice of spots is positively correlated to the gradient values in a depth gradient map.
CN201811629354.7A 2018-12-28 2018-12-28 Light source quantitative design method and stereoscopic vision system Active CN109447049B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811629354.7A CN109447049B (en) 2018-12-28 2018-12-28 Light source quantitative design method and stereoscopic vision system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811629354.7A CN109447049B (en) 2018-12-28 2018-12-28 Light source quantitative design method and stereoscopic vision system

Publications (2)

Publication Number Publication Date
CN109447049A CN109447049A (en) 2019-03-08
CN109447049B (en) 2020-07-31

Family

ID=65538627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811629354.7A Active CN109447049B (en) 2018-12-28 2018-12-28 Light source quantitative design method and stereoscopic vision system

Country Status (1)

Country Link
CN (1) CN109447049B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105370B (en) * 2019-12-09 2023-10-20 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, electronic device, and readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866497A (en) * 2010-06-18 2010-10-20 北京交通大学 Binocular stereo vision based intelligent three-dimensional human face rebuilding method and system
CN103035026B (en) * 2012-11-24 2015-05-20 浙江大学 Maxim intensity projection method based on enhanced visual perception
KR101413244B1 (en) * 2013-02-19 2014-06-30 한국과학기술연구원 Apparatus for 3-dimensional displaying using modified common viewing zone
CN105023010B (en) * 2015-08-17 2018-11-06 中国科学院半导体研究所 A kind of human face in-vivo detection method and system
CN106570899B (en) * 2015-10-08 2021-06-11 腾讯科技(深圳)有限公司 Target object detection method and device
CN107622526A (en) * 2017-10-19 2018-01-23 张津瑞 A kind of method that 3-D scanning modeling is carried out based on mobile phone facial recognition component

Also Published As

Publication number Publication date
CN109447049A (en) 2019-03-08

Similar Documents

Publication Publication Date Title
CN108764071B (en) Real face detection method and device based on infrared and visible light images
WO2019056988A1 (en) Face recognition method and apparatus, and computer device
CN111028295A (en) 3D imaging method based on coded structured light and dual purposes
CN110033407B (en) Shield tunnel surface image calibration method, splicing method and splicing system
WO2014044126A1 (en) Coordinate acquisition device, system and method for real-time 3d reconstruction, and stereoscopic interactive device
JP2008537190A (en) Generation of three-dimensional image of object by irradiating with infrared pattern
Yang et al. A uniform framework for estimating illumination chromaticity, correspondence, and specular reflection
EP3382645B1 (en) Method for generation of a 3d model based on structure from motion and photometric stereo of 2d sparse images
CN111382592B (en) Living body detection method and apparatus
CN110956114A (en) Face living body detection method, device, detection system and storage medium
US20120120196A1 (en) Image counting method and apparatus
EP2897101A1 (en) Visual perception matching cost on binocular stereo images
CN107595388A (en) A kind of near infrared binocular visual stereoscopic matching process based on witch ball mark point
JP6285686B2 (en) Parallax image generation device
CN109447049B (en) Light source quantitative design method and stereoscopic vision system
Kruger et al. In-factory calibration of multiocular camera systems
CN115035546A (en) Three-dimensional human body posture detection method and device and electronic equipment
CN111079470B (en) Method and device for detecting human face living body
JP2019091122A (en) Depth map filter processing device, depth map filter processing method and program
CN110909571B (en) High-precision face recognition space positioning method
WO2018056802A1 (en) A method for estimating three-dimensional depth value from two-dimensional images
Benalcazar et al. A 3D iris scanner from multiple 2D visible light images
CN111160233B (en) Human face in-vivo detection method, medium and system based on three-dimensional imaging assistance
JP5336325B2 (en) Image processing method
CN111145254B (en) Door valve blank positioning method based on binocular vision

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant