WO2023103145A1

WO2023103145A1 - Head pose truth value acquisition method, apparatus and device, and storage medium

Info

Publication number: WO2023103145A1
Application number: PCT/CN2022/071709
Authority: WO
Inventors: 周伟杰; 刘威; 袁淮; 吕晋; 周婷; 武红娇; 董德威; 李萌; 曹斌
Original assignee: 东软睿驰汽车技术（沈阳）有限公司
Priority date: 2021-12-09
Filing date: 2022-01-13
Publication date: 2023-06-15
Also published as: CN114220149A

Abstract

The present application discloses a head pose truth value acquisition method, apparatus and device, and a storage medium. A plurality of image acquisition devices collect images of a target object at the same moment, two-dimensional face key points annotated in the plurality of collected images are used as the data bases for three-dimensional reconstruction of the face key points of the target object, and a head pose truth value of the target object corresponding to an image is acquired. According to the present solution, the head pose truth value can be acquired without using a wearable device; therefore, the head pose truth value is not affected by a wearing angle. By means of the use of the plurality of image acquisition devices at the same moment and the three-dimensional reconstruction of the face key points, the accuracy of the acquired head pose truth value is ensured. Furthermore, when the head pose truth value is applied to the fields such as driver fatigue level analysis, virtual reality motion sensing games, commodity purchase intention analysis, and face verification, the data analysis result can be more accurate.

Description

A method, device, equipment and storage medium for obtaining the true value of head posture

This application requires submission of a Chinese patent application to the State Intellectual Property Office of the People's Republic of China on December 09, 2021, with the application number 202111501742.9 and the application name "A Method, Device, Equipment, and Storage Medium for Acquiring the True Value of Head Posture" priority, the entire contents of which are incorporated in this application by reference.

technical field

The present application relates to the technical field of image processing, and in particular to a method, device, equipment and storage medium for acquiring the true value of head posture.

Background technique

The true value of the head pose includes the yaw angle Yaw, the pitch angle Pitch and the roll angle Roll. Among them, the yaw angle, pitch angle and roll angle are the angles of rotation relative to the y-axis, x-axis and z-axis in the Euler angle vector coordinate system, respectively. The true value of head posture has applications in many fields, such as driver fatigue analysis, virtual reality somatosensory games, product purchase desire analysis, face verification and other fields.

At present, the true value of the head pose can be obtained through the sensors in the wearable device. However, the wearing angle of the wearable device easily affects the accuracy of the acquired true value of the head pose. Combined with the many application fields mentioned above, if the accuracy of the true value of the head pose is not good, it is easy to affect the accuracy of the results of the data analysis. For example, due to the poor accuracy of the acquired true value of the driver's head posture, the driver with a high degree of fatigue was mistakenly judged as not fatigued, and the voice reminder was not timely and accurate. It can be seen that improving the accuracy of obtaining the true value of the head pose is an urgent technical problem to be solved.

Contents of the invention

Based on the above problems, the present application provides a method, device, device and storage medium for acquiring the true value of head posture.

The embodiment of the application discloses the following technical solutions:

In the first aspect, the present application provides a method for obtaining the true value of the head pose, including:

Obtaining images of the target object collected by multiple image acquisition devices at the same time;

Marking the face key points in the images collected by each of the image capture devices among the plurality of image capture devices, and obtaining the face key point information corresponding to each of the image capture devices;

Based on the key point information of the face corresponding to each of the image acquisition devices and the parameters of each of the image acquisition devices, reconstruct the three-dimensional key point information of the face corresponding to each of the image acquisition devices;

Establishing a face coordinate system based on the three-dimensional key point information of the face corresponding to the target image acquisition device, where the target image acquisition device is one of the plurality of image acquisition devices;

According to the coordinate system of the target image acquisition device and the face coordinate system, obtain the true value of the head pose of the target object corresponding to the target image, the target image is the target image acquisition device at the time The image collected by the target object.

In an optional implementation manner, the obtaining the true value of the head pose of the target object corresponding to the target image according to the coordinate system of the target image acquisition device and the face coordinate system includes:

Obtaining a rotation matrix of the face coordinate system relative to the coordinate system of the target image capture device according to the face coordinate system and the target image capture device coordinate system;

A true value of the head pose of the target object corresponding to the target image is obtained according to the rotation matrix.

In an optional implementation manner, the establishment of the face coordinate system based on the 3D key point information of the face corresponding to the target image acquisition device includes:

Determining the face plane based on the three-dimensional key point information of the face corresponding to the target image acquisition device;

determining the normal vector of the face plane according to the face plane;

A human face coordinate system of the target object is established based on the human face plane and the normal vector of the human face plane.

In an optional implementation manner, the three-dimensional face key points corresponding to each of the image acquisition devices are reconstructed based on the key point information of the face corresponding to each of the image acquisition devices and the parameters of each of the image acquisition devices. point information, including:

According to the face key point information corresponding to the target image capture device, the face key point information corresponding to the reference image capture device, the internal parameters of the target image capture device, the internal parameters of the reference image capture device, and the target The external parameters between the image acquisition device and the reference image acquisition device are used to reconstruct the three-dimensional key point information of the face corresponding to the target image acquisition device through the triangulation reconstruction method; the reference image acquisition device is the plurality of images Any image acquisition device other than the target image acquisition device among the acquisition devices;

According to the 3D key point information of the face corresponding to the target image capture device and the external parameters, the 3D key point information of the face corresponding to the reference image capture device is obtained.

In an optional implementation, the method for obtaining the true value of the head pose further includes:

The image and head pose ground truths corresponding to each other are stored as image ground truth pairs.

In an optional implementation manner, the acquisition of images collected by multiple image acquisition devices on the target object at the same time includes:

providing a plurality of different lighting conditions to the space in which the target object is located;

acquiring images of the target object collected by the plurality of image acquisition devices at the same moment under the same illumination condition.

In an optional implementation manner, when the multiple image acquisition devices acquire images of the target object, the target object is sitting on a seat, and the seat is used to simulate a seat in a real vehicle environment; The setting parameters of the image acquisition device relative to the seat are determined according to the setting parameters of the simulation object in the real vehicle environment relative to the seat;

The simulated objects include at least one of the following:

A-pillar, B-pillar, instrument panel, front windshield or left side glass.

In an optional implementation manner, the multiple image acquisition devices include: an infrared camera and an RGB camera.

In an optional implementation manner, the internal parameters of the same type of the multiple image acquisition devices have different value ranges.

In the second aspect, the present application provides a device for obtaining the true value of the head posture, including:

An image acquisition module, configured to acquire images collected by multiple image acquisition devices on the target object at the same time;

A key point labeling module, configured to mark the key points of the face in the image captured by each of the image capture devices among the plurality of image capture devices, and obtain the key point information of the face corresponding to each of the image capture devices;

The three-dimensional key point reconstruction module is used to reconstruct the three-dimensional key point information of the face corresponding to each of the image acquisition devices based on the key point information of the face corresponding to each of the image acquisition devices and the parameters of each of the image acquisition devices;

A coordinate system establishment module, configured to establish a face coordinate system based on the three-dimensional key point information of the face corresponding to the target image acquisition device, where the target image acquisition device is one of the plurality of image acquisition devices;

A true value acquisition module, configured to obtain the true value of the head posture of the target object corresponding to the target image according to the coordinate system of the target image acquisition device and the face coordinate system, the target image being the target image The image collected by the collection device on the target object at the time.

In a third aspect, the present application provides a device for obtaining the true value of a head posture, including a processor and a memory; the memory is used to store a computer program; the processor is used to execute the computer program as provided in the first aspect according to the computer program. The method for obtaining the true value of the head pose.

In a fourth aspect, the present application provides a computer-readable storage medium for storing a computer program, and when the computer program is run by a processor, the method for obtaining the true value of the head posture as provided in the first aspect is executed.

Compared with the prior art, the present application has the following beneficial effects:

The present application provides a method for obtaining the true value of head posture. Use multiple image acquisition devices to collect images of the target object at the same time, use the key points of the two-dimensional face marked in the multiple captured images as the data basis for the three-dimensional reconstruction of the key points of the face of the target object, and then obtain the corresponding key points of the image. The ground truth of the head pose of the target object. Since this solution does not require the use of wearable devices to obtain the true value of the head pose, it is not affected by the wearing angle. Through the simultaneous use of multiple image acquisition devices and the 3D reconstruction of key points of the face, the accuracy of the acquired true value of the head pose is guaranteed. Furthermore, when the true value of the head posture is applied to areas such as driver fatigue analysis, virtual reality somatosensory games, product purchase desire analysis, face verification, etc., the data analysis results can be made more accurate.

Description of drawings

In order to more clearly illustrate the specific embodiments of the present application or the technical solutions in the prior art, the following will briefly introduce the accompanying drawings that need to be used in the description of the specific embodiments or prior art. Obviously, the accompanying drawings in the following description The figures show some implementations of the present application, and those skilled in the art can obtain other figures based on these figures without any creative effort.

FIG. 1 is a flow chart of a method for obtaining a true value of a head posture provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of a scene in which images of a target object are collected by multiple image collection devices provided in an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a device for obtaining a true value of a head pose provided by an embodiment of the present application.

Detailed ways

As described above, obtaining the true value of the head posture currently usually relies on the sensors of the wearable device. However, the angle at which a person wears the device may affect the accuracy of the obtained true head pose, which may affect the accuracy of the analysis results when the true head pose is used for data analysis. After research, the inventor proposes a technical solution to simultaneously collect images of people through multiple image acquisition devices, and use these images to obtain the true value of the head posture. This solution does not require people to wear wearable devices, and can obtain high-precision true head poses only by relying on images. Furthermore, the accuracy of the data analysis result obtained when the true value of the head posture is used for data analysis is ensured.

In order to enable those skilled in the art to better understand the solution of the application, the technical solution in the embodiment of the application will be clearly and completely described below in conjunction with the drawings in the embodiment of the application. Obviously, the described embodiment is only It is a part of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the scope of protection of this application.

Referring to FIG. 1 , the figure is a flow chart of a method for obtaining a true value of a head pose provided by an embodiment of the present application. The method shown in Figure 1 includes the following steps:

S101: Acquire images of a target object collected by multiple image collection devices at the same time.

In the embodiment of the present application, it is proposed that multiple image acquisition devices are used to acquire images of the target object at the same time. The target object refers to the subject whose true value of the head pose needs to be obtained. For example, when it is necessary to obtain the true value of Mr. A's head posture, Mr. A is used as the target object, and multiple image acquisition devices are used to acquire images of Mr. A at the same time.

The image acquisition device may be any device capable of capturing and forming images, such as an RGB camera. The type and model of the image acquisition device are not limited here. The multiple image acquisition devices are specifically two or more image acquisition devices. That is, when it is required to acquire images of the target object at the same time, at least two image acquisition devices are used. The setting parameters of multiple image acquisition devices are different or not completely the same. Setting parameters include position, height and angle, etc.

S102: Mark the key points of the face in the images captured by each of the multiple image capture devices, and obtain the key point information of the face corresponding to each of the image capture devices.

In this step, the face key points are marked on the images acquired by each image acquisition device. The marked face key points may include but not limited to: eyebrow head, eyebrow peak, eyebrow tail, inner eye corner, outer eye corner, pupil center, nose wing, nostril, mouth corner, etc. As an example, 70 keypoints of faces in an image are labeled. There is no limit to the number of labels for key points here.

Recognizing and marking key points of human faces in images containing human faces is a relatively mature technology in this field, so the specific implementation of this step is not limited here. The face key point information may include: the pixel coordinates of the key point in the image. Since the image is a two-dimensional image, the key point information of the human face obtained by marking also refers to the key point information of the human face in the two-dimensional image.

S103: Based on the key point information of the face corresponding to each image acquisition device and the parameters of each image acquisition device, reconstruct the three-dimensional key point information of the face corresponding to each image acquisition device.

The execution purpose of this step is to construct the information of the key points of the three-dimensional face by marking the key points of the two-dimensional face in multiple images collected by different image acquisition devices at the same time. In specific implementation, for each image, a set of corresponding three-dimensional key point information of the human face must be obtained. Since the image corresponds to the image acquisition device, it can be understood that the image acquisition device corresponds to the obtained three-dimensional key point information of the human face. Taking a certain acquisition moment as an example, in this step, it is necessary to reconstruct the 3D key point information of the face corresponding to each image acquisition device. For example, there are 14 image acquisition devices in total, and for the same acquisition time, it is necessary to obtain the 3D key point information of the face corresponding to each of the 14 image acquisition devices.

Among the 14 image acquisition devices, different image acquisition devices have unique labels, for example, device No. 1, device No. 2, ... device No. 14. The order of the labels is not limited here. For example, labeling may be performed in a manner of increasing ordinal numbers along one direction.

An exemplary implementation of S103 is introduced below:

According to the face key point information corresponding to the target image capture device, the face key point information corresponding to the reference image capture device, the internal parameters of the target image capture device, the internal parameters of the reference image capture device, and the target image capture device and the reference image capture device Between the external parameters, the three-dimensional key point information of the face corresponding to the target image acquisition device is reconstructed by the triangulation reconstruction method. The reference image acquisition device is any image acquisition device other than the target image acquisition device among the plurality of image acquisition devices. According to the 3D key point information of the face corresponding to the target image capture device and the external parameters, the 3D key point information of the face corresponding to the reference image capture device is obtained. The 3D key point information of the face may include the 3D coordinates of the key points of the face in the coordinate system of the image acquisition device.

It is assumed that No. 4 device is the target image acquisition device, and No. 5 device is the reference image acquisition device. When performing this step, the face key point information corresponding to No. 4 device, the face key point information corresponding to No. 5 device, the internal parameters of No. 4 device, the internal parameters of No. 5 device, and Between the external parameters, the information of the 3D key points of the face corresponding to the No. 4 device is reconstructed through the triangulation reconstruction method. When triangulating and reconstructing the 3D key point information of the face, it can be realized by the triangulation reconstruction function. The triangulation reconstruction is a relatively mature technology in this field, so the specific implementation process will not be described here. After obtaining the 3D key point information of the face corresponding to No. 4 device, since the external parameters between different image acquisition devices have been calibrated in advance, it is possible to use the external parameters between other image acquisition devices and No. 4 device. The 3D key point information of the face corresponding to the No. 4 device is converted into the coordinate system of other image acquisition devices, and the 3D key point information of the face corresponding to other image acquisition devices is obtained. In this way, the three-dimensional key point information of the human face corresponding to each of the 14 image acquisition devices can be obtained.

In the following, only the 3D key point information of the face corresponding to one target image acquisition device is taken as an example to introduce the method of obtaining the true value of the head pose. The 3D face key point information corresponding to other image acquisition devices can also perform corresponding operations in the same implementation manner. For ease of description, in the embodiments of the present application, the image of the target object captured by the target image capture device at the above-mentioned moment is referred to as the target image.

S104: Establish a face coordinate system based on the three-dimensional key point information of the face corresponding to the target image acquisition device, where the target image acquisition device is one of multiple image acquisition devices.

When this step is actually implemented, the plane of the face can be determined based on the 3D key point information of the face corresponding to the target image acquisition device. As an example, a face plane can be constructed based on the coordinates of 3 to 4 key points of the face in the coordinate system of the target image acquisition device. The key points of the selected face are not limited.

Next, determine the normal vector of the face plane according to the face plane. As mentioned earlier, the coordinates of 3 to 4 key points of the face are used when determining the plane of the face. Of course, two non-parallel space vectors can also be obtained through the coordinates of the 3 to 4 key points of the face. In specific implementation, the normal vector of the face plane can be obtained by cross-producting these two vectors. In addition, the unit vector of the normal vector can also be determined through the cross function of NumPy (full name: Numerical Python, which is an open source numerical calculation extension of Python).

Combined with the above, the face plane and the normal vector of the face plane have been obtained. Furthermore, the face coordinate system of the target object can be established based on the plane and the normal vector of the plane. It can be understood that since the three-dimensional key point information of the face is obtained in the coordinate system of the target image acquisition device, the face coordinate system established based on the three-dimensional key point information of the face is also based on the coordinate system of the target image acquisition device. Base.

S105: According to the coordinate system of the target image acquisition device and the face coordinate system, obtain the true value of the head pose of the target object corresponding to the target image, where the target image is an image collected by the target image acquisition device of the target object at any time.

On the basis of knowing the coordinate system of the target image acquisition device and the face coordinate system, the rotation matrix of the face coordinate system relative to the coordinate system of the target image acquisition device can be obtained according to the face coordinate system and the coordinate system of the target image acquisition device . Since there is an association between this rotation matrix and the three angles (yaw angle Yaw, pitch angle Pitch, and roll angle Roll) in the true value of the head pose, the true head pose of the target object can be obtained based on the rotation matrix. value. Since the true value of the head pose is obtained on the basis of the coordinate system of the target image acquisition device and the face coordinate system, and the face coordinate system is also based on the coordinate system of the target image acquisition device, the target image acquisition device and the The target images taken at all times have correspondence, so the true value of the head pose obtained in this step can be associated with the target image.

For example, it is also possible to store the image and the computed head pose ground truth as an image ground truth pair. In this way, a one-to-one correspondence between images and truth values is established. Storing image ground truth pairs facilitates the use of head pose ground truth in subsequent applications, such as training models. As an example, the model may be a model for determining the true value of the head posture through images, or further, the model may be a model for determining the driving safety factor through images or a model for analyzing driving fatigue, etc. The specific functions of the trained model are not limited here.

The above is the method for obtaining the true value of the head pose provided by the embodiment of the present application. In this method, multiple image acquisition devices are used to collect images of the target object at the same time, and the two-dimensional face key points marked in the multiple collected images are used as the data basis for three-dimensional reconstruction of the target object face key points, and then obtained The true value of the head pose of the target object corresponding to the image. Since this solution does not require the use of wearable devices to obtain the true value of the head pose, it is not affected by the wearing angle. Through the simultaneous use of multiple image acquisition devices and the 3D reconstruction of key points of the face, the accuracy of the acquired true value of the head pose is guaranteed. Furthermore, when the true value of the head posture is applied to areas such as driver fatigue analysis, virtual reality somatosensory games, product purchase desire analysis, face verification, etc., the data analysis results can be made more accurate.

In a specific application, in order to provide more diverse and rich data for the subsequent usage scenarios of the true value of the head pose, it is proposed in the embodiment of the present application that a variety of different data can be provided to the space where the target object is located during the implementation of S101. lighting conditions. Acquire images of the target object collected by multiple image acquisition devices at the same time under the same lighting conditions. As an example, the images collected by all image acquisition devices at the same time are obtained under the first-level lighting condition, the images collected by all the image acquisition devices at the same time are obtained under the second-level lighting condition, and the images collected by all the image acquisition devices are obtained at the same time under the third-level lighting condition. The captured image. In practical applications, even if the image acquisition device is the same and the target object maintains the head posture, the true value of the head posture obtained by executing this method under different lighting conditions may be different. By collecting images under different illuminations and obtaining the true value of the head pose separately, it can help to improve the accuracy of data analysis. For example, when it is necessary to perform data analysis under the first-level lighting conditions, because the previous stage collects images under the first-level lighting conditions and obtains the true value of the head pose, not only collects images under other lighting conditions and obtains the true value of the head pose. value, thus making the data analysis under this condition more accurate in the data analysis results.

The first-level, second-level, and third-level lighting conditions described above are examples of different lighting conditions, and the levels of different lighting conditions are not limited here. For example, it could be to include 4 levels of lighting conditions. The higher the level, the weaker the light intensity; the lower the level, the higher the light intensity. In another example, the lighting conditions are divided into sunlight lighting conditions, infrared lighting conditions, ultraviolet lighting conditions, and the like. The above lighting conditions can be realized by the natural environment, and can also be realized by lighting devices. For example, adjust or control the selection of light source type, light on and off, intensity, and irradiation angle in the lighting device to achieve different lighting conditions.

When the true value of the head posture needs to be applied to the data analysis scene in the driving field, the embodiment of the present application can take adaptive measures in the image acquisition stage. For example, when multiple image acquisition devices acquire images of a target object, the target object sits on a seat, and the seat is used to simulate a seat in a real vehicle environment. For example, the target object represents the driver, and the seat it is on represents the driver's seat. The setting parameters of the image acquisition device relative to the seat are determined according to the setting parameters of the simulated object relative to the seat in the real vehicle environment. The setting parameters include: position, height, angle, etc. The simulated object includes at least one of the following: A-pillar, B-pillar, instrument panel, front windshield or left vehicle glass. As an example, No. 1 device is set to simulate the A-pillar, No. 2 device is set to simulate the B-pillar, No. 3 device is set to simulate the instrument panel, No. 4 device is set to simulate the front windshield, and No. 5 device is set to simulate the left car glass set up.

FIG. 2 is a schematic diagram of a scene in which images of a target object are collected by multiple image collection devices according to an embodiment of the present application. Fig. 2 shows that 14 cameras collect images of the target object on the seat, and the lighting device is surrounded above the target object in the scene for changing the lighting conditions.

In addition to the advantages mentioned above, there are many advantages in the technical solution of the embodiment of the present application:

In the embodiment of the present application, multiple image acquisition devices can simultaneously record the video of the same person during acquisition, and calculate the continuous true value of the head posture according to the three-dimensional key point information of the face of the video frame number. Therefore, the technical solution of the present application has continuity in obtaining the true value of the head posture.

In this method, there is an image acquisition device that captures images of people's side faces. Therefore, this solution has the ability to acquire the true value of head poses at large angles.

This method adopts a non-contact head posture acquisition method, which is relatively simple and convenient to implement.

In this embodiment of the present application, types of multiple image acquisition devices used to acquire images may include infrared cameras, RGB cameras, and the like. In addition, the value ranges of the same type of internal parameters of multiple image acquisition devices may be different. These internal parameters with different value ranges may be focal length, optical center value, and the like. As an example, among multiple image acquisition devices, the focal length range of device A is f1-f2, and the focal length range of device B is f3-f4.

In practical applications, various types of images, such as infrared images and RGB images, can be formed by configuring multiple different types of image acquisition devices. In this way, a variety of images with different imaging effects can be obtained through a small number of acquisitions, which can meet the acquisition requirements of different research and development projects for images with different imaging effects in practical applications, and save acquisition time and acquisition costs.

In practical applications, by configuring a variety of image acquisition devices with internal references in different value ranges, images with different imaging effects can be obtained through fewer acquisitions. In this way, it can meet the acquisition requirements of different research and development projects for various images with different imaging effects in practical applications, saving acquisition time and acquisition cost.

In the embodiment of the present application, various types of devices and devices with various internal parameter value ranges are set to make the acquired image data more diverse and meet actual use requirements.

Based on the method for obtaining the true value of the head posture provided in the foregoing embodiments, correspondingly, the present application also provides a device for obtaining the true value of the head posture. The specific implementation of the device will be described below in conjunction with the embodiments. Fig. 3 is a schematic structural diagram of a device for obtaining a true value of a head pose. The device 300 shown in Figure 3 includes:

An image acquisition module 301, configured to acquire images collected by multiple image acquisition devices on the target object at the same time;

The key point labeling module 302 is configured to label the key points of the face in the image captured by each of the image capture devices among the plurality of image capture devices, and obtain the key point information of the face corresponding to each of the image capture devices ;

The three-dimensional key point reconstruction module 303 is used to reconstruct the three-dimensional key point information of the face corresponding to each of the image acquisition devices based on the key point information of the face corresponding to each of the image acquisition devices and the parameters of each of the image acquisition devices;

A coordinate system establishment module 304, configured to establish a face coordinate system based on the three-dimensional key point information of the face corresponding to the target image acquisition device, where the target image acquisition device is one of the plurality of image acquisition devices;

The true value acquisition module 305 is configured to obtain the true value of the head posture of the target object corresponding to the target image according to the coordinate system of the target image acquisition device and the human face coordinate system, and the target image is the target object The image collected by the image collection device on the target object at the time.

Use multiple image acquisition devices to collect images of the target object at the same time, use the key points of the two-dimensional face marked in the multiple captured images as the data basis for the three-dimensional reconstruction of the key points of the face of the target object, and then obtain the corresponding key points of the image. The ground truth of the head pose of the target object. Since this solution does not require the use of wearable devices to obtain the true value of the head pose, it is not affected by the wearing angle. Through the simultaneous use of multiple image acquisition devices and the 3D reconstruction of key points of the face, the accuracy of the acquired true value of the head pose is guaranteed. Furthermore, when the true value of the head posture is applied to areas such as driver fatigue analysis, virtual reality somatosensory games, product purchase desire analysis, face verification, etc., the data analysis results can be made more accurate.

Optionally, the truth acquisition module 305 includes:

a rotation matrix acquiring unit, configured to obtain a rotation matrix of the face coordinate system relative to the coordinate system of the target image capture device according to the face coordinate system and the coordinate system of the target image capture device;

A true value obtaining unit, configured to obtain the true value of the head pose of the target object corresponding to the target image according to the rotation matrix.

Optionally, the coordinate system establishment module 304 includes:

A face plane determining unit, configured to determine a face plane based on the three-dimensional key point information of the face corresponding to the target image acquisition device;

a normal vector determination unit, configured to determine the normal vector of the face plane according to the face plane;

A coordinate system establishing unit, configured to establish the face coordinate system of the target object based on the face plane and the normal vector of the face plane.

Optionally, the 3D key point reconstruction module 303 includes:

Optionally, the acquisition device 300 of the true value of the head posture also includes:

The storage module is used for storing the images corresponding to each other and the true value of the head posture as image true value pairs.

Optionally, the image acquisition module 301 includes:

a lighting unit, configured to provide a variety of different lighting conditions to the space where the target object is located;

An image acquisition unit, configured to acquire images of the target object acquired by the plurality of image acquisition devices at the same time under the same lighting condition.

Based on the method and device for obtaining the true value of the head posture provided in the foregoing embodiments, correspondingly, the present application also provides a device for obtaining the true value of the head posture, including a processor and a memory; the memory is used to store computer programs; The processor is configured to execute, according to the computer program, the method for obtaining the true value of the head posture as provided in the foregoing method embodiments. In addition, the processor in the device can also be used to control the lighting device to provide variable lighting conditions.

Based on the method, device and device for obtaining the true value of the head posture provided in the foregoing embodiments, correspondingly, the present application also provides a computer-readable storage medium for storing a computer program, and the computer program is executed when the processor runs The method for obtaining the true value of the head pose as provided in the foregoing method embodiments.

The above is only a specific embodiment of the present application, but the protection scope of the present application is not limited thereto. Any person familiar with the technical field can easily think of changes or Replacement should be covered within the protection scope of this application. Therefore, the protection scope of the present application should be based on the protection scope of the claims.

Claims

A method for obtaining a true value of a head pose, comprising:

Obtaining images of the target object collected by multiple image acquisition devices at the same time;

Marking the face key points in the images collected by each of the image capture devices among the plurality of image capture devices, and obtaining the face key point information corresponding to each of the image capture devices;

Based on the key point information of the face corresponding to each of the image acquisition devices and the parameters of each of the image acquisition devices, reconstruct the three-dimensional key point information of the face corresponding to each of the image acquisition devices;

Establishing a face coordinate system based on the three-dimensional key point information of the face corresponding to the target image acquisition device, where the target image acquisition device is one of the plurality of image acquisition devices;

According to the coordinate system of the target image acquisition device and the face coordinate system, obtain the true value of the head pose of the target object corresponding to the target image, the target image is the target image acquisition device at the time The image collected by the target object.
The method according to claim 1, wherein the obtaining the true value of the head posture of the target object corresponding to the target image according to the coordinate system of the target image acquisition device and the human face coordinate system comprises :

Obtaining a rotation matrix of the face coordinate system relative to the coordinate system of the target image capture device according to the face coordinate system and the target image capture device coordinate system;

A true value of the head pose of the target object corresponding to the target image is obtained according to the rotation matrix.
The method according to claim 1, wherein the establishment of a human face coordinate system based on the three-dimensional key point information of the human face corresponding to the target image acquisition device includes:

Determining the face plane based on the three-dimensional key point information of the face corresponding to the target image acquisition device;

determining the normal vector of the face plane according to the face plane;

A human face coordinate system of the target object is established based on the human face plane and the normal vector of the human face plane.
The method according to claim 1, wherein, based on the face key point information corresponding to each of the image acquisition devices and the parameters of each of the image acquisition devices, the person corresponding to each of the image acquisition devices is reconstructed. Face 3D key point information, including:

According to the face key point information corresponding to the target image capture device, the face key point information corresponding to the reference image capture device, the internal parameters of the target image capture device, the internal parameters of the reference image capture device, and the target The external parameters between the image acquisition device and the reference image acquisition device are used to reconstruct the three-dimensional key point information of the face corresponding to the target image acquisition device through the triangulation reconstruction method; the reference image acquisition device is the plurality of images Any image acquisition device other than the target image acquisition device among the acquisition devices;

According to the 3D key point information of the face corresponding to the target image capture device and the external parameters, the 3D key point information of the face corresponding to the reference image capture device is obtained.
The method according to any one of claims 1-4, further comprising:

The image and head pose ground truths corresponding to each other are stored as image ground truth pairs.
The method according to any one of claims 1-4, wherein the acquiring images of the target object acquired by multiple image acquisition devices at the same time comprises:

providing a plurality of different lighting conditions to the space in which the target object is located;

acquiring images of the target object collected by the plurality of image acquisition devices at the same moment under the same illumination condition.
The method according to any one of claims 1-4, wherein when the multiple image acquisition devices acquire the images of the target object, the target object sits on a seat, and the seat is used for simulating A seat in the vehicle environment; the setting parameters of the image acquisition device relative to the seat are determined according to the setting parameters of the simulated object in the actual vehicle environment relative to the seat;

The simulated objects include at least one of the following:

A-pillar, B-pillar, instrument panel, front windshield or left side glass.
The method according to any one of claims 1-4, wherein the multiple image acquisition devices include: an infrared camera and an RGB camera.
The method according to any one of claims 1-4, characterized in that the internal references of the same type of the multiple image acquisition devices have different value ranges.
A device for obtaining the true value of a head posture, characterized in that it includes:

An image acquisition module, configured to acquire images collected by multiple image acquisition devices on the target object at the same time;

A key point labeling module, configured to mark the key points of the face in the image captured by each of the image capture devices among the plurality of image capture devices, and obtain the key point information of the face corresponding to each of the image capture devices;

The three-dimensional key point reconstruction module is used to reconstruct the three-dimensional key point information of the face corresponding to each of the image acquisition devices based on the key point information of the face corresponding to each of the image acquisition devices and the parameters of each of the image acquisition devices;

A coordinate system establishment module, configured to establish a face coordinate system based on the three-dimensional key point information of the face corresponding to the target image acquisition device, where the target image acquisition device is one of the plurality of image acquisition devices;

A true value acquisition module, configured to obtain the true value of the head posture of the target object corresponding to the target image according to the coordinate system of the target image acquisition device and the face coordinate system, the target image being the target image The image collected by the collection device on the target object at the time.
A device for obtaining the true value of a head posture, characterized in that it includes a processor and a memory; the memory is used to store a computer program; and the processor is used to execute any one of claims 1-9 according to the computer program. The method for obtaining the true value of the head pose.
A computer-readable storage medium, characterized in that it is used to store a computer program, and when the computer program is executed by a processor, the method for obtaining the true value of the head posture according to any one of claims 1-9 is executed.