CN115190288B - Method and equipment for synchronously acquiring images by multiple cameras

Publication number: CN115190288B
Application number: CN202210718630.7A
Authority: CN (China)
Other versions: CN115190288A (application publication, in Chinese)
Inventor: 史灿灿
Assignee: Hisense Electronic Technology Shenzhen Co., Ltd. (original and current)
Legal status: Active (granted)

Classifications

    • H04N13/296: Stereoscopic/multi-view image signal generators; synchronisation or control thereof
    • H04N13/271: Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • H04N13/275: Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
    • Y02B20/40: Energy efficient lighting technologies; control techniques providing energy savings, e.g. smart controller or presence detection

Abstract

The application relates to the technical field of computer vision and provides a method and equipment for synchronously acquiring images with multiple cameras. Because the depth camera has no hardware synchronization interface and cannot directly receive the hardware synchronization signal that synchronously exposes the multiple industrial cameras, synchronous exposure of the depth camera and the industrial cameras is achieved by means of an LED lamp: a target parameter set of the LED lamp, reflecting the exposure information of the depth camera, is obtained from images of the LED lamp collected by the depth camera. Since the industrial cameras do have hardware synchronization interfaces, the signal controller is then controlled to send the target parameter set to the multiple industrial cameras, thereby achieving synchronous exposure of the industrial cameras and the depth camera.

Description

Method and equipment for synchronously acquiring images by multiple cameras
Technical Field
The application relates to the technical field of computer vision, and provides a method and equipment for synchronously acquiring images by multiple cameras.
Background
With the development of Virtual Reality (VR) technology, its application scenes are becoming ever wider. In a virtual scene, gesture technology must accurately restore the 3D pose of the hand so that the virtual character can accurately express the meaning expressed by the hand or make the corresponding hand action. The delay and confusion of gesture technology depend on the performance of the neural network model, and a large amount of high-quality gesture labeling data is an important support for improving the performance of the hand detection model and the gesture joint point estimation model.
Because 3D gesture data is complex, manual labeling is costly and its quality is often unsatisfactory. To improve labeling efficiency, the prevailing automatic labeling method uses a depth camera to generate a 3D hand model and projects it into the images of industrial cameras to generate labeling data, which requires synchronous exposure of the depth camera and the industrial cameras. However, the depth camera has no hardware synchronization interface, so currently only soft synchronization is possible, i.e. collecting a group of images with close timestamps as the synchronization result. Its synchronization precision is low and cannot meet the acquisition requirements for synchronously exposed 3D gesture data; as a result, real and natural 3D hand poses cannot be restored, and the immersive experience suffers.
Accordingly, it is desirable to provide a method that achieves synchronous exposure of a depth camera and industrial cameras, thereby improving the labeling quality of 3D gesture data.
Disclosure of Invention
The embodiment of the application provides a method and equipment for synchronously acquiring images by multiple cameras, which are used for improving the labeling quality of 3D gesture data.
In one aspect, an embodiment of the present application provides a method for synchronously acquiring images by multiple cameras, including:
the method comprises the steps that an LED lamp in the field of view of a depth camera is controlled, through a signal controller, to flash continuously, and multi-frame RGB images of the LED lamp collected by the depth camera are obtained;
a target parameter set that synchronizes the LED lamp with the depth camera is determined according to the multi-frame RGB images collected by the depth camera, and the RGB images are cleared;
the signal controller is controlled to send the target parameter set to a plurality of industrial cameras, and the LED lamp is set to flash a single time;
and the time relationship of synchronous exposure between the depth camera and each industrial camera is determined according to the timestamp of the RGB image, collected by the depth camera, in which the LED lamp is lit, and the timestamps of the RGB images collected by each industrial camera, so as to perform image acquisition (see the sketch after these steps).
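Read as host-side pseudocode, the four steps amount to the following Python sketch. Every object and function name here is a hypothetical stand-in for the components described later in this application, not an API the application defines:

```python
def synchronize_multi_camera_exposure(controller, depth_cam, industrial_cams,
                                      derive_target_parameters):
    """High-level sketch of the four claimed steps (all names assumed)."""
    # Step 1: flash the LED continuously in the depth camera's field of
    # view and buffer the RGB frames it collects.
    controller.led_continuous_flash()
    frames = depth_cam.grab_rgb_frames(n=1000)

    # Step 2: derive the target parameter set (frame rate, phase) from the
    # buffered frames (stripe analysis, fig. 8), then clear the buffer.
    params = derive_target_parameters(frames)
    frames.clear()

    # Step 3: push the parameter set to the industrial cameras over the
    # hardware sync interface, and switch the LED to a single flash.
    controller.send_hardware_sync(industrial_cams, params)
    controller.led_single_flash(params)

    # Step 4: compare the timestamp of the depth frame in which the LED is
    # lit with each industrial camera's first frame timestamp.
    t_depth = depth_cam.first_lit_frame_timestamp()
    return {cam.id: cam.first_frame_timestamp() - t_depth
            for cam in industrial_cams}
```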
On the other hand, the embodiment of the application provides 3D image acquisition equipment, which comprises a processor, a memory, a USB virtual serial port and a data interface; the data interface and the USB virtual serial port are connected with the memory through a bus;
The memory includes a data storage unit and a program storage unit, the program storage unit stores a computer program, and the processor performs the following operations according to the computer program:
a first control instruction is sent to the signal controller through the USB virtual serial port, so that the signal controller controls the LED lamps in the visual field of the depth camera to continuously flash;
acquiring multi-frame RGB images of the LED lamp in a view field through the data interface, and storing the multi-frame RGB images in the data storage unit;
according to multi-frame RGB images acquired by the depth camera, determining a target parameter set of the LED lamp synchronous with the depth camera, and clearing the RGB images stored in the data storage unit;
the target parameter set is sent to the signal controller through the USB virtual serial port, so that the signal controller sends the target parameter set to a plurality of industrial cameras, and a second control instruction is sent to the signal controller through the USB virtual serial port, so that the signal controller controls the LED lamp to flash for a single time;
acquiring, through the data interface, the multi-frame RGB images of the LED lamp's single flash collected by the depth camera, and the RGB images collected by each industrial camera, respectively;
and determining the time relationship of synchronous exposure between the depth camera and each industrial camera, according to the timestamp of the RGB image collected by the depth camera in which the LED lamp is lit and the timestamps of the RGB images collected by each industrial camera, so as to perform image acquisition.
In another aspect, embodiments of the present application provide a computer-readable storage medium storing computer-executable instructions for causing a computer device to perform the method for synchronously capturing images with multiple cameras provided by embodiments of the present application.
According to the method and device for synchronously acquiring images with multiple cameras provided by the application, the depth camera has no hardware synchronization interface and cannot directly receive the hardware synchronization signal that synchronously exposes the multiple industrial cameras, so synchronous exposure of the depth camera and the industrial cameras is achieved by means of the LED lamp. The target parameter set of the LED lamp, which reflects the exposure information of the depth camera, is obtained from images of the LED lamp collected by the depth camera; and because the industrial cameras have hardware synchronization interfaces, the signal controller can be controlled to send the target parameter set to the industrial cameras, achieving synchronous exposure of the industrial cameras and the depth camera. Based on the synchronously exposed depth camera and industrial cameras, the 3D gesture dataset for training the hand key point estimation model is then collected, which improves the labeling quality of the 3D gesture data, the realism of the hand model, and the user's immersive experience.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1A is an effect diagram of restoring a 3D gesture from an image according to an embodiment of the present application;
FIG. 1B is a diagram illustrating another effect of recovering a 3D gesture from an image according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a hand key point provided in an embodiment of the present application;
fig. 3 is a schematic diagram of an image capturing system according to an embodiment of the present application;
fig. 4 is a position relationship diagram of a depth camera and an LED lamp provided in an embodiment of the present application;
fig. 5 is a signal waveform diagram of a single chip microcomputer, an LED lamp and an industrial camera according to an embodiment of the present application;
FIG. 6 is a graph of parameters of an LED lamp versus dark fringes in an RGB image according to an embodiment of the present application;
FIG. 7 is a flowchart of a method for synchronously exposing a depth camera and a plurality of industrial cameras according to an embodiment of the present application;
FIG. 8 is a flowchart of a method for determining a target parameter set of an LED lamp according to an embodiment of the present disclosure;
FIG. 9 is a flowchart of a method for determining a target frame rate of an LED lamp according to an embodiment of the present disclosure;
fig. 10 is a flowchart of a method for determining a target phase of an LED lamp according to an embodiment of the present disclosure;
fig. 11 shows two adjacent frame images, collected by the depth camera, of a single flash of the LED lamp according to an embodiment of the present application;
FIG. 12 is a flowchart of a method for capturing images using a synchronously exposed depth camera and multiple industrial cameras according to an embodiment of the present application;
FIG. 13 is a flowchart of a complete method for synchronously capturing images by multiple cameras according to an embodiment of the present application;
fig. 14 is a block diagram of a host provided in an embodiment of the present application.
Detailed Description
In order to clearly describe the embodiments of the present application, an explanation is given below for terms of the present application.
Soft synchronization: collecting a group of images with close timestamps as the synchronization result.
Hard synchronization: exposing multiple cameras at the same moment (i.e., capturing images simultaneously) via an external trigger signal; this requires every camera to have a hardware synchronization interface and a synchronization signal generating device.
Depth camera: comprises an RGB sensor and a depth sensor and can acquire RGB images and depth images simultaneously. It typically has no hardware synchronization interface.
Industrial camera: comprises an RGB sensor and has a hardware synchronization interface.
3D gesture estimation: estimating the three-dimensional coordinates of hand key points using RGB images, depth images, or a combination of both.
The following outlines the design ideas of the embodiments of the present application.
At present, the market increasingly expects virtualization to cover rich application scenes, and the most widely used virtual scenes, such as virtual conferences and virtual social contact, are in high demand. Gesture technology determines the immersive experience of these scenes: when the multiple cameras expose synchronously, the position of the user's whole hand and of every finger joint in the real scene can be detected in real time, and the user's accurate, real hand position and hand pose can be restored synchronously from any viewing angle. Only then can the virtual scene accurately restore the 3D pose of the user's hand through gesture technology, better express the meaning expressed by the user's hand, or accurately make the corresponding hand action, such as playing slides or grasping items. Once a virtual scene loses multi-view, synchronized gesture technology, the user's gestures become uncontrollable: hand images or finger joints frequently jump abnormally in the virtual scene, the user cannot express intent by hand, and the product experience may be damaged irreversibly, losing the immersion of the virtual scene and the very meaning of cross-space convergence.
For example, during virtual conference interaction, if the multiple cameras recognize the hand incorrectly or asynchronously, the VR device may accurately restore the 3D gesture for the hand image captured at one viewing angle, as shown in (a) of fig. 1A, but fail to do so for the hand image captured at another viewing angle, as shown in (b) of fig. 1A. Fig. 1A clearly shows that the gesture labeling fits the user's hand image at viewing angle (a) but deviates from the user's hand position at viewing angle (b). This situation, caused by the failure of the multiple cameras to expose synchronously, seriously damages the user's VR experience, and may even make virtual objects impossible to trigger or the user's actual hand action impossible to restore in the virtual conference scene.
For another example, when synchronously collected hand images from multiple angles are gesture-labeled, the restored hand pose accurately reflects the real hand position, as shown in (c) of fig. 1B; when the multi-angle hand images are not synchronized, the restored hand pose deviates from the real hand position, resulting in a poor experience, as shown in (d) of fig. 1B.
As shown in (a) of fig. 1A and (c) of fig. 1B, the user is given a true, natural immersive experience when the restored hand pose exactly fits the actual hand position.
Currently, the mainstream technology for 3D gesture key point estimation is deep learning, and the root cause of delay, confusion and other abnormal phenomena in gesture technology is the performance of the deep neural network model. Deep learning is data-hungry: a large amount of high-quality gesture labeling data is an important support for improving the hand detection model and the gesture key point estimation model, and the quantity and quality of the data directly affect the accuracy and generalization of gesture key point estimation. The performance of an AI product therefore depends on the quality of the data acquired in its application scenario.
When estimating hand key points based on deep learning, most methods extract 21 key points, whose positions are shown in fig. 2. Because 3D gesture data is difficult to acquire and complex to process, publicly available 3D gesture datasets, both domestic and international, currently contain substantial noise, which hinders training of hand key point estimation models and affects their application in AR/VR. To guarantee the accuracy and real-time performance of the hand key point estimation model in a virtual scene, its learned parameters must be sufficiently accurate, which requires that the dataset provided for model learning be accurate and of high quality. Thus, when training a hand key point model, 3D gesture data may need to be acquired and labeled in the relevant application scenarios.
Considering that manual labeling is costly and of unsatisfactory quality, the prevailing approach is automatic labeling: a 3D hand model is generated by a depth camera and then projected into the images of industrial cameras to generate labeling data. Since the hand can move at high speed, if the exposures of the depth camera and the industrial cameras are not synchronized, each camera captures the moving hand at a different moment and therefore at a different actual position. For example, if the hand moves at 1 m/s and the camera exposure times differ by 15 ms, the corresponding hand displacement is 15 mm, already comparable to the width of a finger. Labeling several hand images at different positions as if they were the same position introduces errors that degrade the accuracy and generalization of the gesture key point estimation model. Highly synchronous exposure of the depth camera and the industrial cameras therefore increases labeling precision and reduces dataset noise. However, high-quality depth cameras have no hardware synchronization interface, and if only soft synchronization is used, the quality of the labeled data degrades.
In view of this, the embodiments of the application provide a method and equipment for synchronously acquiring images with multiple cameras. Considering that the depth camera has no hardware synchronization interface and cannot directly receive the hardware synchronization signal sent by the signal controller, synchronous exposure of the depth camera and the other cameras is achieved by means of an LED lamp. By analyzing RGB images of the LED lamp, captured by the depth camera within its field of view, the acquisition frame rate of the depth camera is determined and set on the LED lamp, and the target phase of the LED lamp is determined. The resulting target parameter set for synchronous exposure can then be sent to the industrial cameras by the signal controller as a hardware synchronization signal, so that the industrial cameras and the LED lamp share the same frequency, pulse width and phase; the hardware synchronization signals of the industrial cameras and the control signal of the LED lamp can also be turned on and off independently. Once the industrial cameras and the depth camera expose synchronously, hand images for training a hand key point estimation model can be collected within the synchronized exposure period; the 3D hand model generated from the hand images collected by the depth camera is projected into the industrial camera images, yielding a high-quality labeled 3D gesture dataset, so that the trained hand key point estimation model can accurately restore 3D gestures in the virtual scene and improve the user's immersive experience.
Referring to fig. 3, an image capturing system according to an embodiment of the present application includes a host, a signal controller, an LED lamp, a depth camera, and a plurality of industrial cameras. When multi-camera synchronous exposure is being set up, the LED lamp is moved into the field of view of the depth camera; when the multiple cameras are used for data acquisition, the LED lamp is moved out of that field of view. The LED lamp and the signal controller form an exposure synchronization device, shown in fig. 4, which overcomes the depth camera's lack of a hardware synchronization interface and achieves the effect of hard synchronization between the depth camera and the industrial cameras.
In the image acquisition system shown in fig. 3, the host sends control commands to the signal controller through a USB virtual serial port, and the signal controller sends control signals to the LED lamp through a signal line according to those commands, controlling parameters of the LED lamp such as frequency (period), phase, pulse width, switching state and flicker mode (continuous flicker or single flicker). The LED lamp emits white light when lit and sits in the field of view of the depth camera, so the depth camera can collect RGB images of the LED lamp and transmit them to the host through the data interface; the host analyzes the received RGB images to determine the final target parameter set of the LED lamp. Since the target parameter set is determined from the RGB images of the LED lamp collected by the depth camera, the parameters of the LED lamp reflect the exposure information of the depth camera.
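As a concrete illustration, a host-side control command might be issued over the USB virtual serial port as in the following Python sketch. The pyserial library is real, but the port name and the ASCII command format (frequency, phase, pulse-width and mode fields) are assumptions made for illustration, since the application does not specify a command protocol:

```python
import serial  # pyserial

# Open the signal controller's USB virtual serial port.
# Port name and baud rate are assumptions; adjust for the actual device.
ctrl = serial.Serial("/dev/ttyUSB0", baudrate=115200, timeout=1)

def set_led(freq_hz: float, phase_us: int, width_us: int, mode: str) -> None:
    """Send one hypothetical ASCII command that sets the LED lamp's
    frequency, phase, pulse width and flicker mode (CONT or SINGLE)."""
    cmd = f"LED {freq_hz:.6f} {phase_us} {width_us} {mode}\n"
    ctrl.write(cmd.encode("ascii"))

# Step S701: start continuous flicker with initial parameters.
set_led(freq_hz=30.0, phase_us=0, width_us=2000, mode="CONT")
```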
In the image acquisition system shown in fig. 3, because each industrial camera has a hardware synchronization interface, the signal controller can, under the host's control, send a hardware synchronization signal directly to each industrial camera; the hardware synchronization lines of all industrial cameras are connected in parallel. On receiving the hardware synchronization signal, each industrial camera adopts the same target parameter set (frequency, phase, pulse width, etc.) as the LED lamp, collects RGB images under the control of that signal, and transmits them to the host through the data interface. The host analyzes the RGB images collected by the depth camera and by each industrial camera, determines the time relationship of their synchronous exposure, and the LED lamp is then moved out of the depth camera's field of view.
In the image acquisition system shown in fig. 3, after the time relationship of synchronous exposure between the depth camera and the industrial cameras has been determined, multiple synchronously exposed hand image pairs can be collected using that time relationship. A three-dimensional hand model is reconstructed from the hand images collected by the depth camera and projected into the hand images collected by the industrial cameras, yielding the 3D gesture for each industrial camera's hand image and thus generating a 3D gesture dataset for training the hand key point estimation model.
The signal controller in fig. 3 may be a single-chip microcomputer. Fig. 5 shows signal waveforms between the single-chip microcomputer, the LED lamp and an industrial camera according to an embodiment of the present application. After power-on, the single-chip microcomputer generates a square pulse signal composed of high and low levels; the span from the rising edge of one high level to the rising edge of the next is one period, determined by the frequency of the crystal oscillator inside the single-chip microcomputer. The LED lamp is switched on and off by the control signal sent by the single-chip microcomputer; the pulse waveform repeats periodically, and the offset from the rising edge of the single-chip microcomputer's high level to the rising edge of the LED lamp's high level is taken as the phase of the LED lamp. The signals of the industrial cameras and the LED lamp share the same frequency, phase and pulse width, and the single-chip microcomputer can send the synchronous exposure signals to the LED lamp and the industrial cameras simultaneously or independently. In the portion of fig. 5 circled with a dashed line: when the LED control signal is at a low level while the industrial camera hardware synchronization signal is at a high level, the single-chip microcomputer sends only the hardware synchronization signal to the industrial cameras; when the LED control signal is high while the hardware synchronization signal is low, it sends only the control signal to the LED lamp; and when both are high, it sends signals to the LED lamp and the industrial cameras at the same time.
In the embodiment of the present application, the period of the pulse signal has a minimum step of 1 nanosecond, and the phase and pulse width have a minimum step of 1 microsecond.
The depth camera comprises an RGB sensor for capturing RGB images and a depth sensor for capturing depth images; the two sensors are exposed synchronously. In general, the RGB sensor is a rolling shutter camera: all pixels in one row of the RGB image are exposed during the same period, pixels in different rows are exposed during different periods, and rows are exposed sequentially from top to bottom, as shown in fig. 6.
As shown in fig. 6, different frequencies, pulse widths and phases of the LED lamp correspond to different RGB image content, visible as a horizontal dark stripe in the RGB image. When the dark stripe passes through the position of the LED lamp in the RGB image, the lamp beads outside the stripe appear lit; when it does not pass through that position, the lamp beads appear extinguished and the stripe itself is inconspicuous. When the frame rates of the RGB sensor and the LED lamp differ, the dark stripe scrolls vertically, and the larger the frame rate difference, the faster it scrolls. When the frame rates are the same, the phase of the LED lamp and the vertical position of the dark stripe are in a linear proportional relationship, while the pulse width of the LED lamp is in a linear inverse relationship with the stripe width: the wider the pulse, the narrower the stripe. The falling edge of the high level of the LED control signal essentially coincides with the start of the RGB sensor's exposure; actual experiments show the difference is within 0.1 millisecond.
Based on the image acquisition system shown in fig. 3, an embodiment of the application provides a method that improves the labeling quality of 3D gesture data by achieving synchronous exposure of the depth camera and the industrial cameras. The method is executed by the host in fig. 3 and, as shown in fig. 7, mainly comprises the following steps:
S701: controlling, through the signal controller, the LED lamp in the field of view of the depth camera to flash continuously, and acquiring the multi-frame RGB images of the LED lamp collected by the depth camera.
When S701 is executed, the LED lamp is moved into the field of view of the depth camera, the multi-camera synchronous exposure program is started, and the signal controller is powered on. The host sends a first control instruction to the signal controller through the USB virtual serial port, and the signal controller sends a control signal to the LED lamp through the signal line according to the first control instruction, making the LED lamp flash continuously with initial parameters (initial frequency, initial phase, initial pulse width, etc.). During the continuous flicker, the depth camera collects multi-frame (for example, 1000 frames) RGB images of the LED lamp in its field of view, which are sent to the host through the data interface and stored locally by the host.
S702: and determining a target parameter set of the LED lamp synchronous with the depth camera according to the multi-frame RGB image acquired by the depth camera, and emptying the multi-frame RGB image acquired by the depth camera.
The target parameter set comprises at least the target frame rate and target phase of the LED lamp. In S702, the acquisition frame rate of the depth camera is determined from the multi-frame RGB images it collected; that frame rate is used as the target frequency of the LED lamp, and the initial phase of the LED lamp is adjusted according to the multi-frame RGB images to obtain the target phase, yielding the target parameter set that synchronizes the depth camera and the LED lamp. The target parameters thus reflect the exposure information of the depth camera. After the target parameter set is determined, the multi-frame RGB images are cleared from the host's local memory so that they do not affect subsequent steps.
The process of determining the target parameter set, see fig. 8, mainly comprises the following steps:
S7021: performing dark stripe detection on the multi-frame RGB images, determining the acquisition frame rate of the depth camera, and setting that acquisition frame rate on the LED lamp as the target frame rate, so that the LED lamp flashes continuously at the set target frame rate.
As can be seen from fig. 6, different frequencies, pulse widths and phases of the LED lamp correspond to different RGB image content, visible as a horizontal dark stripe in the image. The acquisition frame rate of the depth camera can be determined from the dark stripes detected in the RGB images; the specific process is shown in fig. 9:
S7021_1: and determining the sampling period of the depth camera according to the time stamp of the multi-frame RGB image, and taking the sampling period as the flickering period of the LED lamp.
When S7021_1 is executed, the time difference between the first and last frames is determined from the timestamps of the first and last RGB images among the multi-frame RGB images of the continuously flashing LED lamp, collected by the depth camera at its initial acquisition frame rate. Combined with the total number of frames, this gives the acquisition period of the depth camera; specifically, the acquisition period is the quotient of the time difference and the total frame number. The acquisition period is then set as the flicker period of the LED lamp through the signal controller.
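A minimal sketch of this timestamp arithmetic follows; the function and variable names are illustrative, not taken from the application:

```python
def acquisition_period_s(timestamps_s: list[float]) -> float:
    """Acquisition period of the depth camera, defined in S7021_1 as the
    quotient of the first-to-last time difference and the total frame
    count of the buffered RGB images."""
    time_diff = timestamps_s[-1] - timestamps_s[0]
    return time_diff / len(timestamps_s)

# Example: 1000 frames spanning about 33.3 s give a period of about
# 33.3 ms, i.e. roughly 30 fps; this value is then set as the LED's
# flicker period through the signal controller.
```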
S7021_2: and detecting dark stripes of the LED lamps on the multi-frame RGB image, and determining the rolling direction of the dark stripes according to the line numbers of the dark stripes in the RGB image.
When S7021_2 is executed, each frame of the multi-frame RGB images is read in sequence, and the local area of the LED lamp in each frame is checked for the first appearance of a dark stripe. If no stripe appears, the next frame is read. If a stripe appears, the row number of the stripe in that RGB image and the frame number of the image are recorded, and the next frame is read; if the stripe appears again in the next frame, its row number and that frame's number are recorded as well. Reading continues in this way until a frame contains no dark stripe, at which point the scrolling direction of the stripe is determined from the change of its row number across the RGB images.
S7021_3: and determining the acquisition frame number of the depth camera according to the image frame number of the dark stripe appearing at the top of the image for the first time and the image frame number of the dark stripe returning to the first position.
As described for S7021_2, while performing dark stripe detection on the sequentially read RGB images, the host records both the row number of the stripe in each RGB image and the frame number of the image containing it. Therefore, in S7021_3, starting from the frame in which the dark stripe is first detected at the top of the RGB image, each newly read frame yields the stripe's row number, which is compared with the row number at the first detection (i.e., whether the stripe has returned to its first position). When they match, reading stops, and the acquisition frame count of the depth camera is determined from the recorded frame number where the stripe first appeared at the top of the image and the frame number where it returned to the first position.
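The stripe detection itself can be as simple as locating the darkest row of the LED's region of interest (ROI) in each frame. The sketch below uses the real OpenCV and NumPy libraries, but the ROI, the brightness thresholds and the helper names are assumptions made for illustration; the application does not specify a detection algorithm:

```python
import cv2
import numpy as np

def stripe_row(frame_bgr: np.ndarray, roi: tuple[int, int, int, int],
               drop_ratio: float = 0.6):
    """Return the row index (within the ROI) of a horizontal dark stripe,
    or None when no stripe is present. A row counts as a stripe when its
    mean brightness falls below drop_ratio times the ROI's overall mean."""
    x, y, w, h = roi
    gray = cv2.cvtColor(frame_bgr[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    row_means = gray.mean(axis=1)
    darkest = int(np.argmin(row_means))
    if row_means[darkest] < drop_ratio * row_means.mean():
        return darkest
    return None

def count_stripe_cycle(frames, roi):
    """Record the stripe row in each frame (S7021_2), take the scroll
    direction from the first two detections, then count frames from the
    first detection until the stripe returns to roughly the same row
    after having moved away from it (S7021_3)."""
    detections = []  # (frame_index, row_in_roi)
    for idx, frame in enumerate(frames):
        row = stripe_row(frame, roi)
        if row is not None:
            detections.append((idx, row))
    if len(detections) < 3:
        return None, None
    start_idx, start_row = detections[0]
    direction = "down" if detections[1][1] > start_row else "up"
    moved_away = False
    for idx, row in detections[1:]:
        if abs(row - start_row) > 10:
            moved_away = True          # stripe has left the start row
        elif moved_away and abs(row - start_row) <= 2:
            return idx - start_idx, direction  # one full scroll cycle
    return None, direction
```

Here the first detected row stands in for the "top of the image" position used in S7021_3, assuming the ROI spans the full image height.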
S7021_4: and determining the acquisition frame rate of the depth camera according to the flicker period, the acquisition frame number and the rolling direction.
In the embodiment of the application, by the time the dark stripe has scrolled one full cycle and returned to its original first position, the number of RGB images collected by the depth camera's RGB sensor differs by exactly 1 frame from the number of LED flashes over the same interval. From this, the acquisition frame rate of the depth camera can be calculated back as:

f_rgb = 1 / T_rgb = n / ((n ± 1) × T_led)

where f_rgb denotes the acquisition frame rate of the depth camera, T_rgb the acquisition period of the depth camera, T_led the flicker period of the LED lamp, and n the number of frames acquired by the depth camera during one stripe cycle; the minus sign is taken when the dark stripe scrolls downward, and the plus sign when it scrolls upward.
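Under the same assumptions as the previous snippets, the host-side frame rate computation might look like this:

```python
def depth_camera_frame_rate(t_led_s: float, n_frames: int,
                            direction: str) -> float:
    """f_rgb = n / ((n - 1) * T_led) for a downward-scrolling stripe and
    f_rgb = n / ((n + 1) * T_led) for an upward one, following the
    formula reconstructed above."""
    m = n_frames - 1 if direction == "down" else n_frames + 1
    return n_frames / (m * t_led_s)

# Example: T_led = 33.333 ms and 250 frames per stripe cycle, scrolling
# downward, give f_rgb of about 30.12 Hz; this value is then set on the
# LED lamp as its target frame rate.
```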
After determining the acquisition frame rate of the depth camera in S7021_4, the host sends a control instruction to the signal controller through the USB virtual serial port, and the signal controller sets the acquisition frame rate as a target frame rate to the LED lamp according to the received control instruction, so that the LED lamp continuously flashes according to the target frame rate.
S7022: and acquiring RGB images continuously flashing by the LED lamp acquired by the depth camera according to the set target frame rate, and determining the phase corresponding to the RGB image with dark stripes at the top of the image as the target phase of the LED lamp.
In S7022, after the host sets the target frame rate at which the LED lamp continuously blinks, the phase of the LED lamp is controlled by the signal controller, and the target phase of the LED lamp is determined according to the RGB images corresponding to the different phases, and the specific process is shown in fig. 10, and mainly includes the following steps:
S7022_1: reading the current RGB image, collected by the depth camera, that corresponds to the initial phase of the LED lamp.
When S7022_1 is executed, the host sets an initial phase for the LED lamp through the signal controller, and the LED lamp flashes at the target frame rate with that initial phase. During the flicker, the depth camera collects the current RGB image of the LED lamp in its field of view and sends it to the host.
S7022_2: detecting whether dark stripes appear at the top of the current RGB image, if not, executing S7022_3, and if so, executing S7022_4.
When the S7022_2 is executed, after receiving a current RGB image corresponding to an initial phase, the host detects whether dark stripes appear at the top of the current RGB image, if not, the initial phase of the LED lamp is not matched with exposure information of the depth camera, the initial phase of the LED lamp needs to be adjusted, and S7022_3 is executed; if so, the initial phase of the LED lamp is matched with the exposure information of the depth camera, and the initial phase of the LED lamp is not required to be adjusted, so that S7022_4 is executed.
Optionally, the embodiment of the present application sets the initial phase of the LED lamp to 0. It should be noted that, in the embodiment of the present application, the initial phase of the LED lamp is not limited, and may be set according to actual requirements.
S7022_3: and increasing the initial phase of the LED lamp once according to the set phase step, and reading the RGB image of the next frame after the initial phase is increased again, and returning to S7022_2.
When S7022_3 is executed and no dark stripe has been detected at the top of the current RGB image, the phase of the LED lamp is increased by the set phase step, so that the LED lamp flashes at the target frame rate with the increased phase. During this flicker, the depth camera again collects the next frame of RGB image of the LED lamp in its field of view and sends it to the host, which returns to S7022_2 to detect whether a dark stripe appears at the top of that frame.
S7022_4: and stopping adjusting the phase of the LED lamp, and determining the phase corresponding to the RGB image with dark stripes at the top of the image as the target phase of the LED lamp.
When S7022_4 is executed, the detection of a dark stripe at the top of the RGB image indicates that the flicker of the LED lamp matches the exposure information of the depth camera; phase adjustment therefore stops, and the phase corresponding to the RGB image with the dark stripe at the top of the image is determined as the target phase of the LED lamp.
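A sketch of the phase search loop of S7022, reusing the hypothetical set_led and stripe_row helpers from the earlier snippets; the phase step size and the frame-grabbing callback are likewise illustrative assumptions:

```python
PHASE_STEP_US = 100  # assumed step size; the application does not fix one

def find_target_phase(target_freq_hz: float, width_us: int,
                      grab_depth_rgb_frame, roi) -> int:
    """Step the LED phase until the dark stripe sits at the top of the
    depth camera's RGB image (assuming the ROI spans the full image
    height), then return that phase as the target phase."""
    phase_us = 0  # initial phase, set to 0 as in the embodiment
    period_us = int(1e6 / target_freq_hz)
    while phase_us < period_us:
        set_led(target_freq_hz, phase_us, width_us, mode="CONT")
        frame = grab_depth_rgb_frame()
        row = stripe_row(frame, roi)
        if row is not None and row < 5:  # stripe at (near) the image top
            return phase_us
        phase_us += PHASE_STEP_US
    raise RuntimeError("no phase placed the stripe at the image top")
```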
S7023: the target frame rate and the target phase are combined into a target parameter set that synchronizes the LED lamp with the depth camera.
In the embodiment of the application, after the target frame rate and target phase of the LED lamp are determined, they are taken together as the target parameter set that synchronizes the LED lamp with the depth camera. The host then stops sending control signals to the LED lamp and clears the RGB images collected by the depth camera from local memory, preventing the continuous-flicker RGB images from interfering with subsequent steps.
S703: the control signal controller sends the target parameter set to a plurality of industrial cameras and sets the LED lights to flash a single time.
Because the industrial cameras have hardware synchronization interfaces, in S703 the host sends the determined target parameter set to the signal controller through the USB virtual serial port; the signal controller carries the target parameter set in the hardware synchronization signal and sends it to each industrial camera through the hardware synchronization interface. Meanwhile, the host sends a second control instruction to the signal controller through the USB virtual serial port, and the signal controller sends the LED lamp a control signal that sets its flicker mode to a single flash.
In some embodiments, the set of target parameters further includes a pulse width of the LED lamp.
S704: and determining the time relation of synchronous exposure between the depth camera and each industrial camera according to the time stamp of the RGB image of the LED lamp which is acquired by the depth camera and the time stamp of the RGB image acquired by each industrial camera for image acquisition.
In S704, the depth camera collects multi-frame RGB images of the LED lamp's single flash; fig. 11 shows two such adjacent frames. After acquisition, the depth camera sends these frames to the host for storage, and the host takes the one frame in which the LED lamp is lit as the first initial frame of synchronous exposure and obtains its timestamp. Meanwhile, upon receiving the hardware synchronization signal, each industrial camera collects one frame of RGB image according to the target parameter set; each such frame serves as a second initial frame synchronously exposed with the depth camera, and its timestamp is obtained. Comparing the timestamp of the first initial frame with that of each second initial frame yields the time relationship of synchronous exposure between the depth camera and the corresponding industrial camera.
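In code, the time relationship can be reduced to one timestamp offset per industrial camera. A sketch, with the frame representation and the brightness threshold assumed:

```python
def led_is_lit(img, roi) -> bool:
    """Illustrative test: the LED's region is bright in the lit frame
    (the brightness threshold is an assumed value)."""
    x, y, w, h = roi
    return img[y:y + h, x:x + w].mean() > 100

def sync_offsets(depth_frames, industrial_first_ts, roi):
    """depth_frames: list of (timestamp_s, image) from the single-flash
    sequence; industrial_first_ts: {camera_id: timestamp_s} of each
    industrial camera's first hardware-triggered frame. Returns
    {camera_id: offset_s} with offset = t_industrial - t_depth_lit."""
    t_lit = next(ts for ts, img in depth_frames if led_is_lit(img, roi))
    return {cam: ts - t_lit for cam, ts in industrial_first_ts.items()}
```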
In synchronizing the exposure of the depth camera and the industrial cameras, the embodiment of the application recognizes that the depth camera has no hardware synchronization interface and cannot directly receive the hardware synchronization signal that synchronously exposes the industrial cameras, so synchronization is achieved by means of the LED lamp. Specifically, the acquisition frame rate of the depth camera is determined by analyzing multi-frame RGB images of the LED lamp flashing continuously at an initial frame rate, and is set on the LED lamp so that it flashes continuously at the target frame rate. During this flicker, the phase of the LED lamp is adjusted using the RGB images collected by the depth camera at different phases, and the adjusted target phase together with the target frame rate forms the target parameter set, so that the industrial cameras and the LED lamp share the same frequency, pulse width and phase. Because the target phase and target frame rate are determined from RGB images of the LED lamp, the target parameter set reflects the exposure information of the depth camera. Further, by analyzing the RGB images collected by the depth camera and those collected by each industrial camera after receiving the hardware synchronization signal, the time relationship of synchronous exposure between the depth camera and each industrial camera is determined. The embodiment thus achieves an effect equivalent to hard synchronization without modifying the depth camera.
In the embodiment of the application, once the depth camera and the industrial cameras expose synchronously and the time relationship of that exposure is known, collecting 3D gesture data with them improves the labeling quality of the collected data, so that real and natural 3D hand poses can be restored and the user's immersive experience improved.
Referring to fig. 12, a flowchart of a method for capturing images with the synchronously exposed depth camera and industrial cameras according to an embodiment of the present application; the method is executed by the host and mainly includes the following steps:
S1201: controlling the signal controller to stop sending control signals to the LED lamp and to continuously send hardware synchronization signals to the plurality of industrial cameras, so that each industrial camera and the depth camera are exposed synchronously according to the target parameter set.
In the embodiment of the application, the depth camera and the industrial cameras can expose synchronously during image acquisition without the LED lamp flashing. Therefore, in S1201 the host sends a third control instruction to the signal controller through the USB virtual serial port; according to that instruction, the signal controller stops sending control signals to the LED lamp and continuously sends the hardware synchronization signal to the industrial cameras, so that each industrial camera exposes synchronously with the depth camera according to the target parameter set.
S1202: and acquiring a hand image pair of the synchronous exposure of the depth camera and each industrial camera according to the time relation of the synchronous exposure of the depth camera and each industrial camera.
During image acquisition, the depth camera and the industrial cameras are positioned at different viewing angles around the hand, so hand images at different viewing angles are collected synchronously and sent to the host. In S1202, after receiving the hand images sent by the depth camera and the industrial cameras, the host groups the hand image collected by the depth camera with the hand image collected by each industrial camera into a hand image pair according to the time relationship of synchronous exposure between the depth camera and that industrial camera.
The hand image collected by the depth camera comprises a hand RGB image and a hand depth image, and the hand image collected by the industrial camera is the hand RGB image.
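A sketch of this timestamp-based pairing, using the per-camera offsets computed by the earlier sync_offsets sketch; the stream data structures and the matching tolerance are assumptions:

```python
def pair_frames(depth_stream, cam_streams, offsets, tol_s=0.002):
    """depth_stream: list of (t, rgb, depth) from the depth camera;
    cam_streams: {camera_id: list of (t, rgb)}; offsets: {camera_id:
    offset_s} from sync_offsets(). Frames belong to one pair when the
    industrial timestamp, corrected by the offset, lies within tol_s of
    the depth timestamp (the tolerance is an assumed value)."""
    pairs = []
    for t_d, rgb_d, depth_d in depth_stream:
        group = {"depth": (rgb_d, depth_d)}
        for cam, frames in cam_streams.items():
            t_c, rgb_c = min(frames,
                             key=lambda f: abs(f[0] - offsets[cam] - t_d))
            if abs(t_c - offsets[cam] - t_d) <= tol_s:
                group[cam] = rgb_c
        if len(group) == len(cam_streams) + 1:  # every camera matched
            pairs.append(group)
    return pairs
```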
S1203: determining whether the time limit of synchronous exposure of the depth camera and each industrial camera exceeds a set time limit threshold, if not, executing S1204, and if so, executing S1205.
In the embodiment of the application, because the RGB images of the LED lamp collected while synchronizing the exposure of the depth camera and the industrial cameras are discrete samples, the stripe positions (row numbers) detected across the multi-frame RGB images may differ. The acquired frame count n in the frame rate formula above therefore carries a measurement error, and the calculated acquisition frame rate of the depth camera (i.e., the frame rate of its RGB images) differs slightly from the actual frame rate. The error accumulated over a period of time would eventually destroy the synchronous exposure effect between the depth camera and the industrial cameras, so during image acquisition it must be determined whether the time limit of synchronous exposure between the depth camera and each industrial camera exceeds the set time limit threshold.
In S1203, the time limit of synchronous exposure between the depth camera and each industrial camera is measured on the order of hours, so this check does not affect image capture efficiency.
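Inside the capture loop, the check of S1203 reduces to comparing an elapsed time against the threshold. In the sketch below the threshold value is an assumption; the application only states that it is on the order of hours:

```python
import time

SYNC_DURATION_LIMIT_S = 2 * 3600  # assumed hours-scale threshold

def sync_expired(sync_start_s: float) -> bool:
    """True when the synchronous exposure has lasted longer than the set
    threshold, i.e. the fig. 7 procedure should be re-run (S1205)."""
    return time.monotonic() - sync_start_s > SYNC_DURATION_LIMIT_S
```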
S1204: and continuously acquiring and storing the hand image pairs.
In S1204, when it is determined that the time limit of synchronous exposure between the depth camera and each industrial camera has not been exceeded, the hand image collected by the depth camera and that collected by each industrial camera in a pair are still exposed almost synchronously. The obtained hand image pair is therefore valid and can be stored, and hand image pairs within the synchronous exposure time limit continue to be acquired.
S1205: the time relationship of the simultaneous exposure between the depth camera and each of the industrial cameras is redetermined to reduce the cumulative time error of the simultaneous exposure of the depth camera and each of the industrial cameras.
In S1205, when it is determined that the time limit of synchronous exposure between the depth camera and each industrial camera has been exceeded, the synchronization error between the hand image collected by the depth camera and those collected by the industrial cameras in a pair becomes large and degrades the labeling quality of the 3D gestures. The process shown in fig. 7 is therefore re-executed to resynchronize the exposure of the depth camera and the industrial cameras and to redetermine the time relationship of synchronous exposure between the depth camera and each industrial camera, thereby reducing the time-accumulated synchronization error and improving the 3D gesture labeling quality.
S1206: and continuing to acquire and store the hand image pairs synchronously exposed by the depth camera and each industrial camera according to the redetermined time relation of synchronous exposure.
In S1206, after re-synchronizing the exposure of the depth camera with each industrial camera, hand images are acquired from different perspectives and sent to the host. After receiving hand images sent by the depth camera and the plurality of industrial cameras, the host continues to acquire hand image pairs synchronously exposed by the depth camera and each industrial camera according to the redetermined time relation of synchronous exposure.
S1207: and aiming at each group of hand image pairs, generating a hand three-dimensional model according to hand images acquired by a depth camera in the hand image pairs, and projecting the hand three-dimensional model into hand images synchronously exposed by corresponding industrial cameras in the hand image pairs to obtain corresponding 3D gestures.
After the number of collected hand image pairs reaches the training requirement, acquisition stops. For each stored hand image pair, a three-dimensional reconstruction algorithm generates a 3D hand model from the hand depth image collected by the depth camera in that pair; the model is projected into the synchronously exposed hand RGB image of each corresponding industrial camera in the pair, yielding the corresponding 3D gesture and completing the automatic labeling of the 3D gesture.
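The projection itself is standard pinhole camera geometry. The sketch below uses OpenCV's cv2.projectPoints, which is a real API; the calibration inputs are assumed to come from a prior calibration between the depth camera and each industrial camera, which the application does not detail:

```python
import cv2
import numpy as np

def project_keypoints(points_3d_depth: np.ndarray, rvec, tvec,
                      K: np.ndarray, dist: np.ndarray) -> np.ndarray:
    """Project Nx3 hand key points, expressed in the depth camera's
    coordinate frame, into an industrial camera image. rvec/tvec are the
    depth-to-industrial extrinsics, K and dist the industrial camera's
    intrinsics and distortion (all from calibration, assumed available)."""
    pts, _ = cv2.projectPoints(points_3d_depth.astype(np.float64),
                               rvec, tvec, K, dist)
    return pts.reshape(-1, 2)  # Nx2 pixel coordinates = 2D labels

# The 21 projected points (fig. 2) become the 3D gesture labels for that
# industrial camera's hand image.
```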
S1208: and generating a 3D gesture data set for training a hand key point estimation model according to the hand images acquired by each industrial camera in the plurality of groups of hand images and the corresponding 3D gestures.
When images are acquired with the synchronously exposed depth camera and industrial cameras, the hand images collected within the synchronous exposure time limit are exposed almost simultaneously, so the error introduced when labeling 3D gestures is negligibly small. The 3D gestures generated from the depth camera's synchronously exposed hand images therefore accurately reflect the actual hand positions in the industrial camera images. Training the hand key point estimation model with the resulting high-quality 3D gesture dataset enables the trained model to accurately restore 3D gestures in the virtual scene and improves the user's immersive experience.
The following describes, from the perspective of device interaction, the flow of the method for synchronously acquiring 3D images with multiple cameras according to the embodiment of the present application. Referring to fig. 13, the flow mainly includes the following steps:
S1301: the host computer sends a first control instruction to the singlechip to control the LED lamp in the visual field of the depth camera to flash.
S1302: the singlechip controls the LED lamp to continuously flash at an initial frame rate according to the first control instruction.
S1303: the depth camera collects multi-frame RGB images of the LED lamps flashing at an initial frame rate and sends the multi-frame RGB images to the host.
S1304: and the host determines the acquisition frame rate of the depth camera according to the received multi-frame RGB image.
S1305: the host computer sends a second control instruction to the singlechip, wherein the second control instruction carries the acquisition frame rate of the depth camera.
S1306: and the singlechip sets the acquisition frame rate as a target frame rate to the LED lamp according to the second control instruction, so that the LED lamp continuously flashes at the target frame rate and different phases.
S1307: the depth camera collects multi-frame RGB images of the LED lamp flashing at different phases with a target frame rate, and sends the multi-frame RGB images to the host.
S1308: and the host takes the phase corresponding to the RGB image with dark stripes at the top in the multi-frame RGB image as the target phase of the LED lamp.
S1309: the host computer sends a third control instruction to the singlechip, wherein the third control instruction carries a target phase.
S1310: the singlechip sets up the target phase to the LED lamp.
S1311: and the host computer sends a fourth control instruction to the singlechip and clears the multi-frame RGB image.
S1312: and stopping sending a control signal to the LED lamp by the singlechip according to the fourth control instruction.
S1313: the host computer sends a fifth control instruction to the singlechip.
S1314-1315: and the singlechip starts to send a control signal carrying the target frame rate and the target phase to the LED lamp according to the fifth control instruction, sets the LED lamp to flash for a single time, and simultaneously sends hardware synchronous signals carrying the target frame rate and the target phase to a plurality of industrial cameras.
S1316: The depth camera captures multiple frames of RGB images covering the single flash of the LED lamp at the target frame rate and target phase, and sends them to the host.
S1317: The industrial cameras capture RGB images according to the hardware synchronization signals and send them to the host.
S1318: The host selects, from the frames sent by the depth camera, the RGB image in which the LED lamp is lit.
S1319: The host determines the time relationship of synchronous exposure between the depth camera and the industrial cameras from the selected RGB image and the timestamps of the RGB images sent by each industrial camera.
S1320: The host sends a sixth control instruction to the microcontroller.
S1321: According to the sixth control instruction, the microcontroller stops sending control signals to the LED lamp while continuing to send hardware synchronization signals to the industrial cameras.
S1322: The depth camera captures multiple frames of hand images and sends them to the host.
S1323: The industrial cameras continuously capture multiple frames of hand images according to the received hardware synchronization signals and send them to the host.
S1324: According to the time relationship of synchronous exposure between the depth camera and the industrial cameras, the host obtains and saves the hand image pairs synchronously exposed by the depth camera and the industrial cameras.
S1325: The host determines whether the duration of synchronous exposure of the depth camera and the industrial cameras exceeds a set time threshold; if so, the flow returns to S1301; if not, S1326 is executed.
S1326: For each saved hand image pair, the host generates a hand three-dimensional model from the hand image acquired by the depth camera in that pair, and projects the model into the hand image synchronously exposed by the corresponding industrial camera to obtain the corresponding 3D gesture.
S1327: From the hand images acquired by each industrial camera in the groups of hand image pairs and the corresponding 3D gestures, the host generates a 3D gesture data set for training a hand key point estimation model, completing the image acquisition. A host-side sketch of this interaction follows.
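The sketch below drives S1301 to S1321 from the host over a serial link. The one-byte opcodes, serial settings, and the capture/analysis callables are hypothetical stand-ins: the embodiment specifies which instructions are exchanged, not a wire protocol.

import struct
import serial  # pyserial

# Hypothetical one-byte opcodes for the six control instructions in fig. 13.
CMD_FLASH_INIT, CMD_SET_RATE, CMD_SET_PHASE = 0x01, 0x02, 0x03
CMD_STOP_LED, CMD_SINGLE_FLASH, CMD_SYNC_ONLY = 0x04, 0x05, 0x06

def send(mcu, opcode, payload=b""):
    """Send one control instruction to the microcontroller."""
    mcu.write(bytes([opcode]) + payload)

def run_synchronization(port, grab_frames, detect_frame_rate,
                        sweep_phase, measure_offsets):
    """Drive S1301-S1321 from the host. The four callables are hypothetical
    hooks for camera capture and image analysis (dark-stripe detection,
    flash-timestamp matching, and so on)."""
    with serial.Serial(port, 115200, timeout=1) as mcu:
        # S1301-S1304: flash at an initial rate, measure the camera's true rate.
        send(mcu, CMD_FLASH_INIT)
        f_rgb = detect_frame_rate(grab_frames(120))

        # S1305-S1310: lock the LED to that rate, then sweep its phase until
        # the dark stripe sits at the top of the image.
        send(mcu, CMD_SET_RATE, struct.pack("<f", f_rgb))
        phase = sweep_phase(mcu)
        send(mcu, CMD_SET_PHASE, struct.pack("<f", phase))

        # S1311-S1319: clear, flash once, then read the flash timestamps from
        # every camera to fix the synchronous-exposure time relationship.
        send(mcu, CMD_STOP_LED)
        send(mcu, CMD_SINGLE_FLASH)
        offsets = measure_offsets()

        # S1320-S1321: LED off; hardware sync signals keep driving the
        # industrial cameras while hand images are collected.
        send(mcu, CMD_SYNC_ONLY)
        return f_rgb, phase, offsets

Grouping the six instructions behind a single send helper mirrors the flowchart's division of labor: the host owns the image analysis, while the microcontroller owns the LED and sync-signal timing.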
Based on the same technical conception, an embodiment of the present application provides a 3D image acquisition device. The device may be a host such as a notebook computer, a desktop computer, or a micro server; it can implement the method steps of synchronously acquiring images with multiple cameras in the foregoing embodiments and achieve the same technical effect.
Referring to fig. 14, the 3D image acquisition device includes a processor 1401, a memory 1402, a USB virtual serial port 1403, and a data interface 1404; the data interface 1404, the USB virtual serial port 1403, the memory 1402, and the processor 1401 are connected by a bus 1405;
the memory 1402 includes a data storage unit and a program storage unit, the program storage unit storing a computer program, and the processor 1401 performs the following operations according to the computer program:
sending a first control instruction to a signal controller through the USB virtual serial port 1403, so that the signal controller controls the LED lamp in the field of view of the depth camera to flash continuously;
obtaining, through the data interface 1404, the multiple frames of RGB images of the LED lamp in the field of view captured by the depth camera, and storing them in the data storage unit;
determining, from the multiple frames of RGB images captured by the depth camera, a target parameter set for synchronizing the LED lamp with the depth camera, and clearing the RGB images stored in the data storage unit;
sending the target parameter set to the signal controller through the USB virtual serial port 1403, so that the signal controller forwards it to the plurality of industrial cameras, and sending a second control instruction to the signal controller through the USB virtual serial port 1403, so that the signal controller controls the LED lamp to flash a single time;
obtaining respectively, through the data interface 1404, the multiple frames of RGB images of the single LED flash captured by the depth camera and the RGB images captured by each industrial camera;
and determining the time relationship of synchronous exposure between the depth camera and each industrial camera from the timestamp of the RGB image, captured by the depth camera, in which the LED lamp is lit by the single flash and the timestamps of the RGB images captured by each industrial camera, for image acquisition.
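As a concrete reading of the last operation, the "time relationship" can be reduced to a per-camera clock offset: the single LED flash is one physical event that every camera timestamps on its own clock. A minimal sketch under that assumption, with hypothetical timestamp values:

def exposure_offset(depth_flash_ts, industrial_flash_ts):
    """Offset that re-bases an industrial camera timestamp onto the depth
    camera's timeline. Both arguments are the (hypothetical, in seconds)
    timestamps of the frame in which the single LED flash is visible."""
    return depth_flash_ts - industrial_flash_ts

# Example: the flash is seen at t = 12.480 s on the depth camera's clock and
# at t = 3.115 s on one industrial camera's clock, so the offset is 9.365 s;
# any later frame of that camera is re-based by adding the same offset.
offset = exposure_offset(12.480, 3.115)
rebased = 3.148 + offset   # this industrial frame aligns with t = 12.513 s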
Optionally, after determining the time relationship of synchronous exposure between the depth camera and each industrial camera, the processor 1401 further performs the following operations:
sending a third control instruction to the signal controller through the USB virtual serial port 1403, so that the signal controller stops sending control signals to the LED lamp while continuing to send hardware synchronization signals to the plurality of industrial cameras;
receiving, through the data interface 1404, the hand images captured by the depth camera and by each industrial camera, obtaining the hand image pairs synchronously exposed by the depth camera and each industrial camera according to the time relationship of synchronous exposure between them, and storing the pairs in the data storage unit;
determining whether the duration of synchronous exposure of the depth camera with each industrial camera exceeds a set time threshold;
if it is exceeded, re-determining the time relationship of synchronous exposure between the depth camera and each industrial camera to reduce the accumulated timing error of their synchronous exposure, and, according to the re-determined time relationship, continuing to acquire synchronously exposed hand image pairs through the data interface 1404 and storing them in the data storage unit;
for each stored hand image pair, generating a hand three-dimensional model from the hand image acquired by the depth camera in that pair, and projecting the model into the hand image synchronously exposed by the corresponding industrial camera to obtain the corresponding 3D gesture;
and generating, from the hand images acquired by each industrial camera in the groups of hand image pairs and the corresponding 3D gestures, a 3D gesture data set for training a hand key point estimation model.
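To illustrate the pairing operation: once each industrial camera's timestamps are re-based onto the depth camera's timeline with the offset above, a synchronously exposed pair is simply the industrial frame nearest in time to each depth frame, within a tolerance. A sketch, with a hypothetical tolerance value:

def pair_frames(depth_frames, ind_frames, offset, tol=0.002):
    """depth_frames / ind_frames: lists of (timestamp, image), sorted by time.
    offset: this industrial camera's offset from the single-flash calibration.
    tol: maximum allowed exposure mismatch in seconds (assumed value)."""
    if not ind_frames:
        return []
    pairs, j = [], 0
    for t_d, img_d in depth_frames:
        # advance to the industrial frame closest to t_d on the depth timeline
        while (j + 1 < len(ind_frames)
               and abs(ind_frames[j + 1][0] + offset - t_d)
                   < abs(ind_frames[j][0] + offset - t_d)):
            j += 1
        t_i, img_i = ind_frames[j]
        if abs(t_i + offset - t_d) <= tol:
            pairs.append((img_d, img_i))   # a synchronously exposed pair
    return pairs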
Optionally, the target parameter set includes at least a target frame rate and a target phase, and the processor 1401 determines, from the multiple frames of RGB images captured by the depth camera, the target parameter set for synchronizing the LED lamp with the depth camera, specifically by:
performing dark stripe detection on the multiple frames of RGB images to determine the acquisition frame rate of the depth camera, and setting that acquisition frame rate as the target frame rate of the LED lamp, so that the LED lamp flashes continuously at the target frame rate;
obtaining RGB images, captured by the depth camera, of the LED lamp flashing continuously at the target frame rate and different phases, and taking the phase corresponding to the RGB image whose dark stripe lies at the top of the image as the target phase of the LED lamp;
and taking the target frame rate and the target phase as the target parameter set for synchronizing the LED lamp with the depth camera.
Optionally, the processor 1401 performs dark stripe detection on the multiple frames of RGB images and determines the acquisition frame rate of the depth camera, specifically by:
determining the sampling period of the depth camera from the timestamps of the multiple frames of RGB images, and taking that sampling period as the flicker period of the LED lamp;
detecting the dark stripe of the LED lamp in the multiple frames of RGB images, and determining the rolling direction of the dark stripe from its row numbers in successive RGB images;
determining the number of acquisition frames of the depth camera from the frame index at which the dark stripe first appears at the top of the image and the frame index at which it returns to that initial position;
and determining the acquisition frame rate of the depth camera from the flicker period, the number of acquisition frames, and the rolling direction.
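A minimal sketch of this bookkeeping, assuming the dark stripe can be located as the darkest image row and that the capture spans at least two full stripe cycles; a production detector would need to be considerably more robust:

import numpy as np

def stripe_row(frame):
    """Row of the dark stripe: the row with the lowest mean intensity."""
    return int(np.argmin(frame.mean(axis=1)))

def acquisition_frame_rate(frames, timestamps):
    """frames: grayscale images of the flashing LED; timestamps in seconds.
    Implements the reconstructed formula f_rgb = n / ((n +/- 1) * T_led)."""
    T_led = float(np.median(np.diff(timestamps)))  # sampling period -> LED flicker period
    rows = [stripe_row(f) for f in frames]
    rolling_down = rows[1] > rows[0]               # simplistic direction estimate
    h = frames[0].shape[0]
    # a jump of more than half the image height marks the stripe wrapping
    # past the top (or bottom) of the frame
    wraps = [i for i in range(1, len(rows)) if abs(rows[i] - rows[i - 1]) > h / 2]
    n = wraps[1] - wraps[0]                        # frames for one full cycle
    sign = -1 if rolling_down else +1              # minus: downward; plus: upward
    return n / ((n + sign) * T_led)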
Optionally, the acquisition frame rate of the depth camera is calculated as:

f_rgb = n / ((n ± 1) · T_led)

where f_rgb denotes the acquisition frame rate of the depth camera, T_led denotes the flicker period of the LED lamp, and n denotes the number of acquisition frames of the depth camera; the minus sign is taken when the dark stripe rolls downward, and the plus sign when it rolls upward. The relation follows because the dark stripe returns to its starting row exactly when n camera frames span (n ± 1) LED flicker periods.
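Under the formula above, for example, if the timestamps yield a sampling period, and hence an LED flicker period, of T_led = 1/30 s, and the upward-rolling dark stripe takes n = 60 frames to return to the top of the image, the plus sign applies and f_rgb = 60 / (61 × (1/30)) ≈ 29.51 Hz. The small deviation from the nominal 30 Hz is precisely the residual that the drifting stripe reveals, and it is this refined rate that is set as the LED lamp's target frame rate.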
Optionally, the processor 1401 obtains the RGB images, captured by the depth camera, of the LED lamp flashing continuously at the target frame rate and different phases, specifically by:
reading the current RGB image, captured by the depth camera, corresponding to the initial phase of the LED lamp, and detecting whether the dark stripe appears at the top of that image;
if not, increasing the phase of the LED lamp by a set phase step, then reading the next frame of RGB image, and repeating until the dark stripe appears at the top of the image, at which point adjustment of the LED lamp's phase stops.
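A sketch of this sweep, reusing the hypothetical send/CMD_SET_PHASE helpers and the stripe_row detector from the earlier sketches; the phase step and the top-of-image margin are assumed values:

import struct

PHASE_STEP = 0.0005   # assumed phase increment per iteration, in seconds
TOP_MARGIN = 5        # stripe counts as "at the top" within this many rows

def sweep_phase(mcu, grab_frame, phase=0.0, max_steps=200):
    """Raise the LED phase step by step until the dark stripe reaches the
    top of the depth camera's RGB image; grab_frame is a hypothetical
    capture hook returning one grayscale frame."""
    for _ in range(max_steps):
        if stripe_row(grab_frame()) <= TOP_MARGIN:
            return phase                      # target phase found; stop adjusting
        phase += PHASE_STEP
        send(mcu, CMD_SET_PHASE, struct.pack("<f", phase))
    raise RuntimeError("dark stripe never reached the top of the image")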
Optionally, the target parameter set further comprises a pulse width.
It should be noted that fig. 14 is only an example and shows the hardware necessary for the 3D image acquisition device to execute the steps of the method for synchronously acquiring images with multiple cameras provided in the embodiments of the present application; components not shown, such as a display screen, a power supply, and other parts common to electronic devices with data processing capability, are also included in the 3D image acquisition device.
The processor referred to in fig. 14 of this embodiment may be a central processing unit (Central Processing Unit, CPU), a general-purpose processor, a graphics processor (Graphics Processing Unit, GPU), a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
Embodiments of the present application also provide a computer readable storage medium storing instructions that, when executed, perform the method of the foregoing embodiments.
Embodiments of the present application also provide a computer program product storing a computer program for performing the method of the foregoing embodiments.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (10)

1. A method for synchronously acquiring 3D images with multiple cameras, comprising:
controlling, through a signal controller, an LED lamp in the field of view of a depth camera to flash continuously, and obtaining multiple frames of RGB images of the LED lamp captured by the depth camera;
determining, from the multiple frames of RGB images captured by the depth camera, a target parameter set for synchronizing the LED lamp with the depth camera, and clearing the RGB images;
controlling the signal controller to send the target parameter set to a plurality of industrial cameras, and setting the LED lamp to flash a single time;
and determining the time relationship of synchronous exposure between the depth camera and each industrial camera from the timestamp of the RGB image, captured by the depth camera, in which the LED lamp is lit by the single flash and the timestamps of the RGB images captured by each industrial camera, for image acquisition.
2. The method of claim 1, wherein after determining the time relationship of synchronous exposure between the depth camera and each industrial camera, the method further comprises:
controlling the signal controller to stop sending control signals to the LED lamp and to continuously send hardware synchronization signals to the plurality of industrial cameras, so that each industrial camera is synchronously exposed with the depth camera according to the target parameter set;
obtaining and saving, according to the time relationship of synchronous exposure between the depth camera and each industrial camera, the hand image pairs synchronously exposed by the depth camera and each industrial camera;
determining whether the duration of synchronous exposure of the depth camera with each industrial camera exceeds a set time threshold;
if it is exceeded, re-determining the time relationship of synchronous exposure between the depth camera and each industrial camera to reduce the accumulated timing error of their synchronous exposure, and continuing to obtain and save synchronously exposed hand image pairs according to the re-determined time relationship;
for each saved hand image pair, generating a hand three-dimensional model from the hand image acquired by the depth camera in that pair, and projecting the model into the hand image synchronously exposed by the corresponding industrial camera to obtain the corresponding 3D gesture;
and generating, from the hand images acquired by each industrial camera in the groups of hand image pairs and the corresponding 3D gestures, a 3D gesture data set for training a hand key point estimation model.
3. The method of claim 1, wherein the target parameter set includes at least a target frame rate and a target phase, and determining, from the multiple frames of RGB images captured by the depth camera, the target parameter set for synchronizing the LED lamp with the depth camera comprises:
performing dark stripe detection on the multiple frames of RGB images to determine the acquisition frame rate of the depth camera, and setting that acquisition frame rate as the target frame rate of the LED lamp, so that the LED lamp flashes continuously at the target frame rate;
obtaining RGB images, captured by the depth camera, of the LED lamp flashing continuously at the target frame rate and different phases, and taking the phase corresponding to the RGB image whose dark stripe lies at the top of the image as the target phase of the LED lamp;
and taking the target frame rate and the target phase as the target parameter set for synchronizing the LED lamp with the depth camera.
4. The method of claim 3, wherein performing dark stripe detection on the multiple frames of RGB images to determine the acquisition frame rate of the depth camera comprises:
determining the sampling period of the depth camera from the timestamps of the multiple frames of RGB images, and taking that sampling period as the flicker period of the LED lamp;
detecting the dark stripe of the LED lamp in the multiple frames of RGB images, and determining the rolling direction of the dark stripe from its row numbers in successive RGB images;
determining the number of acquisition frames of the depth camera from the frame index at which the dark stripe first appears at the top of the image and the frame index at which it returns to that initial position;
and determining the acquisition frame rate of the depth camera from the flicker period, the number of acquisition frames, and the rolling direction.
5. The method of claim 4, wherein the acquisition frame rate of the depth camera is calculated as:

f_rgb = n / ((n ± 1) · T_led)

where f_rgb denotes the acquisition frame rate of the depth camera, T_led denotes the flicker period of the LED lamp, and n denotes the number of acquisition frames of the depth camera; the minus sign is taken when the dark stripe rolls downward, and the plus sign when it rolls upward.
6. The method of claim 3, wherein obtaining the RGB images, captured by the depth camera, of the LED lamp flashing continuously at the target frame rate and different phases comprises:
reading the current RGB image, captured by the depth camera, corresponding to the initial phase of the LED lamp, and detecting whether the dark stripe appears at the top of that image;
if not, increasing the phase of the LED lamp by a set phase step, then reading the next frame of RGB image, and repeating until the dark stripe appears at the top of the image, at which point adjustment of the LED lamp's phase stops.
7. The method of any of claims 1-6, wherein the set of target parameters further comprises a pulse width.
8. A 3D image acquisition device, characterized by comprising a processor, a memory, a USB virtual serial port, and a data interface, the data interface, the USB virtual serial port, the memory, and the processor being connected by a bus;
the memory includes a data storage unit and a program storage unit, the program storage unit stores a computer program, and the processor performs the following operations according to the computer program:
sending a first control instruction to a signal controller through the USB virtual serial port, so that the signal controller controls the LED lamp in the field of view of the depth camera to flash continuously;
obtaining, through the data interface, the multiple frames of RGB images of the LED lamp in the field of view captured by the depth camera, and storing them in the data storage unit;
determining, from the multiple frames of RGB images captured by the depth camera, a target parameter set for synchronizing the LED lamp with the depth camera, and clearing the RGB images stored in the data storage unit;
sending the target parameter set to the signal controller through the USB virtual serial port, so that the signal controller forwards it to the plurality of industrial cameras, and sending a second control instruction to the signal controller through the USB virtual serial port, so that the signal controller controls the LED lamp to flash a single time;
obtaining respectively, through the data interface, the multiple frames of RGB images of the single LED flash captured by the depth camera and the RGB images captured by each industrial camera;
and determining the time relationship of synchronous exposure between the depth camera and each industrial camera from the timestamp of the RGB image, captured by the depth camera, in which the LED lamp is lit by the single flash and the timestamps of the RGB images captured by each industrial camera, for image acquisition.
9. The 3D image acquisition device of claim 8, wherein after determining the time relationship of synchronous exposure between the depth camera and each industrial camera, the processor further performs:
sending a third control instruction to the signal controller through the USB virtual serial port, so that the signal controller stops sending control signals to the LED lamp while continuing to send hardware synchronization signals to the plurality of industrial cameras;
receiving, through the data interface, the hand images captured by the depth camera and by each industrial camera, obtaining the hand image pairs synchronously exposed by the depth camera and each industrial camera according to the time relationship of synchronous exposure between them, and storing the pairs in the data storage unit;
determining whether the duration of synchronous exposure of the depth camera with each industrial camera exceeds a set time threshold;
if it is exceeded, re-determining the time relationship of synchronous exposure between the depth camera and each industrial camera to reduce the accumulated timing error of their synchronous exposure, and, according to the re-determined time relationship, continuing to acquire synchronously exposed hand image pairs through the data interface and storing them in the data storage unit;
for each stored hand image pair, generating a hand three-dimensional model from the hand image acquired by the depth camera in that pair, and projecting the model into the hand image synchronously exposed by the corresponding industrial camera to obtain the corresponding 3D gesture;
and generating, from the hand images acquired by each industrial camera in the groups of hand image pairs and the corresponding 3D gestures, a 3D gesture data set for training a hand key point estimation model.
10. The 3D image acquisition device of claim 8, wherein the target parameter set includes at least a target frame rate and a target phase, and the processor determines, from the multiple frames of RGB images captured by the depth camera, the target parameter set for synchronizing the LED lamp with the depth camera, specifically by:
performing dark stripe detection on the multiple frames of RGB images to determine the acquisition frame rate of the depth camera, and setting that acquisition frame rate as the target frame rate of the LED lamp, so that the LED lamp flashes continuously at the target frame rate;
obtaining RGB images, captured by the depth camera, of the LED lamp flashing continuously at the target frame rate and different phases, and taking the phase corresponding to the RGB image whose dark stripe lies at the top of the image as the target phase of the LED lamp;
and taking the target frame rate and the target phase as the target parameter set for synchronizing the LED lamp with the depth camera.
CN202210718630.7A 2022-06-23 2022-06-23 Method and equipment for synchronously acquiring images by multiple cameras Active CN115190288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210718630.7A CN115190288B (en) 2022-06-23 2022-06-23 Method and equipment for synchronously acquiring images by multiple cameras


Publications (2)

Publication Number Publication Date
CN115190288A CN115190288A (en) 2022-10-14
CN115190288B (en) 2023-04-25

Family

ID=83515652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210718630.7A Active CN115190288B (en) 2022-06-23 2022-06-23 Method and equipment for synchronously acquiring images by multiple cameras

Country Status (1)

Country Link
CN (1) CN115190288B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006227245A (en) * 2005-02-17 2006-08-31 Eastman Kodak Co Camera system and camera connectable to wireless network used in the camera system
US8723923B2 (en) * 2010-01-14 2014-05-13 Alces Technology Structured light system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109889689A (en) * 2017-12-06 2019-06-14 洛克威尔柯林斯公司 Synchronous camera and lighting system
CN109963136A (en) * 2017-12-22 2019-07-02 宁波盈芯信息科技有限公司 A kind of working method and device of smart phone structure light depth camera
WO2019184183A1 (en) * 2018-03-31 2019-10-03 深圳奥比中光科技有限公司 Target image acquisition system and method
CN110493593A (en) * 2019-07-30 2019-11-22 中国科学院西安光学精密机械研究所 A kind of photographic device frame frequency test macro and its test method
CN112565733A (en) * 2020-12-09 2021-03-26 广州科莱瑞迪医疗器材股份有限公司 Three-dimensional imaging method and device based on multi-camera synchronous shooting and shooting system
CN113676725A (en) * 2021-08-19 2021-11-19 江苏集萃智能光电系统研究所有限公司 Binary laser coding multi-camera synchronism measuring method and device
CN114283241A (en) * 2021-12-27 2022-04-05 熵智科技(深圳)有限公司 Structured light three-dimensional reconstruction device and method
CN114138121A (en) * 2022-02-07 2022-03-04 北京深光科技有限公司 User gesture recognition method, device and system, storage medium and computing equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wenxiong Shi; Yangyang Li; Ru Chen; Chenghao Zhang; Zhanwei Liu; Huimin Xie; Fei Liu. Research on synchronous measurement technique of temperature and deformation fields using multispectral camera with bilateral telecentric lens. Theoretical and Applied Mechanics Letters. 2022, Vol. 12 (No. 003), full text. *
Liang Huagang; Cheng Jiale; Sun Xiaonan; Ru Feng. Time correction method for asynchronous cameras in road monitoring systems. Journal of Traffic and Transportation Engineering. 2015, (No. 04), full text. *


Similar Documents

Publication Publication Date Title
EP3864491B1 (en) Method for hmd camera calibration using synchronised image rendered on external display
KR102291461B1 (en) Technologies for adjusting a perspective of a captured image for display
CN111417983A (en) Deformable object tracking based on event camera
WO2020076396A1 (en) Real-world anchor in a virtual-reality environment
WO2015000286A1 (en) Three-dimensional interactive learning system and method based on augmented reality
TW201915831A (en) System and method for entity recognition
CN109727271A (en) Method and apparatus for tracking object
CN112204961B (en) Semi-dense depth estimation from dynamic vision sensor stereo pairs and pulsed speckle pattern projectors
CN102780893A (en) Image processing apparatus and control method thereof
JP2017129904A (en) Information processor, information processing method, and record medium
CN101930628A (en) Monocular-camera and multiplane mirror catadioptric device-based motion capturing method
KR101647969B1 (en) Apparatus for detecting user gaze point, and method thereof
US20160078685A1 (en) Display control device, display control method, and recording medium
CN102799271A (en) Method and system for identifying interactive commands based on human hand gestures
CN106210699B (en) Information processing unit, the control method of information processing unit and image processing system
CN103514429A (en) Method for detecting specific part of object and image processing equipment
CN109445583A (en) Page control method, device and mobile terminal
WO2020054298A1 (en) Information processing device, drawing control method, and recording medium having said program recorded thereon
CN115190288B (en) Method and equipment for synchronously acquiring images by multiple cameras
CN104170367B (en) A kind of image-capturing method, device and computer-readable medium
KR101824808B1 (en) Apparatus and method for golf swing synchronization between a 3d golf swing model and moving pictures taken by learner using image processing technologies
CN104601900A (en) Image data acquisition method for intelligent interactive micro-projection device
CN105469048A (en) Method of increasing face detection performance
CN107690067B (en) The detection method and device of head-mounted display apparatus frame per second
CN105511649B (en) A kind of multipoint location system and multipoint positioning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant