WO2018107679A1

WO2018107679A1 - Method and device for acquiring dynamic three-dimensional image

Info

Publication number: WO2018107679A1
Application number: PCT/CN2017/088162
Authority: WO
Inventors: 邵明明; 钟小飞; 王金波; 王林
Original assignee: 华为技术有限公司
Priority date: 2016-12-12
Filing date: 2017-06-13
Publication date: 2018-06-21
Also published as: CN110169056A; CN110169056B; CN112132881A

Abstract

Provided is a method for acquiring a dynamic three-dimensional image. The method for acquiring a dynamic three-dimensional image comprises: acquiring a movement posture of a terminal device; respectively collecting a depth image and a colour image by means of a depth camera and a colour camera; according to the movement posture of the terminal device and the depth image, performing quick segmentation and matching; according to the colour image, performing exact matching on a result of the quick segmentation and matching; and if there is an overlap between the acquired current image and a photographed image, performing fusion on the overlapping area through a fusion algorithm so as to generate a dynamic three-dimensional image. The present application can realise, with respect to the currently occurring defects existing in panoramic photography and surround photography, by means of adding a depth camera to a device and combining data of a mobile phone posture sensor and a colour image sensor, a method capable of recording a scenery appearance from various directions at once so as to acquire a dynamic three-dimensional image and support storage display.

Description

Method and device for acquiring dynamic three-dimensional image

The present application claims priority to Chinese Patent Application No. 201611142062.1, entitled "A Method and Apparatus for Dynamic Three-Dimensional Image Acquisition", filed on Dec. 12, 2016, the entire contents of In this application.

Technical field

The present application relates to the field of image recognition, and more particularly to a method and apparatus for dynamic three-dimensional image acquisition.

Background technique

Typically, images are captured by input devices such as cameras to describe the real world. With the development of technology, camera devices have been able to provide increasingly finer image quality and larger and larger image resolution. On this basis, a large number of image algorithms are generated to assist the camera to produce more diverse images, such as panoramic photos, panoramic selfies, skin photos, audio photos, face recognition and smile recognition. It makes the photo more interesting and enriches the form of the real world.

The existing two-dimensional camera device can acquire a scene inside and outside a fixed-size area at a certain moment, and generate a two-dimensional, static description of the real world, and the acquired data is a two-dimensional matrix in units of pixels, which is compressed. After the algorithm is processed and saved, the compressed image is extracted and decompressed and pushed to the display device cache. Since the real world is three-dimensional and dynamic, designing a camera system capable of acquiring dynamic three-dimensional images and a series of storage display methods will open a new camera revolution.

Current methods for three-dimensional, dynamic image acquisition include: panoramic image acquisition and surround image acquisition.

The panoramic image acquisition mode is horizontally moved or horizontally rotated by the user's handheld terminal device, and the newly acquired image is stitched into the existing image by the internal stitching algorithm in real time, and can be viewed by sliding, zooming, etc. after completion. The method is simple in operation, and can obtain a static image in a wider horizontal direction, which broadens the imaging range of the conventional two-dimensional image.

The surrounding image acquisition mode is moved or rotated by the user handheld terminal device in one of four directions of up, down, left, and right, and the internal terminal device records the current terminal device posture and the acquired scene image and performs inter-frame feature region matching. With compression, you can view it by sliding or rotating the device. The method is simple in operation, and can obtain a dynamic image of a wider area in a certain direction, and realizes acquisition of a local dynamic three-dimensional image in one direction.

The panoramic image acquisition method limits the user to move or rotate a fixed distance in one direction. The shooting process is prone to jitter and affect the shooting effect. The final stitched image is curved and deformed, and it is difficult to restore the real scene.

The surround image acquisition method can only shoot around in a single direction after shooting starts. When shooting close-up shots, it is not possible to handle changes in the distance between the device and the subject. When a captured image is displayed, it cannot be freely enlarged and reduced. Storage and display do not form an industry standard, and the acquired images can only be viewed within the shooting software.

Today, as interactive terminal devices become more and more intelligent, people are increasingly demanding the fun, accuracy, and rapidity of the operation of the terminal devices themselves. Therefore, in view of the defects of the current terminal device panoramic imaging or surround imaging, a method for acquiring dynamic and three-dimensional images from various orientations of the scene and supporting storage and display is needed. Enrich the representation of the picture, changing the public's perception of the image and the experience of shooting.

Summary of the invention

The present application provides a method and a terminal device for dynamic three-dimensional image acquisition, which can improve user experience.

In a first aspect, a method for dynamic three-dimensional image acquisition is provided, which can be applied to a terminal device, and the method includes:

Obtaining a motion posture of the terminal device;

Collecting depth images and color images by a depth camera and a color camera, respectively;

Performing fast segmentation matching with the depth image according to the motion posture of the terminal device;

Performing exact matching on the result of the fast segmentation matching according to the color image;

If the acquired current image overlaps with the captured image, the overlapping region is fused by the fusion algorithm to generate a dynamic three-dimensional image.

According to the three-dimensional gesture unlocking method of the embodiment of the present application, the three-dimensional gesture image presented by the user in the three-dimensional space in front of the camera is acquired in real time, the gesture of the user in the gesture image is extracted, and the unlocking gesture is matched with the previously set unlocking gesture of the user to achieve the unlocking. The purpose of the terminal device. Thus, the user is provided with a new, interesting, accurate and fast unlocking method.

In a possible implementation, the motion posture of the terminal device is acquired by the accelerometer, the gyroscope and the electronic compass of the terminal device.

In a possible implementation manner, performing fast segmentation matching with the depth image according to a motion posture of the terminal device, including:

When determining that the terminal device moves smoothly according to the motion posture of the terminal device, acquiring a depth map corresponding to the start time point and the end time point in the first time period;

The feature region of the depth map of the end time point is calculated based on the completed segmented feature region of the depth map of the start time point and the pose change of the terminal device.

In a possible implementation, the result of the fast segmentation matching is accurately matched according to the color image, including:

Performing compensation optimization on the feature region range according to the color image to obtain a fine-grained picture feature description.

In a possible implementation, the overlapping region is fused by the fusion algorithm to generate a dynamic three-dimensional image, including:

Calculating the current posture of the terminal device in real time, and extracting comparable iconic data from the historical feature matrix for matching;

If the matching result indicates that the current image frame overlaps with the historical image frame, the fusion processing is performed to update the overlapping region.

In a second aspect, a terminal device is provided for performing the method of the first aspect or any possible implementation of the first aspect. In particular, the terminal device may comprise means for performing the method of the first aspect or any of the possible implementations of the first aspect.

A third aspect provides a terminal device including a memory, a processor, and a display, the memory being used to store a computer program, the processor is configured to call and run a computer program from the memory, and when the program is executed, the processor executes the above The method of any of the first aspect or any of the possible implementations of the first aspect.

In a fourth aspect, a computer readable medium is provided for storing a computer program comprising instructions for performing the method of the first aspect or any of the possible implementations of the first aspect.

DRAWINGS

1 is a schematic diagram of a minimum hardware system for implementing a terminal device of an embodiment of the present application.

FIG. 2 is a schematic flowchart of a dynamic three-dimensional image acquisition method according to an embodiment of the present application.

3 is a block diagram showing the design of motion pose recognition and trajectory acquisition in accordance with one embodiment of the present application.

4 is a schematic diagram of data fusion of a gyroscope and an accelerometer according to an embodiment of the present application.

FIG. 5 is a schematic diagram of a method for data fusion of a gyroscope and an electronic compass according to an embodiment of the present application.

FIG. 6 is a schematic flowchart of performing fast segmentation matching according to a motion posture and a depth image of a terminal device according to an embodiment of the present application.

FIG. 7 is an accurate match of a result of fast segmentation matching according to the color image according to an embodiment of the present application.

Schematic flow chart.

FIG. 8 is a schematic flowchart of a method for image overlap region fusion according to an embodiment of the present application.

FIG. 9 is a schematic flowchart of a user capturing a dynamic three-dimensional image according to an embodiment of the present application.

FIG. 10 is a schematic flowchart of a user viewing a dynamic three-dimensional image according to an embodiment of the present application.

FIG. 11 is a schematic block diagram of an example of a terminal device according to an embodiment of the present application.

detailed description

Embodiments of the present application will be described below with reference to the accompanying drawings.

The terminal device in this embodiment may be an access terminal, a user equipment (UE), a subscriber unit, a subscriber station, a mobile station, a mobile station, a remote station, a remote terminal, a mobile device, a user terminal, and a wireless communication device. , user agent or user device. The terminal device may be a cellular phone, a cordless phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA), and a wireless communication function. Handheld device, computing device or other processing device connected to a wireless modem, in-vehicle device, wearable device, and the like.

1 is a schematic diagram of a minimum hardware system 100 of a terminal device implementing the three-dimensional gesture unlocking method of the present application. The system 100 shown in FIG. 1 includes a light source transmitter 110, a depth camera 120, a spectrum analysis module 130, a color camera 140, a processor 150, a display unit 160, a nonvolatile memory 170, a memory 180, and a sensing unit 190.

The color camera 140, the light source emitter 110 and the depth camera 120 constitute a spectral input module, and the spectral analysis module 130 constitutes an image generation module. The light source emitter 110, color camera 140, and depth camera 120 can be mounted side by side over the device (eg, directly above the device). The light source emitter 110 can be an infrared emitter, the depth camera 120 can be an infrared camera, and the spectrum analysis module 130 can be an infrared spectrum analysis module. In this case, the light source emitter 110 cooperates with the depth camera 120 to project the scene through the infrared light encoded image. The light source emitter 110 outputs a common laser light source, which is filtered by a frosted glass and an infrared filter to form near-infrared light. Wherein, the light source emitter 110 can continuously output infrared light having a wavelength of 840 nanometers (nm).

The depth camera 120 is a Complementary Metal Oxide Semiconductor (CMOS) image sensor for receiving an excitation light source reflected from the outside, such as infrared light, and digitally encoding the excitation light source to form a digital image for transmission to the spectrum analysis module. 130. The spectral analysis module 130 analyzes the speckles, calculates the distance between the corresponding pixel points of the image and the depth camera 120, and forms a depth data matrix for the driver to read.

The sensing unit 190 is connected to the processor 150, detects location information of the terminal device or a change in the surrounding environment, and transmits the sensed information to the processor 150. Specifically, the sensing unit 190 includes at least one of: a gyro sensor for detecting rotation, rotational movement, angular displacement, tilt, or any other non-linear motion, for A triaxial acceleration sensor that senses acceleration in one or more directions, an electronic compass that senses the earth's magnetic field to determine the north-south direction. The sensing unit 190 operates under the control of the processor 150.

The terminal device may receive motion sensor data generated by a motion sensor (eg, a gyro sensor or an acceleration sensor) in the sensing unit 190, and process the generated motion sensor data using the motion sensing application. For example, a processor running a motion sensing application can analyze motion sensor data to identify specific types of motion events.

Display unit 160 is configured to display graphics, images or data to a user. The display unit 160 is configured to provide various screens associated with the operation of the terminal device. The display unit 160 provides a home screen, a message composing screen, a phone screen, a game screen, a music playing screen, and a video playing screen. The display unit 160 can be implemented using a flat display panel such as a liquid crystal display (LCD), an organic light emitting diode (OLED), and an active matrix OLED (AMOLED).

In the case where the display unit 160 is implemented in the form of a touch screen, the display unit 160 can operate as an input device. In the case where the display unit 160 is implemented in the form of a touch screen, the display unit 160 includes a touch panel for detecting a touch gesture. The touch panel is configured to convert a pressure applied to a specific position of the display unit 160 or a capacitance change at a specific area of the display unit 160 into an electric input signal. The touch panel can be implemented in one of add-on or on-cell (or in-cell).

The touch panel can be implemented in one of the following panels: a resistive touch panel, a capacitive touch panel, an electromagnetic induction touch panel, and a pressure touch panel. The touch panel is configured to detect the pressure of the touch as well as the location and area being touched. If a touch gesture is made on the touch panel, a corresponding input signal is generated to the processor 150. The processor 150 then checks the user's touch input information to perform the corresponding function.

The processor 150 can be responsible for executing various software programs (e.g., applications and operating systems) to provide computing and processing operations for the terminal devices. The non-volatile memory 170 is used to store program files, system files, and data. Memory 180 is used for system and program running caches.

Hereinafter, a method of dynamic three-dimensional image acquisition of a terminal device according to an embodiment of the present application will be described in detail.

2 is a schematic flow chart of a method for dynamic three-dimensional image acquisition according to an embodiment of the present application. The method shown in FIG. 2 can be performed by the terminal device shown in FIG. 1.

S210. Acquire a motion posture of the terminal device.

The following describes the acquisition method of the motion posture of the device.

A "pose" or "motion pose" as referred to herein is a set of motions of a device, which may be a set of motions included, such as a swing or a circular motion, or may be a simple movement of the device, eg, the device is specific The tilt of the axis or angle.

Figure 3 shows a block diagram of a design for motion pose recognition and acquisition of trajectories. The sampling unit 310 can receive motion data from the gyroscope, the accelerometer, and the electronic compass and sample. The attitude solving unit 320 reads the data of the gyroscope, the accelerometer and the electronic compass, calculates the triaxial angular velocity of the device, calculates the angular increment matrix, solves the attitude differential equation, and finally updates the attitude quaternion. The data fusion unit 330 filters the noise in the correlation output based on the Kalman filter algorithm and finally outputs the device pose and trajectory.

Since gyroscopes and accelerometers are affected by external factors (such as friction, unstable torque, etc.), it is necessary to calibrate the gyroscope and accelerometer sensor data before the first power-on to eliminate static. error.

The error model used in the gyroscope or accelerometer error calibration process can be expressed by equation (1).

Where [D _x D _y D _z ] ^T is the true value of the physical quantity measured by the gyroscope or accelerometer, [M _x M _y M _z ] ^T is the actual measured value of the gyroscope or accelerometer, [B _x B _y B _z ] ^T is the sensor bias. For the gyroscope in a stationary state, D _x , D _y , and D _z are all 0, and for the accelerometers D _x and D _y in the horizontal stationary state, both are 0, and D _z is a gravitational acceleration value.

When using the electronic compass for geomagnetic measurement, there are many factors that cause the error of the electronic compass direction, such as environmental magnetic interference factors, such as current, ferrous materials, permanent magnets, etc., so that the magnetic sensor measurement value deviates from the geomagnetic true value, thus making the direction calculation Navigation deviation occurs at the same time; at the same time, the compass tilt angle, which depends on the geographical position and orientation, leads to a large directional error. Therefore, error calibration of magnetic sensor measurement data is an indispensable important link.

For the electronic compass, the focus is on eliminating the error in the XY plane. When there is no error, its measured value appears as a circle on the XY plane. When the unit circle is transformed by the two ratios a and b on the X and Y axes, after the rotation of the θ angle and the translation of (x ₀ , y ₀ ), the elliptic equation shown by the formula (2) is formed:

Where x ₁ and y ₁ are the outputs of the calibrated electronic compass, and x and y are the outputs when the electronic compass is deviated. This application can obtain x ₀ , y ₀ , θ, a, b by least square fitting. Eliminate errors.

The calculation of the parameters x ₀ , y ₀ , θ, a, b in the equation (2) is as shown in equation (3):

The formula for calculating the relevant parameters in equation (3) is as follows:

U = α ² + β ² γ ² , V = 2 (β ² γ ^{2 -} α ² γ), W = α ² γ ² + β ² ,

After the error is calibrated, the quaternion is used to describe the attitude of the device. First, the gyro data is read, the three-axis angular velocity of the device is calculated, the angular increment matrix is calculated, the attitude differential equation is solved, and the attitude quaternion is finally updated.

The rotation quaternion from the inertial coordinate system a to the device coordinate system b is:

Where θ is the angle of rotation and μ ^R is the representation of the axis of rotation in the inertial coordinate system. From equation (4):

Through

Convert to gyroscope measurable

Can get quaternion:

The Pika method can be used to solve the quaternion differential equation. The process is to first calculate the corresponding quaternion Q(t) when the carrier moves, and then according to the quaternion and the attitude matrix.

Correspondence relationship, respectively determine the attitude matrix and attitude angles Δθ _x , Δθ _y , Δθ _z are the angular changes of the gyroscope x, y, and z axes at the sampling interval of [t _k , t _k+1 ], Δθ ² = Δθ _x ² + Δθ _y ² + Δθ _z ² , the order approximation algorithm for the quaternion is:

Using the Kalman filter algorithm, the rollover angle calculated by the accelerometer and the rollover angular velocity of the gyroscope test, the pitch angle data calculated by the accelerometer and the gyroscope test pitch rate data are respectively filtered, and the accelerometer and the gyroscope can be made. The data compensates each other, reduces the measurement noise, and the pitch angle and roll angle test values are more accurate, which makes the magnetic sensor tilt angle compensation effect better, can perform static calibration, and can also perform dynamic calibration.

The noise variance matrix of these two sensors is set as a variable, and the external disturbance is monitored in real time, and the noise variance matrix of the accelerometer and the electronic compass is dynamically changed, and then the gain in the Kalman filter is corrected.

After obtaining the a priori quaternion in the attitude solving step, the values of the accelerometer and the electronic compass are read to obtain the observation, and the a priori quaternion is used as the initial value of the state quantity, and the formula of the Kalman filter is brought. Get the final pose quaternion. In the present application, the gyroscope is integrated with the accelerometer, and the pitch angle θ and the roll angle γ are estimated, and the gyroscope is integrated with the electronic compass to estimate the heading angle.

The data fusion process of gyroscope and accelerometer is shown in Figure 4. The data fusion process of gyroscope and electronic compass is shown in Figure 5.

The data after the fusion of the attitude solution and the data is represented by a quaternion, which can be converted into a direction cosine matrix by the formula (9):

Converted to Euler angles by equations (10) and (11):

S220: Collect a depth image and a color image respectively by using a depth camera and a color camera.

The depth image is also referred to as a range image, and refers to an image from a image collector (for example, the depth camera 120 in the present application) to a point (depth) of each point in the scene as a pixel value. It directly reflects the geometry of the visible surface of the scene.

For example, when the method is performed by the terminal device shown in FIG. 1 , the depth camera 120 digitally encodes the excitation light source by receiving an excitation light source reflected by the outside, such as infrared light, to form a digital image and transmit it to the spectrum analysis module 130. . The spectral analysis module 130 analyzes the speckles and calculates a distance z between the corresponding pixel point (x, y) in the current image and the depth camera 120, so that the current depth image can be acquired.

S230: Perform fast segmentation matching with the depth image according to the motion posture of the terminal device.

6 shows a fast matching method of a central region three-dimensional object in which a motion figure and a depth image of a device are fused, the method tracking a device state change in real time, and extracting a start and end time point corresponding to each fixed time period when the device smoothly moves. Depth map frame. Based on the start time point, the depth map has completed the segmentation of the feature region and the device pose change in the close-up mode, and the approximate range of each feature region of the depth map at the end of the time is inferred, thereby performing fast segmentation matching.

Since the value of each pixel of the depth image is the linear distance between the object and the camera, there is a similarity between the distance from the same object to the camera. Therefore, the coarse segmentation based on the depth map adopts the region growing method. However, since the depth image has noise and is easy to lose information, the depth image is first filtered to achieve image smoothing and loss depth filling. The specific implementation manner is as follows:

The image is filtered using a bilateral filter defined as:

Where I is the original image, I' is the filtered image, Ω is the neighborhood of (x, y), and w(i, j) is the weight of the filter at the corresponding coordinates.

Pixels with similar depths in the image are combined to form a similar feature region. The specific implementation is:

1) Select the starting pixel point;

2) comparing the starting pixel point with the surrounding pixel point according to the similarity criterion;

3) If the two satisfy the condition of similarity, the pixel is merged into the starting pixel to form a new starting point region;

4) When the surrounding pixel points do not satisfy the similarity condition, the growth in the direction is stopped.

Among them, the selection of the starting point is very important for the efficiency of depth image segmentation. If properly selected, the segmentation can be speeded up. This application is based on the relative posture and trajectory of the device to roughly estimate the position of the feature region in the depth map at the end time point to accelerate. . Since the distance of the captured object from the camera is usually close, the minimum value region in the depth map is selected, and a multi-fork tree of the image minimum value region is established to realize the selection of the starting point. Let the depth image D, the pixel depth set Λ={d ₁ , d ₂ ,..., d _N }, sort the pixels according to the depth value from small to large, and find all the minimum values in the image from the arrival. region:

The similarity criterion is used to distinguish the object from the background. Select the average depth and difference mean of the pixel points in the previous comparison, and select the average depth and difference mean of the pixel points. The difference between the mean value of the pixel depth and the difference between the two is It is determined to be the same area within 5%.

The process of inferring the location of the feature area of the depth map using the device attitude information at the end of time is as follows:

1) recording the device attitude information of the starting point of the time, and determining the coordinates in the coordinate system of the device according to the coordinates of the pixel of the feature area in the image and the depth value of the image;

2) Obtaining the change of the attitude of the device at the end of the time compared to the starting point, and converting the change of the posture to the original coordinate system of the device;

3) Convert the pixel coordinates of the starting point feature area to the coordinate system of the time end point device itself.

S240. Perform accurate matching on the fast segmentation matching result according to the color image.

After the depth image is effectively filtered, not only can the noise be smoothed, but also the pixels with missing depth can be filled, but the accuracy is not high. Therefore, the feature region matching based on the depth image cannot effectively match the actual object feature region. Since the color image segmentation method can effectively extract the boundary information, the present application accurately matches the result of the fast matching for the close-up mode and the image segmentation method of the color image, and directly draws the frame for the device to smoothly move or rotate for the perspective mode, and color The images are accurately matched, and the precise matching is mainly optimized at the edge of the feature region obtained by the fast matching. The matching process is shown in FIG. 7 .

Firstly, the depth image and the color image are separately filtered, and then the feature regions are quickly matched according to the posture information and the depth information, and a series of feature regions are obtained, and the representative pixel points of each feature region are provided to the color image for segmentation. The image segmentation adopts the watershed algorithm, and after filtering, the grayscale image after the coloration is generated, and the water injection operation is directly performed according to the provided characteristic pixel points, and finally the boundary of each feature region is obtained. Based on the segmentation of the boundary region of the color image, the boundary points of the feature regions obtained by the fast matching are compared with each other. If there is no deviation, the matching results are both normal. If there is a deviation, and the neighborhood depth data is missing or the drop is not clear, the color image segmentation result is the final result. If there is a deviation, and the neighborhood depth data is perfect and the drop is not clear, the depth image segmentation result is the final result. If there is a deviation, and the neighborhood depth data is perfect and the difference is clear, the color image segmentation result is the final result.

Accurate matching results can provide feedback for device attitude and trajectory information, making gesture recognition more accurate. The process of obtaining attitude information based on the exact matching result is as follows:

1) recording the coordinates of the device at the current time and the coordinates of the representative pixel points of the current feature regions in the depth image, recording the depth values of the pixel points of the feature region, and determining the coordinates thereof in the coordinate system of the device itself;

2) Obtain the change of the device posture of the current time compared with the start time of the time, and convert the posture change to the original coordinate system of the device;

3) Convert the current point feature area pixel point coordinate to the coordinate system of the time end point device itself. If there is a deviation, correct the current time device posture information and re-convert until the two have no deviation.

S250: If the acquired current image overlaps with the captured image, the overlapping region is merged by a fusion algorithm.

The present application allows the device to acquire an image in all directions, and the overlapping regions need to be merged when the acquired image overlaps with the captured image. The overlap region fusion is based on the historical feature matrix and the current device pose information. The present application associates each feature matrix with the device pose, so that the previous device pose information of the historical feature matrix can be obtained. The specific integration process is shown in Figure 8.

In order to be able to determine that there is overlap between the current image and the historical image, the posture information of the device during the motion shooting will be continuously recorded and saved, and the attitude information may be extracted at regular intervals as the iconic data for comparison with the device posture in the future. Since different device poses may also have overlapping regions, the method records the field of view that can be captured by each gesture experienced by the device, and stores the feature region and posture information in combination. The current relative pose and trajectory are calculated in real time during the movement of the device, and the comparable iconic data is extracted from the historical feature matrix for matching. If the matching result indicates that the current image frame overlaps with the historical image frame, the fusion processing is performed. , updated to the overlap area, while recording the current device pose.

In order to achieve accurate acquisition of close-range 3D images and optimization of power consumption under long-range conditions, the association of far and near-field mode images is realized. The method provided by the present application can dynamically identify whether the current scene is a distant view or a close-up view. The determining method is to scan the depth image matrix, calculate the number of pixel points whose depth value is less than the threshold value, and determine the foreground when the number is less than the threshold value. When in Vision mode, the depth camera is automatically turned off and periodically activated to detect if it is in close-up mode, which reduces power consumption.

In the close-up mode, the method tracks the change of the depth state of the feature area. When the distance changes, the distance is stored in the feature matrix, so that the display can recognize whether there is a close to the moving action when shooting, thereby prompting the user to Zoom out and zoom in.

In the remote mode, the method does not activate the depth camera, so there is no need to perform fast matching of the central region of the depth map, and only the color image and the attitude information are accurately matched to obtain the feature matrix.

The following describes the user shooting process. In order to ensure the shooting effect, after the user presses the button to start shooting, the method prompts the user to move or rotate the mobile phone in any direction to shoot the target object. When the shooting is started, the simultaneous starting attitude sensor, depth camera and color camera work are triggered. The device attitude and trajectory identification module reads the data of the gyroscope, accelerometer and electronic compass in real time, performs attitude calculation on it, and then fuses the multi-sensor data to obtain the posture and trajectory of the device. The depth camera collects the depth data in real time, and after filtering, recognizes the near and far modes. If it is a close-up mode, the depth image is roughly segmented, and the device region and the trajectory data are quickly matched to the central region to speed up the matching. The color camera collects the color data in real time, and after filtering, performs the water injection operation according to the feature area provided by the fast matching result, and finally obtains the boundary of the scene, compares and determines the boundary of the feature area obtained by the fast matching, and adjusts the feature area. The boundary eventually produces a subtle matching feature area.

In order to solve the problem that the image captured during the shooting coincides with the already photographed area, it can be recognized and considered to be the same area at the time of display, thereby realizing the cycle view, and the method is based on the device posture and the feature area in the device. The position in the coordinate system is calculated, and the coincident area is fused and updated, and the posture of the device taken twice is recorded, so that it can be cyclically viewed when supplied to the display. After the coincidence region is merged, the final feature region set of the image is generated.

When the user releases the button, the user is expected to stop shooting. At this time, the sensor and the camera are stopped, and the last frame of image processing is completed, then the intermediate buffer is cleared and the resources are released, and finally the set of the obtained feature regions is written to the nonvolatile. The memory finishes shooting. The process of shooting images by the user is shown in Figure 9:

When the user clicks on the picture to view, the posture sensor is activated to acquire the device posture, and the image feature region set is read to obtain the final image frame. The method supports the user to rotate the mobile phone to view the picture, and the gesture of the mobile phone is corresponding to the posture of the mobile phone when the picture is taken. The device attitude and trajectory identification module reads the data of the gyroscope, accelerometer and electronic compass in real time, performs attitude calculation on it, and then fuses the multi-sensor data to obtain the posture of the device. After reading the image feature region set, the image frames are synthesized according to the coordinates in the image frame in which each feature region is located, and finally the image frames are buffered one by one to be read. After the current pose of the device is generated, it needs to correspond to the initial device pose of the captured photo. After that, the change of the gesture of the mobile phone will trigger the display of the corresponding state picture frame. The change of the device pose will trigger the selection of the image frame of the corresponding pose and submit the display. After the image is implemented, it is judged whether the current frame is scalable. If possible, the screen prompts that the current zoom is possible, and then the posture sensor data is acquired to start a new cycle. If you can't zoom, you also get the attitude sensor data to start a new loop.

When the user clicks back, the user is considered to no longer view the picture. At this time, the sensor and the camera are stopped, the intermediate cache is cleared, and resources are released. The process of viewing images by the user is shown in FIG.

(1) The application is equipped with a gyroscope, an accelerometer and an electronic compass sensor on the terminal device for providing device attitude information; an infrared transmitter and an infrared camera for providing depth image data; and a color camera for use In order to provide color image data, the combination of the three provides raw data support for the acquisition of dynamic three-dimensional images.

(2) The method for gesture recognition of a three-dimensional space terminal device according to the present application can obtain an initial posture of a device by sampling, posture settlement, and data fusion of three attitude sensors. The attitude generation algorithm is compensated according to the change of the depth image and the color image, and the closed-loop tracking of the attitude detection is completed.

(3) The fast matching method of the three-dimensional object in the central region of the device posture and depth image fusion in the present application can provide a strategy of speeding up matching in the form of frame drawing when the device posture changes at a constant speed, and realize the same three-dimensional object between multiple frames of images. Quick match.

(4) The fine matching method based on the color image and the fast matching result of the same three-dimensional object between the multi-frame images according to the present application can compensate and optimize the fast matching result according to the data information of the corresponding position of the color image for each feature region. , get the most detailed image feature description.

(5) The present application can realize 360-degree omnidirectional imaging, and supports shooting of an already photographed object, and can dynamically recognize the overlapped area that has been photographed according to the current posture information of the device and the historical feature matrix information. Data fusion is performed on the overlapping regions, and the overlapping data information is added, so that the display can be smoothly switched according to the posture.

(6) This application can dynamically recognize the distant and near mode of the subject, and achieve a full range of camera for the near view. Scenery, the integration of the panorama. The Vision automatically turns off the depth camera to reduce power consumption.

FIG. 11 is another schematic block diagram of a terminal device according to an embodiment of the present application. The terminal device 1100 shown in FIG. 11 includes: a radio frequency (RF) circuit 1110, a memory 1120, other input devices 1130, a display screen 1140, a sensor 1150, an audio circuit 1160, an I/O subsystem 1170, and a processor 1180. And power supply 1190 and other components. It will be understood by those skilled in the art that the terminal device structure shown in FIG. 11 does not constitute a limitation of the terminal device, and may include more or less components than those illustrated, or combine some components or split some components. , or different parts layout. Those skilled in the art will appreciate that the display screen 1140 belongs to a User Interface (UI), and the terminal device 1100 may include a user interface that is smaller than illustrated or less.

The specific components of the terminal device 1100 are specifically described below with reference to FIG. 11:

The RF circuit 1110 can be used for receiving and transmitting signals during and after receiving or transmitting information, in particular, after receiving the downlink information of the base station, and processing it to the processor 1180; in addition, transmitting the designed uplink data to the base station. Generally, RF circuits include, but are not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, RF circuitry 1110 can also communicate with the network and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (Code). Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), E-mail, Short Messaging Service (SMS), etc.

The memory 1120 can be used to store software programs and modules, and the processor 1180 executes various functional applications and data processing of the terminal device 1100 by running software programs and modules stored in the memory 1120. The memory 1120 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may be stored according to Data (such as audio data, phone book, etc.) created by the use of the terminal device 1100. Moreover, memory 1120 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.

Other input devices 1130 can be used to receive input numeric or character information, as well as to generate key signal inputs related to user settings and function control of terminal device 1100. Specifically, other input devices 1130 may include, but are not limited to, physical keyboards, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, and light mice (the light mouse is not sensitive to display visual output). One or more of a surface, or an extension of a touch sensitive surface formed by a touch screen. Other input devices 1130 are coupled to other input device controllers 1171 of I/O subsystem 1170 for signal interaction with processor 1180 under the control of other device input controllers 1171.

The display 1140 can be used to display information entered by the user or information provided to the user as well as various menus of the terminal device 1100, and can also accept user input. The specific display screen 1140 can include a display panel 1141 and a touch panel 1142. The display panel 1141 can be configured by using a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. The touch panel 1142, also referred to as a touch screen, a touch sensitive screen, etc., can collect contact or non-contact operations on or near the user (eg, the user uses any suitable object or accessory such as a finger, a stylus, etc. on the touch panel 1142. Or the operation in the vicinity of the touch panel 1142 may also include a somatosensory operation; the operation includes a single-point control operation, a multi-point control operation, and the like, and the corresponding connection device is driven according to a preset program. Optionally, the touch panel 1142 may include two parts: a touch detection device and a touch controller. Wherein, the touch detection device detects a user's touch orientation and posture, and Detecting a signal brought by the touch operation, transmitting the signal to the touch controller; the touch controller receives the touch information from the touch detection device, and converts it into information that the processor can process, and sends the information to the processor 1180, and can receive the processing The command sent by the device 1180 is executed. In addition, the touch panel 1142 can be implemented by using various types such as resistive, capacitive, infrared, and surface acoustic waves, and the touch panel 1142 can be implemented by any technology developed in the future. Further, the touch panel 1142 can cover the display panel 1141, and the user can display the content according to the display panel 1141 (the display content includes, but is not limited to, a soft keyboard, a virtual mouse, a virtual button, an icon, etc.) on the display panel 1141. Operation is performed on or near the covered touch panel 1142. After detecting the operation on or near the touch panel 1142, the touch panel 1142 transmits to the processor 1180 through the I/O subsystem 1170 to determine user input, and then the processor 1180 is based on the user. The input provides a corresponding visual output on display panel 1141 via I/O subsystem 1170. Although in FIG. 11 , the touch panel 1142 and the display panel 1141 are two independent components to implement the input and input functions of the terminal device 1100 , in some embodiments, the touch panel 1142 and the display panel 1141 may be The input and output functions of the terminal device 1100 are implemented integrated.

The terminal device 1100 may also include at least one type of sensor 1150, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display panel 1141 according to the brightness of the ambient light, and the proximity sensor may close the display panel 1141 when the terminal device 1100 moves to the ear. And / or backlight. As a kind of motion sensor, the accelerometer sensor can detect the magnitude of acceleration in all directions (usually three axes). When it is stationary, it can detect the magnitude and direction of gravity. It can be used to identify the gesture of the mobile phone (such as horizontal and vertical screen switching, related Game, magnetometer attitude calibration), vibration recognition related functions (such as pedometer, tapping), etc.; as for the terminal device 1100 can also be configured with gyroscopes, barometers, hygrometers, thermometers, infrared sensors and other sensors, here No longer.

An audio circuit 1160, a speaker 1161, and a microphone 1162 can provide an audio interface between the user and the terminal device 1100. The audio circuit 1160 can transmit the converted audio data to the speaker 1161, and convert it into a sound signal output by the speaker 1161; on the other hand, the microphone 1162 converts the collected sound signal into a signal, which is received by the audio circuit 1160. The audio data is converted, and the audio data is output to the RF circuit 1110 for transmission to, for example, another mobile phone, or the audio data is output to the memory 1120 for further processing.

The I/O subsystem 1170 is used to control external devices for input and output, and may include other device input controllers 1171, sensor controllers 1172, and display controllers 1173. Optionally, one or more other input control device controllers 1171 receive signals from other input devices 1130 and/or send signals to other input devices 1130, and other input devices 1130 may include physical buttons (press buttons, rocker buttons, etc.) , dial, slide switch, joystick, click wheel, light mouse (light mouse is a touch-sensitive surface that does not display visual output, or an extension of a touch-sensitive surface formed by a touch screen). It is worth noting that other input control device controllers 1171 can be connected to any one or more of the above devices. Display controller 1173 in I/O subsystem 1170 receives signals from display 1140 and/or transmits signals to display 1140. After the display 1140 detects the user input, the display controller 1173 converts the detected user input into an interaction with the user interface object displayed on the display 1140, ie, implements human-computer interaction. Sensor controller 1172 can receive signals from one or more sensors 1150 and/or send signals to one or more sensors 1150.

The processor 1180 is a control center of the terminal device 1100 that connects various portions of the entire terminal device using various interfaces and lines, by running or executing software programs and/or modules stored in the memory 1120, and recalling stored in the memory 1120. The data performs various functions and processing data of the terminal device 1100, thereby performing overall monitoring of the terminal device. Optionally, the processor 1180 can include one or more processing units; optionally, the processor 1180 The application processor and the modem processor can be integrated, wherein the application processor mainly processes an operating system, a user interface, an application, etc., and the modem processor mainly processes wireless communication. It will be appreciated that the above described modem processor may also not be integrated into the processor 1180.

The processor 1180 is configured to: acquire a motion posture of the terminal device; separately acquire a depth image and a color image by using the depth camera and the color camera; perform fast segmentation matching with the depth image according to the motion posture of the terminal device; The result of the fast segmentation matching is accurately matched; if the acquired current image overlaps with the captured image, the overlapping region is fused by the fusion algorithm to generate a dynamic three-dimensional image. .

The terminal device 1100 further includes a power source 1190 (such as a battery) for supplying power to the various components. Optionally, the power source can be logically connected to the processor 1180 through the power management system to manage functions such as charging, discharging, and power consumption through the power management system. .

Although not shown, the terminal device 1100 may further include a camera (a depth camera and a color camera), a Bluetooth module, and the like, and details are not described herein again.

It should be understood that the terminal device 1100 may correspond to a terminal device in a dynamic three-dimensional image acquisition method according to an embodiment of the present application, and the terminal device 1100 may include a physical unit for performing a method performed by a terminal device or an electronic device in the above method. . In addition, the physical units in the terminal device 1100 and the other operations and/or functions described above are respectively used for the corresponding processes of the foregoing methods, and are not described herein for brevity.

It should also be understood that the terminal device 1100 can include a physical unit in a method for performing the above-described dynamic three-dimensional image acquisition. In addition, the physical units in the terminal device 1100 and the other operations and/or functions described above are respectively used for the corresponding processes of the foregoing methods, and are not described herein for brevity.

It should also be understood that the processor in the embodiment of the present application may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the foregoing method embodiment may be completed by an integrated logic circuit of hardware in a processor or an instruction in a form of software. The processor may be a central processing unit (CPU), the processor may be another general-purpose processor, a digital signal processor (DSP), or an application specific integrated circuit (ASIC). ), Field Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components. The methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed. The general purpose processor may be a microprocessor or the processor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by the hardware decoding processor, or may be performed by a combination of hardware and software in the decoding processor. The software can be located in a random storage medium, such as a flash memory, a read only memory, a programmable read only memory or an electrically erasable programmable memory, a register, and the like. The storage medium is located in the memory, and the processor reads the information in the memory and combines the hardware to complete the steps of the above method.

It should also be understood that the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read only memory (PROM), an erasable programmable read only memory (Erasable PROM, EPROM), or an electric Erase programmable read only memory (EEPROM) or flash memory. The volatile memory can be a Random Access Memory (RAM) that acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (Synchronous DRAM). SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), Synchronous Connection Dynamic Random Access Memory (SDRAM) and Direct Memory Bus Random Access Memory (DR RAM). It should be noted that the memories of the systems and methods described herein are intended to comprise, without being limited to, these and any other suitable types of memory.

It should also be understood that the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to the data bus. However, for the sake of clarity, the various buses are labeled as bus systems in the figure.

It should also be understood that in the embodiment of the present application, "B corresponding to A" means that B is associated with A, and B can be determined according to A. However, it should also be understood that determining B from A does not mean that B is only determined based on A, and that B can also be determined based on A and/or other information. It should be understood that the term "and/or" herein is merely an association relationship describing an associated object, indicating that there may be three relationships, for example, A and/or B, which may indicate that A exists separately while 10 is stored in A. And B, there are three cases of B alone. In addition, the character "/" in this article generally indicates that the contextual object is an "or" relationship.

In the implementation process, each step of the above method may be completed by an integrated logic circuit of hardware in a processor or an instruction in a form of software. The steps of the method for transmitting an uplink signal disclosed in the embodiments of the present application may be directly implemented as hardware processor execution completion, or performed by a combination of hardware and software in a processor. The software can be located in a random storage medium, such as a flash memory, a read only memory, a programmable read only memory or an electrically erasable programmable memory, a register, and the like. The storage medium is located in the memory, and the processor reads the information in the memory and combines the hardware to complete the steps of the above method. To avoid repetition, it will not be described in detail here.

The embodiment of the present application further provides a computer readable storage medium storing one or more programs, the one or more programs including instructions, when the portable electronic device is included in a plurality of applications When executed, the portable electronic device can be caused to perform the method of the embodiment shown in Figures 2 and/or 3.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the various examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the embodiments of the present application.

A person skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the system, the device and the unit described above can refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the device embodiments described above are merely illustrative. For example, the division of the unit is only a logical function division. In actual implementation, there may be another division manner, for example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the embodiments of the present application may be integrated into one processing unit, It may be that each unit physically exists alone, or two or more units may be integrated into one unit.

The functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product. Based on such understanding, the technical solution of the embodiments of the present application, or the part contributing to the prior art or the part of the technical solution, may be embodied in the form of a software product stored in a storage medium. The instructions include a plurality of instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the embodiments of the present application. The foregoing storage medium includes: a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and the like, which can store program codes. .

The foregoing is only a specific embodiment of the embodiments of the present application, but the scope of protection of the embodiments of the present application is not limited thereto, and any person skilled in the art can easily adopt the technical scope disclosed in the embodiments of the present application. All changes or substitutions are contemplated to be within the scope of the embodiments of the present application. Therefore, the scope of protection of the embodiments of the present application is subject to the scope of protection of the claims.

Claims

A method for acquiring a dynamic three-dimensional image, wherein the method is applied to a terminal device, and the method includes:

Obtaining a motion posture of the terminal device;

Collecting depth images and color images by a depth camera and a color camera, respectively;

Performing fast segmentation matching with the depth image according to the motion posture of the terminal device;

Performing exact matching on the result of the fast segmentation matching according to the color image;

If the acquired current image overlaps with the captured image, the overlapping region is fused by the fusion algorithm to generate a dynamic three-dimensional image.
The method of claim 1, wherein the obtaining the motion gesture of the terminal device comprises:

The motion posture of the terminal device is obtained by the accelerometer, the gyroscope and the electronic compass of the terminal device.
The method according to claim 1 or 2, wherein the fast segmentation matching with the depth image according to the motion posture of the terminal device comprises:

When determining that the terminal device moves smoothly according to the motion posture of the terminal device, acquiring a depth map corresponding to the start time point and the end time point in the first time period;

The feature region of the depth map of the end time point is calculated based on the completed segmented feature region of the depth map of the start time point and the pose change of the terminal device.
The method according to claim 3, wherein the accurately matching the results of the fast segmentation matching according to the color image comprises:

Performing compensation optimization on the feature region range according to the color image to obtain a fine-grained picture feature description.
The method according to any one of claims 1 to 4, wherein the merging the overlapping regions by the fusion algorithm to generate a dynamic three-dimensional image comprises:

Calculating the current posture of the terminal device in real time, and extracting comparable iconic data from the historical feature matrix for matching;

If the matching result indicates that the current image frame overlaps with the historical image frame, the fusion processing is performed to update the overlapping region.
A terminal device, comprising:

An acquiring unit, configured to acquire a motion posture of the terminal device;

a collecting unit, configured to separately collect a depth image and a color image by using a depth camera and a color camera;

a processing unit, configured to perform fast segmentation matching with the depth image according to the motion posture of the terminal device; and accurately match the result of the fast segmentation matching according to the color image; if the acquired current image overlaps with the captured image, The overlapping regions are fused by a fusion algorithm to generate a dynamic three-dimensional image.
The terminal device according to claim 6, wherein the obtaining unit is specifically configured to:

The motion posture of the terminal device is obtained by the accelerometer, the gyroscope and the electronic compass of the terminal device.
The terminal device according to claim 6 or 7, wherein the processing unit is specifically configured to:

When determining that the terminal device moves smoothly according to the motion posture of the terminal device, acquiring a depth map corresponding to the start time point and the end time point in the first time period;

The feature region of the depth map of the end time point is calculated based on the completed segmented feature region of the depth map of the start time point and the pose change of the terminal device.
The terminal device according to claim 8, wherein the processing unit is specifically configured to:

Performing compensation optimization on the feature region range according to the color image to obtain a fine-grained picture feature description.
The terminal device according to any one of claims 6 to 9, wherein the processing unit is specifically configured to:

Calculating the current posture of the terminal device in real time, and extracting comparable iconic data from the historical feature matrix for matching;

If the matching result indicates that the current image frame overlaps with the historical image frame, the fusion processing is performed to update the overlapping region.
A terminal device, comprising: a memory, a processor, and a display;

Memory for storing programs;

The processor is configured to execute the program stored by the memory, and the processor is configured to perform the method of any one of claims 1-5 when the program is executed.