CN107610127B - Image processing method, image processing apparatus, electronic apparatus, and computer-readable storage medium - Google Patents

Image processing method, image processing apparatus, electronic apparatus, and computer-readable storage medium

Info

Publication number
CN107610127B
Authority
CN
China
Prior art keywords
current user
image
depth
target object
structured light
Prior art date
Legal status
Expired - Fee Related
Application number
CN201710811959.7A
Other languages
Chinese (zh)
Other versions
CN107610127A (en)
Inventor
张学勇 (Zhang Xueyong)
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201710811959.7A (CN107610127B)
Publication of CN107610127A
Priority to EP18853010.9A (EP3680857B1)
Priority to PCT/CN2018/105101 (WO2019047982A1)
Priority to US16/814,697 (US11138740B2)
Application granted
Publication of CN107610127B

Landscapes

  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses an image processing method, an image processing apparatus, and an electronic apparatus. The method comprises the following steps: acquiring a depth image of a current user and a three-dimensional background image of the scene where the current user is located; performing edge extraction on the three-dimensional background image to acquire depth data of edge pixels of a target object in the three-dimensional background image; judging, according to the depth image of the current user and the depth data of the edge pixels of the target object, whether the current user collides with the target object in the scene; and, if so, inserting audio for the sound that the target object should emit when collided with. According to the embodiments of the invention, the person region and the target object region extracted from the depth image are more accurate, and in particular the edge pixels of the person region and of the target object region can be accurately calibrated. In addition, when the person collides with the target object, further image processing simulates the sound the target object should emit when struck, which improves the user experience.

Description

Image processing method, image processing apparatus, electronic apparatus, and computer-readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, an electronic apparatus, and a computer-readable storage medium.
Background
Existing techniques for warning of a collision between a person and an object in the background generally extract the person contour and the object contour from feature points, and judge from those contours whether the current user collides with an object in the background. However, contours extracted from feature points are not very accurate; in particular, the boundary between the person and the object cannot be precisely calibrated, which degrades the collision judgment. Moreover, in the prior art, when a collision between a person and an object is detected, the user is notified only by a simple prompt, and the collision event is not processed any further.
Disclosure of Invention
The object of the present invention is to solve, at least to some extent, one of the above-mentioned technical problems.
To this end, a first object of the present invention is to propose an image processing method. With this method, when the person collides with the target object, further image processing simulates the sound the target object should emit when struck, thereby improving the user experience.
A second object of the present invention is to provide an image processing apparatus.
A third objective of the present invention is to provide an electronic device.
A fourth object of the invention is to propose a computer-readable storage medium.
In order to achieve the above object, an embodiment of the first aspect of the present invention provides an image processing method, including: acquiring a depth image of a current user, and acquiring a three-dimensional background image of a scene where the current user is located; performing edge extraction on the three-dimensional background image to acquire depth data of edge pixels of a target object in the three-dimensional background image; judging whether the current user collides with the target object in the scene according to the depth image of the current user and the depth data of the edge pixels of the target object; and, if so, inserting audio for the sound that the target object should make when collided with.
In order to achieve the above object, an image processing apparatus according to an embodiment of the second aspect of the present invention includes: a depth image acquisition component configured to acquire a depth image of a current user and a three-dimensional background image of the scene where the current user is located; and a processor configured to: perform edge extraction on the three-dimensional background image to acquire depth data of edge pixels of a target object in the three-dimensional background image; judge whether the current user collides with the target object in the scene according to the depth image of the current user and the depth data of the edge pixels of the target object; and, if so, insert audio for the sound that the target object should make when collided with.
In order to achieve the above object, an electronic device according to a third aspect of the present invention includes: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the image processing method as described in the embodiments of the first aspect of the present invention.
To achieve the above object, a computer-readable storage medium according to an embodiment of the fourth aspect of the present invention includes a computer program for use with an electronic device capable of capturing images, where the computer program is executable by a processor to perform the image processing method according to the embodiments of the first aspect of the present invention.
The image processing method, image processing apparatus, electronic apparatus, and computer-readable storage medium of the embodiments of the invention acquire depth data of person edge pixels from the depth image of the current user and depth data of target object edge pixels from the three-dimensional background image of the scene where the current user is located, judge from these depth data whether the person collides with the target object in the three-dimensional background, and, if so, insert audio of the sound the target object should emit when collided with, i.e. simulate that sound. Because the acquisition of a depth image is not easily affected by lighting, the color distribution in the scene, and similar factors, the person region and the target object region extracted from the depth image are more accurate; in particular, the edge pixels of the person region and of the target object region can be accurately calibrated. Further, with the more accurate depth data of the person edge pixels and the target object edge pixels, the judgment of whether the current user collides with the target object in the virtual scene is more reliable. In addition, when the person collides with the target object, further image processing simulates the sound the target object should emit, which greatly improves the user experience.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of an image processing method according to some embodiments of the invention;
FIG. 2 is a block schematic diagram of an image processing apparatus according to some embodiments of the invention;
FIG. 3 is a schematic structural diagram of an electronic device according to some embodiments of the invention;
FIG. 4 is a flow chart of an image processing method of some embodiments of the present invention;
FIG. 5 is a flow chart of an image processing method of some embodiments of the present invention;
FIGS. 6(a)-6(e) are schematic views of a scene of structured light measurement according to one embodiment of the present invention;
FIGS. 7(a) and 7(b) are schematic diagrams of a scene for structured light measurements according to one embodiment of the present invention;
FIG. 8 is a flow chart of an image processing method of some embodiments of the present invention;
FIG. 9 is a block diagram of an electronic device according to some embodiments of the invention;
FIG. 10 is a block diagram of an electronic device according to some embodiments of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another. For example, a first client may be referred to as a second client, and similarly, a second client may be referred to as a first client, without departing from the scope of the present invention. Both the first client and the second client are clients, but they are not the same client.
An image processing method, an apparatus, an electronic apparatus, and a computer-readable storage medium according to embodiments of the present invention are described below with reference to the accompanying drawings.
Referring to fig. 1 to 2, an image processing method according to an embodiment of the invention can be applied to the electronic device 1000 according to an embodiment of the invention. The image processing method may include:
s110, obtaining a depth image of the current user, and obtaining a three-dimensional background image of a scene where the current user is located.
And S120, performing edge extraction on the three-dimensional background image to acquire depth data of edge pixels of the target object in the three-dimensional background image.
As an example, edge extraction may be performed on the three-dimensional background image with the Canny operator. The core of Canny edge extraction mainly comprises the following steps: first, the three-dimensional background image is convolved with a 2D Gaussian filter template to eliminate noise; then, the gray-level gradient of each pixel is obtained with a differential operator, the gradient direction of each pixel is calculated from the gradient values, and the neighboring pixels of each pixel along that gradient direction are located; finally, each pixel is traversed, and if the gray value of a pixel is not the maximum compared with the gray values of the two neighboring pixels before and after it along the gradient direction, the pixel is not considered an edge point. In this way the pixels at edge positions in the three-dimensional background image are determined, yielding the edge pixels of the target object after edge extraction, and the depth data of the edge pixels of the target object in the three-dimensional background image can then be obtained.
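As a concrete illustration, the following is a minimal sketch of this edge-extraction step, assuming the three-dimensional background image is available as a NumPy depth map and using OpenCV's built-in Canny implementation on an 8-bit rendering of it; the function name, smoothing kernel, and thresholds are illustrative choices rather than values fixed by this disclosure.

```python
import cv2
import numpy as np

def extract_object_edge_depth(depth_map, low_thresh=50, high_thresh=150):
    """Return the (row, col) coordinates of edge pixels in a depth map and
    the depth data of those edge pixels; thresholds are illustrative."""
    # Normalize the depth map to 8-bit gray levels so Canny can operate on it.
    gray = cv2.normalize(depth_map, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # 2D Gaussian filtering to eliminate noise before gradient computation.
    blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)
    # Canny: gradient magnitude/direction, non-maximum suppression, hysteresis.
    edges = cv2.Canny(blurred, low_thresh, high_thresh)
    edge_coords = np.argwhere(edges > 0)        # pixel positions on the edges
    edge_depths = depth_map[edges > 0]          # depth data of the edge pixels
    return edge_coords, edge_depths
```

The same helper can later be applied to the depth image of the current user to obtain the person edge pixels and their depth data.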
S130, judging whether the current user collides with the target object in the scene according to the depth image of the current user and the depth data of the edge pixels of the target object.
And S140, if so, inserting the audio corresponding to the sound that should be generated when the target object is collided with.
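Tying the four steps together, a sketch of the overall flow might look as follows; the depth-camera driver, the audio player, and the `detect_collision` helper (sketched later in this description) are hypothetical placeholders, not interfaces defined by this disclosure.

```python
def process_frame(depth_camera, audio_player):
    # S110: depth image of the current user and three-dimensional background
    # image of the scene (both hypothetical driver calls).
    user_depth = depth_camera.capture_user_depth()
    background_depth = depth_camera.capture_background_depth()

    # S120: edge extraction on the background image yields the depth data of
    # the target object's edge pixels (helper sketched above).
    edge_coords, edge_depths = extract_object_edge_depth(background_depth)

    # S130: judge whether the current user collides with the target object.
    collided = detect_collision(user_depth, edge_coords, edge_depths)

    # S140: if so, insert audio of the sound the object should make,
    # e.g. a breaking cup (illustrative asset name).
    if collided:
        audio_player.play("cup_breaking.wav")
```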
Referring to fig. 3, an image processing method according to an embodiment of the invention can be implemented by the image processing apparatus 100 according to an embodiment of the invention. The image processing apparatus 100 according to the embodiment of the present invention can be used in the electronic apparatus 1000 according to the embodiment of the present invention. The image processing apparatus 100 may include a depth image acquisition assembly 10 and a processor 20. The step S110 may be implemented by the depth image capturing assembly 10, and the steps S120 to S140 may be implemented by the processor 20.
That is to say, the depth image collecting assembly 10 may be configured to obtain a depth image of a current user, and obtain a three-dimensional background image of a scene where the current user is located; the processor 20 may be configured to perform edge extraction on the three-dimensional background image to obtain depth data of edge pixels of a target object in the three-dimensional background image, determine whether the current user collides with the target object in the scene according to the depth image of the current user and the depth data of the edge pixels of the target object, and if so, insert an audio corresponding to a sound that should be emitted when the target object is collided.
For example, assume the target object is a water cup. The depth image acquisition component 10 obtains the depth image of the current user and the three-dimensional background image of the scene where the current user is located. The processor 20 performs edge extraction on the three-dimensional background image to obtain the depth data of the edge pixels of the water cup, and judges, from the depth image of the current user and the depth data of the cup's edge pixels, whether the current user collides with the cup in the scene. If so, the current user has, for example, knocked the cup over in the virtual scene, and audio for the sound the cup should make when struck can be inserted: when the cup is knocked to the ground by the user, the sound of the cup breaking on the ground can be played, and, if the cup holds water, the sound of water spilling onto the ground can be played as well.
The three-dimensional background image can be understood as a depth image of the scene: it represents the depth information of every person or object in the background where the current user is located, and the depth image of the current user represents the depth information of the person against that background. The scene range of the three-dimensional background image is consistent with the scene range of the depth image of the current user, and for each pixel in the three-dimensional background image the corresponding depth information can be found in the depth image.
It should be further noted that the scene where the current user is located may be a virtual scene, such as a virtual scene provided by the electronic device, or may be an entity scene, that is, a real scene where the current user is located.
The image processing apparatus according to the embodiment of the present invention can be applied to the electronic apparatus 1000 according to the embodiment of the present invention. That is, the electronic device 1000 according to the embodiment of the present invention may include the image processing device 100 according to the embodiment of the present invention.
In some embodiments, the electronic device 1000 may have a photographing function, which captures depth images using the structured light principle. For example, the electronic device 1000 may be a smart phone, a tablet computer, a smart helmet, smart glasses, etc.; it may also be a VR (Virtual Reality) device, an AR (Augmented Reality) device, or the like.
Because the acquisition of the depth image is not easily influenced by factors such as illumination, color distribution in a scene and the like, and the depth information contained in the depth image has higher accuracy, the figure edge extracted through the depth image and the object edge extracted through the three-dimensional background image are more accurate, and particularly, figure edge pixels and object edge pixels can be accurately calibrated. Further, the effect of judging whether the current user collides with the object in the background is better based on the more accurate depth data of the person edge pixels and the object edge pixels.
As an example, referring to fig. 4, in some embodiments, the step of acquiring the depth image of the current user in step S110 may include:
s1101, projecting structured light to a current user;
s1102, shooting a structured light image modulated by a current user;
and S1103, demodulating phase information corresponding to each pixel of the structured light image to obtain a depth image.
Referring again to FIG. 3, in some embodiments, the depth image acquisition assembly 10 may include a structured light projector 11 and a structured light camera 12. Step S1101 may be implemented by the structured light projector 11, and steps S1102 and S1103 may be implemented by the structured light camera 12.
That is, the structured light projector 11 may be used to project structured light to a current user; the structured light camera 12 may be configured to capture a structured light image modulated by the current user, and demodulate phase information corresponding to each pixel of the structured light image to obtain the depth image.
For example, the structured light projector 11 may project a structured light pattern onto the face and body of the current user, where it is modulated to form a structured light image. The structured light camera 12 captures the modulated structured light image, and the structured light image is demodulated to obtain the depth image of the current user. The structured light pattern may be laser stripes, Gray codes, sinusoidal stripes, non-uniform speckle, or the like.
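Putting S1101-S1103 together, a sketch of the acquisition loop could look as follows; `projector` and `camera` are hypothetical driver objects, and the phase helpers used at the end are sketched further below in the fringe-projection walkthrough.

```python
import numpy as np

def acquire_depth_image(projector, camera, reference_phase):
    frames = []
    for shift in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2):
        projector.project_fringes(phase_shift=shift)   # S1101: project structured light
        frames.append(camera.capture())                # S1102: modulated structured-light image
    wrapped = four_step_phase(*frames)                 # S1103: demodulate the per-pixel phase,
    continuous = unwrap(wrapped)                       #        unwrap it,
    return phase_to_depth(continuous, reference_phase) #        and convert phase to depth
```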
Referring to fig. 5, in some embodiments, the step S1103 of demodulating phase information corresponding to each pixel of the structured-light image to obtain the depth image may include:
s11031, demodulating phase information corresponding to each pixel in the structured light image;
s11032, converting the phase information into depth information;
and S11033, generating the depth image of the current user according to the depth information.
Referring to fig. 2, in some embodiments, step S11031, step S11032 and step S11033 can be implemented by the structured light camera 12.
That is, the structured light camera 12 may be further configured to demodulate phase information corresponding to each pixel in the structured light image, convert the phase information into depth information, and generate a depth image according to the depth information.
For example, the phase information of the modulated structured light is changed compared with the unmodulated structured light, and the structured light represented in the structured light image is the distorted structured light, wherein the changed phase information can represent the depth information of the object. Therefore, the structured light camera 12 first demodulates phase information corresponding to each pixel in the structured light image, and then calculates depth information according to the phase information, thereby obtaining a final depth image.
In order to make the process of acquiring the depth image of the face and body of the current user from the structured light clearer to those skilled in the art, the widely used grating (fringe) projection technique is taken as an example to illustrate the principle. Grating projection belongs, broadly speaking, to the field of surface structured light.
As shown in fig. 6(a), when surface structured light is used for projection, sinusoidal stripes are first generated by computer programming and projected onto the measured object through the structured light projector 11; the structured light camera 12 then photographs the degree to which the stripes are bent after being modulated by the object, and the bent stripes are demodulated to obtain the phase, which is converted into depth information to obtain the depth image. To avoid errors or error coupling, the depth image acquisition assembly 10 needs to be calibrated before structured light is used to acquire depth information; the calibration includes calibration of geometric parameters (for example, the relative position of the structured light camera 12 and the structured light projector 11), of the internal parameters of the structured light camera 12, of the internal parameters of the structured light projector 11, and so on.
Specifically, in the first step, sinusoidal stripes are generated by computer programming. Since the phase is recovered from the distorted stripes, for example by a four-step phase-shifting method, four stripe patterns with phase shifts of 0, π/2, π and 3π/2 are generated here. The structured light projector 11 then projects the four stripe patterns onto the measured object (the mask shown in fig. 6(a)) in a time-sharing manner, and the structured light camera 12 acquires the images shown on the left side of fig. 6(b) while the stripes on the reference plane, shown on the right side of fig. 6(b), are read.
In the second step, phase recovery is performed. The structured light camera 12 calculates the modulated phase from the four acquired modulated stripe patterns (i.e. the structured light images); the phase map obtained at this point is a truncated phase map. Because the result of the four-step phase-shifting algorithm is computed with the arctangent function, the phase of the modulated structured light is confined to [-π, π]: whenever the modulated phase exceeds that range it wraps around and starts again. The resulting principal phase values are shown in fig. 6(c).
In the phase recovery process, de-jump (phase unwrapping) processing is required, that is, the truncated phase is restored to a continuous phase. As shown in fig. 6(d), the modulated continuous phase map is on the left and the reference continuous phase map on the right.
In the third step, the modulated continuous phase is subtracted from the reference continuous phase to obtain a phase difference (i.e. the phase information), which represents the depth information of the measured object relative to the reference plane. Substituting the phase difference into the phase-to-depth conversion formula (in which the parameters involved have been calibrated) yields the three-dimensional model of the object to be measured shown in fig. 6(e).
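The three steps above can be condensed into a short numerical sketch, assuming four fringe images shifted by 0, π/2, π and 3π/2 and a purely linear phase-to-depth conversion; the factor `k` only stands in for the calibrated conversion formula mentioned above, including its sign convention.

```python
import numpy as np

def four_step_phase(I0, I1, I2, I3):
    """Step 2a: wrapped (truncated) phase from the four modulated fringe
    images; the arctangent confines the result to (-pi, pi]."""
    I0, I1, I2, I3 = [np.asarray(I, dtype=np.float64) for I in (I0, I1, I2, I3)]
    return np.arctan2(I3 - I1, I0 - I2)

def unwrap(wrapped_phase):
    """Step 2b: remove the 2*pi jumps so the truncated phase becomes a
    continuous phase (a simple row/column unwrap; real systems use more
    robust spatial unwrapping)."""
    return np.unwrap(np.unwrap(wrapped_phase, axis=1), axis=0)

def phase_to_depth(modulated_phase, reference_phase, k=1.0):
    """Step 3: phase difference between the modulated and reference
    continuous phases, converted to depth; `k` is a placeholder for the
    calibrated phase-to-depth formula."""
    return k * (modulated_phase - reference_phase)
```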
It should be understood that, in practical applications, the structured light used in the embodiments of the present invention may be any pattern other than the grating, according to different application scenarios.
As a possible implementation, the present invention may also use speckle structured light to collect the depth information of the current user.
Specifically, the method for acquiring depth information with speckle structured light uses a substantially flat diffractive element carrying a relief diffraction structure with a specific phase distribution; its cross section has a stepped relief structure with two or more concave-convex levels. The thickness of the substrate in the diffractive element is approximately 1 micron, and the step heights are not uniform, ranging from 0.7 to 0.9 micron. The structure shown in fig. 7(a) is a partial diffraction structure of the collimating beam-splitting element of this embodiment, and fig. 7(b) is a cross-sectional side view along section A-A, with the abscissa and ordinate both in microns. The speckle pattern generated by speckle structured light is highly random, and the pattern shifts with distance. Therefore, before depth information can be obtained with speckle structured light, the speckle patterns in space must first be calibrated: for example, within a range of 0-4 meters from the structured light camera 12 a reference plane is taken every 1 cm, so that 400 speckle images are saved after calibration; the smaller the calibration interval, the higher the accuracy of the obtained depth information. Then the structured light projector 11 projects the speckle structured light onto the measured object (such as the current user), and the height differences of the object's surface change the speckle pattern of the projected light. After the structured light camera 12 photographs the speckle pattern (i.e. the structured light image) projected onto the measured object, that pattern is cross-correlated one by one with the 400 speckle images saved during calibration, giving 400 correlation-degree images. The position of the measured object in space shows up as a peak in the correlation-degree images; superposing these peaks and performing an interpolation operation yields the depth information of the measured object.
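A simplified sketch of this speckle matching follows, assuming the 400 reference speckle images and their plane distances are already available from calibration; the windowed correlation and the omission of sub-plane interpolation are simplifications of the superposition-and-interpolation step described above.

```python
import cv2
import numpy as np

def speckle_depth(captured, references, distances_cm, win=11):
    """Per-pixel depth from speckle structured light, in simplified form:
    correlate the captured speckle pattern with each calibrated reference
    pattern (one per reference plane) and take the best-matching distance.

    `references` is a list of reference speckle images (e.g. 400 planes at
    1 cm spacing) and `distances_cm` the corresponding plane distances;
    the window size is an illustrative choice."""
    cap = captured.astype(np.float32)
    cap_zm = cap - cv2.blur(cap, (win, win))          # zero-mean within the window
    corr = np.empty((len(references),) + cap.shape, dtype=np.float32)

    for i, ref in enumerate(references):
        r = ref.astype(np.float32)
        ref_zm = r - cv2.blur(r, (win, win))
        num = cv2.blur(cap_zm * ref_zm, (win, win))
        den = np.sqrt(cv2.blur(cap_zm ** 2, (win, win)) *
                      cv2.blur(ref_zm ** 2, (win, win))) + 1e-6
        corr[i] = num / den                           # one correlation image per plane

    # The plane with the highest correlation wins; a full implementation would
    # additionally interpolate across neighboring planes (the superposition and
    # interpolation step described above) for sub-centimeter resolution.
    best = corr.argmax(axis=0)
    return np.asarray(distances_cm, dtype=np.float32)[best]
```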
An ordinary diffractive element diffracts a beam into several beams whose intensities differ greatly, so the risk of harm to human eyes is large, and even with a second diffraction the uniformity of the resulting beams is low; the effect of projecting onto the measured object with light diffracted by an ordinary diffractive element is therefore poor. In this embodiment a collimating beam-splitting element is used instead: it not only collimates the non-collimated beam but also splits it, so that the non-collimated light reflected by the reflector exits as several collimated beams at different angles after passing through the element. The cross-sectional areas of these collimated beams are approximately equal and their energy fluxes are approximately equal, so the effect of projecting with the diffracted, scattered beams is better. At the same time, the laser output is spread over each of the beams, which further reduces the risk to human eyes; compared with other, uniformly arranged structured light, speckle structured light consumes less power for the same acquisition effect.
It should also be noted that the above implementation manner for obtaining the depth image of the current user is also applicable to obtaining the three-dimensional background image of the scene, and the description of the obtaining manner of the three-dimensional background image may refer to the description of the obtaining manner of the depth image of the current user, and is not described herein again.
Referring to fig. 8, in some embodiments, the step S130 of determining whether the current user collides with the target object in the scene according to the depth image of the current user and the depth data of the edge pixel of the target object may include:
and S1301, performing person edge extraction on the depth image of the current user to determine pixels corresponding to the edge positions of persons in the depth image of the current user.
S1302, acquiring depth data of the person edge pixels in the depth image of the current user.
S1303, when it is detected that the depth data of the person edge pixels is the same as the depth data of the target object edge pixels and the pixels are adjacent, determining that the current user collides with the target object in the scene.
It should be noted that, in the embodiment of the present invention, when it is detected that the depth data of the person edge pixel is different from the depth data of the target object edge pixel, and/or the person edge pixel is not adjacent to the target object edge pixel, it may be determined that the current user has not collided with the target object in the scene.
Referring back to fig. 2, in some embodiments, step S1301, step S1302 and step S1303 may be implemented by the processor 20.
That is to say, the processor 20 may further be configured to perform person edge extraction on the depth image of the current user to determine the pixels corresponding to the edge positions of the person, to acquire the depth data of those person edge pixels, and to judge from the depth data of the person edge pixels and of the target object edge pixels whether the current user collides with the target object in the scene. When it is detected that the depth data of the person edge pixels differs from the depth data of the target object edge pixels, and/or the person edge pixels are not adjacent to the target object edge pixels, it can be determined that the current user has not collided with the target object in the scene; when it is detected that the depth data of the person edge pixels is the same as the depth data of the target object edge pixels and the pixels are adjacent, it can be determined that the current user collides with the target object in the scene.
As an example, the processor 20 may perform edge extraction on the depth image of the current user with the Canny operator, whose core steps are the same as described above for the three-dimensional background image: the depth image of the current user is first convolved with a 2D Gaussian filter template to eliminate noise; the gray-level gradient of each pixel is then obtained with a differential operator, the gradient direction is derived from the gradient values, and the neighboring pixels along that direction are located; finally, each pixel is traversed, and if its gray value is not the maximum compared with the gray values of the two neighboring pixels before and after it along the gradient direction, the pixel is not considered an edge point. In this way the pixels at edge positions in the depth image of the current user are determined, yielding the person edge pixels after edge extraction, and the depth data of those person edge pixels in the depth image can then be obtained.
Having obtained the depth data of the person edge pixels in the depth image of the current user, the processor 20 may judge from the depth data of the person edge pixels and of the target object edge pixels whether the current user collides with the target object in the scene. For example, when the current user moves such that the depth data of a person edge pixel becomes the same as that of a target object edge pixel and the two pixels are adjacent, it can be determined that the current user has collided with the target object in the scene.
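A sketch of this decision is given below, reusing the hypothetical Canny-based helper from the earlier sketch to obtain the person edge pixels; the depth tolerance and the adjacency radius are illustrative relaxations of the exact-equality, direct-adjacency condition described above.

```python
def detect_collision(user_depth, object_edge_coords, object_edge_depths,
                     depth_tol=10.0, pixel_radius=1):
    """Return True when a person edge pixel and a target-object edge pixel
    are adjacent and carry the same depth (within depth_tol, e.g. in mm)."""
    # S1301/S1302: person edge pixels and their depth data
    # (extract_object_edge_depth is the helper sketched earlier).
    person_coords, person_depths = extract_object_edge_depth(user_depth)

    # Index the target-object edge pixels by coordinate for direct lookup.
    object_lookup = dict(zip(map(tuple, object_edge_coords), object_edge_depths))

    # S1303: adjacent pixels carrying the same depth data indicate a collision.
    for (r, c), d in zip(map(tuple, person_coords), person_depths):
        for dr in range(-pixel_radius, pixel_radius + 1):
            for dc in range(-pixel_radius, pixel_radius + 1):
                od = object_lookup.get((r + dr, c + dc))
                if od is not None and abs(float(d) - float(od)) <= depth_tol:
                    return True
    return False
```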
The processor 20 may generate a reminder for the collision and provide it to the current user while inserting the audio for the sound the target object should make when collided with. As an example, the reminder information may be provided to the current user in one or more of the following ways: voice broadcast, text display, vibration, changing the color of the background edge in the display device, and so on.
That is to say, when the reminder for the collision is generated, the processor 20 may provide it to the current user in one or more of these ways to remind the user that "you have collided with the target object, please move away from the obstacle": for example, by vibrating the mobile phone, by playing a voice prompt, by displaying text on the display device, or by changing the color of the background edge in the display device.
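Purely as an illustration of how these channels might be fanned out, a tiny dispatch sketch follows; every `device` method here is a hypothetical placeholder rather than an interface defined by this disclosure.

```python
def remind_collision(device, channels=("voice", "text", "vibrate", "edge_color")):
    message = "You have collided with the target object, please move away from the obstacle."
    if "voice" in channels:
        device.speak(message)                     # voice broadcast
    if "text" in channels:
        device.show_text(message)                 # text display
    if "vibrate" in channels:
        device.vibrate(duration_ms=300)           # vibration reminder
    if "edge_color" in channels:
        device.set_background_edge_color("red")   # change the background edge color
```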
Referring to fig. 3 and fig. 9, an electronic device 1000 is further provided according to an embodiment of the invention. The electronic device 1000 may include the image processing device 100. The image processing apparatus 100 may be implemented using hardware and/or software. The image processing apparatus 100 may include a depth image acquisition assembly 10 and a processor 20.
Specifically, the depth image capturing assembly 10 may include a structured light projector 11 and a structured light camera 12, and the depth image capturing assembly 10 may be configured to capture depth information of a current user to obtain a depth image of the current user, and capture depth information of a scene in which the current user is located to obtain a three-dimensional background image. For example, taking the example of the depth image capture assembly 10 capturing depth information of a current user to obtain a depth image of the current user, the structured light projector 11 may be used to project structured light to the current user, wherein the structured light pattern may be a laser stripe, a gray code, a sinusoidal stripe, or a randomly arranged speckle pattern, etc. The structured light camera 12 includes an image sensor 121 and a lens 122, and the number of the lens 122 may be one or more. Image sensor 121 may be used to capture a structured light image projected onto a current user by structured light projector 11. The structured light image may be sent by the depth image capturing component 10 to the processor 20 for demodulation, phase recovery, phase information calculation, and the like to obtain the depth information of the current user. It can be understood that, for the implementation of the depth information of the scene, reference may be made to the above implementation of the depth information of the current user, and details are not described here.
In some embodiments, the image processing apparatus 100 may include an imaging device 110, and the imaging device 110 may include the depth image acquisition assembly 10 and a visible light camera 111. The visible light camera 111 can be used to capture color information of the object to be photographed to obtain a color image. The functions of the visible light camera 111 and the structured light camera 12 can be realized by a single camera, that is, the imaging device 110 may include only one camera, which captures both color images and structured light images, together with one structured light projector 11.
Besides using structured light, the depth image of the current user and the three-dimensional background image of the scene can also be obtained by binocular vision, a Time of Flight (TOF) based depth image acquisition method, and the like.
The processor 20 further performs edge extraction on the three-dimensional background image to obtain the depth data of the edge pixels of the target object, performs person edge extraction on the depth image of the current user to determine the pixels corresponding to the person's edge positions, and acquires the depth data of those person edge pixels. When it is detected that the depth data of the person edge pixels is the same as the depth data of the target object edge pixels and the pixels are adjacent, the processor determines that the current user collides with the target object in the scene, generates reminder information for the collision, and provides it to the current user.
Further, the image processing apparatus 100 includes an image memory 30. The image memory 30 may be embedded in the electronic device 1000, or may be a memory independent from the electronic device 1000, and may include a Direct Memory Access (DMA) feature. The raw image data collected by the visible light camera 111 or the structured light image-related data collected by the depth image collecting assembly 10 can be transmitted to the image memory 30 for storage or buffering. Processor 20 may read the structured light image-related data from image memory 30 for processing to obtain the depth image of the current user and the three-dimensional background image of the scene. In addition, the depth image of the current user and the three-dimensional background image of the scene may also be stored in the image memory 30 for the processor 20 to invoke at any time for processing, for example, the processor 20 invokes the depth image of the current user for person edge extraction and invokes the three-dimensional background image for edge extraction of a target object in the scene. The obtained person edge pixels and the depth data of the person edge pixels may be stored in the image memory 30, and the obtained target object edge pixels in the three-dimensional background image and the depth data of the target object edge pixels may be stored in the image memory 30.
The image processing apparatus 100 may also include a display 50. The display 50 may obtain the reminder for the collision directly from the processor 20 and display it, reminding the user that a collision with a target object in the scene has occurred and that the user should move away from the obstacle. The image processing apparatus 100 may further include an encoder/decoder 60, which can encode and decode the image data of the depth image of the current user, of the three-dimensional background image of the scene, and so on; the encoded image data can be saved in the image memory 30 and decompressed by the decoder for display before the image is shown on the display 50. The encoder/decoder 60 may be implemented by any one or more of a Central Processing Unit (CPU), a GPU, and a coprocessor.
The image processing apparatus 100 further comprises a control logic 40. When imaging device 10 is imaging, processor 20 may perform an analysis based on data acquired by the imaging device to determine image statistics for one or more control parameters (e.g., exposure time, etc.) of imaging device 10. Processor 20 sends the image statistics to control logic 40 and control logic 40 controls imaging device 10 to determine the control parameters for imaging. Control logic 40 may include a processor and/or microcontroller that executes one or more routines (e.g., firmware). One or more routines may determine control parameters of imaging device 10 based on the received image statistics.
Referring to fig. 10, an electronic device 1000 according to an embodiment of the invention may include one or more processors 200, a memory 300, and one or more programs 310. Where one or more programs 310 are stored in memory 300 and configured to be executed by one or more processors 200. The program 310 includes instructions for performing the image processing method of any of the above embodiments.
For example, program 310 may include instructions for performing an image processing method as described in the following steps:
s110', obtaining a depth image of a current user, and obtaining a three-dimensional background image of a scene where the current user is located;
s120', performing edge extraction on the three-dimensional background image to acquire depth data of edge pixels of a target object in the three-dimensional background image;
s130', judging whether the current user collides with a target object in the scene according to the depth image of the current user and the depth data of the edge pixels of the target object;
s140', if yes, inserting an audio corresponding to a sound to be emitted when the target object is collided.
For another example, the program 310 further includes instructions for performing an image processing method as described in the following steps:
s11031', demodulating phase information corresponding to each pixel in the structured light image;
s11032', converting the phase information into depth information;
s11033', a depth image of the current user is generated according to the depth information.
As another example, program 310 also includes instructions for performing the image processing method described in the following steps:
s1301', carrying out person edge extraction on the depth image of the current user to determine pixels corresponding to the edge positions of persons in the depth image of the current user;
s1302', obtaining depth data of the figure edge pixels in the depth image of the current user;
and S1303', when the fact that the depth data of the edge pixels of the person is the same as that of the edge pixels of the target object and the pixels are adjacent is detected, it is judged that the current user collides with the target object in the scene.
The computer-readable storage medium of an embodiment of the present invention includes a computer program for use in conjunction with the image-enabled electronic device 1000. The computer program may be executed by the processor 200 to perform the image processing method of any of the above embodiments.
For example, the computer program may be executed by the processor 200 to perform the image processing method described in the following steps:
s110', obtaining a depth image of a current user, and obtaining a three-dimensional background image of a scene where the current user is located;
s120', performing edge extraction on the three-dimensional background image to acquire depth data of edge pixels of a target object in the three-dimensional background image;
s130', judging whether the current user collides with a target object in the scene according to the depth image of the current user and the depth data of the edge pixels of the target object;
s140', if yes, inserting an audio corresponding to a sound to be emitted when the target object is collided.
As another example, the computer program may be executable by the processor 200 to perform the image processing method described in the following steps:
s11031', demodulating phase information corresponding to each pixel in the structured light image;
s11032', converting the phase information into depth information;
s11033', a depth image of the current user is generated according to the depth information.
As another example, the computer program may be executable by the processor 200 to perform an image processing method as described in the following steps:
s1301', carrying out person edge extraction on the depth image of the current user to determine pixels corresponding to the edge positions of persons in the depth image of the current user;
s1302', obtaining depth data of the figure edge pixels in the depth image of the current user;
and S1303', when the fact that the depth data of the edge pixels of the person is the same as that of the edge pixels of the target object and the pixels are adjacent is detected, it is judged that the current user collides with the target object in the scene.
In summary, the image processing method, image processing apparatus, electronic apparatus, and computer-readable storage medium of the embodiments of the present invention obtain the depth data of person edge pixels from the depth image of the current user and the depth data of target object edge pixels from the three-dimensional background image of the scene where the current user is located, judge from these depth data whether the person collides with the target object in the three-dimensional background, and, if so, insert audio of the sound the target object should emit when collided with, i.e. simulate that sound. Because the acquisition of a depth image is not easily affected by lighting, the color distribution in the scene, and similar factors, the person region and the target object region extracted from the depth image are more accurate; in particular, the edge pixels of the person region and of the target object region can be accurately calibrated. Further, with the more accurate depth data of the person edge pixels and the target object edge pixels, the judgment of whether the current user collides with the target object in the virtual scene is more reliable. In addition, when the person collides with the target object, further image processing simulates the sound the target object should emit, which greatly improves the user experience.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.

Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and should not be construed as limiting the present invention; those of ordinary skill in the art can make variations, modifications, substitutions, and alterations to the above embodiments within the scope of the present invention.

Claims (14)

1. An image processing method, characterized by comprising the steps of:
acquiring a depth image of a current user, and acquiring a three-dimensional background image of a scene where the current user is located;
performing edge extraction on the three-dimensional background image to acquire depth data of edge pixels of a target object in the three-dimensional background image;
judging whether the current user collides with a target object in the scene or not according to the depth image of the current user and the depth data of the edge pixels of the target object;
if yes, inserting audio aiming at the sound which is to be emitted when the target object is collided;
wherein, the obtaining the depth image of the current user includes:
the method comprises the steps of projecting speckle structure light to a current user through a structured light projector, enabling the speckle pattern of the speckle structure light projected to the current user to change due to the height difference of the surface of the current user, shooting the speckle pattern projected to the current user through a structured light camera, then carrying out correlation operation on the speckle pattern and a plurality of speckle images which are calibrated and stored in advance one by one to obtain a plurality of correlation degree images, displaying a peak value on the correlation degree image at the position of the current user in a space, superposing the peak values, and carrying out interpolation operation to obtain a depth image of the current user.
2. The image processing method of claim 1, wherein the obtaining the depth image of the current user comprises:
projecting structured light towards the current user;
shooting a structured light image modulated by the current user;
and demodulating phase information corresponding to each pixel of the structured light image to obtain the depth image.
3. The image processing method of claim 2, wherein the demodulating phase information corresponding to each pixel of the structured-light image to obtain the depth image comprises:
demodulating phase information corresponding to each pixel in the structured light image;
converting the phase information into depth information;
and generating the depth image according to the depth information.
4. The image processing method of claim 1, wherein the determining whether the current user collides with a target object in the scene according to the depth image of the current user and the depth data of the edge pixel of the target object comprises:
performing person edge extraction on the depth image of the current user to determine pixels corresponding to the edge positions of persons in the depth image of the current user;
acquiring depth data of figure edge pixels in the depth image of the current user;
and when the fact that the depth data of the figure edge pixels are the same as the depth data of the target object edge pixels and the pixels are adjacent is detected, it is determined that the current user collides with the target object in the scene.
5. The image processing method according to any one of claims 1 to 4, wherein, while inserting the audio for the sound that should be emitted when the target object is collided, the method further comprises:
and generating reminding information aiming at the collision and providing the reminding information to the current user.
6. The image processing method of claim 5, wherein the reminder information is provided to the current user by one or more of:
voice broadcast, text display, vibration reminder, and changing the color of the background edge in the display device.
7. An image processing apparatus characterized by comprising:
the depth image acquisition component is used for acquiring a depth image of a current user and acquiring a three-dimensional background image of a scene where the current user is located;
a processor to:
performing edge extraction on the three-dimensional background image to acquire depth data of edge pixels of a target object in the three-dimensional background image;
determining whether the current user collides with a target object in the scene according to the depth image of the current user and the depth data of the edge pixels of the target object;
if so, inserting audio for the sound that should be emitted when the target object is collided;
wherein acquiring the depth image of the current user comprises:
projecting speckle structured light onto the current user through a structured light projector, a speckle pattern of the speckle structured light projected onto the current user being deformed by height differences of the surface of the current user; capturing, with a structured light camera, the speckle pattern projected onto the current user; performing a correlation operation between the captured speckle pattern and a plurality of pre-calibrated and stored reference speckle images one by one to obtain a plurality of correlation-degree images, each correlation-degree image exhibiting a peak at the position occupied by the current user in space; and superposing the peaks and performing an interpolation operation to obtain the depth image of the current user.
8. The image processing apparatus of claim 7, wherein the depth image acquisition assembly comprises a structured light projector and a structured light camera, the structured light projector for projecting structured light to the current user;
the structured light camera is configured to:
shooting a structured light image modulated by the current user;
and demodulating phase information corresponding to each pixel of the structured light image to obtain the depth image.
9. The image processing apparatus of claim 8, wherein the structured light camera is specifically configured to:
demodulating phase information corresponding to each pixel in the structured light image;
converting the phase information into depth information;
and generating the depth image according to the depth information.
10. The image processing apparatus of claim 7, wherein the processor is specifically configured to:
performing person edge extraction on the depth image of the current user to determine pixels corresponding to the edge positions of persons in the depth image of the current user;
acquiring depth data of the person edge pixels in the depth image of the current user;
and determining that the current user collides with the target object in the scene when it is detected that the depth data of the person edge pixels is the same as the depth data of the edge pixels of the target object and the pixels are adjacent.
11. The image processing apparatus of any of claims 7 to 10, wherein the processor is further configured to:
generating reminder information for the collision while inserting the audio for the sound that should be emitted when the target object is collided, and providing the reminder information to the current user.
12. The image processing apparatus of claim 11, wherein the processor provides the reminder information to the current user by one or more of:
voice broadcast, text display, vibration reminder, and changing the color of the background edge in the display device.
13. An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the image processing method of any of claims 1 to 6.
14. A computer-readable storage medium, comprising a computer program for use in conjunction with an electronic device capable of image capture, the computer program being executable by a processor to perform the image processing method of any of claims 1 to 6.
CN201710811959.7A 2017-09-11 2017-09-11 Image processing method, image processing apparatus, electronic apparatus, and computer-readable storage medium Expired - Fee Related CN107610127B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201710811959.7A CN107610127B (en) 2017-09-11 2017-09-11 Image processing method, image processing apparatus, electronic apparatus, and computer-readable storage medium
EP18853010.9A EP3680857B1 (en) 2017-09-11 2018-09-11 Image processing method and apparatus, electronic device and computer-readable storage medium
PCT/CN2018/105101 WO2019047982A1 (en) 2017-09-11 2018-09-11 Image processing method and apparatus, electronic device and computer-readable storage medium
US16/814,697 US11138740B2 (en) 2017-09-11 2020-03-10 Image processing methods, image processing apparatuses, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710811959.7A CN107610127B (en) 2017-09-11 2017-09-11 Image processing method, image processing apparatus, electronic apparatus, and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN107610127A (en) 2018-01-19
CN107610127B (en) 2020-04-03

Family

ID=61062872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710811959.7A Expired - Fee Related CN107610127B (en) 2017-09-11 2017-09-11 Image processing method, image processing apparatus, electronic apparatus, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN107610127B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019047982A1 (en) 2017-09-11 2019-03-14 Oppo广东移动通信有限公司 Image processing method and apparatus, electronic device and computer-readable storage medium
CN108427940B (en) * 2018-04-04 2023-11-17 徐育 Intelligent control device and control method for water dispenser water outlet based on depth camera
CN108921883B (en) * 2018-04-27 2022-05-06 徐育 Water dispenser control device based on two-position depth image recognition and control method thereof
EP3621293B1 (en) 2018-04-28 2022-02-09 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image processing method, apparatus and computer-readable storage medium
CN114049444B (en) * 2022-01-13 2022-04-15 深圳市其域创新科技有限公司 3D scene generation method and device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101893935B (en) * 2010-07-14 2012-01-11 北京航空航天大学 Cooperative construction method for enhancing realistic table-tennis system based on real rackets
CN102194248A (en) * 2011-05-05 2011-09-21 上海大学 Method for detecting and responding false-true collision based on augmented reality
JP2013101528A (en) * 2011-11-09 2013-05-23 Sony Corp Information processing apparatus, display control method, and program
CN104134235B (en) * 2014-07-25 2017-10-10 深圳超多维光电子有限公司 Real space and the fusion method and emerging system of Virtual Space
CN205302188U (en) * 2016-01-15 2016-06-08 广东小天才科技有限公司 Wear-type virtual reality equipment
CN107077755B (en) * 2016-09-30 2021-06-04 达闼机器人有限公司 Virtual and reality fusion method and system and virtual reality equipment
CN106651771B (en) * 2016-10-12 2020-01-07 深圳蓝韵医学影像有限公司 Digital X-ray image splicing method and system
CN106954058B (en) * 2017-03-09 2019-05-10 深圳奥比中光科技有限公司 Depth image obtains system and method
CN106909911B (en) * 2017-03-09 2020-07-10 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, and electronic apparatus

Also Published As

Publication number Publication date
CN107610127A (en) 2018-01-19

Similar Documents

Publication Publication Date Title
CN107610127B (en) Image processing method, image processing apparatus, electronic apparatus, and computer-readable storage medium
CN107368730B (en) Unlocking verification method and device
US11138740B2 (en) Image processing methods, image processing apparatuses, and computer-readable storage medium
CN107797664B (en) Content display method and device and electronic device
CN107480613B (en) Face recognition method and device, mobile terminal and computer readable storage medium
CN107734267B (en) Image processing method and device
CN107493428A (en) Filming control method and device
CN107481304B (en) Method and device for constructing virtual image in game scene
CN107734264B (en) Image processing method and device
CN107610080B (en) Image processing method and apparatus, electronic apparatus, and computer-readable storage medium
CN107464280B (en) Matching method and device for user 3D modeling
CN107590828B (en) Blurring processing method and device for shot image
CN107452034B (en) Image processing method and device
CN107623817B (en) Video background processing method, device and mobile terminal
CN107610171B (en) Image processing method and device
CN107483845B (en) Photographic method and its device
CN107483815B (en) Method and device for shooting moving object
CN107481317A (en) The facial method of adjustment and its device of face 3D models
CN107705278B (en) Dynamic effect adding method and terminal equipment
CN107509043B (en) Image processing method, image processing apparatus, electronic apparatus, and computer-readable storage medium
CN107454336B (en) Image processing method and apparatus, electronic apparatus, and computer-readable storage medium
CN107592491B (en) Video communication background display method and device
CN107613239B (en) Video communication background display method and device
CN107742300A (en) Image processing method, device, electronic installation and computer-readable recording medium
CN107682656B (en) Background image processing method, electronic device, and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 523860 No. 18, Wu Sha Beach Road, Changan Town, Dongguan, Guangdong

Applicant after: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS Corp.,Ltd.

Address before: 523860 No. 18, Wu Sha Beach Road, Changan Town, Dongguan, Guangdong

Applicant before: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS Corp.,Ltd.

GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200403