CN108921129B - Image processing method, system, medium, and electronic device

Info

Publication number
CN108921129B
CN108921129B
Authority
CN
China
Prior art keywords
depth
determining
region
hand
image
Prior art date
Legal status
Active
Application number
CN201810809073.3A
Other languages
Chinese (zh)
Other versions
CN108921129A (en)
Inventor
周志敏
丛林
赵辰
李晓燕
Current Assignee
Hangzhou Yixian Advanced Technology Co., Ltd.
Original Assignee
Hangzhou Yixian Advanced Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Hangzhou Yixian Advanced Technology Co., Ltd.
Priority to CN201810809073.3A
Publication of CN108921129A
Application granted
Publication of CN108921129B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides an image processing method. The method includes: determining a foreground region from an acquired depth image; when the foreground region contains a hand image, determining a first region where the palm is located in the hand image based on the depth of the hand image; determining N minimum distance values from N foreground points in the first region to the background region of the depth image, and taking the foreground point corresponding to the maximum of the N minimum distance values as the palm center position, where N is a positive integer and N ≥ 2; and acquiring a hand region image based on the palm center position, for judging whether a click operation occurs. The method solves the technical problems of unstable and insufficiently accurate hand position detection in the prior art; it can accurately extract the hand region image for projection control and improves recognition accuracy. Embodiments of the invention also provide an image processing system, a computer readable storage medium and an electronic device.

Description

Image processing method, system, medium, and electronic device
Technical Field
Embodiments of the present invention relate to the field of electronic technologies, and in particular, to an image processing method, system, medium, and electronic device.
Background
This section is intended to provide a background or context to the embodiments of the invention that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
Projection interaction is a new mode of human-computer interaction: a projector projects an interactive interface onto an arbitrary plane, a camera detects the user's actions, and the corresponding response is presented through the projector.
Generally, the camera used to detect human behavior is a color camera, which is easily affected by illumination and shadows; this makes hand position detection very unstable. In addition, the image cast by the projector interferes with detection of the hand region, easily causing misjudgment, so the accuracy is insufficient.
At present, some solutions combining a depth camera with a color camera have appeared, implementing projection interaction by means of existing products. However, these products can recognize only a few simple, fixed gestures and are not suitable for direct use in a customized projection interaction system.
Disclosure of Invention
Therefore, an improved image processing method is highly desirable: one that solves the technical problems of unstable and insufficiently accurate hand position detection in the prior art, accurately extracts the hand region image for projection control, and thereby improves recognition accuracy.
In this context, embodiments of the present invention are intended to provide an image processing method, medium, system, and electronic device.
In a first aspect of the embodiments of the present invention, an image processing method is provided. The method includes: determining a foreground region from an acquired depth image; when the foreground region contains a hand image, determining a first region where the palm is located in the hand image based on the depth of the hand image; determining N minimum distance values from N foreground points in the first region to the background region of the depth image, and taking the foreground point corresponding to the maximum of the N minimum distance values as the palm center position, where N is a positive integer and N ≥ 2; and acquiring a hand region image based on the palm center position, for determining whether a click operation occurs.
In an embodiment of the present invention, determining the first region where the palm is located in the hand image based on the depth of the hand image, when the foreground region contains the hand image, includes: obtaining the depth of the hand image; obtaining standard palm size information at that depth; and determining the first region where the palm is located in the hand image based on the standard palm size information at that depth.
In another embodiment of the invention, the standard palm size information at the depth includes a standard palm radius at the depth, and determining the first region where the palm is located in the hand image based on the standard palm size information at the depth includes: determining direction information of the foreground region in the depth image; determining a second region containing the foreground region from the depth image; traversing the second region based on the direction information to determine a first reference point, where the first reference point satisfies that, starting from the first reference point, there are consecutive foreground points of a predetermined length in a first direction and in a second direction perpendicular to the first direction; and determining, within the second region, the first region where the palm is located in the hand image based on the standard palm radius at the depth and the position of the first reference point.
In another embodiment of the present invention, the determining the direction information of the foreground region in the depth image includes determining an arm region based on a portion where the foreground region intersects with an edge of the depth image, and determining the direction information of the foreground region in the depth image according to a direction of the arm region.
In another embodiment of the present invention, after determining an arm region based on a portion where the foreground region intersects with an edge of the depth image, the method further includes obtaining width and height information of the arm region, and the determining a second region including the foreground region from the depth image includes determining the second region from the depth image according to the direction, width and height information of the arm region.
In another embodiment of the invention, said predetermined length is equal to or less than a standard palm radius at said depth.
In another embodiment of the invention, the method further includes acquiring a hand pose through a deep neural network based on the hand region image, and determining a click degree of freedom for one or more fingers according to the hand pose, where the click degree of freedom represents the probability that the finger performs a click operation.
In another embodiment of the present invention, the method further includes: when a first probability that the finger performs a click operation, determined based on the click degree of freedom, is between a first preset value and a second preset value, determining the position of the fingertip of the finger in the depth image; determining the difference between the depth of the fingertip image of the finger and the depth of the background area covered by the fingertip; and determining that a click operation occurs when the difference is smaller than a third preset value.
In another embodiment of the present invention, the method further includes performing gesture recognition on the hand pose after the difference between the depth of the palm center position and the depth of the background image has been greater than a fourth preset value for a preset time.
In a second aspect of the embodiments of the present invention, an image processing system is provided, which includes a foreground region determining module, a first region determining module, a palm center position determining module, and a hand region image acquiring module. The foreground region determining module is configured to determine a foreground region from the acquired depth image. The first region determining module is configured to determine, when the foreground region contains a hand image, a first region where the palm is located in the hand image based on the depth of the hand image. The palm center position determining module is configured to determine N minimum distance values from N foreground points in the first region to the background region of the depth image, and to take the foreground point corresponding to the maximum of the N minimum distance values as the palm center position, where N is a positive integer and N ≥ 2. The hand region image acquiring module is configured to acquire a hand region image based on the palm center position, for determining whether a click operation occurs.
In one embodiment of the present invention, the first region determining module includes a depth obtaining sub-module, a size information obtaining sub-module, and a first region determining sub-module. The depth obtaining sub-module is configured to obtain the depth of the hand image when the foreground region contains the hand image. The size information obtaining sub-module is configured to obtain the standard palm size information at the depth. The first region determining sub-module is configured to determine the first region where the palm is located in the hand image based on the standard palm size information at the depth.
In another embodiment of the present invention, the standard palm size information at the depth includes a standard palm radius at the depth, and the first region determining sub-module includes a direction information determining unit, a second region determining unit, a first traversal unit, and a first region determining unit. The direction information determining unit is configured to determine direction information of the foreground region in the depth image. The second region determining unit is configured to determine a second region containing the foreground region from the depth image. The first traversal unit is configured to traverse the second region based on the direction information to determine a first reference point, where the first reference point satisfies that, starting from the first reference point, there are consecutive foreground points of a predetermined length in a first direction and in a second direction perpendicular to the first direction. The first region determining unit is configured to determine, within the second region, the first region where the palm is located in the hand image based on the standard palm radius at the depth and the position of the first reference point.
In another embodiment of the present invention, the direction information determining unit includes an arm region determining subunit and a direction information determining subunit. The arm region determining subunit is configured to determine an arm region based on the portion where the foreground region intersects with the edge of the depth image. The direction information determining subunit is configured to determine the direction information of the foreground region in the depth image according to the direction of the arm region.
In another embodiment of the present invention, the first region determining sub-module further includes an arm size determining unit configured to obtain width and height information of the arm region, and the second region determining unit is configured to determine the second region from the depth image according to the direction, width and height information of the arm region.
In another embodiment of the invention, said predetermined length is equal to or less than a standard palm radius at said depth.
In another embodiment of the present invention, the system further includes a hand pose acquiring module and a degree-of-freedom determining module. The hand pose acquiring module is configured to acquire a hand pose through a deep neural network based on the hand region image. The degree-of-freedom determining module is configured to determine a click degree of freedom for one or more fingers according to the hand pose, where the click degree of freedom represents the probability that the finger performs a click operation.
In another embodiment of the present invention, the system further includes a fingertip position determining module, a depth difference determining module, and a click operation determining module. The fingertip position determining module is configured to determine the position of the fingertip of the finger in the depth image when the first probability that the finger performs a click operation, determined based on the click degree of freedom, is between a first preset value and a second preset value. The depth difference determining module is configured to determine the difference between the depth of the fingertip image of the finger and the depth of the background area covered by the fingertip. The click operation determining module is configured to determine that a click operation occurs when the difference is smaller than a third preset value.
In another embodiment of the present invention, the system further includes a hand pose recognition module configured to perform gesture recognition on the hand pose after the difference between the depth of the palm center position and the depth of the background image has been greater than a fourth preset value for a preset time.
In a third aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored thereon executable instructions which, when executed by a processing unit, cause the processing unit to perform a method according to any one of the above.
In a fourth aspect of embodiments of the present invention, there is provided an electronic device comprising a processing unit and a storage unit having stored thereon executable instructions that, when executed by the processing unit, cause the processing unit to perform any of the methods described above.
The method, system, medium and electronic device of the embodiments of the present invention solve the technical problems of unstable and insufficiently accurate hand position detection in the prior art; they can accurately extract the hand region image for projection control and improve recognition accuracy.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
fig. 1 schematically shows an application scenario of an image processing method according to an exemplary embodiment of the present invention;
FIG. 2 schematically shows a flow chart of an image processing method according to an exemplary embodiment of the present invention;
FIG. 3A schematically illustrates a flow chart for determining the first region where the palm is located in the hand image based on the depth of the hand image, according to an exemplary embodiment of the present invention;
FIG. 3B schematically shows a schematic view of a depth image according to an exemplary embodiment of the present invention;
FIG. 3C schematically illustrates a diagram of standard palm size information at different depths according to an exemplary embodiment of the invention;
FIG. 4 schematically illustrates a flow chart for determining the first region where the palm is located in the hand image based on the standard palm size information at the depth, according to an exemplary embodiment of the present invention;
FIG. 5A schematically shows a flow chart for determining directional information of the foreground region in the depth image according to an exemplary embodiment of the present invention;
fig. 5B schematically shows a flow chart for determining directional information of the foreground region in the depth image and for determining a second region containing the foreground region from the depth image according to another exemplary embodiment of the present invention;
FIG. 6 schematically shows a flow chart of an image processing method according to another exemplary embodiment of the present invention;
FIG. 7 schematically shows a block diagram of an image processing system according to an exemplary embodiment of the present invention;
FIG. 8 schematically illustrates a block diagram of a first region determination module according to an exemplary embodiment of the present invention;
FIG. 9 schematically illustrates a block diagram of a first region determination submodule, according to an exemplary embodiment of the present invention;
fig. 10 schematically shows a block diagram of a direction information determining unit according to an exemplary embodiment of the present invention;
FIG. 11 schematically shows a block diagram of an image processing system according to another exemplary embodiment of the present invention;
FIG. 12 schematically illustrates a computer-readable storage medium suitable for implementing an image processing method and system according to exemplary embodiments of the invention; and
fig. 13 schematically illustrates an electronic device diagram suitable for implementing an image processing method and system according to an exemplary embodiment of the present invention.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, apparatus, device, method, or computer application. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the invention, an image processing method, an image processing system and an electronic device are provided.
In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the invention.
Summary of the Invention
The inventor found that the camera used to detect human behavior is a color camera, which is easily affected by illumination and shadows, making hand position detection unstable; in addition, the image cast by the projector interferes with detection of the hand region, easily causing misjudgment and insufficient accuracy. Schemes that combine depth cameras and color cameras can recognize only a few simple, fixed gestures and are not suitable for direct use in customized projection interaction systems. The method provided by the embodiments of the present invention can accurately identify the region where the palm is located based on the depth of the hand image and, by determining the palm center position within that region, improves both the accuracy of extracting the hand region image and the accuracy of recognizing control operations.
Having described the general principles of the invention, various non-limiting embodiments of the invention are described in detail below.
Application scene overview
Referring first to fig. 1, fig. 1 schematically illustrates an application scenario of an image processing method according to an exemplary embodiment of the present invention. It should be noted that fig. 1 is only an example of an application scenario in which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
In the scenario shown in fig. 1, the electronic device 110 may, for example, comprise a projection device for projecting an image onto the plane 120, and the user performs actions with the hand 130 on the projection area to control the projection. The electronic device 110 may further comprise an image acquisition arrangement including a depth image acquisition arrangement, which may be used to acquire an image containing depth information according to an exemplary embodiment of the present invention. The electronic device 110 controls the projection content by recognizing the movement of the hand 130 in the area; for example, it may recognize whether the hand 130 performs a click operation on the plane 120 and, when a click operation occurs, control the projection content according to the position where the click was performed.
The inventor found that an unstable palm center position causes large regression errors, or even outright regression failures, in deep-learning-based hand pose estimation. The current academic solution is to use another network to regress the palm position, which directly leads to inefficiency.
Exemplary method
An image processing method according to an exemplary embodiment of the present invention is described below with reference to fig. 2 in conjunction with the application scenario of fig. 1. It should be noted that the above application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present invention, and the embodiments of the present invention are not limited in this respect. Rather, embodiments of the present invention may be applied to any scenario where applicable.
Fig. 2 schematically shows a flowchart of an image processing method according to an exemplary embodiment of the present invention.
Before implementing the method of the exemplary embodiment of the present invention, the projection device and the depth image capturing device may first be calibrated, including the intrinsic parameters and distortion of the depth image capturing device, the intrinsic parameters and distortion of the projection device, and the extrinsic parameters from the depth camera to the projector.
In addition, the ambient depth may be continuously captured and averaged to obtain the average background depth. To remove noise, the depth data may be filtered with a 3×3 kernel. Due to self-occlusion of objects, holes can appear in the depth image (i.e., positions where the depth value cannot be measured are 0); a neighborhood-filling algorithm can effectively resolve most of these black holes.
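As a concrete illustration of this preprocessing, the following Python sketch (assuming OpenCV and NumPy; the iteration count and the dilation-based hole filling are illustrative choices, not values mandated by the text) builds the average background depth, median-filters a frame with a 3×3 kernel, and fills zero-valued holes from their neighborhoods:

```python
import cv2
import numpy as np

def build_background(depth_frames):
    """Average a sequence of static depth frames into the mean background
    depth, ignoring zero (unmeasured) pixels."""
    stack = np.stack(depth_frames).astype(np.float32)
    valid = (stack > 0).astype(np.float32)
    counts = np.maximum(valid.sum(axis=0), 1.0)
    return stack.sum(axis=0) / counts

def fill_holes(depth, iterations=5):
    """Replace zero-valued holes with the maximum of their 3x3 neighborhood,
    repeated a few times -- a simple stand-in for the neighborhood-filling
    step described above."""
    for _ in range(iterations):
        holes = depth == 0
        if not holes.any():
            break
        neighbor_max = cv2.dilate(depth, np.ones((3, 3), np.uint8))
        depth[holes] = neighbor_max[holes]
    return depth

def preprocess_depth(depth):
    """3x3 median filtering followed by hole filling."""
    depth = cv2.medianBlur(depth.astype(np.float32), 3)
    return fill_holes(depth)
```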
As shown in fig. 2, the image processing method of the exemplary embodiment of the present invention includes steps S210 to S240.
In step S210, a foreground region is determined from the acquired depth image. According to an exemplary embodiment of the invention, an electronic device for performing the method may comprise a depth image acquisition means, for example a depth camera, for acquiring a depth image. According to an exemplary embodiment of the present invention, an electronic device for executing the method may determine a foreground region and a background region in an acquired image including depth information according to the depth information. For example, in the scene illustrated in fig. 1, the foreground region is a region of the hand and a part of the arm of the user, and the background region is a region of the plane 120 that is not covered by the hand and the part of the arm of the user. According to an exemplary embodiment of the present invention, in the case that the average background depth is obtained, the foreground region and the background region may also be determined based on the average background depth.
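A minimal sketch of the foreground segmentation of step S210, assuming the averaged background from above and a millimeter-scale depth map (the 15 mm tolerance is an assumed value, not one from the text):

```python
def segment_foreground(depth, background, min_diff_mm=15.0):
    """Mark pixels noticeably closer to the camera than the averaged
    background as foreground; zero pixels (holes) are ignored."""
    valid = depth > 0
    foreground = valid & ((background - depth) > min_diff_mm)
    return foreground.astype(np.uint8) * 255
```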
In step S220, when the foreground region contains a hand image, a first region where the palm is located in the hand image is determined based on the depth of the hand image.
Operation S220 of the exemplary embodiment of the present invention is described below with reference to fig. 3A to 3C and fig. 4.
Fig. 3A schematically illustrates a flow chart for determining the first region where the palm is located in the hand image based on the depth of the hand image according to an exemplary embodiment of the present invention. Fig. 3B schematically shows a schematic view of a depth image 300 according to an exemplary embodiment of the present invention.
As shown in fig. 3A, the method includes steps S310 to S330.
In step S310, if the foreground region 310 contains the hand image 311, the depth of the hand image 311 is obtained. According to the exemplary embodiment of the present invention, when hand recognition determines that the foreground region 310 contains the hand image 311, the depth of the hand is obtained from the depth image 300.
In step S320, standard palm size information at the depth is obtained. According to an exemplary embodiment of the present invention, the real size of a standard adult palm, for example its diameter d or radius r, may be obtained and then converted into pixel size information in the image at the depth, which serves as the standard palm size information at that depth. Alternatively, the pixel size information of the standard palm at the depth may be obtained directly from existing data. Referring to FIG. 3C, FIG. 3C schematically illustrates different standard palm size information at different depths according to an exemplary embodiment of the invention, such as palm radii r1, r2, r3, r4, where the greater the depth, the smaller the palm radius.
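The conversion from a physical palm radius to a pixel radius at a given depth follows the pinhole model r_px = f * R / Z. A one-line sketch, where the 45 mm radius and 570 px focal length are illustrative assumptions rather than values from the patent:

```python
def palm_radius_px(depth_mm, real_radius_mm=45.0, focal_px=570.0):
    """Pinhole-camera conversion of a physical palm radius to pixels
    at the given depth: r_px = f * R / Z."""
    return focal_px * real_radius_mm / depth_mm
```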
In step S330, the first region 330 where the palm is located in the hand image is determined based on the standard palm size information at the depth. Because the method uses the standard palm size information at the depth, the region where the palm is located can be identified accurately, and the possibility of identifying other objects as the palm is reduced; for example, an arm will not be identified as part of the palm.
Fig. 4 schematically shows a flowchart for determining the first region where the palm is located in the hand image based on the standard palm size information at the depth according to an exemplary embodiment of the present invention.
As shown in fig. 4, the method includes steps S410 to S440.
In step S410, the direction information of the foreground region 310 in the depth image 300 is determined. Step S410 is described below with reference to fig. 5A.
Fig. 5A schematically shows a flowchart for determining directional information of the foreground region in the depth image according to an exemplary embodiment of the present invention.
As shown in fig. 5A, the method includes steps S510 and S520.
In step S510, an arm region is determined based on a portion where the foreground region intersects with the edge of the depth image.
In step S520, the direction information of the foreground region in the depth image is determined according to the direction of the arm region.
For example, in the scene illustrated in fig. 3B, the depth image 300 may be a rectangle; the user's arm extends into the area of the projected image from one of the four sides of the rectangle, so the hand-and-arm foreground intersects at least one side of the depth image 300, and the direction information may be determined from the side where the intersection lies.
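One plausible implementation of this edge test, counting foreground pixels on each border of the mask; this is a simplification, as the text does not fix a particular tie-breaking rule:

```python
def arm_direction(foreground_mask):
    """Return which image border ('top', 'bottom', 'left', 'right') the
    foreground touches most, taken as the side the arm enters from."""
    h, w = foreground_mask.shape
    counts = {
        "top": int(np.count_nonzero(foreground_mask[0, :])),
        "bottom": int(np.count_nonzero(foreground_mask[h - 1, :])),
        "left": int(np.count_nonzero(foreground_mask[:, 0])),
        "right": int(np.count_nonzero(foreground_mask[:, w - 1])),
    }
    return max(counts, key=counts.get)
```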
Reference is made back to fig. 4. In step S420, a second region 320 containing the foreground region 310 is extracted from the depth image 300. For example, the second region 320 may be extracted according to the depth information; when the foreground region 310 contains the hand image 311, the second region 320 may be the minimum bounding rectangle of the foreground region 310. Since the hand is connected to the arm, the second region usually also contains part of the arm, so the foreground point with the largest distance to the background, if sought directly over the whole second region, is usually not the palm center point.
Fig. 5B schematically shows a flow chart for determining directional information of the foreground region in the depth image and for determining a second region containing the foreground region from the depth image according to another exemplary embodiment of the present invention.
As shown in fig. 5B, the method includes steps S510, S530, and S540.
In step S510, an arm region is determined based on a portion where the foreground region intersects with the edge of the depth image.
In step S530, the width and height information of the arm region is obtained.
In step S540, the second region is determined from the depth image according to the direction, width and height information of the arm region.
According to an exemplary embodiment of the present invention, one or more suspected arm regions may be determined based on the portions where the foreground regions intersect with the edge of the depth image. Based on the depth of the foreground region and the width and height of the suspected arm regions, some non-arm regions can be excluded and the arm region determined from the remaining candidates. When the foreground region contains a hand image and an arm region, the first region where the palm is located in the hand image is determined based on the depth of the hand image.
By verifying the arm connected to the hand, the method further excludes non-hand images, which reduces the possibility of misidentification and improves computational efficiency.
Reference is made back to fig. 4. In step S430, based on the direction information, the second area 320 is traversed to determine a first reference point B, where the first reference point B satisfies that there are consecutive foreground points of a predetermined length in a first direction and a second direction perpendicular to the first direction, respectively, from the first reference point.
According to an exemplary embodiment of the invention, the predetermined length is equal to or less than a standard palm radius at the depth.
According to an exemplary embodiment of the invention, after traversing the second area 320 to determine the first reference point B based on the direction information, the method further comprises, in case the first reference point is not found or the first reference point and the second area satisfy a predetermined condition, traversing the second area 320 to determine a second reference point, wherein the second reference point satisfies that there is a further predetermined length of consecutive foreground points in the first direction and a second direction perpendicular to the first direction, respectively, starting from the second reference point, the further predetermined length being smaller than the previous predetermined length. In some cases, the actual palm in the image may not satisfy consecutive foreground points having a standard palm radius length in a first direction and a second direction perpendicular to the first direction, respectively, due to palm tilt, etc., and the standard may be lowered to set another predetermined length to a length less than the standard palm radius, such as 2/3 of the palm radius.
In step S440, a first region 330 where the palm is located in the hand graph is determined in the second region 320 based on the standard palm radius at the depth and the position of the first reference point B. Since the first reference point B is a point on the hand graph, the first area 330 where the palm is located can be determined more accurately by combining the standard palm radius or diameter information. In this case, the first region 330 usually does not include a large amount of arm region.
For example, when the arm extends from the lower part of the second area 320, the traversal may start from the pixel point A(a_x, a_y) at the upper-left corner of the second area 320 and continue until the first reference point B(b_x, b_y) is found, such that starting from point B there are r consecutive foreground points in the x direction and r consecutive foreground points in the y direction, r being the standard palm radius; here the x direction is not limited to the horizontal direction, and the y direction is perpendicular to the x direction. The first region may then be determined as (a_x, a_y, b_x - a_x + 2r, b_y - a_y + 2r), where (a_x, a_y) indicates the starting position of the rectangular region, b_x - a_x + 2r denotes the width of the rectangular region, and b_y - a_y + 2r denotes its height.
According to the exemplary embodiment of the invention, during the traversal, for each pixel point the x and y directions are examined together to determine whether there exists a pair of directions, the x direction and a y direction perpendicular to it, such that there are consecutive foreground points of the predetermined length in the x direction and also in the y direction. If so, the pixel point is determined to be the first reference point, the x and y directions are fixed at the same time, and the traversal ends.
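A sketch of this traversal for the bottom-entry case described above, operating on a foreground mask cropped to the second region 320 (so point A is the upper-left pixel); handling of other arm directions and the reduced-length fallback of the earlier embodiment are omitted:

```python
def find_first_region(mask, r):
    """Scan from the upper-left corner A(0, 0) for the first point
    B(b_x, b_y) with r consecutive foreground pixels in both the +x
    and +y directions, then return the rectangle
    (a_x, a_y, b_x - a_x + 2r, b_y - a_y + 2r) from the text.
    The caller may need to clip the rectangle to the image bounds."""
    h, w = mask.shape
    r = int(round(r))
    for b_y in range(h - r):
        for b_x in range(w - r):
            if (mask[b_y, b_x:b_x + r].all()
                    and mask[b_y:b_y + r, b_x].all()):
                a_x, a_y = 0, 0  # upper-left corner of the second region
                return (a_x, a_y, b_x - a_x + 2 * r, b_y - a_y + 2 * r)
    return None  # no reference point; a shorter run (e.g. 2r/3) could be retried
```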
Reference is made back to fig. 2. In step S230, N minimum distance values from the N foreground points in the first region 330 to the background region in the depth image are determined, and the foreground point corresponding to the maximum of the N minimum distance values is taken as the palm center position, where N is a positive integer and N ≥ 2. According to the exemplary embodiment of the present invention, once the first region has been accurately extracted, the foreground point with the largest distance to the background can be taken as the palm center position.
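Step S230 is exactly what a distance transform computes: for every foreground pixel, its minimum distance to the nearest background pixel. A sketch using OpenCV, where the argument is the 8-bit foreground mask of the first region 330:

```python
def palm_center(first_region_mask):
    """cv2.distanceTransform yields each foreground pixel's minimum
    distance to the background; the pixel with the maximum of these
    minimum distances is taken as the palm center (step S230)."""
    dist = cv2.distanceTransform(first_region_mask, cv2.DIST_L2, 3)
    y, x = np.unravel_index(int(np.argmax(dist)), dist.shape)
    return x, y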
In step S240, a hand region image is acquired based on the palm center position, for determining whether a click operation occurs. After the palm center position is determined, the hand region image acquired from it may be the same as or different from the image of the first region 330; in general, the two are not the same. Because the method identifies the region where the palm is located based on the depth of the hand image and determines the palm center position within that region, it improves both the accuracy of extracting the hand region image and the accuracy of recognizing control operations.
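Putting the pieces together, a hypothetical frame-level driver for steps S210 to S240, composing the sketches above (the median-depth estimate of the hand, the crop handling, and all parameter values are assumptions):

```python
def process_frame(depth, background, focal_px=570.0):
    """Sketch of steps S210-S240: preprocess, segment the foreground,
    estimate the hand depth, locate the first region, and return the
    palm center position in image coordinates."""
    depth = preprocess_depth(depth)
    fg = segment_foreground(depth, background)
    if not fg.any():
        return None                        # no foreground, nothing to do
    hand_depth = float(np.median(depth[fg > 0]))
    r = palm_radius_px(hand_depth, focal_px=focal_px)
    region = find_first_region(fg > 0, r)  # assumes fg is cropped to region 320
    if region is None:
        return None
    x0, y0, w, h = region
    cx, cy = palm_center(fg[y0:y0 + h, x0:x0 + w])
    return x0 + cx, y0 + cy                # palm center position
```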
Fig. 6 schematically shows a flowchart of an image processing method according to another exemplary embodiment of the present invention.
As shown in fig. 6, the method may further include step S610 and step S620 on the basis of the foregoing embodiment.
In step S610, a hand pose is acquired by a deep neural network based on the hand region image.
In step S620, a click freedom of one or more fingers is determined according to the hand gesture, and the click freedom represents a probability of a click operation of the fingers.
During construction of the data set, the three-dimensional coordinates (x, y, z) of 21 hand nodes (four nodes for each finger plus a palm node) are annotated. For each fingertip node, a label T indicating whether the fingertip is in contact with the desktop is attached, where T = 1 indicates that the fingertip is in contact with the desktop and T = 0 indicates that it is not.
In the design of the neural network, the number of outputs of the last fully connected layer is accordingly increased by 5 degrees of freedom. In practical application, the network can then not only recover the three-dimensional position of each node, but also output the probability that each of the 5 fingertip nodes contacts the desktop.
On the basis of neural-network hand pose recognition, the method adds a dedicated output degree of freedom for each finger's click operation, which improves the accuracy of click detection.
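As an illustration only, a network head with the widened final fully connected layer, written here in PyTorch; the 512-dimensional input feature and the sigmoid on the contact outputs are assumptions, since the patent does not specify a backbone:

```python
import torch
import torch.nn as nn

class HandPoseHead(nn.Module):
    """Final fully connected layer producing 21 x 3 node coordinates plus
    5 extra degrees of freedom, one contact probability per fingertip."""
    def __init__(self, in_features=512):
        super().__init__()
        self.fc = nn.Linear(in_features, 21 * 3 + 5)

    def forward(self, features):
        out = self.fc(features)
        joints = out[:, :63].reshape(-1, 21, 3)  # (x, y, z) for each node
        contact = torch.sigmoid(out[:, 63:])     # P(fingertip touches desktop)
        return joints, contact
```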
As shown in FIG. 6, the method may further include steps S630 to S650.
In step S630, when the first probability that the finger performs a click operation, determined based on the click degree of freedom, is between a first preset value and a second preset value, the position of the fingertip of the finger in the depth image is determined. For example, let the first preset value be 0.4 and the second preset value 0.6: when the first probability determined from the finger's click output degree of freedom is less than 0.4 or greater than 0.6, whether the user's finger performed a click operation can be decided directly; when it lies between 0.4 and 0.6, the position of the fingertip in the depth image is acquired. According to an exemplary embodiment of the present invention, this position may also be obtained from the neural network, for example from the output for the fingertip node among the 21 nodes.
In step S640, the difference between the depth of the fingertip image of the finger and the depth of the background region covered by the fingertip is determined.
In step S650, in the case that the difference is smaller than the third preset value, it is determined that the click operation occurs.
In this way, the method further confirms whether a click operation occurred by combining the depth of the fingertip in the depth image with the depth of the background area covered by the fingertip.
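The decision logic of steps S630 to S650 might look as follows; the 0.4/0.6 band is the example from the text, while the 10 mm third preset value is an assumption:

```python
def click_occurred(click_prob, fingertip_xy, depth, background,
                   p_low=0.4, p_high=0.6, max_gap_mm=10.0):
    """Below p_low: no click. Above p_high: click. In between, confirm by
    checking that the fingertip depth is close to the background depth
    it covers (steps S630-S650)."""
    if click_prob < p_low:
        return False
    if click_prob > p_high:
        return True
    x, y = fingertip_xy
    gap = float(background[y, x]) - float(depth[y, x])
    return gap < max_gap_mm
```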
According to an exemplary embodiment of the present invention, the method further includes performing gesture recognition on the hand pose after the difference between the depth of the palm center position and the depth of the background image has been greater than a fourth preset value for a preset time.
According to the palm height P_H (i.e., the difference between the depth of the palm center position and the depth of the background image) and a threshold height T_H (the fourth preset value), the system controls whether it is currently in the in-air gesture recognition stage or the desktop click recognition state. When the palm stays at a height greater than T_H for a preset time, for example 5 seconds, recognition of in-air gestures starts, including recognition of some static gestures and of left and right swipes.
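A possible shape for this mode switch, with T_H and the hold time as assumed parameter values:

```python
import time

class ModeSwitcher:
    """Stays in desktop-click recognition until the palm height P_H has
    exceeded the threshold T_H for hold_s seconds, then switches to
    in-air gesture recognition."""
    def __init__(self, t_h_mm=80.0, hold_s=5.0):
        self.t_h_mm = t_h_mm
        self.hold_s = hold_s
        self._above_since = None

    def update(self, palm_height_mm):
        if palm_height_mm > self.t_h_mm:
            if self._above_since is None:
                self._above_since = time.monotonic()
            if time.monotonic() - self._above_since >= self.hold_s:
                return "air_gesture"
        else:
            self._above_since = None
        return "desktop_click"
```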
The embodiments disclosed herein can be combined or straightforwardly adapted as needed to obtain the desired processing strategy and achieve a better technical effect.
Exemplary devices
Having described the method of an exemplary embodiment of the present invention, an image processing system of an exemplary embodiment of the present invention is next described with reference to fig. 7.
Fig. 7 schematically shows a block diagram of an image processing system 700 according to an exemplary embodiment of the present invention.
As shown in fig. 7, the image processing system 700 includes a foreground region determining module 710, a first region determining module 720, a palm position determining module 730, and a hand region image acquiring module 740.
The foreground region determining module 710, for example, performs operation S210 described above with reference to fig. 2, for determining a foreground region from the acquired depth image.
The first region determining module 720, for example, performs the operation S220 described above with reference to fig. 2, to determine, when the foreground region contains a hand image, the first region where the palm is located in the hand image based on the depth of the hand image.
The palm position determining module 730, for example, executes the operation S230 described above with reference to fig. 2, and is configured to determine N minimum distance values from N foreground points in the first area to a background area in the depth image, and take a foreground point corresponding to a maximum value of the N minimum distance values as a palm position, where N is a positive integer and N ≧ 2.
The hand region image obtaining module 740, for example, performs the operation S240 described above with reference to fig. 2, and is configured to obtain a hand region image based on the palm position, and is configured to determine whether a click operation occurs.
Fig. 8 schematically illustrates a block diagram of the first region determining module 720 according to an exemplary embodiment of the present invention.
As shown in fig. 8, the first region determining module 720 includes a depth obtaining sub-module 810, a size information obtaining sub-module 820, and a first region determining sub-module 830.
The depth obtaining sub-module 810, for example, performs the operation S310 described above with reference to fig. 3A, to obtain the depth of the hand image when the foreground region contains the hand image.
The size information obtaining sub-module 820, for example, performs the operation S320 described above with reference to fig. 3A, to obtain the standard palm size information at the depth.
The first region determining sub-module 830, for example, performs the operation S330 described above with reference to fig. 3A, to determine the first region where the palm is located in the hand image based on the standard palm size information at the depth.
Fig. 9 schematically illustrates a block diagram of the first region determination submodule 830 according to an exemplary embodiment of the present invention.
As shown in fig. 9, the first region determining sub-module 830 includes a direction information determining unit 910, a second region determining unit 920, a first traversal unit 930, and a first region determining unit 940.
The direction information determining unit 910, for example, performs operation S410 described above with reference to fig. 4, for determining direction information of the foreground region in the depth image.
The second region determining unit 920, for example, performs operation S420 described above with reference to fig. 4, for determining a second region including the foreground region from the depth image.
The first traversal unit 930 performs, for example, the operation S430 described above with reference to fig. 4, for traversing the second area to determine a first reference point based on the direction information, where the first reference point satisfies, starting from the first reference point, consecutive foreground points having a predetermined length in a first direction and a second direction perpendicular to the first direction, respectively.
The first region determining unit 940, for example, performs the operation S440 described above with reference to fig. 4, to determine, within the second region, the first region where the palm is located in the hand image based on the standard palm radius at the depth and the position of the first reference point.
Fig. 10 schematically shows a block diagram of the direction information determining unit 910 according to an exemplary embodiment of the present invention.
As shown in fig. 10, the direction information determination unit 910 includes an arm area determination subunit 1010 and a direction information determination subunit 1020.
The arm area determining subunit 1010, for example, performs operation S510 described above with reference to fig. 5A, for determining an arm area based on a portion where the foreground area intersects with the edge of the depth image.
The direction information determining subunit 1020, for example, performs operation S520 described above with reference to fig. 5A, to determine the direction information of the foreground region in the depth image according to the direction of the arm region.
According to an exemplary embodiment of the present invention, the first region determining sub-module further includes an arm size determining unit for obtaining width and height information of the arm region; the second area determining unit is used for determining the second area from the depth image according to the direction, the width and the height information of the arm area.
According to an exemplary embodiment of the invention, the predetermined length is equal to or less than a standard palm radius at the depth.
Fig. 11 schematically shows a block diagram of an image processing system 1100 according to another exemplary embodiment of the present invention.
As shown in fig. 11, the image processing system 1100 may further include a hand gesture obtaining module 1110 and a degree of freedom determining module 1120 based on the foregoing embodiments.
The hand gesture obtaining module 1110, for example, performs operation S610 described above with reference to fig. 6, for obtaining a hand gesture through the deep neural network based on the hand region image.
The degree-of-freedom determination module 1120, for example, performs operation S620 described above with reference to fig. 6, and is configured to determine, according to the hand gesture, a degree of freedom of click of one or more fingers, where the degree of freedom of click represents a probability of a click operation of the finger.
As shown in fig. 11, the image processing system 1100 may further include a fingertip position determination module 1130, a depth difference determination module 1140, and a click operation determination module 1150.
The fingertip position determination module 1130, for example, performs operation S630 described above with reference to fig. 6, and is configured to determine the position of the fingertip of the finger in the depth image if the first probability of the finger being clicked, which is determined based on the click freedom, is between a first preset value and a second preset value.
The depth difference determining module 1140, for example, performs the operation S640 described above with reference to fig. 6, to determine the difference between the depth of the fingertip image of the finger and the depth of the background area covered by the fingertip.
The click operation determining module 1150, for example, performs the operation S650 described above with reference to fig. 6, for determining that a click operation occurs if the difference is smaller than a third preset value.
According to an exemplary embodiment of the present invention, the system further includes a hand gesture recognition module for performing gesture recognition on the hand gesture after a difference between the depth of the palm center position and the depth of the background image is greater than a fourth preset value for a preset time.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the disclosure may be at least partially implemented as a computer program module, which when executed may perform the corresponding functions.
For example, any of the foreground region determining module 710, the first region determining module 720, the palm position determining module 730, the hand region image acquiring module 740, the depth obtaining sub-module 810, the size information obtaining sub-module 820, the first region determining sub-module 830, the direction information determining unit 910, the second region determining unit 920, the first traversal unit 930, the first region determining unit 940, the arm region determining sub-unit 1010, the direction information determining sub-unit 1020, the arm size determining unit, the hand posture acquiring module 1110, the degree of freedom determining module 1120, the fingertip position determining module 1130, the depth difference determining module 1140, the click operation determining module 1150, and the hand posture identifying module may be implemented in one module, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the modules, sub-modules, units, and sub-units listed above may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by any other reasonable manner of integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations.
Alternatively, at least one of the foreground region determining module 710, the first region determining module 720, the palm position determining module 730, the hand region image acquiring module 740, the depth acquiring sub-module 810, the size information acquiring sub-module 820, the first region determining sub-module 830, the direction information determining unit 910, the second region determining unit 920, the first traversal unit 930, the first region determining unit 940, the arm region determining sub-unit 1010, the direction information determining sub-unit 1020, the arm size determining unit, the hand gesture acquiring module 1110, the degree of freedom determining module 1120, the fingertip position determining module 1130, the depth difference determining module 1140, the click operation determining module 1150, and the hand gesture recognizing module may be at least partially implemented as a computer program module that, when executed, may perform a corresponding function.
Exemplary Medium
An exemplary embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processing unit, are configured to implement an image processing method according to any one of the above-described method embodiments.
In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product including program code for causing an electronic device to perform the steps of the image processing method according to various exemplary embodiments of the present invention described in the above section "exemplary method" of this specification when the program product is run on the electronic device, for example, the electronic device may perform step S210 as shown in fig. 2: determining a foreground region from the acquired depth image; step S220: under the condition that the foreground area contains a hand graph, determining a first area where a palm in the hand graph is located based on the depth of the hand graph; step S230: determining N minimum distance values from N foreground points in the first area to a background area in the depth image respectively, and taking a foreground point corresponding to the maximum value in the N minimum distance values as a palm center position, wherein N is a positive integer and is more than or equal to 2; step S240: and acquiring a hand area image based on the palm center position for judging whether click operation occurs.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As shown in fig. 12, a program product 1200 for implementing the image processing method according to an embodiment of the present invention may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on an electronic device. The program product of the present invention is not, however, limited in this regard; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language. The program code may execute entirely on the consumer electronic device, partly on the consumer electronic device and partly on a remote electronic device, or entirely on the remote electronic device or server. In the case of a remote electronic device, the remote electronic device may be connected to the consumer electronic device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external electronic device (for example, through the internet using an internet service provider).
Exemplary electronic device
Having described the method, medium, and apparatus of exemplary embodiments of the present invention, an electronic device of exemplary embodiments of the present invention is described next with reference to fig. 13.
Those skilled in the art will appreciate that aspects of the present invention may be implemented as a system, a method, or an application. Thus, various aspects of the invention may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, and the like), or an embodiment combining hardware and software aspects, which may be referred to herein generally as a "circuit," "module," "system," or "unit."
In some possible embodiments, an electronic device according to the invention may comprise at least one processing unit and at least one storage unit. The storage unit stores program code which, when executed by the processing unit, causes the processing unit to perform the steps of the image processing method according to the various exemplary embodiments of the present invention described in the "Exemplary Method" section above. For example, the processing unit may perform step S210 as shown in fig. 2: determining a foreground region from the acquired depth image; step S220: in a case where the foreground region contains a hand graph, determining, based on the depth of the hand graph, a first region where a palm in the hand graph is located; step S230: determining N minimum distance values from N foreground points in the first region to a background region in the depth image respectively, and taking the foreground point corresponding to the maximum of the N minimum distance values as a palm center position, wherein N is a positive integer greater than or equal to 2; and step S240: acquiring a hand region image based on the palm center position, for determining whether a click operation occurs.
An electronic apparatus according to this embodiment of the present invention is described below with reference to fig. 13. The electronic device 1300 shown in fig. 13 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present invention.
As shown in fig. 13, the electronic device 1300 is represented in the form of a general electronic device. The components of the electronic device 1300 may include, but are not limited to: the at least one processing unit 1310, the at least one memory unit 1320, and the bus 1330 connecting the various system components including the memory unit 1320 and the processing unit 1310.
Bus 1330 may include a data bus, an address bus, and a control bus.
The memory unit 1320 may include volatile memory, such as a random access memory (RAM) 1321 and/or a cache memory 1322, and may further include a read-only memory (ROM) 1323.
The memory unit 1320 may also include a program/utility 1325 having a set (at least one) of program modules 1324. Such program modules 1324 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
The electronic device 1300 may also communicate with one or more external devices 1340 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.) through an input/output (I/O) interface 1350. In addition, the electronic device 1300 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the internet) through the network adapter 1360. As shown, the network adapter 1360 communicates with the other modules of the electronic device 1300 via the bus 1330. It should be appreciated that, although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1300, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
It should be noted that although in the above detailed description several units/modules of the image processing system are mentioned, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module according to embodiments of the invention. Conversely, the features and functions of one unit/module described above may be further divided into embodiments by a plurality of units/modules.
Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, and that the division into aspects is for convenience of presentation only and does not imply that features in those aspects cannot be combined to advantage. The invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (18)

1. An image processing method comprising:
determining a foreground region from the acquired depth image;
in a case where the foreground region contains a hand graph, obtaining standard palm size information at the depth of the hand graph, and determining, based on the standard palm size information at that depth, a first region where a palm in the hand graph is located;
determining N minimum distance values from N foreground points in the first region to a background region in the depth image respectively, and taking the foreground point corresponding to the maximum of the N minimum distance values as a palm center position, wherein N is a positive integer greater than or equal to 2;
acquiring a hand region image based on the palm center position, for determining whether a click operation occurs;
wherein the determining of the first region where the palm in the hand graph is located, based on the standard palm size information at the depth, comprises:
determining direction information of the foreground region in the depth image;
determining a second region from the depth image that includes the foreground region;
traversing the second area to determine a first reference point based on the direction information, wherein the first reference point satisfies that, starting from the first reference point, there are continuous foreground points of a predetermined length in a first direction and a second direction perpendicular to the first direction, respectively; and
and determining, in the second region, based on the standard palm radius at the depth and the first reference point position, the first region where the palm in the hand graph is located.
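A minimal sketch of the traversal recited in claim 1 follows. It assumes the second region is given as a boolean foreground mask and, for brevity, scans a single line entering the region along the arm direction; the function names and scan strategy are illustrative assumptions, not the claimed implementation.

```python
import numpy as np

def find_first_reference_point(fg: np.ndarray, start_rc, step_rc, run_len: int):
    """Return the first pixel from which `run_len` consecutive foreground
    pixels exist both along the first direction (the arm direction) and
    along the perpendicular second direction.

    fg:       HxW boolean foreground mask of the second region.
    start_rc: (row, col) where the traversal enters the region.
    step_rc:  unit step in the first direction, e.g. (0, 1) for "rightwards".
    run_len:  predetermined length, at most the standard palm radius (claim 5).
    """
    h, w = fg.shape
    perp = (step_rc[1], -step_rc[0])  # 90-degree rotation of the first direction

    def run_ok(r, c, d):
        # Check run_len consecutive foreground pixels starting at (r, c) along d.
        for k in range(run_len):
            rr, cc = r + k * d[0], c + k * d[1]
            if not (0 <= rr < h and 0 <= cc < w and fg[rr, cc]):
                return False
        return True

    r, c = start_rc
    while 0 <= r < h and 0 <= c < w:
        if fg[r, c] and run_ok(r, c, step_rc) and run_ok(r, c, perp):
            return (r, c)  # first reference point
        r, c = r + step_rc[0], c + step_rc[1]
    return None  # no reference point on this scan line
```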
2. The method of claim 1, wherein, in the case that the foreground region contains a hand graph, the method further comprises, before the obtaining of the standard palm size information at the depth of the hand graph: obtaining the depth of the hand graph.
3. The method of claim 1, wherein the determining directional information of the foreground region in the depth image comprises:
determining an arm region based on a portion of the foreground region that intersects an edge of the depth image;
and determining the direction information of the foreground region in the depth image according to the direction of the arm region.
4. The method of claim 3, wherein:
after determining an arm region based on a portion of the foreground region intersecting the depth image edge, the method further includes obtaining width and height information of the arm region;
the determining the second region including the foreground region from the depth image comprises determining the second region from the depth image according to the direction, width and height information of the arm region.
5. The method of claim 1, wherein the predetermined length is equal to or less than a standard palm radius at the depth.
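Claim 5 ties the predetermined length to the standard palm radius at the current depth. Under a pinhole-camera assumption, a physical radius R at depth Z projects to roughly f·R/Z pixels; the constants below (focal length, a 4 cm palm radius) are illustrative assumptions, not values from the claims.

```python
# Pinhole-model sketch for claim 5. Both constants are assumptions.
FOCAL_PX = 580.0            # assumed depth-camera focal length in pixels
STANDARD_PALM_R_M = 0.04    # assumed standard palm radius of 4 cm

def standard_palm_radius_px(depth_m: float) -> float:
    """Standard palm radius in pixels at the given depth (metres)."""
    return FOCAL_PX * STANDARD_PALM_R_M / depth_m

# The predetermined length must not exceed this radius (claim 5):
predetermined_length = int(standard_palm_radius_px(0.8))  # e.g. hand at 0.8 m
```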
6. The method of claim 1, further comprising:
acquiring a hand gesture through a deep neural network based on the hand region image;
and determining a click degree of freedom of one or more fingers according to the hand gesture, wherein the click degree of freedom represents the probability that the finger performs a click operation.
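One plausible reading of claim 6 is an independent per-finger probability head on the hand-gesture network; the sketch below applies a sigmoid per finger. The five-logit layout and the example values are assumptions for illustration only.

```python
import numpy as np

def click_degrees_of_freedom(logits: np.ndarray) -> np.ndarray:
    """Map per-finger network outputs to click degrees of freedom, i.e.
    the probability that each finger is about to perform a click."""
    return 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per finger

# e.g. assumed logits for thumb..little finger from the hand-gesture network
probs = click_degrees_of_freedom(np.array([-2.0, 1.5, -0.3, -1.0, -3.0]))
```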
7. The method of claim 6, further comprising:
determining a position of a fingertip of the finger in the depth image in a case where a first probability of the finger performing a click operation, determined based on the click degree of freedom, is between a first preset value and a second preset value;
determining a difference between the depth of the fingertip graphic of the finger and the depth of a background region covered by the fingertip; and
determining that the click operation occurs in a case where the difference is smaller than a third preset value.
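Claims 6 and 7 combine into a two-stage test: consult the click degree of freedom first, and only in the uncertain probability band confirm contact by the fingertip-to-background depth difference. A hedged sketch, with all threshold values as illustrative assumptions:

```python
def click_detected(dof_prob: float, tip_depth_mm: float, bg_depth_mm: float,
                   p_low: float = 0.4, p_high: float = 0.9,
                   contact_mm: float = 8.0) -> bool:
    """Two-stage click test sketched from claims 6-7.

    dof_prob:     click degree of freedom (probability) for this finger.
    tip_depth_mm: depth of the fingertip graphic.
    bg_depth_mm:  depth of the background region covered by the fingertip.
    Only the band (first preset value, second preset value) triggers the
    depth check, as recited in claim 7.
    """
    if p_low < dof_prob < p_high:
        # Click when the fingertip is nearly on the surface it covers
        return (bg_depth_mm - tip_depth_mm) < contact_mm
    return False
```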
8. The method of claim 6, further comprising:
and performing gesture recognition on the hand gesture after the difference between the depth of the palm center position and the depth of the background image has remained greater than a fourth preset value for a preset time.
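Claim 8 gates gesture recognition on the palm staying clearly in front of the background for a sustained period. A small stateful sketch, with the lift threshold (the fourth preset value) and hold time as assumed constants:

```python
import time

class GestureTrigger:
    """Run gesture recognition only after the palm center has stayed
    clearly in front of the background for a preset time (claim 8)."""

    def __init__(self, min_lift_mm: float = 50.0, hold_s: float = 0.5):
        self.min_lift_mm = min_lift_mm  # assumed fourth preset value
        self.hold_s = hold_s            # assumed preset duration
        self._since = None              # time the lift condition first held

    def update(self, palm_depth_mm: float, bg_depth_mm: float) -> bool:
        now = time.monotonic()
        if (bg_depth_mm - palm_depth_mm) > self.min_lift_mm:
            if self._since is None:
                self._since = now
            return (now - self._since) >= self.hold_s  # ready to recognize
        self._since = None  # condition broken; restart the timer
        return False
```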
9. An image processing system comprising:
the foreground region determining module is used for determining a foreground region from the acquired depth image;
the first region determining module is used for obtaining, in a case where the foreground region contains a hand graph, standard palm size information at the depth of the hand graph, and determining, based on the standard palm size information at that depth, a first region where a palm in the hand graph is located;
the palm center position determining module is used for determining N minimum distance values from N foreground points in the first region to a background region in the depth image respectively, and taking the foreground point corresponding to the maximum of the N minimum distance values as a palm center position, wherein N is a positive integer greater than or equal to 2;
the hand region image acquiring module is used for acquiring a hand region image based on the palm center position, for determining whether a click operation occurs;
wherein the first region determining module comprises: a direction information determining unit, configured to determine direction information of the foreground region in the depth image; a second region determining unit, configured to determine a second region including the foreground region from the depth image; a first traversal unit, configured to traverse the second region to determine a first reference point based on the direction information, wherein the first reference point satisfies that, starting from the first reference point, there are continuous foreground points of a predetermined length in a first direction and in a second direction perpendicular to the first direction, respectively; and a first region determining unit, configured to determine, in the second region, based on the standard palm radius at the depth and the first reference point position, the first region where the palm in the hand graph is located.
10. The system of claim 9, wherein the first region determination module comprises:
a depth obtaining sub-module, configured to, in a case where the foreground region contains a hand graph, obtain the depth of the hand graph before the standard palm size information at that depth is obtained.
11. The system of claim 9, wherein the direction information determining unit comprises:
an arm region determining subunit, configured to determine an arm region based on a portion where the foreground region intersects with the depth image edge;
and the direction information determining subunit is used for determining the direction information of the foreground area in the depth image according to the direction of the arm area.
12. The system of claim 11, wherein:
the first area determining submodule further comprises an arm size determining unit used for obtaining width and height information of the arm area;
the second region determining unit is used for determining the second region from the depth image according to the direction, width and height information of the arm region.
13. The system of claim 9, wherein the predetermined length is equal to or less than a standard palm radius at the depth.
14. The system of claim 9, further comprising:
the hand gesture acquisition module is used for acquiring hand gestures through a deep neural network based on the hand region image;
and the degree of freedom determining module is used for determining a click degree of freedom of one or more fingers according to the hand gesture, wherein the click degree of freedom represents the probability that the finger performs a click operation.
15. The system of claim 14, further comprising:
a fingertip position determining module, configured to determine a position of a fingertip of the finger in the depth image in a case where a first probability of the finger performing a click operation, determined based on the click degree of freedom, is between a first preset value and a second preset value;
the depth difference value determining module is used for determining the difference value between the depth of the fingertip image of the finger and the depth of the background area covered by the fingertip; and
and the click operation determining module is used for determining that click operation occurs under the condition that the difference value is smaller than a third preset value.
16. The system of claim 14, further comprising:
and the hand gesture recognition module is used for performing gesture recognition on the hand gesture after the difference between the depth of the palm center position and the depth of the background image has remained greater than a fourth preset value for a preset time.
17. A computer-readable storage medium having stored thereon executable instructions that, when executed by a processing unit, cause the processing unit to perform the method of any one of claims 1-8.
18. An electronic device, comprising:
a processing unit; and
a storage unit having stored thereon executable instructions that, when executed by the processing unit, cause the processing unit to perform the method of any one of claims 1-8.
CN201810809073.3A 2018-07-20 2018-07-20 Image processing method, system, medium, and electronic device Active CN108921129B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810809073.3A CN108921129B (en) 2018-07-20 2018-07-20 Image processing method, system, medium, and electronic device

Publications (2)

Publication Number Publication Date
CN108921129A CN108921129A (en) 2018-11-30
CN108921129B (en) 2021-05-14

Family

ID=64414699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810809073.3A Active CN108921129B (en) 2018-07-20 2018-07-20 Image processing method, system, medium, and electronic device

Country Status (1)

Country Link
CN (1) CN108921129B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635767B (en) * 2018-12-20 2019-11-22 北京字节跳动网络技术有限公司 A kind of training method, device, equipment and the storage medium of palm normal module
CN110414393A (en) * 2019-07-15 2019-11-05 福州瑞芯微电子股份有限公司 A kind of natural interactive method and terminal based on deep learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103353935B (en) * 2013-07-19 2016-06-08 电子科技大学 A kind of 3D dynamic gesture identification method for intelligent domestic system
CN103472916B (en) * 2013-09-06 2016-05-04 东华大学 A kind of man-machine interaction method based on human body gesture identification
US9412012B2 (en) * 2013-10-16 2016-08-09 Qualcomm Incorporated Z-axis determination in a 2D gesture system
CN108227912B (en) * 2017-11-30 2021-05-11 北京市商汤科技开发有限公司 Device control method and apparatus, electronic device, computer storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103092332A (en) * 2011-11-08 2013-05-08 苏州中茵泰格科技有限公司 Digital image interactive method and system of television
CN104571482A (en) * 2013-10-22 2015-04-29 中国传媒大学 Digital device control method based on somatosensory recognition
CN104714649A (en) * 2015-03-31 2015-06-17 王子强 Kinect-based naked-eye 3D UI interaction method
CN104808795A (en) * 2015-04-29 2015-07-29 王子川 Gesture recognition method for reality-augmented eyeglasses and reality-augmented eyeglasses system
CN105825193A (en) * 2016-03-25 2016-08-03 乐视控股(北京)有限公司 Method and device for position location of center of palm, gesture recognition device and intelligent terminals

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Georg Umlauf et al., "3D Hand Gesture Recognition Based on Sensor Fusion of Commodity Hardware," ResearchGate, 2012-01-31, pp. 1-10 *

Also Published As

Publication number Publication date
CN108921129A (en) 2018-11-30

Similar Documents

Publication Publication Date Title
CN109240576B (en) Image processing method and device in game, electronic device and storage medium
KR102292028B1 (en) Gesture recognition method, device, electronic device, and storage medium
US10895868B2 (en) Augmented interface authoring
US8970696B2 (en) Hand and indicating-point positioning method and hand gesture determining method used in human-computer interaction system
JP2023508590A (en) Granular Level Visual Perception in Mobile Augmented Reality
JP2017038397A (en) Dynamic selection of surfaces in real world for projection of information thereon
CN109725724B (en) Gesture control method and device for screen equipment
JP6557213B2 (en) Page back
KR20140116401A (en) Panning animations
CN104166509A (en) Non-contact screen interaction method and system
US9733764B2 (en) Tracking of objects using pre-touch localization on a reflective surface
CN108921129B (en) Image processing method, system, medium, and electronic device
US10254893B2 (en) Operating apparatus, control method therefor, and storage medium storing program
CN114090155A (en) Robot process automation interface element positioning method and device and storage medium
US20150248215A1 (en) Display of Objects on a Touch Screen and Their Selection
JPWO2019111932A1 (en) Model learning device, model learning method and computer program
JP2016018458A (en) Information processing system, control method therefore, program, and storage medium
KR101548872B1 (en) Fingers recognition method and system using image processing
CN114360047A (en) Hand-lifting gesture recognition method and device, electronic equipment and storage medium
US10606468B2 (en) Dynamic image compensation for pre-touch localization on a reflective surface
JP2016212627A (en) Path finding program, information processor, and path finding method
US11836839B2 (en) Method for generating animation figure, electronic device and storage medium
CN111986229A (en) Video target detection method, device and computer system
CN114489341B (en) Gesture determination method and device, electronic equipment and storage medium
US20120299837A1 (en) Identifying contacts and contact attributes in touch sensor data using spatial and temporal features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190619

Address after: 311200 Room 102, Block 6, District C, Qianjiang Century Park, Xiaoshan District, Hangzhou City, Zhejiang Province

Applicant after: Hangzhou Yixian Advanced Technology Co., Ltd.

Address before: 310052 Floor 7, Building 4, No. 599 Wangshang Road, Changhe Street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant before: NetEase (Hangzhou) Network Co., Ltd.

GR01 Patent grant