CN109753859B - Device and method for detecting human body component in image and image processing system


Info

Publication number
CN109753859B
CN109753859B
Authority
CN
China
Prior art keywords: human body, body part, relative relationship, detected, determined
Prior art date
Legal status
Active
Application number
CN201711089515.3A
Other languages
Chinese (zh)
Other versions
CN109753859A (en)
Inventor
赵东悦
黄耀海
陈存建
Current Assignee
Canon Inc
Original Assignee
Canon Inc
Priority date
Filing date
Publication date
Application filed by Canon Inc
Priority to CN201711089515.3A
Publication of CN109753859A
Application granted
Publication of CN109753859B


Abstract

The invention discloses a device and a method for detecting human body parts in an image and an image processing system. The device comprises: a unit configured to acquire an image including a human body; a unit configured to detect an object from a region including the human body; a unit configured to determine a relative relationship between a detected object and a human body part to be detected of the human body; and a unit configured to detect the human body part based on the determined relative relationship. According to the invention, the accuracy of human body part detection is improved.

Description

Device and method for detecting human body component in image and image processing system
Technical Field
The present invention relates to image processing, and more particularly to an apparatus and method for detecting a human body part in an image, and to an image processing system.
Background
In video/image analysis and recognition applications, automatic and accurate detection of human body parts (e.g., facial parts, body parts) is a critical task. For example, detected facial parts (e.g., facial feature points) are commonly used in face recognition applications such as face verification (face identification), facial expression recognition, facial attribute recognition, and the like. Detected body parts (e.g., the joints of an arm) are commonly used in person recognition applications such as person re-identification, person action recognition, person attribute recognition, person image retrieval, and the like.
In recent years, regression methods have made great progress in human body part detection, such as the method disclosed in "Supervised Descent Method and its Applications to Face Alignment" (X. Xiong and F. De la Torre, CVPR 2013). These regression methods mainly comprise the following steps: acquiring an initial shape of the human body part to be detected in an image; then gradually updating the initial shape of the human body part through a multi-stage regression process, so that the finally detected shape of the human body part is as close as possible to the actual shape of the human body part. In any one stage of the regression process (e.g., the t-th stage), the corresponding shape of the human body part, S_t, is determined by updating the shape S_{t-1} determined in the previous stage (i.e., the (t-1)-th stage) with a shape increment ΔS_t, that is, S_t = S_{t-1} + ΔS_t. Here, ΔS_t is estimated based on a pre-generated regression model corresponding to the t-th stage and on features extracted around S_{t-1}.
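To make the multi-stage update concrete, the following is a minimal Python sketch of such a regression-based detector. It is an illustration only: the linear per-stage models, the toy pixel-sampling features, and all names are assumptions made for the example, not the method of any particular reference.

```python
import numpy as np

def extract_features(image, shape):
    # Toy feature: grayscale intensities sampled at the current points.
    # A real detector would use local descriptors (e.g., SIFT) around them.
    h, w = image.shape[:2]
    pts = np.clip(shape.round().astype(int), 0, [w - 1, h - 1])
    return image[pts[:, 1], pts[:, 0]].astype(float)

def cascaded_regression(image, initial_shape, stages):
    """Multi-stage shape regression: S_t = S_{t-1} + dS_t.

    `stages` is a list of T pre-generated models, here (W, b) pairs that
    linearly map features to a flattened shape increment dS_t.
    """
    shape = initial_shape.copy()              # S_0, an (N, 2) point array
    for W, b in stages:                       # stage t = 1, ..., T
        phi = extract_features(image, shape)  # features around S_{t-1}
        delta = (W @ phi + b).reshape(shape.shape)
        shape = shape + delta                 # S_t = S_{t-1} + dS_t
    return shape
```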
It follows that, in each stage of the regression process, the accuracy of the corresponding shape increment depends largely on the accuracy of the extracted features. However, in some cases, particularly where the human body wears or holds an accessory object, it is difficult to obtain accurate features from which to estimate accurate shape increments for the multi-stage regression process. For example, as shown in fig. 1A, in the case where an arm holds a bag, the accuracy of the features extracted around the arm will be affected by the bag. The accuracy of the corresponding shape increment will thereby be affected, and in turn the accuracy of the finally detected shape of the arm. As another example, as shown in fig. 1B, in the case of a face wearing glasses, the accuracy of the features extracted around the eyes will be affected by the glasses, causing the eye feature points (i.e., the shape of the eyes) to sink into locally optimal positions on the glasses. The accuracy of the finally detected eye feature points will thereby also be affected.
Disclosure of Invention
Accordingly, in view of the foregoing background, the present invention aims to solve at least one of the above problems.
According to one aspect of the present invention, there is provided an apparatus for detecting a human body part in an image, the apparatus comprising: an acquisition unit configured to acquire an image including a human body; an object detection unit configured to detect an object from a region including the human body; a relative relationship determination unit configured to determine a relative relationship between a detected object and a human body part to be detected of the human body; and a human body part detection unit configured to detect the human body part based on the determined relative relationship.
Wherein the detected object is an object worn or held on the human body.
By using the invention, the accuracy of human body part detection is improved.
Other characteristic features and advantages of the invention will be apparent from the following description with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1A to 1B schematically illustrate an exemplary human body wearing/holding an accessory object.
Fig. 2A to 2B schematically illustrate an exemplary relative relationship between an accessory object and a human body part according to the present invention.
Fig. 3 is a block diagram schematically illustrating a hardware configuration in which techniques according to embodiments of the present invention may be implemented.
Fig. 4 is a block diagram illustrating a configuration of an apparatus for human body part detection according to a first embodiment of the present invention.
Fig. 5 schematically shows a flow chart of human body part detection according to a first embodiment of the invention.
Fig. 6A to 6B schematically show an exemplary process of step S530 shown in fig. 5 according to the present invention.
Fig. 7 schematically shows a flowchart of step S540 as shown in fig. 5 according to the first embodiment of the present invention.
Fig. 8 schematically illustrates an exemplary process of step S5411 shown in fig. 7 according to the present invention.
Fig. 9 schematically shows another flowchart of step S540 as shown in fig. 5 according to the first embodiment of the present invention.
Fig. 10 schematically shows a flowchart of step S5422 shown in fig. 9 according to the first embodiment of the present invention.
Fig. 11 schematically shows an exemplary process of step S54221 shown in fig. 10 according to the present invention.
Fig. 12 schematically shows another flowchart of step S540 as shown in fig. 5 according to the first embodiment of the present invention.
Fig. 13 schematically shows a further flowchart of step S540 as shown in fig. 5 according to the first embodiment of the invention.
Fig. 14 schematically illustrates an exemplary process of the flowchart shown in fig. 13 in accordance with the present invention.
Fig. 15 is a block diagram illustrating a configuration of an apparatus for human body part detection according to a second embodiment of the present invention.
Fig. 16 schematically shows a flow chart of human body part detection according to a second embodiment of the invention.
Fig. 17 illustrates an arrangement of an exemplary analyzer according to the present invention.
Fig. 18 illustrates an arrangement of an exemplary image processing system according to the present invention.
Detailed Description
It should be noted that the following description is merely illustrative and exemplary in nature and is in no way intended to limit the invention, its application, or uses. The relative arrangement of the components and steps, numerical expressions and numerical values set forth in the examples do not limit the scope of the present invention unless it is specifically stated otherwise. In addition, techniques, methods, and apparatus known to those of skill in the art may not be discussed in detail, but are intended to be part of this specification where appropriate.
Note that like reference numerals and letters refer to like items in the drawings, and thus once an item is defined in one drawing, it is not necessary to discuss it in the following drawings.
In practical situations, people often wear or hold accessory objects such as glasses, bags, bats, wheelchairs, and the like. On the one hand, the inventors found that, since these accessory objects are usually designed with a regular structure/shape (e.g., linear, rectangular, or circular), they can be detected from an image/video more easily by using a general object detection method (e.g., a glasses detection method, a bag detection method, etc.) than human body parts can be detected. The human body parts are, for example, the face, eyes, nose, mouth, arms, legs, etc.
On the other hand, the inventors found that, in the case where a person wears or holds an accessory object, the relative relationship between the accessory object and the relevant human body part is always limited to a few types, due to the shooting angle of the capturing electronic device (e.g., digital camera, video camera, network camera) and/or the manner in which the accessory object is worn/held. In the present invention, the relative relationship between an accessory object and the associated human body part can be regarded as the "pose of the human body part with respect to the object". For example, in the case where the accessory object is glasses, the relevant human body part is the eyes. As shown in fig. 2A, owing to the different shooting angles of the cameras (as shown at 210), the possible poses of the eyes relative to the glasses (i.e., the relative relationships between the glasses and the eyes) include the eyes above the glasses (as shown at 220), the eyes within the glasses (as shown at 230), and the eyes on the rims of the glasses (as shown at 240). As another example, in the case where the accessory object is a bag, the relevant human body part is an arm. As shown in fig. 2B, the ways of carrying the bag include, for example, a short single-shoulder carry (as shown at 250), a cross-body carry (as shown at 260), an elbow carry (as shown at 270), a long single-shoulder carry (as shown at 280), and a hand carry (as shown at 290). Further, once the actual manner of holding the bag is determined, the pose of the corresponding arm with respect to the bag (that is, the relative relationship between the bag and the arm) can be determined accordingly.
As described above, it can be seen that the relative relationship between the accessory object and the human body part includes at least one of:
1) Relative position (R_p), indicating the position of the human body part relative to the accessory object.
2) Relative region (R_r), representing the region of the human body part relative to the accessory object. In the present invention, the relative region may be expressed as a probability distribution over the region in which the human body part can be located with respect to the accessory object.
3) Relative shape (R_s), representing the shape of the human body part relative to the accessory object.
In view of the two findings described above (i.e., that the accessory object can be detected more easily, and that the pose of the human body part with respect to the object is always limited to a few types), the inventors believe that, in the case where an accessory object near a human body part can be detected, the corresponding relative relationship between the accessory object and the human body part can also be determined based on the detected accessory object. Therefore, during the detection of the human body part, this relative relationship can be used, at least, to adjust the features extracted for the corresponding detection process, so as to improve the accuracy of the features. For example, in the case where the detection is performed by the regression method described above, more accurate features yield more accurate shape increments for the human body part, and hence a more accurate finally detected shape.
That is, the relative relationship between the accessory object and the human body part can be used to guide the detection of the human body part and thereby improve its accuracy. Furthermore, in order to obtain the pose of a human body part with respect to an object from an image/video, in the present invention a relative relationship model is first pre-generated. The relative relationship model is generated, based on the relative relationships between the objects (i.e., accessory objects) and the human body parts labeled in sample images, by using a rule-based estimation method or a machine learning method. For the rule-based estimation method, the rules for generating the relative relationship model can be set according to the actual application scenario. The machine learning methods for generating the relative relationship model include at least one of: regression methods (e.g., Explicit Shape Regression (ESR) or the Supervised Descent Method (SDM)), classification methods (e.g., Support Vector Machine (SVM), Naive Bayes), or convolutional neural network methods.
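As an illustration of the classification variant, here is a minimal sketch using scikit-learn's SVM classifier. The feature layout, the label set, and the randomly generated stand-in training data are all assumptions made for the example; a real model would be trained on features and relationship labels derived from annotated sample images.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in training data: one feature vector per sample image (e.g.,
# concatenated object position/shape and body-pose cues) and one label
# per image naming its annotated relative-relationship type.
X_train = rng.random((200, 16))
y_train = rng.integers(0, 3, size=200)   # e.g., 0=above, 1=inside, 2=on-rim

relation_model = SVC().fit(X_train, y_train)

def determine_relative_relationship(features):
    # Map features obtained from the detected object (and the human body)
    # to one of the limited relative-relationship types discussed above.
    return int(relation_model.predict(features.reshape(1, -1))[0])
```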
Exemplary embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
(hardware configuration)
A hardware configuration that can implement the techniques described hereinafter will be first described with reference to fig. 3.
The hardware configuration 300 includes, for example, a central processing unit (CPU) 310, a random access memory (RAM) 320, a read-only memory (ROM) 330, a hard disk 340, an input device 350, an output device 360, a network interface 370, and a system bus 380. Furthermore, in one implementation, the hardware configuration 300 may be implemented by a computer, such as a tablet, notebook, desktop, or other suitable electronic device. In another implementation, the hardware configuration 300 may be implemented by a monitor or analyzer, such as a digital camera, video camera, network camera, or other suitable electronic device. Where the hardware configuration 300 is implemented by a monitor/analyzer, it also includes, for example, an optical system 390.
In one implementation, human body part detection according to the present invention is configured by hardware or firmware and is used as a module or component of the hardware configuration 300. For example, the apparatus 400, which will be described in detail below with reference to fig. 4, and the apparatus 1500, which will be described in detail below with reference to fig. 15, are used as modules or components of the hardware configuration 300. In another implementation, human body part detection according to the present invention is configured by software stored in the ROM 330 or the hard disk 340 and executed by the CPU 310. For example, the process 500 described in detail below with reference to fig. 5 and the process 1600 described in detail below with reference to fig. 16 are used as programs stored in the ROM 330 or the hard disk 340.
CPU 310 is any suitable programmable control device (such as a processor) and may perform various functions to be described below by executing various application programs stored in ROM 330 or hard disk 340 (such as memory). The RAM 320 is used to temporarily store programs or data loaded from the ROM 330 or the hard disk 340, and is also used as a space in which the CPU 310 performs various processes (such as implementing techniques that will be described in detail below with reference to fig. 5 to 14 and 16) and other available functions. Hard disk 340 stores a variety of information such as an Operating System (OS), various applications, control programs, images/videos, processing results for each image in the video, predefined data (e.g., average shape of human body parts), and/or pre-generated models (e.g., pre-generated relative relationship models, pre-generated regression models).
In one implementation, input device 350 is used to allow a user to interact with hardware configuration 300. In one example, a user may input image/video/data through input device 350. In another example, a user may trigger a corresponding process of the present invention through input device 350. In addition, the input device 350 may take a variety of forms, such as a button, a keyboard, or a touch screen. In another implementation, the input device 350 is used to receive images/video output from a specialized electronic device such as a digital camera, video camera, and/or webcam. In addition, where the hardware configuration 300 is implemented by a monitor/analyzer, the optical system 390 in the hardware configuration 300 will directly capture images/video of the monitored location.
In one implementation, the output device 360 is used to display processing results (such as the shape of the detected human body part) to the user. The output device 360 may take various forms, such as a cathode ray tube (CRT) or a liquid crystal display. In another implementation, the output device 360 is used to output the processing results to subsequent processing such as face recognition processing (e.g., people counting, crowd analysis, face verification, etc.).
The network interface 370 provides an interface for connecting the hardware configuration 300 to a network. For example, the hardware configuration 300 may be in data communication via the network interface 370 with other electronic devices connected via a network. Alternatively, a wireless interface may be provided for the hardware configuration 300 for wireless data communication. The system bus 380 may provide a data transmission path for transmitting data between the CPU 310, the RAM 320, the ROM 330, the hard disk 340, the input device 350, the output device 360, the network interface 370, and the like. Although referred to as a bus, system bus 380 is not limited to any particular data transfer technique.
The above-described hardware configuration 300 is merely illustrative and is in no way intended to limit the invention, its applications or uses. Also, only one hardware configuration is shown in fig. 3 for simplicity. However, a plurality of hardware configurations may be used as needed.
(human body part detection)
Next, human body part detection according to the present invention will be described with reference to fig. 4 to 17.
Fig. 4 is a block diagram illustrating a configuration of an apparatus 400 according to a first embodiment of the present invention. Wherein some or all of the modules shown in fig. 4 may be implemented by dedicated hardware. In this first embodiment, human body part detection will be performed on one image. As shown in fig. 4, the apparatus 400 includes an acquisition unit 410, an object detection unit 420, a relative relationship determination unit 430, and a human body part detection unit 440.
In addition, the storage device 450 shown in fig. 4 stores the pre-generated relative relationship model to be used by the relative relationship determination unit 430, as well as the pre-generated positions of the human body part feature points (i.e., the average shape of the pre-generated human body part) and the pre-generated regression models that may be used by the human body part detection unit 440. Alternatively, the pre-generated relative relationship model, the average shape of the pre-generated human body part, and the pre-generated regression models may be stored in different storage devices. In one implementation, the storage device 450 is the ROM 330 or the hard disk 340 shown in fig. 3. In another implementation, the storage device 450 is a server or an external storage device connected to the apparatus 400 via a network (not shown).
First, in one implementation, for example, where the hardware configuration 300 shown in fig. 3 is implemented by a processor, the input device 350 receives images including a human body output from a dedicated electronic device (e.g., a camera) or input by a user. Input device 350 then transmits the received image containing the person's body to apparatus 400 via system bus 380. In another implementation, for example, where hardware configuration 300 is implemented by a monitor/analyzer, apparatus 400 directly receives an image captured by optical system 390 containing a human body.
Then, as shown in fig. 4, the acquisition unit 410 acquires the received image containing the human body.
The object detection unit 420 detects an object (i.e., an accessory object) from an area including a human body. In the present invention, the object detection unit 420 detects an object by detecting a feature point of the object or by detecting a region of the object. In addition, the detected object is an object near a human body part to be detected of a human body.
Then, in the case where an object is detected from the region containing the human body (which means that the human body wears or holds the object), the relative relationship determination unit 430 determines the relative relationship between the detected object and the human body part to be detected of the human body. The human body part detection unit 440 then detects the human body part based on the determined relative relationship. In the present invention, the human body part detected by the human body part detection unit 440 is composed of feature points, and the object detected by the object detection unit 420 is related to the human body part detected by the human body part detection unit 440.
Alternatively, in the case where the object detection unit 420 fails to detect any object from the region containing the human body (which means that the human body does not wear or hold any object), the human body part may be detected directly by using the regression method.
Finally, after detecting the human body part, the human body part detection unit 440 transmits it to the output device 360 shown in fig. 3 via the system bus 380, for displaying the detected human body part to a user or for outputting it to subsequent processing such as face recognition processing (e.g., people counting, crowd analysis, face verification, etc.).
The flowchart 500 shown in fig. 5 is a corresponding process of the apparatus 400 shown in fig. 4.
As shown in fig. 5, in the acquisition step S510, the acquisition unit 410 acquires the received image containing the human body.
In the object detection step S520, the object detection unit 420 detects an object (i.e., an accessory object) from the region containing the human body. In one implementation, the object detection unit 420 detects the object using a pre-generated object detection model, where the pre-generated object detection model is generated, based on the objects labeled in sample images, by using a regression method or a general detection method (e.g., a template matching method, a deformable part model method, or a cascade classifier method). In this implementation, the pre-generated object detection model is also stored, for example, in the storage device 450. More specifically, the object detection unit 420 detects the object in the following manner: first, a likely region of the object is determined based on the region containing the human body and the characteristics of the object (e.g., the structure/shape of the object); then, the pre-generated object detection model is acquired from the storage device 450, and the object is detected from the determined likely region using the pre-generated object detection model.
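A minimal sketch of this two-step flow follows; the margin used to grow the human region into a "likely region" and the detector callable are assumptions made for illustration.

```python
def detect_object(image, human_region, object_detector, margin=0.2):
    """Step S520 (sketch): narrow the search to a likely region derived
    from the human region, then apply the pre-generated object detector."""
    x, y, w, h = human_region
    # Likely region: here simply the human region grown by a margin; a
    # real system would also use the object's expected structure/shape.
    roi = (max(0, int(x - margin * w)), max(0, int(y - margin * h)),
           int(w * (1 + 2 * margin)), int(h * (1 + 2 * margin)))
    return object_detector(image, roi)  # feature points or a region, or None
```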
Then, in the case where an object is detected from the region containing the human body, the relative relationship determination unit 430 determines a relative relationship between the detected object and a human body part to be detected of the human body in the relative relationship determination step S530. More specifically, the relative relationship determination unit 430 determines the relative relationship between the detected object and the human body part in the following manner: first, a pre-generated relative relationship model is obtained from the storage device 450; next, obtaining features from the detected object and human body; then, based on the pre-generated relative relationship model and the obtained features, a relative relationship between the detected object and the human body part is determined.
In one implementation, the obtained features are features extracted from the human body itself. In the present invention, the relative relationship determination unit 430 determines a pose feature (e.g., a three-dimensional (3D) pose feature) of the human body based on the position of the detected object in the region containing the human body, and determines the corresponding relative relationship using the 3D pose feature of the human body. However, it is obviously not necessarily limited thereto. As shown in fig. 6A, in the case where the object is glasses and the human body part to be detected is the eyes, the 3D pose of the head is first determined. The detected glasses (as shown at 611) and the determined 3D pose of the head (as shown at 612) are then input into a pre-generated relative relationship model (as shown at 613). The relative relationship between the glasses and the eyes can then be obtained directly (as shown at 614).
In another implementation, the obtained features are features extracted from the detected object itself. In the present invention, the relative relationship determination unit 430 extracts appearance features (e.g., Scale-Invariant Feature Transform (SIFT) features or Speeded-Up Robust Features (SURF)) around the detected object, and determines the corresponding relative relationship using the appearance features of the detected object. However, it is obviously not necessarily limited thereto. For example, in the case where the object is a pair of glasses and the human body part to be detected is the eyes, the appearance features of the glasses are first extracted. Then, by inputting the detected glasses and the extracted appearance features of the glasses into a pre-generated relative relationship model, the relative relationship between the glasses and the eyes can likewise be obtained directly.
In another implementation, the obtained features are relative features (e.g., relative shape, relative position) between the detected object and the human body. In the present invention, the relative relationship determination unit 430 determines the relative features between the detected object and the human body based on the position and/or shape of the detected object and the position and/or shape of the human body. However, it is obviously not necessarily limited thereto. As shown in fig. 6B, in the case where the object is a bat and the human body part to be detected is an arm, the relative features between the bat and the human body are first determined. The determined relative features are, for example, the shape of the bat relative to the human body (as shown at 621) and the position of the bat relative to the human body (as shown at "d1" and "d2", which represent relative distances between the bat and the human body). The detected bat and the determined relative features are then input into a pre-generated relative relationship model (as shown at 622). In this example, the pre-generated relative relationship model is generated, for example, using a convolutional neural network (CNN) method. The relative relationship between the bat and the arm can then be obtained directly (as shown at 623).
In addition, the relative relationship determination unit 430 may also determine the relative relationship between the detected object and the human body part by using several of the above-described features at the same time (i.e., the features extracted from the human body itself, the features extracted from the detected object itself, and the relative features between the detected object and the human body).
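A trivial sketch of combining the feature groups before querying the pre-generated model; the concatenation-based fusion is an assumption made for illustration.

```python
import numpy as np

def build_relation_features(pose_feature=None, appearance_feature=None,
                            relative_feature=None):
    # Any subset of the three groups discussed above may be supplied; the
    # available groups are concatenated into the vector that is fed to the
    # pre-generated relative relationship model.
    groups = [np.asarray(g, dtype=float).ravel()
              for g in (pose_feature, appearance_feature, relative_feature)
              if g is not None]
    return np.concatenate(groups)
```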
Returning to fig. 5, in the human body part detection step S540, the human body part detection unit 440 detects the human body part based on the determined relative relationship. In one implementation, described with reference to fig. 7 to 12, the human body part detection unit 440 detects the human body part by using a regression method based on the determined relative relationship.
In this implementation, the average shape of the human body part and the regression models are first pre-generated. The average shape of the pre-generated human body part (i.e., the pre-generated positions of the human body part feature points) and the pre-generated regression models are stored in the storage device 450. In one example, the average shape of the human body part and the regression models are generated using a regression method based on the human body parts labeled in sample images. Alternatively, as described above, where the human body wears or holds an object (i.e., an accessory object), there is some relative relationship between the object and the human body part. Thus, in another example, to improve the accuracy of human body part detection, the sample images used for generating the regression models and the average shape are first grouped based on the relative relationships between the objects and the human body parts labeled in the sample images. Then, for each group of sample images, a corresponding average shape of the human body part and a corresponding regression model are generated by the regression method, based on the human body parts labeled in the sample images of that group. That is, for each type of relative relationship between the object and the human body part, such as the above-mentioned relative position (R_p), relative region (R_r), and relative shape (R_s), a corresponding average shape of the human body part and a corresponding regression model are generated.
Hereinafter, a detailed regression process of detecting human body parts based on the determined relative relationship will be described with reference to fig. 7 to 12.
In one implementation, in order to perform the regression process more easily and quickly, a flowchart of the human body part detection step S540 is shown in fig. 7. In this implementation, the determined relative relationship between the detected object and the human body part is used to determine the initial positions of the human body part feature points.
After determining the relative relationship between the detected object and the human body part in step S530 shown in fig. 5, as shown in fig. 7, first, in step S5411, the human body part detection unit 440 acquires the pre-generated position of the human body part feature point from the storage device 450. Then, the human body part detection unit 440 determines an initial position of the human body part feature point based on the determined relative relationship and the pre-generated position of the human body part feature point. In addition, the initial position of the determined human body part feature point will be regarded as the current position (i.e., the first position) of the human body part feature point.
In the present invention, the human body part detection unit 440 determines the initial position of the human body part feature point by adjusting the pre-generated position of the human body part feature point based on the determined relative relationship. Taking the detected object as glasses and the human body part as eyes as an example, as shown in fig. 8, initial positions of eye feature points are determined as follows:
first, based on the determined relative relationship (as shown at 810), the direction of the eye (as shown at 820) and the center of the eye (as shown at 830) are determined.
Then, based on the determined direction and center, the pre-generated positions of the eye feature points are moved (as shown at 840), and the final positions after the movement (as shown at 850) are regarded as the initial positions of the eye feature points. In one example, the pre-generated positions of the eye feature points are moved directly to the corresponding positions of the eye feature points in the determined relative relationship. In another example, the pre-generated positions of the eye feature points are shifted by a movement offset along the determined direction of the eye, where the movement offset is determined from the determined center of the eye and the center of the pre-generated positions of the eye feature points. In yet another example, for each feature point in the pre-generated average shape of the eye, a movement offset is added to the pre-generated position of that feature point along the determined direction of the eye, where the movement offset is an average or weighted value determined from the pre-generated position of the feature point and the position of the feature point in the determined relative relationship.
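As a sketch of the second example above (shifting the mean shape along the determined eye direction by a center-to-center offset), assuming the determined relative relationship supplies an eye center and an eye direction:

```python
import numpy as np

def initialize_shape(mean_shape, eye_center, eye_direction):
    """Step S5411 (sketch): move the pre-generated mean shape so that the
    regression starts near the relation-implied position. `eye_center`
    and `eye_direction` are assumed to come from the determined relative
    relationship."""
    d = eye_direction / np.linalg.norm(eye_direction)
    offset = eye_center - mean_shape.mean(axis=0)  # center-to-center offset
    shift = np.dot(offset, d) * d                  # move only along the eye direction
    return mean_shape + shift
```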
Returning to fig. 7, for the current stage of the regression process, i.e., the t-th stage (e.g., the 1st stage), where t is a natural number with 1 ≤ t ≤ T and T denotes the total number of stages of the regression process, the human body part detection unit 440 extracts features around the current positions of the human body part feature points in step S5412. For example, features are extracted from regions each containing a human body part feature point at its corresponding current position, each region being centered, for example, on the corresponding feature point. The extracted features are, for example, SIFT features, SURF features, etc.
In step S5413, first, the human body part detection unit 440 acquires the t-th pre-generated regression model (for example, the 1 st pre-generated regression model) from the storage 450. Then, the human body part detection unit 440 determines a position increment for the current position of the human body part feature point based on the extracted feature and the t-th pre-generated regression model. For example, the corresponding position increment is determined by mapping the extracted feature to the t-th pre-generated regression model.
In step S5414, the human body part detection unit 440 adds the determined position increment to the current position of the human body part feature point to update the corresponding position. In addition, the position of the updated human body part feature point will be regarded as the current position of the human body part feature point.
Then, in step S5415, the human body part detection unit 440 determines whether t is greater than T. In the case where t is determined to be greater than T, meaning that all stages of the regression process have been performed, the human body part detection unit 440 determines the human body part feature points with the finally updated positions as the detected human body part. Otherwise, in step S5416, the human body part detection unit 440 sets t = t+1, and the corresponding operations from step S5412 to step S5416 are repeated.
In addition, as described above, for each type of relative relationship between an object and a human body part, a corresponding average shape of the human body part and a corresponding regression model may be pre-generated. Thus, as an alternative solution, in steps S5411 and S5413, the human body part detection unit 440 may acquire the pre-generated average shape of the human body part and the pre-generated regression model corresponding to the relative relationship between the detected object and the human body part determined in step S530 shown in fig. 5.
In another implementation, in order to make the regression process more robust, another flowchart of the human body part detection step S540 is shown in fig. 9. In this implementation, the determined relative relationship between the detected object and the human body part is used to determine the features that are used to determine the position increments of the corresponding regression process.
Comparing fig. 9 with fig. 7, the main difference is that steps S5421 to S5422 shown in fig. 9 are different from steps S5411 to S5412 shown in fig. 7.
As shown in fig. 9, in step S5421, first, the human body part detection unit 440 acquires the pre-generated position of the human body part feature point from the storage device 450. Then, the human body part detection unit 440 determines the initial positions of the human body part feature points based on the pre-generated positions of the human body part feature points. More specifically, the pre-generated positions of the human body part feature points will be determined as the corresponding initial positions. In addition, the initial position of the determined human body part feature point will be regarded as the current position (i.e., the first position) of the human body part feature point. In addition, as an alternative solution, in step S5421, the human body part detection unit 440 may also acquire an average shape of a pre-generated human body part corresponding to the relative relationship between the detected object and the human body part determined in step S530 shown in fig. 5.
Then, for the current stage of the regression process, i.e., the t-th stage (e.g., the first stage), the human body part detection unit 440 determines, in step S5422 and with reference to fig. 10, features around the current positions of the human body part feature points based on the determined relative relationship.
As shown in fig. 10, in step S54221, the human body part detection unit 440 determines a position probability weight map for the human body part feature points in the following manner. For each feature point of the human body part, the position probability weight map for that feature point is the probability distribution of the current position of the feature point over the region containing the human body.
Taking the eyes shown in fig. 11 as an example, first, a first center of each eye (as shown at 1121, 1122) is determined based on the determined relative relationship (as shown at 1110), and a second center of each eye (as shown at 1141, 1142) is determined based on the current positions of the eye feature points (as shown at 1131, 1132).
Next, a direction vector from the determined second center to the determined first center is determined (as shown at 1151, 1152).
Then, for each feature point of the eye (as shown at 1161), a position probability weight map for the feature point (as shown at 1162) is determined, for example by using a Gaussian smoothing function, based on the current position of the feature point and the direction vector corresponding to the feature point (as shown at 1151). For example, in the case where the feature point is located on the left eye, the direction vector corresponding to the feature point is the direction vector determined for the left eye.
Returning to fig. 10, after determining the position probability weight map for each feature point of the human body part, in step S54222, the human body part detection unit 440 determines features around the current position of the human body part feature point in the following manner:
first, in the same manner as described in step S5412 shown in fig. 7, features around the current position of the human body part feature point are extracted.
Then, for each feature point of the human body part, the corresponding feature is weighted by multiplying the feature extracted around the current position of the feature point by the position probability weight map of the feature point.
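A simplified sketch of steps S54221-S54222 follows; the exact form of the Gaussian smoothing, the way its peak is biased along the direction vector, and treating the weight map as one weight per sampled location are all assumptions made for illustration.

```python
import numpy as np

def position_probability_weights(point, direction, sample_points, sigma=8.0):
    """Step S54221 (sketch): Gaussian weights that peak near the current
    position, biased toward the relation-implied direction vector."""
    shifted = sample_points - (point + direction)  # bias peak along `direction`
    d2 = np.sum(shifted ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.max()

def weight_features(features, weights):
    # Step S54222 (sketch): weight the features extracted around the
    # current position by the point's position-probability weights.
    return features * weights
```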
Returning to fig. 9, in step S5413, the human body part detection unit 440 determines a corresponding position increment based on the weighted features in the same manner as described in fig. 7.
In addition, since the operations of steps S5413 to S5416 shown in fig. 9 are the same as the corresponding operations of steps S5413 to S5416 shown in fig. 7, detailed description will not be repeated here.
In yet another implementation, to make the overall regression process faster, a further flowchart of the human body part detection step S540 is shown in fig. 12. In this implementation, the determined relative relationship between the detected object and the human body part is used to adjust the determined position increments of the corresponding regression process.
Comparing fig. 12 with fig. 7, steps S5412 to S5413 and S5415 to S5416 shown in fig. 12 are the same as steps S5412 to S5413 and S5415 to S5416 shown in fig. 7, so a detailed description of these steps will not be repeated here. Comparing fig. 12 with fig. 9, step S5431 shown in fig. 12 is the same as step S5421 shown in fig. 9, so a detailed description of this step will not be repeated here. Accordingly, steps S5432 to S5433 shown in fig. 12 will be described in detail below.
As shown in fig. 12, for the current stage of the regression process, i.e., for the t-th stage (e.g., first stage) of the regression process, after determining the position increment for the current position of the human body part feature point in step S5413, the human body part detection unit 440 adjusts the determined position increment based on the determined relative relationship in step S5432. In the present invention, the human body part detecting unit 440 adjusts the determined position increment by weighting the determined position increment based on the determined relative relationship and the current position of the human body part feature point. Taking the detected object as glasses and the human body part as eyes as an example, the determined position increment is adjusted in the following manner:
Still taking the eye illustrated in fig. 11 as an example, first, a first center of the eye (as illustrated by 1121, 1122) is determined based on the determined relative relationship (as illustrated by 1110), and a second center of the eye (as illustrated by 1141, 1142) is determined based on the current position of the eye feature point (as illustrated by 1131, 1132).
Next, a direction vector is determined from the determined second center to the determined first center (as shown in 1151, 1152).
Then, for each feature point of the eye, a weighting vector for the feature point is determined based on the direction vector corresponding to the feature point and the current position of the feature point. Wherein, for example, in the case where the feature point is located on the left eye, the direction vector corresponding to the feature point is the direction vector determined on the left eye.
Then, for each feature point of the eye, the corresponding position increment is weighted by multiplying the position increment of the current position of the feature point by the weighting vector for the feature point.
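A compact sketch of step S5432 follows; decomposing the increment along the direction vector and boosting the agreeing component is an assumed weighting scheme, as is the gain value.

```python
import numpy as np

def weight_increment(delta, direction, gain=0.5):
    """Step S5432 (sketch): weight the position increment of one feature
    point using the direction vector determined for it."""
    d = direction / np.linalg.norm(direction)
    along = np.dot(delta, d) * d   # component agreeing with the direction
    across = delta - along         # remaining component, left unchanged
    return (1.0 + gain) * along + across
```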
Returning to fig. 12, in step S5433, the human body part detection unit 440 updates the current position of the human body part feature point based on the adjusted position increment in the same manner as described in step S5414 in fig. 7.
In addition, as shown in fig. 7 to 12, the relative relationship between the detected object and the human body part is used to determine the initial position of the more accurate feature point, the more accurate feature, or the more accurate position increment, respectively. However, it is obviously not necessarily limited thereto. In the present invention, the relative relationship between the detected object and the human body part may be used to determine any two or all of the initial position, feature, and position increment simultaneously.
Returning to fig. 5, as indicated above, in one implementation the human body part may be detected using a regression method based on the determined relative relationship. Considering that a human body sometimes wears or holds an object (i.e., an accessory object) in a particular style/manner, such as an arm carrying a bag (as shown in fig. 1A), holding a bat (e.g., when playing baseball/golf), sitting in a wheelchair, or the like, in order to make the detection process for such cases more efficient, the human body part detection unit 440 may alternatively detect the human body part, with reference to fig. 13, based on the determined relative relationship and on connection points determined from that relative relationship. The connection points are the human body part feature points that directly connect the human body part with the detected object.
After determining the relative relationship between the detected object and the human body part in step S530 shown in fig. 5, as shown in fig. 13, first, in step S5441, the human body part detection unit 440 determines a connection point based on the determined relative relationship between the detected object and the human body part. Taking the detected object as a bat and the body part as an arm as an example, as shown in fig. 14, 1410 represents a determined relative relationship, and 1420 represents a determined connection point. Then, the human body part detection unit 440 regards the determined connection points as the human body part feature points currently detected. As shown in fig. 14, the connection point 1420 will be considered as the currently detected arm feature point.
In step S5442, the human body part detection unit 440 determines feasible regions based on the determined relative relationship and the currently detected human body part feature points. A feasible region is a region from which feature points of the human body part in the vicinity of the currently detected feature points can be detected. As shown in fig. 14, 1431 and 1432 represent the determined feasible regions.
In step S5443, the human body part detection unit 440 detects human body part feature points in the vicinity of the currently detected feature points from the determined feasible regions by using a general feature point detection method. As shown in fig. 14, 1441 and 1442 represent the detected arm feature points. Then, the human body part detection unit 440 regards the detected feature points as the currently detected human body part feature points. As shown in fig. 14, the feature points 1441 and 1442 will be regarded as the currently detected arm feature points.
Then, in step S5444, the human body part detection unit 440 determines whether all human body part feature points have been detected. If not, the human body part detection unit 440 repeatedly performs the corresponding operations from step S5442 to step S5443. Otherwise, the human body part detection unit 440 determines all the detected feature points as detected human body parts. As shown in fig. 14, 1450 represents the detected arm.
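The following is a minimal sketch of the flow of fig. 13 (steps S5441-S5444). The square feasible region and the `detect_points_in` callable are assumptions; a real system would shape the region using the relative relationship itself.

```python
import numpy as np

def feasible_region(point, size=20):
    # Simplified step S5442: a square region centered on a detected point.
    x, y = point
    return (x - size, y - size, 2 * size, 2 * size)

def detect_part_by_connection(image, connection_points, detect_points_in,
                              max_iters=10):
    """Sketch of fig. 13: start from the connection points (step S5441),
    then alternately derive feasible regions (S5442) and detect the next
    feature points inside them (S5443) until no new point is found (S5444)."""
    current = list(connection_points)
    detected = list(connection_points)
    for _ in range(max_iters):
        regions = [feasible_region(p) for p in current]
        found = [detect_points_in(image, r) for r in regions]
        current = [p for p in found if p is not None]
        if not current:
            break
        detected.extend(current)
    return np.array(detected)
```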
Returning to fig. 5, after detecting the human body parts in step S540, the human body part detection unit 440 transmits the detected human body parts to the output device 360 shown in fig. 3 via the system bus 380 for displaying the detected human body parts to the user or for outputting the detected human body parts to subsequent processes such as face recognition processing (e.g., demographics, crowd analysis, face verification, etc.).
According to the first embodiment of the present invention, in the case of detecting a human body part by using a regression method, the relative relationship between an accessory object and the human body part may be used to determine an initial position of a more accurate human body part feature point, to determine a more accurate feature, and/or to determine a more accurate position increment. Thus, the accuracy of the finally detected feature points of the human body part can be improved. Alternatively, in the case of detecting the human body part based on the connection relationship between the accessory object and the human body part, the feasible region may be determined using the relative relationship between the accessory object and the human body part, which may reduce the size of the detection region and may avoid sinking the human body part feature points into the locally optimal positions of the accessory object. Thus, the accuracy of the finally detected feature points of the human body part can also be improved. Therefore, according to the invention, the accuracy of human body part detection can be improved.
As described above, in the first embodiment of the present invention, human body part detection is performed in one image. However, in the present invention, human body part detection may also be implemented in video. Fig. 15 is a block diagram illustrating a configuration of an apparatus 1500 according to a second embodiment of the present invention. Wherein some or all of the modules shown in fig. 15 may be implemented by dedicated hardware. As shown in fig. 15, the apparatus 1500 includes an acquisition unit 1510, an object detection unit 1520, a relative relationship determination unit 430, and a human body part detection unit 440.
Comparing fig. 15 with fig. 4, the main difference is that the acquisition unit 1510 and the object detection unit 1520 shown in fig. 15 are different from the acquisition unit 410 and the object detection unit 420 shown in fig. 4.
First, in one implementation, for example, where the hardware configuration 300 shown in fig. 3 is implemented by a processor, the input device 350 receives video output from a specialized electronic device (e.g., a camera) or input by a user. Input device 350 then transmits the received video to apparatus 1500 via system bus 380. In another implementation, for example, where hardware configuration 300 is implemented by a monitor/analyzer, apparatus 1500 directly receives video captured by optical system 390.
Then, as shown in fig. 15, the acquisition unit 1510 acquires one video frame containing a human body from the received video (i.e., the input video).
With the object detection unit 1520, in the case where the acquired video frame is the first video frame of the input video, the object detection unit 1520 detects an object (i.e., an accessory object) from the region containing the human body in the same manner as described for the object detection unit 420 shown in fig. 4. In the case where the acquired video frame is not the first video frame of the input video, the object detection unit 1520 detects the object from the region containing the human body by tracking the object from the previous video frame. In the previous video frame, the object is tracked based on the relative relationship determined from that frame by the relative relationship determination unit 430 and the human body part detected from that frame by the human body part detection unit 440. In other words, in the case where the acquired video frame is not the first video frame of the input video, the object detection unit 1520 effectively functions as an object tracker.
Since the relative relation determining unit 430 and the human body part detecting unit 440 shown in fig. 15 are the same as the relative relation determining unit 430 and the human body part detecting unit 440 shown in fig. 4, detailed description will not be repeated here. In addition, since the storage device 450 shown in fig. 15 is the same as the storage device 450 shown in fig. 4, a detailed description is not repeated here either.
Finally, after the human body parts in all video frames of the input video are detected, the human body part detection unit 440 transmits the detected human body parts to the output device 360 shown in fig. 3 via the system bus 380, for displaying the detected human body parts to a user or for outputting them to subsequent processing such as face recognition processing (e.g., people counting, crowd analysis, face verification, etc.).
The flowchart 1600 shown in fig. 16 is a corresponding process of the apparatus 1500 shown in fig. 15.
Comparing fig. 16 with fig. 5, since steps S520 to S540 shown in fig. 16 are the same as steps S520 to S540 shown in fig. 5, a detailed description of these steps will not be repeated here.
As shown in fig. 16, in step S1610, first, the acquisition unit 1510 acquires one video frame containing a human body from a received video (i.e., an input video). Then, the acquisition unit 1510 determines whether the acquired video frame is the first video frame of the input video.
In the case where the acquired video frame is the first video frame, the process goes to step S520. Otherwise, in the case where the acquired video frame is not the first video frame (for example, it is the second video frame of the input video), the object detection unit 1520 detects the object in step S1620 by tracking the object from the previous video frame of the acquired video frame, using a general tracking method such as a kernel-based tracking method, a contour tracking method, a Kalman filtering method, or a particle filtering method. In the previous video frame, the object is tracked based on the relative relationship determined from that video frame by the relative relationship determination unit 430 in step S530 and the human body part detected from that video frame by the human body part detection unit 440 in step S540. More specifically, in tracking the object, for each previous video frame, the information of the object in that video frame (e.g., its location and shape) is adjusted based on the relative relationship determined from the video frame and the human body part detected from the video frame.
After detecting the object from the acquired video frame in step S520 or step S1620, the process goes to steps S530 to S540.
Then, after detecting the human body part from the acquired video frames in step S540, the human body part detection unit 440 determines whether all video frames in the input video have been processed in step S1630. If not, the process returns to step S1610 to detect human body parts in subsequent video frames. Otherwise, the human body part detection unit 440 transmits human body parts detected from all video frames to the output device 360 shown in fig. 3 via the system bus 380 for displaying the detected human body parts to the user or for outputting the detected human body parts to subsequent processes such as face recognition processing (e.g., demographics, crowd analysis, face verification, etc.).
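A minimal sketch of the per-frame flow of fig. 16 follows; the four callables stand in for the units described above and are assumptions made for the example.

```python
def process_video(frames, detect_object, track_object,
                  determine_relation, detect_part):
    """Sketch of steps S1610-S1630: detect the object in the first frame,
    then track it in later frames using the previous frame's relative
    relationship and detected human body part."""
    prev = None  # (object, relation, part) from the previous frame
    results = []
    for i, frame in enumerate(frames):
        if prev is None:
            obj = detect_object(frame)                # step S520
        else:
            obj = track_object(frame, *prev)          # step S1620
        relation = determine_relation(frame, obj)     # step S530
        part = detect_part(frame, relation)           # step S540
        results.append(part)
        prev = (obj, relation, part)
    return results
```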
According to the second embodiment of the present invention, when tracking the object across the video frames of a video, the relative relationship between the object and the human body part, as well as the detected human body part itself, are also taken into consideration, so that the object can be detected more accurately and, in turn, a more accurate relative relationship can be determined. Thus, based on the more accurate relative relationship between the accessory object and the human body part, the accuracy of the finally detected human body part feature points can also be improved. Therefore, according to the invention, the accuracy of human body part detection can be improved.
(Image Processing System)
As described above, in one implementation, the present invention may be implemented by a monitor/analyzer (e.g., a digital camera, video camera, or network camera). Taking the case where the present invention is implemented by a network camera as an example, after the network camera is triggered to perform the corresponding processing of the present invention, it may output the processing result (i.e., the human body parts) to subsequent processing such as face recognition (e.g., demographics, crowd analysis, face verification, etc.). Accordingly, as an exemplary application of the present invention, an exemplary analyzer (e.g., a network camera) will be described next with reference to fig. 17. Fig. 17 illustrates the arrangement of an exemplary analyzer 1700 according to the present invention. As shown in fig. 17, the analyzer 1700 includes an optical system 1710, a recognition device 1720, and the above-described device 400/1500.
As shown in fig. 17, first, an optical system 1710 captures an image or video. In other words, the optical system 1710 is actually used as an acquisition device for acquiring an image or video.
Then, as described with reference to fig. 4 to 16, the device 400/1500 detects a face in the acquired image or the acquired video.
Then, the recognition device 1720 extracts features from the detected face for face recognition.
For example, in the case where the face recognition processing is demographics processing, the recognition device 1720 performs it as follows: first, features are extracted from each detected face; then, for each detected face, it is verified based on the extracted features whether the face is a true face. In the case where the face is a true face, the number of people is increased by 1; otherwise, the number of people is unchanged.
For example, in the case where the face recognition processing is crowd analysis processing, the recognition device 1720 performs it as follows: first, features are extracted from each detected face; then, person attributes (e.g., age, gender) are analyzed based on the extracted features. Both recognition processes are sketched below.
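The following is a minimal sketch of the two recognition processes just described; extract_features, is_true_face, and predict_attributes are hypothetical callables standing in for the feature extraction, verification, and attribute analysis of the recognition device 1720.

```python
def count_people(detected_faces, extract_features, is_true_face):
    """Demographics: count only the faces verified as true faces."""
    count = 0
    for face in detected_faces:
        if is_true_face(extract_features(face)):
            count += 1  # a verified true face increases the head count by 1
    return count

def analyze_crowd(detected_faces, extract_features, predict_attributes):
    """Crowd analysis: estimate person attributes (e.g., age, gender)."""
    return [predict_attributes(extract_features(face)) for face in detected_faces]
```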
As noted above, the present invention may alternatively be implemented by a processor (e.g., a tablet, notebook, or desktop computer). Taking the case where the present invention is implemented by a computer as an example, after the computer receives an image or video and is triggered to perform the corresponding processing of the present invention, it may likewise output the processing result (i.e., the human body parts) to subsequent processing such as face recognition (e.g., demographics, crowd analysis, face verification, etc.). Accordingly, as an exemplary application of the present invention, an exemplary image processing system will be described next with reference to fig. 18. Fig. 18 illustrates the arrangement of an exemplary image processing system 1800 according to the present invention. As shown in fig. 18, the image processing system 1800 includes an acquisition device 1810, the recognition device 1720, and the above-described device 400/1500, where the recognition device 1720 and the device 400/1500 may be implemented by the same computer or by different computers.
In this implementation, the acquisition device 1810 is implemented by a camera (e.g., a digital camera, video camera, or network camera) and is used to capture images or video. In addition, comparing fig. 18 with fig. 17, the recognition device 1720 and the device 400/1500 shown in fig. 18 are the same as those shown in fig. 17, so their detailed description is not repeated here.
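Structurally, the analyzer 1700 of fig. 17 and the image processing system 1800 of fig. 18 share the same three-stage pipeline, sketched below with illustrative class and method names (capture, detect, and extract_features are assumptions, not the patent's API).

```python
class Pipeline:
    """Acquisition -> face detection (device 400/1500) -> recognition (1720)."""
    def __init__(self, acquirer, detector, recognizer):
        self.acquirer = acquirer      # optical system 1710 or acquisition device 1810
        self.detector = detector      # device 400/1500
        self.recognizer = recognizer  # recognition device 1720

    def run_once(self):
        image = self.acquirer.capture()
        faces = self.detector.detect(image)
        return [self.recognizer.extract_features(face) for face in faces]
```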
In the present application, the relative relationship between the accessory object and the human body part is used directly for human body part detection. As an alternative, however, this relative relationship can also be used directly in other recognition processes. Taking person re-identification as an example: in the case where a person is far from the camera, so that the person's facial features cannot be used effectively and accurately, the relative relationship between an accessory object and the person's entire body (such as the relative relationship between a bag and the entire body, or between a club and the entire body) can be used to guide the corresponding re-identification process.
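As a minimal sketch of that alternative use, one could encode the accessory-to-body relative relationship as a small feature vector and compare it across camera views; the normalization and distance below are illustrative assumptions, not the patent's formulation.

```python
import numpy as np

def relative_feature(object_box, body_box):
    """Offset and scale of the accessory object relative to the whole-body box.
    Boxes are (x, y, w, h) in pixels."""
    ox, oy, ow, oh = object_box
    bx, by, bw, bh = body_box
    return np.array([(ox - bx) / bw, (oy - by) / bh, ow / bw, oh / bh])

def reid_similarity(query_boxes, gallery_boxes):
    """Higher value => the two observations are more likely the same person."""
    d = np.linalg.norm(relative_feature(*query_boxes) - relative_feature(*gallery_boxes))
    return -d
```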
All of the above-described units are exemplary and/or preferred modules for implementing the processing described in this disclosure. These units may be hardware units (such as field programmable gate arrays (FPGAs), digital signal processors, application specific integrated circuits, etc.) and/or software modules (such as computer-readable programs). The units for implementing the various steps have not been described exhaustively above; however, where there is a step that performs a certain process, there may be a corresponding functional module or unit (implemented by hardware and/or software) for implementing that process. Technical solutions formed by all combinations of the described steps and the units corresponding to these steps are included in the disclosure of the present application, as long as they constitute complete and applicable technical solutions.
The method and apparatus of the present invention can be implemented in a variety of ways. For example, the methods and apparatus of the present invention may be implemented by software, hardware, firmware, or any combination thereof. The order of the steps of the above-described method is intended to be illustrative only, and the steps of the method of the present invention are not limited to the order specifically described above, unless specifically stated otherwise. Furthermore, in some embodiments, the present invention may also be implemented as a program recorded in a recording medium, including machine-readable instructions for implementing the method according to the present invention. Therefore, the present invention also covers a recording medium storing a program for implementing the method according to the present invention.
While specific embodiments of the invention have been illustrated in detail by way of example, it will be appreciated by those skilled in the art that the foregoing examples are intended to be illustrative only and are not limiting of the scope of the invention. It will be appreciated by those skilled in the art that modifications may be made to the embodiments described above without departing from the scope and spirit of the invention. The scope of the invention is defined by the following claims.

Claims (20)

1. An apparatus for detecting a human body component in an image, the apparatus comprising:
an acquisition unit configured to acquire an image including a human body;
an object detection unit configured to detect an object from a region including the human body;
a relative relationship determination unit configured to determine a relative relationship between a detected object and a human body part of the human body to be detected, wherein the relative relationship between the detected object and the human body part includes at least one of: a position of the human body part relative to the detected object, a region of the human body part relative to the detected object, or a shape of the human body part relative to the detected object, and wherein the relative relationship determination unit determines a relative relationship between the detected object and the human body part based on a pre-generated relative relationship model and features obtained from the detected object and the human body; and
a human body part detection unit configured to detect the human body part based on the determined relative relationship.
2. The device of claim 1, wherein the detected object is an object worn or held on the human body.
3. The apparatus of claim 1, wherein the features obtained from the detected object and the human body comprise at least one of:
a gesture feature of the human body determined based on a position of the detected object in the region containing the human body;
an apparent feature of the detected object extracted from the detected object; or
a relative characteristic between the detected object and the human body determined based on the information of the detected object and the information of the human body.
4. The apparatus of claim 1, wherein the pre-generated relative relationship model is generated using a rule-based estimation method or a machine learning method based on a relative relationship between the labeled object and the human body part in the sample image.
5. The apparatus of claim 4, wherein the machine learning method comprises at least one of: regression methods, classification methods, or convolutional neural network methods.
6. The apparatus according to claim 1, wherein the human body part detected by the human body part detection unit is composed of feature points, wherein the human body part detection unit detects the human body part by using a regression method based on the determined relative relationship.
7. The apparatus of claim 6, wherein the detected object is related to the human body part detected by the human body part detection unit.
8. The apparatus of claim 6, wherein the determined relative relationship is used by the human body part detection unit in at least one of the following ways:
the human body part detection unit determines an initial position of a feature point of the human body part based on the determined relative relationship and a pre-generated position of the feature point of the human body part;
the human body part detection unit determines features around a current position of a feature point of the human body part based on the determined relative relationship; or
the human body part detection unit adjusts a position increment for the current position of a feature point of the human body part based on the determined relative relationship, wherein the position increment is determined based on at least one pre-generated regression model and features extracted around the current position of the feature point of the human body part.
9. The apparatus according to claim 8, wherein the human body part detection unit determines the initial position by adjusting the pre-generated position of the feature point of the human body part based on the determined relative relationship.
10. The apparatus according to claim 8, wherein the human body part detection unit determines the features by weighting features extracted around the current position of a feature point of the human body part based on a probability distribution of the current position of the feature point in the region including the human body;
wherein, for each feature point of the human body part, a corresponding probability distribution is determined based on the current position of the feature point and the determined relative relationship.
11. The apparatus according to claim 8, wherein for each feature point of the human body part, the human body part detection unit adjusts a corresponding position increment by weighting a position increment of the current position of the feature point based on the determined relative relationship and the current position of the feature point.
12. The apparatus of claim 8, wherein the sample images used for generating the pre-generated regression model and the pre-generated positions of the feature points of the human body part are grouped based on a relative relationship between an object marked in the sample images and the human body part;
wherein, for each group of the sample images, a corresponding pre-generated regression model and corresponding pre-generated positions of the feature points of the human body part are generated by using a regression method based on the sample images in the group.
13. The apparatus according to claim 1, wherein the human body part detected by the human body part detection unit is composed of feature points, wherein the human body part detection unit detects the human body part based on the determined relative relationship and a connection point determined based on the determined relative relationship;
wherein the connection point is a feature point of the human body part that connects the human body part and the detected object.
14. The apparatus according to claim 1, wherein the acquisition unit acquires the image by obtaining a current video frame from an input video.
15. The apparatus according to claim 14, wherein, in the case where the current video frame obtained by the acquisition unit is not the first video frame of the input video, the object detection unit detects the object from the current video frame by tracking the object in a previous video frame of the current video frame;
wherein in the previous video frame, the object is tracked based on the determined relative relationship in the previous video frame and the detected human body part in the previous video frame.
16. A method for detecting a human body component in an image, the method comprising:
an acquisition step of acquiring an image including a human body;
an object detection step of detecting an object from a region including the human body;
a relative relationship determining step of determining a relative relationship between a detected object and a human body part to be detected of the human body, wherein the relative relationship between the detected object and the human body part includes at least one of: a position of the human body part relative to the detected object, a region of the human body part relative to the detected object, or a shape of the human body part relative to the detected object, and wherein in the relative relationship determining step, a relative relationship between the detected object and the human body part is determined based on a pre-generated relative relationship model and features obtained from the detected object and the human body; and
a human body part detection step of detecting the human body part based on the determined relative relationship.
17. The method according to claim 16, wherein the human body part detected in the human body part detection step is composed of feature points, wherein the human body part is detected by using a regression method based on the determined relative relationship.
18. The method of claim 16, wherein the human body part detected in the human body part detecting step is composed of feature points, wherein the human body part is detected based on the determined relative relationship and connection points determined based on the determined relative relationship;
wherein the connection points are feature points of the human body part that connect the human body part and the detected object.
19. The method of claim 16, wherein in the acquiring step, the image is acquired by obtaining a current video frame from the input video.
20. An image processing system, the system comprising:
an acquisition device configured to acquire an image or video;
a detection device configured to detect a face component in the acquired image or the acquired video according to any one of claims 1 to 15; and
and an identification means configured to extract features from the detected face component for face identification.