CN111429487A - Sticky foreground segmentation method and device for depth image


Info

Publication number: CN111429487A
Application number: CN202010191067.3A
Authority: CN (China)
Prior art keywords: blob, proportion, target tracking, patch, target
Other languages: Chinese (zh)
Other versions: CN111429487B (granted publication)
Inventors: 王磊, 李骊
Assignee: Beijing HJIMI Technology Co Ltd
Legal status: Granted, currently active
Events: application filed by Beijing HJIMI Technology Co Ltd; publication of CN111429487A; application granted; publication of CN111429487B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/254 Analysis of motion involving subtraction of images
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field, by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method and a device for sticky foreground segmentation of a depth image. After a target depth image to be segmented is acquired, it is first separated into a background and a foreground where the target tracking objects are located, and connected-region segmentation is performed on it to obtain the connected regions (blobs) it contains. The blobs are then classified to obtain the blobs of each type, the blobs of preset sticky types are divided into different small connected regions (patches) according to preset division rules, and finally every patch is traversed so that all patches belonging to the same target tracking object are aggregated one by one, yielding all complete target tracking objects in the target depth image. The method therefore segments sticky foregrounds accurately using only depth images, reduces segmentation cost and computation, improves the real-time performance of segmentation, and has a wide application space.

Description

Sticky foreground segmentation method and device for depth image
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for sticky foreground segmentation of a depth image.
Background
With the development of portable and inexpensive depth cameras, research on and application of depth images in the field of image processing has become increasingly important. Depth information can improve the performance of machine-vision tasks such as image segmentation, object tracking, image recognition, and image reconstruction.
When a moving target tracking object in the depth image comes into contact with other objects or with other tracked objects, that is, when the foreground is sticky, accurately segmenting the object is a prerequisite for continuing to track it and for recognizing its posture. Existing methods for segmenting the sticky foreground of a depth image mainly rely on registered color image information, for example segmenting the sticky foreground on the color image with a neural network. Although the segmentation accuracy can be high, neural networks usually require a large amount of manually labeled data, so the cost is high, the computation is heavy, and real-time segmentation cannot be achieved. Moreover, because registered color-map information is required, the sticky foreground cannot be segmented from depth images alone.
Disclosure of Invention
The main objective of the embodiments of the present application is to provide a method and an apparatus for segmenting the sticky foreground of a depth image, which can segment the sticky foreground accurately using only the depth image, reduce segmentation cost and computation, improve the real-time performance of segmentation, and have a wide application space.
In a first aspect, an embodiment of the present application provides a sticky foreground segmentation method for a depth image, including:
acquiring a target depth image to be segmented; the target depth image comprises a background and a foreground where a target tracking object is located;
acquiring a background in the target depth image and a foreground where a target tracking object is located, and performing connected region segmentation on the target depth image to obtain connected region blobs contained in the target depth image;
classifying each blob contained in the target depth image to obtain each type of blob contained in the target depth image;
dividing the blob of the preset type into different small connected regions patch according to a preset dividing rule;
and traversing each patch, and aggregating all the patches belonging to the same target tracking object one by one to obtain all complete target tracking objects in the target depth image.
Optionally, the classifying the blobs included in the target depth image to obtain the blobs of each type included in the target depth image includes:
s1: when the foreground proportion in the blob is judged to be smaller than a foreground proportion threshold value, determining the blob type as the blob only containing the background;
s2: when the foreground proportion in the blob is determined to be not smaller than the foreground proportion threshold, if the proportion that the part of one target tracking object's previous-frame region appearing in the blob occupies in the total blob area is determined to be greater than a first proportion threshold, determining the type of the blob to be a blob containing only one target tracking object; if that proportion is determined to be smaller than a second proportion threshold, determining the type of the blob to be a blob containing only the background, the second proportion threshold being much smaller than the first proportion threshold; and if that proportion is smaller than the first proportion threshold and not smaller than the second proportion threshold, determining the type of the blob to be a blob containing a target tracking object stuck to the background;
s3: when the foreground proportion in the blob is determined to be not smaller than the foreground proportion threshold and the previous-frame regions of at least two target tracking objects partly appear in the blob, counting the valid target tracking objects, namely those whose proportion of the total blob area is greater than a third proportion threshold and whose proportion after normalization is greater than a normalization proportion threshold: if the number of valid target tracking objects is 0, determining the type of the blob to be a blob containing only the background; if the number of valid target tracking objects is 1, repeating step S2 to determine the type of the blob; and if the number of valid target tracking objects is greater than 1, adding up the proportions that the valid target tracking objects' previous-frame regions appearing in the blob occupy in the total blob area, determining the type of the blob to be a blob containing stickiness between at least two target tracking objects if the summed proportion is greater than a fourth proportion threshold, and a blob containing at least two target tracking objects stuck to the background otherwise.
Optionally, the preset type of blob includes the following three adhesion type blobs:
the system comprises a blob which is adhered to a background by a target tracking object;
the blob comprises a sticky connection between at least two target tracking objects;
contains at least two blobs of target tracking objects that are stuck to the background.
Optionally, according to a preset partitioning rule, partitioning the blob of the preset type into different small connected regions patch, including:
taking each depth pixel point in the blob of the preset type as an independent patch, and distributing a patch data structure object for each pixel; the patch data structure comprises the serial number, the number of pixel points and the depth value of the patch;
taking all patch pairs with adjacent relations in the blob as an edge respectively; the structure of the edge includes the positions and weights of the two end points;
arranging all edges in the blob in an ascending order according to the weight;
and combining the patches where the two end points in the edges meeting the preset condition are positioned into one patch according to the relation between the weight of each edge in the blob and the depth values of the two end points.
Optionally, traversing each patch and aggregating all patches belonging to the same target tracking object one by one to obtain the complete target tracking objects includes:
during the first pass over all patches, assigning each patch with a high attribution confidence to its target tracking object; analyzing each patch with a lower attribution confidence but a larger area line by line, and assigning it to the corresponding target tracking objects according to the proportion that the number of pixels of the previous-frame target tracking object appearing in each segment of a line occupies in the length of that segment; and marking each patch with a lower attribution confidence and a smaller area as a patch to be processed;
and during the second pass over the patches to be processed, assigning each patch to be processed to the target tracking object that is nearest in three-dimensional distance.
In a second aspect, an embodiment of the present application further provides a sticky foreground segmentation apparatus for a depth image, including:
the device comprises an acquisition unit, a segmentation unit and a segmentation unit, wherein the acquisition unit is used for acquiring a target depth image to be segmented; the target depth image comprises a background and a foreground where a target tracking object is located;
the segmentation unit is used for acquiring a background in the target depth image and a foreground where a target tracking object is located, and performing connected region segmentation on the target depth image to obtain connected region blobs contained in the target depth image;
the classification unit is used for classifying each blob contained in the target depth image to obtain each type of blob contained in the target depth image;
the dividing unit is used for dividing the blob of the preset type into different small connected regions patch according to a preset dividing rule;
and the obtaining unit is used for aggregating all the patches belonging to the same target tracking object one by one through traversing each patch to obtain all the complete target tracking objects in the target depth image.
Optionally, the classifying unit includes:
the first determining subunit is used for determining that the type of the blob is the blob only containing the background when the foreground proportion in the blob is judged to be smaller than the foreground proportion threshold;
the second determining subunit is configured to, when the foreground proportion in the blob is determined to be not smaller than the foreground proportion threshold, determine the type of the blob to be a blob containing only one target tracking object if the proportion that the part of one target tracking object's previous-frame region appearing in the blob occupies in the total blob area is greater than the first proportion threshold; determine the type of the blob to be a blob containing only the background if that proportion is smaller than the second proportion threshold, the second proportion threshold being much smaller than the first proportion threshold; and determine the type of the blob to be a blob containing a target tracking object stuck to the background if that proportion is smaller than the first proportion threshold and not smaller than the second proportion threshold;
a third determining subunit, configured to, when the foreground proportion in the blob is determined to be not smaller than the foreground proportion threshold and the previous-frame regions of at least two target tracking objects partly appear in the blob, count the valid target tracking objects, namely those whose proportion of the total blob area is greater than the third proportion threshold and whose proportion after normalization is greater than the normalization proportion threshold: if the number of valid target tracking objects is 0, determine the type of the blob to be a blob containing only the background; if the number is 1, invoke the second determining subunit to determine the type of the blob; and if the number is greater than 1, add up the proportions that the valid target tracking objects' previous-frame regions appearing in the blob occupy in the total blob area, determining the type of the blob to be a blob containing stickiness between at least two target tracking objects if the summed proportion is greater than the fourth proportion threshold, and a blob containing at least two target tracking objects stuck to the background otherwise.
Optionally, the preset type of blob includes the following three adhesion type blobs:
the system comprises a blob which is adhered to a background by a target tracking object;
the blob comprises a sticky connection between at least two target tracking objects;
contains at least two blobs of target tracking objects that are stuck to the background.
Optionally, the dividing unit includes:
the allocation subunit is used for taking each depth pixel point in the blob of the preset type as an independent patch respectively and allocating a patch data structure object to each pixel; the patch data structure comprises the serial number, the number of pixel points and the depth value of the patch;
obtaining subunits, which are used for respectively taking all patch pairs with adjacent relations in the blob as an edge; the structure of the edge includes the positions and weights of the two end points;
the arrangement subunit is used for arranging all the edges in the blob in an ascending order according to the weight;
and the merging subunit is used for merging the patches where the two end points in the edges which meet the preset condition are located into one patch according to the relation between the weight of each edge in the blob and the depth values of the two end points.
Optionally, the obtaining unit includes:
the first traversal subunit is used for, during the first pass over all patches, assigning each patch with a high attribution confidence to its target tracking object; analyzing each patch with a lower attribution confidence but a larger area line by line, and assigning it to the corresponding target tracking objects according to the proportion that the number of pixels of the previous-frame target tracking object appearing in each segment of a line occupies in the length of that segment; and marking each patch with a lower attribution confidence and a smaller area as a patch to be processed;
and the second traversal subunit is used for, during the second pass over the patches to be processed, assigning each patch to be processed to the target tracking object that is nearest in three-dimensional distance.
An embodiment of the present application further provides a sticky foreground segmentation device for depth images, including: a processor, a memory, and a system bus;
the processor and the memory are connected through the system bus;
the memory is configured to store one or more programs, the one or more programs including instructions, which when executed by the processor, cause the processor to perform any one of the implementations of the above-described sticky foreground segmentation method for depth images.
The embodiment of the application further provides a computer-readable storage medium, where instructions are stored, and when the instructions are run on a terminal device, the terminal device is enabled to execute any implementation manner of the above method for segmenting the sticky foreground of the depth image.
The embodiments of the present application provide a method and a device for segmenting the sticky foreground of a depth image. After a target depth image to be segmented, containing a background and a foreground where target tracking objects are located, is acquired, the background and the foreground in the target depth image are first obtained and connected-region segmentation is performed to obtain each connected region (blob) contained in the image. The blobs are then classified to obtain the blobs of each type, the sticky blobs of preset types are divided into different small connected regions (patches) according to preset division rules, and finally every patch is traversed so that all patches belonging to the same target tracking object are aggregated one by one, yielding all complete target tracking objects in the target depth image. In this way the blobs contained in the depth image are classified, blobs with a sticky foreground are divided into patches, the patches are matched to target tracking objects one by one, and all patches belonging to the same tracked target are aggregated into complete target tracking objects. The sticky foreground is thus segmented accurately when only a depth image is available, segmentation cost and computation are reduced, the real-time performance of segmentation is improved, and the method has a wide application space.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for sticky foreground segmentation of a depth image according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a foreground and a background where a target tracking object is located in a target depth image obtained according to the embodiment of the present application;
fig. 3 is a schematic diagram of each connected region blob included in a target depth image according to an embodiment of the present application;
fig. 4 is a schematic flowchart of classifying blobs included in a target depth image according to the embodiment of the present application;
fig. 5 is a schematic flow chart illustrating a process of dividing a blob of a preset type into different small connected regions patch according to a preset division rule according to the embodiment of the present application;
fig. 6 is a schematic diagram illustrating an effect of dividing a blob of a preset type into different small connected regions patch according to a preset division rule according to the embodiment of the present application;
fig. 7 is a schematic flowchart illustrating a process of segmenting a blob containing a target tracking object adhered to a background according to an embodiment of the present application;
fig. 8 is a schematic diagram illustrating an effect of segmenting a blob containing a sticky connection between two target tracking objects according to an embodiment of the present application;
fig. 9 is a schematic composition diagram of a sticky foreground segmentation apparatus for depth images according to an embodiment of the present disclosure.
Detailed Description
At present, pedestrian tracking and posture recognition based on depth images are basic technologies that can be used in fields such as human-computer interaction, motion-sensing games, and human behavior analysis. When a moving target tracking object in the depth image comes into contact with other objects or with other tracked objects, that is, when the foreground is sticky, accurately segmenting the object is a prerequisite for continuing to track it and for recognizing its posture. Existing methods for segmenting the sticky foreground of a depth image mainly rely on registered color image information, for example segmenting the sticky foreground on the color image with a neural network. Although the segmentation accuracy can be high, neural networks usually require a large amount of manually labeled data, so the cost is high, the computation is heavy, and real-time segmentation cannot be achieved. Moreover, because registered color-map information is required, the sticky foreground cannot be segmented from depth images alone.
To remedy these drawbacks, the embodiments of the present application provide a sticky foreground segmentation method for depth images. After a target depth image to be segmented, containing a background and a foreground where target tracking objects are located, is acquired, the background and the foreground are first obtained and connected-region segmentation is performed on the target depth image to obtain each connected region (blob) it contains. The blobs are then classified to obtain the blobs of each type, the blobs of preset types are divided into different small connected regions (patches) according to preset division rules, and finally every patch is traversed so that all patches belonging to the same target tracking object are aggregated one by one, yielding all complete target tracking objects in the target depth image. In this way the blobs contained in the depth image are classified, blobs with a sticky foreground are divided into patches, the patches are matched to target tracking objects one by one, and all patches belonging to the same tracked target are aggregated into complete target tracking objects, so that the sticky foreground is segmented accurately when only a depth image is available, segmentation cost and computation are reduced, the real-time performance of segmentation is improved, and the method has a wide application space.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
First embodiment
Referring to fig. 1, a schematic flow chart of a method for sticky foreground segmentation of a depth image according to this embodiment is provided, where the method includes the following steps:
s101: acquiring a target depth image to be segmented; the target depth image comprises a background and a foreground where the target tracking object is located.
In this embodiment, any depth image whose sticky foreground is segmented using this embodiment is defined as a target depth image, which contains a background and a foreground where the target tracking objects are located. A depth image, also called a range image, is an image whose pixel values are the distances (depths) from the image collector to points in the scene; it directly reflects the geometry of the visible surfaces in the scene. A target tracking object is a moving object of interest in the target depth image, such as a moving person in the scene; correspondingly, the foreground where the target tracking object is located is the pixel region occupied by that object, for example the pixel region corresponding to a moving person in the scene. The background is everything in the image outside the pixel regions of interest, that is, the pixel region outside the foreground where the target tracking objects are located. A sticky foreground is a foreground in which a target tracking object is in contact with the background, or in which several target tracking objects are in contact with each other or with the background, for example a pedestrian (target tracking object) carrying a chair (background), or two pedestrians walking side by side (two target tracking objects). It should be noted that the target depth image can usually be captured by a depth camera or another device with a depth module, which directly provides accurate three-dimensional coordinates of the target.
S102: and obtaining a background in the target depth image and a foreground where the target tracking object is located, and performing connected region segmentation on the target depth image to obtain connected region blobs contained in the target depth image.
In this embodiment, after the target depth image to be segmented is obtained in step S101, foreground segmentation can be performed on it with any existing or future foreground segmentation algorithm, for example the codebook algorithm, a Gaussian mixture model, or the ViBe algorithm, so as to separate the background from the foreground where the target tracking object is located. As shown in fig. 2, the black area represents the background and the white area the foreground. The foreground segmentation result carries some noise: part of the background may be wrongly segmented as foreground, and areas that belong to the foreground may likewise be segmented as background.
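As an illustration of this step only, the sketch below uses OpenCV's Gaussian-mixture background subtractor (MOG2) as a stand-in for the codebook, Gaussian mixture model, or ViBe algorithms named above; rescaling the depth frame to 8 bits and the parameter values are assumptions of the example, not part of this disclosure.

```python
import cv2
import numpy as np

# MOG2 expects 8-bit input, so the 16-bit depth frame (millimetres) is rescaled;
# the history/varThreshold values are assumptions of this example.
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16,
                                                detectShadows=False)

def foreground_mask(depth_mm: np.ndarray, max_depth_mm: int = 8000) -> np.ndarray:
    depth8 = np.clip(depth_mm, 0, max_depth_mm).astype(np.float32)
    depth8 = (depth8 * (255.0 / max_depth_mm)).astype(np.uint8)
    mask = subtractor.apply(depth8)        # 255 = foreground, 0 = background
    return cv2.medianBlur(mask, 5)         # suppress the noise visible in fig. 2
```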
Then, connected-domain segmentation can be performed on the target depth image with any existing or future connected-domain algorithm to obtain each connected region (blob) it contains. For example, a one-pass, two-pass, or multi-pass traversal algorithm can be used, so that each distinct target tracking object and each independent individual in the background, for example a complete pedestrian (target tracking object), a chair (background), or a wall (background), is segmented into its own connected-domain part. As shown in fig. 3, the different color blocks represent different connected domains. Most independent targets are segmented into one complete blob, but when the depth values inside a target vary strongly, an independent target may be split into several blobs, and when several targets touch each other they may be merged into a single blob.
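The connected-domain step itself is not prescribed here; a minimal sketch of depth-based connected-region extraction follows, using a breadth-first flood fill in which neighboring pixels join the same blob only when their depth values differ by less than a tolerance. The tolerance value and the 4-connectivity are assumptions of the example.

```python
from collections import deque
import numpy as np

def label_blobs(depth: np.ndarray, tol: float = 30.0) -> np.ndarray:
    """Label connected regions of a depth image; pixels with depth 0 are invalid."""
    h, w = depth.shape
    labels = np.zeros((h, w), dtype=np.int32)   # 0 = unlabeled / invalid
    next_label = 1
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] or depth[sy, sx] == 0:
                continue
            queue = deque([(sy, sx)])           # flood-fill a new blob from here
            labels[sy, sx] = next_label
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny, nx]
                            and depth[ny, nx] != 0
                            and abs(float(depth[ny, nx]) - float(depth[y, x])) < tol):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels
```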
S103: and classifying each blob contained in the target depth image to obtain each type of blob contained in the target depth image.
In this embodiment, after the blobs contained in the target depth image are obtained in step S102, they need to be classified to obtain the blobs of each type, because when a target tracking object comes into contact with other objects (the background or other target tracking objects) it may have been merged with them into one complete blob in step S102, and conversely one complete target tracking object may have been split into several blobs.
In an optional implementation manner of the embodiment of the present application, a specific implementation process of this step S103 may include the following steps S1-S3:
step S1: and when the foreground proportion in the blob is judged to be smaller than the foreground proportion threshold value, determining the blob type as the blob only containing the background.
Step S2: when the foreground proportion in the blob is determined to be not smaller than the foreground proportion threshold, if the proportion that the part of one target tracking object's previous-frame region appearing in the blob occupies in the total blob area is greater than a first proportion threshold, the type of the blob is determined to be a blob containing only one target tracking object; if that proportion is smaller than a second proportion threshold, the type is determined to be a blob containing only the background, the second proportion threshold being far smaller than the first; and if that proportion is smaller than the first proportion threshold and not smaller than the second, the type is determined to be a blob containing a target tracking object stuck to the background.
Step S3: when the foreground proportion in the blob is determined to be not smaller than the foreground proportion threshold and the previous-frame regions of at least two target tracking objects partly appear in the blob, the valid target tracking objects are counted, namely those whose proportion of the total blob area is greater than a third proportion threshold and whose proportion after normalization is greater than a normalization proportion threshold. If the number of valid target tracking objects is 0, the type of the blob is determined to be a blob containing only the background; if the number is 1, step S2 is repeated to determine the type of the blob; and if the number is greater than 1, the proportions that the valid target tracking objects' previous-frame regions appearing in the blob occupy in the total blob area are added up: if the summed proportion is greater than a fourth proportion threshold, the type of the blob is determined to be a blob containing stickiness between at least two target tracking objects, and otherwise a blob containing at least two target tracking objects stuck to the background.
Specifically, as shown in fig. 4, after the blobs contained in the target depth image are obtained in step S102, the foreground information of each blob is counted first; if the foreground proportion in the blob (defined as for_ratio) is smaller than the foreground proportion threshold (defined as for_ratio_threshold), the type of the blob is determined to be a blob containing only the background. This is because when the foreground proportion in a blob is too small, the foreground pixels are most likely noise from the foreground segmentation appearing in a background blob, so the blob cannot be attributed to any target tracking object. It should be noted that for_ratio_threshold is set according to the performance of the foreground segmentation algorithm, the frame rate, the motion speed of the target tracking object, and the camera parameters; the specific value can be chosen for the actual situation and is not limited by this embodiment. For example, for_ratio_threshold may be set to 0.1.
In addition, when the foreground proportion for_ratio in the blob is not smaller than for_ratio_threshold, the proportion that the previous-frame region of each currently tracked object occupies in the current blob is counted (defined here as pre_object_ratio). For convenience of subsequent processing and representation, pre_object_ratio[0] denotes the proportion of the previous-frame background area appearing in the current blob, that is, the background is treated as a tracked object with ID 0, and the remaining target tracking objects are numbered 1, 2, ..., N. Next, the number of tracked targets whose pre_object_ratio[i] (i > 0) is greater than 0 in the current blob is counted (defined here as pre_object_num). It can be understood that if pre_object_num is 0, that is, no previous-frame tracked target appears in the current blob, the type of the blob is determined to be a blob containing only the background.
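The statistics described above reduce to a handful of ratios per blob. A minimal sketch, assuming the blob, the foreground, and the previous-frame objects are given as boolean masks of the same shape, with the background carried as tracking ID 0:

```python
import numpy as np

def blob_statistics(blob_mask: np.ndarray, fg_mask: np.ndarray,
                    prev_masks: dict) -> tuple:
    """Return (for_ratio, pre_object_ratio, pre_object_num) for one blob.

    prev_masks maps tracking IDs to previous-frame masks; ID 0 is the background.
    """
    blob_area = int(blob_mask.sum())
    for_ratio = (blob_mask & fg_mask).sum() / blob_area
    pre_object_ratio = {
        obj_id: (blob_mask & mask).sum() / blob_area
        for obj_id, mask in prev_masks.items()
    }
    # number of real tracked objects (i > 0) overlapping the current blob
    pre_object_num = sum(1 for i, r in pre_object_ratio.items() if i > 0 and r > 0)
    return for_ratio, pre_object_ratio, pre_object_num
```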
As shown in fig. 4, when the foreground proportion for_ratio in a blob is not smaller than for_ratio_threshold and pre_object_num equals 1, there is exactly one target tracking object, calibrated here as target tracking object number 1, whose pre_object_ratio is greater than 0. If the proportion pre_object_ratio that the part of its previous-frame region appearing in the blob occupies in the total blob area is greater than a first proportion threshold (defined here as pre_object_ratio_threshold1), the blob is considered to belong entirely to tracked object number 1, that is, the type of the blob is determined to be a blob containing only one target tracking object.
If pre_object_ratio is smaller than a second proportion threshold (defined here as pre_object_ratio_threshold2), the type of the blob is determined to be a blob containing only the background, where pre_object_ratio_threshold2 is much smaller than pre_object_ratio_threshold1. If pre_object_ratio is smaller than pre_object_ratio_threshold1 but not smaller than pre_object_ratio_threshold2, the type of the blob is determined to be a blob containing target tracking object number 1 stuck to the background, and the object then needs to be separated from the background in the subsequent steps S104-S105.
It should be noted that pre_object_ratio_threshold1 should be a large value, meaning that most of the blob's area should coincide with the previous-frame area of target tracking object number 1, whereas pre_object_ratio_threshold2 should be a small value, meaning that most of the blob's area should coincide with the previous-frame background area. Both thresholds are set according to the motion speed of the target tracking object and the frame rate of the camera; the specific values can be chosen for the actual situation and are not limited by this embodiment. For example, pre_object_ratio_threshold1 may be set to 0.9 and pre_object_ratio_threshold2 to 0.1.
As shown in fig. 4, when the foreground proportion for_ratio in a blob is not smaller than for_ratio_threshold and pre_object_num is greater than 1, several target tracking objects have pre_object_ratio greater than 0. Their pre_object_ratio values are first normalized to obtain normalized proportion values (defined here as pre_object_ratio_norm). Then the number of valid target tracking objects is counted (defined here as valid_pre_object_num), a valid object being one whose pre_object_ratio is greater than the third proportion threshold (defined here as pre_object_ratio_threshold3) and whose pre_object_ratio_norm is greater than the normalization proportion threshold (defined here as pre_object_ratio_norm_threshold). It should be noted that pre_object_ratio_threshold3 and pre_object_ratio_norm_threshold are set according to the motion speed of the target tracking objects, the number of previous-frame target tracking objects appearing in the current blob, and the frame rate of the camera; the specific values can be chosen for the actual situation and are not limited by this embodiment. For example, pre_object_ratio_threshold3 may be set to 0.05 and pre_object_ratio_norm_threshold to 0.15.
If valid_pre_object_num equals 0, the area proportions of the previous-frame target tracking objects appearing in the current blob are all very small, so the type of the current blob can be determined to be a blob containing only the background.
If valid_pre_object_num equals 1, there is only one valid target tracking object. Its pre_object_ratio is then compared with pre_object_ratio_threshold1 and pre_object_ratio_threshold2: if it is greater than pre_object_ratio_threshold1, the type of the blob is determined to be a blob containing only one target tracking object; if it is smaller than pre_object_ratio_threshold2, the type is a blob containing only the background; and if it lies between the two thresholds, the blob is shared by the target tracking object and the background, that is, it is determined to be a blob containing one target tracking object stuck to the background, and the object then needs to be separated from the background in the subsequent steps S104-S105.
If valid_pre_object_num is greater than 1, there are several valid target tracking objects, none with a low share. In this case the proportions that the previous-frame regions of the valid target tracking objects occupy in the total blob area are added up. If the summed proportion is greater than a fourth proportion threshold (defined here as pre_object_ratio_threshold4), the blob belongs entirely to the several target tracking objects, so it can be determined to be a blob containing stickiness between at least two target tracking objects; if the summed proportion is not greater than pre_object_ratio_threshold4, the blob is shared by the several target tracking objects and the background, so it can be determined to be a blob containing at least two target tracking objects stuck to the background. In both of these cases, the target tracking objects then need to be separated in the subsequent steps S104-S105.
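Putting the branches of steps S1-S3 together, the classification of a single blob can be sketched as the decision tree below. The first five threshold values repeat the examples given above (0.1, 0.9, 0.1, 0.05, 0.15); the fourth proportion threshold value is an assumption, since the text leaves it to be chosen per application.

```python
BG_ONLY, ONE_OBJECT, OBJ_BG, OBJ_OBJ, OBJ_OBJ_BG = range(5)

def classify_blob(for_ratio, pre_object_ratio, thresholds=None):
    t = thresholds or dict(for_ratio=0.1, t1=0.9, t2=0.1, t3=0.05,
                           norm=0.15, t4=0.9)   # t4 is an assumed example value
    if for_ratio < t['for_ratio']:
        return BG_ONLY                          # step S1: almost no foreground
    ratios = {i: r for i, r in pre_object_ratio.items() if i > 0 and r > 0}
    if not ratios:
        return BG_ONLY                          # no previous-frame target appears
    if len(ratios) == 1:                        # step S2: one overlapping object
        r = next(iter(ratios.values()))
        if r > t['t1']:
            return ONE_OBJECT
        if r < t['t2']:
            return BG_ONLY
        return OBJ_BG
    # step S3: several previous-frame objects overlap the blob
    total = sum(ratios.values())
    valid = {i: r for i, r in ratios.items()
             if r > t['t3'] and r / total > t['norm']}
    if not valid:
        return BG_ONLY
    if len(valid) == 1:                         # fall back to the step-S2 tests
        r = next(iter(valid.values()))
        if r > t['t1']:
            return ONE_OBJECT
        if r < t['t2']:
            return BG_ONLY
        return OBJ_BG
    return OBJ_OBJ if sum(valid.values()) > t['t4'] else OBJ_OBJ_BG
```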
S104: and dividing the preset type of the sticky blob into different small connected regions patch according to a preset dividing rule.
In this embodiment, after the blobs of each type contained in the target depth image are obtained in step S103, each blob of a preset type, that is, each blob with a sticky foreground, is first median-filtered to reduce the influence of noise while preserving the edge information inside the blob. The blob is then segmented into a number of different small connected regions (patches) based on the depth values inside it. Finally, according to the preset division rule, each patch is attributed using the attribution of its blob, the depth information of the patch, the foreground information in the patch, and the previous-frame target tracking object information.
The preset type of blob comprises three adhesion type blobs, which are respectively as follows: the system comprises a blob which is adhered to a background by a target tracking object; the blob comprises a sticky connection between at least two target tracking objects; contains at least two blobs of target tracking objects that are stuck to the background.
In an optional implementation manner of the embodiment of the present application, a specific implementation process of this step S104 may include the following steps A1-A4:
step A1: taking each depth pixel point in the blob of the preset type as an independent patch, and distributing a patch data structure object for each pixel; wherein, the patch data structure comprises the serial number, the number of pixel points and the depth value of the patch.
Step A2: taking all patch pairs with adjacent relations in the blob as an edge respectively; wherein the structure of the edge includes the positions and weights of the two end points.
Step A3: and arranging all edges in the blob in an ascending order according to the weight.
Step A4: and combining the patches in which the two end points in the edges meeting the preset condition are positioned into one patch according to the relation between the weight of each edge in the blob and the depth values of the two end points.
Specifically, as shown in fig. 5, after the blobs of each preset type contained in the target depth image are obtained in step S103, each depth pixel in a blob is first treated as an independent patch, and a patch data structure object is allocated to each pixel. The patch data structure contains the patch number id (shared by all pixels of the patch), the number of pixels it contains, and its representative depth value. At initialization, every pixel in the blob is an independent patch whose id is the pixel's index within the blob, whose size is 1, and whose representative depth is the pixel's own depth value.
Then, every pair of adjacent depth pixels in the blob is taken as an edge (defined here as edge); for example, an edge can be formed by the current patch and the patch above it. The structure of an edge has three main members: the positions of its two end points (defined here as a and b) and a weight, where the weight is the absolute value of the difference between the depth values of the two end points (defined here as w).
Next, all edges in the blob are arranged in ascending order of weight w, with smaller-weight edges in front.
Finally, all edges in the blob are traversed, and for each edge it is decided, from the relationship between its weight w and the patches of its two end points a and b, whether those patches should be merged into one. Specifically, if the weight of the edge is smaller than both the threshold threshold_a determined by the depth value depth_a of the patch patch_a containing end point a and the threshold threshold_b determined by the depth value depth_b of the patch patch_b containing end point b, then patch_a and patch_b are merged into patch_c, whose pixel count is the sum of those of patch_a and patch_b and whose depth value is the smaller of the two representative depths; otherwise patch_a and patch_b remain unchanged. The thresholds threshold_a and threshold_b are determined by the patch depth values and the measurement error of the target depth image; in general, the larger the depth value, the larger the measurement error of the depth image, and the larger the threshold. For example, if the depth value at end point a is 2 m with threshold_a of 5 mm, and the depth value at end point b is 3 m with threshold_b of 10 mm, then the difference between the two depth values is far greater than 5 mm, so a and b cannot be merged into one patch.
In addition, all merged patches can be traversed once more, and patches containing too few pixels (for example, fewer than 50 pixels) can be merged into an adjacent patch, specifically into the adjacent patch whose representative depth value is closest.
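The merge procedure above is essentially a graph-based segmentation over depth values and can be sketched with a union-find structure, as below. The linear error model in noise_threshold is an assumption standing in for the depth camera's real measurement-error curve, and the follow-up pass that absorbs undersized patches into the neighbor with the closest representative depth is omitted for brevity.

```python
import numpy as np

def noise_threshold(depth_mm: float) -> float:
    # Assumed error model: the merge tolerance grows with depth (on the order of
    # 5 mm at 2 m and 10 mm at 3 m), standing in for the camera's real error curve.
    return max(2.0, 0.0033 * depth_mm)

class Patches:
    """Union-find over pixels; each root carries the patch size and depth."""
    def __init__(self, depth_flat):
        n = len(depth_flat)
        self.parent = list(range(n))                 # patch id = index of root pixel
        self.size = [1] * n                          # number of pixels in the patch
        self.depth = [float(d) for d in depth_flat]  # representative depth value

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def merge(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
            self.size[ra] += self.size[rb]
            self.depth[ra] = min(self.depth[ra], self.depth[rb])

def segment_patches(depth_blob: np.ndarray) -> Patches:
    h, w = depth_blob.shape
    flat = depth_blob.ravel()
    patches = Patches(flat)
    edges = []                                       # (weight, end point a, end point b)
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w:                            # right neighbor
                edges.append((abs(float(flat[i]) - float(flat[i + 1])), i, i + 1))
            if y + 1 < h:                            # lower neighbor
                edges.append((abs(float(flat[i]) - float(flat[i + w])), i, i + w))
    edges.sort(key=lambda e: e[0])                   # ascending order of weight
    for weight, a, b in edges:
        ra, rb = patches.find(a), patches.find(b)
        # merge only if the weight is below BOTH depth-dependent thresholds
        if ra != rb and weight < noise_threshold(patches.depth[ra]) \
                    and weight < noise_threshold(patches.depth[rb]):
            patches.merge(ra, rb)
    return patches
```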
As shown in fig. 6, the left image shows two pedestrians (two target tracking objects) in contact with each other that were merged into one complete blob during blob segmentation, and the right image is the result of further dividing that blob into patches, where different gray levels represent different patches. A large blob is divided into many small patches, and since the depth values on the two sides of the contact boundary still change more sharply than in other areas, the regions on either side of the contact boundary are divided into many different patches.
S105: and traversing each patch, and aggregating all the patches belonging to the same target tracking object one by one to obtain all the complete target tracking objects in the target depth image.
In this embodiment, after the blob of the preset type is divided into different small connected regions patch in step S104, each patch may be further traversed, and all patches belonging to the same target tracking object are aggregated one by one, so as to obtain all complete target tracking objects in the target depth image.
In an optional implementation manner, the specific implementation process of step S105 may include the following steps B1-B2:
step B1: during the first pass of each patch, dividing the patch with high attribution confidence coefficient to the object tracking object; analyzing the patch with lower attribution confidence coefficient but larger area line by line, and dividing the patch into corresponding target tracking objects according to the proportion of the number of pixels of the target tracking object of the previous frame appearing in different segments of each line to the length of the current segment; the patch with lower attribution confidence but smaller area is marked as the to-be-processed patch.
Step B2: in the second pass of each patch to be processed, the patch to be processed is divided into target tracking objects closest to the target tracking object in three-dimensional distance.
Here, the attribution confidence of a patch refers to the proportion that the part of a previous-frame target tracking object's region appearing in the blob to which the patch belongs occupies in the total area of that blob. It can be understood that a higher proportion means a higher attribution confidence for the corresponding patch, and a lower proportion means a lower one.
It should be noted that for patches in the three different preset types of blobs (a blob containing one target tracking object stuck to the background, a blob containing stickiness between at least two target tracking objects, and a blob containing at least two target tracking objects stuck to the background), the overall idea of the traversal and aggregation process is the same, but the details of the specific implementations differ. This embodiment therefore describes the traversal and aggregation of patches in the three preset types of blobs separately:
(1) The first case is a patch in a blob containing one target tracking object stuck to the background. As shown in fig. 7, since the blob has been determined to be shared by a target tracking object and the background, each of its patches either belongs entirely to the target tracking object, belongs entirely to the background, or is shared by both. Most patches belong entirely to one or the other; only a few belong to both at once. During the first pass over the patches, an unprocessed patch is taken out and its foreground proportion for_ratio is computed; if for_ratio is smaller than a preset threshold for_ratio_threshold1, the patch necessarily belongs to the background. Otherwise the proportion pre_object_ratio that the part of the previous-frame target tracking object's region appearing in the current patch occupies in the patch is computed; if pre_object_ratio is greater than a higher threshold pre_object_ratio_threshold1, the patch is attributed to the target tracking object. Otherwise pre_object_ratio is compared with a smaller threshold pre_object_ratio_threshold2: if it is smaller, most of the patch's area belongs to the background, and according to the relationship between the patch's area and a small area threshold patch_area_threshold the patch is either attributed to the background (larger area) or marked as a patch to be processed in the second pass (smaller area). A patch whose pre_object_ratio is greater than the smaller threshold pre_object_ratio_threshold2 is analyzed further line by line (specifically, after the bounding rectangle of the patch is determined, the pixels of the patch in each row of the rectangle are analyzed row by row). For any line of the patch, its pixels are first split into several continuous segments according to whether they are contiguous; then, for each segment, the proportion that the number of pixels of the previous-frame tracked target appearing in the segment occupies in the segment's length is counted. If this proportion is greater than a threshold pre_object_ratio_threshold3, the segment is marked as belonging to the target tracking object; otherwise it is marked as belonging to the background. After all patches of the blob have been processed according to these steps, the second pass begins. In the second pass only the patches marked as pending are processed: for each such patch, the nearest three-dimensional distance to an adjacent patch already marked as belonging to the target tracking object is computed; if this distance is smaller than a distance threshold, the patch is marked as belonging to the target tracking object, otherwise as belonging to the background.
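A condensed sketch of this two-pass assignment for the object-plus-background case follows. Patch bookkeeping is reduced to per-patch boolean masks, the threshold values are assumptions, and the line-by-line segment analysis is abbreviated to a single overlap test, which simplifies the per-segment rule described above.

```python
import numpy as np

# Assumed example values: foreground threshold, high/low attribution thresholds,
# minimum patch area (pixels), and 3-D distance threshold.
FOR_T, HI_T, LO_T, AREA_T, DIST_T = 0.1, 0.9, 0.1, 200, 100.0

def centroid3d(mask, depth):
    ys, xs = np.nonzero(mask)
    return np.array([xs.mean(), ys.mean(), depth[ys, xs].mean()])

def assign_patches(patch_masks, fg_mask, prev_obj_mask, depth):
    """Two-pass assignment of patches to 'object' or 'background'."""
    label, pending = {}, []
    for pid, m in patch_masks.items():                     # first pass
        area = int(m.sum())
        if (m & fg_mask).sum() / area < FOR_T:
            label[pid] = 'background'                      # almost no foreground
            continue
        overlap = (m & prev_obj_mask).sum() / area         # attribution confidence
        if overlap > HI_T:
            label[pid] = 'object'
        elif overlap < LO_T:
            if area >= AREA_T:
                label[pid] = 'background'
            else:
                pending.append(pid)                        # defer to second pass
        else:
            # abbreviation of the line-by-line segment analysis described above
            label[pid] = 'object' if overlap > 0.5 else 'background'
    for pid in pending:                                    # second pass
        c = centroid3d(patch_masks[pid], depth)
        dists = [np.linalg.norm(c - centroid3d(patch_masks[q], depth))
                 for q, lab in label.items() if lab == 'object']
        near = min(dists, default=np.inf)
        label[pid] = 'object' if near < DIST_T else 'background'
    return label
```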
(2) The second case is a patch in a blob in which at least two target tracking objects are stuck to each other. The traversal and aggregation process is essentially the same as the process described in (1) for a blob in which a target tracking object is stuck to the background, with two differences. First, since the blob is known to be shared by several target tracking objects, a patch in the blob either belongs entirely to one target tracking object or is shared by several of them, and cannot belong to the background; the foreground proportion of the patch therefore does not need to be counted, and no attribution is made on its basis. Second, whereas in case (1) the attribution of a patch or a segment is decided from the proportion of the single target tracking object's previous-frame pixel region appearing in the current patch or segment, here the attribution must be decided by comparing these proportions across the target tracking objects: if the pre_object_ratio of one target tracking object is much higher than the threshold pre_object_ratio_threshold1, the patch is marked as belonging to that target; otherwise, the patch is handled according to its area, with a larger patch judged row by row and segment by segment, and a smaller patch assigned in the second pass to the target tracking object at the closest three-dimensional distance.
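A minimal sketch of this per-patch decision, under the same assumed threshold values as the sketch above; the function name and the "row_analysis"/"pending" markers are illustrative placeholders, not identifiers from the patent.

    PRE_OBJECT_RATIO_THRESHOLD1 = 0.8   # assumed: clear single-owner overlap
    PATCH_AREA_THRESHOLD = 50           # assumed small-patch area, in pixels

    def assign_patch_among_objects(overlap_by_object, patch_area):
        # overlap_by_object: {object_id: fraction of this patch covered by
        # that object's previous-frame region}
        best_id = max(overlap_by_object, key=overlap_by_object.get)
        if overlap_by_object[best_id] > PRE_OBJECT_RATIO_THRESHOLD1:
            return best_id           # confidently owned by one target
        if patch_area > PATCH_AREA_THRESHOLD:
            return "row_analysis"    # large ambiguous patch: judge row by row
        return "pending"             # small patch: nearest 3-D target in second pass

    # Example: one target clearly dominates the patch.
    print(assign_patch_among_objects({1: 0.9, 2: 0.05}, patch_area=200))  # -> 1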
FIG. 8 shows the result of re-segmenting the mutually touching tracked pedestrians of FIG. 6 (i.e., two target tracking objects in contact). Most of the segmented regions are correct; only a few pixels near the contact boundary are wrongly assigned to the other target, which has no effect on subsequent processing such as continuous pedestrian tracking or human body posture recognition.
(3) The third case is a patch in a blob in which at least two target tracking objects are stuck to each other and to the background. The traversal and aggregation process is essentially the same as the process described in (1) for a blob in which one target tracking object is stuck to the background, except that, once the foreground proportion of the patch exceeds the threshold for_ratio_threshold, the background is treated as a tracked object with number 0: its pre_object_ratio is compared with the pre_object_ratios of the other tracked objects and the maximum is found, and the patch is attributed to the background or to the foreground according to the relationship between this maximum pre_object_ratio and the thresholds pre_object_ratio_threshold1 and pre_object_ratio_threshold2, or is further processed row by row and segment by segment.
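A minimal sketch of this mixed case, reusing the assumed threshold values above; treating the background as pseudo-object 0 and the handling of the below-threshold branch are illustrative readings of the text, not definitions from the patent.

    FOR_RATIO_THRESHOLD = 0.5           # assumed value of for_ratio_threshold
    PRE_OBJECT_RATIO_THRESHOLD1 = 0.8   # assumed higher overlap threshold
    PRE_OBJECT_RATIO_THRESHOLD2 = 0.2   # assumed lower overlap threshold
    PATCH_AREA_THRESHOLD = 50           # assumed small-patch area, in pixels

    def assign_patch_mixed(for_ratio, overlap_by_object, overlap_background,
                           patch_area):
        if for_ratio < FOR_RATIO_THRESHOLD:
            return 0                                 # pure background patch
        candidates = dict(overlap_by_object)
        candidates[0] = overlap_background           # background joins as object 0
        best = max(candidates, key=candidates.get)
        if candidates[best] > PRE_OBJECT_RATIO_THRESHOLD1:
            return best                              # may be 0, i.e. the background
        if candidates[best] < PRE_OBJECT_RATIO_THRESHOLD2:
            # assumed: fall back to the area rule of case (1)
            return 0 if patch_area > PATCH_AREA_THRESHOLD else "pending"
        return "row_analysis"                        # judge row by row, segment by segment

    # Example: no candidate dominates, so the patch goes to row analysis.
    print(assign_patch_mixed(0.9, {1: 0.3, 2: 0.1}, 0.4, 200))  # -> row_analysis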
In summary, with the sticky foreground segmentation method for a depth image provided by this embodiment, after a target depth image to be segmented is acquired, the background and the foreground in which the target tracking objects are located are obtained, and connected region segmentation is performed on the target depth image to obtain the connected region blobs it contains. These blobs are then classified to obtain each type of blob contained in the image, the blobs of the preset types are divided into different small connected regions (patches) according to the preset division rules, and finally each patch is traversed so that all patches belonging to the same target tracking object are aggregated one by one, yielding all complete target tracking objects in the target depth image. In this way, the method classifies the blobs contained in the depth image, divides the blobs with a sticky foreground into patches, matches the patches to the target tracking objects one by one, and aggregates all patches belonging to the same tracked target into complete target tracking objects. It thereby segments the sticky foreground accurately using only the depth image, reduces the segmentation cost and the amount of computation, improves the real-time performance of the segmentation, and has a wide range of applications.
Second embodiment
In this embodiment, a sticky foreground segmentation apparatus for a depth image is described; for related content, reference may be made to the method embodiment above.
Referring to fig. 9, a schematic composition diagram of an apparatus for sticky foreground segmentation of depth images according to this embodiment is provided, where the apparatus includes:
an obtaining unit 901, configured to obtain a target depth image to be segmented; the target depth image comprises a background and a foreground where a target tracking object is located;
a segmentation unit 902, configured to obtain a background in the target depth image and a foreground where the target tracking object is located, and perform connected region segmentation on the target depth image to obtain each connected region blob included in the target depth image;
a classifying unit 903, configured to classify each blob included in the target depth image to obtain each type of blob included in the target depth image;
the dividing unit 904 is configured to divide the blob of the preset type into different small connected regions patch according to a preset dividing rule;
the obtaining unit 905 is configured to aggregate, by traversing each patch, all the patches belonging to the same target tracking object one by one, so as to obtain all the complete target tracking objects in the target depth image.
In an implementation manner of this embodiment, the classifying unit 903 includes:
the first determining subunit is used for determining that the type of the blob is the blob only containing the background when the foreground proportion in the blob is judged to be smaller than the foreground proportion threshold;
the second determining subunit is configured to, when the foreground proportion in the blob is judged to be not smaller than the foreground proportion threshold: determine that the blob is a blob containing only one target tracking object if the proportion of the blob's total area covered by the part of one target tracking object's previous-frame region that appears in the blob is larger than a first proportion threshold; determine that the blob is a blob containing only the background if this proportion is smaller than a second proportion threshold, the second proportion threshold being much smaller than the first; and determine that the blob is a blob in which the target tracking object is stuck to the background if this proportion is smaller than the first proportion threshold but not smaller than the second;
a third determining subunit, configured to, when the foreground proportion in the blob is judged to be not smaller than the foreground proportion threshold: determine that the blob is a blob containing only the background if the proportion of the blob's total area covered by the parts of at least two target tracking objects' previous-frame regions that appear in the blob is larger than a third proportion threshold while the number of effective target tracking objects, i.e. those whose proportion after normalization exceeds a normalization proportion threshold, is 0; invoke the second determining subunit to determine the type of the blob if that proportion is larger than the third proportion threshold and the number of effective target tracking objects is 1; and, if that proportion is larger than the third proportion threshold and the number of effective target tracking objects is greater than 1, add up the proportions of the blob's total area covered by those regions, determining that the blob is a blob in which at least two target tracking objects are stuck together if the summed proportion is larger than a fourth proportion threshold, and that it is a blob in which at least two target tracking objects are stuck to the background otherwise. A sketch of this classification logic in code follows this list.
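As a minimal sketch (not the patent's implementation), the decision logic of these three determining subunits can be written as follows. All threshold values are assumptions, and the branch for a combined overlap at or below the third proportion threshold, which the text does not spell out, is treated as background here.

    FG_RATIO_THRESHOLD = 0.5    # assumed foreground proportion threshold
    RATIO_THRESHOLD1 = 0.8      # first proportion threshold (assumed value)
    RATIO_THRESHOLD2 = 0.1      # second threshold, much smaller than the first (assumed)
    RATIO_THRESHOLD3 = 0.3      # third threshold, on the combined overlap (assumed)
    RATIO_THRESHOLD4 = 0.7      # fourth threshold, on the summed overlap (assumed)
    NORM_RATIO_THRESHOLD = 0.2  # normalization proportion threshold (assumed)

    def classify_single(overlap):
        # second determining subunit: one candidate target tracking object
        if overlap > RATIO_THRESHOLD1:
            return "single_object"
        if overlap < RATIO_THRESHOLD2:
            return "background_only"
        return "object_stuck_to_background"

    def classify_blob(fg_ratio, overlaps):
        # overlaps: per target, the fraction of the blob's total area covered
        # by that target's previous-frame region appearing in the blob
        if fg_ratio < FG_RATIO_THRESHOLD:
            return "background_only"              # first determining subunit
        total = sum(overlaps)
        if total <= RATIO_THRESHOLD3:
            return "background_only"              # assumed default branch
        effective = [r for r in overlaps if r / total > NORM_RATIO_THRESHOLD]
        if not effective:
            return "background_only"
        if len(effective) == 1:
            return classify_single(effective[0])  # falls back to the second subunit
        if sum(effective) > RATIO_THRESHOLD4:
            return "objects_stuck_together"
        return "objects_stuck_to_background"

    # Example: two targets each covering a large share of the blob.
    print(classify_blob(0.8, [0.5, 0.4]))  # -> objects_stuck_together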
In an implementation manner of this embodiment, the predetermined type of blob includes the following three adhesion type blobs:
a blob in which one target tracking object is stuck to the background;
a blob in which at least two target tracking objects are stuck to each other;
and a blob in which at least two target tracking objects are stuck to each other and to the background.
In an implementation manner of this embodiment, the dividing unit 904 includes:
the allocation subunit is used for taking each depth pixel point in the blob of the preset type as an independent patch respectively and allocating a patch data structure object to each pixel; the patch data structure comprises the serial number, the number of pixel points and the depth value of the patch;
an obtaining subunit, configured to treat each pair of adjacent patches in the blob as an edge; the structure of an edge includes the positions of its two end points and a weight;
the arrangement subunit is used for arranging all the edges in the blob in an ascending order according to the weight;
and the merging subunit is used for merging, for each edge that meets a preset condition on the relation between the edge's weight and the depth values of its two end points, the patches containing the edge's two end points into one patch (a sketch of this division follows the list).
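The division these subunits perform resembles a graph-based region merge over depth pixels. The following is a minimal, self-contained sketch under stated assumptions: the weight of an edge is taken as the absolute depth difference of its end points, and the preset merge condition is modelled as the weight being small relative to the end points' depth values; neither choice is specified by the patent.

    import numpy as np

    def split_blob_into_patches(depth, blob_mask, rel_tol=0.02):
        # depth: 2-D array of depth values; blob_mask: boolean mask of the blob.
        # Returns a mapping pixel -> representative pixel of its merged patch.
        parent = {p: p for p in zip(*np.nonzero(blob_mask))}

        def find(p):                         # union-find with path compression
            while parent[p] != p:
                parent[p] = parent[parent[p]]
                p = parent[p]
            return p

        # every pair of adjacent patches becomes one edge
        edges = []
        for (y, x) in parent:
            for dy, dx in ((0, 1), (1, 0)):  # 4-neighbourhood, each edge once
                q = (y + dy, x + dx)
                if q in parent:
                    weight = abs(float(depth[y, x]) - float(depth[q]))
                    edges.append((weight, (y, x), q))
        edges.sort(key=lambda e: e[0])       # ascending order by weight

        # merge the patches at the two end points when the edge weight is small
        # relative to the end points' depth values (assumed preset condition)
        for weight, p, q in edges:
            rp, rq = find(p), find(q)
            if rp != rq and weight < rel_tol * min(float(depth[p]), float(depth[q])):
                parent[rq] = rp
        return {p: find(p) for p in parent}

    # Example: three pixels near depth 1.0 merge into one patch; the pixel at
    # depth 2.5 stays separate.
    depth = np.array([[1.00, 1.01], [1.00, 2.50]])
    print(split_blob_into_patches(depth, np.ones((2, 2), dtype=bool)))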
In an implementation manner of this embodiment, the obtaining unit 905 includes:
the first traversal subunit is used for dividing the patch with high attribution confidence coefficient into the target tracking object during the first traversal of each patch; analyzing the patch with lower attribution confidence coefficient but larger area line by line, and dividing the patch into corresponding target tracking objects according to the confidence coefficient of each line of different segments; marking the patch with lower attribution confidence degree but smaller area as the to-be-processed patch;
and the second traversal subunit is used for assigning, in the second pass, each patch to be processed to the target tracking object at the closest three-dimensional distance.
In summary, with the sticky foreground segmentation apparatus for a depth image provided by this embodiment, after a target depth image to be segmented is acquired, the background and the foreground in which the target tracking objects are located are obtained, and connected region segmentation is performed on the target depth image to obtain the connected region blobs it contains. These blobs are then classified to obtain each type of blob contained in the image, the blobs of the preset types are divided into different small connected regions (patches) according to the preset division rules, and finally each patch is traversed so that all patches belonging to the same target tracking object are aggregated one by one, yielding all complete target tracking objects in the target depth image. In this way, the apparatus classifies the blobs contained in the depth image, divides the blobs with a sticky foreground into patches, matches the patches to the target tracking objects one by one, and aggregates all patches belonging to the same tracked target into complete target tracking objects. It thereby segments the sticky foreground accurately using only the depth image, reduces the segmentation cost and the amount of computation, improves the real-time performance of the segmentation, and has a wide range of applications.
Further, the embodiment of the present application also provides a sticky foreground segmentation apparatus for a depth image, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is configured to store one or more programs, the one or more programs including instructions, which when executed by the processor, cause the processor to perform any of the above-described methods of sticky foreground segmentation of depth images.
Further, an embodiment of the present application also provides a computer-readable storage medium, wherein the computer-readable storage medium stores instructions which, when run on a terminal device, cause the terminal device to execute any implementation of the foregoing sticky foreground segmentation method for a depth image.
As can be seen from the above description of the embodiments, those skilled in the art can clearly understand that all or part of the steps in the above embodiment methods can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway, etc.) to execute the method according to the embodiments or some parts of the embodiments of the present application.
It should be noted that, in the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A sticky foreground segmentation method of a depth image is characterized by comprising the following steps:
acquiring a target depth image to be segmented; the target depth image comprises a background and a foreground where a target tracking object is located;
acquiring a background in the target depth image and a foreground where a target tracking object is located, and performing connected region segmentation on the target depth image to obtain connected region blobs contained in the target depth image;
classifying each blob contained in the target depth image to obtain each type of blob contained in the target depth image;
dividing the blob of the preset type into different small connected regions patch according to a preset dividing rule;
and traversing each patch, and aggregating all the patches belonging to the same target tracking object one by one to obtain all complete target tracking objects in the target depth image.
2. The method of claim 1, wherein the classifying the blobs included in the target depth image to obtain the blobs of each type included in the target depth image comprises:
S1: when the foreground proportion in the blob is judged to be smaller than a foreground proportion threshold, determining that the type of the blob is a blob containing only the background;
S2: when the foreground proportion in the blob is judged to be not smaller than the foreground proportion threshold: if the proportion of the blob's total area covered by the part of one target tracking object's previous-frame region that appears in the blob is judged to be larger than a first proportion threshold, determining that the type of the blob is a blob containing only one target tracking object; if this proportion is judged to be smaller than a second proportion threshold, determining that the type of the blob is a blob containing only the background, the second proportion threshold being much smaller than the first proportion threshold; and if this proportion is smaller than the first proportion threshold and not smaller than the second proportion threshold, determining that the type of the blob is a blob in which the target tracking object is stuck to the background;
S3: when the foreground proportion in the blob is judged to be not smaller than the foreground proportion threshold: if the proportion of the blob's total area covered by the parts of at least two target tracking objects' previous-frame regions that appear in the blob is judged to be larger than a third proportion threshold and the number of effective target tracking objects whose proportion after normalization is larger than a normalization proportion threshold is 0, determining that the type of the blob is a blob containing only the background; if that proportion is larger than the third proportion threshold and the number of effective target tracking objects is 1, repeating step S2 to determine the type of the blob; and if that proportion is larger than the third proportion threshold and the number of effective target tracking objects is greater than 1, adding up the proportions of the blob's total area covered by those regions, determining that the type of the blob is a blob in which at least two target tracking objects are stuck together if the summed proportion is larger than a fourth proportion threshold, and determining that the type of the blob is a blob in which at least two target tracking objects are stuck to the background if the summed proportion is not larger than the fourth proportion threshold.
3. The method of claim 1, wherein the blob of the preset type comprises the following three sticky types of blob:
a blob in which one target tracking object is stuck to the background;
a blob in which at least two target tracking objects are stuck to each other;
and a blob in which at least two target tracking objects are stuck to each other and to the background.
4. The method according to claim 1 or 3, wherein the dividing the blob of the preset type into different small connected regions patch according to a preset dividing rule comprises:
taking each depth pixel point in the blob of the preset type as an independent patch, and allocating a patch data structure object to each pixel, wherein the patch data structure comprises the serial number, the number of pixel points and the depth value of the patch;
treating each pair of adjacent patches in the blob as an edge, wherein the structure of an edge includes the positions of its two end points and a weight;
arranging all edges in the blob in an ascending order according to the weight;
and combining the patches where the two end points in the edges meeting the preset condition are positioned into one patch according to the relation between the weight of each edge in the blob and the depth values of the two end points.
5. The method according to claim 1, wherein the aggregating, by traversing each patch, all the patches belonging to the same target tracking object one by one to obtain a complete target tracking object comprises:
in the first pass over the patches, assigning a patch with high attribution confidence to the target tracking object; analysing a patch with lower attribution confidence but a larger area row by row, and assigning the different segments of each row to the corresponding target tracking objects according to the proportion of the number of previous-frame target tracking object pixels appearing in each segment to the length of that segment; and marking a patch with lower attribution confidence and a smaller area as a patch to be processed;
and, in the second pass, assigning each patch to be processed to the target tracking object at the closest three-dimensional distance.
6. A sticky foreground segmentation apparatus for depth images, comprising:
the device comprises an acquisition unit, a segmentation unit and a segmentation unit, wherein the acquisition unit is used for acquiring a target depth image to be segmented; the target depth image comprises a background and a foreground where a target tracking object is located;
the segmentation unit is used for acquiring a background in the target depth image and a foreground where a target tracking object is located, and performing connected region segmentation on the target depth image to obtain connected region blobs contained in the target depth image;
the classification unit is used for classifying each blob contained in the target depth image to obtain each type of blob contained in the target depth image;
the dividing unit is used for dividing the blob of the preset type into different small connected regions patch according to a preset dividing rule;
and the obtaining unit is used for aggregating all the patches belonging to the same target tracking object one by one through traversing each patch to obtain all the complete target tracking objects in the target depth image.
7. The apparatus of claim 6, wherein the classification unit comprises:
the first determining subunit is used for determining that the type of the blob is the blob only containing the background when the foreground proportion in the blob is judged to be smaller than the foreground proportion threshold;
the second determining subunit is configured to, when the foreground proportion in the blob is judged to be not smaller than the foreground proportion threshold: determine that the blob is a blob containing only one target tracking object if the proportion of the blob's total area covered by the part of one target tracking object's previous-frame region that appears in the blob is larger than the first proportion threshold; determine that the blob is a blob containing only the background if this proportion is smaller than a second proportion threshold, the second proportion threshold being much smaller than the first; and determine that the blob is a blob in which the target tracking object is stuck to the background if this proportion is smaller than the first proportion threshold but not smaller than the second;
a third determining subunit, configured to, when the foreground proportion in the blob is judged to be not smaller than the foreground proportion threshold: determine that the blob is a blob containing only the background if the proportion of the blob's total area covered by the parts of at least two target tracking objects' previous-frame regions that appear in the blob is larger than a third proportion threshold while the number of effective target tracking objects, i.e. those whose proportion after normalization exceeds a normalization proportion threshold, is 0; invoke the second determining subunit to determine the type of the blob if that proportion is larger than the third proportion threshold and the number of effective target tracking objects is 1; and, if that proportion is larger than the third proportion threshold and the number of effective target tracking objects is greater than 1, add up the proportions of the blob's total area covered by those regions, determining that the blob is a blob in which at least two target tracking objects are stuck together if the summed proportion is larger than a fourth proportion threshold, and that it is a blob in which at least two target tracking objects are stuck to the background otherwise.
8. The apparatus of claim 6, wherein the blob of the preset type comprises the following three sticky types of blob:
a blob in which one target tracking object is stuck to the background;
a blob in which at least two target tracking objects are stuck to each other;
and a blob in which at least two target tracking objects are stuck to each other and to the background.
9. The apparatus according to claim 6 or 8, wherein the dividing unit comprises:
the allocation subunit is used for taking each depth pixel point in the blob of the preset type as an independent patch respectively and allocating a patch data structure object to each pixel; the patch data structure comprises the serial number, the number of pixel points and the depth value of the patch;
an obtaining subunit, configured to treat each pair of adjacent patches in the blob as an edge; the structure of an edge includes the positions of its two end points and a weight;
the arrangement subunit is used for arranging all the edges in the blob in an ascending order according to the weight;
and the merging subunit is used for merging the patches where the two end points in the edges which meet the preset condition are located into one patch according to the relation between the weight of each edge in the blob and the depth values of the two end points.
10. The apparatus of claim 6, wherein the obtaining unit comprises:
the first traversal subunit is used for, in the first pass over the patches: assigning a patch with high attribution confidence to the target tracking object; analysing a patch with lower attribution confidence but a larger area row by row, and assigning the different segments of each row to the corresponding target tracking objects according to the proportion of the number of previous-frame target tracking object pixels appearing in each segment to the length of that segment; and marking a patch with lower attribution confidence and a smaller area as a patch to be processed;
and the second traversal subunit is used for assigning, in the second pass, each patch to be processed to the target tracking object at the closest three-dimensional distance.
11. A sticky foreground segmentation apparatus for depth images, comprising: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is to store one or more programs, the one or more programs comprising instructions, which when executed by the processor, cause the processor to perform the method of any of claims 1-5.
12. A computer-readable storage medium having stored therein instructions that, when executed on a terminal device, cause the terminal device to perform the method of any one of claims 1-5.
CN202010191067.3A 2020-03-18 2020-03-18 Method and device for segmenting adhesion foreground of depth image Active CN111429487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010191067.3A CN111429487B (en) 2020-03-18 2020-03-18 Method and device for segmenting adhesion foreground of depth image


Publications (2)

Publication Number Publication Date
CN111429487A true CN111429487A (en) 2020-07-17
CN111429487B CN111429487B (en) 2023-10-24

Family

ID=71546478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010191067.3A Active CN111429487B (en) 2020-03-18 2020-03-18 Method and device for segmenting adhesion foreground of depth image

Country Status (1)

Country Link
CN (1) CN111429487B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9355334B1 (en) * 2013-09-06 2016-05-31 Toyota Jidosha Kabushiki Kaisha Efficient layer-based object recognition
CN104952056A (en) * 2014-03-26 2015-09-30 株式会社理光 Object detecting method and system based on stereoscopic vision
CN105760846A (en) * 2016-03-01 2016-07-13 北京正安维视科技股份有限公司 Object detection and location method and system based on depth data
US9965865B1 (en) * 2017-03-29 2018-05-08 Amazon Technologies, Inc. Image data segmentation using depth data
WO2019129919A1 (en) * 2017-12-28 2019-07-04 Nokia Technologies Oy An apparatus, a method and a computer program for volumetric video
CN109035295A (en) * 2018-06-25 2018-12-18 广州杰赛科技股份有限公司 Multi-object tracking method, device, computer equipment and storage medium
CN110136174A (en) * 2019-05-22 2019-08-16 北京华捷艾米科技有限公司 A kind of target object tracking and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Rui; Li Jintao: "A Survey of Scene Segmentation Algorithms Based on Deep Learning" *
Jing Zhuangwei; Guan Haiyan; Peng Daifeng; Yu Yongtao: "A Survey of Image Semantic Segmentation Based on Deep Neural Networks" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112883233A (en) * 2021-01-26 2021-06-01 济源职业技术学院 5G audio and video recorder
CN112883233B (en) * 2021-01-26 2024-02-09 济源职业技术学院 5G audio and video recorder
CN114419070A (en) * 2022-01-21 2022-04-29 北京字跳网络技术有限公司 Image scene segmentation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111429487B (en) 2023-10-24

Similar Documents

Publication Publication Date Title
CN113781402B (en) Method and device for detecting scratch defects on chip surface and computer equipment
Marzougui et al. A lane tracking method based on progressive probabilistic Hough transform
CN108171104A (en) A kind of character detecting method and device
Sheng et al. Geometric occlusion analysis in depth estimation using integral guided filter for light-field image
US20170178341A1 (en) Single Parameter Segmentation of Images
KR20060048642A (en) Color segmentation-based stereo 3d reconstruction system and process
WO2017041488A1 (en) Fingerprint ridge point recognition method and apparatus
CN114897773B (en) Method and system for detecting distorted wood based on image processing
CN107194393B (en) Method and device for detecting temporary license plate
CN113449606B (en) Target object identification method and device, computer equipment and storage medium
CN111950394A (en) Method and device for predicting lane change of vehicle and computer storage medium
CN111126393A (en) Vehicle appearance refitting judgment method and device, computer equipment and storage medium
CN111429487B (en) Method and device for segmenting adhesion foreground of depth image
CN111383244A (en) Target detection tracking method
CN108830319A (en) A kind of image classification method and device
CN107578011A (en) The decision method and device of key frame of video
CN111476804A (en) Method, device and equipment for efficiently segmenting carrier roller image and storage medium
CN110889817B (en) Image fusion quality evaluation method and device
CN117037103A (en) Road detection method and device
CN114998317A (en) Lens occlusion detection method and device, camera device and storage medium
CN112560856B (en) License plate detection and identification method, device, equipment and storage medium
CN110765875B (en) Method, equipment and device for detecting boundary of traffic target
Deng et al. Texture edge-guided depth recovery for structured light-based depth sensor
CN111027560B (en) Text detection method and related device
Chen et al. Research on safe distance measuring method of front vehicle in foggy environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant