CN111311603A - Method and apparatus for outputting target object number information

Info

Publication number
CN111311603A
Authority
CN
China
Prior art keywords
target object
frame image
image
regression model
foreground region
Legal status
Pending
Application number
CN201811519247.9A
Other languages
Chinese (zh)
Inventor
董博
李艺
Current Assignee
Beijing Jingdong Qianshi Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201811519247.9A
Publication of CN111311603A

Links

Images

Classifications

    • G06T 7/11 Image analysis: Segmentation; Edge detection: Region-based segmentation
    • G06T 7/194 Image analysis: Segmentation; Edge detection: involving foreground-background segmentation
    • G06T 2207/30196 Subject of image; Context of image processing: Human being; Person
    • G06T 2207/30242 Subject of image; Context of image processing: Counting objects in image
    (all within G Physics; G06 Computing, Calculating or Counting; G06T Image data processing or generation, in general; G06T 2207/00 is the indexing scheme for image analysis or image enhancement)

Abstract

The embodiment of the application discloses a method and a device for outputting target object number information. One embodiment of the method comprises: acquiring a frame image on which at least one target object is displayed, and performing super-pixel segmentation on the frame image; determining the distance between the super pixel in the frame image and the corresponding super pixel in the preset background image; carrying out motion detection on a target object displayed in the frame image to obtain motion information of the target object in the frame image; based on the distance and the motion information of the target object, carrying out image segmentation on the frame image to obtain a foreground area in the frame image, wherein the foreground area comprises an area where the target object displayed in the frame image is located; and extracting image characteristic values of the foreground region, and inputting the extracted image characteristic values into a preset target object prediction regression model to output the number information of the target objects contained in the foreground region. This embodiment improves the accuracy of the predicted target object number information.

Description

Method and apparatus for outputting target object number information
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method and a device for outputting target object number information.
Background
With the development of computer technology and image processing technology, video-based intelligent monitoring systems are widely used. They play a great role in ensuring public safety and traffic safety, protecting people's lives and property, and ensuring safe production and product inspection in the industrial control field and related commercial fields.
Counting target objects such as pedestrian flows and vehicle flows to obtain number information is important for industries such as supermarkets, shopping malls, and transportation. Taking pedestrians as the target object as an example, the output pedestrian number information can be used to assist management, allocating manpower and material resources reasonably so that limited resources are used efficiently; or the crowd density can be controlled reasonably according to the counted pedestrian number information to prevent safety accidents caused by overcrowding. Therefore, accurately obtaining the number information of target objects from images captured by monitoring equipment such as cameras plays an important role in daily production and life.
Disclosure of Invention
The embodiment of the application provides a method and a device for outputting target object number information.
In a first aspect, an embodiment of the present application provides a method for outputting information on the number of target objects, where the method includes: acquiring a frame image on which at least one target object is displayed, and performing super-pixel segmentation on the frame image; determining the distance between the super pixel in the frame image and the corresponding super pixel in the preset background image; carrying out motion detection on a target object displayed in the frame image to obtain motion information of the target object in the frame image; based on the distance and the motion information of the target object, carrying out image segmentation on the frame image to obtain a foreground area in the frame image, wherein the foreground area comprises an area where the target object displayed in the frame image is located; and extracting image characteristic values of the foreground region, and inputting the extracted image characteristic values into a preset target object prediction regression model to output the number information of the target objects contained in the foreground region.
In some embodiments, determining a distance between a superpixel in the frame image and a corresponding superpixel in the preset background image comprises: extracting the characteristics of the superpixels in the frame image and the superpixels in the background image; based on the extracted features, Euclidean distances between the superpixels in the frame image and the corresponding superpixels in the preset background image are determined.
In some embodiments, the performing motion detection on the target object displayed in the frame image to obtain motion information of the target object in the frame image includes: acquiring a previous frame image adjacent to the frame image, and performing super-pixel segmentation on the previous frame image; based on an optical flow method, detecting superpixels in a frame image and corresponding superpixels in a previous frame image, and determining the dynamic characteristics of the superpixels in the frame image to obtain the motion information of the target object in the frame image.
In some embodiments, the preset target object prediction regression model is obtained by training through the following steps: acquiring a training sample set, wherein the training sample comprises image characteristic values extracted from a foreground region of a sample image and target object number information contained in the foreground region of the sample image; and establishing a correlation vector machine regression model by adopting a sparse Bayesian learning algorithm, respectively taking image characteristic values extracted from foreground regions of sample images in training samples in a training sample set and target object number information contained in the foreground regions of the sample images as input and expected output of the correlation vector machine regression model, and training the correlation vector machine regression model to obtain a target object prediction regression model.
In some embodiments, the preset target object prediction regression model is obtained by training through the following steps: acquiring a training sample set, wherein the training sample comprises image characteristic values extracted from a foreground region of a sample image and target object number information contained in the foreground region of the sample image; replacing Gaussian distribution in a regression model of a correlation vector machine with Poisson distribution to obtain a sparse Bayesian Poisson regression model; and respectively taking the image characteristic value extracted from the foreground region of the sample image in the training samples in the training sample set and the target object number information contained in the foreground region of the sample image as the input and the expected output of a sparse Bayesian Poisson regression model, and training the sparse Bayesian Poisson regression model to obtain a target object prediction regression model.
In a second aspect, an embodiment of the present application provides an apparatus for outputting information on the number of target objects, where the apparatus includes: a superpixel segmentation unit configured to acquire a frame image on which at least one target object is displayed, and perform superpixel segmentation on the frame image; a determining unit configured to determine a distance between a super pixel in the frame image and a corresponding super pixel in a preset background image; the detection unit is configured to perform motion detection on a target object displayed in the frame image to obtain motion information of the target object in the frame image; the image segmentation unit is configured to perform image segmentation on the frame image to obtain a foreground area in the frame image based on the distance and the motion information of the target object, wherein the foreground area comprises an area where the target object displayed in the frame image is located; and the target object number information output unit is configured to extract image characteristic values of the foreground region, and input the extracted image characteristic values into a preset target object prediction regression model to output target object number information contained in the foreground region.
In some embodiments, the determining unit is further configured to: extracting the characteristics of the superpixels in the frame image and the superpixels in the background image; based on the extracted features, Euclidean distances between the superpixels in the frame image and the corresponding superpixels in the preset background image are determined.
In some embodiments, the detection unit is further configured to: acquiring a previous frame image adjacent to the frame image, and performing super-pixel segmentation on the previous frame image; based on an optical flow method, detecting superpixels in a frame image and corresponding superpixels in a previous frame image, and determining the dynamic characteristics of the superpixels in the frame image to obtain the motion information of the target object in the frame image.
In some embodiments, the preset target object prediction regression model is obtained by training through the following steps: acquiring a training sample set, wherein the training sample comprises image characteristic values extracted from a foreground region of a sample image and target object number information contained in the foreground region of the sample image; and establishing a correlation vector machine regression model by adopting a sparse Bayesian learning algorithm, respectively taking image characteristic values extracted from foreground regions of sample images in training samples in a training sample set and target object number information contained in the foreground regions of the sample images as input and expected output of the correlation vector machine regression model, and training the correlation vector machine regression model to obtain a target object prediction regression model.
In some embodiments, the preset target object prediction regression model is obtained by training through the following steps: acquiring a training sample set, wherein the training sample comprises image characteristic values extracted from a foreground region of a sample image and target object number information contained in the foreground region of the sample image; replacing Gaussian distribution in a regression model of a correlation vector machine with Poisson distribution to obtain a sparse Bayesian Poisson regression model; and respectively taking the image characteristic value extracted from the foreground region of the sample image in the training samples in the training sample set and the target object number information contained in the foreground region of the sample image as the input and the expected output of a sparse Bayesian Poisson regression model, and training the sparse Bayesian Poisson regression model to obtain a target object prediction regression model.
The method and the device for outputting the number information of the target objects, provided by the embodiment of the application, firstly obtain a frame image displaying at least one target object, perform superpixel segmentation on the frame image, then determine a distance between a superpixel in the frame image and a corresponding superpixel in a preset background image, perform motion detection on the target object displayed in the frame image to obtain motion information of the target object in the frame image, then perform image segmentation on the frame image based on the distance and the motion information of the target object to obtain a foreground region in the frame image, finally perform image characteristic value extraction on the foreground region, input the extracted image characteristic value into a preset target object prediction regression model and output the number information of the target object contained in the foreground region, and therefore improve the accuracy of the predicted number information of the target object.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for outputting target object count information according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for outputting target object number information according to the present application;
FIG. 4 is a flow chart of yet another embodiment of a method for outputting target object count information according to the present application;
FIG. 5 is a schematic diagram of an embodiment of an apparatus for outputting target object count information according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the present method for outputting target object number information or an apparatus for outputting target object number information may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as image viewing software, web browsers, search-type applications, instant messaging tools, mailbox clients, social platform software, and the like.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen and supporting image saving and browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. They may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not particularly limited herein.
The server 105 may be a server that provides various services, such as a background server that processes images transmitted on the terminal devices 101, 102, 103. The background server may perform processing such as segmentation and feature extraction on the received image, and feed back the processing result (e.g., output information on the number of target objects) to the terminal device.
It should be noted that the method for outputting the target object number information provided in the embodiment of the present application is generally performed by the server 105, and accordingly, the apparatus for outputting the target object number information is generally disposed in the server 105.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. This is not particularly limited herein. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be further noted that the terminal devices 101, 102, and 103 may also be installed with an image processing application, and the terminal devices 101, 102, and 103 may also perform image segmentation and feature extraction on the image to be processed based on the image processing application, in this case, the method for outputting the target object number information may also be executed by the terminal devices 101, 102, and 103, and accordingly, the apparatus for outputting the target object number information may also be installed in the terminal devices 101, 102, and 103. At this point, the exemplary system architecture 100 may not have the server 105 and the network 104.
Further, the system architecture 100 may also include an image acquisition device (not shown), such as a camera, used for capturing images of areas such as supermarkets and intersections to obtain frame images and background images. The above-described terminal apparatuses 101, 102, 103 may acquire the captured frame images and background images from the image acquisition device and transmit the acquired images to the server 105.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for outputting target object count information in accordance with the present application is shown. The method for outputting the number information of the target objects comprises the following steps:
step 201, acquiring a frame image displaying at least one target object, and performing superpixel segmentation on the frame image.
In the present embodiment, an execution subject of the method for outputting the target object number information (for example, the server shown in fig. 1) may acquire a frame image on which at least one target object is displayed, by a wired connection or a wireless connection. Then, the execution subject may perform superpixel segmentation on the acquired frame image, thereby dividing it into a number of superpixels. It can be understood that the frame image may be captured by a camera and then stored in a terminal device, in which case the execution subject may acquire the frame image from the terminal device by a wired or wireless connection; alternatively, the execution subject may acquire the captured frame image directly from the camera by a wired or wireless connection. The target object may be a moving object such as a pedestrian, a vehicle, or an animal, and is not particularly limited herein. It should be noted that the wireless connection means may include, but is not limited to, 3G/4G connections, WiFi connections, Bluetooth connections, WiMAX connections, Zigbee connections, UWB (Ultra Wideband) connections, and other wireless connection means now known or developed in the future.
In general, a super-pixel may refer to an irregular block of pixels with some visual significance made up of adjacent pixels having similar texture, color, brightness, etc. The super-pixel segmentation technique is to group pixels by using the similarity of features between pixels, and to express image features by replacing a large number of pixels with a small number of super-pixels. Therefore, the superpixel segmentation technique can greatly reduce the complexity of image processing. Further, superpixel segmentation may avoid the occurrence of pixel holes and noise as compared to pixel-level segmentation of images.
Step 202, determining the distance between the super pixel in the frame image and the corresponding super pixel in the preset background image.
In this embodiment, the execution subject (e.g., the server shown in fig. 1) described above may acquire a background image in advance. The background image and the frame image may be images captured by the same camera, and the background image is different from the frame image in that the target object does not exist in the background image. Then, the executing body may perform superpixel segmentation on the acquired background image to obtain superpixels of the background image. Finally, based on the superpixels of the frame image obtained in step 201, the execution subject may determine the distance between the superpixels of the frame image and the corresponding superpixels in the background image. It will be appreciated that the resulting distances may be used to characterize the similarity between the superpixels of the frame image and the corresponding superpixels in the background image.
Specifically, the frame image and the background image may be divided into N corresponding superpixels, and the ith superpixel of the frame image may correspond to the ith superpixel of the background image, where i is greater than or equal to 1 and less than or equal to N, and i and N are positive integers. The execution subject may calculate a distance between an ith super pixel of the frame image and an ith super pixel of the background image, and thus the execution subject may determine a distance between each super pixel in the frame image and a corresponding super pixel in the background image. Alternatively, the distance between the super pixel in the frame image and the corresponding super pixel in the background image may be an euclidean distance, a cosine distance, a hamming distance, or the like, and there is no unique limitation here.
In some optional implementations of this embodiment, the distance between a superpixel of the frame image and the corresponding superpixel in the preset background image may be a Euclidean distance. The above-mentioned execution subject may determine the Euclidean distance by: extracting features of the superpixels in the frame image and of the superpixels in the background image; and determining, based on the extracted features, the Euclidean distances between the superpixels in the frame image and the corresponding superpixels in the preset background image. Specifically, for a frame image containing N superpixels, the execution subject may perform feature extraction on the ith superpixel in the frame image to obtain its feature vector $x_i = (x_{i1}, \ldots, x_{ij}, \ldots, x_{iJ})$, and may likewise perform feature extraction on the ith superpixel in the background image to obtain its feature vector $y_i = (y_{i1}, \ldots, y_{ij}, \ldots, y_{iJ})$. Here N, J, i and j are positive integers, i ranges from 1 to N, J denotes the number of features extracted from a superpixel, j ranges from 1 to J, $x_{ij}$ is the jth feature of the ith superpixel of the frame image, and $y_{ij}$ is the jth feature of the ith superpixel of the background image. Finally, the Euclidean distance $d(x_i, y_i)$ between the ith superpixel in the frame image and the ith superpixel in the background image can be calculated using the following formula:

$$d(x_i, y_i) = \sqrt{\sum_{j=1}^{J} \left(x_{ij} - y_{ij}\right)^2}$$
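This distance computation vectorizes naturally. Below is a minimal sketch, assuming the per-superpixel feature vectors have already been extracted into (N, J) arrays for the frame image and the background image; the function name and toy data are illustrative, not from the patent.

```python
# Per-superpixel Euclidean distances d(x_i, y_i), vectorized over all N
# superpixels. frame_feats and bg_feats are assumed to hold the J features of
# each of the N corresponding superpixels, row i for superpixel i.
import numpy as np

def superpixel_distances(frame_feats: np.ndarray, bg_feats: np.ndarray) -> np.ndarray:
    """frame_feats, bg_feats: shape (N, J); returns shape (N,) distances."""
    return np.sqrt(np.sum((frame_feats - bg_feats) ** 2, axis=1))

# Toy usage: N = 4 superpixels, J = 3 features each.
rng = np.random.default_rng(0)
d = superpixel_distances(rng.random((4, 3)), rng.random((4, 3)))
print(d)  # one distance per superpixel
```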
in some optional implementations of this embodiment, in the process of outputting the number information of the target objects acquired by a certain camera, the background image used in the process may be an image that is acquired by the camera most recently and does not have the target object. Therefore, in the method for outputting the number information of the target objects disclosed in this embodiment, the background image is not uniform, and when a new image without the target object is acquired by the camera, the original background image may be replaced with the newly acquired image without the target object.
Step 203, performing motion detection on the target object displayed in the frame image to obtain motion information of the target object in the frame image.
In this embodiment, the executing entity may detect a moving target object in the frame image by using the target object in the frame image as a detection target, thereby obtaining motion information of the target object. Here, the execution subject described above may obtain the motion information of the target object in the frame image by determining the motion information present in each super pixel of the frame image. The motion information present in any superpixel in the frame image can be represented by the probability mean of the motion information of all pixels in the superpixel. Further, the execution subject may perform motion detection on the target object displayed in the frame image by using various means to obtain the target object motion information in the frame image. The executing agent may obtain the motion information of the target object by using different methods such as an inter-frame difference method, a background subtraction method, an optical flow method, or the like, which is not limited herein.
And 204, carrying out image segmentation on the frame image based on the distance and the motion information of the target object to obtain a foreground area in the frame image.
In this embodiment, based on the distance between the super pixel in the frame image obtained in step 202 and the corresponding super pixel in the background image, and based on the motion information of the target object in the frame image obtained in step 203, the execution subject may determine the super pixel belonging to the foreground region in the frame image by combining the distance and the motion information of the target object, so as to segment the foreground region in the frame image to obtain the foreground region of the frame image. Therefore, the executing body can realize the purpose of segmenting the foreground area and the background area of the frame image to obtain the foreground area of the frame image. The foreground region of the frame image comprises a region where the target object is located in the frame image.
Step 205, extracting image characteristic values of the foreground region, and inputting the extracted image characteristic values into a preset target object prediction regression model to output the number information of the target objects contained in the foreground region.
In this embodiment, a target object prediction regression model may be trained in advance, and the target object prediction regression model may be used to represent a correspondence between image features in a foreground region and target object number information included in the foreground region. After the execution subject obtains the foreground regions in the frame image, the execution subject may extract image feature values of the independent foreground regions, and then input the extracted image feature values of the independent foreground regions into the target object prediction regression model, and the target object prediction regression model may output information on the number of target objects displayed in the corresponding foreground regions.
In some optional implementations of the present embodiment, the target object prediction regression model may be obtained by training through the following steps:
in a first step, a set of training samples is obtained. The training sample set may include a plurality of training samples, and each training sample may include image feature values extracted from a foreground region of a sample image and target object number information included in the foreground region of the sample image;
and secondly, establishing a correlation vector machine regression model by adopting a sparse Bayesian learning algorithm, and respectively taking image features extracted from a foreground region of a sample image in training samples in a training sample set and target object number information contained in the foreground region of the sample image as input and expected output of the established correlation vector machine regression model so as to train the correlation vector machine regression model, wherein the trained correlation vector machine regression model is the target object prediction regression model.
It can be understood that the relevance vector machine regression model established by the sparse Bayesian learning algorithm has the characteristic of high calculation speed of the sparse Bayesian algorithm, so that the calculation cost of model training is reduced, and the calculation resources of the model training are saved. The execution subject inputs image feature values extracted from the foreground region of the frame image into a trained target object prediction regression model, which can output information on the number of target objects contained in the foreground region.
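As a concrete illustration, a minimal training sketch follows. scikit-learn ships no relevance vector machine, so ARDRegression, its sparse Bayesian linear regressor, stands in here for the patent's RVM regression model; the file names and array shapes are assumptions.

```python
# Training a sparse Bayesian regressor on (foreground features -> object count)
# pairs. ARDRegression is a stand-in for the relevance vector machine model
# described in the patent; load paths and shapes are hypothetical.
import numpy as np
from sklearn.linear_model import ARDRegression

X_train = np.load("foreground_features.npy")  # shape (n_samples, n_features)
y_train = np.load("object_counts.npy")        # shape (n_samples,)

model = ARDRegression()
model.fit(X_train, y_train)

# Predict the number of target objects for a new foreground region.
print("predicted count:", model.predict(X_train[:1])[0])
```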
In some optional implementations of the present embodiment, the target object prediction regression model may be further trained by:
in a first step, a set of training samples is obtained. The training sample set may include a plurality of training samples, and each training sample may include image feature values extracted from a foreground region of a sample image and target object number information included in the foreground region of the sample image;
secondly, substituting the Gaussian distribution in the established correlation vector machine regression model by Poisson distribution to obtain a sparse Bayesian Poisson regression model;
and thirdly, respectively taking the image characteristic value extracted from the foreground region of the sample image in the training samples in the obtained training sample set and the target object number information contained in the foreground region of the sample image as the input and the expected output of a sparse Bayesian Poisson regression model so as to train the sparse Bayesian Poisson regression model, thereby obtaining a target object prediction regression model.
It can be understood that the target object prediction regression model obtained through the training in the above steps may be a sparse bayesian regression model based on a correlation vector machine, and the sparse processing of the model may reduce the calculation cost of the model training. Furthermore, the model adopts the characteristic that Poisson regression is an integer, so that the number of the target objects output by the target object prediction regression model is an integer, and the model is more in line with the actual requirement. The executing body inputs the image feature value extracted from the foreground region of the frame image into a trained target object prediction regression model, and the target object prediction regression model can output the number of target objects which are contained in the foreground region and are integers.
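No off-the-shelf sparse Bayesian Poisson regressor exists in scikit-learn, so the sketch below uses PoissonRegressor as a rough stand-in: it shares the count-valued output behavior but not the sparse Bayesian formulation; implementing the patent's exact model would require pairing an RVM prior with a Poisson likelihood.

```python
# A rough stand-in for the sparse Bayesian Poisson variant. PoissonRegressor
# models count-valued targets via a Poisson GLM; file names and shapes are
# hypothetical.
import numpy as np
from sklearn.linear_model import PoissonRegressor

X_train = np.load("foreground_features.npy")  # shape (n_samples, n_features)
y_train = np.load("object_counts.npy")        # shape (n_samples,)

model = PoissonRegressor(alpha=1.0)  # alpha: L2 regularization strength, illustrative
model.fit(X_train, y_train)

# Round the predicted Poisson mean to obtain an integer object count.
count = int(round(model.predict(X_train[:1])[0]))
print("predicted integer count:", count)
```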
In some optional implementations of this embodiment, the executing body may perform image feature value extraction on each independent foreground region, and the extracted image feature value may include at least one of: region size information, pixel information, region perimeter information, perimeter and region size ratio, boundary information, texture information, and shape information.
The region size information may include the number of pixels in the independent foreground region. The pixel information may include the mean, variance, histogram, etc. of the pixels in the independent foreground region. The region perimeter information may include the number of pixels on the perimeter of the independent foreground region. The perimeter-to-region-size ratio may be the ratio of the region perimeter information to the region size information of the independent foreground region. The boundary information may include the number of internal boundary pixels within the independent foreground region, where the internal boundaries may be extracted by a boundary detection algorithm. The texture information may be represented by the energy feature of the Gray-Level Co-occurrence Matrix (GLCM) of the independent foreground region. The shape information may characterize the irregularity (concavity) of the shape of the independent foreground region and may be represented, for example, by the ratio of the perimeter of the convex hull of the independent foreground region to the perimeter of the region itself.
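A few of these features can be computed with scikit-image, as in the hedged sketch below; which features to extract is given by the list above, while the concrete functions, parameters, and the zero-filled GLCM masking are illustrative assumptions.

```python
# Illustrative extraction of some listed features for one independent
# foreground region: region size, perimeter, pixel statistics, and GLCM
# texture energy. Masking the grayscale image by multiplication is a crude
# simplification that lets background zeros into the GLCM.
import numpy as np
from skimage.measure import label, regionprops
from skimage.feature import graycomatrix, graycoprops

def region_features(mask: np.ndarray, gray: np.ndarray) -> dict:
    """mask: boolean foreground mask; gray: uint8 grayscale frame image."""
    props = regionprops(label(mask))[0]
    pixels = gray[mask]
    glcm = graycomatrix(gray * mask, distances=[1], angles=[0], levels=256)
    return {
        "area": props.area,                               # region size (pixels)
        "perimeter": props.perimeter,                     # region perimeter
        "perimeter_area_ratio": props.perimeter / props.area,
        "pixel_mean": float(pixels.mean()),
        "pixel_var": float(pixels.var()),
        "glcm_energy": float(graycoprops(glcm, "energy")[0, 0]),
    }
```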
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for outputting target object number information according to the present embodiment. The application scenario, shown in figs. 3a-3e, counts the number of pedestrians in a frame image captured by a camera in a supermarket, where two independent pedestrians are displayed in the frame image. As shown in fig. 3a, a server may perform superpixel segmentation on the frame image to obtain a superpixel-segmented frame image, shown in fig. 3b, where the mesh in the frame image marks the edges of the superpixels. Thereafter, the server may determine the distance between each superpixel in the frame image and the corresponding superpixel in the preset background image, as shown in fig. 3c. Then, the server performs motion detection on the pedestrians displayed in the frame image to obtain the motion information of the pedestrians in the frame image. Next, the server performs image segmentation on the frame image based on the distances and the motion information of the pedestrians to obtain the two independent pedestrians in the frame image; as shown in fig. 3d, the two pedestrians form two independent foreground regions (foreground region 1 and foreground region 2) of the frame image. Finally, image feature values are extracted from foreground region 1 and foreground region 2 and input into a preset target object prediction regression model (in this application scenario, the pedestrian prediction regression model shown in the figure), which outputs the pedestrian number information contained in each foreground region; as shown in fig. 3e, the information "number of people: 1" is output in the boxes corresponding to the two foreground regions.
The method for outputting the number information of the target objects according to the embodiment of the present application obtains a frame image on which at least one target object is displayed, performs superpixel segmentation on the frame image, then determines a distance between a superpixel in the frame image and a corresponding superpixel in a preset background image, then performs motion detection on the target object displayed in the frame image to obtain motion information of the target object in the frame image, performs image segmentation on the frame image based on the distance and the motion information of the target object to obtain a foreground region of the frame image, and finally performs image feature extraction on the foreground region, and inputs an extracted image feature value into a preset target object prediction regression model to output the number information of the target object included in the foreground region. The scheme disclosed by the embodiment can avoid the problems of noise, holes and the like by performing super-pixel level segmentation on the frame image, and improves the accuracy of foreground region extraction, thereby improving the accuracy of the predicted target object number information.
With further reference to FIG. 4, a flow 400 of another embodiment of a method for outputting target object count information is shown. The process 400 of the method for outputting the information on the number of target objects includes the following steps:
step 401, acquiring a frame image displaying at least one target object, and performing superpixel segmentation on the frame image.
In the present embodiment, an execution subject of the method for outputting the target object number information (for example, the server shown in fig. 1) may acquire a frame image on which at least one target object is displayed, by a wired connection or a wireless connection. Then, the execution subject may perform superpixel segmentation on the acquired frame image, thereby dividing it into a number of superpixels. It can be understood that the frame image may be captured by a camera and then stored in a terminal device, in which case the execution subject may acquire the frame image from the terminal device by a wired or wireless connection; alternatively, the execution subject may acquire the captured frame image directly from the camera by a wired or wireless connection.
Among various superpixel segmentation algorithms, the SLIC (Simple Linear Iterative Clustering) algorithm has the advantages of low memory occupation, high speed, few parameters, and high accuracy of the extracted boundary information, so the execution subject may perform superpixel segmentation on the frame image by using the existing SLIC algorithm, as sketched below. It is understood that the execution subject may also perform superpixel segmentation on the frame image by using other methods, which are not limited herein.
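A minimal sketch of this step with scikit-image's SLIC implementation follows; the image path and the parameter values (n_segments, compactness) are illustrative, not values from the patent.

```python
# SLIC superpixel segmentation of a frame image. Each pixel receives the
# integer label of its superpixel; the label map drives all later
# per-superpixel computations.
from skimage.io import imread
from skimage.segmentation import slic

frame = imread("frame.png")  # hypothetical path to a captured frame image
labels = slic(frame, n_segments=300, compactness=10, start_label=0)
print("number of superpixels:", labels.max() + 1)
```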
Step 402, determining the distance between the super pixel in the frame image and the corresponding super pixel in the preset background image.
In this embodiment, the execution subject (e.g., the server shown in fig. 1) described above may acquire a background image in advance. It should be noted that the background image and the frame image may be images captured by the same camera, and the background image is different from the frame image in that there is no target object in the background image. Then, the executing body may perform superpixel segmentation on the acquired background image to obtain superpixels of the background image. Finally, based on the superpixels of the frame image obtained in step 401, the execution subject may determine the distance between the superpixels of the frame image and the corresponding superpixels in the background image.
And step 403, acquiring a previous frame image adjacent to the frame image, and performing super-pixel segmentation on the previous frame image.
In this embodiment, based on the frame image acquired in step 401, the executing body may determine a previous frame image adjacent to the frame image in the acquired image and acquire the previous frame image. Then, super-pixel segmentation is performed on the acquired previous frame image by using, for example, an SLIC algorithm, so as to obtain the super-pixels of the previous frame image.
Step 404, detecting superpixels in the frame image and corresponding superpixels in the previous frame image based on an optical flow method, and determining the dynamic characteristics of the superpixels in the frame image to obtain the motion information of the target object in the frame image.
In this embodiment, the execution subject may detect a superpixel in a frame image and a corresponding superpixel in a previous frame image by using an optical flow method so as to determine a dynamic feature of the superpixel in the frame image. It is understood that the execution subject can obtain the probability of motion information of each super pixel in the frame image by using the dynamic characteristics of the super pixels in the obtained frame image. As an example, the probability of motion information of the ith super-pixel block in the frame image may be represented by the mean of the probabilities of motion information of all pixels within the ith super-pixel. And finally, combining the probability of the motion information of each super pixel in the frame image to obtain the motion information of the target object in the frame image.
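A hedged sketch of this step follows, using OpenCV's dense Farneback optical flow on grayscale frames together with the SLIC label map from step 401; mapping flow magnitude to a per-pixel motion probability by max-normalization is an illustrative simplification, not the patent's exact formulation.

```python
# Per-superpixel motion probability: compute dense optical flow between the
# previous and current frames, turn flow magnitude into a crude per-pixel
# probability, then average it over the pixels of each superpixel.
import cv2
import numpy as np

def superpixel_motion_probability(prev_gray, curr_gray, labels):
    # Farneback parameters (positional): pyr_scale=0.5, levels=3, winsize=15,
    # iterations=3, poly_n=5, poly_sigma=1.2, flags=0 (illustrative defaults).
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)
    prob = magnitude / (magnitude.max() + 1e-8)  # crude [0, 1] normalization
    n_superpixels = labels.max() + 1
    # Mean motion probability over all pixels inside each superpixel.
    return np.array([prob[labels == i].mean() for i in range(n_superpixels)])
```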
Step 405, based on the distance and the motion information of the target object, performing image segmentation on the frame image to obtain a foreground region in the frame image.
In this embodiment, based on the distance obtained in step 402 and the motion information of the target object obtained in step 404, the executing entity may determine, in the frame image, a superpixel belonging to the foreground region by combining the distance and the motion information of the target object, so as to segment the foreground region in the frame image to obtain the foreground region of the frame image. It can be understood that the optical flow method can only detect moving objects in general, and if the target object is detected only by the optical flow method, the detection of the moving features may be missed when the moving amplitude of the target object is not large or the target object is static. In the solution disclosed in this embodiment, the executing body may perform foreground region segmentation by combining the distance between the super pixel in the frame image and the corresponding super pixel in the background image and the target object motion information of the super pixel in the frame image, so as to improve the accuracy of the foreground region segmented in the frame image.
Specifically, the probability that the ith superpixel in the frame image belongs to the foreground region can be calculated by combining the two cues, for example as the weighted sum

$$p_i = \alpha \, p_i^{dist} + (1 - \alpha) \, p_i^{motion}$$

where $p_i$ is the probability that the ith superpixel in the frame image belongs to the foreground region, $p_i^{dist}$ is the distance between the ith superpixel in the frame image and the ith superpixel in the background image, $p_i^{motion}$ is the motion information of the target object of the ith superpixel in the frame image obtained based on the optical flow method, and $\alpha$ is a preset coefficient.

It can be understood that after the above-mentioned execution subject calculates the probability $p_i$ that the ith superpixel belongs to the foreground region, it may compare $p_i$ with a preset threshold, determine the ith superpixel to belong to the foreground region of the frame image if $p_i$ is larger than the preset threshold, and otherwise determine the ith superpixel to belong to the background region of the frame image.
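Putting the two cues together, a short sketch of the per-superpixel foreground decision follows, assuming the weighted-sum reading of the formula above and treating alpha and the threshold as illustrative preset values.

```python
# Foreground segmentation per superpixel: combine the (normalized) background
# distance and the motion probability with a preset coefficient alpha, then
# threshold. Returns a boolean pixel mask of the foreground region.
import numpy as np

def foreground_mask(p_dist, p_motion, labels, alpha=0.5, threshold=0.5):
    """p_dist, p_motion: shape (N,) per-superpixel scores; labels: label map."""
    p = alpha * p_dist + (1.0 - alpha) * p_motion
    fg_superpixels = np.flatnonzero(p > threshold)
    return np.isin(labels, fg_superpixels)
```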
In the prior art, a background subtraction method is usually adopted to obtain the motion information of a target object. That method determines the foreground region of an image based on the difference between the foreground region and the background region, so when part of the target object is similar in color to the background, for example when the clothes or hat of a pedestrian resembles the background, the segmentation of the image foreground region suffers. To address this problem, the scheme disclosed in this embodiment introduces the optical flow method to detect the motion information of the target object, thereby improving the accuracy of the obtained foreground region of the frame image.
And step 406, extracting image characteristic values of the foreground region, and inputting the extracted image characteristic values into a preset target object prediction regression model to output the number information of the target objects contained in the foreground region.
In this embodiment, a target object prediction regression model may be trained in advance, and the target object prediction regression model may be used to represent a correspondence between image features in a foreground region and target object number information included in the foreground region. After the execution subject obtains the foreground regions in the frame image, the execution subject may extract image feature values of the independent foreground regions, and then input the extracted image feature values of the independent foreground regions into the target object prediction regression model, and the target object prediction regression model may output information on the number of target objects displayed in the corresponding foreground regions.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for outputting the target object number information in the present embodiment highlights the step of obtaining the motion information of the target object. Therefore, the scheme described in this embodiment can perform segmentation of the foreground region of the frame image by combining the characteristics of the superpixel and the optical flow method, and improves the accuracy of foreground region segmentation.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for outputting information on the number of target objects, which corresponds to the embodiment of the method shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for outputting the target object number information of the present embodiment includes: a superpixel segmentation unit 501, a determination unit 502, a detection unit 503, an image segmentation unit 504, and a target object number information output unit 505. The super-pixel segmentation unit 501 is configured to acquire a frame image on which at least one target object is displayed, and perform super-pixel segmentation on the frame image; the determining unit 502 is configured to determine a distance between a super pixel in the frame image and a corresponding super pixel in a preset background image; the detection unit 503 is configured to perform motion detection on the target object displayed in the frame image, and obtain motion information of the target object in the frame image; the image segmentation unit 504 is configured to perform image segmentation on the frame image to obtain a foreground region in the frame image based on the distance and the motion information of the target object, wherein the foreground region includes a region where the target object displayed in the frame image is located; the target object number information output unit 505 is configured to perform image feature value extraction on the foreground region, and input the extracted image feature values into a preset target object prediction regression model to output target object number information contained in the foreground region.
In some optional implementations of the present embodiment, the determining unit 502 is further configured to: extracting the characteristics of the superpixels in the frame image and the superpixels in the background image; based on the extracted features, Euclidean distances between the superpixels in the frame image and the corresponding superpixels in the preset background image are determined.
In some optional implementations of this embodiment, the detecting unit 503 is further configured to: acquiring a previous frame image adjacent to the frame image, and performing super-pixel segmentation on the previous frame image; based on an optical flow method, detecting superpixels in a frame image and corresponding superpixels in a previous frame image, and determining the dynamic characteristics of the superpixels in the frame image to obtain the motion information of the target object in the frame image.
In some optional implementations of this embodiment, the preset target object prediction regression model is obtained by training through the following steps: acquiring a training sample set, wherein the training sample comprises image characteristic values extracted from a foreground region of a sample image and target object number information contained in the foreground region of the sample image; and establishing a correlation vector machine regression model by adopting a sparse Bayesian learning algorithm, respectively taking image characteristic values extracted from foreground regions of sample images in training samples in a training sample set and target object number information contained in the foreground regions of the sample images as input and expected output of the correlation vector machine regression model, and training the correlation vector machine regression model to obtain a target object prediction regression model.
In some optional implementations of this embodiment, the preset target object prediction regression model is obtained by training through the following steps: acquiring a training sample set, wherein the training sample comprises image characteristic values extracted from a foreground region of a sample image and target object number information contained in the foreground region of the sample image; replacing Gaussian distribution in a regression model of a correlation vector machine with Poisson distribution to obtain a sparse Bayesian Poisson regression model; and respectively taking the image characteristic value extracted from the foreground region of the sample image in the training samples in the training sample set and the target object number information contained in the foreground region of the sample image as the input and the expected output of a sparse Bayesian Poisson regression model, and training the sparse Bayesian Poisson regression model to obtain a target object prediction regression model.
The units recited in the apparatus 500 correspond to the various steps in the method described with reference to fig. 2 and 4. Thus, the operations and features described above for the method are equally applicable to the apparatus 500 and the units included therein, and are not described in detail here.
Referring now to FIG. 6, shown is a block diagram of a computer system 600 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is mounted in the storage section 608 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 601. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor, which may be described as: a processor comprising a superpixel segmentation unit, a determination unit, a detection unit, an image segmentation unit, and a target object number information output unit. In some cases, the names of these units do not limit the units themselves; for example, the superpixel segmentation unit may also be described as "a unit that acquires a frame image on which at least one target object is displayed and performs superpixel segmentation on the frame image".
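Viewed as software, this unit structure can be sketched as a composition of callables. The sketch below is purely illustrative; every class, field, and method name is hypothetical and not part of the described apparatus:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TargetCountingProcessor:
    """The five units of the apparatus as composable callables."""
    superpixel_segmentation_unit: Callable  # frame -> superpixel label map
    determination_unit: Callable            # (frame, background, labels) -> distances
    detection_unit: Callable                # (frame, prev_frame, labels) -> motion info
    image_segmentation_unit: Callable       # (distances, motion, labels) -> foreground mask
    output_unit: Callable                   # (frame, foreground mask) -> count info

    def run(self, frame, prev_frame, background):
        labels = self.superpixel_segmentation_unit(frame)
        dist = self.determination_unit(frame, background, labels)
        motion = self.detection_unit(frame, prev_frame, labels)
        fg_mask = self.image_segmentation_unit(dist, motion, labels)
        return self.output_unit(frame, fg_mask)
```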
As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire a frame image on which at least one target object is displayed, and perform superpixel segmentation on the frame image; determine the distance between each superpixel in the frame image and the corresponding superpixel in a preset background image; perform motion detection on the target object displayed in the frame image to obtain motion information of the target object in the frame image; perform image segmentation on the frame image based on the distance and the motion information of the target object to obtain a foreground region in the frame image, the foreground region comprising the region where the target object displayed in the frame image is located; and extract image characteristic values of the foreground region and input the extracted image characteristic values into a preset target object prediction regression model to output the number information of the target objects contained in the foreground region.
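Purely by way of illustration, the flow such a program carries out can be sketched in code. This is a minimal reading of the steps, not the claimed implementation: it assumes scikit-image's `slic` for the superpixel segmentation and OpenCV's Farneback dense optical flow for the motion detection, it fuses the two cues with crude mean thresholds, and the names `count_targets`, `superpixel_features`, and `count_model` are hypothetical:

```python
import cv2
import numpy as np
from skimage.segmentation import slic


def superpixel_features(image, labels):
    """Mean Lab color per superpixel (a stand-in for richer features)."""
    lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB).astype(np.float32)
    return np.array([lab[labels == i].mean(axis=0)
                     for i in range(labels.max() + 1)])


def count_targets(frame, prev_frame, background, count_model, n_segments=400):
    # 1. Superpixel segmentation of the current frame.
    labels = slic(frame, n_segments=n_segments, start_label=0)

    # 2. Distance between each superpixel and the corresponding region
    #    of the preset background image (same label map on both images).
    dist = np.linalg.norm(superpixel_features(frame, labels)
                          - superpixel_features(background, labels), axis=1)

    # 3. Motion information: dense optical flow against the previous
    #    frame, averaged per superpixel.
    g0 = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    motion = np.array([mag[labels == i].mean()
                       for i in range(labels.max() + 1)])

    # 4. Foreground region: superpixels that differ from the background
    #    or exhibit motion (crude thresholds, for illustration only).
    fg = (dist > dist.mean()) | (motion > motion.mean())
    fg_mask = fg[labels]

    # 5. Image characteristic values of the foreground region, fed to
    #    the pre-trained prediction regression model.
    edges = cv2.Canny(g1, 100, 200)
    features = np.array([[fg_mask.mean(),          # foreground area ratio
                          edges[fg_mask].mean(),   # edge density
                          mag[fg_mask].mean()]])   # mean motion magnitude
    return count_model.predict(features)[0]
```

A real system would extract richer image characteristic values from the foreground region (area, edge, and texture statistics are typical choices for counting tasks) and would fuse the distance and motion cues with a proper segmentation step rather than a mean threshold.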
The above description is only a preferred embodiment of the present application and an illustration of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention disclosed herein is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention; for example, the above features may be interchanged with (but are not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. A method for outputting target object number information, comprising:
acquiring a frame image on which at least one target object is displayed, and performing super-pixel segmentation on the frame image;
determining the distance between the super pixel in the frame image and the corresponding super pixel in a preset background image;
performing motion detection on the target object displayed in the frame image to obtain motion information of the target object in the frame image;
performing image segmentation on the frame image based on the distance and the motion information of the target object to obtain a foreground region in the frame image, wherein the foreground region comprises a region where the target object displayed in the frame image is located;
and extracting image characteristic values of the foreground region, and inputting the extracted image characteristic values into a preset target object prediction regression model to output the number information of the target objects contained in the foreground region.
2. The method of claim 1, wherein determining the distance between the superpixel in the frame image and the corresponding superpixel in the preset background image comprises:
performing feature extraction on the super pixels in the frame image and the super pixels in the background image;
determining the Euclidean distance between each superpixel in the frame image and the corresponding superpixel in the preset background image based on the extracted features.
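As a sketch of this distance computation, assuming the feature extraction step has already produced two aligned per-superpixel feature matrices (the array names and sizes are illustrative):

```python
import numpy as np

# feats_frame[i] and feats_bg[i] hold the extracted feature vector of
# superpixel i in the frame image and in the preset background image.
rng = np.random.default_rng(0)
feats_frame = rng.random((400, 16))   # placeholder feature matrices
feats_bg = rng.random((400, 16))

# Per-superpixel Euclidean distance ||f_frame_i - f_bg_i||_2.
distances = np.linalg.norm(feats_frame - feats_bg, axis=1)
```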
3. The method according to claim 1, wherein the performing motion detection on the target object displayed in the frame image to obtain motion information of the target object in the frame image comprises:
acquiring a previous frame image adjacent to the frame image, and performing super-pixel segmentation on the previous frame image;
detecting superpixels in the frame image and corresponding superpixels in the previous frame image based on an optical flow method, and determining the dynamic characteristics of the superpixels in the frame image to obtain the motion information of the target object in the frame image.
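One way to realize this step is sketched below, assuming OpenCV's Farneback algorithm as the optical flow method and a precomputed superpixel label map; the function name is hypothetical:

```python
import cv2
import numpy as np


def superpixel_motion(prev_gray, cur_gray, labels):
    """Average optical-flow magnitude per superpixel as its dynamic feature."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)
    return np.array([magnitude[labels == i].mean()
                     for i in range(labels.max() + 1)])
```

The resulting per-superpixel flow magnitude serves as the dynamic characteristic that, together with the distance of claim 2, drives the foreground segmentation of claim 1.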
4. The method of claim 1, wherein the pre-set target object prediction regression model is trained by:
acquiring a training sample set, wherein the training sample comprises image characteristic values extracted from a foreground region of a sample image and target object number information contained in the foreground region of the sample image;
establishing a relevance vector machine regression model by adopting a sparse Bayesian learning algorithm, taking the image characteristic values extracted from the foreground regions of the sample images in the training samples of the training sample set and the target object number information contained in the foreground regions of the sample images as the input and the expected output, respectively, of the relevance vector machine regression model, and training the relevance vector machine regression model to obtain the target object prediction regression model.
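The relevance vector machine recited here is conventionally trained with Tipping's sparse Bayesian learning. A toy evidence-maximization loop under that framework might look as follows; it omits basis pruning and the numerical safeguards a production trainer would need:

```python
import numpy as np


def train_rvm(Phi, t, n_iter=100):
    """Simplified relevance vector machine regression.

    Phi: (n_samples, n_basis) design matrix, e.g. a kernel matrix over
    the image characteristic values; t: target object counts.
    Returns the posterior mean weights.
    """
    n, d = Phi.shape
    alpha = np.ones(d)                 # per-weight prior precisions (ARD)
    beta = 1.0                         # noise precision
    for _ in range(n_iter):
        Sigma = np.linalg.inv(beta * Phi.T @ Phi + np.diag(alpha))
        mu = beta * Sigma @ Phi.T @ t               # posterior mean weights
        gamma = 1.0 - alpha * np.diag(Sigma)        # well-determined factors
        alpha = gamma / np.maximum(mu ** 2, 1e-12)  # ARD re-estimation
        alpha = np.minimum(alpha, 1e12)             # huge alpha ~= pruned weight
        resid = t - Phi @ mu
        beta = (n - gamma.sum()) / max(resid @ resid, 1e-12)
    return mu
```

Here `Phi` would typically be a kernel matrix built from the image characteristic values of the training foreground regions and `t` the corresponding counts; most `alpha` values diverge during training, driving their weights to zero and leaving a sparse set of relevance vectors.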
5. The method according to any one of claims 1 to 4, wherein the pre-set target object prediction regression model is trained by:
acquiring a training sample set, wherein the training sample comprises image characteristic values extracted from a foreground region of a sample image and target object number information contained in the foreground region of the sample image;
replacing the Gaussian distribution in a relevance vector machine regression model with a Poisson distribution to obtain a sparse Bayesian Poisson regression model;
and respectively taking the image characteristic value extracted from the foreground region of the sample image in the training samples in the training sample set and the target object number information contained in the foreground region of the sample image as the input and the expected output of the sparse Bayesian Poisson regression model, and training the sparse Bayesian Poisson regression model to obtain the target object prediction regression model.
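Replacing the Gaussian likelihood with a Poisson likelihood removes the closed-form posterior of the relevance vector machine, so a common route is a Laplace approximation around the posterior mode. The sketch below takes that route under the same automatic-relevance prior; it illustrates the general technique, not the patented training procedure:

```python
import numpy as np


def train_sparse_poisson(Phi, t, n_outer=30, n_newton=10):
    """Sparse Bayesian Poisson regression via a Laplace approximation.

    The Gaussian likelihood of the RVM is replaced by a Poisson
    likelihood with a log link: t_i ~ Poisson(exp(Phi[i] @ w)).
    """
    n, d = Phi.shape
    alpha = np.ones(d)                     # ARD prior precisions
    w = np.zeros(d)
    for _ in range(n_outer):
        # Newton iterations for the posterior mode (MAP weights).
        for _ in range(n_newton):
            mu = np.exp(Phi @ w)           # Poisson means
            grad = Phi.T @ (t - mu) - alpha * w
            H = Phi.T @ (Phi * mu[:, None]) + np.diag(alpha)
            w = w + np.linalg.solve(H, grad)
        # Laplace approximation: posterior covariance ~= H^{-1} at the mode.
        Sigma = np.linalg.inv(H)
        gamma = 1.0 - alpha * np.diag(Sigma)
        alpha = gamma / np.maximum(w ** 2, 1e-12)   # ARD re-estimation
        alpha = np.minimum(alpha, 1e12)             # huge alpha ~= pruned weight
    return w
```

A Poisson likelihood constrains predictions to non-negative counts, which is the natural output space for target object number information.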
6. An apparatus for outputting target object number information, comprising:
a superpixel segmentation unit configured to acquire a frame image on which at least one target object is displayed, perform superpixel segmentation on the frame image;
a determining unit configured to determine a distance between a super pixel in the frame image and a corresponding super pixel in a preset background image;
the detection unit is configured to perform motion detection on a target object displayed in the frame image to obtain motion information of the target object in the frame image;
the image segmentation unit is configured to perform image segmentation on the frame image based on the distance and the motion information of the target object to obtain a foreground region in the frame image, wherein the foreground region comprises a region where the target object displayed in the frame image is located;
and the target object number information output unit is configured to extract image characteristic values of the foreground region, and input the extracted image characteristic values into a preset target object prediction regression model to output the target object number information contained in the foreground region.
7. The apparatus of claim 6, wherein the determination unit is further configured to:
performing feature extraction on the super pixels in the frame image and the super pixels in the background image;
determining the Euclidean distance between each superpixel in the frame image and the corresponding superpixel in the preset background image based on the extracted features.
8. The apparatus of claim 6, wherein the detection unit is further configured to:
acquiring a previous frame image adjacent to the frame image, and performing super-pixel segmentation on the previous frame image;
detecting superpixels in the frame image and corresponding superpixels in the previous frame image based on an optical flow method, and determining the dynamic characteristics of the superpixels in the frame image to obtain the motion information of the target object in the frame image.
9. The apparatus of claim 6, wherein the pre-set target object prediction regression model is trained by:
acquiring a training sample set, wherein the training sample comprises image characteristic values extracted from a foreground region of a sample image and target object number information contained in the foreground region of the sample image;
establishing a relevance vector machine regression model by adopting a sparse Bayesian learning algorithm, taking the image characteristic values extracted from the foreground regions of the sample images in the training samples of the training sample set and the target object number information contained in the foreground regions of the sample images as the input and the expected output, respectively, of the relevance vector machine regression model, and training the relevance vector machine regression model to obtain the target object prediction regression model.
10. The apparatus according to any one of claims 6 to 9, wherein the preset target object prediction regression model is trained by:
acquiring a training sample set, wherein the training sample comprises image characteristic values extracted from a foreground region of a sample image and target object number information contained in the foreground region of the sample image;
replacing the Gaussian distribution in a relevance vector machine regression model with a Poisson distribution to obtain a sparse Bayesian Poisson regression model;
and respectively taking the image characteristic value extracted from the foreground region of the sample image in the training samples in the training sample set and the target object number information contained in the foreground region of the sample image as the input and the expected output of the sparse Bayesian Poisson regression model, and training the sparse Bayesian Poisson regression model to obtain the target object prediction regression model.
11. An electronic device, comprising:
one or more processors;
a storage device storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-5.
CN201811519247.9A 2018-12-12 2018-12-12 Method and apparatus for outputting target object number information Pending CN111311603A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811519247.9A CN111311603A (en) 2018-12-12 2018-12-12 Method and apparatus for outputting target object number information

Publications (1)

Publication Number Publication Date
CN111311603A (en) 2020-06-19

Family

ID=71159808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811519247.9A Pending CN111311603A (en) 2018-12-12 2018-12-12 Method and apparatus for outputting target object number information

Country Status (1)

Country Link
CN (1) CN111311603A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049765A (en) * 2012-12-21 2013-04-17 武汉经纬视通科技有限公司 Method for judging crowd density and number of people based on fish eye camera
CN107016691A (en) * 2017-04-14 2017-08-04 南京信息工程大学 Moving target detecting method based on super-pixel feature
CN107392917A (en) * 2017-06-09 2017-11-24 深圳大学 A kind of saliency detection method and system based on space-time restriction
US10037610B1 (en) * 2017-10-03 2018-07-31 StradVision, Inc. Method for tracking and segmenting a target object in an image using Markov Chain, and device using the same

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zheng Yunfei; Zhang Xiongwei; Cao Tieyong; Sun Meng: "Research on semantic salient region detection based on fully convolutional networks", Acta Electronica Sinica, no. 11 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001884A (en) * 2020-07-14 2020-11-27 浙江大华技术股份有限公司 Training method, counting method, equipment and storage medium of quantity statistical model

Similar Documents

Publication Publication Date Title
US10936919B2 (en) Method and apparatus for detecting human face
US10902245B2 (en) Method and apparatus for facial recognition
US10796438B2 (en) Method and apparatus for tracking target profile in video
CN107220652B (en) Method and device for processing pictures
CN109344762B (en) Image processing method and device
CN109711508B (en) Image processing method and device
CN113436100B (en) Method, apparatus, device, medium, and article for repairing video
CN114943936B (en) Target behavior recognition method and device, electronic equipment and storage medium
US20230030431A1 (en) Method and apparatus for extracting feature, device, and storage medium
CN110941978B (en) Face clustering method and device for unidentified personnel and storage medium
CN112989987A (en) Method, apparatus, device and storage medium for identifying crowd behavior
CN114241358A (en) Equipment state display method, device and equipment based on digital twin transformer substation
Venkatesvara Rao et al. Real-time video object detection and classification using hybrid texture feature extraction
CN114973057A (en) Video image detection method based on artificial intelligence and related equipment
CN111292333A (en) Method and apparatus for segmenting an image
He et al. A double-region learning algorithm for counting the number of pedestrians in subway surveillance videos
CN113378790A (en) Viewpoint positioning method, apparatus, electronic device and computer-readable storage medium
CN110633597A (en) Driving region detection method and device
CN115083008A (en) Moving object detection method, device, equipment and storage medium
CN111311603A (en) Method and apparatus for outputting target object number information
CN114663980B (en) Behavior recognition method, and deep learning model training method and device
CN114445751A (en) Method and device for extracting video key frame image contour features
CN111311604A (en) Method and apparatus for segmenting an image
CN112700657B (en) Method and device for generating detection information, road side equipment and cloud control platform
CN115205555B (en) Method for determining similar images, training method, information determining method and equipment

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20210302
Address after: 101, 1st floor, building 2, yard 20, Suzhou street, Haidian District, Beijing 100080
Applicant after: Beijing Jingbangda Trading Co.,Ltd.
Address before: 100086 8th Floor, 76 Zhichun Road, Haidian District, Beijing
Applicant before: BEIJING JINGDONG SHANGKE INFORMATION TECHNOLOGY Co.,Ltd.
Applicant before: BEIJING JINGDONG CENTURY TRADING Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20210302
Address after: Room a1905, 19/F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176
Applicant after: Beijing Jingdong Qianshi Technology Co.,Ltd.
Address before: 101, 1st floor, building 2, yard 20, Suzhou street, Haidian District, Beijing 100080
Applicant before: Beijing Jingbangda Trading Co.,Ltd.

SE01 Entry into force of request for substantive examination