WO2022228142A1

WO2022228142A1 - Object density determination method and apparatus, computer device and storage medium

Info

Publication number: WO2022228142A1
Application number: PCT/CN2022/086848
Authority: WO
Inventors: 王昌安
Original assignee: 腾讯科技（深圳）有限公司
Priority date: 2021-04-26
Filing date: 2022-04-14
Publication date: 2022-11-03
Also published as: CN112862023A; CN112862023B

Abstract

The present application relates to an object density determination method and apparatus, a computer device and a storage medium, which can be applied to scenarios such as intelligent transportation and intelligent supermarkets. The method comprises: inputting a training sample image into an object density determination model to be trained, so as to obtain a predictive density map output by the object density determination model; acquiring a plurality of standard image blocks corresponding to a standard density map and a plurality of predicted image blocks corresponding to the predictive density map; respectively compiling statistics on object densities in the standard image blocks and the predicted image blocks, so as to obtain a standard density statistical value corresponding to each standard image block and a predictive density statistical value corresponding to each predicted image block; and training the object density determination model on the basis of the difference between the standard density statistical value and the predictive density statistical value that correspond to an image pair. The object density determination model is an artificial intelligence model, and the object density determination model can be deployed in a cloud server, thereby improving an artificial intelligence cloud service.

Description

Object density determination method, apparatus, computer equipment and storage medium

This application claims the priority of the Chinese patent application filed on April 26, 2021 with the application number 202110453975X and the application title is "Object Density Determination Method, Apparatus, Computer Equipment and Storage Medium", the entire contents of which are by reference Incorporated in this application.

technical field

The present application relates to the technical field of image processing, and in particular, to a method, apparatus, computer equipment and storage medium for determining the density of an object.

Background technique

With the development of image processing technology in artificial intelligence, a technology to determine the density of objects based on images has emerged. The object density determination technology can automatically infer the density of the crowd in the image, which plays an important role in video surveillance, public transportation safety and other fields.

In the traditional technology, when determining the object density, the method of object density map regression is mainly used for prediction, and the deep learning technology based on artificial intelligence is used for end-to-end training and reasoning. However, the density values in the object density map output by the trained object density determination model are often inaccurate, resulting in low accuracy of the acquired object density map.

SUMMARY OF THE INVENTION

According to various embodiments provided in the present application, an object density determination method, apparatus, computer device and storage medium are provided.

An object density determination method, executed by computer equipment, the method comprising: acquiring a training sample image and a standard density map corresponding to the training sample image; inputting the training sample image into an object density determination model to be trained, and obtaining The predicted density map output by the object density determination model; the standard density map and the predicted density map are respectively divided to obtain a plurality of standard image blocks corresponding to the standard density map and multiple standard image blocks corresponding to the predicted density map. Counting the object density in the standard image block to obtain the standard density statistic value corresponding to the standard image block, and performing statistics on the object density in the predicted image block to obtain the predicted image block The corresponding predicted density statistic value; and the standard image block and the predicted image block that has an image position corresponding relationship with the standard image block form an image pair, based on the standard density statistic value corresponding to the image pair and the predicted density statistic value The difference between the values, the parameters of the object density determination model to be trained are adjusted, and the trained object density determination model is obtained, and the trained object density determination model is used to generate an object density map.

An object density determination device, the device comprising: an image acquisition module for acquiring a training sample image and a standard density map corresponding to the training sample image; an image input module for inputting the training sample image into an image to be trained In the object density determination model, the predicted density map output by the object density determination model is obtained; the image division module is used for dividing the standard density map and the predicted density map respectively, and obtains the multi-point density map corresponding to the standard density map. a standard image block and a plurality of predicted image blocks corresponding to the predicted density map; a density statistics module, configured to perform statistics on object densities in the standard image block to obtain a standard density statistic value corresponding to the standard image block, Counting object densities in the predicted image block to obtain a predicted density statistic value corresponding to the predicted image block; and a training module configured to compare the standard image block and the image position corresponding relationship with the standard image block The predicted image blocks form image pairs, and based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, the parameters of the object density determination model to be trained are adjusted, and the trained object density determination model is obtained, The trained object density determination model is used to generate an object density map.

A computer device comprising a memory and one or more processors, the memory stores computer-readable instructions, the computer-readable instructions, when executed by the processor, cause the processor to perform the above object density determination steps of the method.

One or more non-volatile readable storage media storing computer readable instructions which, when executed by one or more processors, cause the processors to perform the steps of the above object density determination method.

A computer program product comprising computer readable instructions that, when executed by a processor, implement the steps of the above object density determination method.

According to various embodiments provided in the present application, another object density determination method, apparatus, computer device and storage medium are also provided.

An object density determination method, executed by computer equipment, the method comprising: acquiring a target image whose density is to be determined; inputting the target image into a trained object density determination model, and performing the object density determination by the object density determination model Determine; the object density determination model is obtained by adjusting the parameters of the object density determination model to be trained based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value; wherein, the image pair is a standard It consists of an image block and a predicted image block that has an image position corresponding relationship with the standard image block, and the standard image block is obtained by dividing the standard density map corresponding to the training sample image; the predicted image block is obtained by The predicted density map is obtained by dividing the predicted density map, and the predicted density map is obtained by inputting the training sample image into the object density determination model to be trained; and obtaining the corresponding target image output by the object density determination model. Object density map.

An object density determination device, the device comprises: an image acquisition module for acquiring a target image whose density is to be determined; a density determination module for inputting the target image into a trained object density determination model, through the The object density determination model is used to determine the object density; the object density determination model is obtained by adjusting the parameters of the object density determination model to be trained based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value; wherein , the image pair is composed of a standard image block and a predicted image block that has an image position correspondence with the standard image block, and the standard image block is obtained by dividing the standard density map corresponding to the training sample image; The predicted image block is obtained by dividing the predicted density map, and the predicted density map is obtained by inputting the training sample image into the object density determination model to be trained; and a density map acquisition module for acquiring The object density determines the object density map corresponding to the target image output by the model.

A computer device includes a memory and a processor, the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, causes the processor to perform the steps of the above object density determination method.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below. Other features, objects and advantages of the present application will become apparent from the description, drawings and claims.

Description of drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings that are used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.

Fig. 1 is the application environment diagram of the object density determination method in some embodiments;

2 is a schematic flowchart of a method for determining object density in some embodiments;

3 is a schematic structural diagram of an object density determination model in some embodiments;

4 is a specific schematic diagram of a skip connection in some embodiments;

FIG. 5 is a schematic diagram of image position correspondence in some embodiments;

6 is a schematic flowchart of a step of determining a loss value weight of an image to a loss value in some embodiments;

7 is a schematic flowchart of a method for determining object density in other embodiments;

8 is a schematic diagram of a Gaussian kernel at two different sizes of human heads in some embodiments;

9 is a schematic diagram of the application of the object density determination method in some embodiments;

FIG. 10 is a structural block diagram of an apparatus for determining object density in some embodiments;

FIG. 11 is a structural block diagram of an apparatus for determining object density in other embodiments; and

Figure 12 is a diagram of the internal structure of a computer device in some embodiments.

Detailed ways

In order to make the purpose, technical solutions and advantages of the present application more clearly understood, the present application will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application.

It should be noted that, in this embodiment of the present application, the size of the standard density map and the predicted density map obtained from the same training sample image are the same, and the following correspondences exist:

1) Position correspondence between pixels: when the standard density map and the predicted density map are established in the same way as the coordinate system, if the coordinates between a pixel in the standard density map and a pixel in the predicted density map are the same, Then these two pixels are the pixels corresponding to the positions.

2) Position correspondence between image blocks: When the standard density map is divided to obtain multiple standard image blocks, and the predicted density map is divided to obtain multiple predicted image blocks, if all the Pixels have corresponding pixels in another predicted image block, and all pixels in the predicted image block have corresponding pixels in the standard image block, then there is an image position correspondence between these two image blocks. .

It should also be noted that the multiple mentioned in the embodiments of the present application refers to at least two.

The object density model provided by the embodiments of the present application can be applied to cloud services based on artificial intelligence. For example, the object density model can be deployed in a cloud server, and the cloud server obtains the target image whose density is to be determined, determines the object density map corresponding to the target image based on the object density model, and returns it to the terminal for display.

In the object density determination method provided by the embodiment of the present application, the training sample image and the object density map generated by the object density determination model can be saved on the blockchain. The blockchain can generate a query code for the saved training sample image and object density map respectively, and return the generated query code to the terminal. Based on the query code corresponding to the training sample image, the training sample image can be queried, based on the object density map corresponding The query code can be used to query the object density map.

The solutions provided in the embodiments of the present application relate to technologies such as computer vision and machine learning of artificial intelligence, and are specifically described by the following examples:

The object density determination method provided in this application can be applied to the application environment shown in FIG. 1 . The terminal 102 and the camera device 106 respectively communicate with the server 104 through the network. The network may be a wired network or a wireless network, and the wireless network may be any one of a local area network, a metropolitan area network, and a wide area network.

The terminal 102 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc., but is not limited thereto. The server 104 may be an independent physical server, or a server cluster or a distributed system composed of multiple physical servers, or may provide cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, Cloud servers for basic cloud computing services such as middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. Camera device 106 may include one or more cameras.

In the method or device for determining the density of objects provided by the embodiments of the present application, a plurality of servers can be formed into a blockchain, and the servers are nodes on the blockchain.

In some embodiments, the server 104 trains the object density determination model to be trained through the acquired training sample images, and after obtaining the trained object density determination model, deploys the object density determination model, after which the server 104 may receive the camera device 106 The images collected and transmitted in real time, the object density is determined on these images to obtain the object density map, and the object density map is sent to the terminal, and the terminal can display the object density map in the form of a heat map.

In some embodiments, after the server 104 trains the object density determination model to be trained through the acquired training sample images and obtains the trained object density determination model, when receiving the request from the terminal 102, the server 104 can use wired or wireless method, the trained object density determination model is sent to the terminal 102, and the terminal 102 receives the trained object density determination model and deploys it. When the user uses the terminal to process the image, the terminal can The object density determination model processes image data to realize object density determination.

In some embodiments, as shown in FIG. 2 , a method for determining the density of objects is provided, and the method for determining the density of objects can be applied to a computer device, and the computer may be a terminal or a server in FIG. 1 , and may also be a terminal and a server. The interactive system composed, the method specifically includes the following steps:

Step 202: Obtain the training sample image and the standard density map corresponding to the training sample image.

The training sample images refer to images used for supervised training of the object density determination model to be trained. One or more target objects are included in the training sample images. The target object may specifically be an independent living body or object, such as a natural person, an animal, a vehicle, a virtual character, etc., or a specific part, such as a head, a hand, and the like. Due to the supervised training, there is a corresponding standard density map for the training sample images. The standard density map is a density map that truly reflects the object density of the training sample images, and is a density map that supervises model training. The standard density map corresponding to the training sample image may be a density map determined according to the object position points in the training sample image. The density map reflects the number of objects in each position of the image. For example, the crowd density map can reflect the average number of people in the corresponding position of the unit pixel in the actual scene. The density map can determine the total number of target objects in the image.

In some embodiments, the computer device may acquire an image marked with an object position point of the target object as a training sample image, and the object position point may specifically be the position center point of the target object. For example, when the target object is a natural person, in the training sample image The object position point of is the center point of the human head.

For example, a computer device can acquire an image containing one or more target objects by taking a picture of a scene containing one or more target objects, and the image containing the target object can be used as a training sample image after the object position points are manually marked ; The computer device can also obtain images including one or more target objects and marked object position points from a third-party computer device in a wired or wireless manner as a training sample image.

After acquiring the training sample image, the computer device determines an object response map corresponding to the training sample image according to the object position points corresponding to the training sample image, and obtains a standard density map corresponding to the training sample image according to the object response map.

In other embodiments, the computer device may also directly acquire the image for which the standard density map has been determined as a training sample image. For example, the computer device may obtain images for which a standard density map has been determined from a public database of a third party as a training sample image.

Step 204 , input the training sample image into the object density determination model to be trained, and obtain a predicted density map output by the object density determination model.

The object density determination model to be trained refers to an object density determination model that needs to be trained to determine model parameters. The object density determination model is a machine learning model for determining the density of target objects in an image. The object density determination model may employ a deep learning model comprising multiple convolutional neural networks.

Specifically, the computer equipment inputs the training sample images into the object density determination model to be trained, the object density determination model predicts the object density in the training sample images, and obtains a predicted density map, and the computer equipment obtains the predicted density output by the object density determination model picture. It can be understood that both the standard density map and the predicted density map are obtained based on the same training sample image, so the standard density map and the predicted density map can be images of the same size.

In some embodiments, the object density determination model includes an encoding layer, a decoding layer, and a prediction layer; inputting the training sample images into the object density determination model to be trained, and obtaining the predicted density map output by the object density determination model includes: Input the coding layer, perform downsampling processing through the coding layer to obtain the first target feature, then input the first target feature into the decoding layer, perform upsampling processing through the decoding layer to obtain the second target feature, and finally input the second target feature into the prediction layer, the density prediction is performed through the prediction layer, and the predicted density map is obtained.

Among them, the coding layer and decoding layer can use VGGnet ((Visual Geometry Group, Oxford University Computer Vision Group) series of neural networks, ResNet (residual network) series of neural networks, etc. Among them, the VGGnet series of neural networks are developed by Oxford University Computer Vision The deep convolutional neural network developed by the Visual Geometry Group and researchers from Google DeepMind is composed of 5 layers of convolution layers, 3 layers of fully connected layers, and a softmax output layer. The layers use max-pooling ( The maximization pool) is separated, the activation units of all hidden layers use the ReLU function, and the ResNet series of neural networks are neural networks constructed by residual blocks.

The high-level semantic information of the training sample image can be extracted by down-sampling the training sample image through the encoding layer, and the obtained first target feature is a low-resolution image with high-level semantic information. The high-level semantic information is restored to higher-resolution semantic information, and the final second target feature is a high-resolution feature image with high-level semantic information.

In some embodiments, the encoding layer and the decoding layer adopt skip links; the encoding layer includes a plurality of first convolutional layers; the decoding layer includes a plurality of second convolutional layers; the training sample images are input into the encoding layer, and the downlink is performed by the encoding layer. The sampling process to obtain the first target feature includes: in the coding layer, down-sampling the intermediate feature output by the previous first convolutional layer through the current first convolutional layer, and obtaining the output of the last first convolutional layer as the first convolutional layer. A target feature; inputting the first target feature into the decoding layer, performing upsampling processing through the decoding layer, and obtaining the second target feature includes: at the decoding layer, passing the current second convolutional layer according to the middle output of the previous second convolutional layer. The features and the intermediate features output by the connected first convolutional layer are up-sampled, and the output of the last second convolutional layer is obtained as the second target feature.

Through the skip link, the features output by the earlier convolutional layers can be integrated and the features output by the later convolutional layers can be used as the input of a convolutional layer, so that the input features of the convolutional layer include the multiple convolutional layers. The contextual features with high-level semantic information obtained by the layer-by-step convolution process also include local detailed information, and the extracted features are more complete and accurate.

For example, as shown in FIG. 3 , it is a schematic structural diagram of an object density determination model in some specific embodiments. Among them, the coding layer includes five first convolution layers connected end to end, and each first convolution layer performs convolution processing on the intermediate features output by the previous convolution layer to realize down-sampling, and outputs five features in turn, namely V ₁ , V ₂ , V ₃ , V ₄ , V ₅ , obtain the output feature V ₅ of the first convolutional layer of the last layer as the first target feature, and input the obtained first target feature into the decoding layer. The decoding layer includes five head and tail Connected second convolutional layers, each second convolutional layer is upsampled according to the intermediate features output by the previous second convolutional layer and the intermediate features output by the connected first convolutional layer, and outputs five features in turn, namely P ₅ , P ₄ , P ₃ , P ₂ , P ₁ , obtain the feature P′ ₁ output by the second convolutional layer of the last layer as the second target feature and input it to the prediction layer. On the one hand, the prediction layer passes the second target feature through three The two parallel convolutional layers respectively perform convolution processing on the second target feature, perform channel-wise concatenation between the output of each convolutional layer and the second target feature, and then perform convolution processing through a convolutional layer. The final output predicted density map.

As shown in FIG. 4 , it is a specific schematic diagram of skip connection. Referring to Figure 4, the output feature obtained by upsampling the input feature P _i+1 by the second convolution layer and the intermediate feature output by the skip-connected first convolution layer are firstly channel-concatenated to obtain the intermediate feature, and then pass through the convolutional layer. The accumulation layer is fused to obtain _Pi as the input feature of the next second convolutional layer.

Step 206: Divide the standard density map and the predicted density map respectively to obtain multiple standard image blocks corresponding to the standard density map and multiple predicted image blocks corresponding to the predicted density map.

Specifically, the computer device divides the standard density map to obtain multiple standard image blocks, and divides the predicted density map to obtain multiple predicted image blocks. The division here refers to the area division of the pixels in the image block. In some embodiments, there is at least one image block in a plurality of predicted image blocks that has an image position corresponding relationship with the standard image block. When dividing the image blocks, the computer device may first divide the standard density map to obtain the standard density map. and then divide the predicted density map according to the position of at least one standard image block in the standard density map to obtain a predicted image block that has an image position corresponding relationship with the standard image block.

For example, suppose that the standard density map is divided into four standard image blocks, namely image block A, image block B, image block C and image block D, then the computer equipment can be based on the location of each pixel in image block A. The predicted density map is divided so that the pixels in the predicted density map with the same position as each pixel in the image block A are divided into the same area, and the predicted image block corresponding to the image block A is obtained.

In some embodiments, the computer device may divide the standard density map and the predicted density map respectively by using the same image block division method, so that the number, position and size of the predicted image blocks and the standard image blocks are matched.

In some embodiments, the computer device may acquire a sliding window, slide the sliding window on the standard density map according to a preset sliding method, use the image area within the sliding window as a standard image block, and slide the sliding window according to the preset sliding method. The window is slid on the predicted density map, and the image area in the sliding window is used as the predicted image block, so that the standard image block and the predicted image block with the same size and quantity and the image positions correspond one-to-one can be obtained.

Step 208: Count the density of objects in the standard image block to obtain a standard density statistic value corresponding to the standard image block, and perform statistics on the object density in the predicted image block to obtain a predicted density statistic value corresponding to the predicted image block.

The object density refers to the density value of each pixel in the image block, and the pixel density value is used to represent the density of the object at the location of the pixel. Counting the density of objects in an image block refers to expressing the density values of all pixels in the image block with a statistical value. Average density values, or median density values of all pixels in an image block, etc.

Specifically, after obtaining the standard image block and the predicted image block, the computer device performs statistics on the object density in the standard image block, obtains the standard density statistic value corresponding to the standard image block, and calculates the object density in the predicted image block in the same way. Statistics are performed to obtain the predicted density statistic value corresponding to the predicted image block. For example, assuming that the object densities in the standard image block are accumulated to obtain the standard density statistic value corresponding to the standard image block, then the object densities in the predicted image block are also accumulated to obtain the predicted density statistic value corresponding to the predicted image block. .

In step 210, the standard image block and the predicted image block that has an image position correspondence with the standard image block are formed into an image pair, and based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, the density of the object to be trained is calculated. The model is determined and parameters are adjusted to obtain an object density determination model after training, and the trained object density determination model is used to generate an object density map.

The image position correspondence between the standard image block and the predicted image block means that the position of the standard image block in the standard density map corresponds to the position of the predicted image block in the predicted density map, then for each There are pixels with the same position in the predicted image block corresponding to its position. The standard density statistic value corresponding to the image pair refers to the standard density statistic value corresponding to the standard image block in the image pair. The predicted density statistic value corresponding to the image pair refers to the predicted density statistic value corresponding to the predicted image block in the image pair.

For example, suppose the standard density map A is divided into 4 standard image blocks A1, A2, A3, A4, and the predicted density map B is divided into 4 predicted image blocks B1, B2, B3 and B4, the image position correspondence is shown in Figure 5, in which the dotted arrows indicate the image position correspondence. As can be seen from Figure 5, there is an image position correspondence between the standard image block A1 and the predicted image block B1. There is an image position correspondence between the image block A2 and the predicted image block B2, an image position correspondence between the standard image block A3 and the predicted image block B3, and an image position correspondence between the standard image block A4 and the predicted image block B4, That is, the positions of the standard image blocks and the predicted image blocks constituting the image pair in the image are consistent.

Specifically, for the divided standard image block, the computer device determines a predicted image block that has an image position correspondence relationship with the standard image block from the multiple predicted image blocks obtained by division, and the standard image block is associated with the predicted image block. An image pair is formed, and based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, the computer device can obtain the image pair loss value corresponding to the image pair, and the loss value is counted based on the image pair. The target loss value can be obtained, and based on the target loss value, the computer equipment can adjust the parameters of the object density determination model to be trained to obtain the trained object density determination model. The use of the object density determination model to generate the object density map means that the object density determination model can output object density values corresponding to each position in the image, such as the number of people corresponding to each position. In practice, when the object density map needs to be displayed, different forms can be used to reflect the object density value corresponding to each position in the object density map as required. For example, the color corresponding to each object density value can be determined, in the form of a heat map. Displays an object density map.

In some embodiments, when dividing the standard density map and the predicted density map, the computer device slides the same sliding window on the standard density map and the predicted density map in the same sliding manner, so as to obtain the multi-point density map corresponding to the standard density map. standard image blocks and multiple predicted image blocks corresponding to the predicted density map, then when determining the image position correspondence, the computer device can determine the corresponding relationship according to the sliding sequence, number the obtained standard image blocks based on the sliding sequence, and based on the sliding sequence The obtained predicted image blocks are sequentially numbered, and two image blocks with the same number are determined as image blocks with a corresponding relationship of image positions, and the two image blocks are formed into an image pair.

In some embodiments, when there is an image block that does not have an image position correspondence with any one of the standard image blocks in the plurality of predicted image blocks, for each pixel in the image block, calculate the relationship between the pixel and the standard density map. The difference in the density values of the corresponding position pixels, and finally based on the difference in the density values of these pixels and the difference between the standard density statistic value and the predicted density statistic value corresponding to the image pair, the parameters of the object density determination model to be trained are adjusted to obtain The trained object density determines the model.

In some embodiments, after obtaining the trained object density determination model, the computer device may generate an object density map from the object density determination model. Specifically, the target image whose density is to be determined is input into the trained object density determination model, the object density is determined by the object density determination model, and the object density map corresponding to the target image output by the object density determination model is obtained.

In some embodiments, after obtaining the object density map, the computer device may integrate the object density map to obtain the total number of target objects in the target image.

In the above object density determination method, since the standard density map and the predicted density map are divided respectively, multiple standard image blocks corresponding to the standard density map and multiple predicted image blocks corresponding to the predicted density map are obtained, and the standard image blocks are Count the object density of the predicted image block to obtain the standard density statistic value corresponding to the standard image block, and count the object density in the predicted image block to obtain the predicted density statistic value corresponding to the predicted image block, then in the training process, the standard image block can be and the predicted image blocks that have an image position correspondence with the standard image blocks to form an image pair, and based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, the parameters of the object density determination model to be trained are adjusted, so that The density value of the local area can be fitted in units of image blocks, and the overall density value of the local area is comprehensively considered, which improves the accuracy of the object density determination model obtained by training for determining the object density.

In some embodiments, adjusting parameters of the object density determination model to be trained based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, and obtaining the trained object density determination model includes: based on the image pair The difference between the corresponding standard density statistic value and the predicted density statistic value is used to obtain the image pair loss value corresponding to the image pair; the image pair loss value is counted to obtain the target loss value; the object density to be trained is based on the target loss value Determine the model and adjust the parameters to obtain the object density determination model after training.

In this embodiment, each standard image block in a plurality of standard image blocks obtained by dividing the standard density map has a predicted image block with an image position corresponding relationship with it in the predicted density map. The object density in each standard image block in the figure is counted, and the standard density statistic value corresponding to each standard image block is obtained, and the standard density statistic value is used to replace the density value of the area where the standard image block is located, which is equivalent to obtaining a standard density map. Corresponding standard local count map, count the object density in each predicted image block in the predicted density map, obtain the predicted density statistic value corresponding to each predicted image block, and replace the predicted image block with the predicted density statistic value in the area where the predicted image block is located. Density value, the predicted local count map corresponding to the predicted density map can be obtained, the standard local count map and the image blocks with the image position correspondence in the predicted local count map are formed into pairs, based on the standard density statistics corresponding to the image pair and the prediction The difference between the density statistical values, the loss value of the image pair corresponding to the image pair is obtained, and then during training, the computer equipment can count the loss value of the image pair corresponding to all the image pairs to obtain the standard local count map and predicted local Count the target loss value between the graphs, and finally, the computer equipment can back-propagate the target loss value to the object density determination model, and adjust the model parameters of the object density determination model through the gradient descent algorithm until the stopping condition is satisfied, The trained object density determination model is obtained. Among them, the gradient descent algorithm includes but is not limited to stochastic gradient descent algorithm, Adagrad ((Adaptive Gradient, adaptive gradient) algorithm, Adadelta (improvement of AdaGrad algorithm), RMSprop (improvement of AdaGrad algorithm) and so on.

In some embodiments, the computer device may construct a loss function based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, and obtain the image pair loss value corresponding to the image pair based on the loss function. The loss function can be one of the cross entropy (Cross Entropy) loss function, the MERS (mean-square error, mean square error) loss function and so on.

In some embodiments, the computer device performs statistics on the image pair loss values to obtain the target loss value, specifically: summing the respective image pair loss values of all the image pairs to obtain the target loss value. In some other embodiments, the computer device performs statistics on the loss values of the image pairs to obtain the target loss: averaging the respective image pair loss values of all the image pairs to obtain the target loss value.

In the above embodiment, the target loss value is obtained by counting the loss values of the images, and the parameters of the object density determination model to be trained are adjusted based on the target loss value to obtain the trained object density determination model, which can avoid pixel by pixel to the greatest extent. Training error due to fitted density values.

In some embodiments, as shown in FIG. 5 , based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, obtaining the image pair loss value corresponding to the image pair includes: compressing the image pair according to the target shrinking method. Shrink the corresponding standard density statistic value to obtain the shrunk standard density statistic value. The shrinkage amplitude corresponding to the target shrinkage mode is positively correlated with the size of the value to be shrunk; the predicted density statistics corresponding to the image pair are calculated according to the target shrinkage mode According to the difference between the standard density statistic value after shrinkage and the predicted density statistic value after shrinkage, the loss value of the image pair corresponding to the image pair is obtained, where the loss value of the image pair is the same as The difference is positively correlated.

The target shrinking method refers to a mathematical operation method that can shrink the numerical value to reduce the numerical value. The contraction amplitude corresponding to the target contraction mode is positively correlated with the value to be contracted, that is, the larger the value to be contracted, the greater the contraction amplitude; conversely, the smaller the value to be contracted, the smaller the contraction amplitude. The to-be-shrinked value in the embodiment of the present application refers to a standard density statistic value or a predicted density statistic value. Shrinkage amplitude refers to the difference between the value after shrinkage and the value before shrinkage.

Specifically, the computer device can shrink the standard density statistic value corresponding to the image pair according to the target shrinking method to obtain the shrunk standard density statistic value, and shrink the predicted density statistic value corresponding to the image pair according to the target shrinking method to obtain The predicted density statistic value after shrinkage, and the computer equipment can further subtract the shrinkage standard density statistic value from the shrinkage predicted density statistic value, and when the obtained difference value is greater than 0, the difference value is used as the image corresponding to the image pair For the loss value, when the obtained difference value is less than 0, the absolute value of the difference value is taken as the image pair loss value corresponding to the image pair. The image pair loss value is positively correlated with the difference value. The difference value here refers to the absolute difference value. The larger the absolute difference value, the larger the image pair loss value; on the contrary, the smaller the absolute difference value, the smaller the image pair loss value .

In some embodiments, shrinking the standard density statistic value corresponding to the image pair according to the target shrinking method, and obtaining the shrunk standard density statistic value includes: shrinking the standard density statistic value corresponding to the image pair according to the target shrinking method, Obtaining the shrunk standard density statistic value includes: taking the preset value as the base, performing logarithmic transformation with the standard density statistic value as the true number, and using the obtained logarithm as the shrunk standard density statistic value, and the preset value is greater than 1 ; Shrink the predicted density statistic value corresponding to the image pair according to the target shrinking method, and obtain the shrunk predicted density statistic value including: taking the preset value as the base, and using the predicted density statistic value as the true number to perform logarithmic transformation, and converting the The resulting log is used as the predicted density statistic after shrinkage.

Specifically, assuming that the preset value is a, the standard density statistic value is N, and the predicted density statistic value is M, the standard density value after shrinking is log _a N, and the predicted density statistic value after shrinking is log _a M, then the computer The device can obtain the image pair loss value corresponding to the image pair according to the difference between log _a N and log _a M. Wherein, the preset value is greater than 1, for example, it may be e.

In some other embodiments, considering that there may be no target object in some areas of the training sample image, at this time, the density statistic value of this area in the standard density map and the predicted density map may be 0. In order to avoid taking the logarithm when If an error occurs, a constant deviation can be added to each density statistic value, and the constant deviation can be set as required, for example, it can be 1e-3 (ie 0.001), and then logarithmically transformed according to the method in the above embodiment, the image The specific calculation method of the loss value refers to the following formula (1), where pred is the predicted density statistic value corresponding to the predicted image block in a certain image pair, gt is the standard density statistic value corresponding to the standard image block in the image pair, and Loss refers to is the image pair loss value, log refers to the logarithmic transformation, and the base of log can be a number greater than 1, such as e:

Loss=|log(pred+le|-3)-log(gt+le-3)| (1)

In the above embodiment, the standard density statistic value and the prediction degree statistic value corresponding to the image pair are respectively shrunk according to the target shrinking method, and the image is obtained according to the difference between the shrunk standard density statistic value and the shrunk predicted density statistic value. For the corresponding loss value of the image pair, since the shrinkage amplitude corresponding to the target shrinkage mode is positively correlated with the size of the value to be shrunk, the prediction deviation of the difficult-to-predict samples (ie, image blocks in high-density areas) can be reduced, and the reverse transmission can be performed. The gradient will also be reduced accordingly, and the image blocks in the high-density area are likely to be wrong samples, which is conducive to weakening the excessive gradient caused by some wrong samples, highlighting the gradient of useful samples, which is conducive to the optimization of model parameters during the training process.

In some embodiments, performing statistics on the loss value of the image pair to obtain the target loss value includes: determining the loss value weight of the image pair loss value according to the standard density statistical value corresponding to the image pair, and the loss value weight and the standard density statistical value are negative. Correlation; based on the weight of the loss value and the image, the weighted sum of the loss value is obtained to obtain the target loss value.

Specifically, considering that areas with smaller density values tend to occupy most of the image, in the training process, more attention can be given to image blocks corresponding to areas with smaller density values, so that such samples (ie, images block) with less error in the overall density statistic. Based on this, the computer device can determine the loss value weight of the image pair loss value according to the standard density statistical value corresponding to the image pair, and the loss value weight is negatively correlated with the standard density statistical value, that is, the larger the standard density statistical value, the greater the loss value The smaller the value weight, the smaller the standard density statistic, and the greater the weight of the loss value.

In some embodiments, a preset threshold value Y may be set. When the standard density statistic value X corresponding to the image pair is greater than the preset threshold value Y, it is determined that the standard density statistic value corresponding to the image pair is larger, and the computer device determines that the standard density statistic value corresponding to the image pair is larger. A smaller loss value weight a is determined for the loss value of the image pair corresponding to the image pair. When the standard density statistic value X corresponding to the image pair is less than the preset threshold Y, it is determined that the standard density statistic value corresponding to the image pair is higher than the standard density statistic value of the image pair. is small, the computer device determines a larger loss value weight b for the image pair loss value corresponding to the image pair, where b is greater than a.

In some embodiments, as shown in FIG. 6 , determining the loss value weight of the loss value of the image pair according to the standard density statistic value corresponding to the image pair includes:

Step 602: Divide the standard density statistics into density intervals to obtain a plurality of density intervals.

Specifically, it is assumed that N standard image blocks are obtained by dividing the standard density map. Among these standard image blocks, except for the standard image block whose number of objects is 0, among the standard density statistic values of other standard image blocks, the minimum value is a, If the maximum value is b, the standard density statistic value can be divided into K (K≥2) density intervals, and K can be specified as needed. For example, K can be 4, and the i-th (1≤i≤K) The statistical value range of the density interval is shown in the following formula (2):

[e^{i*(log(b)-log(a))/K+log(a)}, e^{(i+1)*(log(b)-log(a))/K+log (a)} (2)

Step 604: Acquire the number of image blocks of the standard image blocks whose standard density statistic value is in the density interval.

Step 606: Determine the loss value weight of the image corresponding to the standard image block to the loss value based on the number of image blocks in the density interval corresponding to the standard image block; the number of image blocks is positively correlated with the loss value weight.

Specifically, for each density interval i, the computer device counts the number n _i of image blocks of the standard image blocks that fall within the density interval.

In some embodiments, the computer device may calculate the ratio pi of the number of image blocks n _i in the density interval _i to the total number of standard image blocks N with reference to the following formula (3):

p _i =n _i /N (3)

The loss value weight of the image pair loss value corresponding to the standard image block in the density interval is determined according to the ratio. For example, the computer device may directly determine the ratio as the loss value weight of the image corresponding to the standard image block within the density interval i to the loss value.

In some specific embodiments, after calculating the ratio of the number of image blocks n _i in the density interval i to the total number of standard image blocks N, the computer device can calculate the standard value in the density interval i with reference to the following formula (4) The loss value of the image pair corresponding to the image block, where α can be valued as needed, for example, the value of α can be 20.0:

Loss=|log(pred+le-3)-log(gt+le-3)|*(1+α* _pi ) (4)

In the above embodiment, by dividing the standard density statistical value into intervals, a plurality of density intervals are obtained, and the number of image blocks of the standard image block whose standard density statistical value is in the density interval is obtained, and the image blocks in the density interval corresponding to the standard image block are obtained. Determine the loss value weight of the image corresponding to the standard image block to the loss value. Since the number of image blocks has a positive correlation with the loss value weight, the image blocks corresponding to the density interval with a large number of image blocks can be given more attention. This makes the total prediction error of these image blocks smaller.

In some embodiments, performing statistics on the image pair loss value to obtain the target loss value includes: attenuating the image pair loss value according to the target attenuation mode to obtain the attenuated image pair loss value, wherein the attenuation corresponding to the target attenuation mode The magnitude is positively correlated with the image pair loss value; the attenuated image pair loss value is summed to obtain the target loss value.

Among them, the target attenuation method refers to a method that can reduce the loss value of the image pair. The attenuation amplitude corresponding to the target attenuation mode is positively correlated with the loss value of the image pair, that is, the larger the loss value of the image pair, the greater the attenuation amplitude; on the contrary, the smaller the loss value of the image pair, the smaller the attenuation amplitude. The attenuation magnitude refers to the difference between the image pair loss value before attenuation and the image pair loss value after attenuation.

Specifically, considering that the more erroneous the sample (that is, the standard image block with inaccurate object density value), the more likely its prediction error is large. Based on this, the computer equipment can decay according to the target when training the object density determination model. The image pair loss value is attenuated by the method to obtain the attenuated image pair loss value, and the sum operation is performed on the attenuated image pair loss value to obtain the target loss value.

In some embodiments, the loss values of all image pairs may be sorted, and a preset number (eg, 10%) of image pair loss values with larger values may be selected according to the sorting result, and the loss values of these image pairs are set to 0 , so that these samples that may be mislabeled can be filtered out during training, thereby stabilizing the training process of the network. For example, assuming that there are 100 image pairs, the computer equipment can sort the loss values of the 100 image pairs in descending order, and then select the loss values of the top 10 image pairs, and set the loss values of these image pairs directly to is 0.

In other embodiments, the computer device may obtain a preset exponential function, and weight the loss value of the image pair by the exponential function, and the value of the exponential function is negatively correlated with the loss value of the image pair, that is, the loss value of the image pair The larger the value of the exponential function, the larger the value of the exponential function; on the contrary, the smaller the loss value of the image pair, the smaller the value of the exponential function, so that such samples with large prediction errors can be allowed to participate in training, and this kind of The samples dominate the gradient information of the entire training process. The exponential function may be, for example, e ^-x , where x is the image pair loss value, and xe ^-x is the attenuated image pair loss value.

In the above embodiment, the computer attenuates the loss value of the image pair according to the target attenuation method, obtains the loss value of the image pair after attenuation, and then performs a sum operation on the loss value of the image pair after attenuation to obtain the target loss value. Value backpropagation adjusts the object density to determine the model parameters of the model. Since the partial samples with the largest image loss value are suppressed by attenuation, the gradient information brought by the useful samples can be highlighted, because the proportion of these useful gradient information from the correctly labeled samples will be higher. large, so the training of the model will be more helpful.

In some embodiments, dividing the standard density map and the predicted density map respectively to obtain multiple standard image blocks corresponding to the standard density map and multiple predicted image blocks corresponding to the predicted density map includes: acquiring a sliding window; sliding according to a preset sliding the sliding window on the standard density map, and taking the image area in the sliding window as the standard image block; sliding the sliding window on the predicted density map according to the preset sliding method, and taking the image area in the sliding window as the standard image block; Predicted image blocks.

Wherein, there may be one or more sliding windows. Plural means at least two. The size of the sliding window can be determined according to needs, for example, it can be determined according to the size of the training sample image. The sizes of multiple sliding windows can be the same or different. The preset sliding mode refers to determining the sliding starting point from the training image, and traversing the entire training sample image to slide in a certain order.

Specifically, after acquiring the predicted sliding window, the computer device slides the sliding window on the standard density map according to a preset sliding method. Each time the sliding window is slid, the image area in the sliding window is used as the standard image block. In the same sliding manner as described above, the sliding window is slid on the predicted density map. For each sliding, the image area within the sliding window is used as the predicted image block.

In some embodiments, in order to improve the sliding efficiency, when sliding the sliding window on the image, the sliding window can be slid without overlapping. Non-overlapping means that there are no overlapping pixels between two adjacent image blocks obtained by sliding.

For example, assuming that the size of the standard density map is 128*128, if a sliding window of size 4*4 is slid non-overlapping on the standard density map, 1024 standard image blocks of 4*4 size can be obtained, if By sliding the sliding window of size 8*8 on the standard density map without overlapping, you can obtain 256 standard image blocks of size 8*8. If the sliding window of size 16*16 is performed on the standard density map Non-overlapping sliding, you can get 64 standard image blocks of 16*16 size, if you slide the sliding window of 32*32 size on the standard density map without overlapping, you can get 16 32*32 size Standard image block.

In the above embodiment, since the same sliding window can be slid on the standard density map and the predicted density map respectively in the same sliding manner, standard image blocks and predicted image blocks with the same size and one-to-one position can be obtained, ensuring that Accuracy of image position correspondence between standard and predicted image patches.

In some embodiments, the training sample image is marked with a plurality of object position points; obtaining the training sample image and the standard density map corresponding to the training sample image includes: determining the object corresponding to the training sample image according to the object position points corresponding to the training sample image Response map; the pixel value of the object position point in the object response map is the first pixel value, and the pixel value of the non-object position point is the second pixel value; the object response map is convolved to obtain the standard density map corresponding to the training sample image .

Among them, the object position point is used to represent the actual position of the target object in the training sample image. The object position point may specifically be the center point of the object. For example, when the target object is a natural person, the center point of the object may specifically be the center point of the human head. The object response map refers to the image obtained by responding to the position of the center point of the object, and the image is the same size as the training sample image. In the object response map, the pixel value of the object position point is the first pixel value, the pixel value of the non-object position point is the second pixel value, the first pixel value and the second pixel value are different pixel values, so that the object Distinguish between object location points and non-object location points in the response graph. The first pixel value may be, for example, 1, and the second similarity value may be, for example, 0.

Specifically, the computer equipment can respectively respond to each object position point corresponding to the training sample image to obtain a response map of each object position point, the response map is the same size as the training sample image, and then all the response maps are pixel-superimposed, The object response map corresponding to the training sample image is obtained, and the computer device can further perform convolution processing on the object response map according to a preset Gaussian kernel to obtain a standard density map corresponding to the training sample image.

For example, assuming that the target object is a natural person, and the training sample images are marked with N head center points x ₁ , x ₂ , x ₂ , ... x _N , then for a certain head center point x _i (1≤i≤N ), it can be expressed as a picture δ(xx _i ) of the same size as the training sample image, that is, only the position x _i is 1, and the rest of the positions are 0, then the N head can be expressed as H(x), refer to The following formula (5):

It can be noted that the total number of people in the training sample image can be obtained by integrating the image, and then convolving the image with a Gaussian kernel G _σ to obtain the standard density map D corresponding to the training sample image, referring to the following formula ( 6):

D= _Gσ *H(x) (6)

It is understandable that since the Gaussian kernel is normalized, integrating the convolved density map D can also get the total number of people in the training sample map.

In the above embodiment, the computer device determines the object response map corresponding to the training sample image according to the object position points corresponding to the training sample image, and then performs convolution processing on the object response map to obtain the standard density map corresponding to the training sample image, which can eliminate the The sparsity of the features in the object response map, the obtained standard density map is more conducive to the learning of the model.

In some embodiments, as shown in FIG. 7 , a method for determining the density of objects is provided, and the method for determining the density of objects can be applied to a computer device, and the computer may be a terminal or a server in FIG. 1 , and may also be a terminal and a server The interactive system composed, the method specifically includes the following steps:

Step 702, acquiring the target image of the density to be determined.

The target image whose density is to be determined may be the target image whose density needs to be determined. The target image contains one or more target objects.

Specifically, the computer device may photograph a scene containing one or more target objects to obtain target images of the density to be determined. The computer device can also acquire the target image whose density is to be determined from other computer devices through the network. According to different requirements, the target image can be the image of various scenes. For example, the target image may be an image for monitoring crowds in a target place, and the target place may be, for example, a subway, a shopping mall, or the like.

Step 704: Input the target image into the trained object density determination model, and determine the object density through the object density determination model.

The object density determination model is obtained by adjusting the parameters of the object density determination model to be trained based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value; wherein the image pair is the standard image block and the standard image The block is composed of predicted image blocks with corresponding image positions. The standard image block is obtained by dividing the standard density map corresponding to the training sample image; the predicted image block is obtained by dividing the predicted density map. It is obtained by inputting the training sample image into the object density determination model to be trained.

Step 706: Obtain an object density map corresponding to the target image output by the object density determination model.

For the detailed description of steps 702 to 704, reference may be made to the foregoing embodiments, which will not be repeated in this application.

The above object density determination method is obtained by adjusting the parameters of the object density determination model to be trained based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, wherein the image pair is The standard image block is composed of the standard image block and the predicted image block corresponding to the image position of the standard image block. The standard image block is obtained by dividing the standard density map corresponding to the training sample image, and the predicted image block is obtained by dividing the predicted density map. The predicted density map is obtained by inputting the training sample image into the object density determination model to be trained. During the training of the object density determination model to be trained, the image block can be used as a unit to fit the local area. The density value, which comprehensively considers the overall density value of the local area, improves the accuracy of the object density determination model obtained by training when it is used to determine the object density, so that the target image is input into the trained object density determination model, and the object density is determined. The model can output accurate object density maps.

In some embodiments, after acquiring the object density map corresponding to the target image output by the object density determination model, the computer device may integrate the object density map to determine the total number of target objects in the target image.

In some embodiments, after acquiring the object density map corresponding to the target image output by the object density determination model, the computer device may display the object density map in the form of a heat map. In the displayed object density map, the darker the color, the denser the target object.

In some embodiments, the object density determination method further includes a training step of the object density determination model, and the training step specifically includes: acquiring a training sample image and a standard density map corresponding to the training sample image; inputting the training sample image into the object to be trained In the density determination model, the predicted density map output by the object density determination model is obtained; the standard density map and the predicted density map are divided respectively, and multiple standard image blocks corresponding to the standard density map and multiple predicted image blocks corresponding to the predicted density map are obtained. ; Count the object density in the standard image block to obtain the standard density statistic value corresponding to the standard image block, and perform statistics on the object density in the predicted image block to obtain the predicted density statistic value corresponding to the predicted image block; The predicted image blocks that have an image position correspondence with the standard image blocks are composed of image pairs. Based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, the parameters of the object density determination model to be trained are adjusted to obtain training. After the object density determines the model.

For the specific description of the training step, reference may be made to the description in the upper embodiment, which is not repeated in this application.

The present application also provides an application scenario, where the above-mentioned object density determination method is applied to realize intelligent transportation. In this application scenario, the object density determination method provided by the embodiment of the present application can perform passenger flow statistics for any traffic location. Sent to the server, where the trained crowd density determination model (ie, the object density determination model in the above embodiment) is deployed.

Specifically, the application of the object density determination method in this application scenario is as follows:

(1) The object density determination model is obtained by pre-training on the server through the following steps:

1. The server obtains the training sample set, the training sample images in the training sample set are marked with the center point of the human head, and the crowd response map of the same size is obtained according to the training sample image. In the crowd response map, the pixel of each head center point is 1, and other positions are The pixel value of 0 is 0, and the server further uses a preset Gaussian kernel to perform convolution processing on the response map to obtain a standard density map corresponding to the training sample image.

It should be noted that the standard deviation of the Gaussian kernel here is manually specified or estimated. Therefore, for heads of different scales, the area covered by the Gaussian kernel is inconsistent. As shown in Figure 8, the Gaussian kernel is in two different A schematic diagram of a human head with a size, in which (a) the area covered by the Gaussian kernel in the figure is area 802, and (b) the area covered by the Gaussian kernel in the figure is area 804. It can be clearly seen that the semantic information of these two areas is different. identical.

This inconsistency of semantic information makes the density values in the standard density map corresponding to the obtained training sample images inaccurate. In related technologies, it is necessary to fit these density values pixel by pixel during the training process, resulting in Crowd density determination models have low accuracy when used to determine crowd density. The object density determination method provided by the embodiment of the present application can effectively avoid this phenomenon.

2. Input the training sample image into the population density determination model to be trained, and obtain the predicted density map output by the object density determination model.

Among them, the crowd density determination model is based on deep learning technology, takes a single image as input, and extracts image features through a deep convolutional network. Since the crowd density determination task requires both contextual features with high semantic information and local detailed information, so In order to obtain high-resolution feature maps with both high-level semantic information and detailed information, a U-shaped network structure with down-sampling and then up-sampling is usually used, and skip links are introduced to introduce detailed information for up-sampling, and finally the output crowd is predicted using dense crowds. The density map, the network structure of the crowd density determination model is shown in Figure 3.

3. Obtain a preset sliding window, slide the sliding window on the standard density map according to the preset sliding method, and use the image area in the sliding window as a standard image block to obtain a plurality of standard image blocks, and follow the preset sliding method. The sliding window is slid on the predicted density map, and the image area within the sliding window is used as the predicted image block to obtain multiple predicted image blocks.

4. Count the density of objects in each standard image block separately to obtain the standard density statistic value corresponding to the standard image block of each standard image block, and perform statistics on the object density in the predicted image block of each standard image block to obtain each standard image block. The predicted density statistic corresponding to the predicted image block for a standard image block.

Specifically, for each standard image block, the server may accumulate the crowd density values in the standard image block to obtain the standard density statistical value corresponding to the standard image block. Similarly, for each predicted image block, the server can predict the The crowd density values in the image block are accumulated to obtain the predicted density statistic value corresponding to the predicted image block.

5. Combine each standard image block and the predicted image block with the image position corresponding to the standard image block into an image pair to obtain a plurality of image pairs. For the standard density statistics and predicted density statistics of each image pair: first, respectively Add a constant deviation, and then take e as the base, perform logarithmic transformation with the standard density statistic value and the predicted density statistic value as the true number, respectively, to obtain the logarithm corresponding to the standard density statistic value and the logarithm corresponding to the predicted density statistic value, Take the difference between these two logarithms and take the absolute value of the difference as the image pair loss value for this image pair.

6. Divide the density interval based on the standard density value of each standard image block to obtain a plurality of density intervals.

7. For each density interval, count the number of image blocks in the standard image block in the density interval, calculate the ratio of the number of image blocks in the standard image block to the total number of image blocks, and determine the standard image in the density interval according to the ratio The loss value weight of the image pair loss value of the image pair corresponding to the block, where the ratio is positively correlated with the loss value weight.

8. For each image pair, calculate the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, select the 10% image pairs with the largest difference, and set the image pair loss values of these image pairs as If it is 0, the weighted summation of the loss values of other image pairs is used to obtain the target loss value. According to the target loss value backpropagation, the crowd density is adjusted to determine the model parameters of the model, until the convergence condition is met, the crowd density after training is obtained. Determine the model.

(2) The server inputs the crowd image into the trained crowd density determination model, determines the density of the crowd image through the crowd density determination model, obtains the crowd density map corresponding to the crowd image, and performs integration based on the crowd density map to obtain the crowd The total number of people in the image (the number of people is counted in the image based on the center point of the head), the crowd density map and the total number of people are sent to the terminal, and the terminal can display the crowd density map in the form of a heat map.

For example, as shown in FIG. 9 , by applying the object density determination method provided by the present application, the server can determine the object density on the image (a) in FIG. 9 to obtain a crowd density image, and can also determine the The total number of people in the crowd image, for example, the total number of people is 208, the server sends the crowd density map to the terminal, and the terminal displays the crowd density degree in the image, as shown in (b) in Figure 9 . The total number of people 208 is shown in figure (b), the density of people in different image areas may be different, which can be displayed in different colors in figure (b), which uses different patterns instead of colors in (b) figure. indicated. When a density value greater than a preset threshold is detected in the image, the terminal can also generate prompt information to prompt that there may be an excessive flow of people.

The present application also provides another application scenario, where the above-mentioned object density determination method is applied to realize a smart quotient supermarket. In this application scenario, by obtaining the crowd density map of each target area of the supermarket, the terminal can count the flow of people in each area of the supermarket according to a certain period, and generate a report for the statistical results to provide to relevant personnel for The footprint of the target area is adjusted to ease the crowded situation in some areas.

The present application also provides another application scenario, where the above-mentioned object density determination method is applied to monitor the crowd density of tourist attractions. In this application scenario, the crowd density of various popular scenic spots in tourist attractions can be monitored. When the crowd density in the target area exceeds the threshold, the monitoring personnel can be prompted in the form of text or voice to improve the security of the target area. .

The object density determination method provided by the embodiments of the present application can alleviate the problems existing in the related art in the regression of artificially generated density maps from multiple perspectives. First, the standard density map regression is transformed into the regression of the density statistics, and then the logarithmic change of the density statistics is performed to reduce the gradient generated by the samples with large prediction deviations, and finally the gradient information of the samples with large prediction errors is filtered out, so as to stabilize The optimization process of the network. After eliminating the negative effects of inaccurate artificially generated density maps, the network can be optimized to a better local optimum, resulting in better generalization ability. At the same time, this scheme fully considers the contribution of most samples with low density values to the final counting error. Therefore, in the optimization process, the method of mining between partitions is used to alleviate this problem, which is conducive to further reducing the training error.

It should be understood that although the steps in the flowcharts of FIGS. 2-9 are shown in sequence according to the arrows, these steps are not necessarily executed in the sequence shown by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order, and these steps may be performed in other orders. Moreover, at least a part of the steps in FIGS. 2-9 may include multiple steps or multiple stages. These steps or stages are not necessarily executed and completed at the same time, but may be executed at different times. The execution of these steps or stages The order is also not necessarily sequential, but may be performed alternately or alternately with other steps or at least a portion of the steps or phases within the other steps.

In some embodiments, as shown in FIG. 10 , an object density determination apparatus 1000 is provided. The apparatus may adopt software modules or hardware modules, or a combination of the two to become a part of computer equipment. The apparatus specifically includes:

An image acquisition module 1002, configured to acquire training sample images and standard density maps corresponding to the training sample images;

The image input module 1004 is used to input the training sample image into the object density determination model to be trained, and obtain the predicted density map output by the object density determination model;

The image division module 1006 is used to divide the standard density map and the predicted density map respectively, to obtain multiple standard image blocks corresponding to the standard density map and multiple predicted image blocks corresponding to the predicted density map;

The density statistics module 1008 is configured to perform statistics on the object density in the standard image block, obtain the standard density statistics value corresponding to the standard image block, and perform statistics on the object density in the predicted image block to obtain the predicted density statistics value corresponding to the predicted image block ;

The training module 1010 is used to form an image pair from a standard image block and a predicted image block with an image position corresponding relationship with the standard image block, and based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, the training module is to be trained. The parameters of the object density determination model are adjusted to obtain a trained object density determination model, and the trained object density determination model is used to generate an object density map.

The above-mentioned object density determination device, because the standard density map and the predicted density map are divided respectively, obtains a plurality of standard image blocks corresponding to the standard density map and a plurality of predicted image blocks corresponding to the predicted density map, and for the standard image blocks. The object density is counted to obtain the standard density statistic value corresponding to the standard image block, the object density in the predicted image block is counted, and the predicted density statistic value corresponding to the predicted image block is obtained, then in the training process, the standard image block and The predicted image blocks that have an image position correspondence with the standard image blocks form an image pair. Based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, the parameters of the object density determination model to be trained are adjusted, so as to be able to The density value of the local area is fitted in units of image blocks, and the overall density value of the local area is comprehensively considered, which improves the accuracy of the object density determination model obtained by training for determining the object density.

In some embodiments, the training module 1010 is further configured to obtain the image pair loss value corresponding to the image pair based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value; perform statistics on the image pair loss value , obtain the target loss value; adjust the parameters of the object density determination model to be trained based on the target loss value, and obtain the trained object density determination model.

In some embodiments, the training module 1010 is further configured to shrink the standard density statistic value corresponding to the image pair according to the target shrinkage mode to obtain the shrunk standard density statistic value, the shrinkage amplitude corresponding to the target shrinkage mode and the value to be shrunk The size of the shrinkage is positively correlated; the predicted density statistic value corresponding to the image pair is shrunk according to the target shrinkage method to obtain the shrunk predicted density statistic value; according to the difference between the shrunk standard density statistic value and the shrunk predicted density statistic value value, the loss value of the image pair corresponding to the image pair is obtained, wherein the loss value of the image pair is positively correlated with the difference value.

In some embodiments, the training module 1010 is further configured to use the preset value as the base, perform logarithmic transformation with the standard density statistic value as the true number, and use the obtained logarithm as the shrunk standard density statistic value, the preset value greater than 1; take the preset value as the base, perform logarithmic transformation with the predicted density statistic value as the true number, and use the obtained logarithm as the shrunk predicted density statistic value.

In some embodiments, the training module 1010 is further configured to determine the loss value weight of the loss value of the image pair according to the standard density statistic value corresponding to the image pair, and the loss value weight has a negative correlation with the standard density statistic value; based on the loss value weight and The image performs a weighted sum of the loss values to obtain the target loss value.

In some embodiments, the training module 1010 is further configured to divide the standard density statistic value into density intervals to obtain multiple density intervals; obtain the number of image blocks of the standard image blocks whose standard density statistic value is in the density interval; The number of image blocks in the corresponding density interval determines the loss value weight of the image corresponding to the standard image block to the loss value; the number of image blocks has a positive correlation with the loss value weight.

In some embodiments, the training module 1010 is further configured to attenuate the loss value of the image pair according to the target attenuation method to obtain the loss value of the image pair after attenuation, wherein the attenuation amplitude corresponding to the target attenuation method is positively correlated with the loss value of the image pair relationship; sum the loss value of the attenuated image to obtain the target loss value.

In some embodiments, the image division module 1006 is further configured to obtain a sliding window; slide the sliding window on the standard density map according to a preset sliding method, and use the image area within the sliding window as a standard image block; slide the sliding window according to the preset The method slides the sliding window on the predicted density map, and uses the image area within the sliding window as the predicted image block.

In some embodiments, the training sample image is marked with a plurality of object position points; the image division module 1006 is further configured to determine the object response map corresponding to the training sample image according to the object position points corresponding to the training sample image; The pixel value of the position point is the first pixel value, and the pixel value of the non-object position point is the second pixel value; the object response map is convolved to obtain a standard density map corresponding to the training sample image.

In some embodiments, as shown in FIG. 11 , an object density determination apparatus 1100 is provided. The apparatus may adopt software modules or hardware modules, or a combination of the two to become a part of computer equipment. The apparatus specifically includes:

an image acquisition module 1102, configured to acquire a target image whose density is to be determined;

The density determination module 1104 is used to input the target image into the trained object density determination model, and the object density determination model is used to determine the object density; the object density determination model is based on the standard density statistic value and the predicted density statistic value corresponding to the image pair The difference between the two is obtained by adjusting the parameters of the object density determination model to be trained; among them, the image pair is composed of a standard image block and a predicted image block that has an image position correspondence with the standard image block. The standard density map corresponding to the sample image is divided; the predicted image block is obtained by dividing the predicted density map, and the predicted density map is obtained by inputting the training sample image into the object density determination model to be trained;

The density map acquisition module 1106 is configured to acquire the object density map corresponding to the target image output by the object density determination model.

The above object density determination device, because the object density determination model is based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, the object density determination model to be trained is obtained by adjusting the parameters, wherein the image pair is The standard image block is composed of the standard image block and the predicted image block corresponding to the image position of the standard image block. The standard image block is obtained by dividing the standard density map corresponding to the training sample image, and the predicted image block is obtained by dividing the predicted density map. The predicted density map is obtained by inputting the training sample image into the object density determination model to be trained. During the training of the object density determination model to be trained, the image block can be used as a unit to fit the local area. The density value, which comprehensively considers the overall density value of the local area, improves the accuracy of the object density determination model obtained by training when it is used to determine the object density, so that the target image is input into the trained object density determination model, and the object density is determined. The model can output accurate object density maps.

In some embodiments, the above-mentioned device further includes: a training module for acquiring training sample images and standard density maps corresponding to the training sample images; inputting the training sample images into the object density determination model to be trained, and obtaining the output of the object density determination model The predicted density map of , obtain the standard density statistic value corresponding to the standard image block, count the object density in the predicted image block, and obtain the predicted density statistic value corresponding to the predicted image block; The image blocks form image pairs, and based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, the parameters of the object density determination model to be trained are adjusted to obtain the trained object density determination model.

For the specific definition of the device for determining the density of the object, reference may be made to the definition of the method for determining the density of the object above, which will not be repeated here. Each module in the above-mentioned object density determination device may be implemented in whole or in part by software, hardware and combinations thereof. The above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.

In some embodiments, a computer device is provided, and the computer device may be a server, and its internal structure diagram may be as shown in FIG. 12 . The computer device includes a processor, memory, and a network interface connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium, an internal memory. The non-volatile storage medium stores an operating system, computer readable instructions and a database. The internal memory provides an environment for the execution of the operating system and computer-readable instructions in the non-volatile storage medium. The computer device's database is used to store training sample image data. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer readable instructions, when executed by a processor, implement an object density determination method.

Those skilled in the art can understand that the structure shown in FIG. 12 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer equipment to which the solution of the present application is applied. Include more or fewer components than shown in the figures, or combine certain components, or have a different arrangement of components.

In some embodiments, a computer device is also provided, including a memory and a processor, where computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the processor, cause the processor to execute the above method embodiments. step.

In some embodiments, one or more non-volatile readable storage media are provided, and computer-readable instructions are stored. When the computer-readable instructions are executed by one or more processors, the processors perform the above-mentioned methods. steps in the example.

In some embodiments, a computer program product is provided, comprising computer-readable instructions, which, when executed by a processor, implement the steps in each of the foregoing method embodiments.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through computer-readable instructions, and the computer-readable instructions can be stored in a non-volatile computer. In the readable storage medium, the computer-readable instructions, when executed, may include the processes of the foregoing method embodiments. Wherein, any reference to memory, storage, database or other media used in the various embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, or optical memory, and the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, the RAM may be in various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).

The technical features of the above embodiments can be combined arbitrarily. In order to make the description simple, all possible combinations of the technical features in the above embodiments are not described. However, as long as there is no contradiction in the combination of these technical features It is considered to be the range described in this specification.

The above-mentioned embodiments only represent several embodiments of the present application, and the descriptions thereof are specific and detailed, but should not be construed as a limitation on the scope of the invention patent. It should be noted that, for those skilled in the art, without departing from the concept of the present application, several modifications and improvements can be made, which all belong to the protection scope of the present application. Therefore, the scope of protection of the patent of the present application shall be subject to the appended claims.

Claims

A method for determining the density of an object, executed by a computer device, the method comprising:

Obtain a training sample image and a standard density map corresponding to the training sample image;

Inputting the training sample image into the object density determination model to be trained, to obtain a predicted density map output by the object density determination model;

respectively dividing the standard density map and the predicted density map to obtain multiple standard image blocks corresponding to the standard density map and multiple predicted image blocks corresponding to the predicted density map;

Counting the object density in the standard image block to obtain a standard density statistic value corresponding to the standard image block, and performing statistics on the object density in the predicted image block to obtain the predicted density statistic corresponding to the predicted image block value; and

The standard image block and the predicted image block that has an image position correspondence with the standard image block are formed into an image pair, and based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, an image pair is prepared for training. The parameters of the object density determination model are adjusted to obtain a trained object density determination model, and the trained object density determination model is used to generate an object density map.
The method according to claim 1, wherein, based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value, the parameters of the object density determination model to be trained are adjusted to obtain the training The post object density determination model includes:

Obtaining the image pair loss value corresponding to the image pair based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value;

Counting the image pair loss values to obtain a target loss value; and

Based on the target loss value, the parameters of the object density determination model to be trained are adjusted to obtain the trained object density determination model.
The method according to claim 2, wherein the obtaining the image pair loss value corresponding to the image pair based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value comprises the following steps: :

The standard density statistic value corresponding to the image pair is shrunk according to the target shrinkage mode to obtain the shrunk standard density statistic value, and the shrinkage amplitude corresponding to the target shrinkage mode is positively correlated with the size of the value to be shrunk;

shrinking the predicted density statistic value corresponding to the image pair according to the target shrinking method to obtain a shrunk predicted density statistic value; and

According to the difference between the standard density statistic value after shrinkage and the predicted density statistic value after shrinkage, the image pair loss value corresponding to the image pair is obtained, wherein the image pair loss value has a positive correlation with the difference value.
The method according to claim 3, wherein the shrinking of the standard density statistic value corresponding to the image pair according to the target shrinking mode, and obtaining the shrunk standard density statistic value comprises:

Taking the preset numerical value as the base, performing logarithmic transformation with the standard density statistic value as the true number, and using the obtained logarithm as the shrunk standard density statistic value, where the preset numerical value is greater than 1; and

The shrinking of the predicted density statistic value corresponding to the image pair according to the target shrinking method, and obtaining the shrunk predicted density statistic value includes:

The preset numerical value is used as the base, the logarithmic transformation is performed with the predicted density statistic value as the true number, and the obtained logarithm is used as the shrunk predicted density statistic value.
The method according to claim 2, wherein, performing statistics on the loss value of the image pair to obtain the target loss value comprises:

Determine a loss value weight of the image pair loss value according to the standard density statistic value corresponding to the image pair, and the loss value weight has a negative correlation with the standard density statistic value; and

A weighted summation of the loss values is performed based on the loss value weight and the image to obtain a target loss value.
The method according to claim 5, wherein the determining the loss value weight of the loss value of the image pair according to the standard density statistic value corresponding to the image pair comprises:

The standard density statistical value is divided into density intervals to obtain a plurality of density intervals;

obtaining the number of image blocks of the standard image blocks whose standard density statistics are in the density interval; and

Based on the number of image blocks in the density interval corresponding to the standard image block, the loss value weight of the image corresponding to the standard image block to the loss value is determined; the number of image blocks is positively correlated with the loss value weight.
The method according to claim 2, wherein, performing statistics on the loss value of the image pair to obtain the target loss value comprises:

Attenuate the image pair loss value according to a target attenuation mode to obtain an attenuated image pair loss value, wherein the attenuation amplitude corresponding to the target attenuation mode is positively correlated with the image pair loss value; and

The loss value of the attenuated image is summed to obtain the target loss value.
The method according to any one of claims 1 to 7, wherein the standard density map and the predicted density map are respectively divided to obtain a plurality of standard image blocks corresponding to the standard density map and The multiple predicted image blocks corresponding to the predicted density map include:

Get the sliding window;

sliding the sliding window on the standard density map according to a preset sliding method, and using the image area within the sliding window as a standard image block; and

The sliding window is slid on the predicted density map according to the preset sliding manner, and the image area within the sliding window is used as a predicted image block.
The method according to claim 8, wherein the training sample image is marked with a plurality of object position points; the acquiring the training sample image and the standard density map corresponding to the training sample image comprises:

Determine the object response map corresponding to the training sample image according to the object position point corresponding to the training sample image; the pixel value of the object position point in the object response map is the first pixel value, and the pixel value of the non-object position point The pixel value is the second pixel value; and

Convolution processing is performed on the object response map to obtain a standard density map corresponding to the training sample image.
The method according to claim 1, wherein the object density determination model includes an encoding layer, a decoding layer and a prediction layer; the training sample image is input into the object density determination model to be trained, and the object density determination model is obtained. Object density determines the predicted density map output by the model, including:

Input the training sample image into the encoding layer, and perform downsampling processing through the encoding layer to obtain the first target feature;

Inputting the first target feature into the decoding layer, and performing up-sampling processing through the encoding layer to obtain a second target feature;

The second target feature is input into the prediction layer, and object density prediction is performed through the prediction layer to obtain the standard density map.
The method according to claim 10, wherein the encoding layer and the decoding layer adopt skip links; the encoding layer comprises a plurality of first convolutional layers; the decoding layer comprises a plurality of second convolutional layers layer; inputting the training sample image into the encoding layer, and performing downsampling processing through the encoding layer to obtain the first target feature, including:

In the encoding layer, down-sampling the intermediate feature output from the previous first convolution layer through the current first convolution layer, and obtains the output of the last first convolution layer as the first target feature;

The inputting the first target feature into the decoding layer, and performing up-sampling processing on the encoding layer to obtain the second target feature, including:

In the decoding layer, the current second convolution layer performs upsampling according to the intermediate features output by the previous second convolution layer and the intermediate features output by the connected first convolution layer to obtain the last second convolution layer. The output is used as the second target feature.
A method for determining the density of an object, executed by a computer device, the method comprising:

Obtain the target image of the density to be determined;

The target image is input into the trained object density determination model, and the object density determination model is performed through the object density determination model; the object density determination model is based on the standard density statistic value corresponding to the image pair and the predicted density statistic value. The difference between the two is obtained by adjusting the parameters of the object density determination model to be trained; wherein, the image pair is composed of a standard image block and a predicted image block that has an image position corresponding relationship with the standard image block, and the standard image pair is composed of The image block is obtained by dividing the standard density map corresponding to the training sample image; the predicted image block is obtained by dividing the predicted density map, and the predicted density map is obtained by inputting the training sample image into the processed in the trained object density determination model; and

Obtain an object density map corresponding to the target image output by the trained object density determination model.
The method according to claim 12, wherein the generating step of the object density determination model comprises:

Obtain a training sample image and a standard density map corresponding to the training sample image;

Inputting the training sample image into the object density determination model to be trained, to obtain a predicted density map output by the object density determination model;

respectively dividing the standard density map and the predicted density map to obtain multiple standard image blocks corresponding to the standard density map and multiple predicted image blocks corresponding to the predicted density map;

Counting the object density in the standard image block to obtain a standard density statistic value corresponding to the standard image block, and performing statistics on the object density in the predicted image block to obtain the predicted density statistic corresponding to the predicted image block and combining the standard image block and the predicted image block with the image position corresponding to the standard image block into an image pair, based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value , and adjust the parameters of the object density determination model to be trained to obtain the trained object density determination model.
An object density determination device, the device comprising:

an image acquisition module for acquiring training sample images and standard density maps corresponding to the training sample images;

an image input module, configured to input the training sample image into the object density determination model to be trained to obtain a predicted density map output by the object density determination model;

an image division module, for dividing the standard density map and the predicted density map respectively, to obtain a plurality of standard image blocks corresponding to the standard density map and a plurality of predicted image blocks corresponding to the predicted density map;

A density statistics module, configured to perform statistics on the object density in the standard image block, obtain a standard density statistical value corresponding to the standard image block, and perform statistics on the object density in the predicted image block to obtain the predicted image the predicted density statistic corresponding to the block; and

A training module, configured to form an image pair between the standard image block and the predicted image block that has an image position correspondence with the standard image block, based on the difference between the standard density statistic value corresponding to the image pair and the predicted density statistic value The difference between the two, the parameters of the object density determination model to be trained are adjusted to obtain a trained object density determination model, and the trained object density determination model is used to generate an object density map.
An object density determination device, the device comprising:

an image acquisition module, used to acquire the target image of the density to be determined;

A density determination module for inputting the target image into the trained object density determination model, and performing object density determination through the object density determination model; the object density determination model is based on the standard density statistic value corresponding to the image pair The difference between the statistical value of the predicted density and the statistical value of the predicted density is obtained by adjusting the parameters of the object density determination model to be trained; wherein, the image pair is composed of a standard image block and a predicted image block that has an image position corresponding relationship with the standard image block. The standard image block is obtained by dividing the standard density map corresponding to the training sample image; the predicted image block is obtained by dividing the predicted density map, and the predicted density map is obtained by dividing the The training sample images are input into the object density determination model to be trained for processing; and

A density map acquisition module, configured to acquire an object density map corresponding to the target image output by the trained object density determination model.
A computer device, comprising a memory and one or more processors, wherein the memory stores computer-readable instructions, wherein the computer-readable instructions, when executed by the processor, cause the one or more The processor implements the method of any one of claims 1 to 11 or 12 to 13.
One or more non-transitory readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to implement the invention as claimed in claim 1 To the method of any one of 11 or 12 to 13.
A computer program product comprising computer-readable instructions, characterized in that, when the computer-readable instructions are executed by a processor, the method according to any one of claims 1 to 11 or 12 to 13 is implemented.