WO2022179474A1 - Method, apparatus, device, storage medium, and program product for determining the number of groups - Google Patents

Method, apparatus, device, storage medium, and program product for determining the number of groups

Info

Publication number
WO2022179474A1
WO2022179474A1 (PCT/CN2022/077070)
Authority
WO
WIPO (PCT)
Prior art keywords
image
sub
sample
region
groups
Prior art date
Application number
PCT/CN2022/077070
Other languages
English (en)
French (fr)
Inventor
王昌安
宋庆宇
张博深
王亚彪
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2022179474A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present application relates to the technical field of image processing, and in particular, to a method, apparatus, device, storage medium and program product for determining the number of groups.
  • Crowd size estimation is a method that can automatically infer the total number of people contained in an image.
  • When recognizing the number of objects contained in an image, computer equipment can combine heat-map regression with deep learning technology to train and apply a model end-to-end, so that the trained model predicts the number of objects in the image.
  • the embodiments of the present application provide a method, apparatus, device, storage medium, and program product for determining the number of groups.
  • the technical solution is as follows.
  • a method for determining the number of groups executed by a computer device, the method comprising:
  • acquiring a first image, and performing data processing on the first image to obtain a population density feature map of the first image; the population density feature map is an image feature obtained by feature extraction on the first image;
  • performing classification processing based on the population density feature map to obtain predicted categories corresponding to respective sub-regions in the first image;
  • acquiring the number of predicted objects corresponding to the respective sub-regions in the first image based on the predicted categories and on the estimated values of the number of objects corresponding to the respective predicted categories; the estimated value of the number of objects corresponding to each of the predicted categories is determined based on the number of groups in each sub-region of a known image;
  • acquiring the number of groups in the first image based on the number of predicted objects corresponding to the respective sub-regions in the first image.
  • a method for determining the number of groups executed by a computer device, the method comprising:
  • acquiring a first sample image and the labeling categories corresponding to respective sub-regions in the first sample image;
  • performing data processing on the first sample image through the data processing layer of the model for determining the number of groups, to obtain a sample population density feature map of the first sample image; the sample population density feature map is an image feature obtained by the data processing layer performing feature extraction on the first sample image;
  • performing classification processing based on the sample population density feature map through the feature classification layer of the model for determining the number of groups, to obtain prediction results corresponding to respective sub-regions in the first sample image;
  • training the model for determining the number of groups based on the prediction results and on the labeling categories corresponding to the respective sub-regions in the first sample image;
  • the trained model for determining the number of groups is used to obtain the predicted category of each sub-region in an input first image, and to determine the number of groups in the first image according to the estimated values of the number of objects corresponding to the respective predicted categories; the estimated value of the number of objects corresponding to each of the predicted categories is determined based on the number of groups in each sub-region of the sample images contained in the training sample set.
  • an apparatus for determining the number of groups comprising:
  • a first image acquisition module, configured to acquire a first image;
  • a first data processing module, configured to perform data processing on the first image to obtain a population density feature map of the first image; the population density feature map is an image feature obtained by feature extraction on the first image;
  • a first classification module configured to perform classification processing based on the population density feature map to obtain prediction categories corresponding to respective sub-regions in the first image
  • a first predicted quantity obtaining module, configured to obtain the number of predicted objects corresponding to each sub-region in the first image based on the predicted category corresponding to each sub-region in the first image and the estimated value of the number of objects corresponding to each predicted category; the estimated value of the number of objects corresponding to each of the predicted categories is determined based on the number of groups in each sub-region of a known image;
  • a first group number determination module configured to acquire the number of groups in the first image based on the number of predicted objects corresponding to each sub-region in the first image.
  • an apparatus for determining the number of groups comprising:
  • a first sample acquisition module configured to acquire a first sample image and the labeling categories corresponding to each sub-region in the first sample image
  • a sample feature acquisition module, configured to perform data processing on the first sample image through the data processing layer of the model for determining the number of groups, to obtain a sample population density feature map of the first sample image; the sample population density feature map is an image feature obtained by the data processing layer performing feature extraction on the first sample image;
  • a sample data processing module, configured to perform classification processing based on the sample population density feature map through the feature classification layer of the model for determining the number of groups, to obtain prediction results corresponding to each sub-region in the first sample image;
  • a model training module, configured to train the model for determining the number of groups based on the prediction results corresponding to the respective sub-regions in the first sample image and on the labeling categories corresponding to the respective sub-regions in the first sample image;
  • the trained model for determining the number of groups is used to obtain the predicted category of each sub-region in an input first image, and to determine the number of groups in the first image according to the estimated values of the number of objects corresponding to the respective predicted categories; the estimated value of the number of objects corresponding to each of the predicted categories is determined based on the number of groups in each sub-region of the sample images contained in the training sample set.
  • a computer device comprising a processor and a memory, wherein the memory stores at least one computer program, and the at least one computer program is loaded and executed by the processor to implement the above-mentioned method for determining the number of groups.
  • a computer-readable storage medium is provided, and at least one computer program is stored in the computer-readable storage medium, and the computer program is loaded and executed by a processor to implement the above-mentioned method for determining the number of groups.
  • a computer program product includes at least one computer program, the computer program is loaded and executed by a processor to implement the above-mentioned method for determining the number of groups.
  • the computer device may determine the predicted categories of each sub-region of the input first image, and the estimated value of the number of objects corresponding to each predicted category, and then determine the number of groups in the first image.
  • The computer device can determine the estimated values of the number of objects from the number of groups in each sub-region of the known image, so that the estimated value of the number of objects corresponding to each predicted category is closer to the actual number of objects for that category. This reduces the discretization error generated when the number of objects in each sub-region is determined through the estimated values, and improves the accuracy of estimating the number of groups in an image.
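  • As an illustrative sketch (not part of the patent's disclosure), one way to obtain per-category estimated values of the number of objects from known images is to take the mean of the sub-region group counts that fall into each classification sub-interval; the function name `interval_proxy_values` and the midpoint fallback for empty sub-intervals are assumptions of this example.

```python
import numpy as np

def interval_proxy_values(train_counts, bin_edges):
    """Estimate one object-number value per classification sub-interval as the
    mean of the known sub-region group counts that fall into it."""
    train_counts = np.asarray(train_counts, dtype=float)
    proxies = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = train_counts[(train_counts >= lo) & (train_counts < hi)]
        # Assumed fallback: use the interval midpoint when no known count lands here.
        proxies.append(in_bin.mean() if in_bin.size else (lo + hi) / 2)
    return np.array(proxies)
```

  Compared with always using the interval midpoint, such data-driven values track the actual distribution of counts, which is why the discretization error shrinks.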
  • FIG. 1 is a schematic diagram of a computer system provided by an exemplary embodiment of the present application;
  • FIG. 2 is a schematic flowchart of a method for determining the number of groups according to an exemplary embodiment;
  • FIG. 3 is a schematic flowchart of a method for determining the number of groups according to an exemplary embodiment;
  • FIG. 4 is a flowchart of a method for determining the number of groups according to an exemplary embodiment;
  • FIG. 5 is a schematic diagram of a method for determining a labeling category involved in the embodiment shown in FIG. 4;
  • FIG. 6 is a schematic diagram of a model of the data processing layer involved in the embodiment shown in FIG. 4;
  • FIG. 7 is a schematic diagram of obtaining interval proxy values involved in the embodiment shown in FIG. 4;
  • FIG. 8 is a schematic flowchart of a process for determining an estimated value of the number of objects involved in the embodiment shown in FIG. 4;
  • FIG. 9 is a flowchart of model training and data processing provided according to an exemplary embodiment;
  • FIG. 10 is a block diagram showing the structure of an apparatus for determining the number of groups according to an exemplary embodiment;
  • FIG. 11 is a block diagram showing the structure of an apparatus for determining the number of groups according to an exemplary embodiment;
  • FIG. 12 is a schematic structural diagram of a computer device according to an exemplary embodiment.
  • the method for determining the number of groups provided in this embodiment of the present application may be applied to a computer device; optionally, the computer device may have a data processing capability.
  • the method for determining the number of groups may be a training method for a model for determining the number of groups; the model for determining the number of groups may process the input image to obtain the number of groups corresponding to the input image.
  • the method for determining the number of groups provided in the embodiments of the present application can be applied to a personal computer, workstation or server, that is, the model for determining the number of groups can be trained through the personal computer, workstation or server.
  • the model for determining the number of groups trained by the method for determining the number of groups provided in the embodiment of the present application can perform data processing on the input image data to obtain prediction data of the number of groups in the image.
  • FIG. 1 shows a schematic diagram of a computer system provided by an exemplary embodiment of the present application.
  • The computer system 100 includes a terminal 110 and a server 120, wherein data communication is performed between the terminal 110 and the server 120 through a communication network; optionally, the communication network may be at least one of a wired network, a wireless network, a local area network, a metropolitan area network, or a wide area network.
  • An application program with an image processing function may be installed in the terminal 110, and the application program may be a professional image processing application, a social application, a virtual reality application, a game application, an artificial intelligence (AI) application, or the like, which is not limited in this embodiment of the present application.
  • The terminal 110 may be a terminal device with an image acquisition component, the image acquisition component being used to acquire images and store them in a data storage module in the terminal 110; alternatively, the terminal 110 may be a terminal device with a data transmission interface, the data transmission interface being used to receive image data collected by an image capture device having an image capture component.
  • The terminal 110 may be a mobile terminal such as a smart phone, a tablet computer, or a laptop computer, or a terminal such as a desktop computer or a projection computer, or an intelligent terminal with a data processing component, which is not limited in this embodiment of the present application.
  • the server 120 may be implemented as one server, or may be implemented as a server cluster composed of a group of servers; the server 120 may be a physical server or a cloud server. In a possible implementation, the server 120 is a background server of the application program installed in the terminal 110 .
  • the server 120 trains the model for determining the number of groups by using a preset training sample set; wherein, the training sample set may include sample images with different population densities.
  • the server 120 sends the trained model for determining the number of groups to the terminal 110 through a wired or wireless connection.
  • The terminal 110 receives the trained model for determining the number of groups, and loads the data information corresponding to the model into an application program with a group number determination function, so that when the application program processes image data, it can process the image data based on the trained model for determining the number of groups, thereby implementing all or part of the steps of the method for determining the number of groups.
  • A computer device can acquire a first image; perform data processing on the first image to obtain a population density feature map of the first image, where the population density feature map is an image feature obtained by feature extraction on the first image; perform classification processing based on the population density feature map to obtain the predicted categories corresponding to each sub-region in the first image; obtain the number of predicted objects corresponding to each sub-region in the first image based on the predicted category corresponding to each sub-region and the estimated values of the number of objects corresponding to the predicted categories, where the estimated value of the number of objects corresponding to each predicted category is determined based on the number of groups in each sub-region of a known image; and obtain the number of groups in the first image based on the number of predicted objects corresponding to each sub-region in the first image.
  • When acquiring the number of groups in the first image, the computer device can determine the estimated value of the number of objects of each predicted category based on the number of groups in each sub-region of the known image, so that the estimated value is closer to the actual number of objects for that category. This reduces the discretization error generated when determining the number of objects in each sub-region through the estimated values, and improves the accuracy of estimating the number of groups in the image.
  • the method for determining the number of groups may be implemented by a machine learning model, and the machine learning model may be a model for determining the number of groups including a data processing layer and a feature classification layer;
  • The known image with corresponding estimated values of the number of objects can be implemented as a sample image used to train the model for determining the number of groups; that is, the estimated value of the number of objects corresponding to each predicted category is determined based on the number of groups in each sub-region of the sample images. Based on this, FIG. 2 is a schematic flowchart of a method for determining the number of groups according to an exemplary embodiment.
  • the method may be performed by a computer device, wherein the computer device may be the terminal 110 in the embodiment shown in FIG. 1 above.
  • the flow of the method for determining the number of groups may include the following steps (201-205).
  • Step 201 acquiring a first image.
  • Step 202 Perform data processing on the first image through the data processing layer of the model for determining the number of groups to obtain a population density feature map of the first image.
  • the population density feature map is an image feature obtained by performing feature extraction on the first image.
  • Step 203: Perform classification processing based on the population density feature map through the feature classification layer of the model for determining the number of groups, to obtain the predicted categories corresponding to the respective sub-regions in the first image.
  • Step 204 Obtain the number of predicted objects corresponding to each sub-region in the first image based on the predicted category corresponding to each sub-region in the first image and the estimated number of objects corresponding to each predicted category.
  • Step 205 Obtain the number of groups in the first image based on the number of predicted objects corresponding to each sub-region in the first image.
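  • Steps 201-205 can be sketched as follows; the helper names (`extract_density_features`, `classify_regions`, `proxy_values`) are hypothetical stand-ins for the model's data processing layer, its feature classification layer, and the per-category estimated values of the number of objects, not names used by the patent.

```python
import numpy as np

def predict_group_count(image, extract_density_features, classify_regions, proxy_values):
    density_map = extract_density_features(image)   # step 202: population density feature map
    region_classes = classify_regions(density_map)  # step 203: predicted category per sub-region
    region_counts = proxy_values[region_classes]    # step 204: category -> estimated object number
    return float(region_counts.sum())               # step 205: sum over all sub-regions
```

  The final sum over sub-regions is what makes the per-category estimated values matter: any per-region bias accumulates over the whole image.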
  • The model for determining the number of groups is a machine learning model trained based on the sample images in the training sample set and the labeling categories corresponding to each sub-region of the sample images; the estimated value of the number of objects corresponding to each predicted category is determined based on the number of groups in each sub-region of the sample images in the training sample set corresponding to the model.
  • When the group in the first image is implemented as a crowd, the model for determining the number of groups may be implemented as a model for determining the number of people in a crowd, the estimated value of the number of objects may be an estimated value of the number of people, and the number of predicted objects corresponding to each sub-region in the first image may be the predicted number of people corresponding to that sub-region; that is, the method for determining the number of groups provided by the present application may be a method for obtaining the number of people in the first image.
  • The group may also be implemented as other groups; illustratively, the group may be an animal group, such as a flock of birds included in the image, or the group may be a group of plants, such as a forest included in the image, and so on.
  • The model for determining the number of groups is trained using the sample images and the labeling categories of each sub-region in the sample images, and the trained model is obtained; the trained model determines the predicted categories of each sub-region of the input first image and the estimated value of the number of objects corresponding to each predicted category, thereby determining the number of groups in the first image.
  • The computer device can determine the estimated values of the number of objects according to the number of groups in each sub-region of the sample images in the training sample set corresponding to the model, so that the estimated value of the number of objects corresponding to each predicted category is closer to the actual number of objects for that category. This reduces the discretization error generated when determining the number of objects in each sub-region through the estimated values, and improves the accuracy of estimating the number of groups in the image.
  • Fig. 3 is a schematic flowchart of a method for determining the number of groups according to an exemplary embodiment. The method may be performed by a computer device, where the computer device may be the server 120 in the embodiment shown in FIG. 1 above. As shown in FIG. 3 , the flow of the method for determining the number of groups may include the following steps (301-304).
  • Step 301 Acquire a first sample image and label categories corresponding to respective sub-regions in the first sample image.
  • Step 302 Perform data processing on the first sample image through the data processing layer of the model for determining the number of groups to obtain a sample population density feature map of the first sample image.
  • the sample population density feature map is an image feature obtained by the data processing layer performing feature extraction on the first sample image.
  • Step 303: Perform classification processing based on the sample population density feature map through the feature classification layer of the model for determining the number of groups, to obtain prediction results corresponding to each sub-region in the first sample image.
  • Step 304 Based on the prediction results corresponding to the respective sub-regions in the first sample image and the labeling categories corresponding to the respective sub-regions in the first sample image, train the model for determining the number of groups.
  • The trained model for determining the number of groups is used to obtain the predicted categories of each sub-region in the first image according to the input first image, and to determine the number of groups in the first image according to the estimated value of the number of objects corresponding to each predicted category.
  • The estimated value of the number of objects corresponding to each predicted category is determined based on the number of groups in each sub-region of the sample images included in the training sample set of the model.
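  • A minimal numerical sketch of the training objective in step 304, assuming the feature classification layer outputs per-sub-region class logits and that a standard cross-entropy loss compares them with the labeling categories (the loss choice is an assumption; the patent only states that predictions are compared with the labeling categories):

```python
import numpy as np

def region_cross_entropy(logits, labels):
    """logits: (H, W, C) class scores per sub-region; labels: (H, W) labeling categories."""
    z = logits - logits.max(axis=-1, keepdims=True)                # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))  # log-softmax over classes
    h, w = labels.shape
    picked = log_probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -picked.mean()  # averaged over all sub-regions
```

  Minimizing this loss over the training sample set drives each sub-region's predicted category toward its labeling category.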
  • In summary, the model for determining the number of groups is trained using the sample images and the labeling categories of each sub-region in the sample images, and the trained model is obtained; the trained model determines the predicted categories of each sub-region of the input first image and the estimated value of the number of objects corresponding to each predicted category, thereby determining the number of groups in the first image.
  • The computer device can determine the estimated values of the number of objects according to the number of groups in each sub-region of the sample images in the training sample set corresponding to the model, so that the estimated value of the number of objects corresponding to each predicted category is closer to the actual number of objects for that category. This reduces the discretization error generated when determining the number of objects in each sub-region through the estimated values, and improves the accuracy of estimating the number of groups in the image.
  • Fig. 4 is a method flowchart of a method for determining the number of groups according to an exemplary embodiment.
  • The method may be executed jointly by a model processing device and a data processing device; the model processing device may be the server 120 in the embodiment shown in FIG. 1 above, and the data processing device may be the terminal 110 in the embodiment shown in FIG. 1 above. As shown in FIG. 4, the flow of the method for determining the number of groups may include the following steps (401-409).
  • Step 401 Acquire a first sample image and label categories corresponding to respective sub-regions in the first sample image.
  • The first sample image may be one of the sample images included in the training sample set; therefore, the process of acquiring the first sample image may be implemented as follows: the training sample set includes the first sample image and an image annotation of the first sample image, the image annotation being used to indicate the positions of the sample objects in the first sample image; based on the image annotation, the labeling category corresponding to each sub-region in the first sample image is obtained.
  • The image annotation may be generated based on the head position of each object (i.e., human body) in each sample image; that is, the model processing device may determine, from the head position information, the group positions and the number of groups in the respective sample images.
  • The group positions and the number of groups in the first sample image may be calculated by a mathematical algorithm based on the image annotation of the first sample image; the process can be implemented as follows:
  • Based on the image annotation of the first sample image, a first sample heat map of the first sample image is obtained; the first sample heat map is used to indicate the positions of the sample objects in the first sample image. Specifically, data processing may be performed through a Gaussian convolution kernel: according to the image annotation of the first sample image, the annotated positions in the first sample image are highlighted, and the first sample heat map of the first sample image is obtained.
  • The integral value of the response map is the total number of objects.
  • In some cases, a sub-region in the first sample image contains the center point of an object's head while the rest of the object lies outside that sub-region. Since the center point of the head is located in the sub-region, the object would be treated as lying entirely within that sub-region, so a first sample heat map generated directly from the response map would represent the number of objects in each sub-region of the first sample image inaccurately.
  • Therefore, the model processing device can perform convolution processing on the response map through a normalized Gaussian convolution kernel to obtain the first sample heat map of the first sample image. The first sample heat map is a Gaussian distribution map formed based on the center point of each human head in the first sample image, and the pixel value of each point in the first sample heat map indicates the population density at that point. Because the Gaussian convolution kernel is normalized, the value obtained after integrating the first sample heat map is still the total number of objects in the first sample image; similarly, by integrating over each sub-region of the first sample image, the number of groups corresponding to each sub-region can be obtained.
  • the process of acquiring the labeling categories corresponding to each sub-region in the first sample image may be implemented as follows:
  • acquiring the object quantity classification interval corresponding to the feature classification layer; the object quantity classification interval includes at least two sub-intervals;
  • classification is performed through the object number classification interval, and the labeling category corresponding to each sub-region in the first sample image is obtained.
  • the labeling category is used to indicate the respective sub-intervals corresponding to each sub-region in the first sample image in the object quantity classification interval.
  • FIG. 5 shows a schematic diagram of a method for determining an annotation category involved in an embodiment of the present application.
  • Based on the image annotation of the first sample image, a first sample heat map 501 of the first sample image is generated by convolution with a normalized Gaussian kernel; the pixel value of each point in the first sample heat map 501 indicates the population density at that point. Each sub-region in the first sample image is then integrated based on the first sample heat map to obtain the number of groups corresponding to each sub-region, as shown by the group number set 502 in FIG. 5.
  • the numerical values in each sub-region in the group number set 502 correspond to the respective group numbers corresponding to each sub-region in the first sample image;
  • Through the object quantity classification interval 503, the number of groups corresponding to each sub-region in the first sample image is classified, and the labeling category corresponding to each sub-region in the first sample image is obtained;
  • the object number classification interval 503 may include [0, 1], [1 , 2], [2, 3], [3, 4], [4, 5] and other sub-intervals, the labeling category corresponding to the sub-interval [0, 1] in the object quantity classification interval is A;
  • the subinterval [1, 2] in the classification interval corresponds to the labeling category B;
  • the subinterval [2, 3] in the object quantity classification interval corresponds to the labeling category C;
  • the subinterval [3] in the object quantity classification interval , 4] corresponds to the labeling category D;
  • the subinterval [4, 5] in the object quantity classification interval corresponds to the labeling category E.
  • For example, the value "1.2" in the upper-left part of the group number set 502 is classified, through the object quantity classification interval 503, into the [1, 2] sub-interval, and its corresponding labeling category is B; the value "4.2" in the lower-left part is classified into the [4, 5] sub-interval, and its corresponding labeling category is E.
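  • The A-E example can be reproduced with a small lookup; the interval edges mirror the object quantity classification interval 503, and `np.digitize` returns the index of the sub-interval containing a count (the letter names are just readable stand-ins for the labeling categories):

```python
import numpy as np

EDGES = [0, 1, 2, 3, 4, 5]   # sub-intervals [0,1], [1,2], [2,3], [3,4], [4,5]
LABELS = 'ABCDE'

def labeling_category(count):
    # Digitizing against the interior edges gives index 0..4 for counts in [0, 5).
    return LABELS[int(np.digitize(count, EDGES[1:-1]))]
```

  For instance, a sub-region count of 1.2 falls in [1, 2] and gets category B, while 4.2 falls in [4, 5] and gets category E.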
  • The process by which the model processing device obtains the object quantity classification interval corresponding to the feature classification layer may be implemented as follows: a first endpoint set is obtained, the first endpoint set being used to indicate the interval endpoints of the object quantity classification interval; the object quantity classification interval is divided into sub-intervals; and the object quantity classification interval corresponding to the feature classification layer is thereby obtained.
  • the object quantity classification interval can cover the group numbers of all sub-regions in all sample images in the training sample set; the smaller its sub-intervals, the more accurate the classification;
  • the extreme values (maximum and minimum) of the group numbers of the sub-regions in the sample images are determined as the interval endpoints of the object quantity classification interval corresponding to the feature classification layer.
  • the interval endpoints of the object quantity classification interval may be determined according to the maximum value of the group numbers of the sub-regions in the sample images; since the object quantity classification interval corresponding to the feature classification layer is used to classify the group numbers of the sub-regions of every sample image in the training sample set, it must cover the maximum group number among those sub-regions.
  • the minimum value of the group numbers of the sub-regions in each sample image refers to the smallest non-zero value among those group numbers.
  • the minimum and maximum values of the group numbers of the sub-regions in each sample image are obtained as the first endpoint set.
  • that is, the model processing device can obtain the minimum value of the group numbers as the left endpoint of the first endpoint set and the maximum value as the right endpoint; the left and right endpoints are the interval endpoints of the object quantity classification interval corresponding to the feature classification layer.
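Building the first endpoint set can be sketched as follows; the sample counts are invented for illustration:

```python
def first_endpoint_set(counts):
    """Left endpoint: smallest non-zero per-sub-region group count;
    right endpoint: largest group count (over all sample images)."""
    nonzero = [c for c in counts if c > 0]
    return min(nonzero), max(nonzero)

# Per-sub-region group counts from (hypothetical) training images:
a, b = first_endpoint_set([0.0, 1.2, 0.0, 4.2, 2.5, 0.7])
# a == 0.7 (smallest non-zero value), b == 4.2
```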
  • the interval segmentation points of the object quantity classification interval corresponding to the feature classification layer can be determined according to the interval endpoints of that object quantity classification interval.
  • the classification number corresponding to the feature classification layer is obtained; based on this classification number, the interval segmentation points of the object quantity classification interval corresponding to the feature classification layer are determined.
  • the classification number corresponding to the feature classification layer indicates the number of possible categories the feature classification layer can output after classifying the input sample image. For example, when the classification number is N (N is a positive integer, N ≥ 2), classifying data through the feature classification layer yields the probability of the data belonging to each of the N categories; in this case, the object quantity classification interval corresponding to the feature classification layer has N-1 interval segmentation points, and segmenting it with these N-1 points yields the N sub-intervals corresponding to the feature classification layer.
  • in some embodiments, the object quantity classification interval corresponding to the feature classification layer is divided evenly according to the classification number corresponding to the feature classification layer, thereby obtaining the interval segmentation points of that object quantity classification interval.
  • alternatively, the interval endpoints of the object quantity classification interval corresponding to the feature classification layer may be e^{k*(log(b)-log(a))/K+log(a)}, where, excluding regions whose object number is 0, the minimum total object number is a, the maximum total object number is b, and the number of sub-intervals to be divided is K. The sizes of the sub-intervals obtained in this way are non-linearly distributed: sub-intervals covering small object numbers are densely packed, while sub-intervals covering large object numbers are spread out, giving a good classification effect for group numbers of different densities.
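The non-linear partition formula above produces geometrically spaced endpoints; a minimal sketch with illustrative values of a, b and K:

```python
import math

def interval_endpoints(a: float, b: float, K: int):
    """k-th endpoint = exp(k*(log(b)-log(a))/K + log(a)), k = 0..K,
    giving K sub-intervals whose widths grow with the count value."""
    return [math.exp(k * (math.log(b) - math.log(a)) / K + math.log(a))
            for k in range(K + 1)]

ends = interval_endpoints(1.0, 16.0, 4)
# endpoints 1, 2, 4, 8, 16: low-count sub-intervals are narrow ([1, 2]),
# high-count sub-intervals are wide ([8, 16])
```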
  • Step 402 Perform data processing on the first sample image through the data processing layer of the model for determining the number of groups to obtain a sample population density feature map of the first sample image.
  • the data processing layer in the model for determining the number of groups is used to perform feature extraction on the first sample image in the training sample set to obtain image features of the first sample image;
  • the sample population density feature map consists of the image features obtained by the data processing layer performing feature extraction on the first sample image; these image features indicate the group information in the first sample image, so the sample population density feature map of the first sample image can be used to indicate the group number and group density in the first sample image.
  • the size of the sample population density feature map is the same as the size of the first sample image; that is, the sample population density feature map obtained after the group number determination model performs feature extraction on the first sample image has the same pixel size as the input first sample image.
  • the data processing layer in the group number determination model may be a U-shaped neural network with an encoder-decoder structure.
  • the encoder in the data processing layer extracts deep features of the input sample image through downsampling; the decoder in the data processing layer restores the low-resolution deep features to high-resolution image features through upsampling.
  • FIG. 6 shows a schematic diagram of a model of a data processing layer involved in an embodiment of the present application.
  • the input image first passes through the first four convolution blocks of the VGG16 network for feature extraction 601, then through three consecutive dilated convolutions 602 (with dilation rates of, for example, 2, 4 and 4). The dilated convolutions 602 enlarge the network's receptive field without increasing the parameter count, so as to capture wider-range context information and obtain sufficient semantic features for group counting. Each point on the feature map (corresponding to an image block in the original image) is then classified through the feature classification layer 603 (schematically, a 1x1 convolution), yielding the prediction category corresponding to each sub-region of the image.
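The claim that dilated convolutions enlarge the receptive field without adding parameters can be checked with the standard receptive-field arithmetic (stride 1 throughout; the layer configuration mirrors the rates 2, 4, 4 named above):

```python
def receptive_field(layers):
    """layers: list of (kernel_size, dilation) pairs, stride 1.
    Each layer's effective kernel is d*(k-1)+1, adding d*(k-1) pixels."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d
    return rf

# Three 3x3 dilated convolutions with rates 2, 4, 4:
rf_dilated = receptive_field([(3, 2), (3, 4), (3, 4)])  # 21 pixels
# Same three layers without dilation, same parameter count:
rf_plain = receptive_field([(3, 1), (3, 1), (3, 1)])    # 7 pixels
```

Both stacks have identical parameter counts (three 3x3 kernels), but dilation triples the receptive field.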
  • Step 403 Perform classification processing based on the sample population density feature map through the feature classification layer of the group number determination model, obtaining the prediction results corresponding to each sub-region in the first sample image.
  • the prediction results corresponding to the respective sub-regions may directly indicate the prediction categories corresponding to the respective sub-regions.
  • in other embodiments, the prediction result indicates a prediction probability set corresponding to each sub-region in the first sample image and the feature classification layer.
  • the model processing device performs classification processing based on the sample population density feature map through the feature classification layer in the group number determination model, obtaining the prediction probability set corresponding to each sub-region in the first sample image and the feature classification layer; the prediction probability set indicates the probability that each sub-region in the first sample image belongs to each category corresponding to the feature classification layer; based on the prediction probability sets corresponding to the sub-regions in the first sample image, the prediction category corresponding to each sub-region in the first sample image and the feature classification layer is obtained.
  • that is, the first prediction probability set corresponding to each sub-region in the sample population density feature map and the feature classification layer can be obtained; the first prediction probability set indicates the probability that each sub-region of the sample image belongs to each category of the feature classification layer (that is, the probability that each sub-region belongs to each sub-interval of the object quantity classification interval).
  • the category with the highest probability in the first prediction probability set corresponding to each sub-region may be determined as the prediction category of that sub-region in the sample image.
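Selecting the highest-probability category per sub-region reduces to an argmax; the sub-region names and probabilities below are invented:

```python
# Per-sub-region probability over the feature classification layer's categories
probs = {
    "region_0": {"A": 0.7, "B": 0.2, "C": 0.1},
    "region_1": {"A": 0.1, "B": 0.3, "C": 0.6},
}
# Predicted category = category with the highest probability
pred = {region: max(p, key=p.get) for region, p in probs.items()}
# {"region_0": "A", "region_1": "C"}
```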
  • Step 404 Train the group number determination model based on the prediction results corresponding to the sub-regions in the first sample image and the labeling categories corresponding to those sub-regions.
  • that is, the prediction category corresponding to each sub-region in the first sample image is obtained, and the group number determination model is trained according to the prediction categories and the labeling categories corresponding to the sub-regions in the first sample image.
  • the process of training the group number determination model may be: obtain the loss function value corresponding to each sub-region from its labeling category and its prediction category, and train the model according to those loss function values; alternatively, obtain the loss function value corresponding to each sub-region from its labeling category and its corresponding probability distribution, and train the model according to those loss function values.
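One common choice for the per-sub-region loss described above is cross-entropy between the predicted probability distribution and the labeling category; the application does not fix a particular loss, so the sketch below (with invented probabilities and labels) is only one plausible instance:

```python
import math

def cross_entropy_loss(pred_probs, labels):
    """pred_probs: list of {category: probability} per sub-region;
    labels: list of labeling categories, one per sub-region.
    Returns the mean negative log-likelihood of the labeled category."""
    losses = [-math.log(p[y]) for p, y in zip(pred_probs, labels)]
    return sum(losses) / len(losses)

loss = cross_entropy_loss(
    [{"A": 0.8, "B": 0.2}, {"A": 0.3, "B": 0.7}],
    ["A", "B"],
)
# mean of -log(0.8) and -log(0.7), about 0.290
```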
  • the training sample set may include at least two sample images, and the first sample image may be any one of them.
  • the above process of training the group number determination model based on the first sample image and the image annotation of the first sample image also applies to the other sample images in the training sample set; the model is trained with the sample images in the training sample set until it converges, at which point training is complete and the trained group number determination model is obtained.
  • Step 405 acquiring a first image.
  • Step 406 based on the data processing layer of the population number determination model, perform data processing on the first image to obtain a population density feature map of the first image.
  • the data processing layer in the population number determination model is used to perform feature extraction on the first image to obtain image features of the first image;
  • the population density feature map consists of the image features obtained by the data processing layer performing feature extraction on the first image; these image features indicate the group information in the first image, so the population density feature map of the first image can be used to indicate the group number and group density in the first image.
  • the size of the population density feature map is the same as the size of the first image; that is, the population density feature map obtained after the group number determination model performs feature extraction on the first image has the same pixel size as the input first image.
  • Step 407 Perform classification processing based on the population density feature map through the feature classification layer of the group number determination model, obtaining the prediction categories corresponding to the sub-regions of the first image.
  • in some embodiments, classification processing based on the population density feature map through the feature classification layer yields prediction results corresponding to the sub-regions of the first image, and the prediction categories corresponding to the sub-regions are obtained from those prediction results.
  • in some embodiments, the classification processing directly yields the prediction categories corresponding to the sub-regions of the first image; that is, the prediction result is the prediction category.
  • in other embodiments, the classification processing yields a prediction probability set (prediction result) corresponding to each sub-region in the first image and the feature classification layer; the prediction probability set indicates the probability that each sub-region in the first image belongs to each category corresponding to the feature classification layer; based on the prediction probability sets, the prediction category corresponding to each sub-region in the first image and the feature classification layer is obtained.
  • that is, the first prediction probability set corresponding to each sub-region in the population density feature map and the feature classification layer can be obtained; the first prediction probability set indicates the probability that each sub-region of the image belongs to each category of the feature classification layer (that is, the probability that each sub-region belongs to each sub-interval of the object quantity classification interval).
  • the category with the highest probability in the first prediction probability set corresponding to each sub-region may be determined as the prediction category of that sub-region in the image.
  • Step 408 Obtain the number of predicted objects corresponding to each sub-region of the first image based on the predicted categories corresponding to each sub-region of the first image and the estimated value of the number of objects corresponding to each of the predicted categories.
  • the estimated object number corresponding to each prediction category may be determined based on the group numbers of the sub-regions of a known image; in this embodiment of the present application, the known image may be a sample image in the training sample set used to train the group number determination model; that is, the estimated object number corresponding to each prediction category may be determined based on the group numbers of the sub-regions of the sample images in the training sample set corresponding to the group number determination model.
  • the estimated object number corresponding to a prediction category indicates the predicted object number of a sub-region corresponding to that prediction category; that is, when a sub-region corresponds to a certain prediction category, the estimated object number of that prediction category can be taken as the predicted object number of the sub-region.
  • in some embodiments, first-type sample sub-regions are obtained; the first-type sample sub-regions are the sub-regions, among the sub-regions of the sample images in the training sample set, that correspond to the first-type labeling category;
  • the first-type labeling category is any one of the labeling categories; based on the group numbers corresponding to the first-type sample sub-regions, the estimated object number corresponding to the first-type labeling category is determined.
  • the average of the group numbers corresponding to the first-type sample sub-regions is determined as the estimated object number corresponding to the first-type labeling category.
  • that is, the estimated object number corresponding to each labeling category may be determined according to the group numbers of the sub-regions corresponding to that labeling category among the sub-regions of the sample images in the training sample set; in other words, it can be the average of the group numbers of the sub-regions corresponding to that labeling category.
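The averaging step above can be sketched as follows; the sub-region counts and category names are invented for illustration:

```python
from collections import defaultdict

def category_estimates(sub_region_counts, sub_region_labels):
    """Per labeling category, return the mean of the true group counts of
    all training sub-regions assigned that category."""
    sums, nums = defaultdict(float), defaultdict(int)
    for count, label in zip(sub_region_counts, sub_region_labels):
        sums[label] += count
        nums[label] += 1
    return {label: sums[label] / nums[label] for label in sums}

est = category_estimates([1.2, 1.8, 4.2, 4.6], ["B", "B", "E", "E"])
# {"B": 1.5, "E": 4.4}
```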
  • in this way, the sum of the discrete errors between the real object numbers of the sub-regions corresponding to each labeling category and the estimated object number is small; therefore, when a real image is predicted through the estimated object numbers, the resulting discrete error should also be small. The proof can be as follows:
  • for a test image, the corresponding local count values d_k, k ∈ {1, 2, ..., K} form the set Φ;
  • calculate the expected count error ε of the group number determination model on that test image;
  • the expected count error ε of the image can be approximated as ε = Σ_i p_i (d_i - d̂_i), where p_i is the frequency of occurrence of d_i in the set Φ, and d̂_i is the predicted value (interval proxy value) of the local count value d_i;
  • p_i can be approximated as p_i ≈ c_i / |Φ| (where c_i is the number of occurrences of d_i in Φ);
  • therefore, the expected count error can be expressed as: ε = (1 / |Φ|) Σ_i c_i (d_i - d̂_i), which vanishes when the proxy value of each sub-interval equals the average of the local count values falling in that sub-interval.
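The effect of the proxy choice can be illustrated numerically. Below, four invented local count values all fall in one sub-interval [0, 10] and are skewed toward 0, mirroring the situation described for sub-interval 701; the interval midpoint leaves a systematic error in the total count, while the mean of the counts cancels it:

```python
counts = [1.0, 2.0, 2.0, 3.0]          # local counts, biased toward 0
midpoint_proxy = 5.0                    # midpoint of the [0, 10] sub-interval
mean_proxy = sum(counts) / len(counts)  # 2.0

# Error in the total count when every local count is replaced by the proxy:
err_midpoint = abs(sum(c - midpoint_proxy for c in counts))  # 12.0
err_mean = abs(sum(c - mean_proxy for c in counts))          # 0.0
```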
  • FIG. 7 shows a schematic diagram of obtaining an interval proxy value involved in an embodiment of the present application.
  • the interval proxy value is usually obtained by taking the midpoint of the interval as the proxy value (i.e., the estimated object number);
  • for the sub-interval 701 shown, whose endpoints are [0, 10], the local group numbers of the images falling in this interval are biased toward the 0 endpoint; taking the interval midpoint 5 as the proxy value cannot offset the discrete error it generates, so the midpoint produces a certain discrete error. If instead the group numbers of all sub-regions in the training sample set corresponding to this sub-interval are averaged, then, when the training sample set is large enough, the average reflects the distribution of the group numbers corresponding to the sub-interval to a certain extent; using this average as the interval proxy value reduces the discrete error caused by classifying group numbers by interval.
  • Step 409 Obtain the number of groups in the first image based on the number of predicted objects corresponding to each sub-region in the first image.
  • in some embodiments, the predicted object numbers corresponding to the sub-regions in the first image are summed to obtain the group number in the first image;
  • in other embodiments, only the predicted object numbers meeting a specified condition are summed to obtain the group number in the first image;
  • for example, the specified condition may exclude the maximum and minimum values among the predicted object numbers corresponding to the sub-regions in the first image.
  • FIG. 8 shows a schematic flowchart of a process of determining an estimated value of the number of objects involved in an embodiment of the present application.
  • given an image, first obtain its population density feature map 801 according to the method shown in the embodiments of the present application; then, for each image block, compute the sum of the density values as the total object number 802 of that image block (referred to as a local count value); finally, determine the labeling category 804 of each image block based on the counting sub-interval of the object quantity classification interval 803 in which its total object number falls.
  • the predicted object number of an image block 805 classified into a certain category is the estimated object number 806 corresponding to that labeling category (i.e., A).
  • after the predicted object number of each image block is obtained, the sum of the predicted object numbers of all image blocks is taken as the group number of the whole image.
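The whole inference path just described (local count per block, classify into a sub-interval, replace by that category's estimated count, sum) can be sketched end to end. The interval bounds and per-category estimates are illustrative assumptions carried over from the earlier examples, not values fixed by this application:

```python
import bisect

BOUNDS = [1, 2, 3, 4, 5]                 # sub-interval upper bounds
ESTIMATES = [0.4, 1.5, 2.5, 3.5, 4.4]    # assumed mean count per category

def image_count(block_sums):
    """block_sums: local count value (density sum) of each image block.
    Returns the predicted group number of the whole image."""
    total = 0.0
    for s in block_sums:
        idx = min(bisect.bisect_left(BOUNDS, s), len(ESTIMATES) - 1)
        total += ESTIMATES[idx]          # category's estimated object number
    return total

# Four image blocks with local count values 0.3, 1.2, 1.9, 4.2:
n = image_count([0.3, 1.2, 1.9, 4.2])    # 0.4 + 1.5 + 1.5 + 4.4 = 7.8
```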
  • the solutions shown in the above embodiments of the present application can also be applied to the field of intelligent transportation.
  • the management platform for intelligent transportation can obtain real-time group images of traffic locations that need to be managed through devices such as cameras, and then, according to the solutions shown in the embodiments of the present application, reduce the impact of discretization errors on the estimation of the group number;
  • this enables accurate estimation of the group density in the group images and hence the group number in each image; based on the real-time group number at each traffic location, the passenger flow density of each location is evaluated, so that the transportation hub can schedule vehicles intelligently, effectively improving its passenger flow management capability.
  • in summary, the group number determination model is trained with sample images and the labeling categories of their sub-regions, yielding the trained group number determination model;
  • the group number determination model determines the prediction categories of the sub-regions of the input first image and the estimated object number corresponding to each prediction category, thereby determining the group number in the first image;
  • the computer device can determine the estimated object numbers according to the group numbers of the sub-regions of the sample images in the training sample set corresponding to the group number determination model, so that the estimated object number of each prediction category is closer to the real object numbers corresponding to that category; this reduces the discretization error generated when the object number of each sub-region is determined through the estimated values, and improves the accuracy of estimating the group number in the image.
  • FIG. 9 is a flowchart of model training and data processing according to an exemplary embodiment.
  • the model training process can be applied to a model training device 900, which may be a server; the crowd number determination process can be applied to a data processing device 910, which may be a terminal. The flow of model training and crowd number estimation is as follows.
  • the sample images 901 in the training data set are first classified through the population classification interval corresponding to the feature classification layer in the crowd number determination model 902; the sub-interval of the population classification interval corresponding to each sub-region of each sample image 901 is determined, and the sub-intervals 903 corresponding to the sub-regions of the sample images in the training data set are obtained as the labeling categories 904 of the sample images.
  • according to the actual numbers of people in the sub-regions corresponding to a labeling category, the estimated number of people 906 for that labeling category (i.e., the average of the real numbers of people) can be determined, so that when a sub-region of an image is judged to correspond to that labeling category, the model training device can directly take the estimated number of people as the predicted number of people of that sub-region.
  • the data processing layer in the crowd number determination model 902 performs feature extraction on the input sample image 901 to obtain the sample crowd density feature map output by the data processing layer; the feature classification layer then classifies the sample crowd density feature map to obtain the prediction results 905 corresponding to the sub-regions in the sample image 901 (that is, the probability that each sub-region corresponds to each sub-interval of the population classification interval).
  • the model training device 900 can train the crowd number determination model 902 with the labeling categories 904 and the prediction results 905 to obtain a trained crowd number determination model, and transmit it to the data processing device 910; the data processing device 910 processes the input image to obtain the number of people in the image.
  • for an input first image 911, the data processing device 910 can process the first image 911 through the trained people number determination model 912 to obtain the prediction results corresponding to the sub-regions in the first image;
  • the prediction result may be the probability that each sub-region in the first image corresponds to each sub-interval of the population classification interval; the data processing device 910 takes the sub-interval with the highest probability as the sub-interval of the population classification interval corresponding to each sub-region in the first image, and then determines the predicted number of people of each sub-region according to the estimated numbers of people of the sub-intervals obtained above, so as to determine the number of people 913 in the first image.
  • Fig. 10 is a block diagram showing the structure of an apparatus for determining the number of groups according to an exemplary embodiment.
  • the device for determining the number of groups can implement all or part of the steps in the method provided by the embodiment shown in FIG. 2 or FIG. 4 , and the device for determining the number of groups includes the following parts:
  • a first data processing module 1002, configured to perform data processing on the first image to obtain a population density feature map of the first image; the population density feature map consists of image features obtained by feature extraction on the first image;
  • a first classification module 1003 configured to perform classification processing based on the population density feature map, to obtain prediction categories corresponding to respective sub-regions in the first image;
  • a first predicted quantity obtaining module 1004 configured to obtain the first image based on the predicted categories corresponding to each sub-region in the first image and the estimated number of objects corresponding to each of the predicted categories The number of predicted objects corresponding to each sub-region in the image; the estimated value of the number of objects corresponding to each of the predicted categories is determined based on the number of groups of each sub-region in the known image;
  • a first group number determination module 1005 configured to acquire the number of groups in the first image based on the number of predicted objects corresponding to each sub-region in the first image.
  • in some embodiments, the first data processing module is configured to perform data processing on the first image through the data processing layer of the group number determination model to obtain the population density feature map of the first image;
  • the population density feature map consists of image features obtained by the data processing layer through feature extraction;
  • the first classification module is configured to perform classification processing based on the population density feature map through the feature classification layer of the group number determination model, obtaining the prediction categories corresponding to the sub-regions in the first image;
  • the group number determination model is a machine learning model trained based on the sample images in the training sample set and the labeling categories corresponding to the sub-regions of the sample images; the estimated object number corresponding to each prediction category is determined based on the group numbers of the sub-regions of the sample images.
  • in summary, the group number determination model is trained with sample images and the labeling categories of their sub-regions, yielding the trained group number determination model;
  • the group number determination model determines the prediction categories of the sub-regions of the input first image and the estimated object number corresponding to each prediction category, thereby determining the group number in the first image;
  • the computer device can determine the estimated object numbers according to the group numbers of the sub-regions of the sample images in the training sample set corresponding to the group number determination model, so that the estimated object number of each prediction category is closer to the real object numbers corresponding to that category; this reduces the discretization error generated when the object number of each sub-region is determined through the estimated values, and improves the accuracy of estimating the group number in the image.
  • Fig. 11 is a block diagram showing the structure of an apparatus for determining the number of groups according to an exemplary embodiment.
  • the device for determining the number of groups can implement all or part of the steps in the method provided by the embodiment shown in FIG. 3 or FIG. 4 , and the device for determining the number of groups includes the following parts:
  • a first sample acquisition module 1101 configured to acquire a first sample image and the labeling categories corresponding to each sub-region in the first sample image
  • the sample feature acquisition module 1102 is configured to perform data processing on the first sample image through the data processing layer of the group quantity determination model, to obtain a sample population density feature map of the first sample image; the sample population density feature map is an image feature obtained by the data processing layer performing feature extraction on the first sample image;
  • the sample data processing module 1103 is configured to perform classification processing based on the sample population density feature map through the feature classification layer of the group quantity determination model, to obtain prediction results respectively corresponding to the sub-regions in the first sample image;
  • the model training module 1104 is configured to train the group quantity determination model based on the prediction results respectively corresponding to the sub-regions in the first sample image, and the labeling categories respectively corresponding to the sub-regions in the first sample image;
  • the trained group quantity determination model is used to obtain, from an input first image, the predicted category of each sub-region in the first image, and to determine the number of groups in the first image according to the estimated object count corresponding to each of the predicted categories; the estimated object count corresponding to each of the predicted categories is determined based on the group counts of the sub-regions of the sample images contained in the training sample set.
  • the first sample acquisition module includes:
  • a sample acquisition unit configured to acquire the training sample set;
  • the training sample set includes the first sample image and the image annotation of the first sample image;
  • the image annotation is used to indicate the locations of the sample objects in the first sample image;
  • a sample population obtaining unit configured to obtain the population number of each sub-region in the first sample image based on the image annotation of the first sample image
  • a labeling category obtaining unit configured to obtain the labeling categories corresponding to each sub-region in the first sample image based on the number of groups of each sub-region in the first sample image.
  • the labeling category obtaining unit includes:
  • a classification interval obtaining subunit used for obtaining the object quantity classification interval corresponding to the feature classification layer;
  • the object quantity classification interval includes at least two sub-intervals;
  • an annotation category acquisition subunit, used for classifying the group counts of the sub-regions in the first sample image through the object quantity classification interval, to obtain the labeling categories respectively corresponding to the sub-regions in the first sample image.
  • the labeling category obtaining unit includes:
  • a sub-region acquisition subunit, configured to acquire first-type sample sub-regions; the first-type sample sub-regions are the sub-regions, among the sub-regions of the sample images in the training sample set, that correspond to a first-type labeling category;
  • the first type of labeling category is any one of the labeling categories;
  • the estimated value obtaining subunit is configured to determine the estimated value of the number of objects corresponding to the first type of labeling category based on the number of groups corresponding to each sub-area in the first type of sample sub-areas.
  • the estimated value obtaining subunit is further used for:
  • the average value of the number of groups corresponding to each sub-region in the first-type sample sub-region is determined as the estimated value of the number of objects corresponding to the first-class labeling category.
  • the classification interval acquisition subunit is used for,
  • a first endpoint set is acquired based on the maximum and minimum group counts of the sub-regions of the sample images in the training sample set; the first endpoint set is used to indicate the interval endpoints of the object quantity classification interval;
  • a first segmentation point set is determined; the first segmentation point set is used to indicate the interval segmentation points of the object quantity classification interval; the interval segmentation points are used for dividing the object quantity classification interval into sub-intervals;
  • based on the first endpoint set and the first segmentation point set, the object quantity classification interval corresponding to the feature classification layer is obtained.
  • the sample population acquisition unit is used to:
  • a first sample hotspot map of the first sample image is obtained based on the first sample image and its image annotation; the first sample hotspot map is used to indicate the locations of the groups in the first sample image;
  • data processing is performed on the first sample hotspot map through a Gaussian convolution kernel, to obtain a first sample heat map of the first sample image;
  • the first sample heat map is integrated over each sub-region in the first sample image, to obtain the group count of each sub-region in the first sample image.
  • the group quantity determination model is trained using the sample images and the labeling category of each sub-region in the sample images, yielding a trained group quantity determination model;
  • the group quantity determination model determines the predicted category of each sub-region of the input first image and the estimated object count corresponding to each predicted category, and thereby determines the number of groups in the first image.
  • the computer device can determine the estimated object counts from the group counts of the sub-regions of the sample images in the training sample set corresponding to the group quantity determination model, so that the estimated object count for each predicted category is closer to the true object count of that category; this reduces the discretization error introduced when the object count of each sub-region is determined through the estimated values, and improves the accuracy of estimating the number of groups in an image.
  • Fig. 12 is a schematic structural diagram of a computer device according to an exemplary embodiment.
  • the computer device may be implemented as the model training device and/or the data processing device in each of the above method embodiments.
  • the computer device 1200 includes a central processing unit (CPU, Central Processing Unit) 1201, a system memory 1204 including a random access memory (Random Access Memory, RAM) 1202 and a read-only memory (Read-Only Memory, ROM) 1203, and a system bus 1205 that connects the system memory 1204 and the central processing unit 1201.
  • the computer device 1200 also includes a basic input/output system 1206 that facilitates the transfer of information between various components within the computer, and a mass storage device 1207 for storing an operating system 1213, application programs 1214, and other program modules 1215.
  • the computer-readable media can include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, flash memory, or other solid-state storage technology, CD-ROM, or other optical storage, magnetic tape cartridges, magnetic tape, magnetic disk storage, or other magnetic storage devices.
  • the system memory 1204 and the mass storage device 1207 described above may be collectively referred to as memory.
  • the memory also includes one or more programs, which are stored in the memory; the central processing unit 1201 implements all or part of the steps of the methods shown in FIG. 2, FIG. 3 or FIG. 4 by executing the one or more programs.
  • a computer-readable storage medium is also provided; at least one computer program is stored in the computer-readable storage medium, and the computer program is loaded and executed by a processor to implement all or part of the steps of the above information generation method.
  • the computer-readable storage medium may be Read-Only Memory (ROM), Random Access Memory (RAM), Compact Disc Read-Only Memory (CD-ROM), Tape, floppy disk, and optical data storage devices, etc.
  • a computer program product is also provided; the computer program product includes at least one computer program, and the computer program is loaded by a processor to execute all or part of the steps in the methods shown in the above embodiments.


Abstract

A method, apparatus, device, storage medium and program product for determining the number of groups, relating to the technical field of image processing. The method comprises: acquiring a first image (201); performing data processing on the first image to obtain a population density feature map of the first image (202); performing classification processing based on the population density feature map to obtain predicted categories respectively corresponding to the sub-regions in the first image (203); acquiring, based on the predicted categories respectively corresponding to the sub-regions and the estimated object counts corresponding to the predicted categories, predicted object counts respectively corresponding to the sub-regions (204); and acquiring the number of groups in the first image based on the predicted object counts respectively corresponding to the sub-regions (205).

Description

群体数量确定方法、装置、设备、存储介质及程序产品
本申请要求于2021年02月25日提交的申请号为202110212105.3、发明名称为“人群数量确定方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及图像处理技术领域,特别涉及一种群体数量确定方法、装置、设备、存储介质及程序产品。
背景技术
人群数量估计是一种能够自动推理出图像中包含的总人数的方法。
在相关技术中,在对图像中包含的对象数量进行识别时,计算机设备可以结合热力图回归的方法,利用深度学习技术进行端到端地对模型进行训练与推理,以通过训练好的模型对图像中的对象数量进行预测。
发明内容
本申请实施例提供了群体数量确定方法、装置、设备、存储介质及程序产品。该技术方案如下。
一方面,提供了一种群体数量确定方法,由计算机设备执行,所述方法包括:
获取第一图像;
对所述第一图像进行数据处理,得到所述第一图像的群体密度特征图;所述群体密度特征图是对所述第一图像进行特征提取得到的图像特征;
基于所述群体密度特征图进行分类处理,得到所述第一图像中的各个子区域分别对应的预测类别;
基于所述第一图像中的各个子区域分别对应的所述预测类别,以及与各个所述预测类别相对应的对象数量估计值,获取所述第一图像中的各个子区域分别对应的预测对象数量;与各个所述预测类别相对应的对象数量估计值是基于已知图像中的各个子区域的群体数量确定的;
基于所述第一图像中的各个子区域分别对应的所述预测对象数量,获取所述第一图像中的群体数量。
另一方面,提供了一种群体数量确定方法,由计算机设备执行,所述方法包括:
获取第一样本图像,以及所述第一样本图像中的各个子区域分别对应的所述标注类别;
通过群体数量确定模型的数据处理层,对所述第一样本图像进行数据处理,得到所述第一样本图像的样本群体密度特征图;所述样本群体密度特征图是所述数据处理层对所述第一样本图像进行特征提取得到的图像特征;
通过所述群体数量确定模型的特征分类层,基于所述样本群体密度特征图进行分类处理,获得所述第一样本图像中的各个子区域分别对应的预测结果;
基于所述第一样本图像中的各个子区域分别对应的预测结果,以及所述第一样本图像中的各个子区域分别对应的所述标注类别,对所述群体数量确定模型进行训练;
训练后的所述群体数量确定模型用于根据输入的第一图像，获得所述第一图像中的各个子区域的预测类别，并根据与各个所述预测类别相对应的对象数量估计值，确定所述第一图像中的群体数量；与各个所述预测类别相对应的对象数量估计值是基于训练样本集中包含的样本图像的各个子区域的群体数量确定的。
又一方面,提供了一种群体数量确定装置,所述装置包括:
第一图像获取模块,用于获取第一图像;
第一数据处理模块,用于对所述第一图像进行数据处理,得到所述第一图像的群体密度特征图;所述群体密度特征图是对所述第一图像进行特征提取得到的图像特征;
第一分类模块,用于基于所述群体密度特征图进行分类处理,得到所述第一图像中的各个子区域分别对应的预测类别;
第一预测数量获取模块,用于基于所述第一图像中的各个子区域分别对应的所述预测类别,以及与各个所述预测类别相对应的对象数量估计值,获取所述第一图像中的各个子区域分别对应的预测对象数量;与各个所述预测类别相对应的对象数量估计值是基于已知图像中的各个子区域的群体数量确定的;
第一群体数量确定模块,用于基于所述第一图像中的各个子区域分别对应的所述预测对象数量,获取所述第一图像中的群体数量。
再一方面,提供了一种群体数量确定装置,所述装置包括:
第一样本获取模块,用于获取第一样本图像,以及所述第一样本图像中的各个子区域分别对应的所述标注类别;
样本特征获取模块,用于通过群体数量确定模型的数据处理层,对所述第一样本图像进行数据处理,得到所述第一样本图像的样本群体密度特征图;所述样本群体密度特征图是所述数据处理层对所述第一样本图像进行特征提取得到的图像特征;
样本数据处理模块,用于通过所述群体数量确定模型的特征分类层,基于所述样本群体密度特征图进行分类处理,获得所述第一样本图像中的各个子区域分别对应的预测结果;
模型训练模块,用于基于所述第一样本图像中的各个子区域分别对应的预测结果,以及所述第一样本图像中的各个子区域分别对应的所述标注类别,对所述群体数量确定模型进行训练;
训练后的所述群体数量确定模型用于根据输入的第一图像,获得所述第一图像中的各个子区域的预测类别,并根据与各个所述预测类别相对应的对象数量估计值,确定所述第一图像中的群体数量;与各个所述预测类别相对应的对象数量估计值是基于训练样本集中包含的样本图像的各个子区域的群体数量确定的。
一方面,提供了一种计算机设备,所述计算机设备包含处理器和存储器,所述存储器中存储有至少一条计算机程序,所述至少一条计算机程序由所述处理器加载并执行以实现上述的群体数量确定方法。
再一方面,提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条计算机程序,所述计算机程序由处理器加载并执行以实现上述的群体数量确定方法。
又一方面,提供了一种计算机程序产品,所述计算机程序产品包括至少一条计算机程序,所述计算机程序由处理器加载并执行以实现上述群体数量确定方法。
本申请实施例提供的技术方案带来的有益效果至少包括:
计算机设备可以确定输入的第一图像的各个子区域的预测类别,以及与各个预测类别相对应的对象数量估计值,进而确定该第一图像中的群体数量。在上述方案中,计算机设备可以通过已知图像的各个子区域的群体数量确定对象数量估计值,使得与各个预测类别相对应的对象数量估计值更加贴近该预测类别对应的对象数量的真实值,降低了通过对象数量估计值对各个子区域进行对象数量确定时产生的离散化误差,提高了对图像进行群体数量估计的准确性。
附图说明
图1示出了本申请一个示例性实施例提供的计算机系统的示意图;
图2是根据一示例性实施例示出的一种群体数量确定方法的流程示意图;
图3是根据一示例性实施例示出的一种群体数量确定方法的流程示意图;
图4是根据一示例性实施例示出的一种群体数量确定方法的方法流程图;
图5示出了图4所示实施例涉及的一种标注类别确定的方法示意图;
图6示出了图4所示实施例涉及的一种数据处理层的模型示意图;
图7示出了图4所示实施例涉及的一种获取区间代理值的示意图;
图8示出了图4所示实施例涉及的一种对象数量估计值的确定过程的流程示意图;
图9是根据一示例性实施例提供的一种模型训练以及数据处理的流程框图;
图10是根据一示例性实施例示出的一种群体数量确定装置的结构方框图;
图11是根据一示例性实施例示出的一种群体数量确定装置的结构方框图;
图12是根据一示例性实施例示出的一种计算机设备的结构示意图。
具体实施方式
本申请实施例提供的群体数量确定方法可以应用于计算机设备中;可选的,该计算机设备可以具有数据处理能力。其中,该群体数量确定方法可以是对群体数量确定模型的训练方法;该群体数量确定模型可以实现对输入图像的处理,得到与输入图像相对应的群体数量。
在一种可能的实现方式中，本申请实施例提供的群体数量确定方法可以应用于个人计算机、工作站或服务器中，即可以通过个人计算机、工作站或服务器对群体数量确定模型进行训练。
在一种可能的实现方式中,通过本申请实施例提供的群体数量确定方法训练出的群体数量确定模型可以对输入的图像数据进行数据处理,得到该图像的群体数量的预测数据。
图1示出了本申请一个示例性实施例提供的计算机系统的示意图。该计算机系统100中包括终端110和服务器120,其中,终端110与服务器120之间通过通信网络进行数据通信;可选地,通信网络可以是有线网络、无线网络、局域网、城域网以及广域网中的至少一种。
终端110中可以安装有具有图像处理功能的应用程序,该应用程序可以是专业图像处理应用程序、社交类应用程序、虚拟现实应用程序、游戏应用程序、具有图像处理功能的人工智能(Artificial Intelligence,AI)应用程序等等,本申请实施例对此不作限定。
可选的,该终端110可以是具有图像采集组件的终端设备,该图像采集组件用于获取图像并存储于该终端110中的数据存储模块中;或者,该终端110还可以是具有数据传输接口的终端设备,该数据传输接口用于接收具有图像采集组件的图像采集设备采集到的图像数据。
可选的,终端110可以是智能手机、平板电脑、膝上便携式笔记本电脑等移动终端,也可以是台式电脑、投影式电脑等终端,或是具有数据处理组件的智能终端,本申请实施例对此不做限定。
服务器120可以实现为一台服务器,也可以实现为一组服务器构成的服务器集群;该服务器120可以是物理服务器,也可以实现为云服务器。在一种可能的实施方式中,服务器120是终端110中安装的应用程序的后台服务器。
在一种可能的实现方式中,服务器120通过预先设置的训练样本集对群体数量确定模型进行训练;其中,训练样本集中可以包含具有不同群体密度的样本图像。当服务器120对该群体数量确定模型的训练过程完成后,通过有线或无线连接,将训练好的群体数量确定模型发送至终端110中。终端110接收到训练好的群体数量确定模型,并将群体数量确定模型对应的数据信息输入具有群体数量确定功能的应用程序中,以便该应用程序对图像数据进行处理时,可以根据训练好的该群体数量确定模型对图像数据进行处理,从而实现群体数量确定方法中的全部或部分步骤。
基于本申请提供的群体数量确定方法，计算机设备可以获取第一图像；对第一图像进行数据处理，得到第一图像的群体密度特征图，该群体密度特征图是对第一图像进行特征提取得到的图像特征；基于群体密度特征图进行分类处理，得到第一图像中的各个子区域分别对应的预测类别；基于第一图像中的各个子区域分别对应的预测类别，以及与各个预测类别相对应的对象数量估计值，获取第一图像中各个子区域分别对应的预测对象数量；与各个预测类别相对应的对象数量估计值是基于已知图像中的各个子区域的群体数量确定的；基于第一图像中的各个子区域分别对应的预测对象数量，获取第一图像中的群体数量。通过上述方法，计算机设备在获取第一图像中的群体数量时，可以基于已知图像的各个子区域的群体数量确定各个预测类别的对象数量估计值，使得与各个预测类别相对应的对象数量估计值更加贴近该预测类别对应的对象数量的真实值，降低了通过对象数量估计值对各个子区域进行对象数量确定时产生的离散化误差，提高了对图像进行群体数量估计的准确性。
本申请实施例提供的群体数量确定方法可以通过机器学习模型实现,该机器学习模型可以是包括数据处理层以及特征分类层的群体数量确定模型;在此情况下,用于确定与各个预测类别相对应的对象数量估计值的已知图像可以实现为:用于训练群体数量确定模型的样本图像;也就是说,与各个预测类别相对应的对象数量估计值是基于样本图像的各个子区域的群体数量确定的;基于此,图2是根据一示例性实施例示出的一种群体数量确定方法的流程示意图。该方法可以由计算机设备执行,其中,该计算机设备可以是上述图1所示的实施例中的终端110。如图2所示,该群体数量确定方法的流程可以包括如下步骤(201~205)。
步骤201,获取第一图像。
步骤202,通过群体数量确定模型的数据处理层,对该第一图像进行数据处理,得到该第一图像的群体密度特征图。
该群体密度特征图是对第一图像进行特征提取得到的图像特征。
步骤203,通过该群体数量确定模型的特征分类层,基于群体密度特征图进行分类处理,得到该第一图像中的各个子区域分别对应的预测类别。
步骤204,基于该第一图像中的各个子区域分别对应的预测类别,以及与各个预测类别相对应的对象数量估计值,获取该第一图像中的各个子区域分别对应的预测对象数量。
步骤205,基于该第一图像中的各个子区域分别对应的预测对象数量,获取该第一图像中的群体数量。
其中,该群体数量确定模型是基于训练样本集中的样本图像,以及样本图像的各个子区域分别对应的标注类别训练得到的机器学习模型;各个该预测类别对应的对象数量估计值是基于该群体数量确定模型对应的训练样本集中样本图像的各个子区域的群体数量确定的。
在本申请实施例中,当第一图像中的群体实现为人群时,该群体数量确定模型可以实现为人群数量确定模型,该对象估计值可以是人数估计值,第一图像中的各个子区域分别对应的预测对象数量可以是第一图像中的各个子区域分别对应的预测人数;本申请提供的群体数量确定方法可以是用于获取第一图像中的人群数量的人群数量确定方法。
可选的,该群体还可以实现为其他群体,示意性的,该群体可以是动物群体,比如,图像包含的鸟群;或者,该群体还可以是植物群体,比如,图像中包含的森林等等。
综上所述，本申请实施例所示方案，通过样本图像，以及该样本图像中的各个子区域的标注类别对群体数量确定模型进行训练，得到训练好的群体数量确定模型；计算机设备可以通过该群体数量确定模型确定输入的第一图像的各个子区域的预测类别，以及与各个预测类别相对应的对象数量估计值，进而确定该第一图像中的群体数量。在上述方案中，计算机设备可以通过群体数量确定模型对应的训练样本集中样本图像的各个子区域的群体数量确定对象数量估计值，使得与各个预测类别相对应的对象数量估计值更加贴近该预测类别对应的对象数量的真实值，降低了通过对象数量估计值对各个子区域进行对象数量确定时产生的离散化误差，提高了对图像进行群体数量估计的准确性。
图3是根据一示例性实施例示出的一种群体数量确定方法的流程示意图。该方法可以由计算机设备执行,其中,该计算机设备可以是上述图1所示的实施例中的服务器120。如图3所示,该群体数量确定方法的流程可以包括如下步骤(301~304)。
步骤301,获取第一样本图像,以及第一样本图像中的各个子区域分别对应的标注类别。
步骤302,通过群体数量确定模型的数据处理层,对该第一样本图像进行数据处理,得到该第一样本图像的样本群体密度特征图。
该样本群体密度特征图是数据处理层对第一样本图像进行特征提取得到的图像特征。
步骤303,通过该群体数量确定模型的特征分类层,基于该样本群体密度特征图进行分类处理,获得该第一样本图像中的各个子区域分别对应的预测结果。
步骤304,基于该第一样本图像中的各个子区域分别对应的预测结果,以及该第一样本图像中的各个子区域分别对应的标注类别,对该群体数量确定模型进行训练。
训练后的群体数量确定模型用于根据输入的第一图像,获得该第一图像中的各个子区域的预测类别,并根据与各个预测类别相对应的对象数量估计值,确定第一图像中的群体数量。
其中,与各个预测类别相对应的对象数量估计值是基于该群体数量确定模型的训练样本集中包含的样本图像的各个子区域的群体数量确定的。
综上所述,本申请实施例所示方案,通过样本图像,以及该样本图像中的各个子区域的标注类别对群体数量确定模型进行训练,得到训练好的群体数量确定模型;计算机设备可以通过该群体数量确定模型确定输入的第一图像的各个子区域的预测类别,以及与各个预测类别相对应的对象数量估计值,进而确定该第一图像中的群体数量。在上述方案中,计算机设备可以通过群体数量确定模型对应的训练样本集中样本图像的各个子区域的群体数量确定对象数量估计值,使得与各个预测类别相对应的对象数量估计值更加贴近该预测类别对应的对象数量的真实值,降低了通过对象数量估计值对各个子区域进行对象数量确定时产生的离散化误差,提高了对图像进行群体数量估计的准确性。
图4是根据一示例性实施例示出的一种群体数量确定方法的方法流程图。该方法可以由模型处理设备与数据处理设备共同执行;其中,该模型处理设备可以是上述图1所示的实施例中的服务器120,该数据处理设备可以是上述图1所示实施例中的终端110。如图4所示,该群体数量确定方法的流程可以包括以下步骤(401~409)。
步骤401,获取第一样本图像,以及该第一样本图像中的各个子区域分别对应的标注类别。
在一种可能的实现方式中,该第一样本图像可以是训练样本集中包含的样本图像中的一个;因此,获取第一样本图像的过程可以实现为:
获取训练样本集;该训练样本集中包含第一样本图像,以及该第一样本图像的图像标注;该图像标注用于指示该第一样本图像中样本对象的位置;
基于该第一样本图像的图像标注,获取该第一样本图像中的各个子区域的群体数量;
基于该第一样本图像中的各个子区域的群体数量,获取该第一样本图像中的各个子区域分别对应的标注类别。
在一种可能的实现方式中,该图像标注可以是基于各个样本图像上的各个对象(即人体)的头部位置生成的;也就是说,模型处理设备可以根据各个样本图像上的各个对象的头部位置信息确定该各个样本图像上的群体位置以及群体数量。
以各个样本图像中的第一样本图像为例,在一种可能的实现方式中,可以基于第一样本图像的图像标注,通过数学算法计算第一样本图像上的群体位置以及群体数量;示意性的, 该过程可以实现为:
基于第一样本图像,以及第一样本图像的图像标注,获得第一样本图像的第一样本热点图;该第一样本热点图用于指示该第一样本图像中样本对象的所在位置;
基于该第一样本热点图,通过高斯卷积核进行数据处理,获取该第一样本图像的第一样本热力图;
基于该第一样本热力图,分别在该第一样本图像中的各个子区域进行积分,获得该第一样本图像的各个子区域的群体数量。
当获取到该第一样本图像以及与该第一样本图像相对应的图像标注时，可以根据该第一样本图像的图像标注，将第一样本图像上与该图像标注相对应的位置进行高亮显示，获取该第一样本图像的第一样本热点图。例如，对于第一样本热点图，获取图中的N个人头中心点x_1至x_N；对于每个人头中心点x_i，可以生成一张二维的响应图H_i，该响应图中人头中心点位置的像素值为1，其余位置均为0；然后将所有人头中心点对应的H_i相加得到该第一样本图像中的所有人头的响应图H（即第一样本热点图），该响应图的积分值即为总对象数量。
由于在对第一样本图像进行图像块分割时,存在该第一样本图像中的任意一个子区域包含了某个对象的人头中心点,但该对象的所有图像部分并不完全位于该子区域中的情况,但由于该对象的人头中心点位于该子区域内,该对象会被认为全部位于该子区域内,因此通过生成该第一样本图像的第一样本热点图来表示第一样本图像中的各个子区域的预测对象数量是不准确的。此时,模型处理设备可以通过一个归一化的高斯卷积核对响应图(第一样本热点图)进行卷积处理得到该第一样本图像的第一样本热力图;该第一样本热力图是基于该第一样本图像中的各个人头中心点形成的高斯分布图,该第一样本热力图中的各点的像素值的大小用于指示该第一样本热力图中的各点的群体密度;因此,该第一样本热力图可以用于指示该第一样本图像上各个像素点上的群体密度;并且由于高斯核是归一化的,因此,在通过高斯卷积核进行数据处理后得到第一样本热力图后,对第一样本热力图进行积分后得到的值仍然是该第一样本图像中的总对象数量;同理,对第一样本图像中的各个子区域进行积分,可以获得第一样本图像中的各个子区域分别对应的群体数量。
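The pipeline just described — a point response map built from head-centre annotations, smoothed by a normalized Gaussian kernel into a density map whose per-sub-region integral (sum) gives each sub-region's group count — can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name, block size and sigma are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def region_counts(head_points, image_shape, block=4, sigma=1.0):
    """Build a point response map H from (y, x) head-centre annotations,
    smooth it with a normalized Gaussian kernel to get a density map,
    then sum the density over each block-by-block sub-region."""
    h, w = image_shape
    response = np.zeros((h, w), dtype=np.float64)
    for y, x in head_points:              # pixel value 1 at each head centre
        response[y, x] += 1.0
    density = gaussian_filter(response, sigma=sigma, mode="constant")
    # Summing over a sub-region approximates the integral of the density there.
    cropped = density[: h // block * block, : w // block * block]
    counts = cropped.reshape(h // block, block, w // block, block).sum(axis=(1, 3))
    return density, counts
```

Because the Gaussian kernel is normalized, the density map still integrates to (approximately) the total number of annotated heads, and the per-block sums partition that total across sub-regions.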
在一种可能的实现方式中,获取第一样本图像中的各个子区域分别对应的标注类别的过程可以实现为:
获取该特征分类层对应的对象数量分类区间;该对象数量分类区间包含至少两个子区间;
基于该第一样本图像中的各个子区域的群体数量,通过该对象数量分类区间进行分类,获得该第一样本图像中的各个子区域分别对应的标注类别。
该标注类别用于指示该第一样本图像中的各个子区域在该对象数量分类区间中分别对应的子区间。
请参考图5，其示出了本申请实施例涉及的一种标注类别确定的方法示意图。如图5所示，根据第一样本图像，以及该第一样本图像的图像标注，生成该第一样本图像的第一样本热点图；通过归一化的高斯卷积核对第一样本热点图进行卷积得到该第一样本图像的第一样本热力图501，该第一样本热力图501中的各点的像素值大小可以指示该第一样本热力图中的各点的群体密度。基于该第一样本热力图对第一样本图像中的各个子区域进行积分，得到该第一样本图像中的各个子区域各自对应的群体数量，如图5中的群体数量集合502所示，该群体数量集合502中的各个子区域中的数值分别对应于第一样本图像中的各个子区域各自对应的群体数量；通过特征分类层中的对象数量分类区间503对第一样本图像中的各个子区域各自对应的群体数量进行分类，得到该第一样本图像中的各个子区域分别对应的标注类别；其中，该对象数量分类区间503可以包括[0,1]、[1,2]、[2,3]、[3,4]、[4,5]等各个子区间，该对象数量分类区间中的子区间[0,1]对应的标注类别为A；该对象数量分类区间中的子区间[1,2]对应的标注类别为B；该对象数量分类区间中的子区间[2,3]对应的标注类别为C；该对象数量分类区间中的子区间[3,4]对应的标注类别为D；该对象数量分类区间中的子区间[4,5]对应的标注类别为E。例如，对于该群体数量集合502的左上部分“1.2”，通过该对象数量分类区间503，可以将其分类为[1,2]子区间，其对应的标注类别为B；对于群体数量集合502的左下部分“4.2”，通过该对象数量分类区间503，可以将其分类为[4,5]子区间，其对应的标注类别为E。
在一种可能的实现方式中,模型处理设备获取特征分类层对应的对象数量分类区间的过程可以实现为:
基于训练样本集中的各个样本图像的各个子区域的群体数量的最大值和最小值,获取第一端点集;该第一端点集用于指示该对象数量分类区间的区间端点;
基于该对象数量分类区间的区间端点,确定该第一分段点集;该第一分段点集用于指示该对象数量分类区间的区间分段点;该区间分段点用于将该对象数量分类区间分割为各个子区间;
基于该第一端点集与该第一分段点集,获取该特征分类层对应的对象数量分类区间。
在保证该对象数量分类区间能够包括该训练样本集中的所有样本图像中的各个子区域的群体数量时,该对象数量分类区间越小,分类越准确;因此,可以直接将该训练样本集中的各个样本图像中的各个子区域的群体数量的最值(最大值和最小值)确定为该特征分类层对应的对象数量分类区间的区间端点。
其中,该对象数量分类区间的区间端点,可以是根据该各个样本图像中的各个子区域的群体数量的最值确定的。由于该特征分类层对应的对象数量分类区间用于对该训练样本集中的各个样本图像的子区域的群体数量进行分类,因此,该特征分类层对应的对象数量分类区间包括该训练样本集中的各个样本图像中的各个子区域的群体数量的最值。
在一种可能的实现方式中,该各个样本图像中的各个子区域的群体数量的最小值,是该各个样本图像中的各个子区域的群体数量中不为零的最小值,即不为零的数值中的最小值。
在一种可能的实现方式中,将该各个样本图像中的各个子区域的群体数量的最值获取为该第一端点集。
其中,模型处理设备可以将该各个样本图像中的各个子区域的群体数量的最小值获取为该第一端点集中的左端点,将该各个样本图像中的各个子区域的群体数量的最大值获取为该第一端点集中的右端点,该左右端点即为该特征分类层对应的对象数量分类区间的区间端点。
在确定了该第一端点集后,即确定了该特征分类层对应的对象数量分类区间的区间端点后,可以根据该特征分类层对应的对象数量分类区间的区间端点,确定该特征分类层对应的对象数量分类区间的区间分段点。
在一种可能的实现方式中,获取该特征分类层对应的分类数;基于该特征分类层对应的分类数,确定该特征分类层对应的对象数量分类区间的区间分段点。
其中,该特征分类层对应的分类数用于指示该特征分类层对于输入的样本图像进行分类后可能得到的类别的数量。例如,当该特征分类层对应的分类数为N(N大于等于2,且N为正整数)时,在通过该特征分类层对数据进行分类后,可以获得该数据分别为N种类别的概率,此时该特征分类层对应的对象数量分类区间的区间分段点的数量可以为N-1,通过N-1个分段点对特征分类层对应的对象数量分类区间进行分段,可以获得该特征分类层对应的N个子区间。
在一种可能的实现方式中,基于该特征分类层对应的对象数量分类区间,通过该特征分类层对应的分类数,平均分割该对象数量分类区间,获得该特征分类层对应的对象数量分类区间的区间端点。
在另一种可能的实现方式中，该特征分类层对应的对象数量分类区间的区间端点可以为e^{k*(log(b)-log(a))/K+log(a)}，其中，假设除了对象数量为0的区域外，最小的总对象数量为a，而最大的总对象数量是b，需要划分的子区间数量为K。此时，该各个子区间的区间大小是非线性分布的，通过非线性分布对子区间进行划分，获得的分类结果中，分类对象数量较少的子区间分布较为密集，分类对象数量较多的子区间的分布较为分散，以使得对不同密度的群体数量都存在较好的分类效果。
步骤402,通过群体数量确定模型的数据处理层,对该第一样本图像进行数据处理,得到该第一样本图像的样本群体密度特征图。
该群体数量确定模型中的数据处理层用于对该训练样本集中的第一样本图像进行特征提取,获得该第一样本图像的图像特征;该样本群体密度特征图即为数据处理层对第一样本图像进行特征提取得到的图像特征;其中,通过数据处理层进行特征提取得到的图像特征用于指示该第一样本图像中的群体信息,因此,该第一样本图像的样本群体密度特征图可以用于指示该第一样本图像中的群体数量以及群体密度。
在一种可能的实现方式中,该样本群体密度特征图的尺寸与该第一样本图像的尺寸相同。也就是说,通过该群体数量确定模型对第一样本图像进行特征提取后得到的样本群体密度特征图,与输入的第一样本图像的像素大小相同。
在一种可能的实现方式中，该群体数量确定模型中的数据处理层可以是具有编码器-解码器结构的U型神经网络模型。其中，该数据处理层中的编码器结构用于通过下采样提取输入的样本图像的深层特征；该数据处理层中的解码器结构用于通过上采样将低分辨率的深层特征还原为高分辨率的图像特征。请参考图6，其示出了本申请实施例涉及的一种数据处理层的模型示意图。如图6所示，输入图像先经过VGG16网络的前四个卷积块提取特征601，然后经过了三个连续的空洞卷积602（空洞率可以分别为2,4,4），空洞卷积602能够在不增加参数量的同时提升网络的感受野，从而获取更大范围的上下文信息，获取用于群体计数的充足语义特征。再通过特征分类层603（示意性的，可以为1x1卷积）对特征图上的每个点（对应原图中的每个图像块）进行分类，得到图像的各个子区域各自对应的预测类别。
步骤403,通过群体数量确定模型的特征分类层,基于该样本群体密度特征图进行分类处理,得到该第一样本图像中的各个子区域分别对应的预测结果。
在一种可能的实现方式中,该各个子区域分别对应的预测结果可以直接指示各个子区域分别对应的预测类别。
在另一种可能的实现方式中,该预测结果用于指示该第一样本图像中的各个子区域与特征分类层对应的预测概率集。
当预测结果用于指示该第一样本图像中的各个子区域与特征分类层对应的预测概率集时,模型处理设备基于该样本群体密度特征图,通过该群体数量确定模型中的特征分类层进行分类处理,得到该第一样本图像中的各个子区域与该特征分类层对应的预测概率集;其中,该预测概率集用于指示该第一样本图像中的各个子区域属于该特征分类层对应的各个类别的概率;基于该第一样本图像中的各个子区域对应的预测概率集,获取该第一样本图像中的各个子区域与特征分类层对应的预测类别。
例如,通过该特征分类层对该样本群体密度特征图进行处理时,可以得到该样本群体密度特征图中的各个子区域与该特征分类层对应的第一预测概率集;其中,该第一预测概率集用于指示该样本图像的各个子区域属于该特征分类层的各个类别的概率(即该各个子区域分别属于对象数量分类区间的各个子区间的概率)。
当获取该第一预测概率集后,可以分别将各个子区域对应的该第一预测概率集中概率最大的类别确定为该样本图像中的各个子区域的预测类别。
步骤404,基于该第一样本图像中的各个子区域分别对应的预测结果,以及该第一样本图像中的各个子区域分别对应的标注类别,对该群体数量确定模型进行训练。
在一种可能的实现方式中,获取该第一样本图像中的各个子区域分别对应的预测类别;根据该第一样本图像中的各个子区域分别对应的预测类别,以及该第一样本图像中的各个子区域分别对应的标注类别,对该群体数量确定模型进行训练。
在另一种可能的实现方式中，响应于该预测结果用于指示该第一样本图像中的各个子区域与该对象数量分类区间的各个子区间对应的概率分布，基于该第一样本图像中的各个子区域分别对应的标注类别，以及该概率分布，对该群体数量确定模型进行训练。
其中,对群体数量确定模型进行训练的过程可以是通过第一样本图像中的各个子区域分别对应的标注类别,以及各个子区域各自对应的预测类别,获取各个子区域各自对应的损失函数值,并根据各个子区域各自对应的损失函数值,对群体数量确定模型进行训练;或者,对群体数量确定模型进行训练的过程也可以是获取该第一样本图像中的各个子区域分别对应的标注类别,以及各个子区域分别对应的概率分布,获得该各个子区域各自对应的损失函数值,并根据各个子区域各自对应的损失函数值,对群体数量确定模型进行训练。
在本申请实施例中,训练样本集中可以包含至少两个样本图像,该第一样本图像可以是训练样本集中包含的至少两个样本图像中的任意一个,上述基于第一样本图像,以及第一样本图像的图像标注对群体数量确定模型进行训练的过程同样适用于训练样本集中的其他样本图像;在对群体数量确定模型进行训练的过程中,可以依次通过训练样本集中的样本图像对群体数量确定模型进行训练,直至群体数量确定模型收敛,确定群体数量确定模型训练完成,获得训练好的群体数量确定模型。
步骤405,获取第一图像。
步骤406,基于群体数量确定模型的数据处理层,对该第一图像进行数据处理,获得该第一图像的群体密度特征图。
在一种可能的实现方式中,该群体数量确定模型中的数据处理层用于对该第一图像进行特征提取,获得该第一图像的图像特征;该群体密度特征图即为数据处理层对第一图像进行特征提取得到的图像特征;其中,通过数据处理层进行特征提取得到的图像特征用于指示该第一图像中的群体信息,因此,该第一图像的群体密度特征图可以用于指示该第一图像中的群体数量以及群体密度。
在一种可能的实现方式中,该群体密度特征图的尺寸与该第一图像的尺寸相同。即通过该群体数量确定模型对该第一图像进行特征提取后得到的群体密度特征图,与输入的第一图像的像素大小相同。
步骤407,通过群体数量确定模型的特征分类层,基于该群体密度特征图进行分类处理,得到该第一图像各个子区域分别对应的预测类别。
在一种可能的实现方式中,通过群体数量确定模型的特征分类层,基于该群体密度特征图进行分类处理,获得该第一图像各个子区域分别对应的预测结果;基于该第一图像各个子区域分别对应的预测结果,获取该第一图像中的各个子区域分别对应的预测类别。
在一种可能的实现方式中,通过群体数量确定模型的特征分类层,基于该群体密度特征图进行分类处理,可以直接获得第一图像各个子区域分别对应的预测类别,即该预测结果即为预测类别。
在另一种可能的实现方式中,通过该群体数量确定模型中的特征分类层,基于该群体密度特征图进行分类处理,可以得到该第一图像中的各个子区域与该特征分类层对应的预测概率集(预测结果);其中,该预测概率集用于指示该第一图像中的各个子区域属于该特征分类层对应的各个类别的概率;基于该第一图像中的各个子区域对应的预测概率集,获取该第一图像中的各个子区域与特征分类层对应的预测类别。
例如,通过该特征分类层对该群体密度特征图进行处理时,可以得到该群体密度特征图中的各个子区域与该特征分类层对应的第一预测概率集;其中,该第一预测概率集用于指示该图像的各个子区域属于该特征分类层的各个类别的概率(即该各个子区域分别属于对象数量分类区间的各个子区间的概率)。
当获取该第一预测概率集后,可以分别将各个子区域对应的该第一预测概率集中概率最大的类别确定为该图像中的各个子区域的预测类别。
步骤408,基于该第一图像各个子区域分别对应的预测类别,以及各个该预测类别对应 的对象数量估计值,获取该第一图像各个子区域分别对应的预测对象数量。
各个该预测类别分别对应的对象数量估计值可以是基于已知图像的各个子区域的群体数量确定的;在本申请实施例中,该已知图像可以是用于训练群体数量确定模型的训练样本集中的样本图像,也就是说,各个该预测类别分别对应的对象数量估计值可以是基于该群体数量确定模型对应的训练样本集中样本图像的各个子区域的群体数量确定的。
其中,该预测类别对应的对象数量估计值可以用于指示该预测类别对应的子区域的预测对象数量。即当某一子区域与某一预测类别对应时,可以认为该预测类别对应的对象数量估计值即为该子区域对应的预测对象数量。
在一种可能的实现方式中,获取第一类样本子区域;该第一类样本子区域是该训练样本集中的各个样本图像的子区域中,与第一类标注类别相对应的子区域;该第一类标注类别是各个标注类别中的任意一个;基于该第一类样本子区域中的各个子区域分别对应的群体数量,确定该第一类标注类别对应的对象数量估计值。
在一种可能的实现方式中，将该第一类样本子区域中的各个子区域分别对应的群体数量的平均值，确定为该第一类标注类别对应的对象数量估计值。
在本申请实施例中,各个标注类别对应的对象数量估计值,可以是根据训练样本集中的各个样本图像的各个子区域中,与该标注类别对应的子区域对应的群体数量确定的。即各个标注类别对应的对象数量估计值,可以是训练样本集中的各个样本图像的各个子区域中,与该标注类别对应的子区域对应的群体数量的平均值,此时,每一个该标注类别对应的子区域的真实对象数量,与对象数量估计值之间的离散误差之和较小,因此,通过该对象数量估计值对真实图像进行预测时,产生的离散误差也应该较小,证明过程可以如下所示:
对于训练集中的所有图像块（假设一共有 $K$ 个），对应的局部计数值 $d_k,\ k\in\{1,2,\dots,K\}$ 组成了集合 $\tau=\{d_1,d_2,\dots,d_K\}$。

对于任意给定的测试图像，计算群体数量确定模型在该测试图像上的期望计数误差 $\varepsilon$。首先，用 $\mathcal{D}=\{D_1,D_2,\dots,D_M\}$ 来表示 $\tau$ 中去除重复的局部计数值后的集合。在数据独立同分布的假设下，测试图像中的所有局部计数值可以视为从集合 $\mathcal{D}$ 中随机采样得到的，因此，图像的期望计数误差 $\varepsilon$ 可以被近似为

$$\varepsilon \approx K\sum_{i=1}^{M} p_i\left(D_i-\hat{D}_i\right),$$

其中，$p_i$ 为集合 $\tau$ 中 $D_i$ 出现的频率，$\hat{D}_i$ 是局部计数值 $D_i$ 的预测值。当 $K$ 足够大时，$p_i$ 可以被近似为 $c_i/K$（$c_i$ 是 $\tau$ 中 $D_i$ 出现的次数），此时期望计数误差可以表示为：

$$\varepsilon \approx K\sum_{i=1}^{M}\frac{c_i}{K}\left(D_i-\hat{D}_i\right)=\sum_{i=1}^{M}c_i\left(D_i-\hat{D}_i\right).$$

忽略常数 $K$ 的影响，有 $\varepsilon \propto \sum_{i=1}^{M}c_i\left(D_i-\hat{D}_i\right)$。为了方便表述，给出如下记号定义：真实值 $D_i$ 所落入的子区间记为 $b(D_i)$，预测得到的子区间记为 $\hat{b}(D_i)$，子区间 $j$ 对应的对象数量估计值（区间代理值）记为 $r_j$。这时可将该误差分为两部分，即所有样本都被正确分类时的计数误差（离散化误差）$\varepsilon_{dis}$，和由于错误分类所带来的计数误差 $\varepsilon_{cls}$，进而期望计数误差可以被表示为 $\varepsilon=\varepsilon_{dis}+\varepsilon_{cls}$。

为了统计离散化误差，假设所有样本都被正确分类，即 $\hat{b}(D_i)=b(D_i)$，那么此时的期望计数误差就是

$$\varepsilon_{dis}=\sum_{j}\sum_{D_i\in I_j}c_i\left(D_i-r_j\right),$$

其中 $I_j$ 表示第 $j$ 个子区间。为了最小化 $\varepsilon_{dis}$，我们继续以下推导：对每个子区间 $j$，令

$$\sum_{D_i\in I_j}c_i\left(D_i-r_j\right)=0\ \Rightarrow\ r_j=\frac{\sum_{D_i\in I_j}c_iD_i}{\sum_{D_i\in I_j}c_i}.$$

从上式可以看出，如果选取 $r_j$ 为该子区间内局部计数值的加权平均值，期望计数误差 $\varepsilon_{dis}$ 达到最小值0，也就是说，通过科学地选取对象数量估计值，离散化误差可以做到微乎其微。
请参考图7,其示出了本申请实施例涉及的一种获取区间代理值的示意图。如图7所示,对于对象数量分类区间中的子区间701,通常获取区间代理值的做法为取区间中点的值为区间代理值(即对象数量估计值),比如,对于如图7所示的子区间701,该子区间701的端点为[0,10],对于该区间的各个图像的区间群体数量是偏向于0一侧的区间端点的,此时将区间的中点5作为区间代理值,其靠近0一侧的群体数量与靠近10一侧的群体数量,与该区间代理值产生的离散误差不能抵消,因此将区间中点5作为区间代理值会产生一定的离散误差,而若是将训练样本集中的与该子区间对应的所有子区域的群体数量进行平均,当训练样本集够大时,该平均值可以一定程度上反映该子区间对应的群体数量的分布情况,因此通过该平均值作为区间代理值,可以减小由于通过区间对群体数量进行分类所产生的离散误差。
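The midpoint-versus-mean argument above can be checked numerically: for counts skewed toward the lower end of an interval, assigning every region the interval midpoint leaves a large net error, while the training-set mean cancels it by construction. The exponential distribution and the interval [0, 10] below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic local counts inside one sub-interval [0, 10], skewed toward 0.
counts = rng.exponential(scale=1.5, size=10_000)
counts = counts[counts <= 10.0]
n = len(counts)

# Total signed discretization error when every region is assigned proxy p:
#   | sum_i (d_i - p) | = | sum_i d_i - n * p |
midpoint_error = abs(counts.sum() - n * 5.0)        # midpoint proxy p = 5
mean_error = abs(counts.sum() - n * counts.mean())  # mean proxy cancels the error
```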
步骤409,基于该第一图像中的各个子区域分别对应的预测对象数量,获取该第一图像中的群体数量。
在一种可能的实现方式中,将该第一图像中的各个子区域分别对应的预测对象数量进行求和,得到该第一图像中的群体数量。
在一种可能的实现方式中,将该第一图像中的各个子区域分别对应的预测对象数量中,满足指定条件的预测对象数量进行求和,得到该第一图像中的群体数量。
其中,该指定条件可以是该第一图像中的各个子区域分别对应的预测对象数量中除去最大值与最小值的预测对象数量。
请参考图8,其示出了本申请实施例涉及的一种对象数量估计值的确定过程的流程示意图。如图8所示,给定一张图像,先按照本申请实施例所示方式获取该图像的群体密度特征图801,然后对每一个图像块计算密度值的和作为该图像块内的总对象数量802(称为局部计数值),最后基于各个图像块对应的总对象数量相对于对象数量分类区间803所处的计数区间来确定各个图像块的标签类别804。在测试时,被分为某一类(以c0类为例)的图像块805的预测对象数量为该标签类别对应的对象数量估计值806(即为A),在获取到各个图像块的预测对象数量后,将所有图像块的预测对象数量之和作为整张图像的群体数量。
当上述群体为人群时,上述本申请实施例所示方案,还可以应用于智慧交通领域。在智慧交通领域中,智慧交通对应的管理平台可以通过摄像头等设备,获取需要管理的交通地点的实时群体图像,再根据本申请实施例所示方案,降低离散化误差对群体数量估计的影响,实现对该群体图像中的群体密度的准确估计,以获取该群体图像中的群体数量,并基于各个交通地点实时的群体数量,对各个交通地点的客流密度进行评估,以便交通枢纽对交通工具进行智能调度,有效地提高交通枢纽客流管理能力。
综上所述,本申请实施例所示方案,通过样本图像,以及该样本图像中的各个子区域的标注类别对群体数量确定模型进行训练,得到训练好的群体数量确定模型;计算机设备可以通过该群体数量确定模型确定输入的第一图像的各个子区域的预测类别,以及与各个预测类别相对应的对象数量估计值,进而确定该第一图像中的群体数量。在上述方案中,计算机设备可以通过群体数量确定模型对应的训练样本集中样本图像的各个子区域的群体数量确定对象数量估计值,使得与各个预测类别相对应的对象数量估计值更加贴近该预测类别对应的对象数量的真实值,降低了通过对象数量估计值对各个子区域进行对象数量确定时产生的离散化误差,提高了对图像进行群体数量估计的准确性。
可以理解的是,在本申请的具体实施方式中,涉及到图像等与用户相关的数据,当本申请以上实施运用到具体产品或技术中时,需要获得用户许可或者同意,且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。
以该群体为人群为例，图9是根据一示例性实施例示出的模型训练及数据处理的流程框图。其中，该模型训练过程可以应用于模型训练设备900中，该模型训练设备900可以是服务器，该人群数量确定流程可以应用于数据处理设备910中，该数据处理设备910可以是终端，其中，该模型训练以及人群数量估计的流程如下所示。
在模型训练设备900中，训练数据集中的样本图像901先通过与人群数量确定模型（群体数量确定模型）902中的特征分类层对应的人数分类区间进行分类，确定该样本图像901中的各个子区域在该人数分类区间中对应的子区间，并将该训练数据集中的样本图像中的各个子区域在该人数分类区间903中对应的子区间获取为样本图像的标注类别904。当获取到该训练数据集中的所有样本图像中的各个子区域分别对应的标注类别后，可以根据与标注类别相对应的子区域各自的真实人群数量，确定该标注类别对应的子区间的人数估计值906（即真实人群数量的平均值），以便当判断图像的子区域对应该标注类别时，模型训练设备可以直接将该人数估计值确定为该子区域的预测人数。
该人群数量确定模型902中的数据处理层对输入的样本图像901进行数据提取,得到该数据处理层输出的样本人群密度特征图,再通过特征分类层对该样本人群密度特征图进行分类,得到该样本图像901中各个子区域分别对应的预测结果905(即该各个子区域对应该人数分类区间中各个子区间的概率)。
模型训练设备900可以通过该标注类别904以及预测结果905对该人群数量确定模型902进行训练,获得训练好的人群数量确定模型;模型训练设备900可以将训练好的人群数量确定模型传输至数据处理设备910中,以使得数据处理设备910对输入的图像进行处理,获得图像中的人群数量。
在数据处理设备910中,对于输入的第一图像911,可以通过上述训练好的人群数量确定模型912对该第一图像911进行数据处理,得到该第一图像中的各个子区域分别对应的预测结果,该预测结果可以是该第一图像中的各个子区域在该人数分类区间中对应各个子区间的概率;数据处理设备910将概率最大的子区间,获取为该第一图像中的各个子区域在该人数分类区间中对应的子区间;再根据上述获取的各个子区间的人数估计值,确定该第一图像中的各个子区域的预测人数,以确定该第一图像中的人群数量913。
图10是根据一示例性实施例示出的一种群体数量确定装置的结构方框图。该群体数量确定装置可以实现由图2或图4所示实施例提供的方法中的全部或部分步骤,该群体数量确定装置包括如下部分:
第一图像获取模块1001,用于获取第一图像;
第一数据处理模块1002,用于对所述第一图像进行数据处理,得到所述第一图像的群体密度特征图;所述群体密度特征图是对所述第一图像进行特征提取得到的图像特征;
第一分类模块1003,用于基于所述群体密度特征图进行分类处理,得到所述第一图像中的各个子区域分别对应的预测类别;
第一预测数量获取模块1004,用于基于所述第一图像中的各个子区域分别对应的所述预测类别,以及与各个所述预测类别相对应的对象数量估计值,获取所述第一图像中的各个子区域分别对应的预测对象数量;与各个所述预测类别相对应的对象数量估计值是基于已知图像中的各个子区域的群体数量确定的;
第一群体数量确定模块1005,用于基于所述第一图像中的各个子区域分别对应的所述预测对象数量,获取所述第一图像中的群体数量。
在一种可能的实现方式中,所述第一数据处理模块,用于通过群体数量确定模型的数据处理层,对所述第一图像进行数据处理,得到所述第一图像的所述群体密度特征图;所述群体密度特征图是所述数据处理层进行特征提取得到的图像特征;
所述第一分类模块,用于通过所述群体数量确定模型的特征分类层,基于所述群体密度特征图进行分类处理,得到所述第一图像中的各个子区域分别对应的所述预测类别;
其中,所述群体数量确定模型是基于训练样本集中的样本图像,以及所述样本图像的各 个子区域分别对应的标注类别训练得到的机器学习模型;与各个所述预测类别相对应的对象数量估计值是基于所述样本图像的各个子区域的群体数量确定的。
综上所述,本申请实施例所示方案,通过样本图像,以及该样本图像中的各个子区域的标注类别对群体数量确定模型进行训练,得到训练好的群体数量确定模型;计算机设备可以通过该群体数量确定模型确定输入的第一图像的各个子区域的预测类别,以及与各个预测类别相对应的对象数量估计值,进而确定该第一图像中的群体数量。在上述方案中,计算机设备可以通过群体数量确定模型对应的训练样本集中样本图像的各个子区域的群体数量确定对象数量估计值,使得与各个预测类别相对应的对象数量估计值更加贴近该预测类别对应的对象数量的真实值,降低了通过对象数量估计值对各个子区域进行对象数量确定时产生的离散化误差,提高了对图像进行群体数量估计的准确性。
图11是根据一示例性实施例示出的一种群体数量确定装置的结构方框图。该群体数量确定装置可以实现由图3或图4所示实施例提供的方法中的全部或部分步骤,该群体数量确定装置包括如下部分:
第一样本获取模块1101,用于获取第一样本图像,以及所述第一样本图像中的各个子区域分别对应的所述标注类别;
样本特征获取模块1102,用于通过群体数量确定模型的数据处理层,对所述第一样本图像进行数据处理,得到所述第一样本图像的样本群体密度特征图;所述样本群体密度特征图是所述数据处理层对所述第一样本图像进行特征提取得到的图像特征;
样本数据处理模块1103,用于通过所述群体数量确定模型的特征分类层,基于所述样本群体密度特征图进行分类处理,获得所述第一样本图像中的各个子区域分别对应的预测结果;
模型训练模块1104,用于基于所述第一样本图像中的各个子区域分别对应的预测结果,以及所述第一样本图像中的各个子区域分别对应的所述标注类别,对所述群体数量确定模型进行训练;
训练后的所述群体数量确定模型用于根据输入的第一图像,获得所述第一图像中的各个子区域的预测类别,并根据与各个所述预测类别相对应的对象数量估计值,确定所述第一图像中的群体数量;与各个所述预测类别相对应的对象数量估计值是基于训练样本集中包含的样本图像的各个子区域的群体数量确定的。
在一种可能的实现方式中,所述第一样本获取模块,包括:
样本获取单元,用于获取所述训练样本集;所述训练样本集中包含所述第一样本图像,以及所述第一样本图像的图像标注;所述图像标注用于指示所述第一样本图像中的样本对象的位置;
样本群体获取单元,用于基于所述第一样本图像的图像标注,获取所述第一样本图像中的各个子区域的群体数量;
标注类别获取单元,用于基于所述第一样本图像中的各个子区域的群体数量,获取所述第一样本图像中的各个子区域分别对应的所述标注类别。
在一种可能的实现方式中,所述标注类别获取单元,包括:
分类区间获取子单元,用于获取所述特征分类层对应的对象数量分类区间;所述对象数量分类区间包含至少两个子区间;
标注类别获取子单元,用于基于所述第一样本图像中的各个子区域的群体数量,通过所述对象数量分类区间进行分类,获得所述第一样本图像中的各个子区域分别对应的所述标注类别。
在一种可能的实现方式中,所述标注类别获取单元,包括:
子区域获取子单元,用于获取第一类样本子区域;所述第一类样本子区域是所述训练样本集中的各个样本图像的子区域中,与第一类标注类别相对应的子区域;所述第一类标注类 别是各个所述标注类别中的任意一个;
估计值获取子单元,用于基于所述第一类样本子区域中的各个子区域分别对应的群体数量,确定所述第一类标注类别对应的对象数量估计值。
在一种可能的实现方式中,所述估计值获取子单元,还用于,
将所述第一类样本子区域中的各个子区域分别对应的群体数量的平均值,确定为所述第一类标注类别对应的对象数量估计值。
在一种可能的实现方式中,所述分类区间获取子单元,用于,
基于所述训练样本集中的各个样本图像的各个子区域的群体数量的最大值和最小值,获取第一端点集;所述第一端点集用于指示所述对象数量分类区间的区间端点;
基于所述对象数量分类区间的区间端点,确定第一分段点集;所述第一分段点集用于指示所述对象数量分类区间的区间分段点;所述区间分段点用于将所述对象数量分类区间分割为各个子区间;
基于所述第一端点集与所述第一分段点集,获取所述特征分类层对应的所述对象数量分类区间。
在一种可能的实现方式中,所述样本群体获取单元,用于,
基于所述第一样本图像,以及所述第一样本图像的图像标注,获得所述第一样本图像的第一样本热点图;所述第一样本热点图用于指示所述第一样本图像中群体的所在位置;
基于所述第一样本热点图,通过高斯卷积核进行数据处理,获取所述第一样本图像的第一样本热力图;
基于所述第一样本热力图,分别在所述第一样本图像中的各个子区域进行积分,获得所述第一样本图像的各个子区域的群体数量。
综上所述,本申请实施例所示方案,通过样本图像,以及该样本图像中的各个子区域的标注类别对群体数量确定模型进行训练,得到训练好的群体数量确定模型;计算机设备可以通过该群体数量确定模型确定输入的第一图像的各个子区域的预测类别,以及与各个预测类别相对应的对象数量估计值,进而确定该第一图像中的群体数量。在上述方案中,计算机设备可以通过群体数量确定模型对应的训练样本集中样本图像的各个子区域的群体数量确定对象数量估计值,使得与各个预测类别相对应的对象数量估计值更加贴近该预测类别对应的对象数量的真实值,降低了通过对象数量估计值对各个子区域进行对象数量确定时产生的离散化误差,提高了对图像进行群体数量估计的准确性。
图12是根据一示例性实施例示出的一种计算机设备的结构示意图。该计算机设备可以实现为上述各个方法实施例中的模型训练设备和/或数据处理设备。所述计算机设备1200包括中央处理单元(CPU,Central Processing Unit)1201、包括随机存取存储器(Random Access Memory,RAM)1202和只读存储器(Read-Only Memory,ROM)1203的系统存储器1204,以及连接系统存储器1204和中央处理单元1201的系统总线1205。所述计算机设备1200还包括帮助计算机内的各个器件之间传输信息的基本输入/输出系统1206,和用于存储操作系统1213、应用程序1214和其他程序模块1215的大容量存储设备1207。
不失一般性，所述计算机可读介质可以包括计算机存储介质和通信介质。计算机存储介质包括以用于存储诸如计算机可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机存储介质包括RAM、ROM、闪存或其他固态存储技术，CD-ROM或其他光学存储、磁带盒、磁带、磁盘存储或其他磁性存储设备。当然，本领域技术人员可知所述计算机存储介质不局限于上述几种。上述的系统存储器1204和大容量存储设备1207可以统称为存储器。
所述存储器还包括一个或者一个以上的程序，所述一个或者一个以上程序存储于存储器中，中央处理器1201通过执行该一个或一个以上程序来实现图2、图3或图4所示的方法中的全部或者部分步骤。
在一示例性实施例中,还提供了一种计算机可读存储介质,该计算机可读存储介质中存储有至少一条计算机程序,该计算机程序由处理器加载并执行以实现上述信息生成方法中的全部或部分步骤。例如,该计算机可读存储介质可以是只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、只读光盘(Compact Disc Read-Only Memory,CD-ROM)、磁带、软盘和光数据存储设备等。
在一示例性实施例中,还提供了一种计算机程序产品,该计算机程序产品包括至少一条计算机程序,该计算机程序由处理器加载并执行上述各个实施例所示的方法中的全部或部分步骤。

Claims (20)

  1. 一种群体数量确定方法,所述方法由计算机设备执行,所述方法包括:
    获取第一图像;
    对所述第一图像进行数据处理,得到所述第一图像的群体密度特征图;所述群体密度特征图是对所述第一图像进行特征提取得到的图像特征;
    基于所述群体密度特征图进行分类处理,得到所述第一图像中的各个子区域分别对应的预测类别;
    基于所述第一图像中的各个子区域分别对应的所述预测类别,以及与各个所述预测类别相对应的对象数量估计值,获取所述第一图像中的各个子区域分别对应的预测对象数量;与各个所述预测类别相对应的对象数量估计值是基于已知图像中的各个子区域的群体数量确定的;
    基于所述第一图像中的各个子区域分别对应的所述预测对象数量,获取所述第一图像中的群体数量。
  2. The method according to claim 1, wherein the performing data processing on the first image to obtain the group density feature map of the first image comprises:
    performing data processing on the first image through a data processing layer of a group quantity determination model to obtain the group density feature map of the first image, the group density feature map being an image feature obtained by the data processing layer through feature extraction;
    and the performing classification processing based on the group density feature map to obtain the predicted category corresponding to each sub-region in the first image comprises:
    performing classification processing based on the group density feature map through a feature classification layer of the group quantity determination model to obtain the predicted category corresponding to each sub-region in the first image;
    wherein the group quantity determination model is a machine learning model trained on sample images in a training sample set and the annotated categories corresponding to the sub-regions of the sample images, and the object quantity estimate corresponding to each predicted category is determined based on the group quantities of the sub-regions of the sample images.
  3. A group quantity determination method, the method being executed by a computer device and comprising:
    acquiring a first sample image and the annotated category corresponding to each sub-region in the first sample image;
    performing data processing on the first sample image through a data processing layer of a group quantity determination model to obtain a sample group density feature map of the first sample image, the sample group density feature map being an image feature obtained by the data processing layer by performing feature extraction on the first sample image;
    performing classification processing based on the sample group density feature map through a feature classification layer of the group quantity determination model to obtain a prediction result corresponding to each sub-region in the first sample image; and
    training the group quantity determination model based on the prediction result corresponding to each sub-region in the first sample image and the annotated category corresponding to each sub-region in the first sample image;
    wherein the trained group quantity determination model is configured to obtain, from an input first image, the predicted category of each sub-region in the first image, and to determine the group quantity in the first image from the object quantity estimate corresponding to each predicted category, the object quantity estimate corresponding to each predicted category being determined based on the group quantities of the sub-regions of the sample images contained in a training sample set.
  4. The method according to claim 3, wherein the acquiring the first sample image and the annotated category corresponding to each sub-region in the first sample image comprises:
    acquiring the training sample set, the training sample set containing the first sample image and an image annotation of the first sample image, the image annotation indicating the positions of sample objects in the first sample image;
    acquiring the group quantity of each sub-region in the first sample image based on the image annotation of the first sample image; and
    acquiring the annotated category corresponding to each sub-region in the first sample image based on the group quantity of each sub-region in the first sample image.
  5. The method according to claim 4, wherein the acquiring the annotated category corresponding to each sub-region in the first sample image based on the group quantity of each sub-region in the first sample image comprises:
    acquiring an object quantity classification interval corresponding to the feature classification layer, the object quantity classification interval containing at least two sub-intervals; and
    obtaining the annotated category corresponding to each sub-region in the first sample image by classifying the group quantity of each sub-region in the first sample image with the object quantity classification interval.
  6. The method according to claim 5, further comprising:
    acquiring first-category sample sub-regions, the first-category sample sub-regions being the sub-regions, among the sub-regions of the sample images in the training sample set, that correspond to a first annotated category, the first annotated category being any one of the annotated categories; and
    determining the object quantity estimate corresponding to the first annotated category based on the group quantities corresponding to the first-category sample sub-regions.
  7. The method according to claim 6, wherein the determining the object quantity estimate corresponding to the first annotated category based on the group quantities corresponding to the first-category sample sub-regions comprises:
    determining the average of the group quantities corresponding to the first-category sample sub-regions as the object quantity estimate corresponding to the first annotated category.
  8. The method according to any one of claims 5 to 7, wherein the acquiring the object quantity classification interval corresponding to the feature classification layer comprises:
    acquiring a first endpoint set based on the maximum and minimum group quantities over the sub-regions of the sample images in the training sample set, the first endpoint set indicating the interval endpoints of the object quantity classification interval;
    determining a first segmentation point set based on the interval endpoints of the object quantity classification interval, the first segmentation point set indicating the interval segmentation points of the object quantity classification interval, the interval segmentation points dividing the object quantity classification interval into sub-intervals; and
    acquiring the object quantity classification interval corresponding to the feature classification layer based on the first endpoint set and the first segmentation point set.
  9. The method according to claim 4, wherein the acquiring the group quantity of each sub-region in the first sample image based on the image annotation of the first sample image comprises:
    obtaining a first sample hotspot map of the first sample image based on the first sample image and the image annotation of the first sample image, the first sample hotspot map indicating the positions of the groups in the first sample image;
    acquiring a first sample heat map of the first sample image by performing data processing on the first sample hotspot map with a Gaussian convolution kernel; and
    obtaining the group quantity of each sub-region of the first sample image by integrating the first sample heat map over each sub-region of the first sample image.
  10. A group quantity determination apparatus, comprising:
    a first image acquisition module, configured to acquire a first image;
    a first data processing module, configured to perform data processing on the first image to obtain a group density feature map of the first image, the group density feature map being an image feature obtained by performing feature extraction on the first image;
    a first classification module, configured to perform classification processing based on the group density feature map to obtain a predicted category corresponding to each sub-region in the first image;
    a first predicted quantity acquisition module, configured to acquire a predicted object quantity corresponding to each sub-region in the first image based on the predicted category corresponding to each sub-region in the first image and an object quantity estimate corresponding to each predicted category, the object quantity estimate corresponding to each predicted category being determined based on the group quantities of the sub-regions of known images; and
    a first group quantity determination module, configured to acquire the group quantity in the first image based on the predicted object quantity corresponding to each sub-region in the first image.
  11. The apparatus according to claim 10, wherein the first data processing module is configured to perform data processing on the first image through a data processing layer of a group quantity determination model to obtain the group density feature map of the first image;
    the first classification module is configured to perform classification processing based on the group density feature map through a feature classification layer of the group quantity determination model to obtain the predicted category corresponding to each sub-region in the first image;
    wherein the group quantity determination model is a machine learning model trained on sample images in a training sample set and the annotated categories of the sub-regions of the sample images, and the object quantity estimate corresponding to each predicted category is determined based on the group quantities of the sub-regions of the sample images.
  12. A group quantity determination apparatus, comprising:
    a first sample acquisition module, configured to acquire a first sample image and the annotated category corresponding to each sub-region in the first sample image;
    a sample feature acquisition module, configured to perform data processing on the first sample image through a data processing layer of a group quantity determination model to obtain a sample group density feature map of the first sample image, the sample group density feature map being an image feature obtained by the data processing layer by performing feature extraction on the first sample image;
    a sample data processing module, configured to perform classification processing based on the sample group density feature map through a feature classification layer of the group quantity determination model to obtain a prediction result corresponding to each sub-region in the first sample image; and
    a model training module, configured to train the group quantity determination model based on the prediction result corresponding to each sub-region in the first sample image and the annotated category corresponding to each sub-region in the first sample image;
    wherein the trained group quantity determination model is configured to obtain, from an input first image, the predicted category of each sub-region in the first image, and to determine the group quantity in the first image from the object quantity estimate corresponding to each predicted category, the object quantity estimate corresponding to each predicted category being determined based on the group quantities of the sub-regions of the sample images contained in a training sample set.
  13. The apparatus according to claim 12, wherein the first sample acquisition module comprises:
    a sample acquisition unit, configured to acquire the training sample set, the training sample set containing the first sample image and an image annotation of the first sample image, the image annotation indicating the positions of sample objects in the first sample image;
    a sample group acquisition unit, configured to acquire the group quantity of each sub-region in the first sample image based on the image annotation of the first sample image; and
    an annotated category acquisition unit, configured to acquire the annotated category corresponding to each sub-region in the first sample image based on the group quantity of each sub-region in the first sample image.
  14. The apparatus according to claim 13, wherein the annotated category acquisition unit comprises:
    a classification interval acquisition sub-unit, configured to acquire an object quantity classification interval corresponding to the feature classification layer, the object quantity classification interval containing at least two sub-intervals; and
    an annotated category acquisition sub-unit, configured to obtain the annotated category corresponding to each sub-region in the first sample image by classifying the group quantity of each sub-region in the first sample image with the object quantity classification interval.
  15. The apparatus according to claim 14, wherein the annotated category acquisition unit comprises:
    a sub-region acquisition sub-unit, configured to acquire first-category sample sub-regions, the first-category sample sub-regions being the sub-regions, among the sub-regions of the sample images in the training sample set, that correspond to a first annotated category, the first annotated category being any one of the annotated categories; and
    an estimate acquisition sub-unit, configured to determine the object quantity estimate corresponding to the first annotated category based on the group quantities corresponding to the first-category sample sub-regions.
  16. The apparatus according to claim 15, wherein the estimate acquisition sub-unit is further configured to:
    determine the average of the group quantities corresponding to the first-category sample sub-regions as the object quantity estimate corresponding to the first annotated category.
  17. The apparatus according to any one of claims 14 to 16, wherein the classification interval acquisition sub-unit is configured to:
    acquire a first endpoint set based on the maximum and minimum group quantities over the sub-regions of the sample images in the training sample set, the first endpoint set indicating the interval endpoints of the object quantity classification interval;
    determine a first segmentation point set based on the interval endpoints of the object quantity classification interval, the first segmentation point set indicating the interval segmentation points of the object quantity classification interval, the interval segmentation points dividing the object quantity classification interval into sub-intervals; and
    acquire the object quantity classification interval corresponding to the feature classification layer based on the first endpoint set and the first segmentation point set.
  18. A computer device, comprising a processor and a memory, the memory storing at least one computer program that is loaded and executed by the processor to implement the group quantity determination method according to any one of claims 1 to 9.
  19. A computer-readable storage medium, storing at least one computer program that is loaded and executed by a processor to implement the group quantity determination method according to any one of claims 1 to 9.
  20. A computer program product, comprising at least one computer program that is loaded and executed by a processor to implement the group quantity determination method according to any one of claims 1 to 9.
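As an illustrative sketch of the inference flow of claim 1 (not part of the claims; function names and the numeric values are hypothetical), each sub-region's predicted category is mapped to its object quantity estimate, and the per-sub-region estimates are summed to obtain the group quantity of the image:

```python
def total_group_quantity(predicted_categories, category_estimates):
    """predicted_categories: one category index per sub-region of the image.
    category_estimates: object quantity estimate per category (e.g. the mean
    sub-region group count of that category over the training sample set)."""
    # predicted object quantity for each sub-region
    per_region = [category_estimates[c] for c in predicted_categories]
    # group quantity of the whole image
    return sum(per_region)

# four sub-regions, three categories with example estimates
total = total_group_quantity([0, 2, 2, 1], {0: 0.1, 1: 1.8, 2: 6.5})
```

Summing trained per-category estimates, rather than a fixed representative value per bin, is what the description credits with reducing the discretization error of the classification approach.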
PCT/CN2022/077070 2021-02-25 2022-02-21 Group quantity determination method, apparatus, device, storage medium, and program product WO2022179474A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110212105.3 2021-02-25
CN202110212105.3A CN112560829B (zh) 2021-02-25 2021-02-25 Crowd quantity determination method, apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022179474A1 true WO2022179474A1 (zh) 2022-09-01

Family

ID=75034766

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/077070 WO2022179474A1 (zh) 2021-02-25 2022-02-21 Group quantity determination method, apparatus, device, storage medium, and program product

Country Status (2)

Country Link
CN (1) CN112560829B (zh)
WO (1) WO2022179474A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115860284A (zh) * 2023-03-01 2023-03-28 Shandong Marine Resource and Environment Research Institute (Shandong Marine Environment Monitoring Center, Shandong Aquatic Products Quality Inspection Center) Fishery resource density identification method, apparatus, storage medium, and electronic device

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560829B (zh) * 2021-02-25 2021-06-04 Tencent Technology (Shenzhen) Co., Ltd. Crowd quantity determination method, apparatus, device, and storage medium
CN112862023B (zh) * 2021-04-26 2021-07-16 Tencent Technology (Shenzhen) Co., Ltd. Object density determination method, apparatus, computer device, and storage medium
CN113807260B (zh) * 2021-09-17 2022-07-12 Beijing Baidu Netcom Science and Technology Co., Ltd. Data processing method, apparatus, electronic device, and storage medium
CN114581854A (zh) * 2022-03-18 2022-06-03 Shanghai SenseTime Intelligent Technology Co., Ltd. Crowd statistics method and apparatus, electronic device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163060A (zh) * 2018-11-07 2019-08-23 Tencent Technology (Shenzhen) Co., Ltd. Method for determining crowd density in an image, and electronic device
US20190325231A1 (en) * 2018-07-02 2019-10-24 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus, device, and storage medium for predicting the number of people of dense crowd
CN111898578A (zh) * 2020-08-10 2020-11-06 Tencent Technology (Shenzhen) Co., Ltd. Crowd density acquisition method and apparatus, electronic device, and computer program
CN112560829A (zh) * 2021-02-25 2021-03-26 Tencent Technology (Shenzhen) Co., Ltd. Crowd quantity determination method, apparatus, device, and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077613B (zh) * 2014-07-16 2017-04-12 University of Electronic Science and Technology of China Crowd density estimation method based on a cascaded multi-level convolutional neural network
US20160259980A1 (en) * 2015-03-03 2016-09-08 Umm Al-Qura University Systems and methodologies for performing intelligent perception based real-time counting
US20190130189A1 (en) * 2017-10-30 2019-05-02 Qualcomm Incorporated Suppressing duplicated bounding boxes from object detection in a video analytics system
CN109271864B (zh) * 2018-08-17 2021-07-06 Wuhan Fenghuo Kaizhuo Technology Co., Ltd. Crowd density estimation method based on wavelet transform and support vector machine
CN109271960B (zh) * 2018-10-08 2020-09-04 Yanshan University People counting method based on a convolutional neural network
CN110163140A (zh) * 2019-05-15 2019-08-23 Tencent Technology (Shenzhen) Co., Ltd. Crowd density map acquisition method and apparatus
CN111144377B (zh) * 2019-12-31 2023-05-16 Beijing Institute of Technology Dense-area early warning method based on a crowd counting algorithm
CN112001274B (zh) * 2020-08-06 2023-11-17 Tencent Technology (Shenzhen) Co., Ltd. Crowd density determination method, apparatus, storage medium, and processor

Also Published As

Publication number Publication date
CN112560829A (zh) 2021-03-26
CN112560829B (zh) 2021-06-04

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22758838; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN EP: public notification in the EP bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18.01.2024))