CN118229986A - Method, device, equipment and medium for counting number of objects in dense scene

Method, device, equipment and medium for counting number of objects in dense scene

Info

Publication number
CN118229986A
Authority
CN
China
Prior art keywords
target image
feature
objects
sampling
map
Legal status
Pending
Application number
CN202410155949.2A
Other languages
Chinese (zh)
Inventor
胡文骏
蒋召
Current Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Filing date
2024-02-02
Publication date
2024-06-21
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd filed Critical Shenzhen Xumi Yuntu Space Technology Co Ltd
Publication of CN118229986A

Abstract

The application relates to the technical field of image processing, and provides a method, a device, equipment and a medium for counting the number of objects in a dense scene. In the method, a target image containing a dense scene is first preprocessed to obtain an input sequence of the target image. A feature extraction module then performs feature extraction on the input sequence to obtain a feature map that represents the similarity of each feature in the target image to all other features, that is, the global dependency relationship between the objects in the dense scene. Finally, the feature map is converted into a density map, and the number of objects in the dense scene is obtained by statistics on the density map. No auxiliary data needs to be additionally introduced: by modeling the global dependency relationship between objects in the dense scene and converting the feature map representing that relationship into a density map, the method improves the accuracy of counting objects in dense scenes, and the algorithm is more general and easy to use.

Description

Method, device, equipment and medium for counting number of objects in dense scene
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a medium for counting the number of objects in a dense scene.
Background
At present, crowd counting algorithms are increasingly accurate and widely applied. However, when the scene is complex, dense crowds may be heavily occluded, and the accuracy of the resulting counts is poor.
In the related art, in order to solve the problem of counting in dense scenes, auxiliary inputs such as depth maps or heat maps are often used to improve counting accuracy. Such algorithms are not general, however: they require additional data to be collected, transfer poorly to new scenes, and add complexity to algorithm training and inference.
Disclosure of Invention
In view of the above, the embodiments of the application provide a method, a device, equipment and a medium for counting the number of objects in a dense scene, so as to solve the problem that existing dense-scene crowd counting algorithms have low accuracy under occlusion.
In a first aspect of the embodiment of the present application, there is provided a method for counting the number of objects in a dense scene, including:
Acquiring a target image, wherein the target image comprises a dense scene, N objects to be counted are included in the dense scene, the distribution density of the N objects to be counted is larger than a preset density threshold, and N is a positive integer larger than a preset quantity threshold;
Preprocessing a target image to obtain an input sequence of the target image;
Extracting features of the input sequence by using a feature extraction module to obtain a feature map of the target image, wherein the feature map is used for representing the similarity of each feature in the target image and all other features;
And converting the feature map into a density map, and counting the number of objects in the target image based on the density map.
In a second aspect of the embodiment of the present application, there is provided an object number statistics apparatus in a dense scene, including:
an acquisition module configured to acquire a target image, wherein the target image comprises a dense scene, the dense scene includes N objects to be counted, the distribution density of the N objects to be counted is greater than a preset density threshold, and N is a positive integer greater than a preset quantity threshold;
a preprocessing module configured to preprocess the target image to obtain an input sequence of the target image;
an extraction module configured to perform feature extraction on the input sequence by using a feature extraction module to obtain a feature map of the target image, wherein the feature map is used for representing the similarity of each feature in the target image to all other features;
and a statistics module configured to convert the feature map into a density map, and obtain the number of objects in the target image based on statistics on the density map.
In a third aspect of the embodiments of the present application, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect of the embodiments of the present application, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the embodiments of the application have the following beneficial effects. The target image containing the dense scene is preprocessed to obtain an input sequence of the target image. The feature extraction module then performs feature extraction on the input sequence to obtain a feature map that represents the similarity of each feature in the target image to all other features, that is, the global dependency relationship between objects in the dense scene. Finally, the feature map is converted into a density map, and the number of objects in the dense scene is obtained by statistics on the density map. No auxiliary data needs to be additionally introduced: by modeling the global dependency relationship between objects in the dense scene and converting the feature map representing that relationship into a density map, the accuracy of counting objects in dense scenes is improved, and the algorithm is more general and easy to use.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
Fig. 1 is a schematic view of an application scenario according to an embodiment of the present application.
Fig. 2 is a flow chart of a method for counting the number of objects in a dense scene according to an embodiment of the present application.
Fig. 3 is a flowchart of a method for extracting features of an input sequence by using a feature extraction module to obtain a feature map of a target image according to an embodiment of the present application.
Fig. 4 is a flowchart of a method for preprocessing a target image to obtain an input sequence of the target image according to an embodiment of the present application.
Fig. 5 is a flowchart of a method for converting a feature map into a density map and obtaining the number of objects in a target image based on density map statistics according to an embodiment of the present application.
Fig. 6 is a flowchart of a training method of an object number statistics algorithm in a dense scene according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a feature extraction module according to an embodiment of the present application.
Fig. 8 is a schematic diagram of an object count statistics device in a dense scene according to an embodiment of the present application.
Fig. 9 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
The following describes in detail a method and apparatus for counting the number of objects in a dense scene according to an embodiment of the present application with reference to the accompanying drawings.
Fig. 1 is a schematic view of an application scenario according to an embodiment of the present application. The application scenario may include terminal devices 1, 2 and 3, a server 4 and a network 5.
The terminal devices 1, 2, and 3 may be hardware or software. When the terminal devices 1, 2, and 3 are hardware, they may be various electronic devices having a display screen and supporting communication with the server 4, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like; when the terminal devices 1, 2, and 3 are software, they can be installed in the electronic devices described above. The terminal devices 1, 2, and 3 may be implemented as a plurality of software or software modules, or as a single software or software module, which is not limited in the embodiments of the present application. Further, various applications, such as data processing applications, instant messaging tools, social platform software, search applications, shopping applications, and the like, may be installed on the terminal devices 1, 2, and 3.
The server 4 may be a server providing various services, for example, a background server that receives requests transmitted by terminal devices with which a communication connection has been established; the background server may process the received requests, for example by analyzing them, and generate processing results. The server 4 may be a single server, a server cluster formed by a plurality of servers, or a cloud computing service center, which is not limited in the embodiments of the present application.
The server 4 may be hardware or software. When the server 4 is hardware, it may be various electronic devices that provide various services to the terminal devices 1,2, and 3. When the server 4 is software, it may be a plurality of software or software modules providing various services to the terminal devices 1,2 and 3, or may be a single software or software module providing various services to the terminal devices 1,2 and 3, to which the embodiment of the present application is not limited.
The network 5 may be a wired network using coaxial cable, twisted pair, or optical fiber connections, or may be a wireless network that can interconnect various communication devices without wiring, for example, Bluetooth, Near-Field Communication (NFC), or infrared, which is not limited in the embodiments of the present application.
The terminal devices 1, 2 and 3 can be used to acquire images. Further, the terminal devices 1, 2, and 3 may also establish a communication connection with the server 4 via the network 5 to transmit the acquired images to the server and receive the statistical results from the server. It should be noted that the specific types, numbers and combinations of the terminal devices 1, 2 and 3, the server 4 and the network 5 may be adjusted according to the actual requirements of the application scenario, which is not limited in the embodiment of the present application.
As mentioned above, when the scene is complex, dense crowds may be heavily occluded, and the accuracy of counts produced by crowd counting methods may be poor. In the related art, in order to solve the problem of counting in dense scenes, auxiliary inputs such as depth maps or heat maps are often used to improve counting accuracy. Such algorithms are not general, however: they require additional data to be collected, transfer poorly to new scenes, and add complexity to algorithm training and inference.
In view of this, the embodiments of the application provide a method for counting the number of objects in a dense scene. The method first preprocesses a target image containing a dense scene to obtain an input sequence of the target image, then uses a feature extraction module to perform feature extraction on the input sequence to obtain a feature map that represents the similarity of each feature in the target image to all other features, that is, the global dependency relationship between objects in the dense scene. Finally, the feature map is converted into a density map, and the number of objects in the dense scene is obtained by statistics on the density map. No auxiliary data needs to be additionally introduced: by modeling the global dependency relationship between objects in the dense scene and converting the feature map representing that relationship into a density map, the accuracy of counting objects in dense scenes is improved, and the algorithm is more general and easy to use.
Fig. 2 is a flow chart of a method for counting the number of objects in a dense scene according to an embodiment of the present application. The object count statistics method in the dense scene of fig. 2 may be performed by the server of fig. 1. As shown in fig. 2, the method for counting the number of objects in the dense scene includes the following steps:
In step S201, a target image is acquired.
The target image comprises a dense scene, N objects to be counted are included in the dense scene, the distribution density of the N objects to be counted is larger than a preset density threshold, and N is a positive integer larger than the preset quantity threshold.
In step S202, a target image is preprocessed to obtain an input sequence of target images.
In step S203, feature extraction is performed on the input sequence using a feature extraction module, so as to obtain a feature map of the target image.
Wherein the feature map is used to represent the similarity of each feature in the target image to all other features.
In step S204, the feature map is converted into a density map, and the number of objects in the target image is obtained based on the density map statistics.
In the embodiment of the application, the method can be executed by a server. In some embodiments, the method may also be performed by a terminal having certain processing capabilities.
In the embodiment of the application, a target image can be acquired first, wherein the target image is an image containing a dense scene, the dense scene includes N objects to be counted, the distribution density of the N objects to be counted is greater than a preset density threshold, and N is a positive integer greater than a preset quantity threshold. That is, the acquired target image contains a dense scene with a plurality of objects whose number needs to be counted.
In the embodiment of the application, the acquired target image can first be preprocessed to obtain the input sequence of the target image. The feature extraction module is then used to perform feature extraction on the input sequence to obtain a feature map of the target image. The feature extraction module may be built on a Transformer feature extractor: when extracting features from the input sequence, it computes the global dependency relationship of each feature in the target image, thereby producing a feature map that represents the similarity of each feature in the target image to all other features.
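The global dependency computation at the core of a Transformer feature extractor is scaled dot-product self-attention. The following is a minimal sketch of how each feature's similarity to all other features could be computed; it deliberately omits the learned query/key/value projections and multi-head structure of a full Transformer layer, which the patent does not detail:

```python
import torch

def global_similarity(x: torch.Tensor) -> torch.Tensor:
    # x: (B, N, C) sequence of N patch features with C channels each.
    # Dot-product similarity of every feature with all other features,
    # i.e. the global dependency relationship across the whole image.
    scale = x.shape[-1] ** -0.5
    attn = torch.softmax((x @ x.transpose(-2, -1)) * scale, dim=-1)  # (B, N, N)
    # Each output feature is a similarity-weighted mixture of all features.
    return attn @ x
```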
In the embodiment of the application, after the feature map of the target image is acquired, the feature map can be converted into a density map, and the number of objects in the target image is obtained by statistics on the density map.
In the embodiment of the application, before counting is performed, training data can first be acquired, wherein the training data includes target images and target image labels, and a target image label is the number of objects to be counted in the corresponding target image. A target image in the training data is preprocessed to obtain an input sequence; the feature extraction module performs feature extraction on the input sequence to obtain a feature map; the feature map is converted into a density map, and the number of objects in the target image is obtained by statistics on the density map. The counted number of objects is then compared with the target image label to obtain a loss value. When the loss value does not meet the preset threshold requirement, the network parameters of the feature extraction module are updated, and the steps of extracting the feature map with the updated feature extraction module, converting the feature map into a density map, counting the objects from the density map, and calculating the loss value from the counted number and the label are performed again, until the loss value meets the preset threshold requirement. That is, the feature extraction module used in the embodiment of the application is a trained feature extraction module.
According to the technical solution provided by the embodiment of the application, the target image containing the dense scene is preprocessed to obtain the input sequence of the target image; the feature extraction module then performs feature extraction on the input sequence to obtain a feature map that represents the similarity of each feature in the target image to all other features, that is, the global dependency relationship between objects in the dense scene; finally, the feature map is converted into a density map, and the number of objects in the dense scene is obtained by statistics on the density map. No auxiliary data needs to be additionally introduced: by modeling the global dependency relationship between objects in the dense scene and converting the feature map representing that relationship into a density map, the accuracy of counting objects in dense scenes is improved, and the algorithm is more general and easy to use.
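To make the end-to-end flow of steps S201 to S204 concrete, here is a minimal inference sketch. The three module names are hypothetical stand-ins, not names from the patent: `patch_embed` builds the input sequence, `backbone` is the Transformer feature extraction module, and `density_head` is the convolutional layer producing the density map:

```python
import torch

@torch.no_grad()
def count_objects(image, patch_embed, backbone, density_head) -> int:
    seq = patch_embed(image)                  # preprocessing into an input sequence (S202)
    feature_map = backbone(seq)               # global-dependency feature map (S203)
    density = torch.sigmoid(density_head(feature_map))  # values in (0, 1)
    binary = (density >= 0.5).float()         # graying / thresholding step
    # Literal reading of the described procedure: the count is the number
    # of pixels inside the connected domains, i.e. the number of 1-pixels.
    return int(binary.sum().item())
```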
Fig. 3 is a flowchart of a method for extracting features of an input sequence by using a feature extraction module to obtain a feature map of a target image according to an embodiment of the present application. As shown in fig. 3, the method comprises the steps of:
In step S301, feature extraction is performed on the input sequence sequentially using K serially connected Transformer feature extraction submodules, so as to obtain K sampling results.
Wherein K is a positive integer greater than 1.
In step S302, up-sampling processing is performed on the K sampling results, respectively, to obtain K up-sampling results.
In step S303, the K upsampling processing results are spliced to obtain a feature map of the target image.
As previously described, the feature extraction module may be built from Transformer feature extractors. Further, the feature extraction module may consist of a plurality of Transformer feature extractors with the same or different structures. The Transformer feature extractors are connected in series and sequentially perform feature extraction on the input sequence, downsampling it during feature extraction, so as to obtain a plurality of different sampling results. These sampling results are then upsampled separately to obtain a plurality of upsampling results with the same output size. Finally, the upsampling results are spliced to obtain the feature map of the target image.
In the embodiment of the present application, a preferred value of K is 4. In this case, the K serially connected Transformer feature extraction submodules are, respectively: a first Transformer feature extraction submodule formed by 2 serially connected Transformer feature extractors, a second Transformer feature extraction submodule formed by 4 serially connected Transformer feature extractors, a third Transformer feature extraction submodule formed by 4 serially connected Transformer feature extractors, and a fourth Transformer feature extraction submodule formed by 8 serially connected Transformer feature extractors.
Further, sequentially performing feature extraction on the input sequence using the K serially connected Transformer feature extraction submodules to obtain K sampling results may be: performing 2 times downsampling on the input sequence using the first Transformer feature extraction submodule to obtain a first sampling result; performing 2 times downsampling on the first sampling result using the second Transformer feature extraction submodule to obtain a second sampling result; performing 2 times downsampling on the second sampling result using the third Transformer feature extraction submodule to obtain a third sampling result; and performing 2 times downsampling on the third sampling result using the fourth Transformer feature extraction submodule to obtain a fourth sampling result.
That is, among the 4 serially connected Transformer feature extraction submodules, the first Transformer feature extraction submodule directly performs feature extraction on the input sequence. At this stage the image resolution is large and the computation required by a Transformer feature extractor is heavy, so the first submodule is set to consist of only 2 serially connected Transformer feature extractors; it performs feature extraction and 2 times downsampling on the input sequence, yielding a first sampling result in which the input sequence is downsampled by a factor of 2.
The second Transformer feature extraction submodule is connected in series after the first and performs feature extraction on the first sampling result. Since the resolution of the first sampling result is half that of the input sequence, the second submodule can be set to consist of 4 serially connected Transformer feature extractors, improving feature extraction accuracy as much as possible while still ensuring computation speed; it performs feature extraction and 2 times downsampling on the first sampling result, yielding a second sampling result that is a 4 times downsampling of the input sequence. Similarly, to balance computation speed and feature extraction accuracy, the third Transformer feature extraction submodule may consist of 4 serially connected Transformer feature extractors; it performs feature extraction and 2 times downsampling on the second sampling result, yielding a third sampling result that is an 8 times downsampling of the input sequence.
The fourth Transformer feature extraction submodule is connected in series after the third and performs feature extraction on the third sampling result. Since the resolution of the third sampling result is already low, the number of Transformer feature extractors can be further increased: the fourth submodule is set to consist of 8 serially connected Transformer feature extractors, and it performs feature extraction and 2 times downsampling on the third sampling result, yielding a fourth sampling result that is a 16 times downsampling of the input sequence.
In the embodiment of the application, the first, second, third, and fourth sampling results obtained in this way have different output sizes, so each sampling result must also be transformed to the same size by upsampling before splicing. Specifically, 2 times upsampling can be performed on the first sampling result to obtain a first upsampling result; 4 times upsampling on the second sampling result to obtain a second upsampling result; 8 times upsampling on the third sampling result to obtain a third upsampling result; and 16 times upsampling on the fourth sampling result to obtain a fourth upsampling result. At this point, first, second, third, and fourth upsampling results with the same output size are obtained.
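A sketch of this four-stage backbone is given below, assuming PyTorch. The channel width, head count, pooling-based downsampling, and bilinear upsampling are illustrative assumptions, since the patent specifies only the number of Transformer layers per submodule (2/4/4/8) and the 2 times downsampling per stage:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleTransformerBackbone(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        def stage(depth: int) -> nn.TransformerEncoder:
            layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=depth)
        # 2/4/4/8 serially connected Transformer feature extractors per submodule.
        self.stages = nn.ModuleList(stage(d) for d in (2, 4, 4, 8))
        self.down = nn.MaxPool2d(2)  # 2 times downsampling after each stage

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature grid built from the input sequence.
        size = x.shape[-2:]  # every branch is upsampled back to this size
        outs = []
        for stage in self.stages:
            b, c, h, w = x.shape
            seq = stage(x.flatten(2).transpose(1, 2))   # (B, H*W, C): global attention
            x = self.down(seq.transpose(1, 2).reshape(b, c, h, w))  # sampling result
            outs.append(x)
        # 2x/4x/8x/16x upsampling so all four branches share one output size.
        ups = [F.interpolate(o, size=size, mode="bilinear", align_corners=False)
               for o in outs]
        return torch.cat(ups, dim=1)  # splice the K upsampling results
```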
Fig. 4 is a flowchart of a method for preprocessing a target image to obtain an input sequence of the target image according to an embodiment of the present application. As shown in fig. 4, the method comprises the steps of:
In step S401, the target image is divided into M image blocks.
Wherein M is a positive integer greater than 1.
In step S402, the M image blocks are spliced and flattened to obtain the input sequence of the target image.
In the embodiment of the application, when the target image is preprocessed, the target image can be divided into M image blocks. Specifically, a Vision Transformer (ViT) style partitioning may be used to split the target image into multiple image blocks (patches). The different patches are then spliced and flattened, so as to construct the input sequence for Transformer feature extraction, that is, the input sequence of the target image.
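A minimal sketch of this preprocessing is given below; the patch size of 16 is an assumption, since the patent does not specify how the M image blocks are sized:

```python
import torch

def patchify_and_flatten(image: torch.Tensor, patch_size: int = 16) -> torch.Tensor:
    # image: (B, C, H, W), with H and W divisible by patch_size.
    b, c, h, w = image.shape
    # Split into non-overlapping blocks: (B, C, H/p, W/p, p, p).
    patches = image.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    # Splice the M = (H/p) * (W/p) blocks and flatten each block to a vector.
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * patch_size * patch_size)
    return patches  # (B, M, p*p*C) input sequence of the target image
```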
Fig. 5 is a flowchart of a method for converting a feature map into a density map and obtaining the number of objects in a target image based on density map statistics according to an embodiment of the present application. As shown in fig. 5, the method comprises the steps of:
In step S501, each pixel in the feature map is converted to a value greater than 0 and less than 1 using an S-shaped growth curve function.
In step S502, the converted feature map is subjected to gradation processing to obtain a density map of the target image.
In step S503, connected domains in the density map are counted.
In step S504, the number of pixels in the connected domain is determined as the number of objects in the target image.
In the embodiment of the application, after the feature map of the target image is extracted, each pixel in the feature map can be converted into a value greater than 0 and less than 1 using a sigmoid function. Next, the converted feature map is subjected to graying processing, that is, every pixel value of 0.5 or more is converted to 1, and every value less than 0.5 is converted to 0, thereby obtaining a gray map, that is, the density map of the target image. Finally, the connected domains in the density map are determined, and the number of pixels included in the connected domains is determined as the number of objects in the target image, thereby completing the statistics of the number of objects in the target image.
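A minimal sketch of this conversion and counting step, assuming NumPy and SciPy, is shown below; it follows the procedure literally as described, with the count taken as the pixel total over the connected domains:

```python
import numpy as np
from scipy import ndimage

def count_from_feature_map(feature_map: np.ndarray) -> int:
    density = 1.0 / (1.0 + np.exp(-feature_map))  # sigmoid: values in (0, 1) (S501)
    binary = (density >= 0.5).astype(np.uint8)    # graying: >=0.5 -> 1, <0.5 -> 0 (S502)
    labels, num_domains = ndimage.label(binary)   # connected domains in the density map (S503)
    return int((labels > 0).sum())                # pixels in the connected domains (S504)
```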
Fig. 6 is a flowchart of a training method of an object number statistics algorithm in a dense scene according to an embodiment of the present application. As shown in fig. 6, when training the object number statistics algorithm provided by the embodiment of the present application, the input image may first be divided into different image blocks (patches), for example using the ViT partitioning method. The different patch image blocks are then spliced and flattened to construct the Transformer input. Image features are then extracted by the Transformer feature extraction module, which computes the global dependency relationships during extraction to obtain the feature map of the input image. Next, a convolution layer is applied to the extracted feature map to output the corresponding density map, and the counted number of people is output according to the density map. After the count is obtained, the loss is calculated according to the label of the input image, where the label includes the actual number of people in the input image. Finally, the network parameters are updated backwards according to the loss, and the above steps are executed iteratively until the loss meets the preset condition.
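One training iteration of this loop might look like the sketch below. The mean-squared-error loss and the use of the density-map sum as a differentiable count are assumptions: the thresholded connected-domain count is not differentiable, and the patent does not name a loss function:

```python
import torch
import torch.nn.functional as F

def train_step(model: torch.nn.Module, optimizer: torch.optim.Optimizer,
               image: torch.Tensor, true_count: torch.Tensor) -> float:
    optimizer.zero_grad()
    density = model(image)              # patches -> Transformer features -> conv -> density map
    predicted = density.sum()           # differentiable surrogate for the people count
    loss = F.mse_loss(predicted, true_count)  # compare with the labeled actual count
    loss.backward()                     # reversely update the network parameters
    optimizer.step()
    return loss.item()
```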
Fig. 7 is a schematic structural diagram of a feature extraction module according to an embodiment of the present application. As shown in fig. 7, the feature extraction module includes four serially connected Transformer feature extraction submodules. The first Transformer feature extraction submodule consists of 2 serially connected Transformer feature extractors; it performs 2 times downsampling on the input sequence to obtain a first sampling result, which it outputs to the second Transformer feature extraction submodule and the first upsampling submodule; the first upsampling submodule performs 2 times upsampling on the first sampling result and outputs the first upsampling result.
The second Transformer feature extraction submodule consists of 4 serially connected Transformer feature extractors; it performs 2 times downsampling on the first sampling result to obtain a second sampling result, which it outputs to the third Transformer feature extraction submodule and the second upsampling submodule; the second upsampling submodule performs 4 times upsampling on the second sampling result and outputs the second upsampling result.
The third Transformer feature extraction submodule consists of 4 serially connected Transformer feature extractors; it performs 2 times downsampling on the second sampling result to obtain a third sampling result, which it outputs to the fourth Transformer feature extraction submodule and the third upsampling submodule; the third upsampling submodule performs 8 times upsampling on the third sampling result and outputs the third upsampling result.
The fourth Transformer feature extraction submodule consists of 8 serially connected Transformer feature extractors; it performs 2 times downsampling on the third sampling result to obtain a fourth sampling result and outputs it to the fourth upsampling submodule, which performs 16 times upsampling on the fourth sampling result and outputs the fourth upsampling result.
The splicing submodule splices the first, second, third, and fourth upsampling results to obtain the feature map of the target image, and outputs the feature map.
That is, when the feature extraction module shown in fig. 7 is used to perform feature extraction on the input sequence of the target image, features are first extracted by 2 Transformers; fewer Transformers are used here because the resolution is still large, and too many Transformers would require a very large amount of computation. The features extracted in the previous step are then transformed by 4 Transformers, with 2 times downsampling performed at the same time; the features are then again extracted by 4 Transformers, with downsampling as in the previous processing step. The subsequent stage uses 8 Transformers because the resolution is by then small and the computation relatively cheap. The outputs of the stages are then upsampled by 16 times, 8 times, 4 times, and 2 times respectively, from the deepest branch to the shallowest, so that the output sizes of the different branches become the same; finally, the results of the different branches are spliced.
By adopting the technical solution of the embodiment of the application, the optimized Transformer feature extraction module is used to extract features of the target image, and the number of objects in the target image is counted based on the extracted feature map. Global feature dependencies can thus be modeled better and more robust features extracted, which significantly improves the accuracy of crowd counting in dense scenes and improves the algorithm's effect.
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein.
The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.
Fig. 8 is a schematic diagram of an object count statistics device in a dense scene according to an embodiment of the present application. As shown in fig. 8, the object number statistics apparatus in the dense scene includes:
the obtaining module 801 is configured to obtain a target image, where the target image includes a dense scene, the dense scene includes N objects to be counted, and a distribution density of the N objects to be counted is greater than a preset density threshold, and N is a positive integer greater than a preset number threshold.
The preprocessing module 802 is configured to preprocess the target image to obtain an input sequence of the target image.
The extracting module 803 is configured to perform feature extraction on the input sequence by using the feature extracting module, so as to obtain a feature map of the target image, where the feature map is used to represent similarity between each feature and all other features in the target image.
A statistics module 804, configured to convert the feature map into a density map, and obtain the number of objects in the target image based on the density map statistics.
According to the technical solution provided by the embodiment of the application, the target image containing the dense scene is preprocessed to obtain the input sequence of the target image; the feature extraction module then performs feature extraction on the input sequence to obtain a feature map that represents the similarity of each feature in the target image to all other features, that is, the global dependency relationship between objects in the dense scene; finally, the feature map is converted into a density map, and the number of objects in the dense scene is obtained by statistics on the density map. No auxiliary data needs to be additionally introduced: by modeling the global dependency relationship between objects in the dense scene and converting the feature map representing that relationship into a density map, the accuracy of counting objects in dense scenes is improved, and the algorithm is more general and easy to use.
In the embodiment of the application, performing feature extraction on the input sequence using the feature extraction module to obtain a feature map of the target image includes the following steps: sequentially performing feature extraction on the input sequence using K serially connected Transformer feature extraction submodules to obtain K sampling results, wherein K is a positive integer greater than 1; performing upsampling on the K sampling results respectively to obtain K upsampling results; and splicing the K upsampling results to obtain the feature map of the target image.
In the embodiment of the application, K is equal to 4; the K serially connected Transformer feature extraction submodules are, respectively: a first Transformer feature extraction submodule formed by 2 serially connected Transformer feature extractors, a second Transformer feature extraction submodule formed by 4 serially connected Transformer feature extractors, a third Transformer feature extraction submodule formed by 4 serially connected Transformer feature extractors, and a fourth Transformer feature extraction submodule formed by 8 serially connected Transformer feature extractors.
In the embodiment of the application, sequentially performing feature extraction on the input sequence using the K serially connected Transformer feature extraction submodules to obtain K sampling results includes the following steps: performing 2 times downsampling on the input sequence using the first Transformer feature extraction submodule to obtain a first sampling result; performing 2 times downsampling on the first sampling result using the second Transformer feature extraction submodule to obtain a second sampling result; performing 2 times downsampling on the second sampling result using the third Transformer feature extraction submodule to obtain a third sampling result; and performing 2 times downsampling on the third sampling result using the fourth Transformer feature extraction submodule to obtain a fourth sampling result.
In the embodiment of the application, performing upsampling on the K sampling results respectively to obtain K upsampling results includes the following steps: performing 2 times upsampling on the first sampling result to obtain a first upsampling result; performing 4 times upsampling on the second sampling result to obtain a second upsampling result; performing 8 times upsampling on the third sampling result to obtain a third upsampling result; and performing 16 times upsampling on the fourth sampling result to obtain a fourth upsampling result.
In the embodiment of the application, preprocessing the target image to obtain an input sequence of the target image includes the following steps: dividing the target image into M image blocks, wherein M is a positive integer greater than 1; and splicing and flattening the M image blocks to obtain the input sequence of the target image.
In the embodiment of the application, the feature map is converted into the density map, and the number of objects in the target image is obtained based on the statistics of the density map, which comprises the following steps: converting each pixel in the feature map to a value greater than 0 and less than 1 using an S-shaped growth curve function; carrying out graying treatment on the converted feature map to obtain a density map of the target image; counting connected domains in the density map; the number of pixels in the connected domain is determined as the number of objects in the target image.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Fig. 9 is a schematic diagram of an electronic device according to an embodiment of the present application. As shown in fig. 9, the electronic device 9 of this embodiment includes: a processor 901, a memory 902, and a computer program 903 stored in the memory 902 and executable on the processor 901. The steps of the method embodiments described above are implemented when the processor 901 executes the computer program 903. Alternatively, when executing the computer program 903, the processor 901 implements the functions of the modules/units in the device embodiments described above.
The electronic device 9 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device 9 may include, but is not limited to, the processor 901 and the memory 902. It will be appreciated by those skilled in the art that fig. 9 is merely an example of the electronic device 9 and does not constitute a limitation; the electronic device 9 may include more or fewer components than shown, or different components.
The processor 901 may be a central processing unit (CPU) or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like.
The memory 902 may be an internal storage unit of the electronic device 9, for example, a hard disk or memory of the electronic device 9. The memory 902 may also be an external storage device of the electronic device 9, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the electronic device 9. The memory 902 may also include both internal and external storage units of the electronic device 9. The memory 902 is used to store computer programs and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method for counting the number of objects in a dense scene, comprising:
acquiring a target image, wherein the target image comprises the dense scene, N objects to be counted are included in the dense scene, the distribution density of the N objects to be counted is larger than a preset density threshold, and N is a positive integer larger than a preset quantity threshold;
Preprocessing the target image to obtain an input sequence of the target image;
Extracting features of the input sequence by using a feature extraction module to obtain a feature map of the target image, wherein the feature map is used for representing the similarity of each feature in the target image and all other features;
And converting the characteristic map into a density map, and obtaining the number of objects in the target image based on the density map statistics.
2. The method of claim 1, wherein the feature extraction of the input sequence using a feature extraction module results in a feature map of the target image, comprising:
Sequentially carrying out feature extraction on the input sequence by using K serially connected Transformer feature extraction submodules to obtain K sampling results, wherein K is a positive integer greater than 1;
Respectively carrying out up-sampling treatment on the K sampling results to obtain K up-sampling results;
And splicing the K upsampling processing results to obtain a feature map of the target image.
3. The method according to claim 2, characterized in that K is equal to 4;
The K serially connected Transformer feature extraction submodules are respectively: a first Transformer feature extraction submodule formed by 2 serially connected Transformer feature extractors, a second Transformer feature extraction submodule formed by 4 serially connected Transformer feature extractors, a third Transformer feature extraction submodule formed by 4 serially connected Transformer feature extractors, and a fourth Transformer feature extraction submodule formed by 8 serially connected Transformer feature extractors.
4. A method according to claim 3, wherein the sequentially performing feature extraction on the input sequence using K serially connected Transformer feature extraction submodules to obtain K sampling results includes:
Performing 2 times downsampling processing on the input sequence by using the first Transformer feature extraction submodule to obtain a first sampling result;
performing 2 times downsampling processing on the first sampling result by using the second Transformer feature extraction submodule to obtain a second sampling result;
performing 2 times downsampling processing on the second sampling result by using the third Transformer feature extraction submodule to obtain a third sampling result;
and performing 2 times downsampling processing on the third sampling result by using the fourth Transformer feature extraction submodule to obtain a fourth sampling result.
5. The method of claim 4, wherein the upsampling the K samples to obtain K upsampled results comprises:
Performing 2 times up-sampling processing on the first sampling result to obtain a first up-sampling result;
4 times of up-sampling processing is carried out on the second sampling result to obtain a second up-sampling result;
8 times of up-sampling processing is carried out on the third sampling result to obtain a third up-sampling result;
and carrying out 16 times of up-sampling processing on the fourth sampling result to obtain a fourth up-sampling result.
6. The method of claim 1, wherein the preprocessing the target image to obtain an input sequence of the target image comprises:
Dividing the target image into M image blocks, wherein M is a positive integer greater than 1;
and splicing and flattening the M image blocks to obtain the input sequence of the target image.
7. The method of claim 1, wherein said converting the feature map to a density map, and deriving the number of objects in the target image based on the density map statistics, comprises:
converting each pixel in the feature map to a value greater than 0 and less than 1 using an S-shaped growth curve function;
carrying out graying processing on the converted feature map to obtain the density map of the target image;
Counting connected domains in the density map;
the number of pixels in the connected domain is determined as the number of objects in the target image.
8. An object number statistics device in a dense scene, comprising:
the acquisition module is configured to acquire a target image, wherein the target image comprises the dense scene, N objects to be counted are included in the dense scene, the distribution density of the N objects to be counted is larger than a preset density threshold, and N is a positive integer larger than a preset quantity threshold;
The preprocessing module is configured to preprocess the target image to obtain an input sequence of the target image;
The extraction module is configured to perform feature extraction on the input sequence by using the feature extraction module to obtain a feature map of the target image, wherein the feature map is used for representing the similarity of each feature in the target image and all other features;
and the statistics module is configured to convert the feature map into a density map, and obtain the number of objects in the target image based on the density map statistics.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 7.
CN202410155949.2A (filed 2024-02-02): Method, device, equipment and medium for counting number of objects in dense scene. Pending as CN118229986A.

Publications (1)

Publication Number: CN118229986A
Publication Date: 2024-06-21



Legal Events

PB01: Publication