CN114445778A - Counting method and device, electronic equipment and storage medium - Google Patents

Counting method and device, electronic equipment and storage medium

Info

Publication number
CN114445778A
CN114445778A (application CN202210115053.2A)
Authority
CN
China
Prior art keywords
counted
image
feature
objects
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210115053.2A
Other languages
Chinese (zh)
Inventor
杨昆霖
刘诗男
侯军
伊帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd filed Critical Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202210115053.2A priority Critical patent/CN114445778A/en
Publication of CN114445778A publication Critical patent/CN114445778A/en
Priority to PCT/CN2022/127556 priority patent/WO2023142554A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a counting method and apparatus, an electronic device, and a storage medium. The method includes: performing feature extraction on an input image to obtain a first feature map and a second feature map; extracting object features of the objects to be counted from the first feature map according to the position information of at least one object to be counted marked in the input image; determining, according to the object features, distribution position information of at least two objects to be counted in the first feature map; determining a feature distribution map of the objects to be counted from the second feature map by using the distribution position information; and determining the total number of objects to be counted based on the feature distribution map. The disclosed embodiments can improve the accuracy of visual counting.

Description

Counting method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a counting method and apparatus, an electronic device, and a storage medium.
Background
With the development of artificial intelligence technology, visual counting has been widely used in statistical counting tasks; visual counting determines the number and distribution of objects to be counted in an image.
In the related art, visual counting work focuses mainly on specific categories such as pedestrians, automobiles, cells, and animals. In most such work, a counting model is trained for a single category of object, so the resulting model is not general. If multiple categories of objects need to be counted in an application scenario, a large amount of data must be collected to train the models, which consumes substantial human resources and is inefficient.
Disclosure of Invention
The present disclosure presents a visual counting scheme.
According to an aspect of the present disclosure, there is provided a counting method including:
performing feature extraction on an input image to obtain a first feature map and a second feature map;
extracting object features of the objects to be counted from the first feature map according to the position information of at least one object to be counted marked in the input image;
according to the object features, determining distribution position information of at least two objects to be counted in the first feature map;
determining a feature distribution map of the object to be counted from the second feature map by using the distribution position information;
determining a total number of objects to be counted based on the feature distribution map.
In a possible implementation manner, the extracting, from the first feature map, an object feature of the object to be counted according to the position information of the at least one object to be counted labeled in the input image includes:
under the condition that at least two marked objects to be counted are available, respectively extracting first characteristics of the marked objects to be counted from the first characteristic diagram according to the position information of the marked objects to be counted;
and fusing the extracted first characteristics of the objects to be counted to obtain the object characteristics of the objects to be counted.
In a possible implementation manner, the determining, according to the object feature, distribution position information of at least two objects to be counted in the first feature map includes:
calculating the similarity of the object features and the features at the positions in the second feature map;
and obtaining the distribution position information of the at least two objects to be counted in the first feature map based on the similarity of the features at the positions.
In a possible implementation manner, the determining, from the second feature map, a feature distribution map of an object to be counted by using the distribution position information includes:
and multiplying the distribution position information by the second feature map to obtain a feature distribution map of the object to be counted.
In one possible implementation, determining the total number of objects to be counted based on the feature distribution map includes:
performing up-sampling on the feature distribution map to obtain a density map of the object to be counted;
determining a total number of objects to be counted based on the density map.
In a possible implementation manner, the performing feature extraction on the input image to obtain a first feature map and a second feature map includes:
performing initial feature extraction on an input image to obtain an initial feature map;
performing first feature extraction on the initial feature map to obtain a first feature map;
and performing second feature extraction on the initial feature map to obtain a second feature map.
In a possible implementation manner, the input image is a sample image pre-labeled with the distribution positions of the at least two objects to be counted, the counting method is implemented based on a neural network, and a parameter updating process of the neural network includes:
determining a first loss based on the distribution position information and the distribution positions of at least two objects to be counted in the sample image labeled in advance;
determining a second loss based on the density map of the object to be counted and a pre-labeled density map of the object to be counted;
updating a parameter in the neural network based on the first loss and/or the second loss.
In a possible implementation manner, the counting method is implemented based on a neural network, and the method for constructing the training sample and/or the test sample of the neural network includes:
obtaining at least one target sub-graph of a target object according to the labeling information of the first sample image;
pasting the target subgraph into a second sample image to obtain a composite image and pasting position information of the target subgraph in the composite image;
and using the pasting position information as labeling information in the synthetic image to generate a synthetic sample image.
In a possible implementation manner, the obtaining at least one target sub-graph of the target object according to the annotation information of the first sample image includes:
extracting an image in a target region corresponding to the labeling information as a target sub-image; and/or
respectively performing at least one image transformation on the image in the target region corresponding to the labeling information, and taking the transformed image as a target sub-image.
In a possible implementation manner, before obtaining at least one target sub-graph of the target object according to the annotation information of the first sample image, the method further includes:
determining a first number of target subgraphs to be generated according to the size information of the target area and the size information of the second sample image; wherein the first number of target subgraphs is positively correlated with the size information of the second sample image and negatively correlated with the size information of the target area.
In a possible implementation manner, after determining the number of target sub-images to be generated, the performing at least one image transformation on the images in the target region corresponding to the annotation information respectively includes:
respectively carrying out at least one image transformation on the images in the target area according to the first quantity to obtain the first quantity of target subgraphs;
the image transformation includes at least one of:
image stretching, image shrinking, image rotation, image symmetry transformation, and noise addition in an image.
According to an aspect of the present disclosure, there is provided a counting apparatus including:
the input image feature extraction module is used for extracting features of an input image to obtain a first feature map and a second feature map;
the object feature extraction module is used for extracting object features of the objects to be counted from the first feature map according to the position information of at least one object to be counted marked in the input image;
the distribution position information determining module is used for determining the distribution position information of at least two objects to be counted in the first feature map according to the object features;
the feature distribution map determining module is used for determining a feature distribution map of the object to be counted from the second feature map by using the distribution position information;
and the total number determining module is used for determining the total number of the objects to be counted based on the feature distribution map.
In one possible implementation manner, the object feature extraction module includes:
the object feature extraction submodule is used for extracting first features of the marked objects to be counted from the first feature map respectively according to the position information of the marked objects to be counted under the condition that at least two marked objects to be counted are available;
and the fusion module is used for fusing the extracted first characteristics of the objects to be counted to obtain the object characteristics of the objects to be counted.
In one possible implementation manner, the distributed location information determining module includes:
a similarity determination module, configured to calculate a similarity between the object feature and a feature at each position in the second feature map;
and the distribution position information determining submodule is used for obtaining the distribution position information of the at least two objects to be counted in the first feature map based on the similarity of the features at the positions.
In a possible implementation manner, the feature distribution map determining module is configured to multiply the distribution position information and the second feature map to obtain a feature distribution map of the object to be counted.
In one possible implementation, the total number determining module includes:
the density map acquisition module is used for performing up-sampling on the feature distribution map to obtain a density map of the object to be counted;
and the total number determining submodule is used for determining the total number of the objects to be counted on the basis of the density map.
In one possible implementation manner, the input image feature extraction module includes:
the initial feature extraction module is used for performing initial feature extraction on the input image to obtain an initial feature map;
the first feature map extraction module is used for performing first feature extraction on the initial feature map to obtain a first feature map;
and the second feature map extraction module is used for performing second feature extraction on the initial feature map to obtain a second feature map.
In a possible implementation manner, the input image is a sample image pre-labeled with the distribution positions of the at least two objects to be counted, the counting apparatus is implemented based on a neural network, and the apparatus further includes a parameter updating module for updating the parameters of the neural network, where the parameter updating module includes:
a first loss determining module, configured to determine a first loss based on the distribution position information and distribution positions of at least two objects to be counted in the pre-labeled sample image;
the second loss determining module is used for determining second loss based on the density graph of the object to be counted and a pre-labeled density graph of the object to be counted;
and the parameter updating submodule is used for updating the parameters in the neural network based on the first loss and/or the second loss.
In a possible implementation manner, the counting method is implemented based on a neural network, and the apparatus further includes: the sample construction module is used for constructing a training sample and/or a testing sample of the neural network;
the sample construction module comprises:
the target sub-image determining module is used for obtaining at least one target sub-image of the target object according to the labeling information of the first sample image;
the pasting module is used for pasting the target sub-image into a second sample image to obtain a composite image and pasting position information of the target sub-image in the composite image;
and the synthesis module is used for generating a synthesized sample image by taking the pasting position information as the marking information in the synthesized image.
In one possible implementation, the target subgraph determining module includes:
the first target subgraph determining module is used for extracting the image in the target region corresponding to the labeling information as a target subgraph; and/or
the second target sub-image determining module is used for respectively performing at least one image transformation on the image in the target region corresponding to the labeling information and taking the transformed image as a target sub-image.
In one possible implementation, the sample construction module further includes:
a number determination module, configured to determine a first number of target subgraphs to be generated according to the size information of the target region and the size information of the second sample image; wherein the first number of target subgraphs is positively correlated with the size information of the second sample image and negatively correlated with the size information of the target area.
In a possible implementation manner, the second target sub-image determining module is configured to perform at least one image transformation on the images in the target region according to the first number, so as to obtain a first number of target sub-images;
the image transformation includes at least one of:
image stretching, image shrinking, image rotation, image symmetry transformation, and noise addition in an image.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiment of the disclosure, a first feature map and a second feature map are obtained by performing feature extraction on an input image; object features of the objects to be counted are extracted from the first feature map according to the position information of at least one object to be counted marked in the input image; distribution position information of at least two objects to be counted is determined in the first feature map according to the object features; a feature distribution map of the objects to be counted is determined from the second feature map by using the distribution position information; and the total number of objects to be counted is determined based on the feature distribution map. Therefore, during counting, the user only needs to mark (for example, frame-select) an object to be counted in the input image to obtain its position information, and the object features of the object to be counted are then extracted using that position information; in other words, class-agnostic feature information of the object to be counted is obtained directly from the image to be counted, rather than counting through a network trained for a specific category. In addition, in the embodiment of the present disclosure, two feature maps are extracted: the distribution position information of each object to be counted is determined from the first feature map by using the object features, and the feature distribution map of the objects to be counted is then determined from the second feature map by using the distribution position information. That is, one of the two feature maps is used for determining the distribution positions and the other for determining the feature distribution, and the two feature maps are produced with two separate sets of parameters for these two purposes, so that the number of objects to be counted can be determined accurately.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flow chart of a counting method according to an embodiment of the present disclosure.
Fig. 2 shows an application scenario diagram according to an embodiment of the present disclosure.
Fig. 3 shows an application scenario diagram according to an embodiment of the present disclosure.
Fig. 4 shows a flow chart of a feature extraction method according to an embodiment of the present disclosure.
Fig. 5 shows a schematic diagram of a distribution of convolution kernel sample point locations according to an embodiment of the present disclosure.
Fig. 6 shows an application scenario diagram according to an embodiment of the present disclosure.
Fig. 7 shows an application scenario diagram according to an embodiment of the present disclosure.
Fig. 8 shows a block diagram of a counting device according to an embodiment of the present disclosure.
Fig. 9 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
FIG. 10 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
In the related art, visual counting work focuses mainly on specific categories. For example, a pedestrian counting model is trained for pedestrians using samples labeled with pedestrian positions and numbers; in application, inputting an image into the trained pedestrian counting model yields the number of pedestrians. However, this pedestrian counting model cannot count vehicles, because it was trained on pedestrian samples. If the number of vehicles needs to be counted, samples labeled with vehicle positions and numbers must be used to train new model parameters before vehicles can be counted. Such a visual counting approach therefore requires a large amount of data to be collected for model training, which consumes substantial human resources and is inefficient.
In addition, in visual counting work, a counting algorithm model may generate convolution operation parameters that depend on the features of the specific class of object to be identified; however, a network model designed in this way has a large number of parameters, runs slowly, and is inefficient.
In the embodiment of the disclosure, a first feature map and a second feature map are obtained by performing feature extraction on an input image; object features of the objects to be counted are extracted from the first feature map according to the position information of at least one object to be counted marked in the input image; distribution position information of at least two objects to be counted is determined in the first feature map according to the object features; a feature distribution map of the objects to be counted is determined from the second feature map by using the distribution position information; and the total number of objects to be counted is determined based on the feature distribution map. Therefore, during counting, the user only needs to mark (for example, frame-select) an object to be counted in the input image to obtain its position information, and the object features of the object to be counted are then extracted using that position information; in other words, class-agnostic feature information of the object to be counted is obtained directly from the image to be counted, rather than counting through a network trained for a specific category. In addition, in the embodiment of the present disclosure, two feature maps are extracted: the distribution position information of each object to be counted is determined from the first feature map by using the object features, and the feature distribution map of the objects to be counted is then determined from the second feature map by using the distribution position information. That is, one of the two feature maps is used for determining the distribution positions and the other for determining the feature distribution, and the two feature maps are produced with two separate sets of parameters for these two purposes, so that the number of objects to be counted can be determined accurately.
In one possible implementation, the counting method may be performed by an electronic device such as a terminal device or a server. The terminal device may be a User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. The method may be implemented by a processor calling computer-readable instructions stored in a memory.
For convenience of description, in one or more embodiments of the present specification, an execution subject of the counting method may be a terminal device, and hereinafter, an embodiment of the method will be described by taking the execution subject as the terminal device as an example. It is understood that the implementation of the method by the terminal device is only an exemplary illustration, and should not be construed as a limitation of the method.
Fig. 1 shows a flowchart of a counting method according to an embodiment of the present disclosure, as shown in fig. 1, the counting method includes:
in step S11, performing feature extraction on the input image to obtain a first feature map and a second feature map;
the input image is an image to be counted, and the image may be a single image taken by the image acquisition device, for example, a photo taken in a photographing mode of the image acquisition device; alternatively, the image may be a video frame in a video captured by the image capturing device, for example, a video frame in a video captured by a video recording mode of the image capturing device. The embodiment of the present disclosure does not limit the specific form of the image. The input image may be an image stored locally by the device, or may also be a cloud image acquired from a network, and the storage form is not limited in the embodiment of the present disclosure.
In one example, the input image may be an image of a road captured by an image capturing device to count the number of people, vehicles, and the like in the road; in another example, the input image may be a microscopic cell map acquired by an image acquisition device, used for counting the number of cells, and the like. In addition, the embodiments of the present disclosure may also be applied to counting the number of other types of objects, such as animals, etc., and are not limited herein.
The feature extraction of the input image may be implemented by convolutional network layers. In the embodiment of the present disclosure, the first feature map and the second feature map may be obtained by performing feature extraction with two different convolutional network layers respectively. The structures of the two convolutional network layers may be identical or completely different; that is, the first feature map and the second feature map may be identical or different. In one example, the two convolutional network layers may be network layers with different parameters obtained by training two identical initial convolutional networks. Since the first and second feature maps play different roles, one being used for determining the distribution positions and the other for determining the feature distribution, the parameters of the two convolutional networks may become different as the network is iteratively trained based on a first loss determined for the distribution positions and a second loss determined for the feature distribution, so that each feature map better fulfils its role.
For a specific way of determining the first characteristic diagram and the second characteristic diagram, reference may be made to possible implementation manners provided by the present disclosure, and details are not repeated here. It will be appreciated that "first" and "second" in the embodiments of the disclosure are used to distinguish between the objects described and should not be construed as limiting the order in which the objects are described, indicating or implying relative importance or the like.
In step S12, extracting an object feature of at least one object to be counted from the first feature map according to the position information of the at least one object to be counted labeled in the input image;
in the input image, the user can mark the position information of the object to be counted in advance, it should be noted that the user does not need to mark the position information of all the objects to be counted, and the user only needs to mark part of the objects to be counted because the number of the objects to be counted is often large. In one example, a user may randomly select 3 objects to be counted in an input image and then select the objects in a rectangular frame, or may select the objects in a polygonal or circular frame as labels of the objects to be counted; in one example, the user may select 1 object to be counted from any one of the input images.
After the user marks the position information of the object to be counted, the object feature of the object to be counted can be extracted from the first feature map based on the position information. In an example, in a case that the position information is coordinates of a rectangular frame, according to the coordinates of the rectangular frame, a region corresponding to the coordinates of the rectangular frame in the first feature map (i.e., a region selected by a frame corresponding to the rectangular frame) may be extracted as an object feature of the object to be counted.
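As an illustration of this step, the following Python sketch (not part of the original disclosure) shows one way the region selected by a rectangular frame could be cropped out of the first feature map and pooled into an object feature vector; the box format (x1, y1, x2, y2) in input-image pixels, the stride of 8 between image and feature map, and the use of PyTorch are all assumptions.

import torch
import torch.nn.functional as F

def extract_object_feature(first_feature_map, box, stride=8):
    # first_feature_map: (C, H, W) feature map of one image
    # box: (x1, y1, x2, y2) marked rectangle in input-image pixels (assumed format)
    # stride: assumed downsampling factor between input image and feature map
    x1, y1, x2, y2 = [int(round(v / stride)) for v in box]
    x2, y2 = max(x2, x1 + 1), max(y2, y1 + 1)        # keep the crop at least 1x1
    crop = first_feature_map[:, y1:y2, x1:x2]        # region selected by the frame
    # pool the cropped region into a single fixed-length feature vector
    return F.adaptive_avg_pool2d(crop.unsqueeze(0), 1).flatten()   # shape (C,)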
In step S13, determining distribution position information of at least two objects to be counted in the first feature map according to the object features;
after the object features of the objects to be counted are determined, the positions of at least two objects to be counted can be searched in the first feature map by using the object features to obtain the distribution position information of the objects to be counted, wherein the at least two objects to be counted can be all the objects to be counted. Specifically, the positions of the objects to be counted, which may be possibly distributed, may be found by calculating the similarity between the object features and the features at the positions in the first feature map, which may specifically refer to a possible implementation manner provided by the present disclosure, and details are not described here.
The distribution position information is used for representing the position of the object to be counted in the input image, and the distribution position information can be realized through a segmentation map or a segmentation mask of the object to be counted, wherein the scale of the segmentation map or the segmentation mask is the same as that of the second feature map, and the feature distribution map of the object to be counted can be determined from the second feature map by using the segmentation map or the segmentation mask.
For example, the representation form of the segmentation map, the segmentation mask and the second feature map in the computer may be a matrix, and then, the specific representation form of the scale herein may be the number of parameters of the matrix in each dimension, for example, the number of rows, columns and the like of the matrix. In the case that the segmentation map or the segmentation mask has the same scale as the second feature map, it indicates that the parameters in the segmentation map or the segmentation mask correspond to the distribution positions of the objects to be counted in the second feature map. Each parameter in the matrix of the segmentation map or the segmentation mask represents whether an object to be counted exists at the position, and each parameter in the matrix of the second feature map represents the image feature at the position.
In step S14, determining a feature distribution map of the object to be counted from the second feature map by using the distribution position information;
the first feature map after feature extraction is used for determining the position information of the object to be counted, so that the network parameters of the first feature map are more prone to extracting accurate position information, and the obtained first feature map more contains the position features of the object to be counted. And the extracted second feature map is used for determining feature distribution, so that the network parameters of the extracted second feature map tend to extract accurate feature representation of the object to be counted. Therefore, in the embodiment of the present disclosure, the feature distribution map of the object to be counted is obtained from the second feature map in combination with the position feature of the object to be counted, so as to calculate the number of the objects to be counted more accurately.
In step S15, the total number of objects to be counted is determined based on the feature distribution map.
The characteristic distribution graph can represent density characteristic information of the objects to be counted marked by the user, namely the characteristics of the objects to be counted and the distribution of the characteristics, wherein the distribution is not related to a specific category, but is related to the objects to be counted selected by the user.
In one possible implementation, the determining the total number of objects to be counted based on the feature distribution map includes: performing up-sampling on the feature distribution map to obtain a density map of the object to be counted; and determining the total number of objects to be counted based on the density map.
The density map of the object to be counted is used for representing the density of the object to be counted. In one example, a specific implementation form of the density map may be a matrix, and each element in the matrix represents whether an object to be counted exists at that position. The density map may be obtained by upsampling: the feature distribution map is enlarged by an upsampling operation until its scale is consistent with that of the input image, and the enlarged feature distribution map is then convolved by a convolution operation to obtain the density map; the parameters of the upsampling and convolution operations may be obtained by training the network. In the density map, the probability that an object to be counted exists at a single pixel is represented by a value between 0 and 1; all values of the matrix are then accumulated (integrated), and the resulting value is the total number of objects to be counted.
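A minimal sketch of how such a decoding step could look, assuming a downsampling factor of 8 and a single 1x1 convolution as the final regression layer (neither is specified in the text):

import torch
import torch.nn as nn
import torch.nn.functional as F

class DensityDecoder(nn.Module):
    # Upsample the feature distribution map back to the input-image scale,
    # regress a one-channel density map, and sum it to obtain the total count.
    def __init__(self, in_channels=512, stride=8):
        super().__init__()
        self.stride = stride
        self.to_density = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, feature_distribution):             # (B, C, H, W)
        x = F.interpolate(feature_distribution, scale_factor=self.stride,
                          mode='bilinear', align_corners=False)
        density = torch.relu(self.to_density(x))         # non-negative densities
        count = density.sum(dim=(1, 2, 3))                # total number per image
        return density, count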
In the embodiment of the disclosure, a first feature map and a second feature map are obtained by performing feature extraction on an input image; object features of the objects to be counted are extracted from the first feature map according to the position information of at least one object to be counted marked in the input image; distribution position information of at least two objects to be counted is determined in the first feature map according to the object features; a feature distribution map of the objects to be counted is determined from the second feature map by using the distribution position information; and the total number of objects to be counted is determined based on the feature distribution map. Therefore, during counting, the user only needs to mark (for example, frame-select) an object to be counted in the input image to obtain its position information, and the object features of the object to be counted are then extracted using that position information; in other words, class-agnostic feature information of the object to be counted is obtained directly from the image to be counted, rather than counting through a network trained for a specific category. In addition, in the embodiment of the present disclosure, two feature maps are extracted: the distribution position information of each object to be counted is determined from the first feature map by using the object features, and the feature distribution map of the objects to be counted is then determined from the second feature map by using the distribution position information. That is, one of the two feature maps is used for determining the distribution positions and the other for determining the feature distribution, and the two feature maps are produced with two separate sets of parameters for these two purposes, so that the number of objects to be counted can be determined accurately.
In one possible implementation manner, the extracting, according to the position information of at least one object to be counted labeled in the input image, an object feature of the object to be counted from the first feature map includes: under the condition that at least two marked objects to be counted are available, respectively extracting the characteristics of the marked objects to be counted according to the position information of the marked objects to be counted; and fusing the extracted first characteristics of the objects to be counted to obtain the object characteristics of the objects to be counted.
In the input image, the user may label the position information of the object to be counted in advance, and in the case that the user labels one object to be counted, for example, the user only selects one object to be counted, the feature of the one object to be counted may be directly used as the object feature.
When the user marks at least two objects to be counted, the first features within each marking frame may be extracted, and their average may be taken as the object feature of the object to be counted. Specifically, the extracted first features in each labeling frame may be unified to the same size and then fused; the fusion may be performed by averaging the first features, i.e. summing them and dividing by the number of labels. For example, if the user selects the positions of 3 objects to be counted, the features at the three positions can be extracted from the first feature map respectively to obtain three features v1, v2 and v3; the three features are then unified in scale and averaged to obtain v̄ = (v1 + v2 + v3) / 3, which is the object feature representing the object to be counted.
In the implementation mode, under the condition that at least two marked objects to be counted are provided, the characteristics of the marked objects to be counted are respectively extracted according to the position information of the marked objects to be counted; and fusing the extracted first characteristics of the objects to be counted to obtain the object characteristics of the objects to be counted. Therefore, under the condition that the features of the objects to be counted marked in the image are different, the object features of the objects to be counted are obtained by fusing the first features of at least two objects to be counted, more accurate object features can be obtained, and the accuracy of counting the objects to be counted is improved.
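Assuming each marked box has already been pooled into a fixed-length vector (as in the sketch under step S12 above), the fusion described here reduces to a simple mean; a hypothetical helper:

import torch

def fuse_object_features(first_features):
    # first_features: list of tensors of identical shape (C,), one per marked box
    # (unifying them to the same size beforehand is assumed)
    return torch.stack(first_features, dim=0).mean(dim=0)    # averaged object feature, shape (C,)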
In a possible implementation manner, the determining, according to the object feature, distribution position information of at least two objects to be counted in the first feature map includes: calculating the similarity of the object features and the features at the positions in the second feature map; and obtaining the distribution position information of the at least two objects to be counted in the first feature map based on the similarity of the features at the positions.
The distribution position information of at least two objects to be counted in the first feature map obtained here may be distribution position information of all objects to be counted in the first feature map. By using the object features, the positions of all the objects to be counted can be searched in the first feature map to obtain the distribution position information of the objects to be counted. The higher the similarity, the more likely the feature at the position is to be characterized as the object to be counted, and the lower the similarity, the less likely the feature at the position is to be characterized as the object to be counted.
In this implementation, the distribution position information of the object to be counted may be calculated by cosine similarity, specifically, the expression form of the object feature of the object to be counted in the computer may be a vector, and each position in the first feature map may also be an image feature represented by the vector, then, the cosine similarity may be obtained by solving the cosine similarity between the vector characterizing the object feature and the vector characterizing the feature at each position in the first feature map, and a value of the cosine similarity at each position may be directly used as the distribution position information of the object to be counted.
In one example, cosine similarity values may be directly assigned to the positions to obtain a feature matrix representing distribution positions of the objects to be counted, that is, a segmentation map or a segmentation mask of the objects to be counted in the input image.
In this implementation, the similarity between the object feature and the feature at each position in the second feature map is calculated; and obtaining the distribution position information of all the objects to be counted in the first feature map based on the similarity of the features at the positions. Therefore, the segmentation graph or the segmentation mask of the object to be counted can be obtained by calculating the similarity, and compared with a method of extracting features from the feature graph to obtain the segmentation graph through convolution operation, the method can save network operation parameters, improve the acquisition efficiency of the distribution position information and further improve the counting efficiency.
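A sketch of the similarity computation described above, assuming the object feature is a single vector of length C and the per-position cosine similarity over the first feature map is used directly as the distribution position information:

import torch
import torch.nn.functional as F

def distribution_position_map(object_feature, first_feature_map):
    # object_feature: (C,); first_feature_map: (B, C, H, W)
    query = object_feature.view(1, -1, 1, 1)                        # broadcast over H and W
    sim = F.cosine_similarity(first_feature_map, query, dim=1)      # (B, H, W), values in [-1, 1]
    return sim.unsqueeze(1)                                          # segmentation map, (B, 1, H, W)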
In a possible implementation manner, the determining, from the second feature map, a feature distribution map of an object to be counted by using the distribution position information includes: and multiplying the distribution position information and the second characteristic diagram to obtain a characteristic distribution diagram of the object to be counted.
The extracted second feature map is used for determining feature distribution, so that the network parameters of the extracted second feature map tend to extract accurate feature characterization, and the features of the object to be counted can be accurately characterized in the second feature map.
In this implementation, since the value of each position in the distribution position information is a value of cosine similarity, that is, the probability that each position is an object to be counted, the value of the feature at the position with lower probability can be suppressed and the value of the feature at the position with higher probability can be enhanced by directly multiplying the distribution position information by the second feature map. The result obtained after multiplication can be used as a characteristic distribution map of the object to be counted.
In the embodiment of the present disclosure, the feature distribution map of the object to be counted is obtained by multiplying the distribution position information by the second feature map, so that the feature distribution map of the object to be counted can be obtained quickly, and the counting efficiency is further improved.
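The multiplication described here is an element-wise product broadcast over the channel dimension; a minimal sketch under the same shape assumptions as above:

def feature_distribution_map(position_map, second_feature_map):
    # position_map: (B, 1, H, W) similarity values; second_feature_map: (B, C, H, W)
    # low-similarity positions are suppressed, high-similarity positions are enhanced
    return position_map * second_feature_map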
In a possible implementation manner, the performing feature extraction on the input image to obtain a first feature map and a second feature map includes: performing initial feature extraction on an input image to obtain an initial feature map; performing first feature extraction on the initial feature map to obtain a first feature map; and performing second feature extraction on the initial feature map to obtain a second feature map.
The initial feature extraction is performed on the input image so as to extract some information common to the first feature map and the second feature map, and in one example, the first feature map and the second feature map can be extracted through a convolutional neural network designed on the basis of a VGG-16 convolutional neural network, and specifically, the initial feature map in the input image can be extracted by utilizing the first 10 convolutional layers and 3 pooling layers of the VGG-16. Of course, the initial feature map may be extracted by other convolutional neural networks, which is not limited by the present disclosure.
In the embodiment of the disclosure, initial feature extraction is performed on an input image to obtain an initial feature map; performing first feature extraction on the initial feature map to obtain a first feature map; and performing second feature extraction on the initial feature map to obtain a second feature map. Therefore, the first feature map and the second feature map are obtained by two different operations, and the distribution position determination and the feature distribution determination are respectively carried out on the two feature maps by using two sets of parameters according to the first feature map and the second feature map, so that the distribution position and the features of the object to be counted can be accurately determined, and the number of the object to be counted can be accurately determined. In addition, the first feature map and the second feature map are extracted from the initial feature map, namely, the initial feature map is obtained by performing initial feature extraction on the input image so as to extract some information common to the first feature map and the second feature map, so that parameters of network operation can be reduced, the first feature map and the second feature map do not need to be extracted from the input image respectively, and the efficiency of extracting the first feature map and the second feature map is improved.
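One way such a backbone with two parallel branches might be assembled, assuming torchvision's VGG-16 is available and a single 3x3 convolution per branch (the branch structure is not specified in the text):

import torch.nn as nn
from torchvision.models import vgg16

class TwoBranchExtractor(nn.Module):
    # Shared initial feature extraction followed by two separate branches
    # producing the first and the second feature maps.
    def __init__(self, channels=512):
        super().__init__()
        # first 10 convolutional layers and 3 pooling layers of VGG-16 (features[:23])
        self.initial = vgg16().features[:23]
        self.first_branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        self.second_branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(inplace=True))

    def forward(self, image):                     # (B, 3, H, W)
        initial = self.initial(image)             # (B, 512, H/8, W/8)
        return self.first_branch(initial), self.second_branch(initial)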
In a possible implementation manner, the input image is a sample image pre-labeled with the distribution positions of the at least two objects to be counted, the counting method is implemented based on a neural network, and a parameter updating process of the neural network includes: determining a first loss based on the distribution position information and the distribution positions of at least two objects to be counted in the sample image labeled in advance; and determining a second loss based on the density map of the object to be counted and a pre-labeled density map of the object to be counted, and updating parameters in the neural network based on the first loss and/or the second loss.
In this implementation, the neural network may be a convolutional neural network, and the training of the neural network may be implemented by updating parameters of the neural network. The distribution positions of all objects to be counted can be labeled in advance in the sample image, and the distribution positions are used for calculating the first loss of the distribution positions predicted by the neural network, so that the neural network is trained. The position information of at least one object to be counted marked in step S11 is used to select the object to be counted in a frame, and the user only needs to select a small number (e.g., 3) of objects to be counted in a frame, so that the neural network can predict the total number of all objects to be counted.
In the process of updating the parameters in the neural network based on the first loss and/or the second loss, the parameters may be updated based on the first loss alone, based on the second loss alone, or based on a total loss determined from the first loss and the second loss. The total loss may be obtained by summing the first loss and the second loss; alternatively, corresponding weights may be assigned to the first loss and the second loss, and the weighted losses may be summed to obtain the total loss.
In the updating process, a first loss between the distribution position information and the pre-labeled distribution positions of the objects to be counted in the input image can be calculated, and the parameters in the neural network are updated based on the first loss. The pre-labeled distribution positions may be the positions of all objects to be counted in the input image, labeled in advance by a user; they may be selected with a rectangular frame, a polygonal frame, or another framing tool. After framing is completed, a mask label representing the distribution positions of the objects to be counted in the input image can be generated and used as the pre-labeled distribution position information of the objects to be counted in the input image.
In the process of updating the parameters in the neural network based on the first loss, the parameters in the network may be updated so that the distribution position information output by the updated neural network after receiving the input image is as consistent as possible with the pre-labeled distribution position. In one example, the first loss may be determined by a binary cross entropy loss function (BCE loss), and then the parameters of the neural network are updated.
In the updating process, a second loss between the density map of the object to be counted and the pre-labeled density map of the object to be counted can be calculated, the parameters in the neural network are updated based on the second loss, and the pre-labeled density map of the object to be counted can be generated by pre-labeled distribution position information of the object to be counted in the input image.
In the process of updating the parameters in the neural network based on the second loss, the parameters in the network may be updated so that the density map of the object to be counted output by the updated neural network after receiving the input image is as consistent as possible with the pre-labeled density map. In one example, the second loss may be determined by a mean square error loss function (L2 loss), and then the parameters of the neural network are updated.
In the embodiment of the present disclosure, a first loss is determined based on the distribution position information and a distribution position of an object to be counted in a pre-labeled input image; and determining a second loss based on the density map of the object to be counted and a pre-labeled density map of the object to be counted, and updating parameters in the neural network based on the first loss and/or the second loss. Since the first feature map and the second feature map have different functions, one is used for determining the distribution position, and the other is used for determining the feature distribution, so that the loss is calculated respectively based on the distribution position and the density function, the parameters of the two initial convolution networks are different along with the network training, so that the respective functions of the two feature maps are better realized, and the number of the objects to be counted is more accurately determined.
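A sketch of how the two losses described above could be combined in one training step; the mapping of cosine similarities to probabilities and the loss weights are illustrative assumptions, not taken from the text:

import torch
import torch.nn.functional as F

def counting_losses(pred_position_map, gt_position_mask, pred_density, gt_density,
                    w_first=1.0, w_second=1.0):
    # first loss: binary cross entropy between the predicted distribution
    # positions and the pre-labeled mask (both with values in [0, 1])
    pred_prob = ((pred_position_map + 1.0) / 2.0).clamp(0.0, 1.0)   # cosine similarity -> [0, 1]
    first_loss = F.binary_cross_entropy(pred_prob, gt_position_mask)
    # second loss: mean squared error between predicted and pre-labeled density maps
    second_loss = F.mse_loss(pred_density, gt_density)
    return w_first * first_loss + w_second * second_loss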
An application scenario of the embodiment of the present disclosure is explained below. Referring to fig. 2, in this application scenario the counting method provided by the present disclosure is implemented by a convolutional neural network. A user marks 3 objects to be counted in an input image in advance, and feature extraction is performed on the input image by using the first 10 convolutional layers and 3 pooling layers of VGG-16 to obtain an initial feature F; a first feature F_c and a second feature F_d are then extracted from the initial feature F respectively. From the first feature F_c, the features v_1, v_2 and v_3 of the 3 marked objects to be counted are extracted, and the three features are averaged to obtain an average feature v̄. The cosine similarity between v̄ and the feature at each position of F_c is computed to obtain the distribution position information of the objects to be counted in the input image, namely the segmentation map Ŝ of the objects to be counted in the input image. Ŝ is multiplied by F_d to obtain class-agnostic density feature information; at this time, feature information containing only the objects of the category to be counted is obtained. The density feature information is decoded (up-sampled and enlarged to be consistent with the input image scale) to obtain the density map D̂ of the objects to be counted.
During the training process of the neural network, the first loss between Ŝ and the label information S is calculated through a binary cross entropy loss function (BCE loss), and the parameters of the neural network are updated accordingly; the second loss between D̂ and the label information D is determined through a mean square error loss function (L2 loss), and the parameters of the neural network are updated accordingly.
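A hedged sketch of this forward pass is given below; the backbone, the two feature-extraction branches and the decoder are placeholder modules, and the marked objects are assumed to be given as rectangular boxes in feature-map coordinates, which is an illustrative simplification rather than the exact procedure of fig. 2.

```python
import torch
import torch.nn.functional as F

def count_objects(image, boxes, backbone, conv_c, conv_d, decoder):
    """image: (1, 3, H, W); boxes: list of (x1, y1, x2, y2) for the
    marked objects, given in feature-map coordinates for simplicity."""
    feat = backbone(image)            # initial feature F
    f_c = conv_c(feat)                # first feature F_c (for distribution positions)
    f_d = conv_d(feat)                # second feature F_d (for the feature distribution)

    # average the features of the marked objects (here: mean-pool each box region)
    exemplars = [f_c[..., y1:y2, x1:x2].mean(dim=(-2, -1)) for x1, y1, x2, y2 in boxes]
    v_bar = torch.stack(exemplars).mean(dim=0)            # (1, C)

    # cosine similarity between v_bar and the feature at every position -> segmentation map
    v_map = v_bar[..., None, None].expand_as(f_c)
    sim = F.cosine_similarity(f_c, v_map, dim=1, eps=1e-6)
    seg = sim.clamp(min=0).unsqueeze(1)                    # (1, 1, h, w)

    density_feat = seg * f_d                               # keep only the counted category
    density = decoder(density_feat)                        # upsample back to the input scale
    return seg, density, density.sum().item()              # count = integral of the density map
```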
In a possible implementation manner, the counting method is implemented based on a neural network, and the method for constructing the training sample and/or the test sample of the neural network includes: obtaining at least one target sub-graph of a target object according to the labeling information of the first sample image; pasting the target subgraph to a second sample image to obtain a composite image and pasting position information of the target subgraph in the composite image; and using the pasting position information as labeling information in the synthetic image to generate a synthetic sample image.
The first sample image may be an image with annotation information, where the annotation information is used to indicate the position of a target object annotated in the first sample image; the target object may be any object annotated by a user, and the specific type of the target object is not limited in this disclosure. An example first sample image, shown in the upper left diagram of fig. 3, contains 3 glass spheres whose positions are marked by rectangular boxes.
The annotation information is used to indicate a position of the target object annotated in the first sample image, and the annotation information may specifically be coordinates of the target object, so that the annotated target area may be determined according to the coordinates. In one example, in a case where the position information is coordinates of a rectangular frame, the target region to be labeled in the first sample image may be determined according to the coordinates of the rectangular frame.
After the target area is determined, at least one target sub-image can be obtained based on the image in the target area, and specifically, the image in the target area corresponding to the annotation information can be extracted as the target sub-image; and/or performing at least one image transformation on the image in the target region corresponding to the labeling information respectively, and taking the image after the image transformation as a target sub-image.
Under the condition that at least two target regions exist, images in the target regions can be respectively extracted to obtain at least two target subgraphs, namely, an image in a single target region is used as one target subgraph, and exemplarily, assuming that positions of 3 target objects are labeled in advance, 3 target subgraphs can be obtained.
In addition, the image in the target region may be subjected to image transformation, and the image after the image transformation may be used as a target sub-image. The image transformation includes at least one of: image stretching, image shrinking, image rotation, image symmetry transformation, and noise addition in an image. In one example, the image transformation here may be stretching the image, for example, in the horizontal direction and/or the vertical direction; in another example, the image transformation here may be shrinking the image, for example, in the horizontal direction and/or the vertical direction; in another example, the image transformation here may be rotating the image, for example, rotating it by 90° to the left; in another example, the image transformation here may be a symmetric transformation of the image; in another example, the image transformation may be adding noise to the image, and the noise may be, for example, Gaussian noise.
In this implementation, the extracted image and the image obtained by image transformation may be taken together as the target sub-image.
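By way of illustration, the transformations enumerated above could be sketched as follows with Pillow and numpy; the concrete scale factors and noise level are assumptions, not values specified by the present disclosure.

```python
import random
import numpy as np
from PIL import Image, ImageOps

def random_transform(patch: Image.Image) -> Image.Image:
    """Apply one randomly chosen transformation to a target sub-image."""
    choice = random.choice(["stretch", "shrink", "rotate", "flip", "noise"])
    w, h = patch.size
    if choice == "stretch":        # stretch in the horizontal and/or vertical direction
        return patch.resize((int(w * 1.2), int(h * 1.1)))
    if choice == "shrink":         # shrink in the horizontal and/or vertical direction
        return patch.resize((max(1, int(w * 0.8)), max(1, int(h * 0.9))))
    if choice == "rotate":         # e.g. rotate 90 degrees to the left
        return patch.rotate(90, expand=True)
    if choice == "flip":           # symmetric (mirror) transformation
        return ImageOps.mirror(patch)
    arr = np.asarray(patch).astype(np.float32)   # add Gaussian noise
    arr += np.random.normal(0.0, 5.0, arr.shape)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
```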
After the target sub-image is obtained, the target sub-image may be pasted into the second sample image to obtain a composite image. The second sample image may be an arbitrary image and serves as the background; when the target sub-image is pasted into the second sample image, pasting may be performed randomly at an arbitrary position in the second sample image while recording the pasting position information.
And using the pasting position information as labeling information in the synthetic image to generate a synthetic sample image.
In the embodiment of the disclosure, at least one target area marked in a first sample image is determined according to marking information of the first sample image; obtaining at least one target subgraph based on the image in the target region; pasting the target subgraph into a second sample image to obtain a composite image and pasting position information of the target subgraph in the composite image; and using the pasting position information as labeling information in the synthetic image to generate a synthetic sample image. Therefore, a large number of synthetic sample images can be generated by using a copy-paste method, the speed of constructing the sample images is higher compared with manual labeling, and the accuracy of labeling information is higher by using the paste position information as the labeling information of the synthetic sample images.
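A minimal sketch of the paste step is shown below, assuming the target sub-images and the second sample image are Pillow images; the recorded boxes become the labeling information of the synthetic sample image. The `random_transform` sketch above could be used to produce the sub-images fed into this function.

```python
import random
from PIL import Image

def paste_subgraphs(background: Image.Image, subgraphs):
    """Paste each target sub-image at a random position in the second
    sample image and record the pasting positions as annotation."""
    composite = background.copy()
    annotations = []                                 # pasting position information
    bw, bh = composite.size
    for patch in subgraphs:
        pw, ph = patch.size
        x = random.randint(0, max(0, bw - pw))       # random paste location
        y = random.randint(0, max(0, bh - ph))
        composite.paste(patch, (x, y))
        annotations.append((x, y, x + pw, y + ph))   # label of the synthetic sample image
    return composite, annotations
```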
In a possible implementation manner, before obtaining at least one target sub-graph of the target object according to the annotation information of the first sample image, the method further includes: determining a first number of target subgraphs to be generated according to the size information of the target area and the size information of the second sample image; wherein the first number of target subgraphs is positively correlated with the size information of the second sample image and negatively correlated with the size information of the target area.
The size information of the target region may be the area of the target region, or information such as the length and width of the target region; in the case where the target region is rectangular, the area of the target region may be obtained by multiplying the length and width of the target region. In the case where a plurality of target regions are included, the size information of the target region may be an average of the sizes of the plurality of target regions, for example, an average of the areas of the plurality of target regions. The larger the target region is, the fewer target subgraphs can be placed in the second sample image, and therefore, the first number of the target subgraphs is negatively correlated with the size information of the target region.
The size information of the second sample image may be the area of the second sample image, or information such as the length and width of the second sample image; in the case where the second sample image is rectangular, the area of the second sample image may be obtained by multiplying the length and width of the second sample image. The larger the second sample image is, the more target subgraphs can be placed in it, and therefore, the first number of the target subgraphs is positively correlated with the size information of the second sample image.
In order to more clearly understand the determination process of the first number N, the determination process of the first number N is described below by using specific mathematical expressions, and it should be noted that the specific mathematical expressions provided in the present disclosure are one possible implementation manner of the embodiments of the present disclosure in specific implementation, and should not be construed as limiting the protection scope of the embodiments of the present disclosure.
N = α · (H_Ib · W_Ib) / ( (1 / |E_f|) · Σ_{e_i ∈ E_f} h_{e_i} · w_{e_i} )
wherein α is a weighting parameter, such as a random number between 0.25 and 0.75; H_Ib is the height of the second sample image and W_Ib is the width of the second sample image; e_i characterizes the i-th target subgraph, i being a positive integer, and E_f is the set composed of the e_i; h_{e_i} is the height of the target subgraph and w_{e_i} is the width of the target subgraph; |E_f| characterizes the number of the e_i in the first sample image I_f, so that the denominator is the average area of the annotated target regions.
In the embodiment of the present disclosure, a first number of target subgraphs to be generated is determined according to the size information of the target region and the size information of the second sample image, the first number of the target subgraphs being positively correlated with the size information of the second sample image and negatively correlated with the size information of the target region. In this way, the calculated number of target subgraphs conforms to the distribution of real objects, the obtained synthetic sample image is closer to a really captured image, and the accuracy of the neural network training is improved.
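Under the reconstruction given above, the first number could be computed as sketched below; the function name and the box format are illustrative assumptions.

```python
import random

def first_number(bg_size, boxes, alpha_range=(0.25, 0.75)):
    """bg_size: (width, height) of the second sample image;
    boxes: list of (x1, y1, x2, y2) annotated in the first sample image."""
    alpha = random.uniform(*alpha_range)    # weighting parameter alpha
    bg_area = bg_size[0] * bg_size[1]       # size of the second sample image
    mean_box_area = sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes) / len(boxes)
    return max(1, int(alpha * bg_area / mean_box_area))
```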
In a possible implementation manner, after determining the number of target sub-images to be generated, the performing at least one image transformation on the images in the target region corresponding to the annotation information respectively includes: and respectively carrying out at least one image transformation on the images in the target region according to the first quantity to obtain target subgraphs of which the total number meets the first quantity.
After the first number is determined, the first number of target subgraphs can be obtained through an image transformation mode, and specific transformation modes can refer to possible implementation modes provided by the present disclosure, which is not described herein again.
An application scenario of the embodiment of the present disclosure is explained below. Please refer to fig. 3, which is a schematic diagram of an application scenario provided by the present disclosure. In this scenario, a first sample image I_f is selected from a training set; according to the labeling information provided by the training set, the image corresponds to 3 rectangular frames e_1, e_2 and e_3 representing the objects to be counted. Another second sample image I_b is randomly selected from the training set as a background image. The images in the 3 object frames are copied from I_f and subjected to several random image transformations to obtain N target subgraphs in total, which are pasted onto the background image I_b. As shown in fig. 3, the label information corresponding to the objects in the synthesized sample image is obtained according to the pasting positions, so that a density map can be generated as a label (ground truth) and the neural network can be pre-trained.
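One common way to turn the pasted positions into a density-map label is to place a small normalized Gaussian at each object centre, so that the map sums to the object count; the sketch below illustrates this idea and is not necessarily the exact labeling procedure of the present disclosure.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_label(image_size, annotations, sigma=4.0):
    """image_size: (width, height) of the synthetic image;
    annotations: pasted boxes (x1, y1, x2, y2). Each object contributes
    a unit mass, so the density map sums to the object count."""
    w, h = image_size
    density = np.zeros((h, w), dtype=np.float32)
    for x1, y1, x2, y2 in annotations:
        cx = min(w - 1, max(0, int((x1 + x2) / 2)))   # object centre
        cy = min(h - 1, max(0, int((y1 + y2) / 2)))
        density[cy, cx] += 1.0
    return gaussian_filter(density, sigma=sigma)      # spread each point into a small blob
```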
After the pre-training is finished, the artificially labeled label information can be used for training the neural network, so that the neural network can be further optimized, and the learning capability of the neural network on the object characteristic information is improved.
During testing, the trained neural network can be tested by using label information artificially labeled in the test image set, and the test image can be synthesized by continuously using the copying-pasting mode of the embodiment of the disclosure to optimize the neural network.
In the embodiment of the present disclosure, when performing initial feature extraction on an input image, the initial feature extraction may be performed based on a convolutional neural network, fig. 4 shows a flowchart of a feature extraction method according to an embodiment of the present disclosure, and as shown in fig. 4, performing initial feature extraction on an input image to obtain an initial feature map includes:
in step S21, acquiring an input feature, where the input feature includes an input image or a feature obtained by performing at least one convolution operation on the input image;
the input feature may be an original input image, that is, a convolution operation may be directly performed by using a pixel value of the image as an input, or the input feature may also be a feature obtained by performing at least one convolution operation on the input image, where the convolution operation may be a convolution operation with a fixed convolution kernel size, or a convolution operation with a variable convolution kernel size provided in an embodiment of the present disclosure, which is not limited in this disclosure.
In step S22, size information of a target object in the input image is acquired;
The target object here may be the foregoing object to be counted. The size information of the target object is information on the size occupied by the target object in the input image, for example, the area occupied by the target object in the input image, or the length and width of the area occupied by the target object in the input image.
The size information of the target object may be determined based on labeling information of the target object labeled in the input image, for example, in the case where the position of the target object is labeled by a rectangular frame, the size information of the rectangular frame may be directly used as the size information of the target object, and for example, the area of the target object may be determined based on the length and width of the rectangular frame as the size information of the target object. In addition, the size information of the target object can be input by the user, and the specific acquisition mode of the size information of the target object is not limited by the disclosure.
In step S23, based on the size information and the input feature, determining a position offset of a sample point when a convolution kernel samples in the input feature;
When the convolution kernel samples in the input features, there is at least one sampling point: if the size of the convolution kernel is 1 × 1, the convolution kernel has 1 sampling point; if the size of the convolution kernel is 2 × 2, the convolution kernel has 4 sampling points; if the size of the convolution kernel is 3 × 3, the convolution kernel has 9 sampling points; the correspondence between larger convolution kernel sizes and their sampling points is not repeated here. Each sampling point in the convolution kernel samples the input feature at a predetermined position. Fig. 5 is a schematic diagram of the position distribution of the sampling points of the convolution kernel provided by the present disclosure: an arrow in fig. 5 indicates the direction and distance of the position offset of a sampling point, a hollow circle represents the position of a sampling point before the position offset, and a solid circle represents the position of a sampling point after the position offset. Fig. 5 shows sampling points after two kinds of offsets; it can be seen that the positions of the sampling points after the offset are more flexible, so that the receptive field can change during sampling. The range of the receptive field is obviously enlarged after the sampling points of the convolution kernel on the right side of fig. 5 are offset, which is suitable for determining a large-size target object.
In the embodiment of the present disclosure, the offset of the sampling point position is determined based on the size information and the input feature of the target object, for example, a convolution operation may be performed on the size information and the input feature to obtain the offset of the sampling point position, a parameter of the convolution operation may be obtained by training a sample, the convolution operation may be a part of a feature extraction process of the counting method provided in the present disclosure, and the parameter of the convolution operation may be trained through a training process of a network, where a specific training mode may refer to the relevant description in the foregoing, and details are not repeated here.
In step S24, based on the convolution kernel after the position offset adjustment, a convolution operation is performed on the input feature to obtain an output feature.
The obtained output feature may be used as the initial feature, or the initial feature may be obtained by performing the convolution operation again for at least one time.
Each sample point in the convolution kernel can get a position offset, for example, 9 sample points in fig. 5 can be shifted. For example, the position offset of each sampling point may include two directions of an x axis and a y axis, and based on the offset of the x axis and the offset of the y axis, the initial position of the sampling point is shifted, so as to obtain a convolution kernel after the position offset is adjusted.
And acquiring image characteristics of corresponding positions based on the sampling points after the position offset is adjusted, and then performing convolution operation to obtain output characteristics. Therefore, the obtained output characteristics are obtained by sampling based on the size of the target object, and the target object in the input image can be better characterized. The target object in the input image determined based on the output features will be more accurate.
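One possible way to realize such a convolution with offset-adjusted sampling points is the deformable convolution operator available in torchvision; the layer below is a sketch under that assumption, with the offsets predicted from the input feature alone (the size-information conditioning described later is omitted here).

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class OffsetAdjustedConv(nn.Module):
    """3x3 convolution whose 9 sampling points are shifted by offsets
    predicted from the input feature."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        # 2 offsets (x and y directions) per sampling point of the kernel
        self.offset_pred = nn.Conv2d(in_ch, 2 * k * k, kernel_size=3, padding=1)

    def forward(self, x):
        offset = self.offset_pred(x)       # position offsets of the sampling points
        return deform_conv2d(x, offset, self.weight, self.bias, padding=1)
```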
In a possible implementation manner, in a case that size information of at least two target objects exists, the acquiring size information of the target objects in the input image includes: and taking the average value of the acquired size information of the at least two target objects as the size information of the target objects in the input image.
In one example, the case where size information of at least two target objects exists may be the case where two or more target objects are marked in the input image in advance; in this case, an average value of the acquired size information of the at least two target objects may be used as the size information of the target objects in the input image.
For example, in a case where a plurality of target objects in the input image are labeled by a plurality of rectangular boxes, the area of the target object may be calculated for each rectangular box, and then the average of the areas may be taken as the size information of the target object.
In the embodiment of the disclosure, because a plurality of target objects may exist in the input image, especially when counting the target objects, the number of the target objects is often large, and the sizes may be different, according to an average value of size information of the plurality of target objects, all the target objects in the input image can be more accurately characterized, the accuracy rate when determining all the target objects in the input image is improved, and the accuracy of counting the target objects in the input image is improved.
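As a small illustration, assuming the target objects are labeled with rectangular boxes, the size information could be reduced to the average box area as follows.

```python
def average_size(boxes):
    """boxes: (x1, y1, x2, y2) for each labeled target object; the mean
    area is used as the size information of the target objects."""
    areas = [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]
    return sum(areas) / len(areas)
```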
In one possible implementation, the determining, based on the size information and the input feature, a position offset of a sampling point when the convolution kernel samples in the input feature includes: fusing the size information and the input features to obtain fused features; and determining the position offset of the sampling point when the convolution kernel samples in the input features based on the fusion features.
The sampling point position offset is determined based on the size information and the input features of the target object, specifically, the size information and the input features are fused, and the sampling point position offset is determined based on the fused features, where the fusion may be, for example, stitching the size information and the input features. The representation form of the size information and the input features in the computer may be a matrix, and here, the splicing of the size information and the input features may be the splicing of two matrices for representing the size information and the input features.
Exemplarily, when the position offset of the sampling point of the convolution kernel is determined based on the fusion feature, a convolution operation may be performed on the fusion feature to obtain the position offset of the sampling point, a parameter of the convolution operation may be obtained by training a sample, the convolution operation may be a part of the feature extraction process of the counting method provided by the present disclosure, and the parameter of the convolution operation may be trained through a training process of a network, where a specific training mode may refer to the related description in the foregoing, and is not described herein again.
In a possible implementation manner, the fusing the size information and the input feature to obtain a fused feature includes: carrying out nonlinear transformation operation on the size information to obtain size characteristics representing the size information; performing convolution operation on the input features to obtain convolution input features; and fusing the size characteristic and the convolution input characteristic to obtain a fused characteristic.
As described above, in the embodiment of the present disclosure, the size information may be information such as an area, a length, a width, and the like. Here, the size information may be converted, through a nonlinear transformation operation, into a size feature representing the size information. The size feature may be in vector form, so as to facilitate a subsequent convolution operation, or may be a feature obtained by performing a scale transformation on such a vector. The specific process of performing the nonlinear transformation operation may refer to the related art and is not described here again.
The scale of the vector representing the size information is often different from the scale of the input features, so the vector can be scaled here to be the same as the scale of the input features for subsequent fusion with the input features.
For the input features, a convolution operation can be executed once to obtain convolution input features, and then the convolution input features and the size features are spliced to obtain fusion features.
In the embodiment of the present disclosure, a nonlinear transformation operation is performed on the size information to obtain a vector representing the size information; carrying out scale transformation on the vector to obtain a size characteristic with the same scale as the input characteristic; performing convolution operation on the input features to obtain convolution input features; and splicing the size characteristic and the convolution input characteristic to obtain a fusion characteristic. Therefore, the obtained fusion features can more accurately represent the size information and the fusion features of the target object so as to accurately obtain the position offset of the sampling point.
In an application scenario of this implementation, the size information of the target object, shown at the upper right corner of the corresponding schematic diagram, is subjected to a nonlinear transformation operation to obtain a vector, and the scale of the vector is then expanded in the spatial dimension to obtain a size feature g consistent with the scale of the input feature. The input feature at the lower left corner is subjected to a convolution operation to obtain a convolution input feature c. The convolution input feature c and the size feature g are concatenated in the channel dimension to obtain a fusion feature r; a convolution operation is performed on the fusion feature r to obtain the position offsets of the sampling points, and a convolution operation is performed on the input feature based on the convolution kernel adjusted by the position offsets to obtain the output feature at the lower right corner.
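A hedged sketch of this fusion step is given below: the size scalar is lifted to a size feature by a small nonlinear mapping, broadcast over the spatial dimensions, concatenated with the convolved input feature, and mapped to per-sampling-point offsets. The layer widths are illustrative assumptions, and the resulting offsets could be fed to the offset-adjusted convolution sketched earlier.

```python
import torch
import torch.nn as nn

class SizeConditionedOffset(nn.Module):
    """Predict sampling-point offsets for a k x k kernel from the input
    feature and the size information of the target object."""
    def __init__(self, in_ch, k=3, size_dim=16):
        super().__init__()
        self.size_mlp = nn.Sequential(nn.Linear(1, size_dim), nn.ReLU())    # nonlinear transform of size
        self.feat_conv = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1)  # convolution input feature c
        self.offset_conv = nn.Conv2d(in_ch + size_dim, 2 * k * k, kernel_size=3, padding=1)

    def forward(self, x, size_scalar):
        b, _, h, w = x.shape
        g = self.size_mlp(size_scalar.view(b, 1))        # size feature
        g = g.view(b, -1, 1, 1).expand(-1, -1, h, w)     # expand to the scale of the input feature
        c = self.feat_conv(x)
        r = torch.cat([c, g], dim=1)                     # fusion feature r (channel concatenation)
        return self.offset_conv(r)                       # position offsets of the sampling points
```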
An application scenario of the embodiment of the present disclosure is explained below. Referring to fig. 7, which is a schematic diagram of an application scenario provided by the present disclosure, the initial feature extraction provided by the present disclosure is implemented by a convolutional neural network. In fig. 7, feature extraction is performed on the input image by using the first 10 convolutional layers and 3 pooling layers of VGG-16, where the last convolution operation of each stage is replaced by a convolutional layer with variable convolution kernel sampling points, so as to construct a skeleton network for extracting object feature information. The specific implementation process of the convolutional layer with variable convolution kernel sampling points may refer to the related description of steps S21 to S24 of the present disclosure. Then, the output feature of the last convolutional layer with a variable convolution kernel is used as the initial feature map, and the number of target objects in the input image is further determined based on the initial feature map.
It is understood that the above-mentioned method embodiments of the present disclosure can be combined with each other to form a combined embodiment without departing from the logic of the principle, which is limited by the space, and the detailed description of the present disclosure is omitted. Those skilled in the art will appreciate that in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their function and possibly their inherent logic.
The method has a specific technical relevance to the internal structure of a computer system and can solve technical problems concerning how to improve hardware operation efficiency or execution effect (including reducing data storage capacity, reducing data transmission capacity, improving hardware processing speed, and the like), thereby obtaining a technical effect of improving the internal performance of the computer system in accordance with the laws of nature.
In addition, the present disclosure also provides a counting device, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any one of the counting methods provided by the present disclosure; for the corresponding technical solutions and descriptions, reference may be made to the corresponding descriptions in the method sections, which are not repeated here.
Fig. 8 shows a block diagram of a counting device according to an embodiment of the present disclosure. As shown in fig. 8, the device 30 includes:
an input image feature extraction module 31, configured to perform feature extraction on an input image to obtain a first feature map and a second feature map;
an object feature extraction module 32, configured to extract, according to the position information of at least one object to be counted labeled in the input image, an object feature of the object to be counted from the first feature map;
a distribution position information determining module 33, configured to determine, according to the object features, distribution position information of at least two objects to be counted in the first feature map;
a feature distribution map determining module 34, configured to determine a feature distribution map of the object to be counted from the second feature map by using the distribution position information;
a total number determining module 35, configured to determine a total number of the objects to be counted based on the feature distribution map.
In one possible implementation manner, the object feature extraction module includes:
the object feature extraction submodule is used for extracting first features of the marked objects to be counted from the first feature map respectively according to the position information of the marked objects to be counted under the condition that at least two marked objects to be counted are available;
and the fusion module is used for fusing the extracted first characteristics of the objects to be counted to obtain the object characteristics of the objects to be counted.
In one possible implementation manner, the distributed location information determining module includes:
a similarity determination module, configured to calculate a similarity between the object feature and a feature at each position in the second feature map;
and the distribution position information determining submodule is used for obtaining the distribution position information of the at least two objects to be counted in the first feature map based on the similarity of the features at the positions.
In a possible implementation manner, the feature distribution map determining module is configured to multiply the distribution position information and the second feature map to obtain a feature distribution map of the object to be counted.
In one possible implementation, the total number determining module includes:
the density map acquisition module is used for performing up-sampling on the characteristic distribution map to obtain a density map of the object to be counted;
and the total number determining submodule is used for determining the total number of the objects to be counted on the basis of the density map.
In one possible implementation manner, the input image feature extraction module includes:
the initial feature extraction module is used for performing initial feature extraction on the input image to obtain an initial feature map;
the first feature map extraction module is used for performing first feature extraction on the initial feature map to obtain a first feature map;
and the second feature map extraction module is used for performing second feature extraction on the initial feature map to obtain a second feature map.
In a possible implementation manner, the input image is a sample image pre-labeled with the distribution positions of the at least two objects to be counted, the counting apparatus is implemented based on a neural network, and the apparatus further includes a parameter updating module configured to update the parameters of the neural network, where the parameter updating module includes:
a first loss determining module, configured to determine a first loss based on the distribution position information and distribution positions of at least two objects to be counted in the pre-labeled sample image;
the second loss determining module is used for determining second loss based on the density map of the object to be counted and a pre-labeled density map of the object to be counted;
and the parameter updating submodule is used for updating the parameters in the neural network based on the first loss and/or the second loss.
In a possible implementation manner, the counting method is implemented based on a neural network, and the apparatus further includes: the sample construction module is used for constructing a training sample and/or a testing sample of the neural network;
the sample construction module comprises:
the target sub-image determining module is used for obtaining at least one target sub-image of the target object according to the labeling information of the first sample image;
the pasting module is used for pasting the target sub-image into a second sample image to obtain a composite image and pasting position information of the target sub-image in the composite image;
and the synthesis module is used for generating a synthesized sample image by taking the pasting position information as the marking information in the synthesized image.
In one possible implementation, the target subgraph determining module includes:
the first target subgraph determining module is used for extracting the image in the target region corresponding to the labeling information as a target subgraph; and/or,
and the second target sub-image determining module is used for respectively carrying out at least one image transformation on the image in the target region corresponding to the labeling information and taking the image after the image transformation as a target sub-image.
In one possible implementation, the sample construction module further includes:
a number determination module, configured to determine a first number of target subgraphs to be generated according to the size information of the target region and the size information of the second sample image; wherein the first number of target subgraphs is positively correlated with the size information of the second sample image and negatively correlated with the size information of the target area.
In a possible implementation manner, the second target sub-image determining module is configured to perform at least one image transformation on the images in the target region according to the first number, so as to obtain a first number of target sub-images;
the image transformation includes at least one of:
image stretching, image shrinking, image rotation, image symmetry transformation, and noise addition in an image.
In one possible implementation manner, the initial feature extraction module includes:
the input feature acquisition module is used for acquiring input features, wherein the input features comprise input images or features obtained by performing convolution operation on the input images for at least one time;
the size information acquisition module is used for acquiring the size information of the target object in the input image;
the position offset determining module is used for determining the position offset of a sampling point when the convolution kernel samples in the input feature based on the size information and the input feature;
and the convolution operation module is used for performing convolution operation on the input features based on the convolution kernels after the position offset is adjusted to obtain output features, and the output features are used for determining target objects in the input images.
In a possible implementation manner, the size information obtaining module is configured to use an average value of the obtained size information of the at least two target objects as the size information of the target object in the input image.
In one possible implementation, the position offset determining module includes:
the fusion characteristic determining module is used for fusing the size information and the input characteristic to obtain a fusion characteristic;
and the position offset determining submodule is used for determining the position offset of a sampling point when the convolution kernel samples in the input features based on the fusion features.
In one possible implementation manner, the fusion feature determining module includes:
the nonlinear transformation operation module is used for carrying out nonlinear transformation operation on the size information to obtain size characteristics representing the size information;
the convolution submodule is used for carrying out convolution operation on the input features to obtain convolution input features;
and the fusion submodule is used for fusing the size characteristic and the convolution input characteristic to obtain a fused characteristic.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a volatile or non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The disclosed embodiments also provide a computer program product comprising computer readable code or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, the processor in the electronic device performs the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 9 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or other terminal device.
Referring to fig. 9, electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as a wireless network (Wi-Fi), a second generation mobile communication technology (2G), a third generation mobile communication technology (3G), a fourth generation mobile communication technology (4G), a long term evolution of universal mobile communication technology (LTE), a fifth generation mobile communication technology (5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
The disclosure relates to the field of augmented reality, and in particular relates to a method for detecting or identifying relevant features, states and attributes of a target object by acquiring image information of the target object in a real environment and by means of various visual correlation algorithms, so as to obtain an AR effect combining virtual and reality matched with specific applications. For example, the target object may relate to a face, a limb, a gesture, an action, etc. associated with a human body, or a marker, a marker associated with an object, or a sand table, a display area, a display item, etc. associated with a venue or a place. The vision-related algorithms may involve visual localization, SLAM, three-dimensional reconstruction, image registration, background segmentation, key point extraction and tracking of objects, pose or depth detection of objects, and the like. The specific application can not only relate to interactive scenes such as navigation, explanation, reconstruction, virtual effect superposition display and the like related to real scenes or articles, but also relate to special effect treatment related to people, such as interactive scenes such as makeup beautification, limb beautification, special effect display, virtual model display and the like. The detection or identification processing of the relevant characteristics, states and attributes of the target object can be realized through the convolutional neural network. The convolutional neural network is a network model obtained by performing model training based on a deep learning framework.
Fig. 10 shows a block diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server or terminal device. Referring to fig. 10, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical-user-interface-based operating system of Apple Inc. (Mac OS X™), the multi-user multi-process computer operating system (Unix™), the free and open source Unix-like operating system (Linux™), the open source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry that can execute the computer-readable program instructions, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), implements aspects of the present disclosure by utilizing the state information of the computer-readable program instructions to personalize the electronic circuitry.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
If the technical scheme of the application relates to personal information, a product applying the technical scheme of the application clearly informs personal information processing rules before processing the personal information, and obtains personal independent consent. If the technical scheme of the application relates to sensitive personal information, a product applying the technical scheme of the application obtains individual consent before processing the sensitive personal information, and simultaneously meets the requirement of 'express consent'. For example, at a personal information collection device such as a camera, a clear and significant identifier is set to inform that the personal information collection range is entered, the personal information is collected, and if the person voluntarily enters the collection range, the person is regarded as agreeing to collect the personal information; or on the device for processing the personal information, under the condition of informing the personal information processing rule by using obvious identification/information, obtaining personal authorization by modes of popping window information or asking a person to upload personal information of the person by himself, and the like; the personal information processing rule may include information such as a personal information processor, a personal information processing purpose, a processing method, and a type of personal information to be processed.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (14)

1. A counting method, comprising:
performing feature extraction on an input image to obtain a first feature map and a second feature map;
extracting object features of the objects to be counted from the first feature map according to position information of at least one object to be counted labeled in the input image;
determining, according to the object features, distribution position information of at least two objects to be counted in the first feature map;
determining a feature distribution map of the objects to be counted from the second feature map by using the distribution position information;
and determining a total number of the objects to be counted based on the feature distribution map.
2. The method according to claim 1, wherein the extracting the object features of the objects to be counted from the first feature map according to the position information of the at least one object to be counted labeled in the input image comprises:
in a case where there are at least two labeled objects to be counted, extracting a first feature of each labeled object to be counted from the first feature map according to the position information of the labeled objects to be counted;
and fusing the extracted first features of the objects to be counted to obtain the object features of the objects to be counted.
3. The method according to claim 1, wherein the determining distribution position information of at least two objects to be counted in the first feature map according to the object features comprises:
calculating a similarity between the object features and a feature at each position in the second feature map;
and obtaining the distribution position information of the at least two objects to be counted in the first feature map based on the similarities at the respective positions.
4. The method according to any one of claims 1 to 3, wherein the determining the feature distribution map of the object to be counted from the second feature map by using the distribution position information comprises:
and multiplying the distribution position information by the second feature map to obtain the feature distribution map of the objects to be counted.
5. The method of claim 1, wherein determining the total number of objects to be counted based on the feature distribution map comprises:
up-sampling the feature distribution map to obtain a density map of the objects to be counted;
and determining the total number of the objects to be counted based on the density map.
6. The method of claim 1, wherein the performing feature extraction on the input image to obtain the first feature map and the second feature map comprises:
performing initial feature extraction on the input image to obtain an initial feature map;
performing first feature extraction on the initial feature map to obtain the first feature map;
and performing second feature extraction on the initial feature map to obtain the second feature map.
7. The method according to any one of claims 1 to 6, wherein the input image is a sample image pre-labeled with the distribution positions of the at least two objects to be counted, the counting method is implemented based on a neural network, and a parameter updating process of the neural network comprises:
determining a first loss based on the distribution position information and the pre-labeled distribution positions of the at least two objects to be counted in the sample image;
determining a second loss based on the density map of the objects to be counted and a pre-labeled density map of the objects to be counted;
and updating a parameter of the neural network based on the first loss and/or the second loss.
8. The method according to any one of claims 1 to 7, wherein the counting method is implemented based on a neural network, and a method for constructing training samples and/or test samples of the neural network comprises:
obtaining at least one target sub-image of a target object according to labeling information of a first sample image;
pasting the target sub-image into a second sample image to obtain a composite image and paste position information of the target sub-image in the composite image;
and taking the paste position information as labeling information of the composite image to generate a composite sample image.
9. The method of claim 8, wherein the obtaining at least one target sub-image of the target object according to the labeling information of the first sample image comprises:
extracting an image in a target area corresponding to the labeling information as a target sub-image; and/or,
performing at least one image transformation on the image in the target area corresponding to the labeling information, and taking the transformed image as a target sub-image.
10. The method according to claim 8 or 9, wherein before the obtaining at least one target sub-image of the target object according to the labeling information of the first sample image, the method further comprises:
determining a first number of target sub-images to be generated according to size information of the target area and size information of the second sample image; wherein the first number of target sub-images is positively correlated with the size information of the second sample image and negatively correlated with the size information of the target area.
11. The method of claim 10, wherein after the first number of target sub-images to be generated is determined, the performing at least one image transformation on the image in the target area corresponding to the labeling information comprises:
performing at least one image transformation on the image in the target area according to the first number to obtain the first number of target sub-images;
wherein the image transformation includes at least one of:
image stretching, image shrinking, image selection, image symmetry transformation, and adding noise to an image.
12. A counting device, comprising:
the input image feature extraction module is used for extracting features of an input image to obtain a first feature map and a second feature map;
the object feature extraction module is used for extracting object features of the objects to be counted from the first feature map according to the position information of at least one object to be counted labeled in the input image;
the distribution position information determining module is used for determining the distribution position information of at least two objects to be counted in the first feature map according to the object features;
the feature distribution map determining module is used for determining a feature distribution map of the objects to be counted from the second feature map by using the distribution position information;
and the total number determining module is used for determining the total number of the objects to be counted based on the feature distribution map.
13. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any of claims 1 to 11.
14. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 11.
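The sketches that follow restate the main steps of claims 1 to 11 in code so the claimed flow is easier to follow. They are illustrative readings of the claim language, not the applicant's implementation: every module, function and variable name (CountingBackbone, fuse_exemplar_features, exemplar_boxes, and so on) and every layer size is an assumption introduced here. This first sketch covers claims 1, 2 and 6: a shared stem produces an initial feature map, two branches produce the first and second feature maps, and the object feature is obtained by pooling the first feature map at each labeled position and fusing the per-exemplar features (here, by simple averaging).

import torch
import torch.nn as nn

class CountingBackbone(nn.Module):
    """Sketch of claims 1, 2 and 6: a shared stem plus two feature-extraction branches."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.stem = nn.Sequential(                      # initial feature extraction (claim 6)
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.branch1 = nn.Conv2d(channels, channels, 3, padding=1)   # -> first feature map
        self.branch2 = nn.Conv2d(channels, channels, 3, padding=1)   # -> second feature map

    def forward(self, image: torch.Tensor):
        feat = self.stem(image)
        return self.branch1(feat), self.branch2(feat)


def fuse_exemplar_features(fmap1: torch.Tensor, exemplar_boxes) -> torch.Tensor:
    """Claim 2: pool one feature per labeled box on the first feature map, then average them."""
    pooled = []
    for (x1, y1, x2, y2) in exemplar_boxes:             # box coordinates in feature-map units
        pooled.append(fmap1[:, :, y1:y2, x1:x2].mean(dim=(2, 3)))   # (B, C) per exemplar
    return torch.stack(pooled, dim=0).mean(dim=0)       # fused object feature, shape (B, C)


if __name__ == "__main__":
    image = torch.randn(1, 3, 128, 128)                 # dummy input image
    boxes = [(10, 10, 18, 18), (30, 40, 38, 48)]        # two labeled objects to be counted
    fmap1, fmap2 = CountingBackbone()(image)
    obj_feat = fuse_exemplar_features(fmap1, boxes)
    print(obj_feat.shape)                               # torch.Size([1, 64])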
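Claims 3 and 4 localize the objects to be counted by comparing the fused object feature with the feature at every spatial position and then weighting the second feature map with the result. A minimal sketch, assuming cosine similarity as the similarity measure and element-wise multiplication as the weighting (the claims do not fix either choice):

import torch
import torch.nn.functional as F

def distribution_from_similarity(obj_feat: torch.Tensor, fmap: torch.Tensor) -> torch.Tensor:
    """Claim 3 sketch: similarity between the object feature and the feature at each position."""
    # obj_feat: (B, C); fmap: (B, C, H, W). Cosine similarity per spatial location.
    sim = F.cosine_similarity(fmap, obj_feat[:, :, None, None], dim=1)   # (B, H, W)
    return sim.unsqueeze(1)                                              # (B, 1, H, W)

def feature_distribution_map(distribution: torch.Tensor, fmap2: torch.Tensor) -> torch.Tensor:
    """Claim 4 sketch: multiply the distribution position information by the second feature map."""
    return fmap2 * distribution                                          # broadcast over channels

if __name__ == "__main__":
    fmap2 = torch.randn(1, 64, 64, 64)
    obj_feat = torch.randn(1, 64)
    dist = distribution_from_similarity(obj_feat, fmap2)
    fdm = feature_distribution_map(dist, fmap2)
    print(dist.shape, fdm.shape)   # (1, 1, 64, 64) (1, 64, 64, 64)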
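Claim 5 up-samples the feature distribution map into a density map and derives the total count from it; in density-map counting the total is conventionally the sum over the map, which is the assumption made here. The 1x1 convolution used as the density head is a placeholder, not the patented architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DensityHead(nn.Module):
    """Claim 5 sketch: up-sample the feature distribution map into a density map and sum it."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.reduce = nn.Conv2d(channels, 1, kernel_size=1)   # collapse channels to one density value

    def forward(self, feat_distribution: torch.Tensor, out_size):
        density = F.relu(self.reduce(feat_distribution))                 # non-negative density
        density = F.interpolate(density, size=out_size, mode="bilinear",
                                align_corners=False)                     # up-sampling (claim 5)
        total = density.sum(dim=(1, 2, 3))                               # count = integral of the density map
        return total, density

if __name__ == "__main__":
    head = DensityHead()
    fdm = torch.randn(1, 64, 64, 64)
    total, density = head(fdm, out_size=(128, 128))
    print(total.shape, density.shape)   # torch.Size([1]) torch.Size([1, 1, 128, 128])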
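Claim 7 trains the network with two losses: one between the predicted distribution position information and the pre-labeled distribution positions, and one between the predicted density map and a pre-labeled density map, with parameters updated from either loss or both. A sketch of a single training step, assuming binary cross-entropy for the position term and mean squared error for the density term (the claim does not name specific loss functions):

import torch
import torch.nn.functional as F

def training_step(model, optimizer, image, exemplar_boxes,
                  gt_distribution, gt_density, w1: float = 1.0, w2: float = 1.0):
    """Claim 7 sketch: combine a position loss and a density loss to update the parameters.

    `model` is assumed to return (predicted distribution map in [0, 1], predicted density map);
    the loss choices (BCE / MSE) and the weighted sum are illustrative assumptions.
    """
    pred_distribution, pred_density = model(image, exemplar_boxes)

    # First loss: predicted distribution positions vs. pre-labeled distribution positions.
    loss_position = F.binary_cross_entropy(pred_distribution.clamp(1e-6, 1 - 1e-6),
                                           gt_distribution)

    # Second loss: predicted density map vs. pre-labeled density map.
    loss_density = F.mse_loss(pred_density, gt_density)

    loss = w1 * loss_position + w2 * loss_density   # claim 7 allows either loss alone or both
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()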
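Claims 8 to 11 describe how training and test samples can be synthesized: target sub-images are cut out of a first sample image according to its labeling information, optionally transformed, pasted into a second sample image, and the paste positions become the labels of the composite image, with the number of sub-images positively correlated with the size of the second image and negatively correlated with the size of the target area. The sketch below uses NumPy, implements only horizontal flips and additive noise as stand-in transformations, and uses an assumed count formula; all names are illustrative.

import numpy as np

def make_composite_sample(first_image: np.ndarray, box: tuple, second_image: np.ndarray,
                          rng: np.random.Generator):
    """Claims 8-11 sketch: cut a target sub-image, transform it, paste it, record the positions."""
    x1, y1, x2, y2 = box                                   # labeled target area in the first image
    patch = first_image[y1:y2, x1:x2].copy()

    # Claim 10 (illustrative formula): more sub-images for a large canvas, fewer for a large target area.
    h2, w2 = second_image.shape[:2]
    num = max(1, (h2 * w2) // (4 * patch.shape[0] * patch.shape[1]))

    composite = second_image.copy()
    paste_boxes = []                                       # paste position info becomes the labels (claim 8)
    for _ in range(num):
        sub = patch.copy()
        if rng.random() < 0.5:                             # symmetry transformation (claim 11)
            sub = sub[:, ::-1]
        if rng.random() < 0.5:                             # additive noise (claim 11)
            sub = np.clip(sub + rng.normal(0, 5, sub.shape), 0, 255).astype(second_image.dtype)
        ph, pw = sub.shape[:2]
        py = int(rng.integers(0, h2 - ph))
        px = int(rng.integers(0, w2 - pw))
        composite[py:py + ph, px:px + pw] = sub            # paste into the second sample image
        paste_boxes.append((px, py, px + pw, py + ph))
    return composite, paste_boxes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    first = rng.integers(0, 255, (100, 100, 3), dtype=np.uint8)
    second = rng.integers(0, 255, (200, 300, 3), dtype=np.uint8)
    img, labels = make_composite_sample(first, (20, 20, 40, 44), second, rng)
    print(img.shape, len(labels))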
CN202210115053.2A 2022-01-29 2022-01-29 Counting method and device, electronic equipment and storage medium Pending CN114445778A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210115053.2A CN114445778A (en) 2022-01-29 2022-01-29 Counting method and device, electronic equipment and storage medium
PCT/CN2022/127556 WO2023142554A1 (en) 2022-01-29 2022-10-26 Counting method and apparatus, electronic device, storage medium and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210115053.2A CN114445778A (en) 2022-01-29 2022-01-29 Counting method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114445778A (en) 2022-05-06

Family

ID=81371906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210115053.2A Pending CN114445778A (en) 2022-01-29 2022-01-29 Counting method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114445778A (en)
WO (1) WO2023142554A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006105655A1 (en) * 2005-04-06 2006-10-12 March Networks Corporation Method and system for counting moving objects in a digital video stream
CN111222378A (en) * 2018-11-27 2020-06-02 株式会社日立制作所 Object counting method and device
CN112150467A (en) * 2020-11-26 2020-12-29 支付宝(杭州)信息技术有限公司 Method, system and device for determining quantity of goods
CN114445778A (en) * 2022-01-29 2022-05-06 上海商汤智能科技有限公司 Counting method and device, electronic equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062274A (en) * 2019-12-02 2020-04-24 汇纳科技股份有限公司 Context-aware embedded crowd counting method, system, medium, and electronic device
CN111985381A (en) * 2020-08-13 2020-11-24 杭州电子科技大学 Guide area dense crowd counting method based on flexible convolutional neural network
CN112102300A (en) * 2020-09-18 2020-12-18 青岛商汤科技有限公司 Counting method and device, electronic equipment and storage medium
CN112862023A (en) * 2021-04-26 2021-05-28 腾讯科技(深圳)有限公司 Object density determination method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI, Ying: "Research on Feature Representation and Ranking in Image Retrieval" (图像检索中的特征表达与排序研究), China Doctoral Dissertations Full-text Database, Information Science and Technology Series, no. 1, 15 January 2020 (2020-01-15), pages 138 - 100 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023142554A1 (en) * 2022-01-29 2023-08-03 上海商汤智能科技有限公司 Counting method and apparatus, electronic device, storage medium and computer program product

Also Published As

Publication number Publication date
WO2023142554A1 (en) 2023-08-03

Similar Documents

Publication Publication Date Title
CN110688951B (en) Image processing method and device, electronic equipment and storage medium
CN110674719B (en) Target object matching method and device, electronic equipment and storage medium
CN111881956B (en) Network training method and device, target detection method and device and electronic equipment
CN111983635B (en) Pose determination method and device, electronic equipment and storage medium
US20210248718A1 (en) Image processing method and apparatus, electronic device and storage medium
US20220392202A1 (en) Imaging processing method and apparatus, electronic device, and storage medium
KR20210102180A (en) Image processing method and apparatus, electronic device and storage medium
CN110675409A (en) Image processing method and device, electronic equipment and storage medium
CN111191715A (en) Image processing method and device, electronic equipment and storage medium
CN113326768B (en) Training method, image feature extraction method, image recognition method and device
CN113139484B (en) Crowd positioning method and device, electronic equipment and storage medium
CN113052874B (en) Target tracking method and device, electronic equipment and storage medium
CN115471662B (en) Training method, recognition method, device and storage medium for semantic segmentation model
CN112991381B (en) Image processing method and device, electronic equipment and storage medium
CN113313115A (en) License plate attribute identification method and device, electronic equipment and storage medium
WO2022247091A1 (en) Crowd positioning method and apparatus, electronic device, and storage medium
CN113781518B (en) Neural network structure searching method and device, electronic equipment and storage medium
CN114066856A (en) Model training method and device, electronic equipment and storage medium
WO2023142554A1 (en) Counting method and apparatus, electronic device, storage medium and computer program product
CN113496237B (en) Domain adaptive neural network training and traffic environment image processing method and device
CN111178115B (en) Training method and system for object recognition network
WO2023155350A1 (en) Crowd positioning method and apparatus, electronic device, and storage medium
CN114842404A (en) Method and device for generating time sequence action nomination, electronic equipment and storage medium
CN114612790A (en) Image processing method and device, electronic equipment and storage medium
CN114511712A (en) Target object determination method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination