CN113902703A - Training method of object statistical model, object statistical method and device - Google Patents

Training method of object statistical model, object statistical method and device

Info

Publication number
CN113902703A
Authority
CN
China
Prior art keywords
image
training
images
present disclosure
pixel value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111173473.8A
Other languages
Chinese (zh)
Inventor
王洪昌
鉴海防
鲁华祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Semiconductors of CAS
Original Assignee
Institute of Semiconductors of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Semiconductors of CAS filed Critical Institute of Semiconductors of CAS
Priority to CN202111173473.8A priority Critical patent/CN113902703A/en
Publication of CN113902703A publication Critical patent/CN113902703A/en
Pending legal-status Critical Current

Classifications

    • G06T 7/0002 (Image analysis; inspection of images, e.g. flaw detection)
    • G06F 18/214 (Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting)
    • G06F 18/24 (Pattern recognition; classification techniques)
    • G06N 3/045 (Neural networks; combinations of networks)
    • G06N 3/08 (Neural networks; learning methods)
    • G06T 3/4038 (Scaling of whole images or parts thereof; image mosaicing, e.g. composing plane images from plane sub-images)
    • G06T 3/4053 (Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution)
    • G06T 2200/32 (Indexing scheme for image data processing or generation; involving image mosaicing)
    • G06T 2207/10016 (Image acquisition modality; video; image sequence)
    • G06T 2207/20081 (Special algorithmic details; training; learning)
    • G06T 2207/20084 (Special algorithmic details; artificial neural networks [ANN])
    • G06T 2207/30242 (Subject of image; counting objects in image)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure provide a training method for an object statistical model, an object statistical method, and an object statistical device. The training method includes the following steps: acquiring a training sample data set, wherein the training samples in the training sample data set include training images and label data of the training images, the training images contain a plurality of objects, and the objects include animals and/or plants; inputting a training image into a deep neural network model and outputting a predicted density map; calculating a loss function according to the predicted density map and the label data to obtain a loss result; and iteratively adjusting network parameters of the deep neural network model according to the loss result to generate a trained object statistical model.

Description

Training method of object statistical model, object statistical method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for training an object statistics model, an electronic device, a computer-readable storage medium, and a computer program product.
Background
In the ecological field, different objects have different sensitivities to the living environment, so the ecological environment can be assessed from the number of certain objects. For example, birds are commonly used as indicator species because of their strong sensitivity to the living environment; the number of birds in key areas of a nature reserve can therefore reflect the environmental condition of the whole reserve, so that corresponding protection policies can be formulated in time.
In implementing the disclosed concept, the inventors found that the related art has at least the following problem: counting the number of objects manually is tedious, and the labor cost is high.
Disclosure of Invention
In view of the above, the disclosed embodiments provide a training method for an object statistical model, an object statistical method, a training apparatus for an object statistical model, an object statistical apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
One aspect of the embodiments of the present disclosure provides a method for training an object statistical model, including:
acquiring a training sample data set, wherein training samples in the training sample data set comprise training images and label data of the training images, the training images comprise a plurality of objects, and the objects comprise animals and/or plants;
inputting the training image into a deep neural network model, and outputting a predicted density map;
calculating a loss function according to the predicted density map and the label data to obtain a loss result; and
iteratively adjusting network parameters of the deep neural network model according to the loss result to generate the trained object statistical model.
According to an embodiment of the present disclosure, the deep neural network model includes a feature extraction layer, a convolution layer, a plurality of dilated convolution layers, an upsampling layer, and a density generation network;
wherein, the inputting the training image into the deep neural network model and outputting a prediction density map includes:
inputting the training sample data set into the feature extraction layer, and outputting a first feature map;
inputting the first feature map into the plurality of dilated convolution layers, and outputting a second feature map;
inputting the second feature map into the convolution layer, and outputting a third feature map;
inputting the third feature map into the upsampling layer, and outputting a fourth feature map; and
inputting the fourth feature map into the density generation network, and outputting the predicted density map.
Another aspect of the embodiments of the present disclosure provides an object statistics method, including:
acquiring a target image, wherein the target image comprises a plurality of objects, and the objects comprise animals and/or plants; and
inputting the target image into an object statistical model to obtain a recognition result, wherein the recognition result comprises a statistical density map;
wherein the object statistical model is trained by the training method described above.
According to an embodiment of the present disclosure, there are a plurality of target images;
the method further comprises the following steps:
combining a plurality of density maps into an overall density map by using the object statistical model; and
generating statistical data according to the overall density map.
According to an embodiment of the present disclosure, the object statistical method further includes:
acquiring a plurality of captured images, wherein the captured images comprise a plurality of objects;
preprocessing the plurality of captured images to obtain an image to be recognized, wherein the image to be recognized is a super-resolution image; and
cropping the image to be recognized to obtain a plurality of target images.
According to an embodiment of the present disclosure, the number of target images is N, which is calculated by the following formula:

[Formula for N: the formula image is not reproduced in the text.]

where x denotes the actual length (width) of the image to be recognized in pixels, y denotes the actual height of the image to be recognized in pixels, x0 denotes the preset length pixel value, y0 denotes the preset height pixel value, a denotes the integer obtained by dividing x by x0 and rounding down, b denotes the integer obtained by dividing x by x0 and rounding up, m denotes the remainder of dividing x by x0, and λ is a preset constant that denotes the proportion used to adjust the actual height pixel value.
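The formula itself appears in the original publication only as an image. Based on the variable definitions above and on the 4.5-to-5 rounding example given later in this description, one plausible reconstruction (an assumption, not the original drawing) is a piecewise rounding rule in which λ sets the threshold for rounding up; since λ is described as adjusting the actual height, it may equally enter through a preliminary rescaling of y to y0:

% Hedged reconstruction of the formula for N; the threshold form (lambda * x_0) is an assumption.
N =
\begin{cases}
a, & m \le \lambda x_0 \\
b, & m > \lambda x_0
\end{cases}
\qquad
a = \left\lfloor \tfrac{x}{x_0} \right\rfloor,\quad
b = \left\lceil \tfrac{x}{x_0} \right\rceil,\quad
m = x \bmod x_0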
According to an embodiment of the present disclosure, cropping the image to be recognized to obtain a plurality of target images includes:
adjusting the pixel size of the image to be recognized to obtain an adjusted image to be recognized when the pixel size of the image to be recognized does not meet a preset condition; and
cropping the adjusted image to be recognized to obtain the plurality of target images.
According to an embodiment of the present disclosure, the target image includes an image number;
the object statistical method further comprises:
and splicing a plurality of target images according to the image number of each target image to generate an output image.
According to an embodiment of the present disclosure, preprocessing the plurality of captured images to obtain an image to be recognized includes:
screening the plurality of captured images by using a statistical method to obtain screened captured images; and
stitching the plurality of screened captured images to obtain the image to be recognized.
According to an embodiment of the present disclosure, the animal includes birds.
Another aspect of the embodiments of the present disclosure provides a training apparatus for a statistical model of an object, including:
a first obtaining module, configured to obtain the training sample data set, where a training sample in the training sample data set includes a training image and tag data of the training image, where the training image includes a plurality of objects, and the objects include animals and/or plants;
the first output module is used for inputting the training image into a deep neural network model and outputting a prediction density map;
the second output module is used for calculating a loss function according to the predicted density map and the label data to obtain a loss result; and
and the generating module is used for iteratively adjusting the network parameters of the deep neural network model according to the loss result to generate the trained object statistical model.
Another aspect of the embodiments of the present disclosure provides an object statistics apparatus, including:
a second obtaining module, configured to obtain a target image, where the target image includes a plurality of objects, and the objects include animals and/or plants; and
the obtaining module is used for inputting the target image into an object statistical model to obtain a recognition result, wherein the recognition result comprises a statistical density map;
wherein the object statistical model is trained by the training method described above.
Another aspect of an embodiment of the present disclosure provides an electronic device including: one or more processors; memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described above.
Another aspect of embodiments of the present disclosure provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of an embodiment of the present disclosure provides a computer program product comprising computer executable instructions for implementing the method as described above when executed.
According to the embodiments of the present disclosure, an object statistical model is obtained by training a deep neural network model, so that the density of animals and/or plants can be predicted with the object statistical model. This at least partially solves the technical problem that manually counting objects is cumbersome and labor-intensive, and thereby achieves the technical effects of making object counting more convenient and reducing its cost.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments of the present disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an exemplary system architecture of a training method or object statistics method applying an object statistics model according to an embodiment of the present disclosure;
FIG. 2 schematically shows a flow chart of a method of training a statistical model of a subject according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a method of outputting a predicted density map by a deep neural network model, in accordance with an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of an object statistics method according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a method diagram of an object statistics method according to an embodiment of the present disclosure;
FIG. 6 schematically shows a block diagram of a training apparatus for a statistical model of a subject according to an embodiment of the present disclosure;
FIG. 7 schematically shows a block diagram of an object statistics apparatus according to an embodiment of the present disclosure; and
fig. 8 schematically shows a block diagram of an electronic device implementing a training method of an object statistical model or an object statistical method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
In the related art, a camera is used to acquire images of key areas of a protected area, and the objects in the images are then counted manually, for example, the number of birds in the images is counted by hand. However, the manual counting process is cumbersome and consumes considerable manpower and financial resources.
In view of this, the inventors found that a deep neural network model can be trained with a training sample data set containing objects such as birds to obtain a trained object statistical model, and that the number of objects such as birds can then be counted with this model, avoiding the manpower and financial cost of manual counting.
Embodiments of the present disclosure provide a training method of an object statistical model, an object statistical method, a training apparatus of an object statistical model, an object statistical apparatus, an electronic device, a computer-readable storage medium, and a computer program product. The method comprises the steps of obtaining a training sample data set, wherein training samples in the training sample data set comprise training images and label data of the training images, the training images comprise a plurality of objects, and the objects comprise animals and/or plants; inputting the training image into a deep neural network model, and outputting a predicted density map; calculating a loss function according to the predicted density map and the label data to obtain a loss result; and iteratively adjusting network parameters of the deep neural network model according to the loss result to generate a trained object statistical model.
Fig. 1 schematically illustrates an exemplary system architecture 100 to which a training method or object statistics method of an object statistics model may be applied, according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or transmit data or the like, the data comprising a set of training sample data or a target image. Various client applications may be installed on the terminal devices 101, 102, 103, such as an object statistics application, a web browser application, a search-class application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for data transmitted by the user with the terminal devices 101, 102, 103. The background management server may analyze and otherwise process the received data, and feed back a processing result (for example, an object statistical model or a statistical density map generated from the user's data) to the terminal device.
It should be noted that the training method of the object statistical model or the object statistical method provided by the embodiment of the present disclosure may be generally performed by the server 105. Accordingly, the training device or the object statistical device of the object statistical model provided by the embodiment of the present disclosure may be generally disposed in the server 105. The training method or the object statistical method of the object statistical model provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the training device or the object statistics device of the object statistics model provided in the embodiment of the present disclosure may also be disposed in a server or a server cluster that is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105. Alternatively, the training method of the object statistical model or the object statistical method provided by the embodiment of the present disclosure may also be executed by the terminal device 101, 102, or 103, or may also be executed by another terminal device different from the terminal device 101, 102, or 103. Accordingly, the training apparatus of the object statistical model or the object statistical apparatus provided in the embodiments of the present disclosure may also be disposed in the terminal device 101, 102, or 103, or disposed in another terminal device different from the terminal device 101, 102, or 103.
For example, the training sample data set or the target image may be originally stored in any one of the terminal devices 101, 102, or 103 (e.g., the terminal device 101, but not limited thereto), or stored on an external storage device and may be imported into the terminal device 101. Then, the terminal device 101 may locally perform the training method or the object statistical method of the object statistical model provided in the embodiment of the present disclosure, or transmit the training sample data set or the target image to another terminal device, a server, or a server cluster, and perform the training method or the object statistical method of the object statistical model provided in the embodiment of the present disclosure by another terminal device, a server, or a server cluster that receives the training sample data set or the target image.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 schematically shows a flow chart of a method of training a statistical model of a subject according to an embodiment of the present disclosure.
As shown in fig. 2, the training method of the subject statistical model may include operations S210 to S240.
In operation S210, a training sample data set is obtained, where training samples in the training sample data set include training images and label data of the training images. The training image may include a plurality of objects, which may include animals and/or plants.
In operation S220, the training image is input to the deep neural network model, and a predicted density map is output.
In operation S230, a loss function is calculated according to the predicted density map and the tag data, resulting in a loss result.
In operation S240, network parameters of the deep neural network model are iteratively adjusted according to the loss result, generating a trained object statistical model.
According to embodiments of the present disclosure, the training images may include images captured by a camera. Animals may include, but are not limited to, fish, reptiles, birds, amphibians, or mammals. Plants may include, but are not limited to, spermatophytes, algae, bryophytes, ferns, and the like.
According to an embodiment of the present disclosure, the object may also include other objects, such as an object of a car, train, or ship, or an item in a warehouse, etc.
According to the embodiment of the disclosure, a labeling tool can be used to label the training images, obtain the position coordinates of each object in the image, and generate the label data from these position coordinates. The label data may be stored, for example, as a .mat file.
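The disclosure only states that the label data are position coordinates saved as a .mat file and that the loss compares the predicted density map with the label data; it does not spell out how point coordinates become a ground-truth density map. A common approach, assumed here rather than taken from the patent, is to place a small Gaussian kernel at each annotated object centre; the key name "points" and the kernel width in the sketch below are hypothetical:

import numpy as np
from scipy.io import loadmat
from scipy.ndimage import gaussian_filter

def points_to_density_map(mat_path, height, width, sigma=4.0):
    """Hypothetical helper: turn point annotations stored in a .mat file into a
    ground-truth density map whose integral equals the number of annotated objects.
    The key name "points" is an assumption about the .mat layout."""
    points = loadmat(mat_path)["points"]          # (K, 2) array of (col, row) coordinates
    density = np.zeros((height, width), dtype=np.float32)
    for col, row in points:
        r, c = int(round(row)), int(round(col))
        if 0 <= r < height and 0 <= c < width:
            density[r, c] += 1.0
    # Smoothing each impulse with a Gaussian keeps the sum of the map equal to K.
    return gaussian_filter(density, sigma=sigma)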
To facilitate the description, the following examples take birds as the object.
According to the embodiment of the disclosure, the feature extraction part of the deep neural network model may adopt a deep residual network (ResNet); for example, because the background of the sample data set is complex and contains information about various objects, a ResNet34 network, which performs well in the field of image recognition, may be adopted as the feature extraction part of the deep neural network model. The loss function may employ the Euclidean distance.
The deep neural network of the present embodiment is not limited to a deep residual network such as ResNet34 and may be another type of neural network. The embodiments of the present disclosure are not limited in this respect.
According to the embodiment of the disclosure, the acquired training images are input into the deep neural network model, the predicted density map is output, the loss function is calculated according to the predicted density map and the label data, and the network parameters of the deep neural network model are iteratively adjusted according to the loss result. When the loss result converges, the trained object statistical model is obtained.
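A minimal training-loop sketch of operations S210 to S240 in PyTorch; the optimizer, learning rate and the assumption that the data loader yields (training image, ground-truth density map) pairs are illustrative choices, and the Euclidean-distance loss is written as a summed squared error between the two density maps:

import torch
from torch import nn, optim

def train_object_statistical_model(model, train_loader, epochs=100, lr=1e-5, device="cuda"):
    """Sketch of the disclosed training procedure (operations S210 to S240).
    train_loader is assumed to yield (training_image, ground_truth_density_map) batches."""
    model = model.to(device)
    criterion = nn.MSELoss(reduction="sum")       # Euclidean-distance-style loss between density maps
    optimizer = optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        epoch_loss = 0.0
        for images, gt_density in train_loader:
            images, gt_density = images.to(device), gt_density.to(device)
            pred_density = model(images)                    # operation S220: predicted density map
            loss = criterion(pred_density, gt_density)      # operation S230: loss result
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                                # operation S240: adjust network parameters
            epoch_loss += loss.item()
        print(f"epoch {epoch}: loss {epoch_loss:.4f}")
    return model                                            # trained object statistical model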
According to the embodiments of the present disclosure, an object statistical model is obtained by training a deep neural network model, so that the density of animals and/or plants can be predicted with the object statistical model. This at least partially solves the technical problem that manually counting objects is cumbersome and labor-intensive, and thereby achieves the technical effects of making object counting more convenient and reducing its cost.
According to an embodiment of the present disclosure, the deep neural network model may include a feature extraction layer, a convolution layer, a plurality of dilated convolution layers, an upsampling layer, and a density generation network.
According to embodiments of the present disclosure, a density map may be used to represent the number and spatial distribution of birds. The density generation network refers to an algorithmic structure, built from convolutional layers, that generates the density map.
According to an embodiment of the disclosure, the feature extraction layer of the ResNet34 network serves as the feature extraction part of the deep neural network model, and the fully connected layer of ResNet34 can be removed so that the feature extraction part can be connected to the density generation network.
According to the embodiment of the disclosure, the ResNet34 network is pre-trained on a million-scale data set (such as the ImageNet data set) before training, so its weights already contain rich target feature information; using ResNet34 as the feature extraction part therefore effectively speeds up the convergence of the deep neural network model training.
FIG. 3 schematically illustrates a flow chart of a method of outputting a predicted density map by a deep neural network model according to an embodiment of the present disclosure.
As shown in fig. 3, inputting the training image into the deep neural network model and outputting the predicted density map may include operations S310 to S350.
In operation S310, a training sample data set is input into the feature extraction layer, and a first feature map is output.
In operation S320, the first feature map is input into the plurality of dilated convolution layers, and the second feature map is output.
In operation S330, the second feature map is input into the convolutional layer, and a third feature map is output.
In operation S340, the third feature map is input to the upsampling layer, and the fourth feature map is output.
In operation S350, the fourth feature map is input to the density generation network, and the predicted density map is output.
According to an embodiment of the present disclosure, the size of the first feature map output by the feature extraction layer of the ResNet34 network may be 128 × 96; sizes here and below are given in pixels and the unit is not repeated.
According to the embodiment of the disclosure, because individual birds in the training images are small and occlusion between individuals is common, richer and more complete local feature information is needed to assist in generating the feature maps. In this embodiment, the first feature map is input into the plurality of dilated convolution layers for dilated convolution, which enlarges the receptive field of the convolution operation, so that the feature map output by each dilated convolution layer contains information over a larger range without additional convolution stages for feature extraction. In addition, in order to preserve the feature information of the birds in the training image as much as possible, the deep neural network model in this embodiment does not use pooling layers.
According to an embodiment of the present disclosure, in order to reduce the influence of label data errors on the training of the object statistical model, an upsampling operation is performed before the density generation network: the third feature map is input into the upsampling layer, and the size of the fourth feature map output by the upsampling layer may be 256 × 192.
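A PyTorch sketch of the network described above; the overall structure (a ResNet34 front end with its fully connected layer removed, a stage of dilated convolutions, a further convolution, an upsampling layer and a density generation head, with no pooling layers added) follows the description, while the channel counts, dilation rates and the 1 x 1 density head are illustrative assumptions:

import torch
from torch import nn
from torchvision import models

class ObjectStatisticalModel(nn.Module):
    """Sketch of the disclosed network: ResNet34 feature extraction (fully connected
    layer removed), several dilated convolutions, a convolution layer, an upsampling
    layer and a density generation head. Channel counts and dilation rates are
    illustrative assumptions."""
    def __init__(self):
        super().__init__()
        # ImageNet-pretrained ResNet34, as mentioned above.
        backbone = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
        # Keep only the convolutional stages (drop average pooling and the fully connected layer).
        self.feature_extractor = nn.Sequential(*list(backbone.children())[:-2])
        self.dilated = nn.Sequential(                       # dilated convolution stage
            nn.Conv2d(512, 256, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(256, 128, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
        )
        self.conv = nn.Sequential(nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.density_head = nn.Conv2d(64, 1, kernel_size=1)  # density generation network (1-channel map)

    def forward(self, x):
        f1 = self.feature_extractor(x)   # first feature map
        f2 = self.dilated(f1)            # second feature map
        f3 = self.conv(f2)               # third feature map
        f4 = self.upsample(f3)           # fourth feature map
        return self.density_head(f4)     # predicted density map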
Fig. 4 schematically shows a flow chart of an object statistics method according to an embodiment of the present disclosure.
Fig. 5 schematically shows a method diagram of an object statistics method according to an embodiment of the present disclosure.
As shown in fig. 4 and 5, the object statistical method may include operations S410 to S420.
In operation S410, a target image is acquired, wherein the target image includes a plurality of objects including animals and/or plants.
In operation S420, the target image is input into the object statistical model to obtain a recognition result, wherein the recognition result includes a statistical density map. The object statistical model is trained by the training method described above.
According to embodiments of the present disclosure, the animal may comprise a bird.
According to the embodiment of the disclosure, by identifying the object in the target image by using the trained object statistical model, a statistical density map about the object can be obtained, so that the aggregation degree of the object in the target image can be judged.
According to the embodiments of the present disclosure, an object statistical model is obtained by training a deep neural network model, so that the density of animals and/or plants can be predicted with the object statistical model. This at least partially solves the technical problem that manually counting objects is cumbersome and labor-intensive, and thereby achieves the technical effects of making object counting more convenient and reducing its cost.
According to an embodiment of the present disclosure, there are a plurality of target images.
As shown in fig. 5, the object statistics method may further include the following operations.
The plurality of density maps are combined into an overall density map by using the object statistical model, and statistical data are generated according to the overall density map.
According to the embodiment of the disclosure, in order to obtain the total number of birds in a plurality of target images more conveniently, the density maps generated for the individual target images can be merged by the density map stitching module into one overall density map, and the total number of birds in the plurality of target images can be obtained by integrating over this overall density map.
According to the embodiment of the disclosure, calculating the number of birds by integrating the density map reduces the complexity of manual counting, and is convenient, fast and low-cost.
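As a concrete illustration of this merging-and-integration step, assuming the per-tile density maps are NumPy arrays cut left to right from one wide image (matching the cropping described below), the following sketch concatenates them and integrates (sums) the result to get the total count:

import numpy as np

def merge_and_count(density_maps):
    """Concatenate the per-tile density maps into one overall density map and
    integrate (sum) it to obtain the total object count. Assumes the tiles were
    cut left to right from a single wide image."""
    overall = np.concatenate(density_maps, axis=1)    # stitch the maps horizontally
    total_count = float(overall.sum())                # integral of the density map
    return overall, total_count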
As shown in fig. 5, the object statistics method may further include the following operations.
A plurality of captured images is acquired, wherein the captured images include a plurality of objects. The plurality of captured images are preprocessed to obtain an image to be recognized, where the image to be recognized is a super-resolution image. The image to be recognized is then cropped to obtain a plurality of target images.
According to an embodiment of the present disclosure, the captured images may be multiple frames of a video shot by the camera, and each captured image may be a three-channel RGB image.
According to the embodiment of the disclosure, because the viewing angle of the camera is limited, the camera needs to rotate at a certain speed in order to acquire a video that is clear enough and covers the whole scene in which birds are to be counted. Because the multiple captured frames may contain overlapping, redundant images, the captured images can be preprocessed by screening and the like, so as to obtain captured images that together represent the complete stitched scene to be counted.
According to the embodiment of the disclosure, the screened captured images are stitched to obtain a super-resolution image of the global wide-angle scene; the super-resolution image may have a size of 30720 × 1080 or larger. The super-resolution image is cropped to obtain a plurality of target images of size 1024 × 768.
It should be noted that the pixel values such as 30720 × 1080 or 1024 × 768 mentioned in the embodiments of the present disclosure and below are not intended to limit the scope of the present disclosure; they are merely examples used to illustrate the technical solutions of the present application, and the specific values may be set as required.
As shown in fig. 5, performing a cropping process on the image to be recognized to obtain a plurality of target images may include the following operations.
Under the condition that the pixel size of the image to be recognized does not meet a preset condition, the pixel size of the image to be recognized is adjusted to obtain an adjusted image to be recognized; the adjusted image to be recognized is then cropped to obtain a plurality of target images.
According to an embodiment of the present disclosure, the preset condition may be that the length (width) in pixels is N × 1024 and the height in pixels is 768, where N is a positive integer greater than or equal to 1.
According to the embodiment of the disclosure, in the case that the pixel size of the image to be recognized does not satisfy the preset condition, the pixel of the image to be recognized needs to be adjusted, wherein the adjustment can be performed by using a pixel operation algorithm, which includes but is not limited to Resize algorithm.
According to the embodiment of the disclosure, by cropping the adjusted image to be recognized, N target images of size 1024 × 768 can be obtained.
According to the embodiment of the present disclosure, the number of target images is N, which is calculated by the following formula:

[Formula for N: the formula image is not reproduced in the text; it is the same formula as given above.]

where x denotes the actual length (width) of the image to be recognized in pixels, y denotes the actual height of the image to be recognized in pixels, x0 denotes the preset length pixel value, y0 denotes the preset height pixel value, a denotes the integer obtained by dividing x by x0 and rounding down, b denotes the integer obtained by dividing x by x0 and rounding up, m denotes the remainder of dividing x by x0, and λ is a preset constant that denotes the proportion used to adjust the actual height pixel value.
According to an embodiment of the present disclosure, the preset length pixel value may be 1024 and the preset height pixel value may be 768.
According to the embodiment of the present disclosure, before the adjustment is performed, the number N of target images needs to be determined, and the resize ratio is determined from N. For example, the formula may determine that the image to be recognized would be cropped into 4.5 target images; the final number of target images is then set to 5 according to the rounding rule, and the image to be recognized is resized to (5 × 1024) × 768.
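A sketch of this resize-and-crop step; the half-up rounding below stands in for the λ-based rule of the formula above, and the use of OpenCV's resize is an implementation choice rather than something fixed by the disclosure:

import cv2

def crop_to_targets(image, preset_w=1024, preset_h=768):
    """Resize the image to be recognized to (N * preset_w) x preset_h and cut it
    into N target images of preset_w x preset_h, rounding half up as in the
    4.5 -> 5 example above."""
    h, w = image.shape[:2]
    n = max(1, int(w / preset_w + 0.5))               # number N of target images
    resized = cv2.resize(image, (n * preset_w, preset_h))
    return [resized[:, i * preset_w:(i + 1) * preset_w] for i in range(n)]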
According to an embodiment of the present disclosure, the target image includes an image number.
The above object statistical method may further include the following operations.
The plurality of target images are stitched according to the image number of each target image to generate an output image.
According to the embodiment of the disclosure, the image numbers of the target images may be assigned during the cropping of the image to be recognized; for example, a target image may be named in the form "original image name_xx.jpg", where xx is the image number.
According to an embodiment of the present disclosure, in addition to the density maps, an output image may be produced in which the region corresponding to each target image shows the density map and the count data of that target image. The overall density map and the statistical data of the whole image to be recognized may also be shown in other regions of the output image. The count data may be generated by integrating the density map of each target image.
According to the embodiment of the disclosure, preprocessing the plurality of captured images to obtain an image to be recognized may include the following operations.
The plurality of captured images are screened by using a statistical method to obtain screened captured images.
The plurality of screened captured images are stitched to obtain the image to be recognized.
According to the embodiment of the disclosure, once the rotation angle and rotation speed of the camera are determined, the number of frames needed to cover the whole scene to be counted in the video captured by the camera can be determined by a statistical method; a video key-frame extraction method is designed to obtain the valid frames in the video, and an image stitching module stitches the valid frames into the super-resolution image to be recognized.
For example, the object to be counted is first determined from the captured images, its features are then determined, and statistics are collected over the plurality of captured images according to these features; because the captured images contain overlapping regions, the invalid frames can be identified from the statistical results, and the valid frames that together represent the whole scene to be counted are thereby determined.
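One way to realize this screening-and-stitching step is sketched below; keeping every stride-th frame is only a stand-in for the statistical screening described above, and stitching with OpenCV's Stitcher is an implementation choice, not something fixed by the disclosure:

import cv2

def frames_to_super_resolution_image(video_path, stride=30):
    """Extract key frames from the rotating-camera video at a fixed stride and
    stitch them into one wide image to be recognized (illustrative sketch)."""
    cap = cv2.VideoCapture(video_path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0:                       # keep every stride-th frame as a valid frame
            frames.append(frame)
        index += 1
    cap.release()
    stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
    status, panorama = stitcher.stitch(frames)
    if status != 0:                                   # 0 corresponds to cv2.Stitcher_OK
        raise RuntimeError(f"stitching failed with status {status}")
    return panorama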
Fig. 6 schematically shows a block diagram of a training apparatus of a subject statistical model according to an embodiment of the present disclosure.
As shown in fig. 6, the training apparatus 600 for a subject statistical model may include a first obtaining module 610, a first outputting module 620, a second outputting module 630 and a generating module 640.
The first obtaining module 610 is configured to obtain a training sample data set, where a training sample in the training sample data set includes a training image and tag data of the training image, where the training image includes a plurality of objects, and the objects include animals and/or plants.
The first output module 620 is configured to input the training image into the deep neural network model, and output a predicted density map.
The second output module 630 is configured to calculate a loss function according to the predicted density map and the label data, so as to obtain a loss result.
The generating module 640 is configured to iteratively adjust network parameters of the deep neural network model according to the loss result to generate a trained object statistical model.
According to the embodiments of the present disclosure, an object statistical model is obtained by training a deep neural network model, so that the density of animals and/or plants can be predicted with the object statistical model. This at least partially solves the technical problem that manually counting objects is cumbersome and labor-intensive, and thereby achieves the technical effects of making object counting more convenient and reducing its cost.
According to an embodiment of the present disclosure, the deep neural network model may include a feature extraction layer, a convolution layer, a plurality of dilated convolution layers, an upsampling layer, and a density generation network.
According to an embodiment of the present disclosure, the first output module 620 may include a first output unit, a second output unit, a third output unit, a fourth output unit, and a fifth output unit.
The first output unit is used for inputting the training sample data set into the feature extraction layer and outputting a first feature map.
The second output unit is used for inputting the first feature map into the plurality of dilated convolution layers and outputting a second feature map.
The third output unit is used for inputting the second feature map into the convolution layer and outputting a third feature map.
The fourth output unit is used for inputting the third feature map into the upsampling layer and outputting a fourth feature map.
The fifth output unit is used for inputting the fourth feature map into the density generation network and outputting the predicted density map.
Fig. 7 schematically shows a block diagram of an object statistics apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the object statistics apparatus 700 may include a second obtaining module 710 and a obtaining module 720.
The second acquiring module 710 is configured to acquire a target image, wherein the target image includes a plurality of objects, and the objects include animals and/or plants.
The obtaining module 720 is configured to input the target image into the object statistical model to obtain a recognition result, where the recognition result includes a statistical density map.
Wherein the object statistical model is trained by the training method described above.
According to the embodiments of the present disclosure, an object statistical model is obtained by training a deep neural network model, so that the density of animals and/or plants can be predicted with the object statistical model. This at least partially solves the technical problem that manually counting objects is cumbersome and labor-intensive, and thereby achieves the technical effects of making object counting more convenient and reducing its cost.
According to an embodiment of the present disclosure, there may be a plurality of target images.
According to an embodiment of the present disclosure, the object statistics apparatus 700 may further include a merging module and a generating module.
The merging module is used for merging the plurality of density maps into an overall density map by using the object statistical model.
The generation module is used for generating statistical data according to the overall density map.
According to an embodiment of the present disclosure, the object statistics apparatus 700 may further include a third obtaining module, a preprocessing module, and a clipping module.
The third acquisition module is used for acquiring a plurality of captured images, wherein the captured images comprise a plurality of objects.
The preprocessing module is used for preprocessing the plurality of captured images to obtain an image to be recognized, wherein the image to be recognized is a super-resolution image.
The cropping module is used for cropping the image to be recognized to obtain a plurality of target images.
According to an embodiment of the present disclosure, the number of target images may be N, which is calculated by the following formula:

[Formula for N: the formula image is not reproduced in the text; it is the same formula as given above.]

where x denotes the actual length (width) of the image to be recognized in pixels, y denotes the actual height of the image to be recognized in pixels, x0 denotes the preset length pixel value, y0 denotes the preset height pixel value, a denotes the integer obtained by dividing x by x0 and rounding down, b denotes the integer obtained by dividing x by x0 and rounding up, m denotes the remainder of dividing x by x0, and λ is a preset constant that denotes the proportion used to adjust the actual height pixel value.
According to an embodiment of the present disclosure, a cropping module may include an adjustment unit and a cropping unit.
The adjusting unit is used for adjusting the pixel size of the image to be recognized under the condition that the pixel size of the image to be recognized does not meet the preset condition, and the adjusted image to be recognized is obtained.
The cropping unit is used for cropping the adjusted image to be recognized to obtain a plurality of target images.
According to an embodiment of the present disclosure, the target image may include an image number.
According to an embodiment of the present disclosure, the object statistics apparatus 700 may further include a stitching module.
The stitching module is used for stitching the plurality of target images according to the image number of each target image to generate an output image.
According to an embodiment of the present disclosure, the preprocessing module may include a screening unit and a stitching unit.
The screening unit is used for screening the plurality of captured images by using a statistical method to obtain screened captured images.
The stitching unit is used for stitching the plurality of screened captured images to obtain the image to be recognized.
According to embodiments of the present disclosure, the animal may comprise a bird.
Any of the modules, units, or at least part of the functionality of any of them according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules and units according to the embodiments of the present disclosure may be implemented by being split into a plurality of modules. Any one or more of the modules, units according to the embodiments of the present disclosure may be implemented at least partially as a hardware Circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a Circuit, or implemented by any one of three implementations of software, hardware, and firmware, or any suitable combination of any of them. Alternatively, one or more of the modules, units according to embodiments of the present disclosure may be implemented at least partly as computer program modules, which, when executed, may perform the respective functions.
For example, any number of the first obtaining module 610, the first outputting module 620, the second outputting module 630 and the generating module 640, or the second obtaining module 710 and the obtaining module 720 may be combined and implemented in one module/unit, or any one of the modules/units may be split into a plurality of modules/units. Alternatively, at least part of the functionality of one or more of these modules/units may be combined with at least part of the functionality of other modules/units/sub-units and implemented in one module/unit. According to an embodiment of the present disclosure, at least one of the first obtaining module 610, the first outputting module 620, the second outputting module 630 and the generating module 640, or the second obtaining module 710 and the obtaining module 720 may be at least partially implemented as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three manners of software, hardware and firmware, or implemented by a suitable combination of any several of them. Alternatively, at least one of the first obtaining module 610, the first outputting module 620, the second outputting module 630 and the generating module 640, or the second obtaining module 710 and the obtaining module 720 may be at least partially implemented as a computer program module, which when executed, may perform a corresponding function.
It should be noted that, in the embodiments of the present disclosure, the training apparatus portion of the object statistical model corresponds to the training method portion of the object statistical model, and the object statistical apparatus portion corresponds to the object statistical method portion; the description of each apparatus portion can therefore refer to the corresponding method portion and is not repeated here.
Fig. 8 schematically shows a block diagram of an electronic device adapted to implement the above described method according to an embodiment of the present disclosure. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 8, an electronic device 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 801 may also include onboard memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing different actions of the method flows according to embodiments of the present disclosure.
In the RAM 803, various programs and data necessary for the operation of the electronic apparatus 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and RAM 803. The processor 801 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 800 may also include an input/output (I/O) interface 805, which is also connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output portion 807 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage portion 808 including a hard disk and the like; and a communication portion 809 including a network interface card such as a LAN card or a modem. The communication portion 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage portion 808 as needed.
According to embodiments of the present disclosure, method flows according to embodiments of the present disclosure may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program, when executed by the processor 801, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
Inference with the object statistical model may be performed in parallel. For example, if the weights of one model occupy 980 MB and one object statistics process occupies about 1200 MB of processor 801 memory, a given processor 801 can process at most 9 images in parallel; to keep a margin in the processor 801 memory, the number of parallel statistical threads is set to 8, and the density map and count data corresponding to each target image are obtained in this way.
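As a non-limiting sketch of this parallel inference scheme (the thread-pool implementation and the count_objects helper are assumptions introduced here for illustration only):

```python
# Hypothetical sketch: run the object statistical model over many target
# images with a fixed number of parallel workers, leaving a memory margin.
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 8  # 9 images would fit in memory; 8 keeps a safety margin

def count_objects(model, image):
    """Assumed helper: return (density_map, count) for one target image."""
    density_map = model(image)                     # forward pass of the trained model
    return density_map, float(density_map.sum())   # count = integral of the density map

def parallel_statistics(model, target_images):
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        results = list(pool.map(lambda img: count_objects(model, img), target_images))
    return results  # one (density_map, count) pair per target image
```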
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium. Examples may include, but are not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 802 and/or RAM 803 described above and/or one or more memories other than the ROM 802 and RAM 803.
Embodiments of the present disclosure also include a computer program product comprising a computer program that contains program code for performing the methods provided by the embodiments of the present disclosure. When the computer program product is run on an electronic device, the program code causes the electronic device to implement the training method of the object statistical model or the object statistical method provided by the embodiments of the present disclosure.
The computer program, when executed by the processor 801, performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, and downloaded and installed via the communication section 809 and/or installed from the removable medium 811. The computer program containing the program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, or any suitable combination of the foregoing.
According to embodiments of the present disclosure, the program code for carrying out the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or sub-combined in various ways, even if such combinations or sub-combinations are not expressly recited in the present disclosure. In particular, such combinations and/or sub-combinations may be made without departing from the spirit or teaching of the present disclosure, and all of them fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (15)

1. A method for training an object statistical model, comprising:
acquiring a training sample data set, wherein training samples in the training sample data set comprise a training image and label data of the training image, wherein the training image comprises a plurality of objects, and the objects comprise animals and/or plants;
inputting the training image into a deep neural network model, and outputting a prediction density map;
calculating a loss function according to the predicted density map and the label data to obtain a loss result; and
iteratively adjusting network parameters of the deep neural network model according to the loss result to generate the trained object statistical model.
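By way of a non-limiting illustration of the training procedure recited in claim 1, one possible sketch is given below; the MSE loss, the Adam optimizer, the learning rate, and the assumption that the label data is already a density map are choices introduced here for illustration only and are not fixed by the claim.

```python
# Hypothetical PyTorch-style training loop for claim 1 (all hyperparameters
# are illustrative assumptions, not part of the claim).
import torch

def train_object_statistical_model(model, data_loader, epochs=10, lr=1e-4):
    criterion = torch.nn.MSELoss()                    # loss between predicted and label density maps
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for training_image, label_density in data_loader:
            predicted_density = model(training_image)            # output a prediction density map
            loss = criterion(predicted_density, label_density)   # loss from prediction and label data
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                          # iteratively adjust network parameters
    return model                                      # the trained object statistical model
```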
2. The method of claim 1, wherein the deep neural network model comprises a feature extraction layer, a convolutional layer, a plurality of hole convolutional layers, an upsampling layer, and a density generation network;
wherein the inputting the training image into the deep neural network model and outputting a predicted density map comprises:
inputting the training image into the feature extraction layer, and outputting a first feature map;
inputting the first feature map into the plurality of hole convolutional layers, and outputting a second feature map;
inputting the second feature map into the convolutional layer, and outputting a third feature map;
inputting the third feature map into the up-sampling layer, and outputting a fourth feature map; and
inputting the fourth feature map into the density generation network, and outputting the predicted density map.
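One possible, non-limiting realization of the claim-2 structure is sketched below; the channel widths, the number of layers, and the use of PyTorch are assumptions introduced here for illustration only.

```python
# Hypothetical sketch of the claim-2 architecture: feature extraction layer,
# several hole (dilated) convolutional layers, a convolutional layer, an
# upsampling layer, and a density generation network.
import torch
from torch import nn

class ObjectStatisticalNet(nn.Module):
    def __init__(self):
        super().__init__()
        # feature extraction layer -> first feature map
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
        )
        # hole (dilated) convolutional layers -> second feature map
        self.dilated = nn.Sequential(
            nn.Conv2d(128, 128, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
        )
        # convolutional layer -> third feature map
        self.conv = nn.Conv2d(128, 64, 3, padding=1)
        # upsampling layer -> fourth feature map
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        # density generation network -> predicted density map (one channel)
        self.density_head = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        x = self.features(x)         # first feature map
        x = self.dilated(x)          # second feature map
        x = self.conv(x)             # third feature map
        x = self.upsample(x)         # fourth feature map
        return self.density_head(x)  # predicted density map
```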
3. An object statistics method, comprising:
acquiring a target image, wherein the target image comprises a plurality of objects, and the objects comprise animals and/or plants; and
inputting the target image into an object statistical model to obtain a recognition result, wherein the recognition result comprises a statistical density map;
wherein the object statistical model is trained based on the method of any one of claims 1 to 2.
4. The method of claim 3, wherein there are a plurality of the target images;
the method further comprising:
merging the plurality of density maps into a total density map by using the object statistical model; and
generating statistical data according to the total density map.
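A non-limiting sketch of the merging step of claim 4 follows; the assumption that the per-image density maps tile a regular rows x cols grid, and the use of NumPy, are introduced here for illustration only.

```python
# Hypothetical sketch of claim 4: merge per-image density maps into one total
# density map and derive count statistics from it.
import numpy as np

def merge_density_maps(density_maps, rows, cols):
    """Arrange the per-image density maps into a single total density map."""
    row_blocks = [np.concatenate(density_maps[r * cols:(r + 1) * cols], axis=1)
                  for r in range(rows)]
    return np.concatenate(row_blocks, axis=0)

def density_statistics(total_density_map):
    # The integral (sum) of a density map approximates the number of objects.
    return {"count": float(total_density_map.sum())}
```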
5. The method of claim 3, further comprising:
acquiring a plurality of shot images, wherein the shot images include a plurality of the objects;
preprocessing the shot images to obtain an image to be recognized, wherein the image to be recognized is a super-resolution image; and
cropping the image to be recognized to obtain a plurality of the target images.
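The cropping step of claim 5 can be sketched, in a non-limiting way, as a regular tiling of the image to be recognized; the tile size and the row-major numbering of tiles are assumptions introduced here for illustration only.

```python
# Hypothetical sketch of the cropping step in claim 5: split the stitched,
# super-resolution image to be recognized into fixed-size target images.
import numpy as np

def crop_to_target_images(image, tile_h=1080, tile_w=1920):
    h, w = image.shape[:2]
    tiles = []
    for top in range(0, h, tile_h):
        for left in range(0, w, tile_w):
            # edge tiles may be smaller unless the pixel size is first
            # adjusted as described in claim 7
            tiles.append(image[top:top + tile_h, left:left + tile_w])
    return tiles  # the image number of each target image is its index here
```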
6. The method according to claim 5, wherein the number of the target images is N, and the calculation formula of the number of the target images N is as follows:
Figure FDA0003292671870000021
wherein x represents the real length pixel value of the image to be recognized, y represents the real height pixel value of the image to be recognized, x0 represents a preset length pixel value, y0 represents a preset height pixel value, a represents the integer obtained by dividing the real length pixel value of the image to be recognized by the preset length pixel value and rounding down, b represents the integer obtained by dividing the real length pixel value of the image to be recognized by the preset length pixel value and rounding up, m represents the remainder obtained by dividing the real length pixel value of the image to be recognized by the preset length pixel value, and λ is a preset constant representing a proportion used to adjust the real height pixel value.
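Only the auxiliary quantities named in claim 6 are evaluated in the sketch below; the claimed formula for N itself, which appears as an image in the original, is deliberately not reproduced, and the helper name is an assumption introduced here.

```python
# Hypothetical helper for the quantities defined in claim 6; a, b and m are
# computed as stated in the claim, while their combination into N follows the
# claimed formula (not reproduced here).
def tiling_quantities(x, x0):
    a = x // x0        # real length divided by the preset length, rounded down
    b = -(-x // x0)    # the same ratio, rounded up (integer ceiling)
    m = x % x0         # remainder of the division
    return a, b, m
```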
7. The method according to claim 5, wherein the cropping the image to be recognized to obtain a plurality of target images comprises:
under the condition that the pixel size of the image to be recognized does not meet the preset condition, adjusting the pixel size of the image to be recognized to obtain an adjusted image to be recognized; and
cropping the adjusted image to be recognized to obtain the plurality of target images.
8. The method of claim 5, wherein each of the target images comprises an image number;
the method further comprises the following steps:
splicing the plurality of target images according to the image number of each target image to generate an output image.
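A non-limiting sketch of the splicing step of claim 8 is given below; the row-major interpretation of the image numbers over a rows x cols grid is an assumption introduced here for illustration only.

```python
# Hypothetical sketch of claim 8: reassemble the target images into one output
# image according to their image numbers.
import numpy as np

def stitch_by_image_number(numbered_tiles, rows, cols):
    """numbered_tiles: list of (image_number, tile) pairs."""
    ordered = [tile for _, tile in sorted(numbered_tiles, key=lambda p: p[0])]
    row_blocks = [np.concatenate(ordered[r * cols:(r + 1) * cols], axis=1)
                  for r in range(rows)]
    return np.concatenate(row_blocks, axis=0)
```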
9. The method of claim 5, wherein the preprocessing the plurality of shot images to obtain the image to be recognized comprises:
screening the plurality of shot images by using a statistical method to obtain screened shot images; and
splicing the screened shot images to obtain the image to be recognized.
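Claim 9 leaves the statistical screening criterion open; purely as an illustrative assumption, the sketch below discards shot images whose mean brightness is a statistical outlier within the batch.

```python
# Hypothetical screening sketch for claim 9; the mean-brightness z-score
# criterion is an assumption introduced here, the claim only requires that
# the shot images be screened by a statistical method.
import numpy as np

def screen_shot_images(shot_images, z_threshold=2.0):
    means = np.array([img.mean() for img in shot_images])
    mu, sigma = means.mean(), means.std() + 1e-8   # avoid division by zero
    return [img for img, m in zip(shot_images, means)
            if abs(m - mu) / sigma <= z_threshold]
```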
10. The method according to any one of claims 3 to 9, wherein the animal comprises a bird.
11. An apparatus for training an object statistical model, comprising:
a first obtaining module, configured to obtain a training sample data set, wherein training samples in the training sample data set comprise a training image and label data of the training image, wherein the training image comprises a plurality of objects, and the objects comprise animals and/or plants;
a first output module, configured to input the training image into a deep neural network model and output a prediction density map;
a second output module, configured to calculate a loss function according to the prediction density map and the label data to obtain a loss result; and
a generating module, configured to iteratively adjust network parameters of the deep neural network model according to the loss result to generate the trained object statistical model.
12. An object statistics apparatus comprising:
a second obtaining module, configured to obtain a target image, where the target image includes a plurality of objects, and the objects include animals and/or plants; and
an obtaining module, configured to input the target image into an object statistical model to obtain a recognition result, wherein the recognition result comprises a statistical density map;
wherein the object statistical model is trained based on the method of any one of claims 1 to 2.
13. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-2 or 3-10.
14. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 2 or 3 to 10.
15. A computer program product comprising a computer program which, when executed by a processor, is adapted to carry out the method of any one of claims 1 to 2 or 3 to 10.
CN202111173473.8A 2021-10-08 2021-10-08 Training method of object statistical model, object statistical method and device Pending CN113902703A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111173473.8A CN113902703A (en) 2021-10-08 2021-10-08 Training method of object statistical model, object statistical method and device


Publications (1)

Publication Number Publication Date
CN113902703A true CN113902703A (en) 2022-01-07

Family

ID=79190773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111173473.8A Pending CN113902703A (en) 2021-10-08 2021-10-08 Training method of object statistical model, object statistical method and device

Country Status (1)

Country Link
CN (1) CN113902703A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019233421A1 (en) * 2018-06-04 2019-12-12 京东数字科技控股有限公司 Image processing method and device, electronic apparatus, and storage medium
CN110688928A (en) * 2019-09-20 2020-01-14 北京海益同展信息科技有限公司 Model training method and device, electronic equipment and computer readable storage medium
WO2020169043A1 (en) * 2019-02-21 2020-08-27 苏州大学 Dense crowd counting method, apparatus and device, and storage medium
CN111639668A (en) * 2020-04-17 2020-09-08 北京品恩科技股份有限公司 Crowd density detection method based on deep learning
CN112183627A (en) * 2020-09-28 2021-01-05 中星技术股份有限公司 Method for generating predicted density map network and vehicle annual inspection mark number detection method
CN112862023A (en) * 2021-04-26 2021-05-28 腾讯科技(深圳)有限公司 Object density determination method and device, computer equipment and storage medium


Similar Documents

Publication Publication Date Title
US10943145B2 (en) Image processing methods and apparatus, and electronic devices
CN107622240B (en) Face detection method and device
US11308714B1 (en) Artificial intelligence system for identifying and assessing attributes of a property shown in aerial imagery
CN112598045A (en) Method for training neural network, image recognition method and image recognition device
CN110929780A (en) Video classification model construction method, video classification device, video classification equipment and media
CN113822428A (en) Neural network training method and device and image segmentation method
CN108229418B (en) Human body key point detection method and apparatus, electronic device, storage medium, and program
CN111311480B (en) Image fusion method and device
CN113887447B (en) Training and reasoning method and device for density estimation and classification prediction model of dense population targets
US20220172066A1 (en) End-to-end training of neural networks for image processing
CN113781493A (en) Image processing method, image processing apparatus, electronic device, medium, and computer program product
CN115035295A (en) Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function
CN112329762A (en) Image processing method, model training method, device, computer device and medium
CN111209856B (en) Invoice information identification method and device, electronic equipment and storage medium
CN116452810A (en) Multi-level semantic segmentation method and device, electronic equipment and storage medium
CN113762266B (en) Target detection method, device, electronic equipment and computer readable medium
CN112784189A (en) Method and device for identifying page image
CN114187515A (en) Image segmentation method and image segmentation device
CN113569081A (en) Image recognition method, device, equipment and storage medium
CN116434218A (en) Check identification method, device, equipment and medium suitable for mobile terminal
CN116468970A (en) Model training method, image processing method, device, equipment and medium
CN112329550A (en) Weak supervision learning-based disaster-stricken building rapid positioning evaluation method and device
CN114881763B (en) Post-loan supervision method, device, equipment and medium for aquaculture
CN111582012A (en) Method and device for detecting small target ship
CN110633598B (en) Method and device for determining a driving area in an environment image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination