CN116129211A - Target identification method, device, equipment and storage medium


Info

Publication number
CN116129211A
Authority
CN
China
Prior art keywords
target
target detection
counting
detection objects
trained
Prior art date
Legal status
Pending
Application number
CN202211031300.7A
Other languages
Chinese (zh)
Inventor
杨杰之
李艾仑
吴海英
曾定衡
周迅溢
Current Assignee
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Priority date
Filing date
Publication date
Application filed by Mashang Xiaofei Finance Co Ltd filed Critical Mashang Xiaofei Finance Co Ltd
Priority to CN202211031300.7A
Publication of CN116129211A
Legal status: Pending

Classifications

    All classifications fall under section G (Physics), class G06 (Computing; Calculating or Counting):
    • G06V 10/774 Image or video recognition using pattern recognition or machine learning: generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 Image or video recognition using pattern recognition or machine learning: using neural networks
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06T 3/4038 Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G06T 7/0002 Image analysis: inspection of images, e.g. flaw detection
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 2200/32 Indexing scheme involving image mosaicing
    • G06T 2207/30204 Subject of image: marker
    • G06T 2207/30242 Subject of image: counting objects in image
    • G06V 2201/07 Target detection

Abstract

Embodiments of the present application provide a target identification method, apparatus, device, and storage medium. The method comprises: acquiring a target image including a plurality of target detection objects; inputting the target image into a pre-trained first counting model for quantity recognition, and outputting a first counting recognition result for the plurality of target detection objects in the target image; if the first counting recognition result indicates that the number of the plurality of target detection objects is greater than a preset number threshold, inputting the target image into a pre-trained second counting model for quantity recognition, and outputting a second counting recognition result for the plurality of target detection objects in the target image; and determining the number of target detection objects in the target image according to the second counting recognition result. When the number of target detection objects exceeds the preset number threshold, the second counting model re-identifies the number more accurately, thereby improving the accuracy of target identification.

Description

Target identification method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for identifying a target.
Background
At present, when counting farmed target detection objects such as chickens, ducks, and pigs on a farm, statistics can be performed with a neural network model. However, target detection objects on a farm tend to gather and move, so a counting point faces both scenes in which the target detection objects are dense and scenes in which they are sparse. It is difficult for a single neural network model to maintain a good recognition effect in both dense and sparse scenes and thus to identify an accurate number. How to accurately identify the number of target detection objects on a farm with a neural network model is therefore a technical problem to be solved.
Disclosure of Invention
The application provides a target identification method, apparatus, device, and storage medium to improve the accuracy and precision of quantity identification for target detection objects in a target image.
In a first aspect, an embodiment of the present application provides a target identification method, including:
acquiring a target image including a plurality of target detection objects;
inputting the target image into a pre-trained first counting model for quantity recognition, and outputting a first counting recognition result for the plurality of target detection objects in the target image;
if the first counting recognition result indicates that the number of the plurality of target detection objects is greater than a preset number threshold, inputting the target image into a pre-trained second counting model for quantity recognition, and outputting a second counting recognition result for the plurality of target detection objects in the target image;
and determining the number of target detection objects in the target image according to the second counting recognition result.
In a second aspect, an embodiment of the present application provides an object recognition apparatus, including:
an acquisition module for acquiring a target image including a plurality of target detection objects;
a first output module, for inputting the target image into a pre-trained first counting model for quantity recognition and outputting a first counting recognition result for the plurality of target detection objects in the target image;
a second output module, for inputting the target image into a pre-trained second counting model for quantity recognition if the first counting recognition result indicates that the number of the plurality of target detection objects is greater than a preset number threshold, and outputting a second counting recognition result for the plurality of target detection objects in the target image;
and a determining module, for determining the number of target detection objects in the target image according to the second counting recognition result.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor; and a memory arranged to store computer-executable instructions that, when executed by the processor, cause the processor to perform the steps of the above-described target identification method.
In a fourth aspect, embodiments of the present application provide a storage medium. The storage medium is for storing computer-executable instructions that cause a computer to perform the above-described target identification method.
In this embodiment of the present application, after a target image including a plurality of target detection objects is obtained, the target image may be input into a pre-trained first counting model for quantity recognition, which outputs a first counting recognition result for the plurality of target detection objects. If the first counting recognition result indicates that the number of target detection objects is greater than a preset number threshold, the target image may be input into a pre-trained second counting model for quantity recognition, which outputs a second counting recognition result, and the number of target detection objects in the target image is determined according to the second counting recognition result. The first counting model marks the region in which each target detection object is located and is suited to small-scale quantity recognition: when the number is small, the region of every object can be marked, so the first counting recognition result can be obtained from the number of region marks. The second counting model marks the position point of each target detection object and is suited to large-scale quantity recognition: when the number is large, the region of every object cannot be marked reliably; in particular, when a target detection object is occluded and occupies only a small area, its region box cannot be identified, and a position-point mark is more suitable, so the second counting recognition result can be obtained from the number of position-point marks. Therefore, when identifying the number of target detection objects, the first counting model is applied first, and its result determines whether the number needs to be identified again by the second counting model, thereby improving the accuracy and precision of quantity identification for target detection objects.
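As an illustration of this dispatch logic, the following minimal Python sketch routes an image through the two models; count_model_1 and count_model_2 are hypothetical stand-ins for the two pre-trained counting models, and the threshold value is the example figure used later in this description.

    PRESET_THRESHOLD = 200  # example threshold value used later in this description

    def recognize_count(image, count_model_1, count_model_2):
        # First count: region-mark based model, suited to small-scale counting.
        first_result = count_model_1(image)
        if first_result > PRESET_THRESHOLD:
            # Re-identify with the position-point based model, suited to
            # large-scale counting, and use its result instead.
            return count_model_2(image)
        return first_result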
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required by the embodiments are briefly described below. The drawings described below are only some embodiments of the present application; other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic flow chart of a target recognition method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a first counting model according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a second counting model according to an embodiment of the present disclosure;
fig. 4 is a schematic block diagram of a target recognition device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to provide a better understanding of the technical solutions in the embodiments of the present application, the technical solutions are described below clearly and completely with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
The inventive concept of the present application is as follows: in view of the above technical problem, the present technical solution proposes a target identification method that can be used to identify the number of target detection objects on a farm. In a specific implementation, a target image including a plurality of target detection objects is first acquired and input into a pre-trained first counting model for quantity recognition, which outputs a first counting recognition result for the plurality of target detection objects. If the first counting recognition result indicates that the number of target detection objects is greater than a preset number threshold, the target image is input into a pre-trained second counting model for quantity recognition, which outputs a second counting recognition result, and the number of target detection objects in the target image is then determined from the second counting recognition result. The first counting model marks the region in which each target detection object is located and is suited to small-scale quantity recognition: when the number is small, the region of every object can be marked, so the first counting recognition result can be obtained from the number of region marks. The second counting model marks the position point of each target detection object and is suited to large-scale quantity recognition: when the number is large, the region of every object cannot be marked reliably; in particular, when a target detection object is occluded and occupies only a small area, its region box cannot be identified, and a position-point mark is more suitable, so the second counting recognition result can be obtained from the number of position-point marks. Therefore, when identifying the number of target detection objects, the first counting model is applied first, and its result determines whether the number needs to be identified again by the second counting model, thereby improving the accuracy and precision of quantity identification for target detection objects.
Referring to fig. 1, a flowchart of a target identification method provided in an embodiment of the present application is shown; the execution subject of the method may be an electronic device. The electronic device may be a terminal device or a server: the terminal device may be a mobile terminal such as a mobile phone or a tablet computer, or a device such as a personal computer (PC); the server may be an independent server, a server cluster formed by a plurality of different servers, or a cloud server capable of cloud computing. The method can be applied to any target identification scene, and specifically comprises the following steps:
step S102, acquiring a target image including a plurality of target detection objects.
In the embodiment of the present application, a target image including a plurality of target detection objects may be acquired, for example, through a camera installed at the farm. In order to improve the diversity of the acquired target images, the position, angle, and so on of the camera can be adjusted, so that the same camera can acquire target images from different scene viewing angles. In addition, the camera can acquire target images under different illumination conditions in different time periods, so as to obtain target images with different illumination changes.
Because the target detection objects gather and move easily and are influenced by time, weather, and the like (for example, they tend to be outdoors in the daytime and in cages at night, outdoors on sunny days and in cages on rainy days), the number of target detection objects in the target images acquired by the same camera can differ.
Step S104, inputting the target image into a pre-trained first counting model for quantity recognition, and outputting a first counting recognition result for the plurality of target detection objects in the target image, wherein the first counting model is used to acquire first region marking information for the plurality of target detection objects, the first region marking information comprises a region mark for each target detection object, and the first counting recognition result is the number of region marks.
As an alternative embodiment, as shown in fig. 2, the first counting model may include an SPP (spatial pyramid pooling) network, an FPN (feature pyramid network), and a PAN (path aggregation network). Inputting the target image into the pre-trained first counting model for quantity recognition and outputting the first counting recognition result for the plurality of target detection objects in the target image may include steps A1 to A4:
Step A1, performing max-pooling processing on the target image through the spatial pyramid pooling network to obtain a stitched feature map formed by stitching the features of the plurality of target detection objects.
In this embodiment of the present application, after the target image including the plurality of target detection objects is obtained, it may be input into the SPP network of the first counting model, which performs max-pooling processing and produces a stitched feature map formed by stitching the features of the target detection objects. The stitched feature map can include features at different scales output by SPP pooling kernels of different sizes.
Step A2, performing top-down feature fusion processing on the stitched feature map through the feature pyramid network to obtain a first multi-level feature map of the plurality of target detection objects.
After the stitched feature map is obtained, it can be input into the FPN network, which performs top-down feature fusion among its different levels of features. The FPN network can perform the feature transfer and fusion by upsampling, thereby obtaining the first multi-level feature map of the plurality of target detection objects.
Step A3, performing bottom-up feature fusion processing among different levels of features on the first multi-level feature map through the path aggregation network to obtain a second multi-level feature map of the plurality of target detection objects.
After the first multi-level feature map is obtained, it can be input into the PAN network, which transmits position information from bottom to top through a feature pyramid, realizing feature fusion among the different levels of features of the first multi-level feature map and obtaining the second multi-level feature map of the plurality of target detection objects.
Step A4, performing non-maximum suppression processing on the second multi-level feature map of the plurality of target detection objects, counting the results of the non-maximum suppression processing, and outputting the first counting recognition result for the plurality of target detection objects in the target image.
After the second multi-level feature map of the plurality of target detection objects is obtained, non-maximum suppression processing can be applied to it, the surviving results can be counted, and the first counting recognition result for the plurality of target detection objects in the target image can be output, thereby counting the target detection objects in the target image.
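As a rough illustration of steps A1 and A4, the PyTorch sketch below shows an SPP module of the kind described (parallel max-pools whose outputs are stitched channel-wise) and an NMS-based counting step. The kernel sizes and thresholds are illustrative assumptions, and the FPN/PAN neck and detection head are omitted.

    import torch
    import torch.nn as nn
    from torchvision.ops import nms

    class SPP(nn.Module):
        """Spatial pyramid pooling: parallel max-pools stitched channel-wise (step A1)."""
        def __init__(self, kernel_sizes=(5, 9, 13)):
            super().__init__()
            self.pools = nn.ModuleList(
                nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernel_sizes
            )

        def forward(self, x):
            # Stitched feature map: the input features plus max-pooled features
            # at several receptive-field scales.
            return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)

    def count_after_nms(boxes, scores, score_thr=0.25, iou_thr=0.45):
        """Step A4: suppress overlapping predictions, then count the survivors."""
        keep = scores > score_thr
        boxes, scores = boxes[keep], scores[keep]
        kept = nms(boxes, scores, iou_thr)  # indices of boxes surviving NMS
        return int(kept.numel())            # first counting recognition result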
In an embodiment of the present application, the model training process of the pre-trained first count model may include: step B1 to step B3:
and B1, acquiring a plurality of first target images, wherein the number of first target detection objects in the first target images is smaller than a preset number threshold.
In an embodiment of the present application, a plurality of first target images for training a first count model may be acquired. The number of first target detection objects in the first target image may be smaller than a preset number threshold.
For example, the preset number threshold may be 200, and the number of first target detection objects in the first target image may be less than 200.
In one embodiment, the plurality of first target images may be obtained from a data set (e.g., a video) acquired by a camera installed at a farm. For example, the video acquired by the camera may be obtained first, and pictures may be extracted from it, e.g., one picture every 5 frames using opencv, in png format; then, pictures in which the number of first target detection objects is smaller than the preset number threshold are selected from the extracted pictures. In order to test the trained first counting model, a certain proportion of these images can be held out as a test set, for example ten percent, with the remaining ninety percent used to generate the subsequent sample set.
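A sketch of the frame-extraction procedure just described, assuming a video file path and an output directory; every 5th frame is saved as a PNG with OpenCV, mirroring the description.

    import os
    import cv2

    def extract_frames(video_path, out_dir, step=5):
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        frame_idx = saved = 0
        while True:
            ok, frame = cap.read()
            if not ok:                 # end of video
                break
            if frame_idx % step == 0:  # keep one picture every 5 frames
                cv2.imwrite(os.path.join(out_dir, f"frame_{saved:06d}.png"), frame)
                saved += 1
            frame_idx += 1
        cap.release()
        return saved  # roughly ten percent of these can then be held out as a test set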
Step B2, for each first target image, marking the region where each first target detection object in the first target image is located to obtain second region marking information.
In this embodiment of the present application, when marking the region where each first target detection object is located, the smallest rectangular frame that can enclose the object may be found and used as the marking box, thereby obtaining the second region marking information. The marking box represents a rectangular frame by four values [x, y, w, h], where (x, y) are the coordinates of the center point of the smallest rectangular frame and w and h are its width and height respectively; the four values can be normalized according to the image size.
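A small sketch of the normalized box label just described: (x, y) is the center of the smallest rectangular frame, w and h its width and height, all normalized by the image size (the YOLO-style convention the four values suggest).

    def to_normalized_box(x_min, y_min, x_max, y_max, img_w, img_h):
        """Convert a pixel-space smallest enclosing rectangle to [x, y, w, h]."""
        x = (x_min + x_max) / 2.0 / img_w   # normalized center x
        y = (y_min + y_max) / 2.0 / img_h   # normalized center y
        w = (x_max - x_min) / img_w         # normalized width
        h = (y_max - y_min) / img_h         # normalized height
        return [x, y, w, h]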
Step B3, generating a first training sample set according to the plurality of first target images and the corresponding second region marking information, and training a first counting model to be trained through the first training sample set and a loss function to obtain a pre-trained first counting model; wherein the loss function is a two-class cross entropy loss function for the target mark.
As an optional implementation manner, generating a first training sample set according to the plurality of first target images and the corresponding second region marking information may include: step C1 to step C2:
and C1, preprocessing the plurality of first target images to obtain preprocessed plurality of first target images.
In the embodiment of the application, the plurality of first target images may be preprocessed to increase the number of the first target images.
The preprocessing may comprise at least one of: scaling the plurality of first target images, cropping them, stitching them, flipping them, and adjusting their brightness.
After the above preprocessing is performed on the plurality of first target images, a plurality of new preprocessed images is obtained, which can then be added to the first target images to enrich their number.
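A sketch of the preprocessing operations listed above using OpenCV and NumPy; the scale factor, crop margins, and brightness shift are illustrative assumptions, and the stitching step is reduced to a simple 2x2 grid of resized copies.

    import cv2
    import numpy as np

    def augment(img):
        h, w = img.shape[:2]
        scaled = cv2.resize(img, (int(w * 0.8), int(h * 0.8)))      # scaling
        cropped = img[h // 10: h - h // 10, w // 10: w - w // 10]   # cropping
        flipped = cv2.flip(img, 1)                                   # horizontal flip
        brighter = np.clip(img.astype(np.int16) + 40, 0, 255).astype(np.uint8)
        small = cv2.resize(img, (w // 2, h // 2))
        stitched = np.vstack([np.hstack([small, small]),             # stitching
                              np.hstack([small, small])])
        return [scaled, cropped, flipped, brighter, stitched]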
And C2, generating a first training sample set according to the preprocessed first target images and the second region marking information corresponding to the first target images.
As an optional implementation manner, training the first count model to be trained through the first training sample set and the loss function to obtain a pre-trained first count model may include: step D1 to step D2:
and D1, inputting each first target image in the first training sample set into a first counting model to be trained to obtain prediction region marking information of the first target detection objects, wherein the prediction region marking information is used for performing prediction marking on regions where each first target detection object in the first target images is located.
In this embodiment of the present application, the area where each first target detection object marked by the predicted area marking information is located may be a rectangular area, where each first target detection object may be encompassed by one rectangular area marked by the predicted area marking information. The rectangular region marked by the prediction region marking information can be the smallest circumscribed rectangle of the target detection object.
As an optional implementation manner, inputting each first target image in the first training sample set into the first counting model to be trained to obtain the predicted region marking information of the first target detection object may include: step E1 to step E3:
E1, if the number of target detection objects of target images in a first training sample set is not smaller than a specified threshold, identifying the characteristics of target detection objects with different scales by using a preset multi-scale identification frame to obtain the characteristics of the target detection objects with multiple scales; wherein the specified threshold is less than a preset number of thresholds.
In this embodiment of the present disclosure, if the number of target detection objects in the target image is smaller than the preset number threshold and not smaller than the specified threshold, for example, the preset number threshold is 200, the specified threshold is 150, and the number of target detection objects in the target image is 160, the features of the target detection objects with different scales may be identified by using the preset multi-scale identification frame, so as to obtain the features of the target detection objects with multiple scales.
In this embodiment of the present application, if the number of target detection objects in the target image is smaller than the preset number threshold but not smaller than the specified threshold, a multi-scale recognition frame may be set, for example anchor frames of different scales, and the first counting model to be trained is propagated forward to generate multi-scale feature maps of 13×13, 26×26, and 52×52. The larger prior boxes (116×90), (156×198), (373×326) are applied on the smallest 13×13 feature map (which has the largest receptive field) to detect objects occupying larger areas; the medium prior boxes (30×61), (62×45), (59×119) are applied on the medium 26×26 feature map (medium receptive field) to detect objects of medium size; and the smaller prior boxes (10×13), (16×30), (33×23) are applied on the larger 52×52 feature map (smaller receptive field) to detect objects occupying smaller areas.
The characteristics of the target detection objects with different scales can be identified through the preset multi-scale identification frame, so that the characteristics of the target detection objects with the multiple scales are obtained.
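The prior boxes quoted above, grouped per feature-map scale; this is the standard YOLOv3-style anchor arrangement that the description mirrors.

    # Prior (anchor) boxes per feature-map size, as listed in the description.
    PRIOR_BOXES = {
        13: [(116, 90), (156, 198), (373, 326)],  # largest receptive field: large objects
        26: [(30, 61), (62, 45), (59, 119)],      # medium receptive field: medium objects
        52: [(10, 13), (16, 30), (33, 23)],       # smallest receptive field: small objects
    }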
And E2, encoding the characteristics of the target detection objects with multiple scales to obtain characteristic offset.
Through the multi-scale recognition frames, the first counting model to be trained can predict the offset of the multi-scale recognition frames and the prediction area. At this time, the obtained characteristics of the target detection object of multiple scales may be encoded, for example, normalized, to obtain the characteristic offset.
Step E3, according to the feature offsets, decoding the initial prediction region marking information produced by the first counting model to be trained for each first target image whose number of target detection objects is not smaller than the specified threshold, to obtain the prediction region marking information of the first target detection objects.
The feature offsets are decoded in reverse to obtain the prediction region information, from which the prediction region marking information of each first target detection object in the first target image can be obtained. Since the reverse decoding process is prior art, it is not described in detail here.
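A simplified sketch of the offset encoding and its reverse decoding, assuming the standard YOLO-style parameterization (center offsets within a grid cell plus log-scaled width and height relative to the prior box); real detectors additionally squash the center offsets with a sigmoid.

    import math

    def encode_box(box, prior, cell):
        """box as (cx, cy, w, h) in grid units; prior as (w, h); cell as (cx, cy)."""
        cx, cy, w, h = box
        tx, ty = cx - cell[0], cy - cell[1]                 # offset inside the cell
        tw, th = math.log(w / prior[0]), math.log(h / prior[1])
        return tx, ty, tw, th

    def decode_box(offsets, prior, cell):
        """Reverse decoding: recover the predicted region from the offsets."""
        tx, ty, tw, th = offsets
        return (cell[0] + tx, cell[1] + ty,
                prior[0] * math.exp(tw), prior[1] * math.exp(th))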
And D2, training the first counting model to be trained according to the predicted area marking information, the second area marking information and the loss function to obtain the first counting model, wherein the training process is used for enabling the first number of the first target detection objects marked in the predicted area marking information to approach the second number of the first target detection objects marked in the second area marking information.
As an optional implementation manner, according to the predicted region marking information, the second region marking information, and the loss function, the training process of the first counting model to be trained may include: step F1 to step F3:
and F1, acquiring a first center point of a predicted area corresponding to the predicted area marking information of the target detection object and a second center point of a marking area corresponding to the second area marking information.
After the predicted region marking information and the second region marking information are obtained, the first center point of the prediction region corresponding to the predicted region marking information and the second center point of the marking region corresponding to the second region marking information may be obtained respectively. Then, the Euclidean distance between the first center point and the second center point may be obtained, together with the diagonal length of the smallest box enclosing the prediction region and the marking region, and the aspect ratios of the prediction region and the marking region.
And F2, determining a loss value according to the Euclidean distance between the first center point and the second center point and the loss function.
And F3, training the first counting model to be trained according to the loss value.
After the first center point and the second center point are determined, the Euclidean distance between them is further determined, and the loss value is then determined from this Euclidean distance, the diagonal length of the smallest box enclosing the prediction region and the marking region, the aspect ratios of the prediction region and the marking region, and a preset loss function.
For example, the first center point and the second center point may be denoted by b and b^gt, respectively; the Euclidean distance between them by ρ(·); the diagonal length of the smallest box enclosing the prediction region and the marking region by c; the width and height of the prediction region by w and h (and those of the marking region by w^gt and h^gt); and the aspect-ratio consistency term and its influence factor by v and α. In addition, the target-detection metric IoU (Intersection over Union) can be introduced to jointly determine the loss value:

$$L = 1 - IoU + \frac{\rho^{2}\left(b,\, b^{gt}\right)}{c^{2}} + \alpha v$$

$$v = \frac{4}{\pi^{2}} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^{2}, \qquad \alpha = \frac{v}{(1 - IoU) + v}$$
After the loss value is obtained, the first counting model may be trained according to the loss value, so that the first number of the first target detection objects marked in the predicted area marking information output by the first counting model approximates to the second number of the first target detection objects marked in the second area marking information.
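A plain-Python sketch of the loss reconstructed above (the complete-IoU form that the named quantities b, ρ, c, α, v, and IoU correspond to); boxes are given as (cx, cy, w, h).

    import math

    def ciou_loss(pred, gt, eps=1e-7):
        # Corner coordinates of the prediction region and the marking region.
        px1, py1, px2, py2 = (pred[0] - pred[2] / 2, pred[1] - pred[3] / 2,
                              pred[0] + pred[2] / 2, pred[1] + pred[3] / 2)
        gx1, gy1, gx2, gy2 = (gt[0] - gt[2] / 2, gt[1] - gt[3] / 2,
                              gt[0] + gt[2] / 2, gt[1] + gt[3] / 2)
        inter = (max(0.0, min(px2, gx2) - max(px1, gx1)) *
                 max(0.0, min(py2, gy2) - max(py1, gy1)))
        union = pred[2] * pred[3] + gt[2] * gt[3] - inter + eps
        iou = inter / union
        rho2 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2  # squared center distance
        cw, ch = max(px2, gx2) - min(px1, gx1), max(py2, gy2) - min(py1, gy1)
        c2 = cw ** 2 + ch ** 2 + eps                            # squared enclosing diagonal
        v = (4 / math.pi ** 2) * (math.atan(gt[2] / gt[3])
                                  - math.atan(pred[2] / pred[3])) ** 2
        alpha = v / ((1 - iou) + v + eps)
        return 1 - iou + rho2 / c2 + alpha * v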
In this embodiment of the present application, when training the first counting model, training may run for 300 rounds with the image size set to 640 and 6 samples selected per training step. The learning rate may first be warmed up so that the model stabilizes during training initialization, after which training proceeds at the original learning rate; an SGD optimizer may be used to improve the convergence speed.
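A sketch of this training configuration; the momentum value and the warm-up schedule are illustrative assumptions, and the model here is only a stand-in for the first counting model.

    import torch

    EPOCHS, IMG_SIZE, BATCH_SIZE = 300, 640, 6

    model = torch.nn.Linear(1, 1)  # stand-in for the first counting model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937)
    # Warm-up: start at a tenth of the base rate so early training stays stable,
    # then continue at the original learning rate.
    warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1,
                                               total_iters=3)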
Step S106, if the first counting recognition result indicates that the number of the plurality of target detection objects is greater than the preset number threshold, inputting the target image into a pre-trained second counting model for quantity recognition, and outputting a second counting recognition result for the plurality of target detection objects in the target image, wherein the second counting model is used to acquire first position point marking information for the plurality of target detection objects, the first position point marking information comprises a position point mark for each target detection object, and the second counting recognition result is the number of position point marks.
As an alternative embodiment, the second counting model may include a feature processing network, a truth density processing network, and a desired processing network. Inputting the target image into a pre-trained second counting model for quantity recognition, and outputting second counting recognition results of a plurality of target detection objects in the target image, wherein the method can comprise the steps of G1-G4:
and G1, processing the input target image through a feature processing network to obtain a feature vector of the target detection object.
In an embodiment of the present application, as shown in fig. 3, the second counting model may include a feature processing network (e.g., an FCN (fully convolutional network)), a true value density processing network (a network for generating a true value density map from features), and an expectation processing network (a network for computing the mathematical expectation of the predicted target).
When the target image is input into the second counting model to identify the number of the target detection objects, the target image can be input into a feature processing network of the second counting model to perform feature extraction processing, so that feature vectors of the target detection objects are obtained.
And G2, processing the feature vector of the target detection object through a true value density processing network to obtain a true value density diagram of the position of the target detection object.
After the feature vector of the target detection object is obtained, the feature vector of the target detection object can be input into a true value density processing network to perform true value density value processing, and a true value density diagram of the position of the target detection object is obtained.
And G3, processing the true value density map of the position of the target detection object through an expected processing network to obtain an expected value of the number of the target detection objects.
After the truth value density map of the positions of the target detection objects is obtained, the truth value density map of the positions of the target detection objects can be input into a desired processing network to perform expected value processing, and expected values of the number of the target detection objects are obtained.
And G4, outputting second counting identification results of a plurality of target detection objects in the target image according to expected values of the number of the target detection objects.
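A minimal PyTorch sketch of the inference path in steps G1 to G4: an FCN-style feature network, a density head producing the density map, and the expected count obtained by summing the map. The layer sizes and the rounding step are illustrative assumptions.

    import torch
    import torch.nn as nn

    class DensityCounter(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(                  # G1: feature network
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            )
            self.density_head = nn.Conv2d(16, 1, 1)         # G2: density map

        def forward(self, image):
            density = torch.relu(self.density_head(self.features(image)))
            expected_count = density.sum(dim=(1, 2, 3))     # G3: expected value
            return density, expected_count.round()          # G4: count result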
In an embodiment of the present application, the model training process of the pre-trained second count model may include: step H1 to step H3:
and step H1, acquiring a plurality of second target images, wherein the number of second target detection objects in the second target images is not less than a preset number threshold.
In an embodiment of the present application, a plurality of second target images for training the second counting model may be acquired. The number of second target detection objects in each second target image is not less than the preset number threshold.
For example, the preset number threshold may be 200, and the number of second target detection objects in the second target image may then be 200 or more.
In one embodiment, the plurality of second target images may be obtained from a data set (e.g., a video) acquired by a camera installed at the farm. For example, the video acquired by the camera may be obtained first, and pictures may be extracted from it, e.g., one picture every 5 frames using opencv, in png format; then, pictures in which the number of second target detection objects is not less than the preset number threshold are selected from the extracted pictures. In order to test the trained second counting model, a certain proportion of these images can be held out as a test set, for example ten percent, with the remaining ninety percent used to generate the subsequent sample set.
H2, marking the position point of each second target detection object in the second target image aiming at each second target image to obtain second position point marking information;
Step H3, generating a second training sample set according to the plurality of second target images and the corresponding second position point marking information, and training a second counting model to be trained through the second training sample set and a loss function to obtain the pre-trained second counting model; wherein the loss function is determined by a metric function over the position marks of the training sample set and a Bayesian loss function at a specified expected value.
As an alternative embodiment, generating the second training sample set according to the plurality of second target images and the corresponding second position point marking information thereof may include: step I1 to step I2:
and step I1, preprocessing a plurality of second target images to obtain preprocessed second target images.
In the embodiment of the application, the plurality of second target images may be preprocessed to increase the number of the second target images.
The preprocessing may comprise at least one of: scaling the plurality of second target images, cropping them, stitching them, flipping them, and adjusting their brightness.
After the above preprocessing is performed on the plurality of second target images, a plurality of new images after the preprocessing may be obtained, and then the obtained plurality of new images may be added to the second target images to enrich the number of the second target images.
Step I2, generating the second training sample set according to the preprocessed plurality of second target images and the second position point marking information corresponding to the plurality of second target images.
As an optional implementation manner, training the second counting model to be trained through the second training sample set and the loss function to obtain a pre-trained second counting model, which comprises the steps of J1-J2:
and step J1, inputting each second target image in the second training sample set into a second counting model to be trained to obtain predicted position point marking information of the second target detection objects, wherein the predicted position point marking information is used for carrying out predicted marking on the position point of each second target detection object in the second target images.
In this embodiment of the present application, the same second target detection object in the second target image may have only one location point mark, and location point marks may be performed on a specified body part, such as a head, of the second target detection object. At the time of marking, all second target detection objects in the target image can be manually marked through existing software.
And step J2, training the second counting model to be trained according to the predicted position point marking information, the second position point marking information and the loss function of the second target detection object to obtain the second counting model.
As an alternative embodiment, the training process for the second counting model to be trained according to the predicted position point marking information, the second position point marking information and the loss function of the second target detection object includes: step K1 to step K2:
and step K1, acquiring a true value density map of the predicted position points corresponding to the predicted position point marking information of the second target detection object and density differences of the true value density map of the marking position points corresponding to the second position point marking information of the second target detection object.
A true value density map of the predicted position points corresponding to the predicted position point marking information of the second target detection objects is acquired from the second target image.
For example, a second target image of size H×W×C (height × width × number of channels) containing the second target detection objects may be put into the network; after feature extraction an H×W feature map is produced, and a true value density map is then generated according to the following formulas:

$$D^{est}(z_m) = \sum_{i=1}^{n} \mathcal{N}\left(z_m;\, x_i,\, \delta^2 \mathbf{1}_{2\times 2}\right)$$

$$\mathcal{N}\left(z_m;\, x_i,\, \delta^2 \mathbf{1}\right) = \frac{1}{2\pi\delta^2} \exp\left(-\frac{\lVert z_m - x_i \rVert^2}{2\delta^2}\right)$$

where x_i denotes the n points with the largest response probability after feature extraction (each point on the H×W feature map has a corresponding probability), z_m denotes a random position on the map, z_n denotes the position mark of a marked second target detection object, and δ denotes the standard deviation of the Gaussian (taken over x_i and z_n). Applying the same construction to the position marks z_n gives the true value density map of the marked position points.
After the true value density map of the predicted position points corresponding to the predicted position point marking information is acquired, the density difference with respect to the true value density map of the marked position points corresponding to the second position point marking information is acquired. In this embodiment, the density difference may be written as D^{est}(x_m) − D^{gt}(z_m).
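A NumPy sketch of the Gaussian density construction above: each point spreads a two-dimensional Gaussian of standard deviation delta over the H×W grid, so the resulting map integrates to roughly the number of points; the same routine serves for predicted points and for position marks.

    import numpy as np

    def density_map(points, h, w, delta=4.0):
        """points: list of (px, py) positions; returns an (h, w) density map."""
        ys, xs = np.mgrid[0:h, 0:w]
        density = np.zeros((h, w), dtype=np.float64)
        for px, py in points:
            sq_dist = (xs - px) ** 2 + (ys - py) ** 2
            density += np.exp(-sq_dist / (2 * delta ** 2)) / (2 * np.pi * delta ** 2)
        return density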
And step K2, training the feature processing network to be trained, the true value density processing network to be trained and the expected processing network to be trained according to the predicted position point marking information, the second position point marking information and the density difference of the second target detection object.
In this embodiment of the present application, after obtaining the density difference between the true value density map of the position of the second target detection object and the true value density map of the marker position point, the metric function value corresponding to the density difference may be obtained, and the metric function value corresponding to the density difference may be determined as the metric function loss value.
For example, the metric function loss value may be:

$$L_1 = F\left(D^{est}(x_m) - D^{gt}(z_m)\right)$$

where F may be a metric function.
After the truth value density map is obtained, the truth value density map can be input into an expected processing network to be processed, and expected values of the number of the second target detection objects are obtained.
The density formula above can be read as a likelihood function for the approximate target. Let x be the random variable for spatial location and y the random variable for the annotation of a second target detection object. The likelihood of a second target detection object is approximated with a two-dimensional Gaussian distribution, i.e. the conditional probability that "when annotation y_n exists, a second target detection object appears at position x_m":

$$p\left(x_m \mid y_n\right) = \mathcal{N}\left(x_m;\, z_n,\, \delta^2 \mathbf{1}_{2\times 2}\right)$$

That is, the likelihood of the second target detection object is considered to decrease with distance from the target's center mark point. On this basis, given the likelihood distribution of the second target detection object, the posterior probability that the second target detection object appears at a specified pixel can be further estimated:

$$p\left(y_n \mid x_m\right) = \frac{p\left(x_m \mid y_n\right)}{\sum_{k=1}^{N} p\left(x_m \mid y_k\right)}$$

Thus, the mathematical expectation of the count contributed to each second target detection object can be calculated from the estimated density map, expressed by:

$$E\left[c_n\right] = \sum_{m=1}^{M} p\left(y_n \mid x_m\right) D^{est}\left(x_m\right)$$
after calculating the expected value of the position of the second target detection object, a bayesian function loss value can be generated through a bayesian function.
The bayesian function loss value may be:
L Bayes =F(1-E(c n ))
after the metric function loss value and the Bayesian function loss value are obtained, the two loss values can be weighted and calculated to obtain a final loss function corresponding to the second counting model to be trained. Parameters in the feature extraction network may then be updated by a gradient descent method based on the final loss function to enable training of the log recognition model.
When the second counting model is trained, the data can be divided into 5 parts: 1 validation set and 4 training sets. A second counting model is trained on each of the 4 training sets; in each epoch all 4 training sets are trained, and after one round of iteration is completed on a single training set, the model is validated on the validation set and saved. At inference, the prediction results of the 4 models are summed and averaged to obtain the counting result for the second target detection objects.
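A sketch of this data split and inference averaging; train_one and predict are hypothetical helpers standing in for the second counting model's training and inference routines.

    import numpy as np

    def train_four_models(samples, train_one):
        """Split into 5 parts: 1 validation part plus 4 training parts, and
        train one model per training part, validating each on the same part."""
        indices = np.random.permutation(len(samples))
        parts = np.array_split(indices, 5)
        val_idx, train_parts = parts[0], parts[1:]
        return [train_one(samples, train_idx, val_idx) for train_idx in train_parts]

    def infer_count(models, image, predict):
        # Sum and average the four models' predictions to get the count result.
        return sum(predict(m, image) for m in models) / len(models)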
Step S108, determining the number of target detection objects in the target image according to the second counting identification result.
In this embodiment of the present application, after a target image including a plurality of target detection objects is obtained, the target image may be input into a pre-trained first counting model for quantity recognition, which outputs a first counting recognition result for the plurality of target detection objects. If the first counting recognition result indicates that the number of target detection objects is greater than a preset number threshold, the target image may be input into a pre-trained second counting model for quantity recognition, which outputs a second counting recognition result, and the number of target detection objects in the target image is determined according to the second counting recognition result.
When the number of target detection objects is greater than the preset number threshold, the number is further identified through the second counting model, and the number of target detection objects in the target image is determined according to the second counting recognition result it outputs. The first counting model marks the region in which each target detection object is located and is suited to small-scale quantity recognition: when the number is small, the region of every object can be marked, so the first counting recognition result can be obtained from the number of region marks. The second counting model marks the position point of each target detection object and is suited to large-scale quantity recognition: when the number is large, the region of every object cannot be marked reliably; in particular, when a target detection object is occluded and occupies only a small area, its region box cannot be identified, and a position-point mark is more suitable, so the second counting recognition result can be obtained from the number of position-point marks. Therefore, when identifying the number of target detection objects, the first counting model is applied first, and its result determines whether the number needs to be identified again by the second counting model, thereby improving the accuracy and precision of quantity identification for target detection objects.
In the following, a chicken flock counting scene is taken as an example to describe how the target identification method provided by the application identifies the number of target detection objects.
In a chicken farm, the target detection objects are the individual chickens in a chicken house, and the target image is a chicken flock image. When a large number of chickens in the chicken house need to be counted, a chicken flock image of the chickens in the house is first acquired. Since chickens tend to be outdoors in the daytime and in the house at night, and outdoors on sunny days but in the house on rainy days, the chicken flock image can be acquired by a camera at night or on rainy days, so that the counting result is more accurate. In a specific implementation, a chicken flock image comprising a plurality of chickens is acquired and input into the pre-trained first counting model for quantity recognition, which outputs a first counting recognition result for the chickens in the image. If the first counting recognition result indicates that the number of chickens is greater than the preset number threshold, for example greater than 200, the image is input into the pre-trained second counting model for quantity recognition, which outputs a second counting recognition result, and the number of chickens in the image is then determined from the second counting recognition result. In addition, a plurality of chicken flock images of the chicken house can be acquired; after a counting result is obtained for each image, the counting results are averaged to obtain the final number of chickens in the house. When the number of chickens in the flock is greater than the preset number threshold, the second counting model re-identifies the number, and the count is determined from its output: the first counting model is suited to small-scale quantity recognition and is fast, while the second counting model is suited to large-scale quantity recognition and has high recognition accuracy.
Based on the same concept as the target identification method provided in the embodiments of the present application, an embodiment of the present application further provides a target identification apparatus, as shown in fig. 4.
The target identification apparatus includes: an acquisition module 401, a first output module 402, a second output module 403, and a determination module 404, wherein:
an acquisition module 401 for acquiring a target image including a plurality of target detection objects;
a first output module 402, configured to input the target image into a pre-trained first count model for number identification, and output a first count identification result of the plurality of target detection objects in the target image;
a second output module 403, configured to input the target image into a pre-trained second count model for performing number recognition if the number of the plurality of target detection objects represented by the first count recognition result is greater than a preset number threshold, and output a second count recognition result of the plurality of target detection objects in the target image;
a determining module 404, configured to determine the number of target detection objects in the target image according to the second count identification result.
Optionally, the first counting model includes a spatial pyramid pooling network, a feature pyramid network, and a path aggregation network; the first output module 402 is configured to:
performing maximum pooling processing on the target image through the spatial pyramid pooling network to obtain a stitched feature map in which the features of the plurality of target detection objects are stitched together;

performing top-down feature fusion processing on the stitched feature map through the feature pyramid network to obtain first multi-level feature maps of the plurality of target detection objects;

performing bottom-up feature fusion processing between features of different levels on the first multi-level feature maps through the path aggregation network to obtain second multi-level feature maps of the plurality of target detection objects;

performing non-maximum suppression processing on the second multi-level feature maps of the plurality of target detection objects, counting the non-maximum suppression results, and outputting the first counting identification result of the plurality of target detection objects in the target image.
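A minimal PyTorch sketch of two of the steps above, the spatial pyramid pooling and the final suppress-then-count stage, under assumed layer sizes and thresholds (the feature pyramid and path aggregation networks are omitted for brevity):

```python
import torch
import torch.nn as nn
from torchvision.ops import nms

class SPP(nn.Module):
    """Spatial pyramid pooling: concatenate the input feature map with
    max-pooled versions of itself at several kernel sizes."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)

def count_from_detections(boxes, scores, score_thr=0.25, iou_thr=0.45):
    """Filter low-confidence boxes, apply non-maximum suppression, and
    return the number of surviving region marks as the first count."""
    keep = scores > score_thr
    boxes, scores = boxes[keep], scores[keep]
    kept = nms(boxes, scores, iou_thr)
    return len(kept)
```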
Optionally, the second counting model comprises a feature processing network, a true value density processing network, and an expected processing network; the second output module 403 is configured to:
processing the input target image through the feature processing network to obtain feature vectors of the target detection objects;

processing the feature vectors of the target detection objects through the true value density processing network to obtain a true value density map of the positions of the target detection objects;

processing the true value density map of the positions of the target detection objects through the expected processing network to obtain an expected value of the number of the target detection objects;

and outputting the second counting identification result of the plurality of target detection objects in the target image according to the expected value of the number of the target detection objects.
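A compact sketch of that pipeline (layer widths are arbitrary; only the shape of the computation, features to density map to expected count, follows the description):

```python
import torch
import torch.nn as nn

class DensityCounter(nn.Module):
    """Second counting model sketch: feature processing network, density
    head, and an expected count taken as the integral of the map."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(           # feature processing network
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.density = nn.Conv2d(64, 1, 1)       # density map head

    def forward(self, x):
        density_map = torch.relu(self.density(self.features(x)))
        expected_count = density_map.sum(dim=(1, 2, 3))  # one count per image
        return density_map, expected_count
```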
Optionally, the first output module 402 is configured to:
acquiring a plurality of first target images, wherein the number of first target detection objects in the first target images is smaller than a preset number threshold;
for each first target image, marking the area where each first target detection object in the first target image is located to obtain second area marking information;
generating a first training sample set according to the plurality of first target images and the corresponding second region marking information, and training a first counting model to be trained through the first training sample set and a loss function to obtain a pre-trained first counting model; wherein the loss function is a two-class cross entropy loss function for the target mark.
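One way such a training step might be realized, assuming the model emits one objectness logit per candidate region mark (the shapes and helper names here are hypothetical):

```python
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()   # two-class cross entropy on target marks

def train_step(model, optimizer, image, target_marks):
    """Single optimization step: predict region-mark logits and push
    them toward the ground-truth marks with binary cross entropy."""
    optimizer.zero_grad()
    pred_logits = model(image)               # (N,) logits, one per candidate
    loss = criterion(pred_logits, target_marks.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```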
Optionally, the first output module 402 is further configured to:
preprocessing the plurality of first target images to obtain a plurality of preprocessed first target images;

generating the first training sample set according to the plurality of preprocessed first target images and the second region marking information corresponding to the plurality of first target images;

wherein the preprocessing includes at least one of: scaling the plurality of first target images, cropping the plurality of first target images, stitching the plurality of first target images, flipping the plurality of first target images, and adjusting the brightness of the plurality of first target images.
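A possible preprocessing pipeline covering the listed operations (the library choices and parameter values are assumptions; stitching is sketched as a 2x2 mosaic):

```python
import random
import torchvision.transforms as T
from PIL import Image

augment = T.Compose([
    T.Resize(640),                  # scaling
    T.RandomCrop(512),              # cropping
    T.RandomHorizontalFlip(p=0.5),  # flipping
    T.ColorJitter(brightness=0.3),  # brightness adjustment
])

def mosaic(images, size=512):
    """Stitch four randomly chosen images into one 2x2 mosaic
    (requires at least four input images)."""
    canvas = Image.new("RGB", (size * 2, size * 2))
    for i, img in enumerate(random.sample(images, 4)):
        img = img.resize((size, size))
        canvas.paste(img, ((i % 2) * size, (i // 2) * size))
    return canvas
```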
Optionally, the first output module 402 is further configured to:
inputting each first target image in the first training sample set into the first counting model to be trained to obtain prediction area marking information of the first target detection objects, wherein the prediction area marking information is used for predicting and marking the area where each first target detection object in the first target images is located;
and training the first counting model to be trained according to the prediction area marking information, the second area marking information and the loss function to obtain the pre-trained first counting model, wherein the training process is used for enabling the first number of first target detection objects marked in the prediction area marking information to approach the second number of first target detection objects marked in the second area marking information.
Optionally, the first output module 402 is further configured to:
acquiring a first center point of the predicted area corresponding to the prediction area marking information of the first target detection objects and a second center point of the marked area corresponding to the second area marking information;
determining a loss value according to the Euclidean distance between the first center point and the second center point and the loss function;
and training the first counting model to be trained according to the loss value.
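The Euclidean-distance term between predicted and marked centers could be sketched as follows (the `(x1, y1, x2, y2)` box layout is an assumption; the application combines this distance with the loss function to obtain the loss value):

```python
import torch

def center_distance(pred_boxes, gt_boxes):
    """Mean Euclidean distance between the centers of predicted areas
    and marked areas; both inputs are (N, 4) tensors of box corners."""
    pred_centers = (pred_boxes[:, :2] + pred_boxes[:, 2:]) / 2
    gt_centers = (gt_boxes[:, :2] + gt_boxes[:, 2:]) / 2
    return torch.linalg.norm(pred_centers - gt_centers, dim=1).mean()
```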
Optionally, the first output module 402 is further configured to:
if the number of first target detection objects in the first target images in the first training sample set is not smaller than a specified threshold, identifying features of the first target detection objects at different scales by using a preset multi-scale identification frame to obtain features of the first target detection objects at multiple scales; wherein the specified threshold is less than the preset number threshold;

encoding the features of the first target detection objects at the multiple scales to obtain a feature offset; and obtaining the prediction area marking information of the first target detection objects according to the feature offset and the initial prediction area marking information obtained after inputting, into the first counting model to be trained, each first target image in which the number of first target detection objects is not less than the specified threshold.
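A heavily simplified sketch of the offset refinement, in which per-box multi-scale features are encoded into coordinate offsets that adjust the initial prediction area marks (the linear encoder and feature dimension are pure assumptions):

```python
import torch.nn as nn

class OffsetHead(nn.Module):
    """Encode per-box features into (dx1, dy1, dx2, dy2) offsets and
    apply them to the initial predicted boxes."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.encode = nn.Linear(feat_dim, 4)

    def forward(self, box_features, initial_boxes):
        offsets = self.encode(box_features)   # feature offsets, (N, 4)
        return initial_boxes + offsets        # refined region marks
```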
Optionally, the second output module 403 is configured to:
acquiring a plurality of second target images, wherein the number of second target detection objects in the second target images is not less than a preset number threshold;
for each second target image, marking the position point of each second target detection object in the second target image to obtain second position point marking information;

generating a second training sample set according to the plurality of second target images and the corresponding second position point marking information, and training a second counting model to be trained through the second training sample set and a loss function to obtain the pre-trained second counting model; wherein the loss function is determined from a metric function for the position marks of the training sample set and a Bayesian loss function at a specified expected value.
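A simplified sketch in the spirit of a Bayesian counting loss: density-map pixels are soft-assigned to the marked position points via a Gaussian posterior, and the expected count contributed by each point is driven toward the specified expected value of 1 (the Gaussian form and `sigma` are assumptions):

```python
import torch

def bayesian_count_loss(density, pixel_xy, points, sigma=8.0):
    """density:  (P,) flattened predicted density map
    pixel_xy: (P, 2) coordinates of each density pixel
    points:   (N, 2) marked position points"""
    # Posterior of each marked point given each pixel, from distances.
    d2 = torch.cdist(pixel_xy, points) ** 2              # (P, N)
    posterior = torch.softmax(-d2 / (2 * sigma ** 2), dim=1)
    # Expected count attributed to each marked point.
    expected = (posterior * density.unsqueeze(1)).sum(dim=0)  # (N,)
    return (1.0 - expected).abs().sum()
```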
Optionally, the second output module 403 is further configured to:
inputting each second target image in the second training sample set into the second counting model to be trained to obtain predicted position point marking information of the second target detection objects, wherein the predicted position point marking information is used for predicting and marking the position point of each second target detection object in the second target image;
and training the second counting model to be trained according to the predicted position point marking information of the second target detection objects, the second position point marking information and the loss function to obtain the pre-trained second counting model.
Optionally, the second output module 403 is further configured to:
acquiring a first true value density map of the predicted position points corresponding to the predicted position point marking information of the second target detection objects and a second true value density map of the marked position points corresponding to the second position point marking information, and calculating the density difference between the two maps;

and training the feature processing network to be trained, the true value density processing network to be trained and the expected processing network to be trained according to the predicted position point marking information of the second target detection objects, the second position point marking information and the density difference.
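The density difference itself could be as simple as a mean squared difference between the two maps, though the application does not fix the metric (MSE here is an assumption):

```python
import torch.nn.functional as F

def density_difference(pred_density, marked_density):
    """Difference between the density map at the predicted position
    points and the map built from the marked position points."""
    return F.mse_loss(pred_density, marked_density)
```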
The object recognition device in this embodiment can implement the respective processes in the above-described method embodiments and achieve the same effects and functions, and will not be repeated here.
According to the target recognition method provided by the foregoing embodiments, based on the same technical concept, the embodiments of the present application further provide an electronic device configured to perform the foregoing target recognition method. Fig. 5 is a schematic diagram of the hardware structure of an electronic device implementing the embodiments of the present application. The electronic device 50 shown in fig. 5 includes, but is not limited to: a radio frequency unit 51, a network module 52, an audio output unit 53, an input unit 54, a sensor 55, a display unit 56, an interface unit 57, a memory 58, a processor 59, and a power supply 511. It will be appreciated by those skilled in the art that the structure shown in fig. 5 does not limit the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or arrange the components differently.
It should be noted that the electronic device 50 provided in the embodiment of the present application can implement each process of the embodiments of the target identification method; to avoid repetition, the description is omitted here.
It should be understood that, in the embodiment of the present application, the radio frequency unit 51 may be used for transmitting and receiving signals during information transmission and reception or during a call. Specifically, after receiving downlink data from an upstream device, the radio frequency unit 51 forwards it to the processor 59 for processing, and it transmits uplink data to the upstream device. Typically, the radio frequency unit 51 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 51 may also communicate with networks and other devices through a wireless communication system.
The electronic device provides wireless broadband internet access to the user via the network module 52, such as helping the user to send and receive e-mail, browse web pages, and access streaming media, etc.
The audio output unit 53 may convert audio data received by the radio frequency unit 51 or the network module 52, or stored in the memory 58, into an audio signal and output it as sound. The audio output unit 53 may also provide audio output related to a specific function performed by the electronic device 50 (e.g., a call signal reception sound or a message reception sound). The audio output unit 53 includes a speaker, a buzzer, a receiver, and the like.
The input unit 54 is used for receiving audio or video signals. The input unit 54 may include a graphics processor (Graphics Processing Unit, GPU) 541 and a microphone 542. The graphics processor 541 processes image data of still pictures or video obtained by an image capturing apparatus (such as a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 56, stored in the memory 58 (or other storage medium), or transmitted via the radio frequency unit 51 or the network module 52. The microphone 542 may receive sound and process it into audio data. In a telephone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 51 and output.
The interface unit 57 is an interface for connecting an external device to the electronic apparatus 50. For example, the external devices may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 57 may be used to receive input (e.g., data information or power) from an external device and transmit the received input to one or more elements within the electronic apparatus 50, or may be used to transmit data between the electronic apparatus 50 and an external device.
The memory 58 may be used to store software programs as well as various data. The memory 58 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and application programs required for at least one function (such as a sound playing function or an image playing function), while the data storage area may store data created according to the use of the electronic device (such as audio data or a phonebook). In addition, the memory 58 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
Processor 59 is a control center of the electronic device and utilizes various interfaces and lines to connect various portions of the overall electronic device, perform various functions of the electronic device and process data by running or executing software programs and/or modules stored in memory 58, and invoking data stored in memory 58, thereby performing overall monitoring of the electronic device. Processor 59 may include one or more processing units; preferably, the processor 59 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 59.
The electronic device 50 may also include a power supply 511 (e.g., a battery) for powering the various components, and preferably the power supply 511 may be logically connected to the processor 59 via a power management system that performs functions such as managing charging, discharging, and power consumption.
In addition, the electronic device 50 includes some functional modules, which are not shown, and will not be described herein.
Preferably, the embodiment of the present application further provides an electronic device, including a processor 59, a memory 58, and a computer program stored in the memory 58 and capable of running on the processor 59, where the computer program when executed by the processor 59 implements each process of the above embodiment of the target identifying method, and the same technical effects can be achieved, and for avoiding repetition, a detailed description is omitted herein.
Further, based on the method shown in fig. 1, one or more embodiments of the present application further provide a storage medium for storing computer executable instruction information. In a specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and the computer executable instruction information stored therein, when executed by a processor, can implement each process implemented by the electronic device in the embodiments of the target identification method; to avoid repetition, the details are not repeated here.
The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 1990s, an improvement to a technology could clearly be distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows can now be regarded as direct improvements to hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs a digital system to be "integrated" onto a PLD without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the original code before compilation must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can readily be obtained merely by slightly programming the method flow into an integrated circuit using one of the above hardware description languages.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor and a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller in purely computer readable program code, it is entirely possible to implement the same functionality by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each unit may be implemented in the same piece or pieces of software and/or hardware when implementing the embodiments of the present application.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, Phase-change RAM (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Embodiments of the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in this application are described in a progressive manner; identical or similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the description of the system embodiments is relatively brief, since they are substantially similar to the method embodiments; for relevant details, see the corresponding description of the method embodiments.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (15)

1. A method of target identification, comprising:
acquiring a target image including a plurality of target detection objects;
inputting the target image into a pre-trained first counting model for quantity recognition, and outputting a first counting recognition result of the plurality of target detection objects in the target image, wherein the first counting model is used for acquiring first area marking information of the plurality of target detection objects, the first area marking information comprises area marks of each target detection object, and the first counting recognition result is the quantity of the area marks;
if the first counting identification result represents that the number of the plurality of target detection objects is larger than a preset number threshold, inputting the target image into a pre-trained second counting model for number identification, and outputting a second counting identification result of the plurality of target detection objects in the target image, wherein the second counting model is used for acquiring first position point mark information of the plurality of target detection objects, the first position point mark information comprises position point marks of each target detection object, and the second counting identification result is the number of the position point marks;
and determining the number of target detection objects in the target image according to the second counting and identifying result.
2. The method of claim 1, wherein the first counting model comprises a spatial pyramid pooling network, a feature pyramid network, and a path aggregation network;
inputting the target image into a pre-trained first counting model for quantity recognition, and outputting first counting recognition results of the plurality of target detection objects in the target image, wherein the first counting recognition results comprise:
performing maximum pooling processing on the target image through the spatial pyramid pooling network to obtain a stitched feature map in which the features of the plurality of target detection objects are stitched together;

performing top-down feature fusion processing on the stitched feature map through the feature pyramid network to obtain first multi-level feature maps of the plurality of target detection objects;

performing bottom-up feature fusion processing between features of different levels on the first multi-level feature maps through the path aggregation network to obtain second multi-level feature maps of the plurality of target detection objects;

performing non-maximum suppression processing on the second multi-level feature maps of the plurality of target detection objects, counting the non-maximum suppression results, and outputting the first counting identification result of the plurality of target detection objects in the target image.
3. The method of claim 1, wherein the second counting model comprises a feature processing network, a true value density processing network, and an expected processing network;
inputting the target image into a pre-trained second counting model for quantity recognition, and outputting second counting recognition results of the plurality of target detection objects in the target image, wherein the second counting recognition results comprise:
processing the input target image through the feature processing network to obtain feature vectors of the plurality of target detection objects;
processing the feature vectors of the plurality of target detection objects through the true value density processing network to obtain a true value density map of the positions of the plurality of target detection objects;

processing the true value density map of the positions of the plurality of target detection objects through the expected processing network to obtain an expected value of the number of the plurality of target detection objects;

and outputting the second counting identification result of the plurality of target detection objects in the target image according to the expected value of the number of the plurality of target detection objects.
4. The method of claim 1, wherein the model training process of the pre-trained first counting model comprises the following steps:
acquiring a plurality of first target images, wherein the number of first target detection objects in the first target images is smaller than a preset number threshold;
for each first target image, marking the area where each first target detection object in the first target image is located to obtain second area marking information;
generating a first training sample set according to the plurality of first target images and the corresponding second region marking information, and training a first counting model to be trained through the first training sample set and a loss function to obtain a pre-trained first counting model; wherein the loss function is a two-class cross entropy loss function for the target mark.
5. The method of claim 4, wherein generating a first training sample set from the plurality of first target images and their corresponding second region marking information comprises:
preprocessing the plurality of first target images to obtain a plurality of preprocessed first target images;

generating the first training sample set according to the plurality of preprocessed first target images and the second region marking information corresponding to the plurality of first target images;

wherein the preprocessing includes at least one of: scaling the plurality of first target images, cropping the plurality of first target images, stitching the plurality of first target images, flipping the plurality of first target images, and adjusting the brightness of the plurality of first target images.
6. The method of claim 4, wherein training the first counting model to be trained through the first training sample set and the loss function to obtain the pre-trained first counting model comprises:
inputting each first target image in the first training sample set into the first counting model to be trained to obtain prediction area marking information of the first target detection objects, wherein the prediction area marking information is used for predicting and marking the area where each first target detection object in the first target images is located;
and training the first counting model to be trained according to the prediction area marking information, the second area marking information and the loss function to obtain the pre-trained first counting model, wherein the training process is used for enabling the first number of first target detection objects marked in the prediction area marking information to approach the second number of first target detection objects marked in the second area marking information.
7. The method of claim 6, wherein training the first counting model to be trained according to the prediction area marking information, the second area marking information, and the loss function comprises:
acquiring a first center point of the predicted area corresponding to the prediction area marking information of the first target detection objects and a second center point of the marked area corresponding to the second area marking information;
determining a loss value according to the Euclidean distance between the first center point and the second center point and the loss function;
and training the first counting model to be trained according to the loss value.
8. The method of claim 6, wherein inputting each first target image in the first training sample set into the first counting model to be trained to obtain the prediction area marking information of the first target detection objects comprises:
if the number of first target detection objects in the first target images in the first training sample set is not smaller than a specified threshold, identifying features of the first target detection objects at different scales by using a preset multi-scale identification frame to obtain features of the first target detection objects at multiple scales; wherein the specified threshold is less than the preset number threshold;

encoding the features of the first target detection objects at the multiple scales to obtain a feature offset;

obtaining the prediction area marking information of the first target detection objects according to the feature offset and the initial prediction area marking information; wherein the initial prediction area marking information is output by the first counting model to be trained according to an input first target image in which the number of first target detection objects is not smaller than the specified threshold.
9. The method of claim 1, wherein the model training process of the pre-trained second counting model comprises the following steps:
acquiring a plurality of second target images, wherein the number of second target detection objects in the second target images is not less than a preset number threshold;
for each second target image, marking the position point of each second target detection object in the second target image to obtain second position point marking information;

generating a second training sample set according to the plurality of second target images and the corresponding second position point marking information, and training a second counting model to be trained through the second training sample set and a loss function to obtain the pre-trained second counting model; wherein the loss function is determined from a metric function for the position marks of the training sample set and a Bayesian loss function at a specified expected value.
10. The method of claim 9, wherein generating a second training sample set from the plurality of second target images and their corresponding second location point marker information comprises:
preprocessing the plurality of second target images to obtain a plurality of preprocessed second target images;

generating the second training sample set according to the plurality of preprocessed second target images and the second position point marking information corresponding to the plurality of second target images;

wherein the preprocessing includes at least one of: scaling the plurality of second target images, cropping the plurality of second target images, stitching the plurality of second target images, flipping the plurality of second target images, and adjusting the brightness of the plurality of second target images.
11. The method of claim 9, wherein training the second counting model to be trained through the second training sample set and the loss function to obtain the pre-trained second counting model comprises:
inputting each second target image in the second training sample set into the second counting model to be trained to obtain predicted position point marking information of the second target detection objects, wherein the predicted position point marking information is used for predicting and marking the position point of each second target detection object in the second target image;
and training the second counting model to be trained according to the predicted position point marking information of the second target detection objects, the second position point marking information and the loss function to obtain the pre-trained second counting model.
12. The method of claim 11, wherein training the second counting model to be trained according to the predicted position point marking information of the second target detection objects, the second position point marking information, and the loss function comprises:
acquiring a first true value density map of the predicted position points corresponding to the predicted position point marking information of the second target detection objects and a second true value density map of the marked position points corresponding to the second position point marking information of the second target detection objects;

calculating the density difference between the first true value density map and the second true value density map;

and training the feature processing network, the true value density processing network and the expected processing network in the second counting model to be trained according to the predicted position point marking information of the second target detection objects, the second position point marking information and the density difference.
13. An object recognition apparatus, comprising:
an acquisition module for acquiring a target image including a plurality of target detection objects;
the first output module is used for inputting the target image into a pre-trained first counting model for quantity recognition and outputting first counting recognition results of the plurality of target detection objects in the target image;
the second output module is used for inputting the target image into a pre-trained second counting model for quantity recognition if the quantity of the plurality of target detection objects represented by the first counting recognition result is larger than a preset quantity threshold value, and outputting a second counting recognition result of the plurality of target detection objects in the target image;
and the determining module is used for determining the number of target detection objects in the target image according to the second counting and identifying result.
14. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed by the processor, cause the processor to perform the steps of the method of any one of claims 1-12.
15. A storage medium storing computer executable instructions for causing a computer to perform the method of any one of claims 1-12.
CN202211031300.7A 2022-08-26 2022-08-26 Target identification method, device, equipment and storage medium Pending CN116129211A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211031300.7A CN116129211A (en) 2022-08-26 2022-08-26 Target identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211031300.7A CN116129211A (en) 2022-08-26 2022-08-26 Target identification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116129211A true CN116129211A (en) 2023-05-16

Family

ID=86296118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211031300.7A Pending CN116129211A (en) 2022-08-26 2022-08-26 Target identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116129211A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117765480A (en) * 2024-02-20 2024-03-26 天科院环境科技发展(天津)有限公司 Method and system for early warning migration of wild animals along road
CN117765480B (en) * 2024-02-20 2024-05-10 天科院环境科技发展(天津)有限公司 Method and system for early warning migration of wild animals along road


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination