CN110472599B - Object quantity determination method and device, storage medium and electronic equipment - Google Patents

Object quantity determination method and device, storage medium and electronic equipment

Info

Publication number
CN110472599B
CN110472599B (application CN201910769944.8A)
Authority
CN
China
Prior art keywords
image
objects
processed
numerical value
preset threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910769944.8A
Other languages
Chinese (zh)
Other versions
CN110472599A (en)
Inventor
郁昌存
王德鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Shuke Haiyi Information Technology Co Ltd
Jingdong Technology Information Technology Co Ltd
Original Assignee
Beijing Haiyi Tongzhan Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Haiyi Tongzhan Information Technology Co Ltd filed Critical Beijing Haiyi Tongzhan Information Technology Co Ltd
Priority to CN201910769944.8A priority Critical patent/CN110472599B/en
Publication of CN110472599A publication Critical patent/CN110472599A/en
Priority to PCT/CN2020/108677 priority patent/WO2021031954A1/en
Application granted granted Critical
Publication of CN110472599B publication Critical patent/CN110472599B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an object quantity determination method, an object quantity determination device, a computer readable storage medium and electronic equipment, and belongs to the technical field of computer vision. The method comprises the following steps: identifying objects in an image to be processed, and taking the number of the identified objects as a first numerical value; comparing the first value with a preset threshold value; if the first numerical value is smaller than the preset threshold value, determining the number of the objects in the image to be processed as the first numerical value; and if the first numerical value is larger than the preset threshold value, performing density detection on the objects in the image to be processed to obtain a second numerical value related to the number of the objects, and determining the number of the objects in the image to be processed as the second numerical value. The method and the device can accurately determine the number of the objects under the condition that the objects are densely distributed, and have high applicability.

Description

Object quantity determination method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a method for determining an object number, an apparatus for determining an object number, a computer-readable storage medium, and an electronic device.
Background
In many cases, it is necessary to count the number of certain objects, such as the number of visitors in a scenic spot, the number of vehicles in a parking lot, and the like.
The conventional approach is to count the objects flowing in and out at the entrances and exits of a target area, for example by gates or infrared sensing devices at the entrances of a scenic spot, or barrier devices at the entrances of a parking lot. However, this approach cannot count objects in an open area, such as the number of visitors in an open scenic spot or the number of vehicles in a street; moreover, it can only count the total number of objects in the target area and cannot determine how the objects are distributed.
With the development of deep learning and computer vision, methods have appeared in the prior art for determining the number of objects based on monitoring images. Taking the statistics of the number of tourists in a scenic spot as an example, monitoring cameras are arranged at different positions of the scenic spot, images of the scenic spot are shot in real time, and tourists are identified from the images, so that their number can be counted. Compared with the traditional approach, this is a clear improvement: it can be applied to open areas and can count the distribution of objects within an area. However, when the object density is high, and especially when objects occlude one another, such as in a scenic spot during holidays or a street during the commuting rush hour, the accuracy of the prior art is low; the determined number of objects differs greatly from the actual number, usually falling below it, which limits the application.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure provides an object number determining method, an object number determining apparatus, a computer-readable storage medium, and an electronic device, so as to improve, at least to some extent, the low accuracy of prior-art methods in determining the number of objects when the object density is high.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided an object number determination method, including: identifying objects in an image to be processed, and taking the number of the identified objects as a first numerical value; comparing the first value with a preset threshold value; if the first numerical value is smaller than the preset threshold value, determining the number of the objects in the image to be processed as the first numerical value; and if the first numerical value is larger than the preset threshold value, performing density detection on the objects in the image to be processed to obtain a second numerical value related to the number of the objects, and determining the number of the objects in the image to be processed as the second numerical value.
In an exemplary embodiment of the present disclosure, the method further comprises: acquiring a target image, dividing the target image into a plurality of areas, and taking the image of each area as the image to be processed.
In an exemplary embodiment of the present disclosure, each of the regions has a corresponding preset threshold.
In an exemplary embodiment of the present disclosure, the identifying an object in an image to be processed includes: and identifying the object in the image to be processed through a pre-trained first neural network model.
In an exemplary embodiment of the disclosure, the first neural network model includes a YOLO model (a real-time object detection framework with multiple versions such as v1, v2 and v3, any of which may be employed by the present disclosure).
In an exemplary embodiment of the present disclosure, the performing density detection on an object in the image to be processed includes: and carrying out density detection on the object in the image to be processed through a pre-trained second neural network model.
In an exemplary embodiment of the present disclosure, the second neural network model includes: the first branch network is used for performing first convolution processing on the image to be processed to obtain a first characteristic image; the second branch network is used for carrying out second convolution processing on the image to be processed to obtain a second characteristic image; the third branch network is used for carrying out third convolution processing on the image to be processed to obtain a third characteristic image; the merging layer is used for merging the first characteristic image, the second characteristic image and the third characteristic image into a final characteristic image; and the output layer is used for mapping the final characteristic image into a density image.
According to a second aspect of the present disclosure, there is provided an object number determination apparatus including: the identification module is used for identifying objects in the image to be processed and taking the number of the identified objects as a first numerical value; the comparison module is used for comparing the first numerical value with a preset threshold value; a first determining module, configured to determine, if the first numerical value is smaller than the preset threshold, the number of the objects in the image to be processed as the first numerical value; a second determining module, configured to perform density detection on the objects in the image to be processed if the first value is greater than the preset threshold, to obtain a second value related to the number of the objects, and determine the number of the objects in the image to be processed as the second value.
In an exemplary embodiment of the present disclosure, the apparatus further includes: the acquisition module is used for acquiring a target image, dividing the target image into a plurality of areas, and taking the image of each area as the image to be processed.
In an exemplary embodiment of the present disclosure, each of the regions has a corresponding preset threshold.
In an exemplary embodiment of the disclosure, the recognition module is configured to recognize an object in the image to be processed through a pre-trained first neural network model.
In an exemplary embodiment of the present disclosure, the first neural network model includes a YOLO model.
In an exemplary embodiment of the present disclosure, the second determining module includes: and the density detection unit is used for carrying out density detection on the object in the image to be processed through a pre-trained second neural network model.
In an exemplary embodiment of the present disclosure, the second neural network model includes: the first branch network is used for performing first convolution processing on the image to be processed to obtain a first characteristic image; the second branch network is used for carrying out second convolution processing on the image to be processed to obtain a second characteristic image; the third branch network is used for carrying out third convolution processing on the image to be processed to obtain a third characteristic image; the merging layer is used for merging the first characteristic image, the second characteristic image and the third characteristic image into a final characteristic image; and the output layer is used for mapping the final characteristic image into a density image.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the method of any one of the above via execution of the executable instructions.
Exemplary embodiments of the present disclosure have the following advantageous effects:
An object in the image to be processed is identified, and whether the objects in the image are sparse or dense is judged according to the magnitude relationship between the first value obtained by identification and a preset threshold, so as to determine whether to adopt the first value as the final result or to adopt the second value obtained by density detection as the final result. On one hand, if the first value is greater than the preset threshold, the objects in the image are dense and occlusion may exist; in this case the density detection mode is adopted and the obtained second value is used as the final result, so the number of objects can be determined more accurately, and the exemplary embodiment has higher accuracy. On the other hand, the combination of the object identification mode and the density detection mode offers high flexibility: by adjusting the preset threshold, the exemplary embodiment can be applied to various different scenes, and thus has high applicability.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 shows a flowchart of an object number determination method in the present exemplary embodiment;
FIG. 2 illustrates a scenic spot monitoring image to be processed;
FIG. 3 illustrates a visualization effect graph for tourist identification of a scenic spot surveillance image;
FIG. 4 is a block diagram illustrating a neural network model in the exemplary embodiment;
fig. 5 shows a schematic diagram of dividing a region of a target image in the present exemplary embodiment;
fig. 6 shows a flowchart of another object number determination method in the present exemplary embodiment;
fig. 7 is a block diagram showing the structure of an object quantity determination apparatus in the present exemplary embodiment;
FIG. 8 illustrates a computer-readable storage medium for implementing the above-described method in the present exemplary embodiment;
fig. 9 shows an electronic device for implementing the above method in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Exemplary embodiments of the present disclosure first provide a method for determining the number of objects in an image, and application scenarios of the method include but are not limited to: counting the number of people in scenic spots, shopping malls and other areas; counting the number of vehicles in areas such as parking lots, streets and the like; monitoring the number of ships in ports, wharfs and other areas; the number of animals in the farm is monitored. In the following, a scene in which the number of people in the scenic region is counted is taken as an example, and the contents of the method are also applicable to other scenes.
Fig. 1 shows a method flow of the present exemplary embodiment, which may include steps S110 to S140:
step S110, identifying objects in the image to be processed, and taking the number of identified objects as a first numerical value.
The image to be processed may be a monitoring image of a scenic spot, a GIS image (Geographic Information System image, including satellite views of the earth's surface, population heat maps, etc.), and the like. For example, a background computer or server pulls the video stream of a monitoring camera in the scenic spot; current webcams provide video streams over protocols such as RTMP (Real Time Messaging Protocol) and HTTP (Hyper Text Transfer Protocol), and the online video stream can be pulled through OpenCV (Open Source Computer Vision Library) to obtain real-time video frames, a single frame of which serves as the image to be processed. For example, fig. 2 shows a single-frame monitoring image of a certain scenic spot.
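As a minimal sketch of this step, the following Python snippet pulls a single frame from a stream with OpenCV; the stream URL is hypothetical and would be replaced by the actual camera address in a deployment:

```python
import cv2  # opencv-python

# Hypothetical RTMP address of a scenic-spot camera (an assumption;
# the text only requires some RTMP/HTTP stream reachable by OpenCV).
STREAM_URL = "rtmp://example.com/live/scenic-spot-01"

cap = cv2.VideoCapture(STREAM_URL)
if not cap.isOpened():
    raise RuntimeError("failed to open video stream")

ok, frame = cap.read()  # one real-time video frame
cap.release()
if ok:
    cv2.imwrite("to_be_processed.jpg", frame)  # the image to be processed
```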
After the image to be processed is acquired, the objects in it may be identified. In an exemplary embodiment, a deep learning technique may be employed to identify objects in the image to be processed through a pre-trained first neural network model. For example, the first neural network model may adopt a YOLO model, which may be trained on an open-source dense pedestrian detection data set, or on a data set obtained by manually labeling pictures from the application scene (for example, a large number of scenic-spot monitoring images with the visitors labeled). The YOLO model takes the scenic-spot monitoring image as input and outputs the bounding box information of all guests in the image. For example, when fig. 2 is input into the YOLO model, the visualization of the output can be as shown in fig. 3: the YOLO model identifies the guests in the image and finally yields a bounding box (x, y, w, h) for each guest, where x and y are the position coordinates of the center of the bounding box in the image, and w and h are the width and height of the bounding box. In addition, the first neural network model may also adopt other target detection algorithm models, such as R-CNN (Region-based Convolutional Neural Network, or improved versions such as Fast R-CNN) and SSD (Single Shot MultiBox Detector). In an exemplary embodiment, object contours may also be detected from the image to be processed, and a contour whose shape is close to that of the target object may be recognized as an object.
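As an illustrative sketch of counting detected objects, the snippet below runs an off-the-shelf detector and counts person boxes. The `ultralytics` package and the weight file are assumptions chosen for brevity; the text itself only requires some pre-trained YOLO-family model:

```python
from ultralytics import YOLO  # open-source YOLO implementation (an assumption)

# Hypothetical weights; in practice a model fine-tuned on labeled
# scenic-spot monitoring images would be loaded here.
model = YOLO("yolov8n.pt")

def count_objects(image_path: str, person_class: int = 0) -> int:
    """Return the first value: the number of recognized objects (guests)."""
    result = model(image_path)[0]
    # result.boxes.xywh holds the (x, y, w, h) bounding boxes described above
    return int((result.boxes.cls == person_class).sum())

first_value = count_objects("to_be_processed.jpg")
```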
In the present exemplary embodiment, the number of objects recognized from the image to be processed is a first numerical value.
Step S120, comparing the first value with a preset threshold.
Usually, when the number of objects in the image to be processed is small, each object appears relatively complete in the image and is easy to identify, so the first value obtained in step S110 is close to the real number of objects, that is, the reliability of the first value is high. When the number of objects is large, some objects may be occluded or the image resolution of a single object may be low, making the objects difficult to recognize, so the confidence of the first value is low. As shown in fig. 2 and 3, when there are many visitors in the scenic spot and the first neural network model identifies the visitors in the monitoring image, there are many missed detections in the central area where visitors are dense.
In the exemplary embodiment, whether the first value is credible or not is determined by comparing the relative size of the first value and a preset threshold value, if the first value is smaller than the preset threshold value, an object in the image to be processed is relatively sparse, and the first value is credible; otherwise, the objects in the image to be processed are relatively dense, and the first numerical value is not credible. The preset threshold may be determined according to experience, a region characteristic corresponding to the image to be processed, a size relationship between the image to be processed and the object, and the like, which is not particularly limited in this disclosure.
Step S130, if the first value is smaller than the preset threshold, determining the number of the objects in the image to be processed as the first value.
As described above, when the condition of step S130 is satisfied, the first numerical value is reliable, and therefore, the result can be output as the number of objects in the image to be processed.
Step S140, if the first value is greater than the preset threshold, performing density detection on the objects in the image to be processed to obtain a second value related to the number of the objects, and determining the number of the objects in the image to be processed as the second value.
If the condition of step S140 is satisfied, the first value is not trusted, and a method other than object recognition, namely density detection, may be adopted to determine the number of objects in the image to be processed. Unlike object identification, density detection mainly regresses the probability that an object exists in each region (or at each pixel) of the image to be processed and obtains the number of objects in the image statistically; this is the second value. When there are many objects, and particularly when they are densely distributed and occluded, density detection has higher reliability than object recognition, so the second value can be used as the number of objects in the image to be processed and output as the result.
It should be added that the case where the first value equals the preset threshold may be regarded either as satisfying the condition of step S130 or as satisfying the condition of step S140, so that step S130 or S140 is executed accordingly; this disclosure does not particularly limit this.
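The overall decision of steps S110 to S140 can be sketched as follows; `recognize` and `detect_density` stand in for the first and second neural network models and are assumptions for illustration:

```python
def determine_object_count(image, preset_threshold: int,
                           recognize, detect_density) -> int:
    """Steps S110-S140: pick between the recognition count and the
    density-detection count according to the preset threshold."""
    first_value = recognize(image)      # S110: number of recognized objects
    if first_value < preset_threshold:  # S120/S130: sparse, first value trusted
        return first_value
    # S140 (the equal case is folded in here, as the text permits):
    # dense scene, so fall back to density detection
    second_value = detect_density(image)
    return second_value
```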
In an exemplary embodiment, density detection may be performed on the objects in the image to be processed through a pre-trained second neural network model. For example, the second neural network model may adopt an MCNN (Multi-column Convolutional Neural Network). Fig. 4 shows the structure of the MCNN model 400, which may include: an input layer 410 for inputting the image to be processed; a first branch network 420 for performing first convolution processing on the image to be processed to obtain a first feature image; a second branch network for performing second convolution processing on the image to be processed to obtain a second feature image; a third branch network for performing third convolution processing on the image to be processed to obtain a third feature image; a merging layer for merging the first, second and third feature images into a final feature image; and an output layer for mapping the final feature image into a density image. The first, second and third convolution processing each comprise a series of operations such as convolution and pooling, but with different parameters (such as convolution kernel size and pooling parameters), which is equivalent to extracting features of the image to be processed at different scales to obtain the first, second and third feature images respectively. These are then merged into the final feature image, which is mapped into a density image, for example by a 1×1 convolution. In the density image, the value of each point represents the probability that the point belongs to an object, and accumulating the values of all points yields the second value representing the number of objects in the image to be processed.
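A minimal PyTorch sketch of such a three-branch network is given below. The text does not fix kernel sizes or channel widths; the values here follow the publicly known MCNN paper and are therefore assumptions, each branch is simplified to a single kernel size, and grayscale input is likewise assumed:

```python
import torch
import torch.nn as nn

def make_branch(k: int, c) -> nn.Sequential:
    """One branch: convolutions of kernel size k interleaved with pooling."""
    p = k // 2
    return nn.Sequential(
        nn.Conv2d(1, c[0], k, padding=p), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
        nn.Conv2d(c[0], c[1], k, padding=p), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
        nn.Conv2d(c[1], c[2], k, padding=p), nn.ReLU(inplace=True),
        nn.Conv2d(c[2], c[3], k, padding=p), nn.ReLU(inplace=True),
    )

class MCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.branch1 = make_branch(9, (16, 32, 16, 8))   # first convolution processing
        self.branch2 = make_branch(7, (20, 40, 20, 10))  # second convolution processing
        self.branch3 = make_branch(5, (24, 48, 24, 12))  # third convolution processing
        # merging layer (channel concatenation) followed by a 1x1 output layer
        self.output = nn.Conv2d(8 + 10 + 12, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        merged = torch.cat(
            [self.branch1(x), self.branch2(x), self.branch3(x)], dim=1)
        return self.output(merged)  # density image

density = MCNN()(torch.randn(1, 1, 384, 512))
second_value = float(density.sum())  # accumulate all points -> object count
```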
The training of the MCNN model may be based on an open-source data set in which the annotation for each image is the coordinates of each human head; the head coordinates are converted into a probability density image using a geometry-adaptive Gaussian kernel, such that the probabilities of each head region sum to 1. The original image serves as the sample and the converted probability density image as the label (ground truth), so that the model can be trained.
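A sketch of this label conversion is shown below; for brevity a fixed Gaussian sigma replaces the geometry-adaptive kernel described above, which is an assumption:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_ground_truth(head_coords, shape, sigma: float = 4.0) -> np.ndarray:
    """Convert head coordinates into a probability density image in which
    each head region sums to 1, so the whole map sums to the head count."""
    dots = np.zeros(shape, dtype=np.float32)
    for x, y in head_coords:
        dots[int(y), int(x)] += 1.0
    # Gaussian smoothing preserves total mass (reflect boundary mode)
    return gaussian_filter(dots, sigma)

gt = density_ground_truth([(120, 80), (130, 82)], shape=(384, 512))
assert abs(float(gt.sum()) - 2.0) < 1e-3
```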
It should be understood that the second neural network model may also adopt other density detection networks, for example a variant of MCNN in which a fourth branch network is added to the structure of fig. 4, an intermediate layer is added to the first, second or third branch network, or one or more fully connected layers are added; the disclosure is not limited in this respect.
Based on the above description, the present exemplary embodiment identifies objects in the image to be processed and judges whether the objects in the image are sparse or dense according to the magnitude relationship between the first value obtained by identification and the preset threshold, thereby determining whether to use the first value as the final result or to use the second value obtained by density detection as the final result. On one hand, if the first value is greater than the preset threshold, the objects in the image are dense and occlusion may exist; in this case the density detection mode is adopted and the obtained second value is used as the final result, so the number of objects can be determined more accurately, and the exemplary embodiment has higher accuracy. On the other hand, the combination of the object identification mode and the density detection mode offers high flexibility: by adjusting the preset threshold, the exemplary embodiment can be applied to various different scenes, and thus has high applicability.
In an exemplary embodiment, after the target image is acquired, it may be divided into a plurality of regions, and the image of each region may be used as an image to be processed. The target image is the complete image for which the number of objects is to be determined, for example the original monitoring image of a scenic spot shown in fig. 2. Because the camera is mounted high and covers a wide shooting range, the captured image includes fixed scenery, sky and the like, which interferes with estimating the number of tourists; moreover, the distribution of tourists differs between areas, some dense and some sparse, so the areas can be processed separately in a targeted manner. In view of this, referring to fig. 5, fig. 2 can be divided into a plurality of regions according to prior experience, the method flow of fig. 1 can be performed on each region image, and finally the numbers of objects in the regions are added to obtain the total number of objects in the target image.
In fig. 5, region one is unlikely to contain tourists, so the number of tourists in region one can always be set to 0. Tourists in regions two and three are relatively sparse and fixed scenery occupies a large proportion, so tourists there can be identified by object identification and counted. Region four is where tourists are mainly concentrated; it is dense and seriously occluded, so object identification performs poorly there, and the number of tourists can instead be counted by density detection.
In addition to partitioning the regions based on a priori experience, several exemplary ways are provided below, but the following should not be taken as limiting the scope of the disclosure:
(1) Dividing regions according to the distribution of objects in the target image: first, object recognition is performed on the target image to obtain the approximate position of each object; then a part where objects are densely distributed is roughly selected, and a boundary line is drawn wherever the distance between two objects exceeds a certain distance, yielding a region whose object density (the number of objects in the region divided by the image area of the region) is computed. The region is then gradually expanded in each direction: if the object density increases after an expansion, the expanded region replaces the region before expansion; if the density decreases, the region before expansion is restored. When the object density reaches its maximum, the region is taken as a delimited region. The determined region is segmented from the target image, the process is repeated on the remaining part, and region division is finally completed.
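A greedy sketch of this density-maximizing expansion, under the assumption that object centers from a first recognition pass are available and with an arbitrary illustrative step size, might look as follows:

```python
def grow_region(centers, seed, image_w, image_h, step: int = 20):
    """Expand `seed` = (x0, y0, x1, y1) while the object density increases.

    `centers` are (x, y) object positions from a prior recognition pass;
    the seed rectangle and the 20-pixel step are illustrative assumptions.
    """
    def density(r):
        x0, y0, x1, y1 = r
        n = sum(x0 <= x <= x1 and y0 <= y <= y1 for x, y in centers)
        return n / max((x1 - x0) * (y1 - y0), 1)

    region, improved = seed, True
    while improved:
        improved = False
        for d in ((-step, 0, 0, 0), (0, -step, 0, 0),
                  (0, 0, step, 0), (0, 0, 0, step)):
            cand = (max(region[0] + d[0], 0), max(region[1] + d[1], 0),
                    min(region[2] + d[2], image_w), min(region[3] + d[3], image_h))
            if density(cand) > density(region):
                region, improved = cand, True  # keep the denser expansion
            # otherwise the pre-expansion region is kept (restored)
    return region
```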
(2) Dividing regions according to historical object distribution. This method is suitable for monitoring images where the scene area captured by the camera does not change. A number of representative historical images are retrieved from the monitoring record; for example, from the monitoring images of the past week, several frames taken between two and three o'clock every afternoon (the peak period for tourists in a scenic spot) are selected. The image is divided into many fine squares, and the probability of a tourist appearing in each square is calculated (the number of historical images in which a tourist appears in the square divided by the total number of selected historical images), yielding a probability map; squares with similar probabilities are then connected into a region according to the probability distribution, thereby dividing the image into a plurality of regions. Monitoring images taken thereafter all adopt this region division result.
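The per-square probability computation can be sketched as follows; the boolean-mask input representation and the square size are assumptions for illustration:

```python
import numpy as np

def visitor_probability_map(presence_masks, cell: int = 16) -> np.ndarray:
    """`presence_masks` is a list of boolean HxW arrays, one per selected
    historical image, marking pixels where a tourist was detected.
    Returns a coarse grid holding the appearance probability per square."""
    stack = np.stack(presence_masks)  # (N, H, W)
    n, h, w = stack.shape
    grid = stack[:, :h - h % cell, :w - w % cell]
    grid = grid.reshape(n, h // cell, cell, w // cell, cell).any(axis=(2, 4))
    # images in which a tourist appears in the square / total images
    return grid.mean(axis=0)
```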
After the target image is divided into a plurality of regions, the method of fig. 1 is performed on each region image. The preset thresholds used for the regions may be the same or different; that is, the regions may share a uniform preset threshold or each have its own. For example, in fig. 5, smaller preset thresholds may be set for regions two and three, and a larger preset threshold for region four. The preset threshold of each region may be determined empirically or calculated from image characteristics. For example: for each region, calculate the area of the partial image where tourists are likely to appear, divide it by the image area occupied by a single tourist to estimate the number of tourists that would fill the region without occlusion, and use this number as the preset threshold, or multiply it by an empirical coefficient smaller than 1 (such as 0.9) to obtain the preset threshold; this disclosure does not particularly limit this. Using a targeted preset threshold for each region allows the total number of objects in the target image to be obtained more accurately.
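This per-region threshold calculation reduces to a one-line formula; the areas in the usage example are made-up figures for illustration:

```python
def region_threshold(visitable_area: float, per_object_area: float,
                     coefficient: float = 0.9) -> int:
    """Estimate a region's preset threshold: the number of objects that
    fill the region without occlusion, scaled by an empirical
    coefficient smaller than 1 (0.9 here, per the example in the text)."""
    return int(visitable_area / per_object_area * coefficient)

# e.g. 50,000 px^2 where tourists may appear, ~900 px^2 per tourist -> 50
threshold_region4 = region_threshold(50_000, 900)
```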
Fig. 6 shows another flow of the present exemplary embodiment, including: step S601, acquiring a target image, which may for example be a monitoring image; step S602, dividing the target image into a plurality of regions; step S603, taking the image of each region as the image to be processed and executing steps S604 to S608 on each: step S604, detecting the number of objects in the image to be processed by object identification to obtain a first value; step S605, comparing the first value with the preset threshold; step S606, if the first value is smaller than the preset threshold, determining the number of objects in the region as the first value; step S607, if the first value is larger than the preset threshold, the first value is not trusted, and object density detection is performed on the image to be processed to obtain a second value; step S608, determining the number of objects in the region as the second value. The number of objects in each region is thus obtained, and step S609 is finally executed: the numbers of objects in the regions are accumulated to obtain the total number of objects in the target image, finally determining the number of objects in the target image.
An exemplary embodiment of the present disclosure also provides an object number determining apparatus, as shown in fig. 7, the apparatus 700 may include: the identification module 710 is configured to identify objects in the image to be processed, and use the number of the identified objects as a first numerical value; a comparing module 720, configured to compare the first value with a preset threshold; a first determining module 730, configured to determine the number of objects in the image to be processed as a first numerical value if the first numerical value is smaller than a preset threshold; the second determining module 740 is configured to, if the first value is greater than the preset threshold, perform density detection on the objects in the image to be processed to obtain a second value related to the number of the objects, and determine the number of the objects in the image to be processed as the second value.
In an exemplary embodiment, the object number determining apparatus 700 may further include: and an acquiring module (not shown in the figure) for acquiring a target image, dividing the target image into a plurality of areas, and taking the image of each area as an image to be processed.
In an exemplary embodiment, each of the regions has a corresponding preset threshold.
In an exemplary embodiment, the recognition module 710 may be configured to recognize the object in the image to be processed through a pre-trained first neural network model.
In an exemplary embodiment, the first neural network model may be a YOLO model.
In an exemplary embodiment, the second determining module 740 may include: and the density detection unit (not shown in the figure) is used for carrying out density detection on the object in the image to be processed through the pre-trained second neural network model.
In an exemplary embodiment, the second neural network model may include: the first branch network is used for performing first convolution processing on the image to be processed to obtain a first characteristic image; the second branch network is used for performing second convolution processing on the image to be processed to obtain a second characteristic image; the third branch network is used for performing third convolution processing on the image to be processed to obtain a third characteristic image; the merging layer is used for merging the first characteristic image, the second characteristic image and the third characteristic image into a final characteristic image; and the output layer is used for mapping the final characteristic image into a density image.
Details of the solution not disclosed in the above apparatus can be found in the embodiments of the method section, and thus are not described again.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module" or "system."
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the terminal device.
Referring to fig. 8, a program product 800 for implementing the above method according to an exemplary embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Exemplary embodiments of the present disclosure also provide an electronic device capable of implementing the above method. An electronic device 900 according to such an exemplary embodiment of the present disclosure is described below with reference to fig. 9. The electronic device 900 shown in fig. 9 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 9, electronic device 900 may take the form of a general purpose computing device. Components of electronic device 900 may include, but are not limited to: the at least one processing unit 910, the at least one storage unit 920, a bus 930 connecting different system components (including the storage unit 920 and the processing unit 910), and a display unit 940.
The storage unit 920 stores program code, which may be executed by the processing unit 910, so that the processing unit 910 performs the steps according to various exemplary embodiments of the present disclosure described in the above-mentioned "exemplary method" section of this specification. For example, the processing unit 910 may perform the method steps shown in fig. 1 or fig. 6, and so on.
The storage unit 920 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)921 and/or a cache memory unit 922, and may further include a read only memory unit (ROM) 923.
Storage unit 920 may also include a program/utility 924 having a set (at least one) of program modules 925, such program modules 925 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 930 can be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 900 may also communicate with one or more external devices 1000 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 900 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interface 950. Also, the electronic device 900 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) via the network adapter 960. As shown, the network adapter 960 communicates with the other modules of the electronic device 900 via the bus 930. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 900, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the exemplary embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit according to an exemplary embodiment of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (8)

1. A method for determining a number of objects, comprising:
acquiring a target image, identifying objects in the target image, selecting a densely distributed part of the objects and marking a boundary line at a position where the distance between the two objects exceeds a certain distance to obtain a region, iteratively expanding the region to increase the object density of the expanded region until the object density reaches the maximum, and segmenting the region from the target image; repeating the process to divide the target image into a plurality of areas, and taking the image of each area as an image to be processed;
identifying the objects in the image to be processed, and taking the number of the identified objects as a first numerical value;
comparing the first value with a preset threshold value;
if the first numerical value is smaller than the preset threshold value, determining the number of the objects in the image to be processed as the first numerical value;
if the first numerical value is larger than the preset threshold value, performing density detection on the objects in the image to be processed to obtain a second numerical value related to the number of the objects, and determining the number of the objects in the image to be processed as the second numerical value;
wherein each of the regions has a corresponding preset threshold determined by:
and respectively calculating the area of partial images in which the objects possibly appear in each region, dividing the area by the area of the image occupied by each object, and multiplying the obtained numerical value by an empirical coefficient smaller than 1 to obtain a corresponding preset threshold value.
2. The method of claim 1, wherein the identifying the object in the image to be processed comprises:
and identifying the object in the image to be processed through a pre-trained first neural network model.
3. The method of claim 2, wherein the first neural network model comprises a YOLO model.
4. The method according to claim 1, wherein the density detection of the object in the image to be processed comprises:
and carrying out density detection on the object in the image to be processed through a pre-trained second neural network model.
5. The method of claim 4, wherein the second neural network model comprises:
the first branch network is used for performing first convolution processing on the image to be processed to obtain a first characteristic image;
the second branch network is used for carrying out second convolution processing on the image to be processed to obtain a second characteristic image;
the third branch network is used for carrying out third convolution processing on the image to be processed to obtain a third characteristic image;
the merging layer is used for merging the first characteristic image, the second characteristic image and the third characteristic image into a final characteristic image;
and the output layer is used for mapping the final characteristic image into a density image.
6. An object quantity determination apparatus, comprising:
the acquisition module is used for acquiring a target image, identifying objects in the target image, selecting a densely distributed part of the objects, marking a boundary line at a position where the distance between the two objects exceeds a certain distance to obtain a region, iteratively expanding the region to increase the object density of the expanded region until the object density reaches the maximum, and segmenting the region from the target image; repeating the process to divide the target image into a plurality of areas, and taking the image of each area as an image to be processed;
the identification module is used for identifying the objects in the image to be processed and taking the number of the identified objects as a first numerical value;
the comparison module is used for comparing the first numerical value with a preset threshold value;
a first determining module, configured to determine, if the first numerical value is smaller than the preset threshold, the number of the objects in the image to be processed as the first numerical value;
a second determining module, configured to perform density detection on the objects in the image to be processed if the first value is greater than the preset threshold, to obtain a second value related to the number of the objects, and determine the number of the objects in the image to be processed as the second value;
wherein each of the regions has a corresponding preset threshold determined by:
and respectively calculating the area of partial images in which the objects possibly appear in each region, dividing the area by the area of the image occupied by each object, and multiplying the obtained numerical value by an empirical coefficient smaller than 1 to obtain a corresponding preset threshold value.
7. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1-5.
8. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any of claims 1-5 via execution of the executable instructions.
CN201910769944.8A 2019-08-20 2019-08-20 Object quantity determination method and device, storage medium and electronic equipment Active CN110472599B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910769944.8A CN110472599B (en) 2019-08-20 2019-08-20 Object quantity determination method and device, storage medium and electronic equipment
PCT/CN2020/108677 WO2021031954A1 (en) 2019-08-20 2020-08-12 Object quantity determination method and apparatus, and storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910769944.8A CN110472599B (en) 2019-08-20 2019-08-20 Object quantity determination method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110472599A CN110472599A (en) 2019-11-19
CN110472599B (en) 2021-09-03

Family

ID=68512644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910769944.8A Active CN110472599B (en) 2019-08-20 2019-08-20 Object quantity determination method and device, storage medium and electronic equipment

Country Status (2)

Country Link
CN (1) CN110472599B (en)
WO (1) WO2021031954A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472599B (en) * 2019-08-20 2021-09-03 北京海益同展信息科技有限公司 Object quantity determination method and device, storage medium and electronic equipment
CN113283499B (en) * 2021-05-24 2022-09-13 南京航空航天大学 Three-dimensional woven fabric weaving density detection method based on deep learning
CN113486732A (en) * 2021-06-17 2021-10-08 普联国际有限公司 Crowd density estimation method, device, equipment and storage medium
CN113807260B (en) * 2021-09-17 2022-07-12 北京百度网讯科技有限公司 Data processing method and device, electronic equipment and storage medium
CN114785943B (en) * 2022-03-31 2024-03-05 联想(北京)有限公司 Data determination method, device and computer readable storage medium
CN115384796A (en) * 2022-04-01 2022-11-25 中国民用航空飞行学院 Airport management system capable of increasing passenger transfer efficiency

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2576316A2 (en) * 2010-05-31 2013-04-10 Central Signal, LLC Train detection
CN102831613B (en) * 2012-08-29 2014-11-19 武汉大学 Parallel fractural network evolution image segmentation method
CN106845344B (en) * 2016-12-15 2019-10-25 重庆凯泽科技股份有限公司 Demographics' method and device
CN108399388A (en) * 2018-02-28 2018-08-14 福州大学 A kind of middle-high density crowd quantity statistics method
CN109389589A (en) * 2018-09-28 2019-02-26 百度在线网络技术(北京)有限公司 Method and apparatus for statistical number of person
CN110472599B (en) * 2019-08-20 2021-09-03 北京海益同展信息科技有限公司 Object quantity determination method and device, storage medium and electronic equipment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1089214A2 (en) * 1999-09-30 2001-04-04 Matsushita Electric Industrial Co., Ltd. Apparatus and method for image recognition
CN101320427A (en) * 2008-07-01 2008-12-10 北京中星微电子有限公司 Video monitoring method and system with auxiliary objective monitoring function
CN107093171A (en) * 2016-02-18 2017-08-25 腾讯科技(深圳)有限公司 A kind of image processing method and device, system
CN108009477A (en) * 2017-11-10 2018-05-08 东软集团股份有限公司 Stream of people's quantity detection method, device, storage medium and the electronic equipment of image
CN110008783A (en) * 2018-01-04 2019-07-12 杭州海康威视数字技术股份有限公司 Human face in-vivo detection method, device and electronic equipment based on neural network model
CN108875587A (en) * 2018-05-24 2018-11-23 北京飞搜科技有限公司 Target distribution detection method and equipment
CN109224442A (en) * 2018-09-03 2019-01-18 腾讯科技(深圳)有限公司 Data processing method, device and the storage medium of virtual scene
CN109815868A (en) * 2019-01-15 2019-05-28 腾讯科技(深圳)有限公司 A kind of image object detection method, device and storage medium

Also Published As

Publication number Publication date
WO2021031954A1 (en) 2021-02-25
CN110472599A (en) 2019-11-19

Similar Documents

Publication Publication Date Title
CN110472599B (en) Object quantity determination method and device, storage medium and electronic equipment
US10880524B2 (en) System and method for activity monitoring using video data
AU2020418608B2 (en) Fine-grained visual recognition in mobile augmented reality
CN106997466B (en) Method and device for detecting road
JP6723247B2 (en) Target acquisition method and apparatus
EP3806064B1 (en) Method and apparatus for detecting parking space usage condition, electronic device, and storage medium
US11908160B2 (en) Method and apparatus for context-embedding and region-based object detection
CN110675407B (en) Image instance segmentation method and device, electronic equipment and storage medium
US9742992B2 (en) Non-uniform curve sampling method for object tracking
KR102387357B1 (en) A method and apparatus for detecting an object in an image by matching a bounding box on a space-time basis
AU2018379393A1 (en) Monitoring systems, and computer implemented methods for processing data in monitoring systems, programmed to enable identification and tracking of human targets in crowded environments
CN113420682A (en) Target detection method and device in vehicle-road cooperation and road side equipment
CN114926766A (en) Identification method and device, equipment and computer readable storage medium
CN114419519B (en) Target object detection method and device, electronic equipment and storage medium
CN113869258A (en) Traffic incident detection method and device, electronic equipment and readable storage medium
CN114022865A (en) Image processing method, apparatus, device and medium based on lane line recognition model
CN110796003B (en) Lane line detection method and device and electronic equipment
CN114078319A (en) Method and device for detecting potential hazard site of traffic accident
CN116740607A (en) Video processing method and device, electronic equipment and storage medium
CN111860261A (en) Passenger flow value statistical method, device, equipment and medium
CN115431968B (en) Vehicle controller, vehicle and vehicle control method
CN114820692B (en) State analysis method, device, storage medium and terminal for tracking target
CN114359495A (en) Three-dimensional obstacle data enhancement method, device, medium and edge computing device
CN115861388A (en) Calibration method for registration point cloud in high-precision map and model training method and device
CN114842465A (en) License plate detection method and device, electronic equipment, medium and intelligent transportation equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Patentee after: Jingdong Technology Information Technology Co.,Ltd.

Address before: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Patentee before: Jingdong Shuke Haiyi Information Technology Co.,Ltd.

Address after: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Patentee after: Jingdong Shuke Haiyi Information Technology Co.,Ltd.

Address before: 601, 6 / F, building 2, No. 18, Kechuang 11th Street, Daxing District, Beijing, 100176

Patentee before: BEIJING HAIYI TONGZHAN INFORMATION TECHNOLOGY Co.,Ltd.

CP01 Change in the name or title of a patent holder