CN111815671A - Target quantity statistical method, system, computer device and storage medium - Google Patents


Info

Publication number
CN111815671A
CN111815671A
Authority
CN
China
Prior art keywords
tracker
target
current
existing
frame image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910284714.2A
Other languages
Chinese (zh)
Other versions
CN111815671B (en)
Inventor
张谷力
吴旻烨
Current Assignee
Yaoke Intelligent Technology Shanghai Co ltd
Original Assignee
Yaoke Intelligent Technology Shanghai Co ltd
Priority date
Filing date
Publication date
Application filed by Yaoke Intelligent Technology Shanghai Co ltd
Priority to CN201910284714.2A
Publication of CN111815671A
Application granted
Publication of CN111815671B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30242 Counting objects in image

Abstract

The target quantity statistical method, system, computer device, and storage medium of the present application acquire consecutive frames of images and sequentially perform the following processing on each frame: detecting each target in the current frame image through a detector; constructing a current tracker for each target detected in the current frame image to form a current tracker set, where each current tracker has a time attribute; comparing each current tracker in the constructed current tracker set with the existing tracker set to judge the attribution of the current tracker and thereby update the existing tracker set; and judging the validity of each tracker by checking whether the set value of its time attribute meets a predetermined time condition, the number of valid trackers being the target quantity. By repeatedly detecting and tracking, and by generating and updating a corresponding tracker for each target, the application obtains the target quantity; because tracking in principle requires far less computation than detection and yields more stable counting results, both efficiency and accuracy are guaranteed.

Description

Target quantity statistical method, system, computer device and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and a system for counting a target number, a computer device, and a storage medium.
Background
A stable and reliable real-time people counting system can play a crucial role in many fields. For example, real-time people-flow conditions in public places can help decision makers spot potential dangers and dynamically schedule public transportation resources to disperse crowds, and real-time people-flow conditions in amusement parks can help visitors choose the order in which to visit attractions. In addition, advances in electronics manufacturing have made video streams ever easier to acquire; for example, a single smart device is now enough for live broadcasting. In traditional video surveillance, people are counted manually by reviewing the video, which raises labor costs and is inefficient; moreover, when one person occludes another, it is difficult even for a human observer to obtain an accurate count.
To address these problems, a related real-time people counting system adds a depth camera to the RGB video collector: the RGB video stream is fed to a multi-class classifier to detect people, and the depth information is used to judge whether occlusion occurs, improving the counting accuracy.
Such a combination can indeed improve accuracy slightly, but the recognition accuracy of multi-class classifiers for occluded human bodies in complex scenes is not high, which fundamentally limits the final accuracy of the system.
Disclosure of Invention
In view of the above shortcomings of the prior art, the present application aims to provide a target quantity statistical method, system, computer device, and storage medium that solve problems of the prior art such as inefficient people counting.
To achieve the above and other related objects, the present application provides a method for counting a target quantity, comprising: acquiring consecutive frames of images and sequentially performing the following processing on each frame: upon receiving each current frame image, detecting each target in the current frame image through a detector; constructing a current tracker for each target detected in the current frame image to form a current tracker set, where each current tracker has a time attribute whose set value is related to the acquisition time of the current frame image; comparing each current tracker in the constructed current tracker set with each existing tracker in the existing tracker set, built from the previous frame images, to judge whether the current tracker and the existing tracker belong to the same target; updating the existing tracker set according to the attribution of each current tracker, the updating comprising: updating the set value of the time attribute of each existing tracker to that of the current tracker belonging to the same target, and adding each current tracker that belongs to the same target as no existing tracker into the existing tracker set as the existing tracker of a newly added target; and judging the validity of each tracker in the updated existing tracker set by checking whether the set value of its time attribute meets a predetermined time condition, and counting the trackers judged valid to obtain the target quantity.
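The per-frame detect, associate, update, and count cycle described above can be sketched roughly as follows. This is a minimal illustration rather than the patent's implementation: the `Tracker` class, the IoU-based `same_target` test, and all threshold values are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count()

@dataclass
class Tracker:
    """Hypothetical tracker: one per detected target, stamped with frame time."""
    bbox: tuple          # (x, y, w, h) of the target's bounding box
    last_seen: float     # time attribute: acquisition time of the latest frame
    target_id: int = field(default_factory=lambda: next(_ids))

def same_target(a: Tracker, b: Tracker, iou_threshold: float = 0.5) -> bool:
    """Placeholder association test; the patent compares feature differences."""
    ax, ay, aw, ah = a.bbox
    bx, by, bw, bh = b.bbox
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return union > 0 and inter / union >= iou_threshold

def update(existing: list, detections: list, frame_time: float) -> list:
    """One frame step: build current trackers, merge them into the existing set."""
    current = [Tracker(bbox=d, last_seen=frame_time) for d in detections]
    for cur in current:
        for old in existing:
            if same_target(cur, old):
                old.last_seen = cur.last_seen   # refresh matched existing tracker
                break
        else:
            existing.append(cur)                # unmatched: newly added target
    return existing

def count_valid(existing: list, now: float, max_age: float = 2.0) -> int:
    """Target quantity = trackers whose time attribute is recent enough."""
    return sum(1 for t in existing if now - t.last_seen < max_age)
```

A target seen in two consecutive frames keeps a single refreshed tracker, while a box that matches nothing spawns a new one, so the valid-tracker count tracks the number of distinct targets.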
In an embodiment of the present application, the detector is configured to obtain target feature information of each target in each frame of image; wherein the target feature information includes: one or more combinations of position characteristic information of a region where the target is located, skeleton characteristic information of the target and confidence of the skeleton characteristic information; each of the trackers is associated with the feature information of a corresponding target in a frame image.
In an embodiment of the present application, comparing each current tracker in the constructed current tracker set with each existing tracker in the existing tracker set, built from the previous frame images, to judge whether they belong to the same target comprises: judging whether each current tracker and each existing tracker belong to the same target according to the difference in target feature information between them, including: judging according to one or more combinations of the differences in the position feature information of the region where the target is located, the skeleton feature information of the target, and the confidence of the skeleton feature information.
In an embodiment of the present application, the region where the target is located is marked by a bounding box; the position feature information of the region where the target is located comprises the position information of the bounding box; and the difference between two pieces of such position feature information comprises the ratio of the overlapping area of the two bounding boxes to their combined (union) area.
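The ratio described here, overlap area over combined area, is the familiar intersection-over-union measure; a small sketch (the `(x1, y1, x2, y2)` box convention is an assumption):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes score 1.0, disjoint boxes 0.0, so a fixed threshold on this ratio can decide whether two bounding boxes plausibly mark the same target.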
In an embodiment of the present application, the skeleton feature information comprises the position information of the skeleton key points, and the difference in skeleton feature information comprises the position offsets of the skeleton key points.
In an embodiment of the present application, the predetermined time condition comprises: the difference between the set value of the time attribute and the current time is smaller than a predetermined threshold.
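This time condition reduces to a single comparison; a minimal sketch, where the threshold value and the use of seconds are illustrative assumptions:

```python
def meets_time_condition(time_attribute: float, current_time: float,
                         threshold_seconds: float = 1.0) -> bool:
    """A tracker is judged valid while the difference between its time
    attribute and the current time stays below the predetermined threshold."""
    return current_time - time_attribute < threshold_seconds
```

A tracker whose target disappears stops having its time attribute refreshed, so it fails this check after the threshold elapses and drops out of the count.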
To achieve the above and other related objects, the present application provides a target quantity statistical system, comprising: a receiving module for acquiring consecutive frames of images; and a processing module configured to perform the following processing on each acquired frame image: upon receiving each current frame image, detecting each target in the current frame image through a detector; constructing a current tracker for each target detected in the current frame image to form a current tracker set, where each current tracker has a time attribute whose set value is related to the acquisition time of the current frame image; comparing each current tracker in the constructed current tracker set with each existing tracker in the existing tracker set, built from the previous frame images, to judge whether the current tracker and the existing tracker belong to the same target; updating the existing tracker set according to the attribution of each current tracker, the updating comprising: updating the set value of the time attribute of each existing tracker to that of the current tracker belonging to the same target, and adding each current tracker that belongs to the same target as no existing tracker into the existing tracker set as the existing tracker of a newly added target; and judging the validity of each tracker in the updated existing tracker set by checking whether the set value of its time attribute meets a predetermined time condition, and counting the trackers judged valid to obtain the target quantity.
In an embodiment of the present application, the detector is configured to obtain target feature information of each target in each frame of image; wherein the target feature information includes: one or more combinations of position characteristic information of a region where the target is located, skeleton characteristic information of the target and confidence of the skeleton characteristic information; each of the trackers is associated with the feature information of a corresponding target in a frame image.
In an embodiment of the present application, comparing each current tracker in the constructed current tracker set with each existing tracker in the existing tracker set, built from the previous frame images, to judge whether they belong to the same target comprises: judging whether each current tracker and each existing tracker belong to the same target according to the difference in target feature information between them, including: judging according to one or more combinations of the differences in the position feature information of the region where the target is located, the skeleton feature information of the target, and the confidence of the skeleton feature information.
In an embodiment of the present application, the region where the target is located is marked by a bounding box; the position feature information of the region where the target is located comprises the position information of the bounding box; and the difference between two pieces of such position feature information comprises the ratio of the overlapping area of the two bounding boxes to their combined (union) area.
In an embodiment of the present application, the skeleton feature information comprises the position information of the skeleton key points, and the difference in skeleton feature information comprises the position offsets of the skeleton key points.
In an embodiment of the present application, the predetermined time condition comprises: the difference between the set value of the time attribute and the current time is smaller than a predetermined threshold.
To achieve the above and other related objects, the present application provides a computer apparatus, comprising: one or more communicators, at least one of which communicates with one or more cameras to acquire the consecutive image frames captured by the one or more cameras; one or more memories for storing computer programs; and one or more processors, coupled to the one or more communicators and memories, for executing the computer programs to perform any one of the target quantity statistical methods.
To achieve the above and other related objects, the present application provides a computer-readable storage medium storing a computer program, which when executed by one or more processors performs any one of the target quantity statistical methods.
As described above, the target quantity statistical method, system, computer device, and storage medium of the present application acquire consecutive frames of images and sequentially perform the following processing on each frame: upon receiving each current frame image, detecting each target in the current frame image through a detector; constructing a current tracker for each target detected in the current frame image to form a current tracker set, where each current tracker has a time attribute whose set value is related to the acquisition time of the current frame image; comparing each current tracker in the constructed current tracker set with each existing tracker in the existing tracker set, built from the previous frame images, to judge whether the current tracker and the existing tracker belong to the same target; updating the existing tracker set according to the attribution of each current tracker, the updating comprising: updating the set value of the time attribute of each existing tracker to that of the current tracker belonging to the same target, and adding each current tracker that belongs to the same target as no existing tracker into the existing tracker set as the existing tracker of a newly added target; and judging the validity of each tracker in the updated existing tracker set by checking whether the set value of its time attribute meets a predetermined time condition, and counting the trackers judged valid to obtain the target quantity. By repeatedly detecting and tracking each frame image, and by generating and updating a corresponding tracker for each target, the present application obtains the target quantity from the number of trackers; because tracking in principle requires far less computation than detection and yields more stable counting results, both efficiency and accuracy are guaranteed.
Drawings
Fig. 1 is a schematic structural diagram of an application scenario in an embodiment of the present application.
Fig. 2 is a schematic circuit diagram of a processing device according to an embodiment of the present disclosure.
Fig. 3 is a schematic flowchart of a target quantity statistical method in an embodiment of the present application.
Fig. 4 is a schematic diagram of the tracking principle in an embodiment of the present application.
Fig. 5 is a functional block diagram of a target quantity statistical system in the embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and its several details are capable of modifications and/or changes in various respects, all without departing from the spirit of the present application. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings so that those skilled in the art to which the present application pertains can easily carry out the present application. The present application may be embodied in many different forms and is not limited to the embodiments described herein.
In order to clearly explain the present application, components that are not related to the description are omitted, and the same reference numerals are given to the same or similar components throughout the specification.
Throughout the specification, when it is said that a certain component is "connected" or "coupled" to another component, this includes not only the case of "directly connecting" but also the case of "indirectly connecting" with other elements interposed therebetween. In addition, when a component is referred to as "including" a certain constituent element, unless otherwise stated, it means that the component may include other constituent elements, without excluding other constituent elements.
When an element is referred to as being "on" another element, it can be directly on the other element, or intervening elements may also be present. When a component is referred to as being "directly on" another component, there are no intervening components present.
Although the terms first, second, etc. may be used herein to describe various elements in some instances, these elements should not be limited by these terms. These terms are only used to distinguish one element from another; for example, a first interface and a second interface. Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition occurs only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the singular forms "a", "an" and "the" include plural forms as long as the words do not expressly indicate a contrary meaning. The term "comprises/comprising" when used in this specification is taken to specify the presence of stated features, regions, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of other features, regions, integers, steps, operations, elements, and/or components.
Terms indicating "lower", "upper", and the like relative to space may be used to more easily describe a relationship of one component with respect to another component illustrated in the drawings. Such terms are intended to include not only the meanings indicated in the drawings, but also other meanings or operations of the device in use. For example, if the device in the figures is turned over, elements described as "below" other elements would then be oriented "above" the other elements. Thus, the exemplary terms "under" and "beneath" all include above and below. The device may be rotated 90 or other angles and the terminology representing relative space is also to be interpreted accordingly.
Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. Terms defined in commonly used dictionaries are to be interpreted as having meanings consistent with those in the related art documents and the content of the present disclosure, and must not be interpreted in an excessively idealized or formal sense unless so defined.
In public places such as airports, railway stations, and amusement parks, pedestrian identification, tracking, and people counting are realized by means of image processing; however, as mentioned above, the existing solutions are inefficient, and even when a depth camera is added to resolve occlusion, the multi-class classifier that distinguishes pedestrians by their image features remains unsatisfactory in accuracy.
In view of the above, the technical solution of the present application is to provide a solution for image processing with low computation amount, high efficiency and accuracy, so as to solve the problems in the prior art one by one.
As shown in fig. 1, a schematic structural diagram of an application scenario in the embodiment of the present application is shown.
In this embodiment, an application scenario of the technical solution of the present application may be a system shown in fig. 1, where the system includes: one or more cameras 101, and a processing device 102.
The camera 101 may be an ordinary camera, a single-lens reflex camera, a video camera, etc.; in addition, when there are a plurality of cameras 101, they may be independent of one another or integrated into the same camera array.
The processing device 102 is communicatively coupled to the camera 101 to enable the camera 101 to transmit image data to the processing device 102 and/or the processing device 102 to send control instructions to the camera 101.
In some examples, the communication connection may be a wired connection over an electrical line, for example a standard cable connected to the USB or HDMI interface of the other end.
In some examples, the communication connection may also be a wireless connection, for example, a connection through a wireless communicator of the opposite end, such as WiFi, bluetooth, mobile communication module (2G/3G/4G/5G), and the like.
In some examples, the communication connection may also be a network connection, i.e., a long-range communication connection over a local area network and/or the internet.
The processing device 102 has processing and computing capabilities, and takes image data (such as one or more of photos, videos, and the like) collected by the camera 101 as input, and performs image computing processing to output results, such as target tracking results, target quantity statistics, and the like.
The processing device 102 may vary in its implementation according to the application scenarios of the various embodiments, for example, in some examples, the processing device 102 may be integrated as a component in the same device as the camera 101.
Alternatively, in some examples, the processing device 102 may be located as a different device from the camera 101, and the processing device 102 may be a server, a desktop computer, a laptop computer, a tablet computer, a smart phone, or the like.
In some embodiments, the processing device 102 may also access the network 103 to connect to the remote device 104. In some embodiments, the network may include one or more networks of any type, such as a Local Area Network (LAN), a Wide Area Network (WAN), a Metropolitan Area Network (MAN), a telephone network, such as the Public Switched Telephone Network (PSTN) or a Public Land Mobile Network (PLMN), an intranet, the internet, a storage device, or a combination of networks. The PLMN may further include a packet-switched subnet, such as a General Packet Radio Service (GPRS), Cellular Digital Packet Data (CDPD), or mobile IP subnet.
In some embodiments, the remote device 104 is an electronic terminal with a display, such as a desktop computer, a notebook computer, a tablet computer, or a smart phone. The remote device 104 may display results obtained by the processing means 102 processing the image, such as target tracking results, target quantity statistics, and the like.
Referring to fig. 2, a schematic structural diagram of a processing apparatus 200 according to an embodiment of the present disclosure is shown.
The processing device 200 may be implemented by an architecture of a computer system, comprising: one or more communicators 201, memory 202, and processors 203.
In the present embodiment, the number of components shown in the drawings is only an example, and is not limited thereto.
The one or more communicators 201 include a first communicator that communicates with the camera and can be implemented with an interface circuit communicating over a wired connection (e.g., USB, HDMI) or a wireless connection (e.g., WiFi, 2G/3G/4G/5G). Optionally, the one or more communicators 201 may further include a second communicator, which may be implemented using, for example, a wired communication circuit or a wireless communication circuit (e.g., WiFi, 2G/3G/4G/5G) for connecting to a communication network to communicate with a network device.
The memory 202 is used for storing computer instructions.
The processor 203, coupled to the communicator 201 and the memory 202, is used to execute the computer instructions to perform the desired image processing functions, such as object recognition, tracking, and statistics.
The memory 202 may include, but is not limited to, high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
The processor 203 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
Fig. 3 is a schematic flow chart showing a target quantity statistical method in the embodiment of the present application. Alternatively, the method may be implemented by the processor of fig. 2 executing computer instructions.
The method obtains the target quantity by performing image operations on consecutive frame images (which may come from a captured video) obtained from a camera.
The method acquires consecutive frames of images and sequentially performs the following processing on each frame:
step S301: upon receiving each current frame image, each object in the current frame image is detected by a detector.
The detector is used to obtain the target feature information of each target in each frame image.
In some embodiments, the detector may be implemented by a deep learning target detection algorithm, such as R-CNN, Faster R-CNN, R-FCN, YOLO, SSD, and the like.
Each frame image may contain one or more targets. After detection by the detector, each target may be assigned a unique target identifier (target ID), and the region where the target is located can be determined; the region may be marked in the image with a bounding box, for example, or further marked by semantic segmentation.
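The per-target detector output described here, a unique ID plus a marked region, might be represented as follows; the `Detection` class and the `assign_ids` helper are hypothetical names introduced for illustration:

```python
from dataclasses import dataclass
from itertools import count

_next_id = count(1)   # monotonically increasing target IDs

@dataclass
class Detection:
    """Hypothetical per-target detector output: a unique ID and a region."""
    target_id: int     # unique identifier assigned to the detected target
    bbox: tuple        # bounding box (x1, y1, x2, y2) marking the region

def assign_ids(raw_boxes):
    """Wrap raw detector boxes with unique target IDs."""
    return [Detection(target_id=next(_next_id), bbox=b) for b in raw_boxes]
```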
Optionally, to mitigate the occlusion problem, the detector may preferably detect skeleton feature information of the target and associate it with the target's ID to represent the target. Obtaining the skeleton feature information is in effect an estimation of skeleton key points: the skeleton feature information comprises a plurality of skeleton key points, generally joint points, such as one or more of the right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee, left ankle, vertex, and neck, and the skeleton is obtained by connecting the skeleton key points.
There are many algorithms for human skeleton extraction, such as models proposed in papers at ECCV (European Conference on Computer Vision) 2016 and 2018, and they can be implemented using an LSTM network.
Accordingly, in an embodiment, the target feature information includes: one or more combinations of position characteristic information of a region where the target is located, skeleton characteristic information of the target and confidence of the skeleton characteristic information; each of the trackers is associated with the feature information of a corresponding target in a frame image.
Optionally, the position feature information of the region where the target is located may be represented by partial feature information of the aforementioned bounding box, for example the position of its center point and its size.
The skeleton feature information of the target may include the position information (representable by coordinates) of each skeleton key point of the target skeleton, and the confidence is the probability value estimated for each skeleton key point's feature information.
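A sketch of how keypoint positions and confidences might be combined when computing the position offset between two skeleton estimates; the keypoint names, dictionary layout, and confidence cutoff are assumptions, not the patent's specification:

```python
import math

def skeleton_offset(kps_a, kps_b, conf_a, conf_b, min_conf=0.3):
    """Mean position offset over keypoints both estimates are confident about.

    kps_*:  dict mapping keypoint name (e.g. "neck") -> (x, y) coordinates
    conf_*: dict mapping keypoint name -> confidence in [0, 1]
    """
    shared = [k for k in kps_a
              if k in kps_b and conf_a[k] >= min_conf and conf_b[k] >= min_conf]
    if not shared:
        return float("inf")   # nothing reliable to compare
    return sum(math.dist(kps_a[k], kps_b[k]) for k in shared) / len(shared)
```

Discarding low-confidence keypoints keeps occluded joints from distorting the offset, which is one plausible reason the confidence is carried alongside the keypoint positions.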
Step S302: construct a current tracker for each target detected from the current frame image to form a current tracker set.
In some embodiments, the tracker may be trained and generated from features in the image data of the region in which the target is located.
Tracking algorithms based on target features generally rely on one of the following representations:
1) Point representation: the target is represented by a point, i.e., its center point. A relatively large target may instead be represented by a set of points.
2) Geometric representation: the target region is represented by a defined ellipse or rectangular box.
3) Skeleton representation: a skeleton model of the object is extracted by applying a medial-axis transform to the target's contour.
4) Contour representation: the contour of the object represents the object's boundary.
Optionally, the skeleton representation of 3) may be adopted, and the tracker is trained and generated from the skeleton feature information of the target in the image data. The tracker may be obtained according to a tracking algorithm implemented on the basis of the aforementioned skeleton extraction algorithms.
Wherein the current tracker has a time attribute, and the set value of the time attribute is related to the acquisition time of the current frame image.
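Step S302 can be sketched as follows: one tracker object per detected target, carrying a time attribute set from the frame's acquisition time. All names are illustrative assumptions, not this application's implementation:

```python
import itertools

_next_id = itertools.count()  # monotonically increasing tracker IDs

class Tracker:
    """Minimal per-target tracker with a time attribute (illustrative sketch)."""

    def __init__(self, target_features, frame_time):
        self.tracker_id = next(_next_id)
        self.features = target_features  # skeleton / bounding-box features of the target
        self.last_seen = frame_time      # the tracker's time attribute

def build_current_trackers(detections, frame_time):
    # One current tracker for each target detected in the current frame image.
    return [Tracker(d, frame_time) for d in detections]

current_set = build_current_trackers(["target_a", "target_b"], frame_time=1000.0)
```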
As shown in fig. 4, the design idea of the present application is to establish a tracker 403 for each target 402 in each frame of image 400 detected by the detector 401, and maintain the effective trackers 403, so that the number of targets to be counted can be obtained according to the number of trackers 403.
Preferably, when a detector 401 and tracker 403 capable of detecting and tracking the skeleton feature information of the skeleton 404 of a target 402 are used, occlusion problems, such as a target being blocked by an object or targets occluding one another, can be handled more effectively.
Step S303: compare each current tracker in the constructed current tracker set with each existing tracker in the existing tracker set built from the previous frame image, to judge whether each current tracker and each existing tracker belong to the same target.
Specifically, by comparing an existing tracker of a target in the previous frame image with a current tracker of a target in the current frame image, it can be determined whether the two belong to the same target; if they do, repeated counting can be avoided.
The comparison specifically comprises: judging whether the current tracker and the existing tracker belong to the same target according to the differences in one or more of the position feature information of the region where the target is located, the skeleton feature information of the target, and the confidence of the skeleton feature information.
For example, let each tracker i hold, for the target it tracks, the position P_k^i of each skeleton keypoint k, the confidence C_k^i of each skeleton keypoint, and the position information A_i of the bounding box of the target.
The degree of difference between tracker i and tracker j (e.g., an existing tracker and a current tracker) can then be calculated, for example, with a formula of the following form:

D(i, j) = α · Σ_k ‖P_k^i − P_k^j‖ + β · Σ_k |C_k^i − C_k^j| + γ · (1 − IoU(A_i, A_j))

where α, β, and γ are positive numbers that separately control the weight of each part. The first part represents the position offset between corresponding keypoints of the targets tracked by the two trackers, the second part represents the difference in the confidences predicted by the keypoint network, and the third part is based on the proportion of the overlapping area of the two human-body bounding boxes; a smaller value of D(i, j) indicates greater similarity.
Of course, the above formula is only an example, and the composition, calculation parameters, etc. may be changed in practical cases, and the invention is not limited thereto.
The similarity (or difference) calculated according to this principle indicates how close two trackers are; that is, the similarity between a current tracker in the current frame image and an existing tracker in the previous frame image can be evaluated, and if they prove similar, the two trackers belong to the same target.
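As an illustration of such a comparison, the following sketch computes a weighted three-part difference of the kind described above (keypoint position offsets, confidence differences, and the bounding-box overlap proportion). The function names, the (x1, y1, x2, y2) box format, and the default weights are assumptions, not this application's implementation:

```python
import math

def iou(a, b):
    # a, b: (x1, y1, x2, y2) axis-aligned boxes; overlap area over union area.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def dissimilarity(kps_i, kps_j, conf_i, conf_j, box_i, box_j,
                  alpha=1.0, beta=1.0, gamma=1.0):
    # Three weighted parts mirroring the formula in the text:
    # keypoint position offsets, keypoint confidence differences,
    # and (1 - IoU) of the two bounding boxes. Lower means more similar.
    pos = sum(math.dist(kps_i[k], kps_j[k]) for k in kps_i)
    conf = sum(abs(conf_i[k] - conf_j[k]) for k in conf_i)
    return alpha * pos + beta * conf + gamma * (1.0 - iou(box_i, box_j))

kp = {"neck": (0.0, 0.0), "right_wrist": (1.0, 2.0)}
cf = {"neck": 0.9, "right_wrist": 0.8}
```

A pair of trackers with identical keypoints, confidences, and boxes yields a difference of zero, so a threshold on this value decides "same target".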
Step S304: update the existing tracker set according to the attribution of each current tracker, including: updating the set value of the time attribute of each existing tracker to the set value of the time attribute of the current tracker belonging to the same target; and adding each current tracker that does not belong to the same target as any existing tracker into the existing tracker set as the existing tracker of a newly added target.
Specifically, the existing tracker and the current tracker that are determined to be the same target may be merged, that is, the set value of the time attribute of the current tracker is assigned to the existing tracker of the same target. For example, if the time attribute of current tracker A has value a, and after comparison A is found to belong to the same target as existing tracker B, whose time attribute has an older value b, then b is updated to a.
If a current tracker does not belong to the same target as any existing tracker, it corresponds to a target newly appearing in the current frame image, and this current tracker is added to the existing tracker set.
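The update of step S304 can be sketched as a greedy match between current and existing trackers. `same_target` here returns a difference score and a pair below `threshold` is treated as one target; these names and the stand-in `_T` tracker are illustrative assumptions:

```python
def update_existing_set(existing, current, same_target, threshold):
    """Merge current trackers into the existing set (sketch of step S304)."""
    matched = set()
    for cur in current:
        best, best_score = None, threshold
        for ex in existing:
            score = same_target(ex, cur)
            if score < best_score and id(ex) not in matched:
                best, best_score = ex, score
        if best is not None:
            best.last_seen = cur.last_seen  # refresh the existing tracker's time attribute
            matched.add(id(best))
        else:
            existing.append(cur)            # newly appeared target: add to the set
    return existing

class _T:  # minimal stand-in tracker with just the attributes the sketch needs
    def __init__(self, name, last_seen):
        self.name, self.last_seen = name, last_seen

existing = [_T("a", 0.0)]
updated = update_existing_set(
    existing, [_T("a", 5.0), _T("b", 5.0)],
    same_target=lambda e, c: 0.0 if e.name == c.name else 99.0,
    threshold=1.0,
)
```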
Step S305: judge the validity of each tracker in the updated existing tracker set by checking whether the set value of its time attribute meets a preset time condition.
In some embodiments, the preset time condition comprises: the difference between the set value of the time attribute and the current time is smaller than a preset threshold.
Specifically, if the difference between the set value of the time attribute of an existing tracker and the current time is greater than the preset threshold, the existing tracker has not been updated, that is, its corresponding target may have disappeared, and it may be judged to be an invalid tracker.
Step S306: count the number of trackers judged to be valid to obtain the target quantity.
Counting the valid trackers in the updated existing tracker set yields the number of targets to be counted.
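Steps S305 and S306 reduce to a freshness filter plus a count; `max_age` plays the role of the preset threshold on the time attribute. The names below are assumptions for illustration:

```python
def count_valid(existing, now, max_age):
    # A tracker is valid if its time attribute is recent enough
    # (now - last_seen < max_age); the target count is the number
    # of valid trackers in the existing set.
    valid = [t for t in existing if now - t.last_seen < max_age]
    return len(valid)

class _T:  # minimal stand-in tracker holding only a time attribute
    def __init__(self, last_seen):
        self.last_seen = last_seen

trackers = [_T(9.0), _T(2.0), _T(7.5)]
n = count_valid(trackers, now=10.0, max_age=3.0)  # only the trackers seen recently count
```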
Thereafter, steps S301 to S306 may be repeated to process successive frame images in turn, thereby obtaining a final statistical target number.
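The per-frame processing of steps S301 to S306 can be sketched as a single loop. The helper names `detect` and `same_target` and the list-based tracker records are illustrative assumptions:

```python
def count_targets(frames, detect, same_target, max_age):
    """Sketch of the repeated S301-S306 loop over successive frame images."""
    existing = []  # list of [features, last_seen] tracker records
    count = 0
    for frame_time, frame in frames:
        for feat in detect(frame):                    # S301-S302: detect, build current trackers
            for entry in existing:                    # S303-S304: match against existing set
                if same_target(entry[0], feat):
                    entry[0], entry[1] = feat, frame_time  # same target: refresh time attribute
                    break
            else:
                existing.append([feat, frame_time])   # newly appeared target
        # S305-S306: keep only trackers updated recently enough, then count them
        existing = [e for e in existing if frame_time - e[1] < max_age]
        count = len(existing)
    return count

frames = [(0.0, ["a", "b"]), (1.0, ["a"]), (2.0, ["a", "c"])]
n = count_targets(frames, detect=lambda f: f,
                  same_target=lambda x, y: x == y, max_age=2.0)
```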
It should be noted that counting the number of targets based on a skeleton detector performs well: even under severe occlusion, the skeleton information can still be detected reliably.
In addition, the strategy of alternating detection and tracking on each frame of image, rather than a pure detection method, brings two improvements. First, efficiency increases, because tracking is in principle much less computationally intensive than detection. Second, the counting result is more stable: in some cases, for example under unstable illumination, the tracker can still obtain a good tracking result, whereas the detector may produce severely jumping results.
Fig. 5 is a schematic diagram showing functional modules of the target quantity statistical system in the embodiment of the present application.
It should be noted that the principle of the present embodiment is basically the same as that of the previous embodiment, and various technical features in the embodiment may be applied to the present embodiment, so that the following description is not repeated.
The target quantity statistical system comprises:
a receiving module 501, configured to obtain consecutive frames of images;
a processing module 502, configured to perform the following processing on each acquired frame image: when each current frame image is received, each target in the current frame image is detected through a detector; constructing a current tracker for each target detected from the current frame image to form a current tracker set; the current tracker has a time attribute, and the set value of the time attribute is related to the acquisition time of the current frame image; comparing each current tracker in the constructed current tracker set with each existing tracker in the existing tracker set according to the previous frame image to judge whether each current tracker and each existing tracker belong to the same target or not; updating the existing tracker set according to the attribution condition of each current tracker, wherein the updating comprises the following steps: updating the set value of the time attribute of each existing tracker to the set value of the time attribute of the current tracker belonging to the same target; adding the current tracker which does not belong to the same target with each existing tracker into the existing tracker set as the existing tracker of the newly added target; and judging the effectiveness of each tracker in the updated existing tracker set by judging whether the set value of the time attribute accords with a preset time condition, and counting the number of each tracker judged to be effective to obtain the target number.
In an embodiment of the present application, the detector is configured to obtain target feature information of each target in each frame of image; wherein the target feature information includes: one or more combinations of position characteristic information of a region where the target is located, skeleton characteristic information of the target and confidence of the skeleton characteristic information; each of the trackers is associated with the feature information of a corresponding target in a frame image.
In an embodiment of the present application, the comparing each current tracker in the constructed current tracker set with each existing tracker in the existing tracker set according to the previous frame image to determine whether each current tracker and each existing tracker belong to the same target includes: judging whether each current tracker and each existing tracker belong to the same target or not according to the difference of target characteristic information between the current tracker and the existing trackers, wherein the judging comprises the following steps: and judging whether the current tracker and the existing tracker belong to the same target or not according to one or more combinations of differences among the position characteristic information of the region where the target is located, the skeleton characteristic information of the target and the confidence coefficient of the skeleton characteristic information.
In an embodiment of the present application, the area where the target is located is marked by a bounding box; the position characteristic information of the area where the target is located comprises: position information of the bounding box; the difference between the position characteristic information of the area where the target is located comprises the proportion of the overlapping area between the two bounding boxes in the combined area of the two bounding boxes.
In an embodiment of the present application, the skeleton feature information includes: position information of the skeleton key points, the difference of the skeleton feature information including a position offset of the skeleton key points.
In an embodiment of the present application, the predetermined time condition includes: and the difference value between the set value of the time attribute and the current time is smaller than a preset threshold value.
It should be noted that the division of the modules in the system embodiment of fig. 5 is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these modules can be realized in the form of software called by processing element; or may be implemented entirely in hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the processing module may be a processing element separately set up, or may be implemented by being integrated in a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code, and a function of the processing module may be called and executed by a processing element of the apparatus. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. For another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
In addition, the various computer instructions involved in the method embodiment of FIG. 3 may be loaded onto a computer-readable storage medium, which may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (compact disc read-only memory), magneto-optical disks, ROMs (read-only memory), RAMs (random access memory), EPROMs (erasable programmable read-only memory), EEPROMs (electrically erasable programmable read-only memory), magnetic or optical cards, flash memory, or other media/machine-readable media suitable for storing machine-executable instructions. The computer-readable storage medium may be a stand-alone product that has not been connected to a computer device, or a component contained in a computer device in use.
In particular implementations, the computer programs are routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
To sum up, the target quantity statistical method, system, computer device and storage medium of the present application acquire successive frames of images and sequentially perform the following processing for each frame of image: when each current frame image is received, each target in the current frame image is detected by a detector; a current tracker is constructed for each target detected from the current frame image to form a current tracker set, where the current tracker has a time attribute whose set value is related to the acquisition time of the current frame image; each current tracker in the constructed current tracker set is compared with each existing tracker in the existing tracker set built from the previous frame image to judge whether they belong to the same target; the existing tracker set is updated according to the attribution of each current tracker, including updating the set value of the time attribute of each existing tracker to that of the current tracker belonging to the same target, and adding each current tracker that does not belong to the same target as any existing tracker into the existing tracker set as the existing tracker of a newly added target; the validity of each tracker in the updated existing tracker set is judged by checking whether the set value of its time attribute meets a preset time condition, and the number of trackers judged valid is counted to obtain the target quantity.
By repeatedly detecting and tracking each frame of image, the present application generates and updates a corresponding tracker for each target and then derives the target quantity from the number of trackers; since tracking is in principle far less computationally intensive than detection and yields a more stable counting result, both efficiency and accuracy are ensured.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims (14)

1. A method for counting a target quantity, comprising:
acquiring continuous frames of images, and sequentially executing the following processing on each frame of image:
when each current frame image is received, each target in the current frame image is detected through a detector;
constructing a current tracker for each target detected from the current frame image to form a current tracker set; the current tracker has a time attribute, and the set value of the time attribute is related to the acquisition time of the current frame image;
comparing each current tracker in the constructed current tracker set with each existing tracker in the existing tracker set according to the previous frame image to judge whether each current tracker and each existing tracker belong to the same target or not;
updating the existing tracker set according to the attribution condition of each current tracker, wherein the updating comprises the following steps: updating the set value of the time attribute of each existing tracker to the set value of the time attribute of the current tracker belonging to the same target; adding the current tracker which does not belong to the same target with each existing tracker into the existing tracker set as the existing tracker of the newly added target;
and judging the effectiveness of each tracker in the updated existing tracker set by judging whether the set value of the time attribute accords with a preset time condition, and counting the number of each tracker judged to be effective to obtain the target number.
2. The method of claim 1, wherein the detector is configured to obtain object feature information of each object in each frame of image; wherein the target feature information includes: one or more combinations of position characteristic information of a region where the target is located, skeleton characteristic information of the target and confidence of the skeleton characteristic information; each of the trackers is associated with the feature information of a corresponding target in a frame image.
3. The method according to claim 2, wherein the comparing each current tracker in the constructed current tracker set with each existing tracker in the existing tracker set according to the previous frame image to determine whether each current tracker and each existing tracker belong to the same target includes:
judging whether each current tracker and each existing tracker belong to the same target or not according to the difference of target characteristic information between the current tracker and the existing trackers, wherein the judging comprises the following steps: and judging whether the current tracker and the existing tracker belong to the same target or not according to one or more combinations of differences among the position characteristic information of the region where the target is located, the skeleton characteristic information of the target and the confidence coefficient of the skeleton characteristic information.
4. The method of claim 3, wherein the region in which the target is located is marked by a bounding box; the position characteristic information of the area where the target is located comprises: position information of the bounding box; the difference between the position characteristic information of the area where the target is located comprises the proportion of the overlapping area between the two bounding boxes in the combined area of the two bounding boxes.
5. The method of claim 3, wherein the skeletal feature information comprises: position information of the skeleton key points, the difference of the skeleton feature information including a position offset of the skeleton key points.
6. The method of claim 1, wherein the preset time condition comprises: and the difference value between the set value of the time attribute and the current time is smaller than a preset threshold value.
7. A target quantity statistics system, comprising:
the receiving module is used for acquiring continuous images of each frame;
a processing module, configured to perform the following processing on each acquired frame image:
when each current frame image is received, each target in the current frame image is detected through a detector; constructing a current tracker for each target detected from the current frame image to form a current tracker set; the current tracker has a time attribute, and the set value of the time attribute is related to the acquisition time of the current frame image; comparing each current tracker in the constructed current tracker set with each existing tracker in the existing tracker set according to the previous frame image to judge whether each current tracker and each existing tracker belong to the same target or not; updating the existing tracker set according to the attribution condition of each current tracker, wherein the updating comprises the following steps: updating the set value of the time attribute of each existing tracker to the set value of the time attribute of the current tracker belonging to the same target; adding the current tracker which does not belong to the same target with each existing tracker into the existing tracker set as the existing tracker of the newly added target; and judging the effectiveness of each tracker in the updated existing tracker set by judging whether the set value of the time attribute accords with a preset time condition, and counting the number of each tracker judged to be effective to obtain the target number.
8. The system of claim 7, wherein the detector is configured to obtain object feature information of each object in each frame of image; wherein the target feature information includes: one or more combinations of position characteristic information of a region where the target is located, skeleton characteristic information of the target and confidence of the skeleton characteristic information; each of the trackers is associated with the feature information of a corresponding target in a frame image.
9. The system according to claim 8, wherein the comparing each current tracker in the constructed current tracker set with each existing tracker in the existing tracker set according to the previous frame image to determine whether each current tracker and each existing tracker belong to the same target includes:
judging whether each current tracker and each existing tracker belong to the same target or not according to the difference of target characteristic information between the current tracker and the existing trackers, wherein the judging comprises the following steps: and judging whether the current tracker and the existing tracker belong to the same target or not according to one or more combinations of differences among the position characteristic information of the region where the target is located, the skeleton characteristic information of the target and the confidence coefficient of the skeleton characteristic information.
10. The system of claim 9, wherein the region in which the target is located is marked by a bounding box; the position characteristic information of the area where the target is located comprises: position information of the bounding box; the difference between the position characteristic information of the area where the target is located comprises the proportion of the overlapping area between the two bounding boxes in the combined area of the two bounding boxes.
11. The system of claim 9, wherein the skeletal feature information comprises: position information of the skeleton key points, the difference of the skeleton feature information including a position offset of the skeleton key points.
12. The system of claim 7, wherein the preset time condition comprises: and the difference value between the set value of the time attribute and the current time is smaller than a preset threshold value.
13. A computer device, comprising:
one or more communicators, at least one of which communicates with one or more cameras to acquire successive image frames acquired by the one or more cameras;
one or more memories for storing computer programs;
one or more processors, coupled to the one or more communicators and memory, for executing the computer program to perform the method of any of claims 1-6.
14. A computer-readable storage medium, in which a computer program is stored which, when being executed by one or more processors, carries out the method according to any one of claims 1 to 6.
CN201910284714.2A 2019-04-10 2019-04-10 Target quantity counting method, system, computer device and storage medium Active CN111815671B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910284714.2A CN111815671B (en) 2019-04-10 2019-04-10 Target quantity counting method, system, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910284714.2A CN111815671B (en) 2019-04-10 2019-04-10 Target quantity counting method, system, computer device and storage medium

Publications (2)

Publication Number Publication Date
CN111815671A true CN111815671A (en) 2020-10-23
CN111815671B CN111815671B (en) 2023-09-15

Family

ID=72843739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910284714.2A Active CN111815671B (en) 2019-04-10 2019-04-10 Target quantity counting method, system, computer device and storage medium

Country Status (1)

Country Link
CN (1) CN111815671B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10105712A (en) * 1996-09-30 1998-04-24 Mitsubishi Heavy Ind Ltd Moving body tracking device
CN101751549A (en) * 2008-12-03 2010-06-23 Industrial Technology Research Institute Method for tracking moving object
US20150317797A1 (en) * 2012-11-28 2015-11-05 Zte Corporation Pedestrian tracking and counting method and device for near-front top-view monitoring video
CN106127812A (en) * 2016-06-28 2016-11-16 Sun Yat-sen University A kind of passenger flow statistical method of non-gate area, passenger station based on video monitoring
US20170061229A1 (en) * 2015-09-01 2017-03-02 Sony Corporation Method and system for object tracking
CN106709938A (en) * 2016-11-18 2017-05-24 University of Electronic Science and Technology of China Multi-target tracking method based on improved TLD (tracking-learning-detected)
US20170177948A1 (en) * 2012-03-29 2017-06-22 The Nielsen Company (Us), Llc Methods and apparatus to count people in images
US9805474B1 (en) * 2016-05-09 2017-10-31 Iteris, Inc. Pedestrian tracking at a traffic intersection to identify vulnerable roadway users for traffic signal timing, pedestrian safety, and traffic intersection control
WO2018121286A1 (en) * 2016-12-30 2018-07-05 Ninebot (Beijing) Tech Co., Ltd. Target tracking method and device
CN108932496A (en) * 2018-07-03 2018-12-04 Beijing Jiage Tiandi Technology Co., Ltd. The quantity statistics method and device of object in region
CN108986064A (en) * 2017-05-31 2018-12-11 Hangzhou Hikvision Digital Technology Co., Ltd. A kind of people flow rate statistical method, equipment and system
CN109344746A (en) * 2018-09-17 2019-02-15 Yaoke Intelligent Technology (Shanghai) Co., Ltd. Pedestrian counting method, system, computer equipment and storage medium
CN109460702A (en) * 2018-09-14 2019-03-12 South China University of Technology Passenger's abnormal behaviour recognition methods based on human skeleton sequence

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
DJAMEL MERAD et al.: "Fast People Counting Using Head Detection from Skeleton Graph", 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance, pages 1-8 *
KHEIR-EDDINE AZIZ et al.: "Pedestrian Head Detection and Tracking Using Skeleton Graph for People Counting in Crowded Environments", IAPR Conference on Machine Vision Applications, pages 1-4 *
WANG Aiping: "Research on Video Object Tracking Technology", China Doctoral Dissertations Full-text Database, Information Science and Technology, pages 138-85 *
DONG Liang: "Theory and Methods of Video Multi-Object Tracking", China Master's Theses Full-text Database, Information Science and Technology, pages 138-725 *
GAO Fei; FENG Minqiang; WANG Minqian; LU Shufang; XIAO Gang: "Research on a People Counting Method Based on Hot-Region Definition", Computer Science, no. 1, pages 173-178 *

Also Published As

Publication number Publication date
CN111815671B (en) 2023-09-15

Similar Documents

Publication Publication Date Title
CN109035304B (en) Target tracking method, medium, computing device and apparatus
WO2022002039A1 (en) Visual positioning method and device based on visual map
US11145069B2 (en) Target tracking method and apparatus, electronic device, and storage medium
CN107808111B (en) Method and apparatus for pedestrian detection and attitude estimation
US20220237736A1 (en) Panoramic image and video splicing method, computer-readable storage medium, and panoramic camera
WO2018188453A1 (en) Method for determining human face area, storage medium, and computer device
US20170213081A1 (en) Methods and systems for automatically and accurately detecting human bodies in videos and/or images
CN111178245A (en) Lane line detection method, lane line detection device, computer device, and storage medium
US11776257B2 (en) Systems and methods for enhancing real-time image recognition
CN104601964A (en) Non-overlap vision field trans-camera indoor pedestrian target tracking method and non-overlap vision field trans-camera indoor pedestrian target tracking system
WO2022206680A1 (en) Image processing method and apparatus, computer device, and storage medium
CN115170893B (en) Training method of common-view gear classification network, image sorting method and related equipment
CN116582653B (en) Intelligent video monitoring method and system based on multi-camera data fusion
CN113657434A (en) Human face and human body association method and system and computer readable storage medium
TWI745818B (en) Method and electronic equipment for visual positioning and computer readable storage medium thereof
CN111429476A (en) Method and device for determining action track of target person
CN107358621B (en) Object tracking method and device
CN107480580B (en) Image recognition method and image recognition device
CN115103120A (en) Shooting scene detection method and device, electronic equipment and storage medium
CN112446845A (en) Map construction method, map construction device, SLAM system, and storage medium
CN110135224B (en) Method and system for extracting foreground target of surveillance video, storage medium and terminal
CN114463800A (en) Multi-scale feature fusion face detection and segmentation method based on generalized intersection-parallel ratio
WO2022206679A1 (en) Image processing method and apparatus, computer device and storage medium
CN111815671A (en) Target quantity statistical method, system, computer device and storage medium
CN114821676A (en) Passenger flow human body detection method and device, storage medium and passenger flow statistical camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant