CN116229369A - Method, device and equipment for detecting people flow and computer readable storage medium - Google Patents

Method, device and equipment for detecting people flow and computer readable storage medium

Info

Publication number
CN116229369A
Authority
CN
China
Prior art keywords
image
detection
detection frame
person
image sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310226256.3A
Other languages
Chinese (zh)
Other versions
CN116229369B (en)
Inventor
吴新涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiayang Smart Security Technology Beijing Co ltd
Original Assignee
Jiayang Smart Security Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiayang Smart Security Technology Beijing Co., Ltd.
Priority to CN202310226256.3A
Publication of CN116229369A (application)
Application granted
Publication of CN116229369B (grant)
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present application provides a method, a device, equipment and a computer readable storage medium for detecting people flow, where the method includes: acquiring a first image including a person object obtained when a first scene is shot; preprocessing the first image to obtain a second image; identifying the second image through an instance segmentation model to obtain a detection frame of the person object in the second image, an instance category of the person object and a confidence of the detection frame; screening, based on a non-maximum suppression algorithm, a first detection frame set corresponding to each instance category from the detection frames; screening, from each first detection frame set, second detection frames whose confidence is greater than a preset confidence threshold; and determining the people flow corresponding to the first scene according to the number of the second detection frames. With this detection method, the people flow corresponding to the first scene can be accurately determined.

Description

Method, device and equipment for detecting people flow and computer readable storage medium
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a method, a device, equipment and a computer readable storage medium for detecting people flow.
Background
In densely populated areas such as airports, gymnasiums and schools, the people flow often needs to be detected in real time, so that abnormal conditions can be found promptly and corresponding early warnings issued when crowds become too dense, allowing people to be guided and flow to be limited in time, thereby avoiding potential safety hazards.
In the prior art, infrared detection devices are generally used to detect the people flow in densely populated areas. However, when groups of people pass through the detection area together, infrared detection devices often miss some of them, so that the people flow is detected inaccurately.
Disclosure of Invention
The embodiments of the present application provide a method, a device, equipment and a computer readable storage medium for detecting people flow, which can accurately detect the people flow in densely populated areas.
In a first aspect, an embodiment of the present application provides a method for detecting people flow, where the method includes: acquiring a first image including a person object obtained when a first scene is shot; preprocessing the first image to obtain a second image; identifying the second image through an instance segmentation model to obtain a detection frame of the person object in the second image, an instance category of the person object and a confidence of the detection frame; screening, based on a non-maximum suppression algorithm, a first detection frame set corresponding to each instance category from the detection frames; screening, from each first detection frame set, second detection frames whose confidence is greater than a preset confidence threshold; and determining the people flow corresponding to the first scene according to the number of the second detection frames.
According to an embodiment of the first aspect of the present application, after the people flow corresponding to the first scene is determined according to the number of the second detection frames, the detection method further includes: determining the people flow type corresponding to the people flow according to the correspondence between people flow and type; and uploading the first image to a target device in a case that the people flow type is a target type.
According to any one of the foregoing embodiments of the first aspect of the present application, the acquiring of the first image including the person object obtained when the first scene is shot specifically includes: acquiring a shot video of the first scene; and performing person object recognition on the video frames in the video to obtain the first image including the person object.
According to any one of the foregoing embodiments of the first aspect of the present application, the screening, based on a non-maximum suppression algorithm, of the first detection frame set corresponding to each instance category from the detection frames specifically includes performing the following operations for each instance category: determining, according to the confidences of the detection frames, a first target detection frame with the highest confidence corresponding to the instance category; calculating the intersection-over-union (IoU) ratio between the first target detection frame and the other detection frames, the other detection frames being the detection frames of the instance category other than the first target detection frame; screening, from the other detection frames, second target detection frames whose IoU ratio is less than or equal to a preset IoU threshold; and determining the first detection frame set according to the first target detection frame and the second target detection frames.
According to any of the foregoing embodiments of the first aspect of the present application, the instance segmentation model is constructed from a feature pyramid network, a region proposal network, a candidate region matching algorithm, a fast target detection algorithm and a fully convolutional network.
According to any one of the foregoing embodiments of the first aspect of the present application, before the second image is identified through the instance segmentation model to obtain the detection frame of the person object in the second image, the instance category of the person object and the confidence of the detection frame, the detection method further includes: acquiring a plurality of training samples, where each training sample includes a first person image sample in a scene sample and a first label image sample corresponding to the first person image sample; preprocessing each first person image sample to obtain preprocessed first person image samples; and performing model training according to the preprocessed first person image samples and the first label image samples to obtain a trained instance segmentation model.
According to any of the foregoing embodiments of the first aspect of the present application, the preprocessing of the first image to obtain the second image specifically includes: adjusting the size of each first image to obtain first images of a target size; and standardizing each first image of the target size to obtain the second images.
According to any one of the foregoing embodiments of the first aspect of the present application, before each first person image sample is preprocessed, the detection method further includes: expanding the first person image samples and the first label image samples based on an automatic data enhancement algorithm to obtain second person image samples and second label image samples. In this case, the preprocessing specifically includes preprocessing each second person image sample to obtain preprocessed second person image samples, and the model training specifically includes performing model training according to the preprocessed second person image samples and the second label image samples to obtain the trained instance segmentation model.
In a second aspect, an embodiment of the present application provides a device for detecting people flow, where the device includes: an acquisition module, configured to acquire a first image including a person object obtained when a first scene is shot; a first processing module, configured to preprocess the first image to obtain a second image; an identification module, configured to identify the second image through an instance segmentation model to obtain a detection frame of the person object in the second image, an instance category of the person object and a confidence of the detection frame; a first screening module, configured to screen, based on a non-maximum suppression algorithm, a first detection frame set corresponding to each instance category from the detection frames; a second screening module, configured to screen, from each first detection frame set, second detection frames whose confidence is greater than a preset confidence threshold; and a first determining module, configured to determine the people flow corresponding to the first scene according to the number of the second detection frames.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the method for detecting people flow provided in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the steps of the method for detecting people flow provided in the first aspect.
According to the method, device, equipment and computer readable storage medium for detecting people flow provided by the embodiments of the present application, the first image obtained by shooting is preprocessed into the second image before being input into the instance segmentation model, so that the model can identify the person objects in the image; the second image including the person objects is then identified through the instance segmentation model to obtain the detection frames of the person objects in the second image, the instance categories of the person objects and the confidences of the detection frames. With the instance segmentation model, even when persons in the second image are dense and occlude one another, instance segmentation can still be performed accurately on the person objects in the second image, avoiding missed detections. Meanwhile, to avoid the same person object corresponding to a plurality of detection frames, after the detection frames are obtained, a first detection frame set corresponding to each instance category is screened from the detection frames based on a non-maximum suppression algorithm, and second detection frames whose confidence is greater than a preset confidence threshold are then screened from each first detection frame set, which increases the probability that each person object corresponds to exactly one detection frame. The number of the second detection frames can therefore measure the number of person objects in the first scene, so that the people flow corresponding to the first scene can be accurately determined according to the number of the second detection frames, improving the accuracy of people flow detection.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. A person skilled in the art may obtain other drawings from these drawings without inventive effort.
Fig. 1 is a flow chart of a method for detecting people flow according to an embodiment of the present application;
fig. 2 is a flow chart of another method for detecting people flow according to an embodiment of the present application;
fig. 3 is a flow chart of another method for detecting people flow according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a device for detecting people flow according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application are described in detail below. To make the objects, technical solutions and advantages of the present application clearer, the application is further described in conjunction with the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended only to illustrate the application, not to limit it. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by showing examples of it.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises the element.
It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from its spirit or scope. Accordingly, the present application is intended to cover such modifications and variations as fall within the scope of the appended claims and their equivalents. The embodiments provided in the present application may be combined with each other where there is no contradiction.
Before describing the technical solution provided by the embodiments of the present application, in order to facilitate understanding of the embodiments of the present application, the present application first specifically describes a problem existing in the prior art:
as described above, the inventor of the present application found that in the prior art, the infrared detection device is generally used to detect the traffic of people in the area where people are densely located, but when the phenomenon that people are densely located through the detection area occurs, the phenomenon that people are missed to be detected often occurs when the infrared detection device is used, so that the traffic detection is inaccurate, and further, a proper early warning cannot be sent according to the traffic detection result.
In view of the above findings of the inventor, the embodiments of the present application provide a method, a device, equipment and a computer readable storage medium for detecting people flow, which can solve the technical problem in the prior art that the people flow cannot be accurately detected.
The method for detecting people flow provided by the embodiments of the present application is described first.
Fig. 1 is a flow chart of a method for detecting people flow according to an embodiment of the present application. As shown in fig. 1, the method may include the following steps S101 to S106:
s101, acquiring a first image including a person object, which is obtained when a first scene is shot.
A video of the first scene to be detected is shot in real time by a monitoring device, where the first scene may be a densely populated scene such as an airport, a gymnasium or a school; the first image including the person object to be detected is then obtained from the shot video of the first scene.
S102, preprocessing the first image to obtain a second image.
The size of each first image is adjusted to the target size, and each resized first image is standardized, so as to obtain second images that conform to the input conditions of the instance segmentation model.
S103, identifying the second image through the instance segmentation model to obtain a detection frame of the person object, an instance type of the person object and a confidence degree of the detection frame in the second image.
The second image is input into a pre-trained instance segmentation model and identified by the model; the model output is the detection frame of each person object in the second image, the instance category of the person object and the confidence of the detection frame, where the detection frame of a person object is determined by the center coordinates of the detection frame together with its width and height.
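For orientation, the following is a minimal inference sketch, assuming torchvision's Mask R-CNN (torchvision >= 0.13) as a stand-in for the NAS-FPN instance segmentation model described below; the function name, the pretrained weights and the (x1, y1, x2, y2) box convention are illustrative assumptions, not part of the patent.

```python
# Sketch of step S103: run an instance segmentation model on a second image.
# torchvision's Mask R-CNN is a stand-in; the patent's NAS-FPN model is assumed
# to expose the same kind of outputs (boxes, labels, scores).
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_person_objects(second_image):
    # second_image: float tensor of shape (3, H, W) with values in [0, 1]
    with torch.no_grad():
        output = model([second_image])[0]
    # boxes: (N, 4) as (x1, y1, x2, y2); labels: instance categories;
    # scores: confidence of each detection frame
    return output["boxes"], output["labels"], output["scores"]
```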
S104, based on a non-maximum suppression algorithm, respectively screening out a first detection frame set corresponding to each instance category from the detection frames.
Based on a non-maximum suppression algorithm, a non-maximum suppression operation is performed on the detection frames in the model output, and the detection frames corresponding to each instance category are screened from the plurality of detection frames to form the first detection frame set of that instance category.
S105, screening out second detection frames with confidence degrees larger than a preset confidence degree threshold value from each first detection frame set.
The second detection frames whose confidence is greater than a preset confidence threshold are screened from the first detection frame set corresponding to each instance category, yielding the second detection frames corresponding to each instance category. Illustratively, the preset confidence threshold may be set to 0.4, although the embodiments of the present application are not limited thereto.
S106, according to the number of the second detection frames, determining the people flow corresponding to the first scene.
The number of the second detection frames can measure the number of the person objects in the second image, so that the person flow corresponding to the first scene can be determined according to the number of the person objects in the second image.
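A compact sketch of steps S105 and S106 follows, under an assumed data layout (a mapping from instance category to the post-NMS boxes with their confidences); the 0.4 threshold follows the example given above.

```python
# Steps S105-S106: keep detection frames whose confidence exceeds the preset
# threshold, then use the count of kept frames as the people flow estimate.
def people_flow(first_box_sets, conf_thresh=0.4):
    # first_box_sets: {instance_category: [(box, confidence), ...]} after NMS
    second_boxes = [
        box
        for boxes in first_box_sets.values()
        for box, conf in boxes
        if conf > conf_thresh
    ]
    return len(second_boxes)  # roughly one kept frame per person object
```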
According to the method for detecting people flow provided by the embodiments of the present application, the first image obtained by shooting is preprocessed into the second image before being input into the instance segmentation model, so that the model can identify the person objects in the image; the second image including the person objects is then identified through the instance segmentation model to obtain the detection frames of the person objects in the second image, the instance categories of the person objects and the confidences of the detection frames. With the instance segmentation model, even when persons in the second image are dense and occlude one another, instance segmentation can still be performed accurately on the person objects in the second image, avoiding missed detections. Meanwhile, to avoid the same person object corresponding to a plurality of detection frames, after the detection frames are obtained, a first detection frame set corresponding to each instance category is screened from the detection frames based on a non-maximum suppression algorithm, and second detection frames whose confidence is greater than a preset confidence threshold are then screened from each first detection frame set, which increases the probability that each person object corresponds to exactly one detection frame. The number of the second detection frames can therefore measure the number of person objects in the first scene, so that the people flow corresponding to the first scene can be accurately determined according to the number of the second detection frames, improving the accuracy of people flow detection.
In some embodiments, after determining the traffic of people corresponding to the first scene according to the number of the second detection frames, the detection method further includes: according to the corresponding relation between the people flow and the type, determining the people flow type corresponding to the people flow; and uploading the first image to the target equipment under the condition that the traffic type is the target type.
Illustratively, after the people flow corresponding to the first scene is detected, whether the current people flow type is a dense type requiring early warning is determined according to the preset correspondence between people flow and type; if so, the acquired first image is uploaded to the target device, so that an operator of the alarm platform can issue an early warning in time according to the first image.
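The branch just described might look as follows; the flow-type bands and the upload callback are assumptions for illustration, since the patent does not fix concrete values.

```python
# Map the measured people flow to a flow type and upload the first image
# to the target device when the type requires early warning.
FLOW_TYPE_BANDS = [(0, 50, "sparse"), (50, 200, "normal"), (200, float("inf"), "dense")]
TARGET_TYPE = "dense"  # assumed: the dense type triggers early warning

def classify_and_report(flow_count, first_image, upload_to_target_device):
    flow_type = "sparse"
    for low, high, name in FLOW_TYPE_BANDS:
        if low <= flow_count < high:
            flow_type = name
            break
    if flow_type == TARGET_TYPE:
        upload_to_target_device(first_image)  # alarm platform issues the warning
    return flow_type
```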
In some embodiments, the acquiring of the first image including the person object obtained when the first scene is shot specifically includes: acquiring the shot video of the first scene; and performing person object recognition on the video frames in the video to obtain the first image including the person object.
Illustratively, the monitoring device shoots the video of the first scene in real time; the video is cut into N consecutive frame images, person recognition is performed on the N frame images, and the frame images that include a person object, namely the first images, are determined.
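A sketch of this frame-cutting step using OpenCV; contains_person stands in for the person-object recognition, whose implementation the patent leaves open.

```python
# Step S101: cut the surveillance video into frames and keep the frames
# that contain person objects as the first images.
import cv2

def first_images_from_video(video_path, contains_person):
    cap = cv2.VideoCapture(video_path)
    first_images = []
    while True:
        ok, frame = cap.read()
        if not ok:  # end of the frame sequence
            break
        if contains_person(frame):
            first_images.append(frame)
    cap.release()
    return first_images
```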
Fig. 2 is a flow chart of another method for detecting people flow according to an embodiment of the present application. As shown in fig. 2, optionally, S104 of screening, based on a non-maximum suppression algorithm, the first detection frame set corresponding to each instance category from the detection frames may specifically include the following steps S201 to S204:
s201, determining a first target detection frame with highest confidence corresponding to the instance category according to the confidence of the detection frame.
The detection frames of all person objects are classified according to the instance categories of the person objects; the detection frames corresponding to each instance category are sorted by confidence, and the detection frame with the highest confidence for the instance category is determined as the first target detection frame.
S202, calculating the intersection-over-union (IoU) ratio between the first target detection frame and the other detection frames.
The IoU ratio between each detection frame of the instance category other than the first target detection frame and the first target detection frame is calculated based on the following formula (1).
IOU(b_high, b_rest) = Intersection(b_high, b_rest) / Union(b_high, b_rest)    (1)

where b_high is the first target detection frame, b_rest is a detection frame other than the first target detection frame, Intersection(·) computes the area of intersection, Union(·) computes the area of union, and IOU(b_high, b_rest) is the intersection-over-union ratio between the first target detection frame and the other detection frame.
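Formula (1) translates directly into code for axis-aligned boxes; the (x1, y1, x2, y2) coordinate convention is an assumption.

```python
# Intersection-over-union ratio of formula (1).
def iou(b_high, b_rest):
    x1, y1 = max(b_high[0], b_rest[0]), max(b_high[1], b_rest[1])
    x2, y2 = min(b_high[2], b_rest[2]), min(b_high[3], b_rest[3])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_high = (b_high[2] - b_high[0]) * (b_high[3] - b_high[1])
    area_rest = (b_rest[2] - b_rest[0]) * (b_rest[3] - b_rest[1])
    union = area_high + area_rest - intersection
    return intersection / union if union > 0 else 0.0
```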
S203, screening, from the other detection frames, the second target detection frames whose IoU ratio is less than or equal to a preset IoU threshold.
The detection frames whose IoU ratio is less than or equal to the preset IoU threshold are screened from the other detection frames as the second target detection frames corresponding to the instance category, so that redundant detection frames that overlap heavily with the first target detection frame are eliminated.
S204, determining a first detection frame set according to the first target detection frame and the second target detection frame.
The first target detection frame and the second target detection frames corresponding to each instance category together form the first detection frame set of that instance category, so that the first detection frame set corresponding to each instance category is obtained.
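Putting S201 through S204 together for one instance category gives the sketch below; it reuses the iou() helper above, and the 0.5 IoU threshold is an assumed value, since the patent only says "preset".

```python
# Steps S201-S204: build the first detection frame set for one instance category.
def first_box_set(boxes_with_conf, iou_thresh=0.5):
    # boxes_with_conf: [(box, confidence), ...] for a single instance category
    ranked = sorted(boxes_with_conf, key=lambda bc: bc[1], reverse=True)
    b_high = ranked[0]  # S201: the first target detection frame
    kept = [b_high]
    for b_rest in ranked[1:]:  # S202-S203: drop heavily overlapping frames
        if iou(b_high[0], b_rest[0]) <= iou_thresh:
            kept.append(b_rest)  # second target detection frames
    return kept  # S204: the first detection frame set
```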
In some embodiments, the instance segmentation model is constructed from a feature pyramid network, a region proposal network, a candidate region matching algorithm, a fast target detection algorithm and a fully convolutional network.
Illustratively, the embodiments of the present application construct an instance segmentation model (Mask R-CNN) in the order of a feature pyramid network (Feature Pyramid Network, FPN), a region proposal network (Region Proposal Network, RPN), a candidate region matching algorithm (Region of Interest Align, RoIAlign), a fast target detection algorithm (Fast R-CNN) and a fully convolutional network (Fully Convolutional Network, FCN), based on neural architecture search (Neural Architecture Search, NAS) techniques.
In target detection and semantic segmentation models, low-level features carry less semantic information but have higher spatial resolution, while high-level features are semantically rich but have lower spatial resolution and are not suitable for producing refined results. The embodiments of the present application therefore adopt a NAS-FPN network, which can fuse features from different layers and combine the advantages of multi-layer features, to construct the instance segmentation model and further improve its performance. Even if persons in the model input image are dense and occlude one another, the instance segmentation model can still accurately segment the person objects in the image, avoiding missed detections to the greatest extent.
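For orientation, a minimal construction sketch follows. The NAS-searched FPN is not available as a public torchvision component, so a standard ResNet-50 FPN backbone is used as a stand-in; torchvision's MaskRCNN wrapper internally supplies the RPN, RoIAlign, the Fast R-CNN box head and the FCN mask head listed above. The two-class setup is an assumption.

```python
# Assemble a Mask R-CNN-style instance segmentation model (sketch,
# torchvision >= 0.13).
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

# FPN feature extractor; a NAS-FPN would be swapped in here if available.
backbone = resnet_fpn_backbone(backbone_name="resnet50", weights=None)
# num_classes=2 assumes background + person as the only instance categories.
model = MaskRCNN(backbone, num_classes=2)
```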
Fig. 3 is a flow chart of another method for detecting people flow according to an embodiment of the present application. As shown in fig. 3, optionally, before the second image is identified through the instance segmentation model in S103 to obtain the detection frame of the person object in the second image, the instance category of the person object and the confidence of the detection frame, the detection method may further include the following steps S301 to S303:
s301, acquiring a plurality of training samples, wherein each training sample comprises a first person image sample in a scene sample and a first label image sample corresponding to the first person image sample.
A video of a certain scene is captured with a monitoring device, and each frame image in the video is manually inspected to obtain a plurality of first person image samples that include person objects. Pixels belonging to different person instance categories in each first person image sample are then manually labeled, yielding the first label image sample corresponding to each first person image sample.
The first person image samples and the labeled first label image samples may then be divided in a certain proportion to obtain a plurality of training samples and a plurality of test samples. For example, 75% of the first person image samples and their corresponding first label image samples serve as training samples, and the remaining 25% serve as test samples. The training samples are used to train the instance segmentation model, and the test samples are used to check whether the trained model meets the required standard.
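The 75/25 split can be written, for example, as below; the shuffle seed is an arbitrary assumption.

```python
# Split (person image, label image) pairs into training and test samples.
import random

def split_samples(samples, train_frac=0.75, seed=0):
    shuffled = samples[:]  # samples: [(first_person_image, first_label_image), ...]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]  # training samples, test samples
```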
S302, preprocessing each first person image sample to obtain preprocessed first person image samples.
Each first person image sample in the training samples and the test samples is preprocessed so that the preprocessed first person image samples conform to the input conditions of the instance segmentation model.
And S303, performing model training according to the preprocessed first person image sample and the first label image sample to obtain a trained example segmentation model.
The preprocessed first person image samples in the training samples are input into the pre-constructed instance segmentation model to obtain the model output. The loss function value of the model is calculated from the model output and the first label image sample corresponding to each first person image sample, and the model weights are updated according to the loss function value until the loss function value meets the preset stopping condition, yielding the trained instance segmentation model.
After the instance segmentation model is trained, the preprocessed first person image samples in the test samples are input into the trained model to obtain the model output. The model output is compared with the corresponding first label image samples to obtain the accuracy of the model output. If the accuracy is lower than a preset threshold, new person image samples need to be acquired and training continues until the accuracy is greater than or equal to the preset threshold, at which point the trained instance segmentation model is obtained.
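A condensed training-loop sketch for a torchvision-style detection model as assumed above; the optimizer, learning rate and epoch count are illustrative choices, not values from the patent.

```python
# Step S303 (sketch): train the instance segmentation model on the
# preprocessed person image samples and their label image samples.
import torch

def train(model, train_loader, epochs=10, lr=0.005):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in train_loader:
            # targets: list of dicts with "boxes", "labels" and "masks"
            # derived from the first label image samples
            loss_dict = model(images, targets)  # loss terms in train mode
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()  # update model weights according to the loss
            optimizer.step()
```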
In some embodiments, the preprocessing of the first image to obtain the second image specifically includes: adjusting the size of each first image to obtain first images of a target size; and standardizing each first image of the target size to obtain the second images.
Illustratively, preprocessing the first images requires adjusting the size of each first image to a fixed size and then standardizing each resized first image according to the following formula (2), so that the standardized images conform to the input conditions of the instance segmentation model.
I' = (I - m) / σ    (2)

where I is the matrix of image pixels, I' is the standardized result, m is the mean of the image pixels, and σ is the standard deviation of the image pixels.
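Combining the resize step with formula (2) gives the preprocessing sketch below; the 800x800 target size is an assumed value, and σ is computed as the per-image standard deviation.

```python
# Step S102 (sketch): resize a first image to the target size and
# standardize it per formula (2) to obtain a second image.
import cv2
import numpy as np

def preprocess(first_image, target_size=(800, 800)):
    resized = cv2.resize(first_image, target_size)
    pixels = resized.astype(np.float32)
    m, sigma = pixels.mean(), pixels.std()  # mean and standard deviation
    return (pixels - m) / sigma  # standardized second image
```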
In some embodiments, before each first person image sample is preprocessed, the detection method further includes: expanding the first person image samples and the first label image samples based on an automatic data enhancement algorithm to obtain second person image samples and second label image samples. In this case, the preprocessing specifically includes preprocessing each second person image sample to obtain preprocessed second person image samples, and the model training specifically includes performing model training according to the preprocessed second person image samples and the second label image samples to obtain the trained instance segmentation model.
Illustratively, considering that the image features captured by the camera of the monitoring device vary greatly with the weather, and that images with homogeneous features are unfavorable for model training, the embodiments of the present application expand the acquired first person image samples and first label image samples with an automatic data enhancement algorithm before preprocessing, in order to enrich the features of the training samples and make the trained model more robust. The expanded second person image samples and second label image samples are obtained; after the second person image samples are preprocessed, the preprocessed second person image samples and the second label image samples are used for subsequent model training.
In the automatic data enhancement of the first person image samples and first label image samples, five data enhancement policies are generated for each training sample set, where each policy comprises five sub-policies and each sub-policy consists of two data enhancement operations: one on the first person image sample and one on the first label image sample. Each data enhancement operation has two parameters: the probability of applying the operation and the magnitude of the operation.
In the search space of data enhancement policies thus constituted, the embodiments of the present application use reinforcement learning to obtain the best combination of data enhancement operations. From the candidate operations ShearX/Y, TranslateX/Y, Rotate, AutoContrast, Invert, Equalize, Solarize, Posterize, Contrast, Color, Brightness, Sharpness, Cutout and SamplePairing, five pairs of data enhancement operations are finally selected: TranslateX_BBox and Equalize, TranslateY_Only_BBoxes and Cutout, Sharpness and ShearX_BBox, ShearY_BBox and TranslateY_Only_BBoxes, and Rotate_BBox and Color. TranslateX_BBox translates the first person image sample and the first label image sample; Equalize performs histogram equalization on each pixel channel of the first person image sample and the first label image sample; TranslateY_Only_BBoxes randomly translates the first label image sample; Cutout deletes a rectangular region of the first person image sample and the first label image sample; Sharpness sharpens the first person image sample and the first label image sample; ShearX_BBox shears the first person image sample and the first label image sample along the horizontal axis; ShearY_BBox shears them along the vertical axis; Rotate_BBox rotates the first person image sample and the first label image sample; and Color performs a color transformation on the first person image sample and the first label image sample.
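As a toy illustration of a sub-policy built from two of the operations named above (a translate applied to both image and label, and a Cutout applied to the image), consider the sketch below; the probabilities and magnitudes are assumptions, since a real automatic-augmentation search learns them with reinforcement learning.

```python
# A hand-rolled augmentation sub-policy (sketch); real AutoAugment-style
# pipelines search over probabilities and magnitudes automatically.
import numpy as np

def translate_x(image, label, pixels=20):
    # shift image and label mask together so the annotation stays aligned
    return np.roll(image, pixels, axis=1), np.roll(label, pixels, axis=1)

def cutout(image, size=50, rng=None):
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    y = int(rng.integers(0, max(1, h - size)))
    x = int(rng.integers(0, max(1, w - size)))
    out = image.copy()
    out[y:y + size, x:x + size] = 0  # delete a rectangular region
    return out

def apply_sub_policy(image, label, p=0.8, rng=None):
    rng = rng or np.random.default_rng()
    if rng.random() < p:
        image, label = translate_x(image, label)
    if rng.random() < p:
        image = cutout(image, rng=rng)
    return image, label
```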
Based on the method for detecting people flow provided by the foregoing embodiments, the present application correspondingly further provides a specific implementation of a device for detecting people flow. Please refer to the following embodiments.
As shown in fig. 4, the device 40 for detecting people flow provided in the embodiment of the present application includes the following modules:
an obtaining module 401, configured to obtain a first image including a person object obtained when a first scene is captured;
a first processing module 402, configured to pre-process the first image to obtain a second image;
the recognition module 403 is configured to identify the second image through the instance segmentation model to obtain a detection frame of the person object in the second image, an instance category of the person object and a confidence of the detection frame;
the first screening module 404 is configured to screen, based on a non-maximum suppression algorithm, a first detection frame set corresponding to each instance category from the detection frames respectively;
a second screening module 405, configured to screen, from each of the first detection frame sets, a second detection frame with a confidence level greater than a preset confidence level threshold;
the first determining module 406 is configured to determine, according to the number of the second detection frames, the people flow corresponding to the first scene.
According to the device for detecting people flow provided by the embodiments of the present application, the first image obtained by shooting is preprocessed into the second image before being input into the instance segmentation model, so that the model can identify the person objects in the image; the second image including the person objects is then identified through the instance segmentation model to obtain the detection frames of the person objects in the second image, the instance categories of the person objects and the confidences of the detection frames. With the instance segmentation model, even when persons in the second image are dense and occlude one another, instance segmentation can still be performed accurately on the person objects in the second image, avoiding missed detections. Meanwhile, to avoid the same person object corresponding to a plurality of detection frames, after the detection frames are obtained, a first detection frame set corresponding to each instance category is screened from the detection frames based on a non-maximum suppression algorithm, and second detection frames whose confidence is greater than a preset confidence threshold are then screened from each first detection frame set, which increases the probability that each person object corresponds to exactly one detection frame. The number of the second detection frames can therefore measure the number of person objects in the first scene, so that the people flow corresponding to the first scene can be accurately determined according to the number of the second detection frames, improving the accuracy of people flow detection.
In some embodiments, the device 40 for detecting people flow may further include: an image uploading module, configured to determine the people flow type corresponding to the people flow according to the correspondence between people flow and type, and to upload the first image to the target device in a case that the people flow type is the target type.
In some embodiments, the acquisition module 401 is specifically configured to: acquire the shot video of the first scene; and perform person object recognition on the video frames in the video to obtain the first image including the person object.
In some embodiments, the first screening module 404 is specifically configured to perform the following operations for each instance category: determine, according to the confidences of the detection frames, a first target detection frame with the highest confidence corresponding to the instance category; calculate the intersection-over-union ratio between the first target detection frame and the other detection frames, the other detection frames being the detection frames of the instance category other than the first target detection frame; screen, from the other detection frames, second target detection frames whose intersection-over-union ratio is less than or equal to a preset threshold; and determine the first detection frame set according to the first target detection frame and the second target detection frames.
In some embodiments, the instance segmentation model is constructed from a feature pyramid network, a region proposal network, a candidate region matching algorithm, a fast target detection algorithm and a fully convolutional network.
In some embodiments, the device 40 may further include: a model training module, configured to acquire a plurality of training samples, each including a first person image sample in a scene sample and a first label image sample corresponding to the first person image sample; preprocess each first person image sample to obtain preprocessed first person image samples; and perform model training according to the preprocessed first person image samples and the first label image samples to obtain a trained instance segmentation model.
In some embodiments, the first processing module 402 is specifically configured to: the size of each first image is respectively adjusted to obtain a first image with a target size; and respectively carrying out standardization processing on the first image of each target size to obtain a second image.
In some embodiments, the device 40 may further include: a sample expansion module, configured to expand the first person image samples and the first label image samples based on an automatic data enhancement algorithm to obtain second person image samples and second label image samples. In this case, the preprocessing specifically includes preprocessing each second person image sample to obtain preprocessed second person image samples, and the model training specifically includes performing model training according to the preprocessed second person image samples and the second label image samples to obtain the trained instance segmentation model.
Each module in the apparatus shown in fig. 4 has a function of implementing each step in fig. 1, and can achieve a corresponding technical effect, which is not described herein for brevity.
Based on the method for detecting people flow provided by the foregoing embodiments, the present application correspondingly further provides a specific implementation of an electronic device. Please refer to the following embodiments.
Fig. 5 shows a schematic hardware structure of an electronic device according to an embodiment of the present application.
The electronic device may include a processor 501 and a memory 502 storing computer program instructions.
In particular, the processor 501 may include a central processing unit (Central Processing Unit, CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or may be configured to implement one or more integrated circuits of embodiments of the present application.
Memory 502 may include mass storage for data or instructions. By way of example and not limitation, memory 502 may comprise a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. In one example, memory 502 may include removable or non-removable (or fixed) media, or memory 502 may be a non-volatile solid-state memory. Memory 502 may be internal or external to the integrated gateway disaster recovery device.
In one example, memory 502 may be read-only memory (ROM). In one example, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM) or flash memory, or a combination of two or more of these.
Memory 502 may include Read Only Memory (ROM), random Access Memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) computer-readable storage media (e.g., memory devices) encoded with software comprising computer-executable instructions and when the software is executed (e.g., by one or more processors) it is operable to perform the operations described with reference to a method according to an aspect of the present application.
The processor 501 reads and executes the computer program instructions stored in the memory 502 to implement the methods/steps S101 to S106 in the embodiment shown in fig. 1, and achieve the corresponding technical effects achieved by executing the methods/steps in the embodiment shown in fig. 1, which are not described herein for brevity.
In one example, the electronic device may also include a communication interface 503 and a bus 510. As shown in fig. 5, the processor 501, the memory 502, and the communication interface 503 are connected to each other by a bus 510 and perform communication with each other.
The communication interface 503 is mainly used to implement communication between each module, apparatus, unit and/or device in the embodiments of the present application.
Bus 510 includes hardware, software, or both that couple the components of the electronic device to one another. By way of example and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local (VLB) bus, another suitable bus, or a combination of two or more of these. Bus 510 may include one or more buses, where appropriate. Although the embodiments of the present application describe and illustrate a particular bus, the present application contemplates any suitable bus or interconnect.
In addition, in combination with the method for detecting people flow in the above embodiments, the embodiments of the present application may be implemented by providing a computer readable storage medium. The computer readable storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement the method for detecting people flow in any of the above embodiments. Examples of computer readable storage media include non-transitory computer readable storage media such as electronic circuits, semiconductor memory devices, ROM, random access memory, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks and hard disks.
It should be clear that the present application is not limited to the particular arrangements and processes described above and illustrated in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions, or change the order between steps, after appreciating the spirit of the present application.
The functional blocks shown in the above-described structural block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio Frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the internet, intranets, etc.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be different from the order in the embodiments, or several steps may be performed simultaneously.
Aspects of the present application are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to being, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware which performs the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing describes only specific embodiments of the present application. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, modules, and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated here. It should be understood that the protection scope of the present application is not limited thereto; any person skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope disclosed in the present application, and such modifications or substitutions are intended to fall within the protection scope of the present application.

Claims (11)

1. A method for detecting people flow, the method comprising:
acquiring a first image that includes a person object and is obtained by shooting a first scene;
preprocessing the first image to obtain a second image;
identifying the second image through an instance segmentation model to obtain a detection frame of a person object in the second image, an instance category of the person object, and a confidence of the detection frame;
screening, based on a non-maximum suppression algorithm, a first detection frame set corresponding to each instance category from the detection frames;
screening, from each first detection frame set, second detection frames whose confidence is greater than a preset confidence threshold;
and determining the people flow corresponding to the first scene according to the number of the second detection frames.
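By way of illustration only, the claimed counting pipeline could be sketched in Python roughly as follows. The model interface, both thresholds, and the helper functions (preprocess and nms_per_category are sketched under claims 7 and 4 below) are assumptions, not part of the claim:

    import numpy as np

    def count_people(first_image, model, conf_thr=0.5, iou_thr=0.5):
        # preprocess the first image into the second image (see claim 7 sketch)
        second_image = preprocess(first_image)
        # hypothetical model interface: arrays of boxes, instance categories, confidences
        boxes, labels, scores = model(second_image)
        total = 0
        for category in np.unique(labels):
            sel = labels == category
            # first detection frame set: per-category NMS (see claim 4 sketch)
            kept = nms_per_category(boxes[sel], scores[sel], iou_thr)
            # second detection frames: confidence above the preset threshold
            total += int((scores[sel][kept] > conf_thr).sum())
        return total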
2. The method of claim 1, wherein after the determining the people flow corresponding to the first scene according to the number of the second detection frames, the method further comprises:
determining, according to a correspondence between people flow and people flow type, the people flow type corresponding to the people flow;
and uploading the first image to a target device when the people flow type is a target type.
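A minimal sketch of claim 2's type mapping and conditional upload. The count bands, the "crowded" target type, and upload_to_target_device are all illustrative assumptions; count_people is reused from the claim 1 sketch above:

    # assumed correspondence between people flow and people flow type
    FLOW_BANDS = [(50, "crowded"), (20, "normal"), (0, "sparse")]

    def flow_type(count):
        # bands are checked from the highest lower bound down
        for lower_bound, name in FLOW_BANDS:
            if count >= lower_bound:
                return name
        return FLOW_BANDS[-1][1]

    people_count = count_people(first_image, model)   # from the claim 1 sketch
    if flow_type(people_count) == "crowded":          # assumed target type
        upload_to_target_device(first_image)          # hypothetical upload helper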
3. The method according to claim 1, wherein the acquiring the first image that includes the person object and is obtained by shooting the first scene specifically comprises:
acquiring a video obtained by shooting the first scene;
and performing person object recognition on video frames of the video to obtain the first image including the person object.
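One plausible reading of claim 3 in code, using OpenCV to walk the video frames; has_person is a hypothetical person-presence check standing in for the recognition step:

    import cv2

    def first_images_from_video(video_path, has_person):
        cap = cv2.VideoCapture(video_path)
        first_images = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if has_person(frame):      # person object recognition on the frame
                first_images.append(frame)
        cap.release()
        return first_images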
4. The method according to claim 1, wherein the screening, based on a non-maximum suppression algorithm, the first detection frame set corresponding to each instance category from the detection frames specifically comprises:
For each instance category, the following operations are performed:
determining, according to the confidence of each detection frame, a first target detection frame with the highest confidence for the instance category;
calculating the intersection-over-union (IoU) between the first target detection frame and the other detection frames, the other detection frames being the detection frames of the instance category other than the first target detection frame;
screening, from the other detection frames, second target detection frames whose IoU is less than or equal to a preset IoU threshold;
and determining the first detection frame set according to the first target detection frame and the second target detection frames.
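The per-category screening of claim 4 matches standard greedy non-maximum suppression: keep the highest-confidence frame, then keep only frames whose IoU with it stays at or below the threshold. A minimal NumPy sketch, with the 0.5 IoU threshold as an assumption:

    import numpy as np

    def iou_one_to_many(box, boxes):
        # boxes given as [x1, y1, x2, y2]
        x1 = np.maximum(box[0], boxes[:, 0])
        y1 = np.maximum(box[1], boxes[:, 1])
        x2 = np.minimum(box[2], boxes[:, 2])
        y2 = np.minimum(box[3], boxes[:, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_a = (box[2] - box[0]) * (box[3] - box[1])
        areas_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        return inter / (area_a + areas_b - inter)

    def nms_per_category(boxes, scores, iou_thr=0.5):
        order = np.argsort(scores)[::-1]        # highest confidence first
        keep = []
        while order.size > 0:
            best = order[0]                     # first target detection frame
            keep.append(int(best))
            rest = order[1:]
            if rest.size == 0:
                break
            # second target detection frames: IoU with the best frame <= threshold
            order = rest[iou_one_to_many(boxes[best], boxes[rest]) <= iou_thr]
        return keep

Note that the sketch repeats the selection for every surviving frame, which is the usual greedy form; the claim text describes a single pass per category, and either reading yields the kept first detection frame set.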
5. The method of claim 1, wherein the instance segmentation model is constructed from a feature pyramid network, a region proposal network, a candidate region matching algorithm, a fast object detection algorithm, and a fully convolutional network.
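The components listed in claim 5 (FPN backbone, region proposal network, RoIAlign-style candidate region matching, a fast detection head, and an FCN mask head) are the building blocks of the Mask R-CNN family, so one plausible off-the-shelf stand-in is torchvision's implementation (v0.13+ API); the person-only class count is an assumption:

    import torchvision

    # Mask R-CNN: ResNet-50 + FPN backbone, RPN, RoIAlign,
    # Fast R-CNN box head, and an FCN mask head
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(
        weights=None,      # train from scratch; pretrained weights also possible
        num_classes=2,     # assumed: background + person
    )
    model.eval()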
6. The method of claim 5, wherein before the identifying the second image through the instance segmentation model to obtain the detection frame of the person object in the second image, the instance category of the person object, and the confidence of the detection frame, the method further comprises:
acquiring a plurality of training samples, wherein each training sample comprises a first person image sample in a scene sample and a first label image sample corresponding to the first person image sample;
preprocessing each first person image sample to obtain preprocessed first person image samples;
and performing model training according to the preprocessed first person image samples and the first label image samples to obtain a trained instance segmentation model.
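A compact training-loop sketch for claim 6, assuming torchvision-style detection targets (dicts of boxes, labels, and masks derived from the label image samples); the optimizer, learning rate, and epoch count are illustrative:

    import torch

    def train_instance_segmentation(model, loader, epochs=10, lr=0.005):
        # loader yields (preprocessed person image samples, label-derived targets)
        optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        model.train()
        for _ in range(epochs):
            for images, targets in loader:
                loss_dict = model(images, targets)  # torchvision detection models
                loss = sum(loss_dict.values())      # return a dict of losses in train mode
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return model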
7. The method according to any one of claims 1-6, wherein the preprocessing the first image to obtain a second image specifically comprises:
adjusting the size of each first image to obtain a first image of a target size;
and normalizing each first image of the target size to obtain the second image.
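A minimal sketch of claim 7's preprocessing; the 800x800 target size and the ImageNet normalization statistics are assumptions, since the claim fixes neither:

    import cv2
    import numpy as np

    TARGET_SIZE = (800, 800)                    # assumed target size (width, height)
    MEAN = np.array([0.485, 0.456, 0.406])      # assumed ImageNet statistics
    STD = np.array([0.229, 0.224, 0.225])

    def preprocess(first_image):
        # size adjustment to the target size
        resized = cv2.resize(first_image, TARGET_SIZE)
        # normalization: scale to [0, 1], then subtract mean and divide by std
        return ((resized / 255.0 - MEAN) / STD).astype(np.float32)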
8. The method of claim 6, wherein before the preprocessing each first person image sample to obtain preprocessed first person image samples, the method further comprises:
expanding the first person image sample and the first label image sample based on an automatic data enhancement algorithm to obtain a second person image sample and a second label image sample;
wherein the preprocessing each first person image sample to obtain preprocessed first person image samples specifically comprises:
preprocessing each second person image sample to obtain preprocessed second person image samples;
and wherein the performing model training according to the preprocessed first person image samples and the first label image samples to obtain a trained instance segmentation model specifically comprises:
performing model training according to the preprocessed second person image samples and the second label image samples to obtain the trained instance segmentation model.
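Claim 8 does not spell out the automatic data enhancement algorithm; the non-patent citations point to AutoAugment-style learned policies. A deliberately simplified stand-in that only flips, but shows the key constraint, namely that each person image sample and its label image sample must be transformed together so annotations stay aligned:

    import numpy as np

    def expand_samples(person_images, label_images):
        second_images, second_labels = [], []
        for img, lbl in zip(person_images, label_images):
            # keep the original pair
            second_images.append(img)
            second_labels.append(lbl)
            # horizontally flip image and label together so masks stay aligned
            second_images.append(np.fliplr(img).copy())
            second_labels.append(np.fliplr(lbl).copy())
        return second_images, second_labels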
9. A device for detecting people flow, the device comprising:
an acquisition module, configured to acquire a first image that includes a person object and is obtained by shooting a first scene;
a first processing module, configured to preprocess the first image to obtain a second image;
an identification module, configured to identify the second image through an instance segmentation model to obtain a detection frame of a person object in the second image, an instance category of the person object, and a confidence of the detection frame;
a first screening module, configured to screen, based on a non-maximum suppression algorithm, a first detection frame set corresponding to each instance category from the detection frames;
a second screening module, configured to screen, from each first detection frame set, second detection frames whose confidence is greater than a preset confidence threshold;
and a first determining module, configured to determine the people flow corresponding to the first scene according to the number of the second detection frames.
10. An electronic device, comprising: a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method for detecting people flow according to any one of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for detecting people flow according to any one of claims 1 to 8.
CN202310226256.3A 2023-03-03 2023-03-03 Method, device and equipment for detecting people flow and computer readable storage medium Active CN116229369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310226256.3A CN116229369B (en) 2023-03-03 2023-03-03 Method, device and equipment for detecting people flow and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN116229369A true CN116229369A (en) 2023-06-06
CN116229369B CN116229369B (en) 2024-06-28

Family

ID=86584216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310226256.3A Active CN116229369B (en) 2023-03-03 2023-03-03 Method, device and equipment for detecting people flow and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116229369B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563412A (en) * 2017-08-09 2018-01-09 Zhejiang University Real-time detection method for power equipment in infrared images based on deep learning
CN109785337A (en) * 2018-12-25 2019-05-21 Harbin Engineering University Method for counting mammals in a pen based on an instance segmentation algorithm
CN109840498A (en) * 2019-01-31 2019-06-04 South China University of Technology Real-time pedestrian detection method, neural network, and target detection layer
CN109902537A (en) * 2017-12-08 2019-06-18 Hangzhou Hikvision Digital Technology Co., Ltd. People counting method, apparatus, system, and electronic device
CN111444816A (en) * 2020-01-14 2020-07-24 Beijing Yinhe Xintong Technology Co., Ltd. Multi-scale dense pedestrian detection method based on fast RCNN
WO2022077917A1 (en) * 2020-10-14 2022-04-21 Ping An Technology (Shenzhen) Co., Ltd. Instance segmentation model sample screening method and apparatus, computer device and medium
CN114612926A (en) * 2020-12-08 2022-06-10 SF Technology Co., Ltd. Method and device for counting number of people in scene, computer equipment and storage medium
CN115410141A (en) * 2021-05-25 2022-11-29 China Mobile Xiong'an Information and Communication Technology Co., Ltd. People flow statistical method, system, electronic equipment and storage medium
US20220404486A1 (en) * 2021-06-18 2022-12-22 Infineon Technologies Ag People Counting Based on Radar Measurement

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
EKIN D. CUBUK et al.: "AutoAugment: Learning Augmentation Strategies From Data", Computer Vision and Pattern Recognition, 11 April 2019, pages 1-4 *
MOHANAPRIYA S et al.: "Instance Segmentation using Mask RCNN for Surveillance", 2022 6th International Conference on Computing Methodologies and Communication (ICCMC), pages 235-241 *
RUI WANG et al.: "AAC: Automatic Augmentation for Crowd Counting", Neurocomputing, 21 August 2022 *
LI Suiwei; LI Gangzhu: "Design and Implementation of a Real-time Dynamic Pedestrian Monitoring System", Information Technology, no. 05, 21 May 2020, pages 15-20 *

Also Published As

Publication number Publication date
CN116229369B (en) 2024-06-28

Similar Documents

Publication Publication Date Title
CN110517246B (en) Image processing method and device, electronic equipment and storage medium
US8792722B2 (en) Hand gesture detection
CN108664840A (en) Image-recognizing method and device
CN110781836A (en) Human body recognition method and device, computer equipment and storage medium
CN111274942A (en) Traffic cone identification method and device based on cascade network
CN111723644A (en) Method and system for detecting occlusion of surveillance video
CN111210399B (en) Imaging quality evaluation method, device and equipment
CN111597933B (en) Face recognition method and device
CN111898581A (en) Animal detection method, device, electronic equipment and readable storage medium
CN113936302B (en) Training method and device for pedestrian re-recognition model, computing equipment and storage medium
CN113435407B (en) Small target identification method and device for power transmission system
CN108932449B (en) Bar code binarization method and system based on clustering
CN110795975B (en) Face false detection optimization method and device
CN111340041A (en) License plate recognition method and device based on deep learning
CN111435437A (en) PCB pedestrian re-recognition model training method and PCB pedestrian re-recognition method
CN115083008A (en) Moving object detection method, device, equipment and storage medium
CN114841920A (en) Flame identification method and device based on image processing and electronic equipment
CN111881984A (en) Target detection method and device based on deep learning
CN106778675B (en) A kind of recognition methods of target in video image object and device
CN110765940B (en) Target object statistical method and device
CN116977931A (en) High-altitude parabolic identification method based on deep learning
CN116229369B (en) Method, device and equipment for detecting people flow and computer readable storage medium
CN116704371A (en) Roof detection and classification method, device, equipment and medium
CN110751163B (en) Target positioning method and device, computer readable storage medium and electronic equipment
CN112990350B (en) Target detection network training method and target detection network-based coal and gangue identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant