CN114120353A - Monitoring method and device based on deep learning - Google Patents
- Publication number: CN114120353A
- Application number: CN202110723095.XA
- Authority
- CN
- China
- Prior art keywords
- person
- group
- detection area
- images
- deep learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
- Closed-Circuit Television Systems (AREA)
Abstract
The deep-learning-based monitoring method and device comprise the following steps: inputting 2 images captured by a camera lens into a tracking network, respectively; determining a detection area in each of the 2 images; detecting whether a person enters/leaves the detection area in the 2 images and processing accordingly; selecting one ID from an ID group containing a plurality of IDs as the person ID of the person corresponding to that ID group; when a person whose person ID already exists is detected entering/leaving the detection area again, associating the person with the same ID and ID group as before; and counting the number of people appearing over a period of time. The invention improves the accuracy with which a monitoring system based on a fisheye camera and deep learning counts the number of people appearing in the camera picture. Because the implementation requires no redundant person-feature extraction and comparison, it significantly saves software running time and storage space; the scheme is also insensitive to image distortion, and is simple and easy to deploy and implement.
Description
Technical Field
The invention relates to the technical field of target detection, in particular to a monitoring method and a monitoring device based on deep learning.
Background
A fisheye lens, meaning a lens with a focal length of 16 mm or less and a viewing angle of approximately 180°, is an extreme wide-angle lens. To achieve the maximum shooting angle, the front element of a fisheye lens has a very short diameter and bulges parabolically toward the front of the lens, resembling the eye of a fish, which is how the lens got its name.
A fisheye camera is a camera fitted with a fisheye lens. As mentioned above, it has an extremely short focal length (16 mm or less) and a viewing angle close or equal to 180°.
Deep learning is a branch of machine learning and a necessary path toward realizing artificial intelligence. Deep learning combines low-level features to form more abstract high-level representations of attribute categories or features, thereby discovering distributed feature representations of the data.
Based on the fisheye camera and the deep learning, people appearing in a camera picture can be monitored, and for example, people flow statistics and the like can be carried out.
However, in the prior art, a monitoring system based on a fisheye camera and deep learning sometimes counts the same person two or more times, so that it cannot accurately count the number of people appearing in the camera picture.
Disclosure of Invention
The technical problem solved by the invention is: how to enable a monitoring system based on a fisheye camera and deep learning to accurately count the number of people appearing in the camera picture.
In order to solve the above technical problem, an embodiment of the present invention provides a monitoring method based on deep learning, including:
capturing an image through a camera lens;
respectively inputting 2 images collected by a camera lens into a tracking network;
respectively determining detection areas in the 2 images;
detecting whether a person enters/leaves the detection area in the 2 images and processing accordingly;
selecting one ID from an ID group containing a plurality of IDs as the person ID of the person corresponding to that ID group;
when a person whose person ID already exists is detected entering/leaving the detection area again, associating the person with the same ID and ID group as before;
the number of people who appear in a period of time is counted.
Optionally, the determining the detection regions in the 2 images respectively includes:
determining the boundary area of the 2 images;
and taking the boundary area as a detection area in the 2 images.
Optionally, the determining the boundary area of the 2 images includes: and responding to the operation of an operator on the interactive interface to select the boundary areas from the 2 images respectively.
Optionally, the method further includes: when it is detected that a person enters the detection area in the above 2 images, ID groups are assigned to the persons entering the detection area in each of the images, respectively.
Optionally, the selecting one ID in the ID group as the person ID of the person corresponding to the ID group includes: the smallest ID in the ID group is selected as the person ID of the person corresponding to the ID group.
Optionally, the detecting whether a person enters/leaves the detection area in the 2 images and processing includes: when it is detected that a person enters the detection area in the previous image, recording the ID of the person and judging whether the ID belongs to any existing ID group; if yes, adding the ID to that ID group; if not, creating a new ID group to which the ID belongs.
Optionally, the detecting whether a person enters/leaves the detection area in the 2 images and processing includes: when it is detected that a person leaves the detection area in the previous image and enters the detection area in the next image, the person ID in the next image is added to the ID group to which the person ID in the previous image belongs.
Optionally, the detecting whether a person enters/leaves the detection area in the 2 images and processing includes: and when the fact that the person leaves the detection area in the next image is detected, the generation of the ID group corresponding to the person is completed.
Optionally, the method further includes: judging whether different ID groups contain any identical ID; if yes, merging the ID groups that contain the identical ID.
Optionally, the smallest ID in the merged ID group is selected as the person ID of the person corresponding to the merged ID group.
Optionally, the camera lens is a fisheye camera lens.
In order to solve the above technical problem, an embodiment of the present invention further provides a monitoring device based on deep learning, including:
a processor adapted to load and execute instructions of a software program;
a memory adapted to store a software program comprising instructions for performing the steps of:
capturing an image through a camera lens;
respectively inputting 2 images collected by a camera lens into a tracking network;
respectively determining detection areas in the 2 images;
detecting whether a person enters/leaves the detection area in the 2 images and processing accordingly;
selecting one ID from an ID group containing a plurality of IDs as the person ID of the person corresponding to that ID group;
when a person whose person ID already exists is detected entering/leaving the detection area again, associating the person with the same ID and ID group as before;
the number of people who appear in a period of time is counted.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
2 images collected by the camera lens are input into a tracking network, respectively; a detection area is determined in each of the 2 images; whether a person enters/leaves the detection area in the 2 images is detected and processed accordingly; one ID from an ID group containing a plurality of IDs is selected as the person ID of the person corresponding to that ID group; and when a person whose person ID already exists is detected entering/leaving the detection area again, the person is associated with the same ID and ID group as before, and the number of people appearing over a period of time is counted, thereby improving the accuracy with which the monitoring system based on the fisheye camera and deep learning counts the number of people appearing in the camera picture.
Furthermore, the implementation requires no redundant person-feature extraction and comparison, which significantly saves software running time and storage space. The scheme is insensitive to image distortion and achieves good results as long as the tracking network performs well and matches accurately; it is simple, easy to deploy and implement, well suited to applications that use a fisheye camera lens as the monitoring camera, and also applicable to multi-bullet-camera and other panoramic monitoring solutions.
Further, a specific embodiment of detecting whether a person enters/leaves the detection area in the 2 images and processing accordingly is disclosed: when it is detected that a person enters the detection area in the previous image, the person's ID is recorded and it is judged whether the ID belongs to any existing ID group; if yes, the ID is added to that ID group; if not, a new ID group is created to which the ID belongs. When it is detected that the person leaves the detection area in the previous image and enters the detection area in the next image, the person's ID in the next image is added to the ID group to which the person's ID in the previous image belongs. When it is detected that the person leaves the detection area in the next image, the generation of the ID group corresponding to the person is completed. The method is simple and computationally light.
Drawings
FIG. 1 is a flow chart of a monitoring method based on deep learning according to an embodiment of the present invention;
FIG. 2 is a flow chart of ID matching according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating an expanded view of fisheye lens data according to an embodiment of the invention;
FIG. 4 is a schematic cross-lens diagram between 2P images after the data of the fisheye lens is unfolded according to the embodiment of the invention;
FIG. 5 is a schematic cross-lens diagram between the center view and the 2P view after the fisheye lens data is unfolded according to the embodiment of the invention;
FIG. 6 is a diagram illustrating ID merging according to an embodiment of the present invention.
Detailed Description
As can be seen from the analysis in the background art, based on the fisheye camera and the deep learning, it is possible to monitor people appearing in the camera screen, for example, to perform statistics of the flow of people.
However, in the prior art, a monitoring system based on a fisheye camera and deep learning sometimes counts the same person two or more times, so that it cannot accurately count the number of people appearing in the camera picture.
After research, the inventor found that in a monitoring system based on a fisheye camera and deep learning, the picture data fed to the deep neural network must be unfolded into a 2P map. (As shown in fig. 3, for an image collected by the fisheye camera, the annular region surrounding the central map is the fisheye camera's 2P map; this annular region is divided into two equal parts, which is equivalent to splitting it into two lens frames for display, one as the upper half of the 2P map and one as the lower half.) This split-frame display means that the IDs detected for the same person in different frames are inconsistent, which causes the above-mentioned defect of the prior art: the monitoring system based on the fisheye camera and deep learning sometimes counts the same person two or more times and therefore cannot accurately count the number of detected people.
Therefore, how to accurately count the number of people appearing in a camera picture by a monitoring system based on a fisheye camera and deep learning is an urgent problem to be solved in the field.
Specifically, one prior-art scheme (refer to 201910826602.5) acquires real-time video from multiple camera channels and outputs and displays the multiple video streams. A mask image is then drawn for each video stream, and a video analysis area and a common area of adjacent cameras are configured. According to the configured common areas, feature-point matching is performed between adjacent cameras and the homography matrices of adjacent cameras are computed. A specific person target is then selected in the video image through an interactive interface to obtain the corresponding ROI (region of interest); all ROI regions are classified with a deep learning image algorithm, and the highest-scoring ROI is taken as the person ROI region. Finally, the person ROI region is input into a tracking network (also called a tracker) for cross-camera tracking, thereby realizing cross-camera tracking ID matching.
This prior-art scheme relies on feature-point matching and homography matrices, which is relatively complex; combined with a deep learning method, its processing speed cannot be guaranteed. Its accuracy is also limited and depends heavily on the degree of camera distortion: if the distortion is too large, accuracy drops markedly.
It is known that the extremely short focal length of a fisheye camera brings a larger viewing-angle range but also causes certain disadvantages, such as barrel distortion, the phenomenon in which the image formed by a lens bulges outward in a barrel shape. Because of their extremely short focal length, fisheye cameras suffer from substantial barrel distortion.
Therefore, the above prior-art scheme is not suitable for a fisheye-camera-based monitoring system; in addition, its deployment in edge-side engineering faces real-time obstacles.
In another prior-art scheme (refer to 201910969309.4), a method for cross-lens tracking of suspicious persons based on spatial constraints is used. A deep learning algorithm detects human targets in the video in real time; the user selects a suspicious person target of interest through a provided interactive interface; single-camera tracking of the selected person and cross-lens tracking, using person features extracted by a deep learning algorithm as the basis of comparison, are then achieved; and global optimization is performed in the local scene according to the position and motion information of the selected target, improving the efficiency of person re-identification to some extent. However, this scheme requires training a person-feature extraction network and consumes storage for features; it is complex, relatively difficult to deploy at the edge, and limited in recognition speed.
Based on the above analysis, the inventor believes that the prior-art monitoring schemes based on a fisheye camera and deep learning have at least the following 2 serious defects:
1) they integrate multiple methods, making the implementation process complex;
2) real-time performance in actual use cannot be guaranteed; accuracy is improved only by greatly sacrificing real-time performance on top of the deep learning model.
The inventor further observed that the cross-lens cases after the fisheye camera lens data is unfolded (as shown in fig. 3) include at least the following 2 types: the first (as shown in fig. 4) is crossing between the two halves of the 2P map; the second (as shown in fig. 5) is crossing between the central map and the 2P map. Specifically:
for the cross-border situation between 2P pictures (i.e. the above-mentioned type 1 situation), firstly, we consider that the cross-shot problem is that the tracking network based on deep learning does not perform association when processing different pictures, so that the same person appears in different shot images and is regarded as a different person to perform ID assignment, as shown in fig. 2, the person on the 2P picture has an original ID of 1, and the ID changes to 2 when leaving the 2P picture, but this is actually the same person and should have the same ID; secondly, under different shots, a person's action track should be from one shot to another, so there is a boundary area, i.e. the "detection area" in the solution of the present invention, such as the area to the right of the black line on the 2P diagram in fig. 2 and the area to the left of the black line on the 2P diagram.
For the case of crossing between the central map and the 2P map (i.e., the type-2 case above): similarly, because the maps are fed into the tracking network separately, the system treats the same person as a new person in each map and assigns a different ID. For example, in fig. 5, a person's ID is 1 in the 2P map, becomes 2 in the central map, and becomes 3 again on returning to the 2P map, although the same person should keep the same ID.
In the scheme of the invention, the IDs of people appearing successively within one interval are fixed, so the same person is not counted two or more times. Specifically:
the invention inputs 2 images collected by the camera lens into the tracking network respectively; respectively determining detection areas in the 2 images; detecting whether a person enters/leaves a detection area in the 2 images and processing the detection area; selecting one ID in an ID group containing a plurality of IDs as the person ID of the person corresponding to the ID group; when a person whose existing person ID is detected enters/leaves the detection area again, the person is assigned to the previously same ID and ID group, thereby improving the accuracy of counting the number of persons appearing in the camera screen by the monitoring system based on the fisheye camera and the depth learning.
In order that those skilled in the art will better understand and realize the present invention, the following detailed description is given by way of specific embodiments with reference to the accompanying drawings.
Example one
As described below, an embodiment of the present invention provides a monitoring method based on deep learning.
Referring to a flow chart of a monitoring method based on deep learning shown in fig. 1 (and referring to a flow chart of ID matching shown in fig. 2), the following is detailed by specific steps:
and S101, acquiring an image through a camera lens.
Wherein, the camera lens is a fisheye camera lens.
Regarding the prior art's defect of miscounting the number of people, the inventor's analysis is as follows: referring to figs. 3, 4 and 5, after the fisheye lens data is unfolded (fig. 3), the cross-lens situation falls into two cases: crossing between the two halves of the 2P map (fig. 4), and crossing between the central map and the 2P map (fig. 5).
For the case of crossing between the two halves of the 2P map (as shown in fig. 4): first, the cross-lens problem arises because the deep-learning-based tracker performs no association when processing different pictures, so the same person appearing in images from different lenses is treated as a different person and assigned a new ID. As shown in fig. 4, the person originally has ID 1 in the upper half of the 2P map, and the ID changes to 2 when the person leaves the upper half and enters the lower half, although this is the same person and should keep the same ID. Second, a person's trajectory under different lenses passes from one lens to the other, so there is a boundary region, i.e., the "detection area" in this embodiment, such as the area to the right of the black line in the upper half and the area to the left of the black line in the lower half of the 2P map in fig. 4.
For the case of crossing between the 2P map and the central map (as shown in fig. 5): similarly, because the maps are fed into the tracker separately, the same person is treated as a new person in each map and assigned a different ID. For example, in fig. 5, a person's ID is 1 in the 2P map, becomes 2 in the central map, and becomes 3 again after the person re-enters the 2P map, although this is the same person and should keep the same ID.
The inventors overcome the above drawbacks of the prior art as follows: a boundary area is selected through an interactive interface or by other means; each person newly entering the boundary area is then assigned an ID group. Using the precedence relationship in the time dimension (a person who disappears at the edge of one image inevitably appears at the beginning of the other), the two or more different IDs assigned to that person are placed in the same group. (More than two can occur because, due to tracker performance limitations, the same person may receive different detection IDs even within the same area.) For example, in fig. 5 three IDs appear in total that all belong to one person, so they are grouped as [1, 2, 3]; finally the smallest ID (the one that appeared first) is selected as the person's ID, thereby fixing the person's cross-lens ID.
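The pick-the-smallest-ID rule described above can be stated as a one-line sketch in Python (`id_group` is simply the collection of tracker IDs gathered for one person; the function name is chosen here for illustration):

```python
def person_id(id_group):
    """Return the representative person ID for a group of tracker IDs.

    The smallest ID is the one assigned first, so it is kept as the
    person's fixed cross-lens ID (e.g. group [1, 2, 3] -> person ID 1).
    """
    return min(id_group)
```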
S102, respectively inputting 2 images collected by the camera lens into the tracking network.
S103, determining the detection regions in the 2 images.
Specifically, in some embodiments, the determining the detection regions in the 2 images respectively includes:
determining the boundary area of the 2 images;
and taking the boundary area as a detection area in the 2 images.
Further, in some embodiments, the determining the boundary area of the 2 images includes: responding to the operation of the operator on the interactive interface to select a boundary area from the 2 images,
And S104, detecting whether a person enters or leaves the detection area in the 2 images and processing.
Specifically, in some embodiments, the detecting whether a person enters/leaves the detection area in the 2 images and the processing includes: when it is detected that a person enters the detection area in the previous image, recording the ID of the person and judging whether the ID belongs to any existing ID group; if yes, adding the ID to that ID group; if not, creating a new ID group to which the ID belongs.
In some embodiments, the detecting whether a person enters/leaves the detection area in the 2 images and the processing includes: when it is detected that a person leaves the detection area in the previous image and enters the detection area in the next image, the person ID in the next image is added to the ID group to which the person ID in the previous image belongs.
In some embodiments, the detecting whether a person enters/leaves the detection area in the 2 images and the processing includes: when it is detected that the person leaves the detection area in the next image, the generation of the ID group corresponding to the person is completed.
Here, for an ID group including a plurality of IDs, one ID in the ID group is selected as the person ID of the person corresponding to the ID group;
when a person whose person ID already exists is detected entering/leaving the detection area again, the person is associated with the same ID and ID group as before.
Further, in some embodiments, the selecting one ID in the ID group as the person ID of the person corresponding to the ID group includes: the smallest ID in the ID group is selected as the person ID of the person corresponding to the ID group.
In some embodiments, further comprising: when it is detected that a person enters the detection area in the above 2 images, ID groups are assigned to the persons entering the detection area in each of the images, respectively.
In some embodiments, the method further includes: judging whether different ID groups contain any identical ID; if yes, merging the ID groups that contain the identical ID.
Further, in some embodiments, the smallest ID in the merged ID group is selected as the person ID of the person corresponding to the merged ID group.
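The enter/cross/leave bookkeeping of S104 can be sketched as a small Python class. This is a minimal illustration under the assumption that the tracker reports which track IDs enter or leave each image's detection area; the class and method names are invented for this sketch:

```python
class IDGroupManager:
    def __init__(self):
        self.groups = []  # one set of track IDs per person

    def _find_group(self, track_id):
        for group in self.groups:
            if track_id in group:
                return group
        return None

    def on_enter_previous(self, track_id):
        # Person enters the detection area of the previous image: record the
        # ID; keep its existing group if any, otherwise create a new group.
        if self._find_group(track_id) is None:
            self.groups.append({track_id})

    def on_cross(self, prev_id, next_id):
        # Person left the previous image's area and entered the next image's
        # area: the new track ID belongs to the same person's group.
        group = self._find_group(prev_id)
        if group is None:
            group = {prev_id}
            self.groups.append(group)
        group.add(next_id)

    def person_id(self, track_id):
        # The smallest ID in the group is the person's fixed ID.
        group = self._find_group(track_id)
        return min(group) if group else track_id

    def person_count(self):
        return len(self.groups)
```

With this bookkeeping, a person whose track ID changes from 1 to 2 when crossing between the two halves of the 2P map still reports `person_id(2) == 1`, and `person_count()` counts that person once.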
And S105, counting the number of the persons appearing in a period of time.
The above description of the technical solution shows that in this embodiment, 2 images collected by the camera lens are input into a tracking network, respectively; a detection area is determined in each of the 2 images; whether a person enters/leaves the detection area in the 2 images is detected and processed accordingly; one ID from an ID group containing a plurality of IDs is selected as the person ID of the person corresponding to that ID group; and when a person whose person ID already exists is detected entering/leaving the detection area again, the person is associated with the same ID and ID group as before, and the number of people appearing over a period of time is counted, thereby improving the accuracy with which the monitoring system based on the fisheye camera and deep learning counts the number of people appearing in the camera picture.
Furthermore, the implementation requires no redundant person-feature extraction and comparison, which significantly saves software running time and storage space. The scheme is insensitive to image distortion and achieves good results as long as the tracking network performs well and matches accurately; it is simple, easy to deploy and implement, well suited to applications that use a fisheye camera lens as the monitoring camera, and also applicable to multi-bullet-camera and other panoramic monitoring solutions.
Further, a specific embodiment of detecting whether a person enters/leaves the detection area in the 2 images and processing accordingly is disclosed: when it is detected that a person enters the detection area in the previous image, the person's ID is recorded and it is judged whether the ID belongs to any existing ID group; if yes, the ID is added to that ID group; if not, a new ID group is created to which the ID belongs. When it is detected that the person leaves the detection area in the previous image and enters the detection area in the next image, the person's ID in the next image is added to the ID group to which the person's ID in the previous image belongs. When it is detected that the person leaves the detection area in the next image, the generation of the ID group corresponding to the person is completed. The method is simple and computationally light.
That is, as shown in fig. 2, the present embodiment avoids the defect of the prior art that the number of people is counted incorrectly by the following means:
1) respectively sending two images acquired by the fisheye camera into a tracking network (tracker);
2) dividing a detection area, namely a boundary area of the two pictures;
3) when a person enters the detection area of the first image, the person's ID is recorded; if it already belongs to an existing ID group, it is recorded into that group; if not, this indicates a different person and a new ID group is created;
4) when the person leaves the first image and enters the detection area of the second image, the person's ID in the second image is added to the ID group to which the ID from the first image belongs;
5) when the person leaves the detection area of the second image, the generation of that ID group is completed;
6) when the same person returns to the detection area again, the ID does not change, and the person is assigned to the previous ID group; finally, one group of IDs represents one person;
7) for a different person, the above steps are repeated.
The key parts of the specific implementation are the ID-grouping logic and the merging of duplicate IDs. In the implementation, all IDs a person receives from entering to leaving a detection area are placed in one group. When the same person repeatedly crosses detection areas back and forth, different ID groups are generated; moreover, a person may briefly jump into a detection area and back out, in which case a new ID group is generated because the entering ID differs, and that group is unnecessary. Redundant ID groups are therefore filtered by merging existing ID groups, the merge condition being that two groups contain any identical ID, since the same ID necessarily denotes the same person. For example, in fig. 6, a person starts from the lower half of the 2P map, passes through the central map and arrives at the upper half of the 2P map, forming the ID group [1, 2, 3]; the person then leaves through the detection area of the upper half and enters the detection area of the lower half, generating a new ID group [3, 4]. These in fact belong to the same person, so the two groups are merged into a new ID group [1, 2, 3, 4], and 1 is finally selected as the person's ID.
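The merge rule for redundant ID groups (merge any groups that share an ID) can be sketched as follows; this illustrative Python fragment reproduces the fig. 6 example from the text:

```python
def merge_id_groups(groups):
    """Merge ID groups that share any ID; such groups denote one person."""
    merged = []
    for group in groups:
        group = set(group)
        # Absorb every already-merged group that overlaps this one, so the
        # merge is transitive across chains of shared IDs.
        overlapping = [m for m in merged if m & group]
        for m in overlapping:
            group |= m
            merged.remove(m)
        merged.append(group)
    return merged

# Fig. 6 example: [1, 2, 3] and [3, 4] share ID 3, so they merge into
# [1, 2, 3, 4], and the smallest ID (1) becomes the person's ID.
groups = merge_id_groups([{1, 2, 3}, {3, 4}])
```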
Example two
As described below, embodiments of the present invention provide a monitoring device based on deep learning.
The monitoring device based on deep learning comprises:
a processor adapted to load and execute instructions of a software program;
a memory adapted to store a software program comprising instructions for performing the steps of:
capturing an image through a camera lens;
respectively inputting 2 images collected by a camera lens into a tracking network;
respectively determining detection areas in the 2 images;
detecting whether a person enters/leaves the detection area in the 2 images and processing accordingly;
selecting one ID from an ID group containing a plurality of IDs as the person ID of the person corresponding to that ID group;
when a person whose person ID already exists is detected entering/leaving the detection area again, associating the person with the same ID and ID group as before;
the number of people who appear in a period of time is counted.
The above description of the technical solution shows that in this embodiment, 2 images collected by the camera lens are input into a tracking network, respectively; a detection area is determined in each of the 2 images; whether a person enters/leaves the detection area in the 2 images is detected and processed accordingly; one ID from an ID group containing a plurality of IDs is selected as the person ID of the person corresponding to that ID group; and when a person whose person ID already exists is detected entering/leaving the detection area again, the person is associated with the same ID and ID group as before, and the number of people appearing over a period of time is counted, thereby improving the accuracy with which the monitoring system based on the fisheye camera and deep learning counts the number of people appearing in the camera picture.
Those skilled in the art will understand that all or part of the steps in the methods of the above embodiments may be completed by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium, which may include: ROM, RAM, magnetic disks, optical disks, and the like.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (12)
1. A monitoring method based on deep learning, characterized by comprising the following steps:
capturing images through a camera lens;
inputting 2 images captured by the camera lens into a tracking network, respectively;
determining a detection area in each of the 2 images;
detecting whether a person enters/leaves the detection area in the 2 images, and processing accordingly;
selecting one ID in an ID group containing a plurality of IDs as the person ID of the person corresponding to the ID group;
when a person who already has a person ID is detected entering/leaving the detection area again, associating the person with the same ID and ID group as before;
counting the number of people who appear within a period of time.
2. The monitoring method based on deep learning of claim 1, wherein determining a detection area in each of the 2 images comprises:
determining the boundary area of each of the 2 images;
and taking the boundary areas as the detection areas in the 2 images.
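A simple sketch of testing whether a tracked person falls inside an image's boundary area (the band width is an illustrative parameter; the patent does not specify one):

```python
def in_boundary_region(cx, cy, width, height, band=0.1):
    """Return True when the point (cx, cy), e.g. the centre of a
    tracked person's bounding box, lies within the boundary band of
    an image of size width x height.  `band` is the fraction of each
    dimension treated as the boundary area (illustrative default).
    """
    bx, by = width * band, height * band
    return (cx < bx or cx > width - bx or
            cy < by or cy > height - by)
```

A point near the edge of a 100x100 image, such as (5, 50), falls in the band, while the image centre (50, 50) does not.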
3. The monitoring method based on deep learning of claim 2, wherein determining the boundary area of each of the 2 images comprises: selecting the boundary areas from the 2 images, respectively, in response to an operation by an operator on an interactive interface.
4. The deep-learning-based monitoring method of claim 1, further comprising: when it is detected that persons enter the detection areas in the above 2 images, assigning ID groups to the persons entering the detection area in each image, respectively.
5. The monitoring method based on deep learning of claim 1, wherein the selecting one ID in the ID group as the person ID of the person corresponding to the ID group comprises: the smallest ID in the ID group is selected as the person ID of the person corresponding to the ID group.
6. The monitoring method based on deep learning of claim 1, wherein detecting whether a person enters/leaves the detection area in the 2 images and processing accordingly comprises: when it is detected that a person enters the detection area in the previous image, recording the ID of the person and judging whether the ID belongs to any existing ID group; if yes, adding the ID to that ID group; if not, creating a new ID group to which the ID belongs.
7. The monitoring method based on deep learning of claim 1, wherein detecting whether a person enters/leaves the detection area in the 2 images and processing accordingly comprises: when it is detected that a person leaves the detection area in the previous image and enters the detection area in the next image, adding the person's ID in the next image to the ID group to which the person's ID in the previous image belongs.
8. The monitoring method based on deep learning of claim 1, wherein detecting whether a person enters/leaves the detection area in the 2 images and processing accordingly comprises: when it is detected that the person leaves the detection area in the next image, completing the generation of the ID group corresponding to the person.
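The three cases in claims 6 to 8 amount to simple bookkeeping over ID groups. A minimal sketch, assuming an integer ID per tracked person (the class, method names, and data model are illustrative; the patent does not specify them):

```python
class IdGroups:
    """Illustrative bookkeeping of ID groups across the 2 images."""

    def __init__(self):
        self.groups = []    # open ID groups, one set of tracker IDs each
        self.finished = []  # groups completed when a person leaves the next image

    def _find(self, tid):
        # return the open group containing this ID, or None
        return next((g for g in self.groups if tid in g), None)

    def enter_previous(self, tid):
        # claim 6: on entering the previous image's detection area,
        # join an existing group if the ID is known, else create one
        if self._find(tid) is None:
            self.groups.append({tid})

    def cross_to_next(self, prev_id, next_id):
        # claim 7: the ID seen entering the next image is added to the
        # group of the ID that left the previous image
        g = self._find(prev_id)
        if g is None:
            g = {prev_id}
            self.groups.append(g)
        g.add(next_id)

    def leave_next(self, tid):
        # claim 8: leaving the next image's detection area completes
        # the generation of the person's ID group
        g = self._find(tid)
        if g is not None:
            self.groups.remove(g)
            self.finished.append(g)
```

For example, a person tracked as ID 1 in the previous image and ID 4 in the next image ends up with the single completed group `{1, 4}`.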
9. The deep-learning-based monitoring method of claim 1, further comprising: judging whether different ID groups contain any identical ID; if yes, merging the ID groups that contain the identical ID.
10. The monitoring method based on deep learning of claim 9, wherein the smallest ID is selected from the merged ID group as the person ID of the person corresponding to the merged ID group.
11. The deep learning-based monitoring method of claim 1, wherein the camera lens is a fisheye camera lens.
12. A monitoring device based on deep learning, comprising:
a processor adapted to load and execute the instructions of a software program;
a memory adapted to store a software program comprising instructions for performing the steps of:
capturing images through a camera lens;
inputting 2 images captured by the camera lens into a tracking network, respectively;
determining a detection area in each of the 2 images;
detecting whether a person enters/leaves the detection area in the 2 images, and processing accordingly;
selecting one ID in an ID group containing a plurality of IDs as the person ID of the person corresponding to the ID group;
when a person who already has a person ID is detected entering/leaving the detection area again, associating the person with the same ID and ID group as before;
counting the number of people who appear within a period of time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110723095.XA CN114120353A (en) | 2021-06-28 | 2021-06-28 | Monitoring method and device based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114120353A (en) | 2022-03-01 |
Family
ID=80359376
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110723095.XA Pending CN114120353A (en) | 2021-06-28 | 2021-06-28 | Monitoring method and device based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114120353A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||